[FFmpeg-user] Conference video-like filtergraph eating memory

Fri May 4 00:06:46 EEST 2018

Hey everyone,

I'm getting an OOM with the following ffmpeg complex filtergraph:

[0] scale=1920:1080,setsar=1/1,split=8[s1][s2][s3][s4][s5][s6][s7][s8];

[s1] trim=start=0:duration=0.144 [a];
[s2] trim=start=0.144:duration=805.2612,setpts=PTS-STARTPTS [b];
[s3] trim=start=805.4052:duration=3.11354,setpts=PTS-STARTPTS [c];
[s4] trim=start=808.51874:duration=2250.975128,setpts=PTS-STARTPTS [d];
[s5] trim=start=3059.49397:duration=1.820129,setpts=PTS-STARTPTS [e];
[s6] trim=start=3061.3141:duration=16.1968,setpts=PTS-STARTPTS [f];
[s7] trim=start=3077.5109004974:duration=5.955413,setpts=PTS-STARTPTS [g];
[s8] trim=start=3083.4663143158,setpts=PTS-STARTPTS [h];

[4] trim=start=0:duration=1669.432685 [i];
[4] trim=start=1669.432685,setpts=PTS-STARTPTS [j];

[b][1] scale2ref=iw/5:ow/mdar [b][one];
[d][2] scale2ref=iw/5:ow/mdar [d][two];
[f][3] scale2ref=iw/5:ow/mdar [f][three];
[h][i] scale2ref=iw/5:ow/mdar [h][four];

[one][b] overlay=main_w-overlay_w:main_h-overlay_h [o1];
[two][d] overlay=main_w-overlay_w:main_h-overlay_h [o2];
[three][f] overlay=main_w-overlay_w:main_h-overlay_h [o3];
[four][h] overlay=main_w-overlay_w:main_h-overlay_h [o4];

[a][o1] concat [cc1];
[cc1][c] concat [cc2];
[cc2][o2] concat [cc3];
[cc3][e] concat [cc4];
[cc4][o3] concat [cc5];
[cc5][g] concat [cc6];
[cc6][o4] concat [cc7];
[cc7][j] concat [cc8]
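
As a sanity check on the trim boundaries of input [0] above, here is a
minimal Python sketch that reports the gap between the end of each
segment (start + duration) and the start of the next one. The values
are copied from the filtergraph; the names SEGMENTS and boundary_gaps
are made up for this illustration and are not part of ffmpeg.

```python
# Trim segments of input [0], copied from the filtergraph above:
# (label, start in seconds, duration in seconds, or None for "until the end").
SEGMENTS = [
    ("a", 0.0, 0.144),
    ("b", 0.144, 805.2612),
    ("c", 805.4052, 3.11354),
    ("d", 808.51874, 2250.975128),
    ("e", 3059.49397, 1.820129),
    ("f", 3061.3141, 16.1968),
    ("g", 3077.5109004974, 5.955413),
    ("h", 3083.4663143158, None),
]


def boundary_gaps(segments):
    """Return (prev_label, next_label, gap_seconds) for each adjacent pair.

    A positive gap means the time between the two segments is dropped;
    a negative gap means the two segments overlap.
    """
    gaps = []
    # Pair every segment with its successor; the last segment's open-ended
    # duration (None) is never used in the arithmetic.
    for (label, start, duration), (next_label, next_start, _) in zip(segments, segments[1:]):
        gaps.append((label, next_label, next_start - (start + duration)))
    return gaps


if __name__ == "__main__":
    for prev_label, next_label, gap in boundary_gaps(SEGMENTS):
        print(f"[{prev_label}] -> [{next_label}]: gap = {gap * 1000:+.4f} ms")
```

With the values above, every gap comes out below one millisecond, so
the eight segments cover input [0] essentially end to end.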

The first input [0] is a relatively low-resolution video. Inputs [1]
through [4] are higher-resolution (1920x1080) but low-frame-rate
videos (PowerPoint slides). The videos are lengthy but ultimately not
that much data: about 700 MB across 160 minutes in total among the 5
inputs.

I've attached an image I drew which visualizes what I'm trying to
accomplish. If it gets removed from the mailing list, please let me
know. Sorry for the poor handwriting, as this was a personal draft.
The filtergraph is shown on the bottom half. TR is the trim filter,
OVLY is overlay and CC is concat. Obviously it's simplified (no
setpts), but the structure is exactly the same. DV_b2 is input [0],
SCR_ba is input [1], SCR_d9 is input [2], SCR_6d is input [3] and
SCR_bb is input [4].

The top half of the image shows a timeline view of what I'm doing. S
marks the start of the output and E marks the end of the output. So
we have the DV video (0) playing almost the entire time. It's
occasionally combined with the PowerPoint slide stream via overlay,
very similar to what you might see in a conference video.

When I actually run this, ffmpeg outputs about 1 frame and then
stalls while eating up GBs of RAM, forcing me to abort. While it
stalls it appears to keep decoding the input and feeding it to...
something, as evidenced by the trace log I captured. It must be
keeping these frames in memory. I also did a process trace and it
seems to be spending most of its time in the scale filter, which is
unsurprising. My best guess is that some filter is trying to
load/consume the entire input into memory before executing instead of
taking, I dunno, maybe a frame at a time. But I don't know which
filter is doing this. Maybe concat? Is there an entirely different way
I should be doing this?
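
To put the "GBs of RAM" symptom next to the 700 MB of compressed
input, here is a rough back-of-the-envelope sketch in Python of how
much memory decoded 1920x1080 frames would take if they were being
queued. The yuv420p pixel format (12 bits per pixel) and the 25 fps
frame rate are assumptions for illustration only; neither is stated
anywhere in this post, and the function names are hypothetical.

```python
def frame_bytes(width, height, bits_per_pixel=12):
    """Size of one decoded frame in bytes.

    12 bits per pixel corresponds to 8-bit 4:2:0 video (yuv420p).
    """
    return width * height * bits_per_pixel // 8


def queued_bytes(width, height, fps, seconds, bits_per_pixel=12):
    """Memory held if every frame for `seconds` of video stays queued."""
    return frame_bytes(width, height, bits_per_pixel) * int(fps * seconds)


if __name__ == "__main__":
    # ASSUMPTIONS: yuv420p and 25 fps; the post states neither.
    one_frame = frame_bytes(1920, 1080)
    one_minute = queued_bytes(1920, 1080, fps=25, seconds=60)
    print(f"one 1920x1080 frame:  {one_frame / 2**20:.2f} MiB")
    print(f"one minute at 25 fps: {one_minute / 2**30:.2f} GiB")
```

Under those assumptions one decoded frame is about 3 MB, so a single
minute of queued frames would already amount to several GB. That would
be consistent with the stall described above, if frames are in fact
being held somewhere in the graph.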

Any help would be greatly appreciated. This is going to be part of an
open-source library I'm eager to share.

Here's an example of a conference video which has something like what
I'm going for:

https://media.ccc.de/v/32c3-7331-the_exhaust_emissions_scandal_dieselgate

The biggest difference is that my overlay is simpler.

Best regards,
Kevin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: filtergraph.jpg
Type: image/jpeg
Size: 307862 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-user/attachments/20180503/9791eb37/attachment.jpg>