[FFmpeg-user] Conference video-like filtergraph eating memory

Paul B Mahol onemda at gmail.com
Fri May 4 00:13:25 EEST 2018

On 5/3/18, Kevin Mark <kmark937 at gmail.com> wrote:
> Hey everyone,
> I'm getting an OOM with the following ffmpeg complex filtergraph:
> [0] scale=1920:1080,setsar=1/1,split=8[s1][s2][s3][s4][s5][s6][s7][s8];
> [s1] trim=start=0:duration=0.144 [a];
> [s2] trim=start=0.144:duration=805.2612,setpts=PTS-STARTPTS [b];
> [s3] trim=start=805.4052:duration=3.11354,setpts=PTS-STARTPTS [c];
> [s4] trim=start=808.51874:duration=2250.975128,setpts=PTS-STARTPTS [d];
> [s5] trim=start=3059.49397:duration=1.820129,setpts=PTS-STARTPTS [e];
> [s6] trim=start=3061.3141:duration=16.1968,setpts=PTS-STARTPTS [f];
> [s7] trim=start=3077.5109004974:duration=5.955413,setpts=PTS-STARTPTS [g];
> [s8] trim=start=3083.4663143158,setpts=PTS-STARTPTS [h];
> [4] trim=start=0:duration=1669.432685 [i];
> [4] trim=start=1669.432685,setpts=PTS-STARTPTS [j];
> [b][1] scale2ref=iw/5:ow/mdar [b][one];
> [d][2] scale2ref=iw/5:ow/mdar [d][two];
> [f][3] scale2ref=iw/5:ow/mdar [f][three];
> [h][i] scale2ref=iw/5:ow/mdar [h][four];
> [one][b] overlay=main_w-overlay_w:main_h-overlay_h [o1];
> [two][d] overlay=main_w-overlay_w:main_h-overlay_h [o2];
> [three][f] overlay=main_w-overlay_w:main_h-overlay_h [o3];
> [four][h] overlay=main_w-overlay_w:main_h-overlay_h [o4];
> [a][o1] concat [cc1];
> [cc1][c] concat [cc2];
> [cc2][o2] concat [cc3];
> [cc3][e] concat [cc4];
> [cc4][o3] concat [cc5];
> [cc5][g] concat [cc6];
> [cc6][o4] concat [cc7];
> [cc7][j] concat [cc8]
> The first input [0] is a relatively low resolution video. Inputs [1]
> through [4] are higher resolution (1920x1080) but low-frame-rate video
> (PowerPoint slides). The videos are lengthy but ultimately not that
> much data: about 700 MB across 160 minutes in total among the 5
> inputs.
> I've attached an image I drew which visualizes what I'm trying to
> accomplish. If it gets removed from the mailing list please let me
> know. Sorry for the poor handwriting as this was a personal draft. The
> filtergraph is shown on the bottom half. TR is the trim filter. OVLY
> is overlay. CC is concat. Obviously it's simplified (no setpts) but
> the structure is exactly the same. DV_b2 is input [0], SCR_ba is input
> [1], SCR_d9 is input [2], SCR_6d is input [3] and SCR_bb is input [4].
> The top half of the image shows a timeline view of what I'm doing. S
> marks the start of the output and E marks the end of the output.
> So we have the DV video (0) playing almost the entire time. It's
> occasionally combined with the PowerPoint slide stream via overlay.
> Very similar to what you might see from a conference video.
> When actually executing this, ffmpeg outputs about 1 frame and then
> stalls eating up GBs of RAM, forcing me to exit. It appears that
> while it's doing this it's actually decoding the input and feeding it
> to... something, as evidenced by the trace log I captured. It must be
> keeping these frames in memory. I did a process trace and it seems to be
> spending most of its time in the scale filter, which is unsurprising.
> My best guess is some filter is trying to load/consume the entire
> input into memory before executing instead of taking, I dunno, maybe a
> frame at a time. But I don't know which filter is doing this. Maybe
> concat? Is there any entirely different way I should be doing this?
> Any help would be greatly appreciated. This is going to be part of an
> open-source library I'm eager to share.
> Here's an example of a conference video which has something like what
> I'm going for:
> https://media.ccc.de/v/32c3-7331-the_exhaust_emissions_scandal_dieselgate
> The largest difference being my overlay is simpler.

This is most likely because the concat filter has not been switched to the
.activate API.

So do not use the concat filter; instead, save the segments as intermediate
files and join them with the concat demuxer.
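
A minimal sketch of the concat demuxer step, assuming each piece of the
timeline (a, o1, c, o2, e, o3, g, o4, j) has already been rendered to its own
intermediate file. The file names (seg_a.mkv, segments.txt, output.mkv) and the
helper functions are hypothetical, chosen only for illustration. It builds the
list file that the concat demuxer reads and the matching ffmpeg command line.
Stream copy (-c copy) is an assumption that only holds if every intermediate
file was encoded with the same codec and parameters; otherwise the join would
need a re-encode.

```python
import shlex

# Hypothetical intermediate files, in output order. The names mirror the
# labels of the original filtergraph but are otherwise made up.
SEGMENTS = [
    "seg_a.mkv", "seg_o1.mkv", "seg_c.mkv", "seg_o2.mkv", "seg_e.mkv",
    "seg_o3.mkv", "seg_g.mkv", "seg_o4.mkv", "seg_j.mkv",
]


def concat_list_text(segments):
    """Return the contents of a concat demuxer list file.

    Each segment becomes one line of the form: file '<path>'
    """
    lines = []
    for seg in segments:
        # A single quote inside a quoted path is written as '\'' (close the
        # quote, escaped quote, reopen the quote).
        escaped = seg.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path, output_path):
    """Return the ffmpeg argument list that joins the listed files.

    -safe 0 lets the list contain absolute or otherwise "unsafe" paths;
    -c copy joins without re-encoding (assumes matching stream parameters).
    """
    return [
        "ffmpeg", "-f", "concat", "-safe", "0",
        "-i", list_path, "-c", "copy", output_path,
    ]


if __name__ == "__main__":
    # Print what would be written to segments.txt, then the command to run.
    print(concat_list_text(SEGMENTS), end="")
    print(shlex.join(concat_command("segments.txt", "output.mkv")))
```

The list text would be saved to a file (segments.txt here) before running the
printed command. Because each segment is rendered separately, no single
filtergraph has to hold frames for the later segments while the earlier ones
are processed.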
