[FFmpeg-user] xstack command results in out of sync sound, is it possible to mix the audio in a single encoding?

Jonas O. ezjonas at gmail.com
Sun Dec 5 17:14:54 EET 2021


I wrote a Python script that generates an xstack complex filter command. The
video inputs are a mixture of several formats (see the attached table).

I have 2 commands generated, one for the xstack filter, and one for the
audio mixing.

Here is the stack command: (sorry if the text doesn't wrap!)

    'c:/ydl/ffmpeg.exe',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-filter_complex',

'[0]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf0];[rsclbf0]fps=24[rscl0];[1]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf1];[rsclbf1]fps=24[rscl1];[2]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf2];[rsclbf2]fps=24[rscl2];[3]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf3];[rsclbf3]fps=24[rscl3];[4]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf4];[rsclbf4]fps=24[rscl4];[5]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf5];[rsclbf5]fps=24[rscl5];[6]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf6];[rsclbf6]fps=24[rscl6];[7]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf7];[rsclbf7]fps=24[rscl7];[8]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf8];[rsclbf8]fps=24[rscl8];[9]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf9];[rsclbf9]fps=24[rscl9];[10]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf10];[rsclbf10]fps=24[rscl10];[11]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf11];[rsclbf11]fps=24[rscl11];[12]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf12];[rsclbf12]fps=24[rscl12];[13]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf13];[rsclbf13]fps=24[rscl13];[14]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2,
setsar=1[rsclbf14];[rsclbf14]fps=24[rscl14];[rscl0][rscl1][rscl2][rscl3][rscl4]concat=n=5[cct0];[rscl5][rscl6][rscl7]concat=n=3[cct1];[rscl8][rscl9][rscl10]concat=n=3[cct2];[rscl11][rscl12][rscl13][rscl14]concat=n=4[cct3];[cct0][cct1][cct2][cct3]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0',
    'output.mp4',
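
For readers who want to reproduce the filtergraph above, here is a minimal sketch of how such a generator might look. The function name `build_video_graph` and its parameters are hypothetical (the original script is not shown); only the filter names, options and label scheme are taken from the command above.

```python
def build_video_graph(groups, width=480, height=270, fps=24):
    """Build the scale/pad/fps -> concat -> xstack filtergraph string.

    `groups` lists how many consecutive inputs feed each tile,
    e.g. [5, 3, 3, 4] for the sample command (hypothetical parameter).
    """
    if len(groups) != 4:
        raise ValueError("this sketch only handles the 2x2 layout shown above")
    layout = "0_0|w0_0|0_h0|w0_h0"  # 2x2 grid, copied from the sample command

    scale_chains = []   # one scale/pad/setsar chain plus one fps chain per input
    concat_chains = []  # one concat per tile
    tiles = []
    idx = 0
    for g, count in enumerate(groups):
        labels = []
        for _ in range(count):
            scale_chains.append(
                f"[{idx}]scale={width}:{height}:force_original_aspect_ratio=decrease,"
                f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2,setsar=1[rsclbf{idx}]"
            )
            scale_chains.append(f"[rsclbf{idx}]fps={fps}[rscl{idx}]")
            labels.append(f"[rscl{idx}]")
            idx += 1
        # Play the clips of this tile one after another.
        concat_chains.append(f"{''.join(labels)}concat=n={count}[cct{g}]")
        tiles.append(f"[cct{g}]")

    xstack = f"{''.join(tiles)}xstack=inputs={len(tiles)}:layout={layout}"
    return ";".join(scale_chains + concat_chains + [xstack])
```

Calling `build_video_graph([5, 3, 3, 4])` yields the same chain order as the sample command, as one string without the line breaks introduced by the mail wrapping.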


Here is the mix_audio command:

    'c:/ydl/ffmpeg.exe',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-i', 'inputX.mp4',
    '-filter_complex',

'[0:a][1:a][2:a][3:a][4:a]concat=n=5:v=0:a=1[cct_a0];[5:a][6:a][7:a]concat=n=3:v=0:a=1[cct_a1];[8:a][9:a][10:a]concat=n=3:v=0:a=1[cct_a2];[11:a][12:a][13:a][14:a]concat=n=4:v=0:a=1[cct_a3];[cct_a0][cct_a1][cct_a2][cct_a3]amix=inputs=4[all_aud]',
    '-map',
    '15:v',
    '-map',
    '[all_aud]',
    '-c:v',
    'copy',
    'output.mp4',
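
The audio graph follows the same grouping as the video graph, so it can be generated the same way. A minimal sketch, assuming the same hypothetical `groups` list of clip counts per tile; the function name `build_audio_graph` is an illustration, not the original script.

```python
def build_audio_graph(groups, out_label="all_aud"):
    """Build the per-tile audio concat chains followed by a single amix.

    `groups` lists how many consecutive inputs belong to each tile,
    e.g. [5, 3, 3, 4] for the sample command (hypothetical parameter).
    """
    chains = []
    mixed = []
    idx = 0
    for g, count in enumerate(groups):
        # Audio streams of the inputs belonging to this tile, in order.
        ins = "".join(f"[{i}:a]" for i in range(idx, idx + count))
        chains.append(f"{ins}concat=n={count}:v=0:a=1[cct_a{g}]")
        mixed.append(f"[cct_a{g}]")
        idx += count
    # Mix the concatenated tracks into one output stream.
    chains.append(f"{''.join(mixed)}amix=inputs={len(mixed)}[{out_label}]")
    return ";".join(chains)
```

With `groups = [5, 3, 3, 4]` this reproduces the `-filter_complex` string of the mix_audio command above.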



Of course, those are sample commands. I actually use many more videos as
input; this sample is shortened for the sake of readability.

Here are the videos I use, with relevant ffprobe data, in an HTML table:

(file is attached)

I'm getting this warning:

    [swscaler @ 0000020bac5a19c0] Warning: data is not aligned! This can lead to a speed loss

I think this is unrelated to the audio desync. As far as I understand, the
unaligned data warning is about x264 resolutions being multiples of 16, but
my filter already takes this into account.


There is a perceptible audio desync, which is the main problem I am
having. FFmpeg doesn't report any other errors. Is it because I use a second
command to mix the audio afterwards? How could I do the xstack stage
and the audio mixing in a single stage?
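
A single ffmpeg job would presumably put both graphs into one `-filter_complex` and map both labeled outputs. The following is only a sketch under that assumption: the helper name `build_single_command` and the label `[vout]` are hypothetical, and it is not verified that this removes the desync. `-c:v copy` is left out because a filtered video stream cannot be stream-copied.

```python
def build_single_command(inputs, video_graph, audio_graph,
                         ffmpeg="ffmpeg", output="output.mp4"):
    """Combine the video and audio filtergraphs into one ffmpeg command.

    Assumptions: `video_graph` ends with an unlabeled xstack output and
    `audio_graph` ends with the [all_aud] label, as in the two commands
    above. All names here are hypothetical.
    """
    cmd = [ffmpeg]
    for path in inputs:
        cmd += ["-i", path]
    # Label the xstack output so it can be mapped explicitly.
    graph = f"{video_graph}[vout];{audio_graph}"
    cmd += ["-filter_complex", graph,
            "-map", "[vout]",
            "-map", "[all_aud]",
            output]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Each input is then opened once, and its video and audio streams are consumed by the same filtergraph.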


I'm a bit confused about how FFmpeg handles differing frame rates. I was told
to re-encode all the video inputs before performing the xstack stage, but
that would create some disk overhead, so I'd rather do it in a single ffmpeg
job if possible.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 8H4qU.png
Type: image/png
Size: 121534 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-user/attachments/20211205/7f99ed65/attachment.png>

