[FFmpeg-user] current ffmpeg creates shortened audio stream when filter amix

S Andreason sandreas41 at gmail.com
Sun Sep 29 08:28:09 EEST 2019


I am getting a shortened audio stream when I include the audio filters 
aresample and amix. This later makes it impossible to concat the clips, 
because the differing stream lengths put audio and video out of sync, 
with errors:
Invalid audio PTS

First, here is the output from the latest ffmpeg Debian package, which 
works correctly:

$ ffmpeg-3.2.14-1~deb9u1 -i 20190922_1532_3Kf-pan-right_3969_c2t14.MOV 
-i Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a 
-filter_complex 
"[0]crop=x=128:y=0:w=1024:h=720,pad=1024:768:0:24,drawtext='fontsize=32:fontcolor=0xa73450:bordercolor=white:shadowcolor=black:fontfile=/usr/share/fonts/TrueType/SF-Foxboro-Script-Bold.ttf:x=(w-text_w-20):y=(h-text_h-36):shadowx=2:shadowy=2:borderw=1:text=seahorseCorral.org'" 
-filter_complex "aresample=48000,amix" -s 1024x768 -c:v h264 -b:v 4700k 
-r 30 20190922_1532_ch5.1e-3.mov
ffmpeg version 3.2.14-1~deb9u1 Copyright (c) 2000-2019 the FFmpeg developers
   built with gcc 6.3.0 (Debian 6.3.0-18+deb9u1) 20170516
   configuration: --prefix=/usr --extra-version='1~deb9u1' 
--toolchain=hardened --libdir=/usr/lib/i386-linux-gnu 
--incdir=/usr/include/i386-linux-gnu --enable-gpl --disable-stripping 
--enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa 
--enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca 
--enable-libcdio --enable-libebur128 --enable-libflite 
--enable-libfontconfig --enable-libfreetype --enable-libfribidi 
--enable-libgme --enable-libgsm --enable-libmp3lame --enable-libopenjpeg 
--enable-libopenmpt --enable-libopus --enable-libpulse 
--enable-librubberband --enable-libshine --enable-libsnappy 
--enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora 
--enable-libtwolame --enable-libvorbis --enable-libvpx 
--enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid 
--enable-libzmq --enable-libzvbi --enable-omx --enable-openal 
--enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libiec61883 
--enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 
--enable-shared
   libavutil      55. 34.101 / 55. 34.101
   libavcodec     57. 64.101 / 57. 64.101
   libavformat    57. 56.101 / 57. 56.101
   libavdevice    57.  1.100 / 57.  1.100
   libavfilter     6. 65.100 /  6. 65.100
   libavresample   3.  1.  0 /  3.  1.  0
   libswscale      4.  2.100 /  4.  2.100
   libswresample   2.  3.100 /  2.  3.100
   libpostproc    54.  1.100 / 54.  1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 
'20190922_1532_3Kf-pan-right_3969_c2t14.MOV':
   Metadata:
     major_brand     : qt
     minor_version   : 512
     compatible_brands: qt
     encoder         : Lavf57.56.101
   Duration: 00:00:14.01, start: 0.002000, bitrate: 25128 kb/s
     Stream #0:0(eng): Video: h264 (Constrained Baseline) (avc1 / 
0x31637661), yuvj420p(pc, bt709), 1280x720, 23587 kb/s, 29.97 fps, 29.97 
tbr, 30k tbn, 60k tbc (default)
     Metadata:
       handler_name    : DataHandler
     Stream #0:1(eng): Audio: pcm_s16le (sowt / 0x74776F73), 48000 Hz, 
stereo, s16, 1536 kb/s (default)
     Metadata:
       handler_name    : DataHandler
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 
'Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a':
   Metadata:
     major_brand     : M4A
     minor_version   : 512
     compatible_brands: isomiso2
     encoder         : Lavf57.56.101
   Duration: 00:00:08.02, start: 0.000000, bitrate: 220 kb/s
     Stream #1:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, 
mono, fltp, 218 kb/s (default)
     Metadata:
       handler_name    : SoundHandler
No pixel format specified, yuvj420p for H.264 encoding chosen.
Use -pix_fmt yuv420p for compatibility with outdated media players.
[libx264 @ 0x170dc20] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 
AVX LZCNT BMI1 SlowPshufb
[libx264 @ 0x170dc20] profile High, level 3.1
[libx264 @ 0x170dc20] 264 - core 148 r2748 97eaef2 - H.264/MPEG-4 AVC 
codec - Copyleft 2003-2016 - http://www.videolan.org/x264.html - 
options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 
psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 
8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6 
lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 
bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 
b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 
keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=abr 
mbtree=1 bitrate=4700 ratetol=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 
ip_ratio=1.40 aq=1:1.00
Output #0, mov, to '20190922_1532_ch5.1e-3.mov':
   Metadata:
     major_brand     : qt
     minor_version   : 512
     compatible_brands: qt
     encoder         : Lavf57.56.101
     Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661), 
yuvj420p(pc), 1024x768, q=-1--1, 4700 kb/s, 30 fps, 15360 tbn, 30 tbc 
(default)
     Metadata:
       encoder         : Lavc57.64.101 libx264
     Side data:
       cpb: bitrate max/min/avg: 0/0/4700000 buffer size: 0 vbv_delay: -1
     Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, 
fltp, 128 kb/s (default)
     Metadata:
       encoder         : Lavc57.64.101 aac
Stream mapping:
   Stream #0:0 (h264) -> crop (graph 0)
   Stream #0:1 (pcm_s16le) -> aresample (graph 1)
   Stream #1:0 (aac) -> amix:input1 (graph 1)
   drawtext (graph 0) -> Stream #0:0 (libx264)
   amix (graph 1) -> Stream #0:1 (aac)
Press [q] to stop, [?] for help
frame=  420 fps= 12 q=-1.0 Lsize=    8303kB time=00:00:14.01 
bitrate=4853.1kbits/s speed=0.417x
video:8063kB audio:224kB subtitle:0kB other streams:0kB global 
headers:0kB muxing overhead: 0.198401%
[libx264 @ 0x170dc20] frame I:2     Avg QP:14.83  size:195688
[libx264 @ 0x170dc20] frame P:106   Avg QP:19.65  size: 59553
[libx264 @ 0x170dc20] frame B:312   Avg QP:25.61  size:  4972
[libx264 @ 0x170dc20] consecutive B-frames:  1.0%  0.0%  0.0% 99.0%
[libx264 @ 0x170dc20] mb I  I16..4: 27.6% 29.0% 43.4%
[libx264 @ 0x170dc20] mb P  I16..4:  1.1%  1.3%  0.6%  P16..4: 30.5% 
31.5% 22.6%  0.0%  0.0%    skip:12.4%
[libx264 @ 0x170dc20] mb B  I16..4:  0.0%  0.0%  0.0%  B16..8: 36.1%  
7.7%  1.3%  direct: 4.2%  skip:50.7%  L0:37.5% L1:38.6% BI:23.9%
[libx264 @ 0x170dc20] final ratefactor: 18.99
[libx264 @ 0x170dc20] 8x8 transform intra:37.9% inter:54.5%
[libx264 @ 0x170dc20] coded y,uvDC,uvAC intra: 56.2% 64.1% 51.9% inter: 
27.2% 19.7% 1.0%
[libx264 @ 0x170dc20] i16 v,h,dc,p: 73%  9% 14%  4%
[libx264 @ 0x170dc20] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 12% 11% 40% 4%  6%  
6%  6%  5% 10%
[libx264 @ 0x170dc20] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 18% 15% 6%  9%  
9%  9%  8% 11%
[libx264 @ 0x170dc20] i8c dc,h,v,p: 54% 25% 17%  5%
[libx264 @ 0x170dc20] Weighted P-Frames: Y:9.4% UV:0.0%
[libx264 @ 0x170dc20] ref P L0: 41.9% 11.1% 40.8%  6.0%  0.2%
[libx264 @ 0x170dc20] ref B L0: 93.5%  5.9%  0.6%
[libx264 @ 0x170dc20] ref B L1: 99.4%  0.6%
[libx264 @ 0x170dc20] kb/s:4717.36
[aac @ 0x170fac0] Qavg: 582.581

Next, ffprobe shows the length of the output video:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-3.mov':
     encoder         : Lavf57.56.101
   Duration: 00:00:14.03, start: 0.000000, bitrate: 4849 kb/s
     Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), 
yuvj420p(pc), 1024x768, 4717 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (defaul
       encoder         : Lavc57.64.101 libx264
     Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, 
stereo, fltp, 131 kb/s (default)

And to get the ACTUAL audio length, I split the audio stream into its own 
file.m4a using ffmpeg, then ran ffprobe on it:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-3.m4a':
     encoder         : Lavf58.33.100
   Duration: 00:00:14.03, start: 0.000000, bitrate: 133 kb/s
     Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, 
stereo, fltp, 131 kb/s (default)
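
For anyone repeating the measurement, here is a minimal sketch of the 
split-and-probe step, assuming ffmpeg and ffprobe are on the PATH. The 
helper names split_audio and stream_durations are only for illustration, 
and the stream copy is one way to do the split, not necessarily the exact 
command used above:

```shell
# Copy the first audio stream into its own file without re-encoding.
# split_audio is a hypothetical helper name, not part of ffmpeg.
split_audio() {
    ffmpeg -v error -i "$1" -vn -map 0:a:0 -c:a copy "$2"
}

# Print the codec type and duration (in seconds) of every stream in a
# file, one stream per line. stream_durations is also a hypothetical name.
stream_durations() {
    ffprobe -v error \
        -show_entries stream=codec_type,duration \
        -of csv=p=0 "$1"
}
```

Running stream_durations on the .mov directly reports the video and 
audio stream durations side by side, so the split is optional.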

Then I repeat the above, changing only the binary to current ffmpeg 
built from git:

$ ffmpeg -i 20190922_1532_3Kf-pan-right_3969_c2t14.MOV -i 
Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a -filter_complex 
"[0]crop=x=128:y=0:w=1024:h=720,pad=1024:768:0:24,drawtext='fontsize=32:fontcolor=0xa73450:bordercolor=white:shadowcolor=black:fontfile=/usr/share/fonts/TrueType/SF-Foxboro-Script-Bold.ttf:x=(w-text_w-20):y=(h-text_h-36):shadowx=2:shadowy=2:borderw=1:text=seahorseCorral.org'" 
-filter_complex "aresample=48000,amix" -s 1024x768 -c:v h264 -b:v 4700k 
-r 30 20190922_1532_ch5.1e-g.mov
ffmpeg version N-95129-g04858650b1 Copyright (c) 2000-2019 the FFmpeg 
developers
   built with gcc 6.3.0 (Debian 6.3.0-18+deb9u1) 20170516
   configuration: --prefix=/usr/local --enable-gpl --enable-libmp3lame 
--enable-libvorbis --enable-libx264 --enable-libopenjpeg 
--enable-libfreetype --disable-doc --disable-htmlpages 
--disable-podpages --enable-shared --enable-libvpx 
--extra-cflags=-I/usr/include --extra-ldflags=-L/usr/lib/i386-linux-gnu 
--enable-libass --enable-libtesseract
   libavutil      56. 35.100 / 56. 35.100
   libavcodec     58. 59.101 / 58. 59.101
   libavformat    58. 33.100 / 58. 33.100
   libavdevice    58.  9.100 / 58.  9.100
   libavfilter     7. 59.100 /  7. 59.100
   libswscale      5.  6.100 /  5.  6.100
   libswresample   3.  6.100 /  3.  6.100
   libpostproc    55.  6.100 / 55.  6.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 
'20190922_1532_3Kf-pan-right_3969_c2t14.MOV':
   Metadata:
     major_brand     : qt
     minor_version   : 512
     compatible_brands: qt
     encoder         : Lavf57.56.101
   Duration: 00:00:14.01, start: 0.002000, bitrate: 25128 kb/s
     Stream #0:0(eng): Video: h264 (Constrained Baseline) (avc1 / 
0x31637661), yuvj420p(pc, bt709), 1280x720, 23587 kb/s, 29.97 fps, 29.97 
tbr, 30k tbn, 60k tbc (default)
     Metadata:
       handler_name    : VideoHandler
     Stream #0:1(eng): Audio: pcm_s16le (sowt / 0x74776F73), 48000 Hz, 
stereo, s16, 1536 kb/s (default)
     Metadata:
       handler_name    : SoundHandler
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 
'Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a':
   Metadata:
     major_brand     : M4A
     minor_version   : 512
     compatible_brands: isomiso2
     encoder         : Lavf57.56.101
   Duration: 00:00:08.02, start: 0.000000, bitrate: 220 kb/s
     Stream #1:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, 
mono, fltp, 218 kb/s (default)
     Metadata:
       handler_name    : SoundHandler
Stream mapping:
   Stream #0:0 (h264) -> crop (graph 0)
   Stream #0:1 (pcm_s16le) -> aresample (graph 1)
   Stream #1:0 (aac) -> amix:input1 (graph 1)
   drawtext (graph 0) -> Stream #0:0 (libx264)
   amix (graph 1) -> Stream #0:1 (aac)
Press [q] to stop, [?] for help
[libx264 @ 0x142f2c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 
AVX LZCNT BMI1 SlowPshufb
[libx264 @ 0x142f2c0] profile High, level 3.1
[libx264 @ 0x142f2c0] 264 - core 148 r2748 97eaef2 - H.264/MPEG-4 AVC 
codec - Copyleft 2003-2016 - http://www.videolan.org/x264.html - 
options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 
psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 
8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6 
lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 
bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 
b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 
keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=abr 
mbtree=1 bitrate=4700 ratetol=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 
ip_ratio=1.40 aq=1:1.00
Output #0, mov, to '20190922_1532_ch5.1e-g.mov':
   Metadata:
     major_brand     : qt
     minor_version   : 512
     compatible_brands: qt
     encoder         : Lavf58.33.100
     Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661), 
yuvj420p(pc, progressive), 1024x768, q=-1--1, 4700 kb/s, 30 fps, 15360 
tbn, 30 tbc (default)
     Metadata:
       encoder         : Lavc58.59.101 libx264
     Side data:
       cpb: bitrate max/min/avg: 0/0/4700000 buffer size: 0 vbv_delay: N/A
     Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, 
fltp, 128 kb/s (default)
     Metadata:
       encoder         : Lavc58.59.101 aac
frame=  420 fps= 14 q=-1.0 Lsize=    8270kB time=00:00:13.90 
bitrate=4873.7kbits/s speed=0.45x
video:8061kB audio:193kB subtitle:0kB other streams:0kB global 
headers:0kB muxing overhead: 0.185768%
[libx264 @ 0x142f2c0] frame I:2     Avg QP:14.84  size:195655
[libx264 @ 0x142f2c0] frame P:106   Avg QP:19.64  size: 59577
[libx264 @ 0x142f2c0] frame B:312   Avg QP:25.62  size:  4960
[libx264 @ 0x142f2c0] consecutive B-frames:  1.0%  0.0%  0.0% 99.0%
[libx264 @ 0x142f2c0] mb I  I16..4: 27.8% 28.6% 43.6%
[libx264 @ 0x142f2c0] mb P  I16..4:  1.2%  1.3%  0.6%  P16..4: 30.5% 
31.4% 22.6%  0.0%  0.0%    skip:12.5%
[libx264 @ 0x142f2c0] mb B  I16..4:  0.0%  0.0%  0.0%  B16..8: 36.0%  
7.7%  1.3%  direct: 4.2%  skip:50.8%  L0:37.6% L1:38.7% BI:23.8%
[libx264 @ 0x142f2c0] final ratefactor: 18.99
[libx264 @ 0x142f2c0] 8x8 transform intra:36.9% inter:54.6%
[libx264 @ 0x142f2c0] coded y,uvDC,uvAC intra: 56.3% 63.9% 51.8% inter: 
27.2% 19.7% 1.0%
[libx264 @ 0x142f2c0] i16 v,h,dc,p: 73%  9% 14%  4%
[libx264 @ 0x142f2c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 10% 13% 39% 4%  6%  
6%  6%  5% 10%
[libx264 @ 0x142f2c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 18% 14% 6%  9%  
9%  9%  8% 11%
[libx264 @ 0x142f2c0] i8c dc,h,v,p: 54% 24% 17%  5%
[libx264 @ 0x142f2c0] Weighted P-Frames: Y:9.4% UV:0.0%
[libx264 @ 0x142f2c0] ref P L0: 41.5% 11.5% 40.8%  6.0%  0.2%
[libx264 @ 0x142f2c0] ref B L0: 93.4%  6.0%  0.6%
[libx264 @ 0x142f2c0] ref B L1: 99.4%  0.6%
[libx264 @ 0x142f2c0] kb/s:4716.52
[aac @ 0x142d800] Qavg: 297.740

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-g.mov':
     encoder         : Lavf58.33.100
   Duration: 00:00:14.00, start: 0.000000, bitrate: 4838 kb/s
     Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc), 
1024x768, 4716 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
       encoder         : Lavc58.59.101 libx264
     Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, 
fltp, 128 kb/s (default)

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-g.m4a':
     encoder         : Lavf58.33.100
   Duration: 00:00:12.33, start: 0.000000, bitrate: 130 kb/s
     Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, 
stereo, fltp, 128 kb/s (default)

The audio is 1.70 seconds shorter, always. Different video input lengths 
and different audio lengths result in the same 1.70 seconds lost.
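
The arithmetic behind that figure, as a small sketch using the Duration 
values from the ffprobe output above. The helper names hms_to_seconds and 
duration_gap are made up for this example:

```shell
# Convert an HH:MM:SS.xx timestamp, as printed by ffprobe, to seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 3600 + $2 * 60 + $3 }'
}

# Difference between two timestamps, in seconds (hypothetical helper).
duration_gap() {
    awk -v a="$(hms_to_seconds "$1")" -v b="$(hms_to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a - b }'
}

# Audio length from 3.2.14 (14.03) against audio length from git (12.33).
duration_gap 00:00:14.03 00:00:12.33   # prints 1.70
# Container duration from git (14.00) against its own audio (12.33).
duration_gap 00:00:14.00 00:00:12.33   # prints 1.67
```

So the 1.70 is measured against the audio from the 3.2.14 run. Within the 
git output itself the audio is 1.67 seconds shorter than the container.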

If I don't have any voice input and audio filter then the output streams 
match length, since they are from the same input video.

I've also tried first resampling the voice-over audio to 48000 Hz 
stereo, then removing the aresample filter, leaving only the amix. 
Still bad audio.
Since the next step would be to mix the audio in audacity and remux it 
back together, I'll stop testing now and see what you think.
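
One more detail from the stream mapping above: with the unlabeled graph 
"aresample=48000,amix", aresample is attached only to Stream #0:1, and 
the voice-over goes straight into amix:input1. A sketch of the same mix 
with explicit labels, so that both inputs are resampled before amix, is 
below. The function name mix_voiceover and the labels a0, a1 and aout are 
hypothetical, the video filters are left out for brevity, and 
inputs=2:duration=longest only spells out the amix defaults. Whether 
this changes the shortened output on the git build is untested here.

```shell
# Mix the camera audio ($1) with a voice-over ($2) and write to $3.
# mix_voiceover is a hypothetical helper, not an ffmpeg command.
mix_voiceover() {
    ffmpeg -i "$1" -i "$2" \
        -filter_complex "[0:a]aresample=48000[a0];[1:a]aresample=48000[a1];[a0][a1]amix=inputs=2:duration=longest[aout]" \
        -map 0:v -map "[aout]" \
        -c:v h264 -b:v 4700k -r 30 "$3"
}
```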

Stewart


