[FFmpeg-trac] #2325(undetermined:new): MP4 AAC Audio is delayed by 2ms when converted to PCM

FFmpeg trac at avcodec.org
Wed Mar 6 16:48:00 CET 2013


#2325: MP4 AAC Audio is delayed by 2ms when converted to PCM
-------------------------------------------+-------------------------------------------
              Reporter:  brchapman          |                    Owner:
                  Type:  defect             |                   Status:  new
              Priority:  important          |                Component:  undetermined
               Version:  git-master         |               Resolution:
              Keywords:  aac mov regression |               Blocked By:
              Blocking:                     |  Reproduced by developer:  0
Analyzed by developer:  0                  |
-------------------------------------------+-------------------------------------------

Comment (by brchapman):

 Replying to [comment:11 cehoyos]:
 > Replying to [comment:10 brchapman]:
 > > Also, if I encode test100.mp4 without the aac audio stream (ie with no
 audio) and then convert it:
 > > {{{
 > > % ffmpeg -i test100_no_aac.mp4 -c:v prores test100_ffmpeg.mov
 > > }}}
 > > I don't get the duplicate first frame bug in #2324
 >
 > You are using a different input file that is cfr, your original sample
 has a longer first frame (that needs to be duplicated to get cfr output).
 >
 > > Based on this I would guess that this would work:
 > > {{{
 > > ffmpeg -i test100.mp4 -c:v prores -an test100_ffmpeg.mov
 > > }}}
 > > However, it doesn't. The first frame is still duplicated.
 >
 > Because the timestamps still require a duplication (they do not change
 just because you don't encode the audio). Use -vsync 0 to ignore the
 timestamps so that no frame duplication happens.
 When I use -vsync 0, the first frame is no longer duplicated, but it is
 now completely black.  Are there any other flags I can use to get rid of
 this?

 I'd use -ss to skip past the first frame, which works on its own.
 However, if I combine it with -filter_complex overlay and an image
 sequence that's overlaid on top of the source video, the sequence doesn't
 start until frame 2 (frame 1 on screen). Here's that command:
 {{{
 % ffmpeg -y -ss 00:00:00.042 -i test100.mp4 -vsync 0 -f image2 -force_fps
 -r 24 -start_number 1 -i test100_hu

 ffmpeg version N-37747-g058e1f8 Copyright (c) 2000-2013 the FFmpeg
 developers
   built on Mar  5 2013 19:38:09 with llvm-gcc 4.2.1 (LLVM build
 2336.11.00)
   configuration: --prefix=/usr/local/ --enable-shared --enable-pthreads
 --enable-gpl
   libavutil      52. 17.103 / 52. 17.103
   libavcodec     54. 92.100 / 54. 92.100
   libavformat    54. 63.103 / 54. 63.103
   libavdevice    54.  3.103 / 54.  3.103
   libavfilter     3. 42.103 /  3. 42.103
   libswscale      2.  2.100 /  2.  2.100
   libswresample   0. 17.102 /  0. 17.102
   libpostproc    52.  2.100 / 52.  2.100
 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test100.mp4':
   Metadata:
     major_brand     : mp42
     minor_version   : 0
     compatible_brands: mp42mp41
     creation_time   : 2013-03-04 21:40:01
   Duration: 00:00:12.50, start: 0.000000, bitrate: 283 kb/s
     Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p,
 1280x720 [SAR 1:1 DAR 16:9], 85 kb/s, 24 fps, 24 tbr, 24k tbn, 48 tbc
     Metadata:
       creation_time   : 2013-03-04 21:40:01
       handler_name    : Mainconcept MP4 Video Media Handler
     Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo,
 fltp, 189 kb/s
     Metadata:
       creation_time   : 2013-03-04 21:40:01
       handler_name    : Mainconcept MP4 Sound Media Handler
 [image2 @ 0x7fe4d9033c00] max_analyze_duration 5000000 reached at 5000000
 microseconds
 Input #1, image2, from 'test100_hud/test100_transcoder%05d.png':
   Duration: 00:00:12.50, start: 0.000000, bitrate: N/A
     Stream #1:0: Video: png, rgba, 1280x720, 24 fps, 24 tbr, 24 tbn, 24
 tbc
 [prores @ 0x7fe4d9466400] encoding with ProRes standard (apcn) profile
 [prores @ 0x7fe4d946a800] encoding with ProRes standard (apcn) profile
 [prores @ 0x7fe4d946d000] encoding with ProRes standard (apcn) profile
 [prores @ 0x7fe4d946f800] encoding with ProRes standard (apcn) profile
 [prores @ 0x7fe4d9038000] encoding with ProRes standard (apcn) profile
 Output #0, mov, to 'test100_ffmpeg.mov':
   Metadata:
     major_brand     : mp42
     minor_version   : 0
     compatible_brands: mp42mp41
     encoder         : Lavf54.63.103
     Stream #0:0: Video: prores (apcn) (apcn / 0x6E637061), yuv422p10le,
 1280x720 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 12288 tbn, 24 tbc
 Stream mapping:
   Stream #0:0 (h264) -> overlay:main
   Stream #1:0 (png) -> overlay:overlay
   overlay -> Stream #0:0 (prores)
 Press [q] to stop, [?] for help
 frame=  300 fps= 34 q=0.0 Lsize=    8944kB time=00:00:12.50
 bitrate=5861.8kbits/s
 video:8942kB audio:0kB subtitle:0 global headers:0kB muxing overhead
 0.021776%
 ffmpeg -y -ss 00:00:00.042 -i test100.mp4 -vsync 0 -f image2 -force_fps -r
 24  15.18s user 0.24s system 174% cpu 8.838 total
 }}}
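
 As a rough check on the -ss value used above, the sketch below works out
 how 0.042 s relates to one frame period at 24 fps. It is plain arithmetic,
 not a model of how ffmpeg seeks, and it assumes constant-rate video whose
 first frame sits at timestamp 0 (which may not hold for this sample, given
 the longer first frame mentioned earlier). The helper name frame_at is
 made up for illustration.

```python
from fractions import Fraction

FPS = Fraction(24)          # frame rate reported for the video stream
SEEK = Fraction(42, 1000)   # the -ss 00:00:00.042 value, in seconds

frame_duration = 1 / FPS    # exact length of one frame period in seconds


def frame_at(t, fps=FPS):
    """Zero-based index of the frame period containing time t (t >= 0).

    Assumes constant frame rate and a first frame at timestamp 0.
    """
    return int(t * fps)     # int() truncates, which is floor for t >= 0


index = frame_at(SEEK)
offset = SEEK - index * frame_duration  # how far into that frame period

print(f"frame period  : {float(frame_duration) * 1000:.3f} ms")
print(f"seek lands in : frame {index} (zero-based)")
print(f"offset into it: {float(offset) * 1000:.3f} ms")
```

 Under those assumptions, 0.042 s falls a third of a millisecond into the
 second frame period, so the seek target is just past the first frame.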

 Looking at mediainfo for test100.mp4, I can see that the video track is
 cfr, whereas the audio is variable, which I'm guessing causes the
 "Overall bit rate mode" to become variable.  Is this what you're talking
 about?
 {{{
 % mediainfo test100.mp4
 General
 Complete name                            : test100.mp4
 Format                                   : MPEG-4
 Format profile                           : Base Media / Version 2
 Codec ID                                 : mp42
 File size                                : 433 KiB
 Duration                                 : 12s 500ms
 Overall bit rate mode                    : Variable
 Overall bit rate                         : 284 Kbps
 Encoded date                             : UTC 2013-03-04 21:40:01
 Tagged date                              : UTC 2013-03-04 21:41:11
 ©TIM                                     : 00:00:00:00
 ©TSC                                     : 24
 ©TSZ                                     : 1

 Video
 ID                                       : 1
 Format                                   : AVC
 Format/Info                              : Advanced Video Codec
 Format profile                           : Main at L5.1
 Format settings, CABAC                   : Yes
 Format settings, ReFrames                : 3 frames
 Format settings, GOP                     : M=4, N=33
 Codec ID                                 : avc1
 Codec ID/Info                            : Advanced Video Coding
 Duration                                 : 12s 500ms
 Bit rate                                 : 85.5 Kbps
 Width                                    : 1 280 pixels
 Height                                   : 720 pixels
 Display aspect ratio                     : 16:9
 Frame rate mode                          : Constant
 Frame rate                               : 24.000 fps
 Standard                                 : NTSC
 Color space                              : YUV
 Chroma subsampling                       : 4:2:0
 Bit depth                                : 8 bits
 Scan type                                : Progressive
 Bits/(Pixel*Frame)                       : 0.004
 Stream size                              : 130 KiB (30%)
 Language                                 : English
 Encoded date                             : UTC 2013-03-04 21:40:01
 Tagged date                              : UTC 2013-03-04 21:40:01

 Audio
 ID                                       : 2
 Format                                   : AAC
 Format/Info                              : Advanced Audio Codec
 Format profile                           : LC
 Codec ID                                 : 40
 Duration                                 : 12s 500ms
 Source duration                          : 12s 501ms
 Bit rate mode                            : Variable
 Bit rate                                 : 192 Kbps
 Maximum bit rate                         : 329 Kbps
 Channel(s)                               : 2 channels
 Channel positions                        : Front: L R
 Sampling rate                            : 48.0 KHz
 Compression mode                         : Lossy
 Stream size                              : 289 KiB (67%)
 Source stream size                       : 289 KiB (67%)
 Language                                 : English
 Encoded date                             : UTC 2013-03-04 21:40:01
 Tagged date                              : UTC 2013-03-04 21:40:01
 }}}
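
 One possible reading of the 1 ms gap between "Duration" and "Source
 duration" in the audio section is padding to a whole number of AAC frames.
 The sketch below checks that arithmetic. The 1024 samples per frame is an
 assumption (the usual AAC-LC frame length, not something mediainfo reports
 here), and encoder priming samples are ignored.

```python
import math

SAMPLE_RATE = 48_000           # Hz, from "Sampling rate : 48.0 KHz"
DURATION_S = 12.5              # from "Duration : 12s 500ms"
SAMPLES_PER_AAC_FRAME = 1024   # assumption: typical AAC-LC frame length

# Number of PCM samples covered by the nominal duration.
samples = round(DURATION_S * SAMPLE_RATE)

# AAC stores whole frames, so round up to the next frame boundary.
aac_frames = math.ceil(samples / SAMPLES_PER_AAC_FRAME)
padded_samples = aac_frames * SAMPLES_PER_AAC_FRAME
padded_ms = padded_samples * 1000 / SAMPLE_RATE

print(f"nominal samples : {samples}")
print(f"AAC frames      : {aac_frames}")
print(f"padded duration : {padded_ms:.3f} ms")
```

 With these inputs the padded length comes to a little over 12501 ms, which
 is consistent with the "12s 501ms" source duration, though it does not by
 itself account for the 2 ms delay this ticket is about.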

 Replying to [comment:12 cehoyos]:
 > I tried different players and re-encoded the original sample and
 FFmpeg's behaviour is consistent afaict. (It may of course be wrong.)
 >
 > At what frames are the gongs supposed to play? Ie, which numbers should
 be shown on screen at the time each gong starts?
 Gongs start on 0, 41, 84, 116, 167, 207, 246, 285
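
 To compare these frame numbers against the decoded PCM, they can be
 converted to timestamps and sample offsets. This is a minimal sketch that
 assumes the on-screen number is a zero-based frame index at a constant 24
 fps and that audio is decoded at 48000 Hz; the 2 ms figure is taken from
 the ticket title, and the helper name expected_sample is hypothetical.

```python
from fractions import Fraction

FPS = 24                 # video frame rate of test100.mp4
SAMPLE_RATE = 48_000     # audio sample rate in Hz
DELAY_MS = 2             # delay reported in the ticket title
GONG_FRAMES = [0, 41, 84, 116, 167, 207, 246, 285]

# Audio samples that elapse during one video frame.
samples_per_frame = Fraction(SAMPLE_RATE, FPS)


def expected_sample(frame):
    """Sample offset at which a gong starting on `frame` should begin."""
    return int(frame * samples_per_frame)


# The reported delay expressed in samples, for comparison with the offsets.
delay_samples = SAMPLE_RATE * DELAY_MS // 1000

for frame in GONG_FRAMES:
    seconds = float(Fraction(frame, FPS))
    print(f"frame {frame:3d}  t = {seconds:7.4f} s  sample {expected_sample(frame)}")

print(f"a {DELAY_MS} ms delay corresponds to {delay_samples} samples")
```

 A gong found in the PCM output at roughly the expected sample plus the
 delay in samples would match the behaviour described in this ticket.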

-- 
Ticket URL: <https://ffmpeg.org/trac/ffmpeg/ticket/2325#comment:13>
FFmpeg <http://ffmpeg.org>
FFmpeg issue tracker

