[FFmpeg-trac] #10458(undetermined:new): MPEG4 demuxing: last sample's duration ignored (was: MPEG4 AAC decoding: end padding not trimmed)

FFmpeg trac at avcodec.org
Tue Jul 11 14:48:45 EEST 2023


#10458: MPEG4 demuxing: last sample's duration ignored
-------------------------------------+-------------------------------------
             Reporter:  John Regan   |                    Owner:  (none)
                 Type:  defect       |                   Status:  new
             Priority:  normal       |                Component:
                                     |  undetermined
              Version:  unspecified  |               Resolution:
             Keywords:               |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------
Changes (by John Regan):

 * summary:  MPEG4 AAC decoding: end padding not trimmed => MPEG4 demuxing:
     last sample's duration ignored


Old description:

> It seems like ffmpeg is properly removing the front padding from m4a
> files, but doesn't account for the end padding added to round frames up
> to 1024 samples.
>
> How to reproduce:
> {{{
> % ffmpeg -f lavfi -i anullsrc=r=48000:d=2 source.wav
>
> # verify the created audio file as exactly 96000 samples
> % soxi -s source.wav
> 96000
>
> # encode to aac
> % ffmpeg -i source.wav -c:a aac encoded.m4a
>
> # decode back to wav
> % ffmpeg -i encoded.m4a destination.wav
>
> # observe the sample count != 96000
> % soxi -s destination.wav
> 96256
> }}}
>
> Using boxdumper from l-smash, I can verify that ffmpeg correctly added an
> edit list box:
>
> {{{
> [edts: Edit Box]
>     position = 845
>     size = 36
>     [elst: Edit List Box]
>         position = 853
>         size = 28
>         version = 0
>         flags = 0x000000
>         entry_count = 1
>         entry[0]
>             segment_duration = 2000
>             media_time = 1024
>             media_rate = 1.000000
> }}}
>
> Additionally, there's a media header box with a duration set to 97024 -
> and subtracting the 1024 from the edit list box yields 96000:
>
> {{{
> [mdhd: Media Header Box]
>     position = 889
>     size = 32
>     version = 0
>     flags = 0x000000
>     creation_time = UTC 1904/01/01, 00:00:00
>     modification_time = UTC 1904/01/01, 00:00:00
>     timescale = 48000
>     duration = 97024 (00:00:02.021)
>     language = und
>     pre_defined = 0x0000
> }}}
>
> and the Decoding Time to Sample Box also adds up to 97024 - 94 samples at
> 1024 frames and 1 sample at 768 frames.
>
> {{{
> [stts: Decoding Time to Sample Box]
>     position = 1140
>     size = 32
>     version = 0
>     flags = 0x000000
>     entry_count = 2
>     entry[0]
>         sample_count = 94
>         sample_delta = 1024
>     entry[1]
>         sample_count = 1
>         sample_delta = 768
> }}}
>
> ffmpeg version info:
> {{{
> ffmpeg version n6.0 Copyright (c) 2000-2023 the FFmpeg developers
>   built with gcc 13.1.1 (GCC) 20230429
>   configuration: --prefix=/usr --disable-debug --disable-static
> --disable-stripping --enable-amf --enable-avisynth --enable-cuda-llvm
> --enable-lto --enable-fontconfig --enable-gmp --enable-gnutls --enable-
> gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray
> --enable-libbs2b --enable-libdav1d --enable-libdrm --enable-libfreetype
> --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libjack
> --enable-libjxl --enable-libmfx --enable-libmodplug --enable-libmp3lame
> --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-
> libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse
> --enable-librav1e --enable-librsvg --enable-libsoxr --enable-libspeex
> --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora
> --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-libvorbis
> --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265
> --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg
> --enable-nvdec --enable-nvenc --enable-opencl --enable-opengl --enable-
> shared --enable-version3 --enable-vulkan
>   libavutil      58.  2.100 / 58.  2.100
>   libavcodec     60.  3.100 / 60.  3.100
>   libavformat    60.  3.100 / 60.  3.100
>   libavdevice    60.  1.100 / 60.  1.100
>   libavfilter     9.  3.100 /  9.  3.100
>   libswscale      7.  1.100 /  7.  1.100
>   libswresample   4. 10.100 /  4. 10.100
>   libpostproc    57.  1.100 / 57.  1.100
> }}}

New description:

 It seems like ffmpeg properly removes the front padding (the encoder delay)
 from audio in mp4 files, but does not trim the end padding that the encoder
 adds to fill the final audio frame up to the full frame length.

 This is signaled by the mp4 file listing a shorter duration for the final
 sample, either via the Decoding Time to Sample Box or, for fragmented mp4s,
 via the sample duration in the Track Fragment Run Box.

 How to reproduce:
 {{{
 % ffmpeg -f lavfi -i anullsrc=r=48000:d=2 source.wav

 # verify the created audio file has exactly 96000 samples
 % soxi -s source.wav
 96000

 # encode to aac
 % ffmpeg -i source.wav -c:a aac encoded.m4a

 # decode back to wav
 % ffmpeg -i encoded.m4a destination.wav

 # observe the sample count != 96000
 % soxi -s destination.wav
 96256
 }}}

 Using boxdumper from l-smash, I can verify that ffmpeg correctly added an
 edit list box that lists total media duration, as well as the samples to
 trim from the beginning of the audio (the encoder delay):

 {{{
 [edts: Edit Box]
     position = 845
     size = 36
     [elst: Edit List Box]
         position = 853
         size = 28
         version = 0
         flags = 0x000000
         entry_count = 1
         entry[0]
             segment_duration = 2000
             media_time = 1024
             media_rate = 1.000000
 }}}
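
 The two elst fields use different units: per ISO/IEC 14496-12,
 segment_duration is expressed in the movie timescale (mvhd box) and
 media_time in the media timescale (mdhd box). A minimal sketch of the
 conversion, assuming a movie timescale of 1000 (the mvhd box is not dumped
 in this ticket, so that value is an assumption) and the media timescale of
 48000 reported by the mdhd box. The function name is hypothetical:

```python
from fractions import Fraction

# Assumption: mvhd timescale is 1000; the mvhd box is not dumped in this ticket.
MOVIE_TIMESCALE = 1000
# Media timescale as reported by the mdhd box (matches the 48000 Hz sample rate).
MEDIA_TIMESCALE = 48000


def segment_duration_in_media_units(segment_duration: int) -> Fraction:
    """Convert elst segment_duration (movie timescale) to media timescale units."""
    return Fraction(segment_duration, MOVIE_TIMESCALE) * MEDIA_TIMESCALE


print(segment_duration_in_media_units(2000))  # 96000
```

 Under that assumption the edit list describes a 96000-frame presentation,
 which matches source.wav.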

 The Decoding Time to Sample Box specifies that the final sample (the last
 AAC packet) lasts 768 frames rather than 1024. Doing the math: (94 samples
 * 1024 frames) + 768 frames = 97024 frames. Subtract the 1024 frames of
 media_time from the Edit List Box above and the decoded output should be
 96000 frames, matching source.wav.

 {{{
 [stts: Decoding Time to Sample Box]
     position = 1140
     size = 32
     version = 0
     flags = 0x000000
     entry_count = 2
     entry[0]
         sample_count = 94
         sample_delta = 1024
     entry[1]
         sample_count = 1
         sample_delta = 768
 }}}
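
 The same arithmetic written out as a small sketch. The helper name
 presentation_frames is hypothetical; the stts entries and media_time are
 copied from the box dumps in this ticket, and the edit is assumed to run to
 the end of the media:

```python
def presentation_frames(stts_entries, media_time):
    """Sum the stts durations, then drop the frames skipped by the edit list.

    stts_entries: list of (sample_count, sample_delta) pairs.
    media_time:   elst media_time, i.e. frames to trim from the start.
    """
    total = sum(count * delta for count, delta in stts_entries)
    return total - media_time


aac_stts = [(94, 1024), (1, 768)]
print(presentation_frames(aac_stts, 1024))  # 96000
```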

 I think the issue may be that the MP4 demuxer does not signal the final
 packet's duration. This occurs with other codecs as well, for example
 mp3:

 {{{

 # using the same source.wav as above that's 96000 samples:
 % ffmpeg -i source.wav -c:a libmp3lame encoded-in-mp3.mp4
 % ffmpeg -i encoded-in-mp3.mp4 decoded-from-mp3.wav
 % soxi -s decoded-from-mp3.wav
 96815

 }}}

 Here's the edts box and stts from encoded-in-mp3.mp4:

 {{{
 [edts: Edit Box]
     position = 32900
     size = 36
     [elst: Edit List Box]
         position = 32908
         size = 28
         version = 0
         flags = 0x000000
         entry_count = 1
         entry[0]
             segment_duration = 2000
             media_time = 1105
             media_rate = 1.000000

 [stts: Decoding Time to Sample Box]
     position = 33205
     size = 32
     version = 0
     flags = 0x000000
     entry_count = 2
     entry[0]
         sample_count = 84
         sample_delta = 1152
     entry[1]
         sample_count = 1
         sample_delta = 337
 }}}

 So again doing some math: (84 samples * 1152 frames) + 337 frames = 97105
 frames. Subtract the 1105 frames of media_time from the edit list and the
 result is 96000 frames.
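
 The observed sample counts are consistent with every packet being decoded
 to a full codec frame, with only the start trim applied. A rough check of
 that hypothesis follows; it is an inference from the numbers in this
 ticket, not something confirmed in the demuxer source, and the function
 name is hypothetical:

```python
def untrimmed_end_frames(packet_count, frame_size, media_time):
    """Frames produced if every packet decodes to frame_size frames
    and only the leading media_time frames are trimmed."""
    return packet_count * frame_size - media_time


# name: (packet count from stts, codec frame size, elst media_time, soxi result)
cases = {
    "aac": (95, 1024, 1024, 96256),
    "mp3": (85, 1152, 1105, 96815),
}
for name, (packets, frame_size, media_time, observed) in cases.items():
    predicted = untrimmed_end_frames(packets, frame_size, media_time)
    print(name, predicted, predicted == observed)
```

 In both cases the surplus over 96000 equals the full frame size minus the
 last sample_delta (1024 - 768 = 256 and 1152 - 337 = 815).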

 Another example with opus:

 {{{
 % ffmpeg -i source.wav -c:a libopus encoded-in-opus.mp4
 % ffmpeg -i encoded-in-opus.mp4 decoded-from-opus.wav
 % soxi -s decoded-from-opus.wav
 96648
 }}}

 The same issue occurs with a fragmented mp4, which does not list sample
 durations in the Decoding Time to Sample Box and instead relies on either
 the Track Fragment Header Box or the Track Fragment Run Box for sample
 duration signaling.
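
 For the fragmented case, a simplified model of how sample durations
 resolve within one run: if the Track Fragment Run Box carries per-sample
 durations they are used, otherwise the default_sample_duration from the
 Track Fragment Header Box applies to every sample (the Track Extends Box
 default, which sits below both, is left out here). This is a sketch rather
 than a box parser; the function name is hypothetical and the values are
 made up to mirror the AAC example, not dumped from a real file:

```python
from typing import List, Optional


def run_duration(sample_count: int,
                 trun_durations: Optional[List[int]],
                 tfhd_default: Optional[int]) -> int:
    """Total duration of one track fragment run, in media timescale units.

    trun_durations: per-sample durations from trun, or None if the run
                    carries no sample_duration fields.
    tfhd_default:   default_sample_duration from tfhd, or None if not set.
    """
    if trun_durations is not None:
        if len(trun_durations) != sample_count:
            raise ValueError("trun must list one duration per sample")
        return sum(trun_durations)
    if tfhd_default is None:
        raise ValueError("no durations in trun and no default in tfhd")
    return sample_count * tfhd_default


# Hypothetical earlier fragment: four samples, all using the tfhd default.
print(run_duration(4, None, 1024))                     # 4096
# Hypothetical final fragment: explicit durations, with a short last sample.
print(run_duration(4, [1024, 1024, 1024, 768], 1024))  # 3840
```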

 This does not seem to apply to codecs that carry their own duration
 signaling, like FLAC in mp4.

 ffmpeg version info:
 {{{
 ffmpeg version n6.0 Copyright (c) 2000-2023 the FFmpeg developers
   built with gcc 13.1.1 (GCC) 20230429
   configuration: --prefix=/usr --disable-debug --disable-static --disable-
 stripping --enable-amf --enable-avisynth --enable-cuda-llvm --enable-lto
 --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-
 ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b
 --enable-libdav1d --enable-libdrm --enable-libfreetype --enable-libfribidi
 --enable-libgsm --enable-libiec61883 --enable-libjack --enable-libjxl
 --enable-libmfx --enable-libmodplug --enable-libmp3lame --enable-
 libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg
 --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librav1e
 --enable-librsvg --enable-libsoxr --enable-libspeex --enable-libsrt
 --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libv4l2
 --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx
 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb
 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-nvdec
 --enable-nvenc --enable-opencl --enable-opengl --enable-shared --enable-
 version3 --enable-vulkan
   libavutil      58.  2.100 / 58.  2.100
   libavcodec     60.  3.100 / 60.  3.100
   libavformat    60.  3.100 / 60.  3.100
   libavdevice    60.  1.100 / 60.  1.100
   libavfilter     9.  3.100 /  9.  3.100
   libswscale      7.  1.100 /  7.  1.100
   libswresample   4. 10.100 /  4. 10.100
   libpostproc    57.  1.100 / 57.  1.100
 }}}

--
Comment:

 Discovered this isn't limited to just AAC - I think it may apply to any
 codec that relies on the mp4 file to signal the last sample's duration
 (tested with the native aac encoder, libmp3lame, and libopus). I've
 updated the bug title and description accordingly.
-- 
Ticket URL: <https://trac.ffmpeg.org/ticket/10458#comment:2>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker

