[FFmpeg-trac] #10477(undetermined:new): WAV to AAC-HE conversion writes wrong "priming" and "remainder" info fields

FFmpeg trac at avcodec.org
Tue Jul 18 13:51:11 EEST 2023


#10477: WAV to AAC-HE conversion writes wrong "priming" and "remainder" info fields
----------------------------------------------+--------------------------------------
             Reporter:  Maximilian Mumme      |                     Type:  defect
               Status:  new                   |                 Priority:  normal
            Component:  undetermined          |                  Version:  git-master
             Keywords:  AAC libfdk-aac apple  |               Blocked By:
             Blocking:                        |  Reproduced by developer:  0
Analyzed by developer:  0                     |
----------------------------------------------+--------------------------------------
 This only reproduces on Apple platforms (macOS, iOS).

 When AAC-HE files encoded with FFmpeg are played in an audio player that
 uses CoreAudio as its backend (e.g. QuickTime Player, QuickLook, AULab),
 the first few frames are cut off and inaudible.
 Assuming this was a bug in CoreAudio, we reported the issue to Apple
 Developer Technical Support. However, they traced it to a bug in FFmpeg.

 Here are the steps to reproduce our findings:

 First, install FFmpeg with libfdk-aac support from Homebrew:
 {{{
 % brew tap homebrew-ffmpeg/ffmpeg
 % brew install homebrew-ffmpeg/ffmpeg/ffmpeg --with-fdk-aac
 }}}

 In our case, this installed:
 {{{
 ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
   built with Apple clang version 14.0.0 (clang-1400.0.29.202)
   configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0-with-options_1
 --enable-shared --cc=clang --host-cflags= --host-ldflags= --enable-gpl
 --enable-libaom --enable-libdav1d --enable-libmp3lame --enable-libopus
 --enable-libsnappy --enable-libtheora --enable-libvorbis --enable-libvpx
 --enable-libx264 --enable-libx265 --enable-libfontconfig
 --enable-libfreetype --enable-frei0r --enable-libass --enable-demuxer=dash
 --enable-opencl --enable-audiotoolbox --enable-videotoolbox
 --disable-htmlpages --enable-libfdk-aac --enable-nonfree
 }}}

 The attached file `click_240bpm.wav` can be used as a sample file to
 reproduce our findings. It contains a "high-low-low-low" click pattern
 where the very first "high" click starts on the 0th frame of the file.

 We converted this file to AAC-HE using FFmpeg with the following command
 (see [0] for the full output):
 {{{
 ffmpeg -i click_240bpm.wav -vcodec copy -acodec libfdk_aac \
   -profile:a aac_he click_240bpm-ffmpeg.m4a
 }}}

 For comparison, we also converted the file to AAC-HE with Apple's
 `afconvert` tool, which uses CoreAudio as its backend:
 {{{
 afconvert -d aach click_240bpm.wav click_240bpm-afconvert.m4a
 }}}

 Comparing these two files in a listening test with QuickTime Player, we
 noticed that the `afconvert` file plays back fine, while in the `ffmpeg`
 file the first "high" click is cut off, so the click pattern starts
 with "low-low-low".

 This can also be seen by decoding both files to WAV again with
 `afconvert` and inspecting the waveforms in e.g. ocenaudio
 (screenshots attached):
 {{{
 afconvert -d LEI16 click_240bpm-ffmpeg.m4a click_240bpm-ffmpeg-dec.wav
 afconvert -d LEI16 click_240bpm-afconvert.m4a click_240bpm-afconvert-dec.wav
 }}}
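
 As a rough alternative to inspecting the waveforms by eye, the two decoded
 files could be compared by locating the first audible frame in each. The
 following is only a sketch: the helper name `first_audible_frame` and the
 threshold of 1000 are our own illustrative choices, and it assumes the
 decoded files are plain 16-bit PCM WAV files that Python's `wave` module
 can read.

```python
import array
import sys
import wave


def first_audible_frame(path, threshold=1000):
    """Return the index of the first frame in which any channel exceeds
    `threshold` in absolute value, or None if no sample does.

    Hypothetical helper for illustration; assumes 16-bit PCM input.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        channels = wav.getnchannels()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    for index, value in enumerate(samples):
        if abs(value) > threshold:
            return index // channels  # interleaved samples -> frame index
    return None


if __name__ == "__main__":
    # Usage: python first_audible.py file1.wav file2.wav ...
    for name in sys.argv[1:]:
        print(name, first_audible_frame(name))
```

 If the first click is cut off as described, the two files should report
 clearly different frame indices for the first audible frame.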

 The Apple engineers then gave us the following explanation for this
 behaviour (quoted):

 > After looking into the M4A further, we figured out the root cause of the
 > problem. According to `afinfo` tool, the M4A file has 2529 samples leading
 > zeros and 3 samples trailing zeros.
 > {{{
 > % afinfo click_240bpm-ffmpeg.m4a
 > [...]
 > audio 1014300 valid frames + 2529 priming + 3 remainder = 1016832
 > [...]
 > }}}
 >
 > Since these numbers are based on 22.05 kHz sample rate of the AAC base
 > layer codec, the actual decoder output should have 5058(=2529*2) samples
 > leading zeros @ 44.1kHz sample rate. AudioCodecs has codec delay which is
 > a roundtrip delay from the encoder to the decoder. The leading zero is
 > corresponding to the codec delay. The decoder should skip this amount of
 > leading zeros samples to align with the encoder input.
 >
 > When we tried to decode the M4A file with ffmpeg tool, we realized that
 > ffmpeg tool skips just only 4096 samples ignoring “2529 priming”
 > information in the M4A file, and its output is aligned with the original
 > WAV file. ffmpeg tool should have put “2048 priming / 484 remainder” to
 > the M4A file. CoreAudio skipped 5058 samples according to the priming
 > information in the M4A, instead of 4096 samples, and it missed the first
 > note as you described. We think this is a bug of ffmpeg tool.
 >
 > If you force the priming information to be 2048 leading zeros and 484
 > trailing zeros with the following command, you would see the expected
 > output.
 > {{{
 > % afconvert -d LEI16 click_240bpm-ffmpeg.m4a click_240bpm-ffmpeg-dec.wav --prime-override 2048 484
 > }}}
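
 The arithmetic in the quote can be checked with a few lines of Python. This
 is only a worked example using the figures quoted above; the variable names
 are ours, and the 1024-frame packet size is an assumption about the AAC
 core layer rather than something `afinfo` reported.

```python
# Figures reported by `afinfo` for click_240bpm-ffmpeg.m4a, counted at the
# 22.05 kHz rate of the AAC base layer.
valid_frames = 1014300
written_priming, written_remainder = 2529, 3
expected_priming, expected_remainder = 2048, 484  # values suggested in the quote

SBR_FACTOR = 2       # decoder output runs at twice the base-layer rate
OUTPUT_RATE = 44100  # Hz
PACKET_FRAMES = 1024  # assumed AAC core packet size

# Both pairs add up to the same total, so only the split between priming
# and remainder differs, not the amount of encoded audio.
total_written = valid_frames + written_priming + written_remainder
total_expected = valid_frames + expected_priming + expected_remainder

skipped_by_coreaudio = written_priming * SBR_FACTOR      # 5058 output samples
skipped_by_ffmpeg = expected_priming * SBR_FACTOR        # 4096 output samples
lost_samples = skipped_by_coreaudio - skipped_by_ffmpeg  # 962 output samples
lost_ms = 1000 * lost_samples / OUTPUT_RATE

print("totals:", total_written, total_expected)
print("whole packets:", total_written % PACKET_FRAMES == 0)
print("extra samples skipped by CoreAudio:", lost_samples)
print(f"audio lost at the start: {lost_ms:.1f} ms")
```

 Under these figures, CoreAudio skips 962 more output samples than the
 FFmpeg decoder does, which is roughly 22 ms at 44.1 kHz and is consistent
 with the first click going missing.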

 We believe that this is also the root cause of issues #2325 and #5910.

 ---
 [0] FFmpeg command output:
 {{{
 % ffmpeg -i click_240bpm.wav -vcodec copy -acodec libfdk_aac -profile:a aac_he click_240bpm-ffmpeg.m4a
  ✔  miniconda3   12:44:55
 ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
   built with Apple clang version 14.0.0 (clang-1400.0.29.202)
   configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0-with-options_1
 --enable-shared --cc=clang --host-cflags= --host-ldflags= --enable-gpl
 --enable-libaom --enable-libdav1d --enable-libmp3lame --enable-libopus
 --enable-libsnappy --enable-libtheora --enable-libvorbis --enable-libvpx
 --enable-libx264 --enable-libx265 --enable-libfontconfig
 --enable-libfreetype --enable-frei0r --enable-libass --enable-demuxer=dash
 --enable-opencl --enable-audiotoolbox --enable-videotoolbox
 --disable-htmlpages --enable-libfdk-aac --enable-nonfree
   libavutil      58.  2.100 / 58.  2.100
   libavcodec     60.  3.100 / 60.  3.100
   libavformat    60.  3.100 / 60.  3.100
   libavdevice    60.  1.100 / 60.  1.100
   libavfilter     9.  3.100 /  9.  3.100
   libswscale      7.  1.100 /  7.  1.100
   libswresample   4. 10.100 /  4. 10.100
   libpostproc    57.  1.100 / 57.  1.100
 Guessed Channel Layout for Input Stream #0.0 : stereo
 Input #0, wav, from 'click_240bpm.wav':
   Metadata:
     encoded_by      : Logic Pro X
     date            : 2023-06-2
     creation_time   : 11:40:2
     time_reference  : 158848200
     umid            :
 0x000000000000000000000000000000000000000000000000000000000000000000000000A819996B010000000000000000000000000000000000000000000000
     coding_history  :
   Duration: 00:00:46.00, bitrate: 2119 kb/s
   Chapters:
     Chapter #0:0: start 0.000000, end 46.000000
       Metadata:
         title           : Tempo: 240.0
   Stream #0:0: Audio: pcm_s24le ([1][0][0][0] / 0x0001), 44100 Hz, 2
 channels, s32 (24 bit), 2116 kb/s
 Stream mapping:
   Stream #0:0 -> #0:0 (pcm_s24le (native) -> aac (libfdk_aac))
 Press [q] to stop, [?] for help
 Output #0, ipod, to 'click_240bpm-ffmpeg.m4a':
   Metadata:
     encoded_by      : Logic Pro X
     date            : 2023-06-2
     coding_history  :
     time_reference  : 158848200
     umid            :
 0x000000000000000000000000000000000000000000000000000000000000000000000000A819996B010000000000000000000000000000000000000000000000
     encoder         : Lavf60.3.100
   Chapters:
     Chapter #0:0: start 0.000000, end 46.000000
       Metadata:
         title           : Tempo: 240.0
   Stream #0:0: Audio: aac (HE-AAC) (mp4a / 0x6134706D), 44100 Hz, stereo,
 s16, 64 kb/s
     Metadata:
       encoder         : Lavc60.3.100 libfdk_aac
 size=     366kB time=00:00:45.95 bitrate=  65.3kbits/s speed= 115x
 video:0kB audio:361kB subtitle:0kB other streams:0kB global headers:0kB
 muxing overhead: 1.463942%
 }}}
-- 
Ticket URL: <https://trac.ffmpeg.org/ticket/10477>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker

