[FFmpeg-trac] #10477(undetermined:new): WAV to AAC-HE conversion writes wrong "priming" and "remainder" info fields
FFmpeg
trac at avcodec.org
Tue Jul 18 13:51:11 EEST 2023
#10477: WAV to AAC-HE conversion writes wrong "priming" and "remainder" info fields
-------------------------------------+-------------------------------------
Reporter: Maximilian Mumme           | Type: defect
Status: new                          | Priority: normal
Component: undetermined              | Version: git-master
Keywords: AAC libfdk-aac apple       | Blocked By:
Blocking:                            | Reproduced by developer: 0
Analyzed by developer: 0             |
-------------------------------------+-------------------------------------
This only reproduces on Apple platforms (macOS, iOS).
When playing AAC-HE files encoded with FFmpeg in an audio player that
uses CoreAudio as its backend (e.g. QuickTime Player, QuickLook, AULab),
we noticed that the first few frames are cut off and inaudible during
playback.
Assuming this was a bug in CoreAudio, we reported an issue to Apple
Developer Technical Support. However, they were able to trace it to a
bug in FFmpeg.
Here are the steps to reproduce our findings:
First, install FFmpeg with libfdk-aac support from Homebrew:
{{{
% brew tap homebrew-ffmpeg/ffmpeg
% brew install homebrew-ffmpeg/ffmpeg/ffmpeg --with-fdk-aac
}}}
In our case this installed:
{{{
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
built with Apple clang version 14.0.0 (clang-1400.0.29.202)
configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0-with-options_1
--enable-shared --cc=clang --host-cflags= --host-ldflags= --enable-gpl
--enable-libaom --enable-libdav1d --enable-libmp3lame --enable-libopus
--enable-libsnappy --enable-libtheora --enable-libvorbis --enable-libvpx
--enable-libx264 --enable-libx265 --enable-libfontconfig
--enable-libfreetype --enable-frei0r --enable-libass --enable-demuxer=dash
--enable-opencl --enable-audiotoolbox --enable-videotoolbox
--disable-htmlpages --enable-libfdk-aac --enable-nonfree
}}}
The attached file `click_240bpm.wav` can be used as a sample file to
reproduce our findings. It contains a "high-low-low-low" click pattern
where the very first "high" click starts on the 0th frame of the file.
We converted this file to AAC-HE using FFmpeg with the following command
(see [0] for the full output):
{{{
ffmpeg -i click_240bpm.wav -vcodec copy -acodec libfdk_aac \
    -profile:a aac_he click_240bpm-ffmpeg.m4a
}}}
As a comparison, we can also convert the file to AAC-HE with Apple's
`afconvert` tool, which uses CoreAudio as its backend:
{{{
afconvert -d aach click_240bpm.wav click_240bpm-afconvert.m4a
}}}
Comparing these two files in a listening test with QuickTime Player, we
noticed that the `afconvert` file plays back fine, while in the `ffmpeg`
file the first "high" click is cut off, so that the click pattern starts
with "low-low-low".
This can also be seen by decoding both files back to WAV with
`afconvert` and inspecting the waveforms in e.g. ocenaudio
(screenshots attached):
{{{
afconvert -d LEI16 click_240bpm-ffmpeg.m4a click_240bpm-ffmpeg-dec.wav
afconvert -d LEI16 click_240bpm-afconvert.m4a \
    click_240bpm-afconvert-dec.wav
}}}
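The same comparison can also be made numerically. The following is a
minimal Python sketch, assuming the decoded files are plain 16-bit PCM WAV
files that the standard `wave` module can read. The function name
`first_onset_frame` and the default threshold are illustrative choices,
not something taken from the tools above.

```python
import struct
import wave


def first_onset_frame(path, threshold=1000):
    """Return the index of the first frame in which any channel reaches
    `threshold` in absolute value, or None if no frame does.

    Assumes 16-bit little-endian PCM; the threshold is a hypothetical
    value and may need tuning for lossy-codec pre-echo.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        channels = wf.getnchannels()
        frames_seen = 0
        while True:
            chunk = wf.readframes(4096)
            if not chunk:
                return None
            # Interleaved samples: ch0, ch1, ch0, ch1, ...
            samples = struct.unpack("<%dh" % (len(chunk) // 2), chunk)
            for i in range(0, len(samples), channels):
                if any(abs(s) >= threshold for s in samples[i:i + channels]):
                    return frames_seen + i // channels
            frames_seen += len(samples) // channels


# Intended use on the two decoded files from the commands above:
#   first_onset_frame("click_240bpm-ffmpeg-dec.wav")
#   first_onset_frame("click_240bpm-afconvert-dec.wav")
```

If the first click is missing from one file, its first detected onset
should be considerably later than in the other file.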
The Apple engineers then pointed us to the following reason for this
behaviour (quote):
> After looking into the M4A further, we figured out the root cause of the
> problem. According to `afinfo` tool, the M4A file has 2529 samples leading
> zeros and 3 samples trailing zeros.
> {{{
> % afinfo click_240bpm-ffmpeg.m4a
> [...]
> audio 1014300 valid frames + 2529 priming + 3 remainder = 1016832
> [...]
> }}}
>
> Since these numbers are based on 22.05 kHz sample rate of the AAC base
> layer codec, the actual decoder output should have 5058(=2529*2) samples
> leading zeros @ 44.1kHz sample rate. AudioCodecs has codec delay which is
> a roundtrip delay from the encoder to the decoder. The leading zero is
> corresponding to the codec delay. The decoder should skip this amount of
> leading zeros samples to align with the encoder input.
>
> When we tried to decode the M4A file with ffmpeg tool, we realized that
> ffmpeg tool skips just only 4096 samples ignoring “2529 priming”
> information in the M4A file, and its output is aligned with the original
> WAV file. ffmpeg tool should have put “2048 priming / 484 remainder” to
> the M4A file. CoreAudio skipped 5058 samples according to the priming
> information in the M4A, instead of 4096 samples, and it missed the first
> note as you described. We think this is a bug of ffmpeg tool.
>
> If you force the priming information to be 2048 leading zeros and 484
> trailing zeros with the following command, you would see the expected
> output.
> {{{
> % afconvert -d LEI16 click_240bpm-ffmpeg.m4a click_240bpm-ffmpeg-dec.wav \
>     --prime-override 2048 484
> }}}
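The numbers in the quoted analysis can be checked with a few lines of
arithmetic. The sketch below parses the `afinfo` line shown above and
computes how many output samples are trimmed beyond the expected 4096.
The regular expression and the helper names are assumptions for
illustration, based only on the single line quoted here.

```python
import re

# Pattern derived from the one `afinfo` line quoted above (an assumption;
# other afinfo versions may format this line differently).
AFINFO_RE = re.compile(
    r"audio\s+(\d+)\s+valid frames\s+\+\s+(\d+)\s+priming"
    r"\s+\+\s+(\d+)\s+remainder\s+=\s+(\d+)"
)


def parse_afinfo_frames(line):
    """Return (valid, priming, remainder, total) from an afinfo line."""
    m = AFINFO_RE.search(line)
    if m is None:
        raise ValueError("not an afinfo frame-count line")
    return tuple(int(g) for g in m.groups())


def excess_trim(priming_written, priming_expected, factor=2, out_rate=44100):
    """Output samples trimmed beyond the expected delay, and the same
    amount in milliseconds. `factor` is the ratio between the output rate
    and the base-layer rate (2 for 22.05 kHz -> 44.1 kHz)."""
    extra = (priming_written - priming_expected) * factor
    return extra, 1000.0 * extra / out_rate


line = "audio 1014300 valid frames + 2529 priming + 3 remainder = 1016832"
valid, priming, remainder, total = parse_afinfo_frames(line)

assert valid + priming + remainder == total   # 1014300 + 2529 + 3
assert priming + remainder == 2048 + 484      # same total padding: 2532

print(priming * 2)          # 5058 samples skipped by CoreAudio
print(2048 * 2)             # 4096 samples skipped by ffmpeg
extra, ms = excess_trim(priming, 2048)
print(extra, round(ms, 1))  # 962 21.8
```

The difference of 962 samples at 44.1 kHz is about 21.8 ms, which is
consistent with the first click being lost. Both pairs of values add up to
the same padding (2529 + 3 = 2048 + 484 = 2532), so the number of valid
frames is the same in either case.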
We believe that this is also the root cause of issues #2325 and
#5910.
---
[0] FFmpeg command output:
{{{
% ffmpeg -i click_240bpm.wav -vcodec copy -acodec libfdk_aac \
    -profile:a aac_he click_240bpm-ffmpeg.m4a
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
built with Apple clang version 14.0.0 (clang-1400.0.29.202)
configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0-with-options_1
--enable-shared --cc=clang --host-cflags= --host-ldflags= --enable-gpl
--enable-libaom --enable-libdav1d --enable-libmp3lame --enable-libopus
--enable-libsnappy --enable-libtheora --enable-libvorbis --enable-libvpx
--enable-libx264 --enable-libx265 --enable-libfontconfig
--enable-libfreetype --enable-frei0r --enable-libass --enable-demuxer=dash
--enable-opencl --enable-audiotoolbox --enable-videotoolbox
--disable-htmlpages --enable-libfdk-aac --enable-nonfree
libavutil 58. 2.100 / 58. 2.100
libavcodec 60. 3.100 / 60. 3.100
libavformat 60. 3.100 / 60. 3.100
libavdevice 60. 1.100 / 60. 1.100
libavfilter 9. 3.100 / 9. 3.100
libswscale 7. 1.100 / 7. 1.100
libswresample 4. 10.100 / 4. 10.100
libpostproc 57. 1.100 / 57. 1.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'click_240bpm.wav':
Metadata:
encoded_by : Logic Pro X
date : 2023-06-2
creation_time : 11:40:2
time_reference : 158848200
umid :
0x000000000000000000000000000000000000000000000000000000000000000000000000A819996B010000000000000000000000000000000000000000000000
coding_history :
Duration: 00:00:46.00, bitrate: 2119 kb/s
Chapters:
Chapter #0:0: start 0.000000, end 46.000000
Metadata:
title : Tempo: 240.0
Stream #0:0: Audio: pcm_s24le ([1][0][0][0] / 0x0001), 44100 Hz, 2
channels, s32 (24 bit), 2116 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s24le (native) -> aac (libfdk_aac))
Press [q] to stop, [?] for help
Output #0, ipod, to 'click_240bpm-ffmpeg.m4a':
Metadata:
encoded_by : Logic Pro X
date : 2023-06-2
coding_history :
time_reference : 158848200
umid :
0x000000000000000000000000000000000000000000000000000000000000000000000000A819996B010000000000000000000000000000000000000000000000
encoder : Lavf60.3.100
Chapters:
Chapter #0:0: start 0.000000, end 46.000000
Metadata:
title : Tempo: 240.0
Stream #0:0: Audio: aac (HE-AAC) (mp4a / 0x6134706D), 44100 Hz, stereo,
s16, 64 kb/s
Metadata:
encoder : Lavc60.3.100 libfdk_aac
size= 366kB time=00:00:45.95 bitrate= 65.3kbits/s speed= 115x
video:0kB audio:361kB subtitle:0kB other streams:0kB global headers:0kB
muxing overhead: 1.463942%
}}}
--
Ticket URL: <https://trac.ffmpeg.org/ticket/10477>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker