[FFmpeg-trac] #8631(undetermined:new): Audio gapless playback metadata for MP4/AAC
FFmpeg
trac at avcodec.org
Fri Apr 24 07:53:00 EEST 2020
#8631: Audio gapless playback metadata for MP4/AAC
-------------------------------------+-------------------------------------
Reporter: johnkaplantech | Type: defect
Status: new | Priority: normal
Component: undetermined | Version: unspecified
Keywords: gapless, audio, AAC | Blocked By:
Blocking: | Reproduced by developer: 0
Analyzed by developer: 0 |
-------------------------------------+-------------------------------------
Background
Gapless playback of audio tracks is enabled by many standard audio players
in mobile devices. It allows successive audio tracks to play without pause
or perceptible audio flaw as a seamless whole. Gapless playback is
required for listeners to hear many live, classical, and classic rock
recordings as intended. External sources describing gapless playback in
detail include https://en.wikipedia.org/wiki/Gapless_playback and
https://wiki.hydrogenaud.io/index.php?title=Gapless_playback.
The above references describe several theoretical sources of gaps across
different electronic audio formats, but the most common source is lossy
compression such as MP3 and AAC, which adds extra samples before and after
the original PCM data of a track as part of the encoding process. Because
the length of this extra data can vary, and these compression standards do
not include metadata describing its length, it cannot simply be stripped
away during decoding.
However, the packaging (container) layer can obtain the relevant values
from the audio encoder and store them in file metadata for audio players
to read. As a muxer, ffmpeg is in a position to do this. The samples added
to the front of the audio are called "delay" and the samples added to the
end are called "padding." To strip off the extra samples and recover a
gapless track, an audio player needs the length, in samples, of the delay,
of the original unpadded PCM audio, and of the padding.
As far as I know, there is no documented standard specifically for how
gapless playback metadata should be encoded in an audio file and
interpreted by audio players. (If anyone has information to the contrary,
please add it as a comment; I and many others would be grateful for the
insight.) However, many audio players apparently follow de facto
standards: they read metadata from the file headers that gives enough
information about track length and delay to locate the original unpadded
audio within the decoded stream.
Proposed Solution
The solution this ticket proposes, as a first step for ffmpeg, applies to
AAC audio packaged in an MP4 file. The proposal is to use the
moov/trak/edts/elst atoms described by ISO/IEC 14496-12, adding a single
elst atom inside a single edts atom per track. Inside this elst, the count
of unpadded audio PCM samples would be written to the "segment_duration"
("track duration"/"time length") field, and the count of delay samples to
the "media_time" ("start time") field. (segment_duration is expressed in
the movie timescale, so the sample count must be converted if that differs
from the audio sample rate.) Audio players use these values to skip the
delay samples within the track data, isolate the original PCM audio
samples, and ignore the padding at the end, so the padding length does not
need to be stored explicitly. My team has experimented with audio tracks
processed this way using the fdk-aac tool, and they play gaplessly in the
standard audio players on both iOS and Android.
Tech Details
Here are some open questions about the design and coding of this change.
I'm hoping the community will jump in and comment to help me nail down the
details so I can move on to coding a patch.
As several of you have commented before, ffmpeg already contains code that
writes an elst atom, controlled by the mov/mp4 muxer option
"use_editlist."
I believe that using an edit list for movie synchronization is a different
use case from using it for gapless audio. Of course, if any of you can
explain how a single routine could cover both use cases, I'll attempt to
satisfy that requirement. Short of that, I propose adding a second
command-line switch, "gapless-editlist", that would coexist with the
current switch but be mutually exclusive with it, and would control the
writing of an elst atom containing gapless metadata. Whether or not that
is the best final design, it will at least avoid regressions for users who
rely on the current implementation in the meantime.
One detail I need help with is how to find the encoder delay and the
original PCM sample count for an AAC encoding in the data available to the
atom-writing functions in movenc.c, i.e. something reachable from
(AVIOContext *pb, MOVMuxContext *mov, MOVTrack *track).
When we discussed this previously, Martin Balint suggested I look in the
AV_PKT_DATA_SKIP_SAMPLES side data for the delay value, and I found
several references to that side data type for different encoding formats.
Unfortunately, it seems to be used differently for each format, and I
don't know how to find the same or equivalent data for AAC encoding. I'm
also not sure how the fdk-aac module interfaces with ffmpeg. I will keep
digging and hope to find it eventually, but if any maintainers have direct
knowledge and can point me to some code or a data structure definition,
I'd be most grateful.
As for a more general solution that works with other encoders, I'm game to
help out with that once the MP4/AAC case is done. I'll keep experimenting
and investigating while waiting for responses. I apologize for the long
gap since we last discussed this, but it is now at the top of my
spare-time priority list, so I'm actively working on it.
Thanks,
John
--
Ticket URL: <https://trac.ffmpeg.org/ticket/8631>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker