[FFmpeg-devel] [PATCH] mov/aac: skip initial aac padding

Justin Greer justin at zencoder.com
Wed Jul 18 00:31:17 CEST 2012


On 2012-07-17 10:20 AM, Michael Niedermayer wrote:
> On Sun, Jul 15, 2012 at 04:49:06PM +0200, Nicolas George wrote:
>> On Octidi, 28 Messidor, year CCXX, Michael Niedermayer wrote:
>>
>>> the amount of samples to skip can be encoder dependent, 2112 is just
>>> the fallback value if there is nothing specified (i have no file that
>>> specifies anything it seems so i implemented just that case)
>> Until we can get our hands on samples and examine them, any solution that
>> works will do, IMHO. I am worried that doing things in the demuxer requires
>> adding special cases to all demuxers that can contain AAC, while doing it in
>> the decoder will always be in effect. That is what I kept in mind while
>> working on libopus.
> the way i understood it was that there is not enough information
> in the aac bitstream to know what and where to skip. If that's untrue
> and it can all be done purely in the decoder then this indeed would
> be very preferable
>
>

Unfortunately, I think this patch will cause some issues, because 2112 
only seems correct for files encoded with Apple's tools.  (Yes, I have 
samples -- see below.)

As far as I've found, neither the AAC bitstream itself nor simple 
ADTS-wrapped files contain any info about the priming samples, which is 
unfortunate.  However, some of those files do contain a FIL element at 
the beginning of the first AAC frame that identifies the encoder.  (It 
looks like ffmpeg adds this element too.)  So you can often identify the 
encoding tool, if you're okay with setting defaults based on it.

Anyway, with an mp4/m4a file, the common thing encoders do is add 
iTunes-compatible "gapless playback" metadata with the priming/remainder 
info.  It looks roughly like this:


|   udta - User Data at 994 (421 bytes)
|   |   meta - Meta Data at 1002 (387 bytes)
|   |   |   hdlr - Handler Description at 1014 (34 bytes)
|   |   |   |   Handler Type: mdir
|   |   |   |   Handler Manufacturer: appl
|   |   |   |   Name:
|   |   |   ilst - The iTunes/iPod Container Box at 1048 (341 bytes)
|   |   |   |   (---- - iTunes Freeform Metadata) at 0 (97 bytes)
|   |   |   |   |   (mean - iTunes Freeform Metadata Meaning) at 0 (28 bytes)
|   |   |   |   |   |   Value: com.apple.iTunes
|   |   |   |   |   (name - iTunes Freeform Metadata Name) at 28 (16 bytes)
|   |   |   |   |   |   Value: cdec
|   |   |   |   |   (data - iTunes Freeform Metadata Data) at 44 (45 bytes)
|   |   |   |   |   |   Value: ndaudio 1.5.4.0 / -q 0.50 -lc
|   |   |   |   (---- - iTunes Freeform Metadata) at 97 (188 bytes)
|   |   |   |   |   (mean - iTunes Freeform Metadata Meaning) at 0 (28 bytes)
|   |   |   |   |   |   Value: com.apple.iTunes
|   |   |   |   |   (name - iTunes Freeform Metadata Name) at 28 (20 bytes)
|   |   |   |   |   |   Value: iTunSMPB
|   |   |   |   |   (data - iTunes Freeform Metadata Data) at 48 (132 bytes)
|   |   |   |   |   |   Value:  00000000 00000A40 00000138 0000000000015888 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
|   |   |   |   |   |   Priming:   2624
|   |   |   |   |   |   Remainder: 312
|   |   |   |   |   |   Samples:   88200
|   |   |   |   (?too - Encoding Tool) at 285 (48 bytes)
|   |   |   |   |   (data - iTunes Freeform Metadata Data) at 0 (40 bytes)
|   |   |   |   |   |   Value: Nero AAC codec / 1.5.4.0

Sorry if that's very ugly.  But basically the line of hex data (which is 
literally a string in the file with space-delimited hex values, just as 
you see) includes the priming, remainder, and sample counts as the 2nd, 
3rd, and 4th values in the iTunSMPB data string. Also note that the 
encoder is often put into encoding tool metadata "\xA9too" (copyright 
sign, then "too").  So that can be used to make assumptions about 
priming samples.
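
As a rough illustration of the above, here is a minimal Python sketch 
that pulls the three values out of an iTunSMPB string.  The function 
name parse_itunsmpb is made up for this example, and it assumes the 
string has already been read out of the freeform "data" atom; it is not 
taken from any existing demuxer.

```python
def parse_itunsmpb(value):
    """Return (priming, remainder, samples) from an iTunSMPB string.

    Hypothetical helper: assumes `value` is the space-delimited hex text
    already extracted from the freeform metadata "data" atom.
    """
    fields = value.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 hex fields in iTunSMPB")
    # Field 0 is skipped; fields 1..3 are priming, remainder, sample count.
    priming, remainder, samples = (int(f, 16) for f in fields[1:4])
    return priming, remainder, samples


# The string from the Nero sample in the dump above.
smpb = (" 00000000 00000A40 00000138 0000000000015888 00000000 00000000"
        " 00000000 00000000 00000000 00000000 00000000 00000000")
priming, remainder, samples = parse_itunsmpb(smpb)
print(priming, remainder, samples)  # 2624 312 88200
```

For this file the three values add up to 91136, which is exactly 89 
frames of 1024 samples, so the numbers are at least self-consistent for 
an LC stream.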

Unfortunately, faac doesn't currently add the iTunSMPB info, but it 
tends to use a consistent 1024 samples of priming.  (And when generating 
ADTS files, it adds the FIL element I mentioned, so 1024 can be assumed 
if that's found.)
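
To make the fallback idea concrete, here is a small Python sketch of how 
a priming value might be chosen.  Everything named here (guess_priming, 
ENCODER_PRIMING, the "faac" substring match) is an assumption for 
illustration only; I have not checked exactly what string each encoder 
writes, and 2112 is simply the fallback from the patch under discussion.

```python
DEFAULT_PRIMING = 2112  # fallback used by the patch; seems right for Apple's tools

# Hypothetical table: lowercase substring of the encoder name -> assumed priming.
ENCODER_PRIMING = {
    "faac": 1024,  # faac writes no iTunSMPB but tends to use 1024
}


def guess_priming(smpb_priming=None, encoding_tool=None):
    """Pick a priming sample count, preferring explicit metadata."""
    # 1. Explicit iTunSMPB priming wins if the file carries it.
    if smpb_priming is not None:
        return smpb_priming
    # 2. Otherwise fall back to a per-encoder default, if the tool is known.
    tool = (encoding_tool or "").lower()
    for needle, priming in ENCODER_PRIMING.items():
        if needle in tool:
            return priming
    # 3. Last resort: the generic default.
    return DEFAULT_PRIMING
```

The ordering is the point of the sketch: metadata in the file first, 
then a guess from the encoder name, then the generic default.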

Anyway, various samples can be found at: 
https://zencoder-public-test-files.s3.amazonaws.com/tones_aac_samples.zip (About 
a megabyte for the zip.)

I created the samples with Nero 1.5.4.0, the "afconvert" utility from 
Mac OS X Lion, faac 1.28, and Dolby Pulse 1.1.1.  I included variations 
on mono/stereo, 16 and 44.1 kHz, and LC/HE/HEv2 where applicable.  I 
also included some ADTS-formatted versions from tools that supported 
it.  Just for fun, there are a couple of examples encoded by QuickTime 7 
and QuickTime X.  The original WAV files are in there for comparison, too.

Unfortunately, Pulse doesn't include any recognizable information about 
its priming delays, and by manual inspection, I've found them to vary 
wildly:

mono LC = 2974 samples
mono HE = 4608 samples
stereo LC = 2880 samples
stereo HE = 4618 samples
stereo HEv2 = 7490 samples

(Those may be off by a sample or two, since it was a manual/visual 
inspection process.)
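
If someone wanted to special-case Pulse, the measurements above could be 
kept as a simple lookup, sketched here in Python.  The names 
PULSE_PRIMING and pulse_priming are hypothetical, and the values are 
only the hand-measured ones listed above, so they should be treated as 
approximate.

```python
# Hand-measured priming for Dolby Pulse 1.1.1; may be off by a sample or two.
PULSE_PRIMING = {
    ("mono", "LC"): 2974,
    ("mono", "HE"): 4608,
    ("stereo", "LC"): 2880,
    ("stereo", "HE"): 4618,
    ("stereo", "HEv2"): 7490,
}


def pulse_priming(channels, profile):
    """Return the measured priming, or None for an unmeasured combination."""
    return PULSE_PRIMING.get((channels, profile))
```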

I'm sure there's info I've forgotten to include, so let me know if you 
have any further questions, or need different samples.  I could take a 
stab at patching to include some of this detection/assumption, but I'm 
out of time for today.

-- Justin


