[FFmpeg-devel] [PATCH v3 3/3] aadec: fix seeking in mp3 content

Karsten Otto ottoka at posteo.de
Sat Jul 7 12:46:40 EEST 2018


> Am 06.07.2018 um 23:32 schrieb Michael Niedermayer <michael at niedermayer.cc>:
> 
> Signierter PGP-Teil
> On Fri, Jul 06, 2018 at 10:49:46AM +0200, Karsten Otto wrote:
>> 
>>> Am 04.07.2018 um 23:54 schrieb Michael Niedermayer <michael at niedermayer.cc>:
>>> 
>>> Signierter PGP-Teil
>>> On Wed, Jul 04, 2018 at 09:32:32AM +0200, Karsten Otto wrote:
>>>> 
>>>>> Am 04.07.2018 um 03:26 schrieb Michael Niedermayer <michael at niedermayer.cc>:
>>>>> 
>>>>> Signierter PGP-Teil
>>>>> On Tue, Jul 03, 2018 at 10:25:36PM +0200, Karsten Otto wrote:
>>>>>> TL;DR: I will drop patch 3/3, may rather spend some time investigating why
>>>>>> "ff ee 47 9d“ passes the mp3 header parser. Also, the aa file "index" cannot
>>>>>> be used for frame or chapter detection, unfortunately.
>>>>>> 
>>>>>> More details inline below.
>>>>>> 
>>>>>>> Am 03.07.2018 um 02:32 schrieb Michael Niedermayer <michael at niedermayer.cc>:
>>>>>>> 
>>>>>>> Signierter PGP-Teil
>>>>>>> On Mon, Jul 02, 2018 at 07:21:43PM +0200, Karsten Otto wrote:
>>>>>>>> 
>>>>>>>>> Am 02.07.2018 um 10:59 schrieb Michael Niedermayer <michael at niedermayer.cc>:
>>>>>>>>> 
>>>>>>>>> Signierter PGP-Teil
>>>>>>>>> On Thu, Jun 21, 2018 at 06:58:26PM +0200, Karsten Otto wrote:
>>>>>>>>>> MP3 frames may not be aligned to aa chunk boundaries. After seeking,
>>>>>>>>>> scan for the next valid frame header. Then truncate the packet, and
>>>>>>>>>> also adjust timestamp information accordingly.
>>>>>>>>>> ---
>>>>>>>>>> libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
>>>>>>>>>> 1 file changed, 28 insertions(+), 5 deletions(-)
>>>>>>>>> 
>>>>>>>>> Please see AVSTREAM_PARSE_TIMESTAMPS
>>>>>>>>> 
>>>>>>>>> This codec specific code in demuxers should not be needed
>>>>>>>>> 
>>>>>>>> I tried that before, and you are right that it takes care of timestamp adjustments.
>>>>>>>> 
>>>>>>>> However, after a seek the parsed packet still contains a partial frame before the
>>>>>>>> next full one. I had expected libavformat/mpegaudio_parser.c to detect this
>>>>>>>> situation and discard the fragment, but unfortunately it does not. Instead it passes
>>>>>>>> it unchanged to the codec, which plays it as a pop or even a very ugly BLEEEP -
>>>>>>>> painful while wearing headphones!
>>>>>>> 
>>>>>>> I think you mis-diagnose this at least somewhat
>>>>>>> your code searches for a specific mp3 header, the parser and decoder would
>>>>>>> accept a wider range of mp3 variants.
>>>>>>> But both can choose points that are not mp3 frame starts. (if that is the
>>>>>>> problem you are seeing, iam not completely sure it is)
>>>>>>> 
>>>>>> It took a closer look at what happens when I hear a BLEEP: The packet begins
>>>>>> with a partial frame, starting with the byte sequence "ff ee 47 9d“. Unfortunately,
>>>>>> the mp3 parser indeed accepts this as a valid mp3 header, causing the noise.
>>>>>> By looking for the more restricted header, my patch finds the real next frame at
>>>>>> offset 78.
>>>>>> 
>>>>>> BTW: Should this sequence actually pass? AFAIK 01 is not a valid MPEG audio
>>>>>> version ID?
>>>>>> 
>>>>>>> Also is the more restricted header you search for always used or could
>>>>>>> this be failing with some files ?
>>>>>>> 
>>>>>> Good question. So far, all mp3 aa files I tested with matched the format (MPEG 2
>>>>>> Layer III at 32 kbps and 22kHz). I doubt there are other variants, but can’t be sure.
>>>>>> 
>>>>>>> Either way, looking at the demuxer a bit deeper, theres a TOC list in the
>>>>>>> main header which points to chunks. The one file i found has 12 such chunks
>>>>>>> the first represents the whole file i suspect, the next largest the audio
>>>>>>> data, another one the metadata.
>>>>>>> I guess the remaining 2 large ones could be a cover image and an index.
>>>>>> Correct, seems like all aa files have the TOC, but its entries can be in a different
>>>>>> order in each file. I guess thats why the original aadec.c implementation just
>>>>>> looks for the largest chunk to play.
>>>>>> 
>>>>>>> I didnt really look at it, but theres a table in there with pairs of 32bit
>>>>>>> values. the first in the file i have goes from 0 to 3 the second starts
>>>>>>> multiple times from 0 and seems monotonly increasing and staying within
>>>>>>> the filesize.
>>>>>>> The sample i have does not store mp3 but it looks like this is a index
>>>>>>> maybe offsets for packets in each of the 3 chapters.
>>>>>>> 
>>>>>>> Please look at the data, if it can be used. It would be much better than
>>>>>>> scaning the file linearly and searching for some byte sequence to find
>>>>>>> packet starts.
>>>>>>> 
>>>>>> Short answer: Sorry, it is not possible to derive frame offsets nor chapter
>>>>>> offsets from the index.
>>>>>> 
>>>>> 
>>>>>> Long answer:
>>>>>> All offsets in the index are the same, and matching the "codec_second_size"
>>>>>> = crypto chunk size, roughly one second of audio:
>>>>>> - 1045 for format 2 (SIPR 8kbps)
>>>>>> - 2000 for format 3 (SIPR 16kbps)
>>>>>> - 3982 for format 4 (MP3 32kbps)
>>>>>> This is different from the respective frame size, which is 19, 20, and 104/105
>>>>> 
>>>>> The first 2 are exact multiples of the frame size.
>>>>> 
>>>>> I dont have a sample for the mp3 case so i cannot check but are you sure
>>>>> the actually writen numbers dont match up multiples of mp3 frames ?
>>>>> 
>>>> Yes, I have tested both with some old aa books I still had on disk, and some I
>>>> bought more recently. Using -fdebug ts, I can clearly see a packet size of
>>>> 104 alternating with 105, as I would expect from mp3. Furthermore, using my
>>>> specialized header detection on every chunk, I see that the actual mp3 frame
>>>> header indeed starts at offset 0 in chunk 0. But then it changes for every
>>>> subsequent chunk, I have seen a maximum of 103, as to be expected.
>>> 
>>> Is there a public "aa" file with mp3 ?
>>> Id like to look at this myself
>>> also we should add such a file to fatesamples and add some tests with it
>>> 
>> Sadly no, at least not legally. aa files are always personalized to a user account.
>> Audible offers some free of charge titles, you can get them without a subscription,
>> if you sign up / use an Amazon account. To get the aa version, choose
>> "normal quality"; otherwise you get the successor format aax, which is mp4.
>> 
>>> 
>>> 
>>>> 
>>>>> If not the question that remains is how does the official code seek in this?
>>>>> It seems a bit odd that they would get the design of the index wrong but then
>>>>> implement demuxer side searching for a mp3 header not noticing that this is
>>>>> not working fully reliable.
>>>>> [This would imply that while they implemented the demuxer they would have
>>>>> been aware that their index doesnt work, why would they not fix it at this
>>>>> point ...]
>>>>> Its hard to imagine they would not notice this bug with the index at all,
>>>>> not that this is impossible or anything
>>>>> 
>>>> 
>>>> Agreed. Here is how I think this is actually supposed to work: Every encrypted
>>>> chunk contains roughly one second of audio. So, if you want to seek to a
>>>> specific second, you can just look up an index entry. This tells you the chapter
>>>> number and the offset in the chapter. Now, if you already know the chapter
>>>> offsets, you can jump right to the correct position in the audio section. (I assume
>>>> the chapter offsets are somewhere else in the aa format, possibly encrypted.)
>>>> Regarding the odd mp3 frame offsets, a more restricted header detection will
>>>> take care of that, as I do in patch 3/3.
>>> 
>>> The more restricted header detection will fail less often. But it would be
>>> unexpected that it never fails. Theres not really anything that prevents
>>> these specific sequence of bits to occur in places other than the header
>>> 
>>> All this makes it a bit hard to belive for me that this is how the format is
>>> intended to work.
>>> 
>> As I said, I think mp3 in aa is more a retrofit than intentional design. It works if
>> they always use the exact same encoding (as they do from my observations).
>> Then they can used an *exact* header match to sync, and the chance of the
>> same 32 bit pattern occurring unintentionally in audio content is highly unlikely.
> 
> the code looks like it checks 21 bits not 32.
> in random data this would be expected to occur once in 2mb. That does not seem
> highly unlikely to me
> 
Sure, I did not implement a strict check at that time because I was not sure what
kinds of encoding I should expect. After further testing, I think it could be fixed to
everything but the padding bit, i.e. 31 bit check. Anyway, I will not be doing that
anymore, but leave it to the regular mpeg header parser.

(Actually I think there is a way to improve libavcodec/mpegaudiodecheader.c
by explicitly checking for the reserved pattern of the MPEG version. It will avoid
at least some of the false positives I observed. But that will be a separate patch.)

> also if you are sure it can never be vbr and its always the hardcoded format
> in the check then you can do even better probably.
> As you can calculate where each packet starts and allowing for +-1 rounding
> in the packet position it should be possible to find this without checking more
> then a few positions around the exact location based on CBR
> 
Great idea! By calculating the frame start offset, I don't need to scan for headers
anymore, but can predict where it is likely to be. The mp3 parser can do the rest.
Much cleaner solution from an architecture point of view.

I did some testing, most books don't use frame padding at all, so this frame offset
prediction is spot on! I did find one book that has padding, but it also has some
extra data at the beginning of of its main chapter (ID3 tag?). This throws off the
calculation by at most one frame, i.e. 104 bytes, which is no worse than without
offset prediction. I assume it is a rare case anyway, so I am inclined to go with
this solution. Will prepare a new patch set soon.

> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> No snowflake in an avalanche ever feels responsible. -- Voltaire
> 
> 



More information about the ffmpeg-devel mailing list