[FFmpeg-devel] [PATCH] aacdec: Allow SBR after DRC.

Alex Converse alex.converse at gmail.com
Thu Dec 8 21:42:05 EET 2016

On Thu, Dec 8, 2016 at 2:14 AM, Rostislav Pehlivanov
<atomnuker at gmail.com> wrote:
> On 7 December 2016 at 01:08, Alex Converse <alex.converse at gmail.com> wrote:
>> Fixes https://www2.iis.fraunhofer.de/AAC/7.1auditionOutLeader_v2_rtb.mp4
>> Reported-by: rcombs on IRC
>> ---
>>  libavcodec/aacdec_template.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>> diff --git a/libavcodec/aacdec_template.c b/libavcodec/aacdec_template.c
>> index 8cfa34b..64d46e3 100644
>> --- a/libavcodec/aacdec_template.c
>> +++ b/libavcodec/aacdec_template.c
>> @@ -3038,8 +3038,10 @@ static int aac_decode_frame_int(AVCodecContext
>> *avctx, void *data,
>>              break;
>>          }
>> -        che_prev       = che;
>> -        elem_type_prev = elem_type;
>> +        if (elem_type < TYPE_DSE) {
>> +            che_prev       = che;
>> +            elem_type_prev = elem_type;
>> +        }
>>          if (err)
>>              goto fail;
> I'm not quite following. So it prevents TYPE_DSE and above from getting
> into che_prev which goes into decode_extension_payload() which then decodes
> extensions. But DSE isn't the last in the enum, what about PCE, FIL and END
> elements?

The actual use of elem_type_prev is to describe the element type of
che_prev, but che only updates for element types SCE, CPE, CCE, and
LFE (0, 1, 2, and 3). (che_prev is currently updated every element,
but if che isn't updated on an element then it's a noop.)

>        if (elem_type < TYPE_DSE) {
>            if (!(che=get_che(ac, elem_type, elem_id))) {
>                av_log(ac->avctx, AV_LOG_ERROR, "channel element %d.%d is not allocated\n",
>                       elem_type, elem_id);
>                err = AVERROR_INVALIDDATA;
>                goto fail;
>            }
>            samples = 1024;
>            che->present = 1;
>        }

Allowing elem_type_prev to desynchronize from che_prev means that the
SBR decoder is provided a che but incorrectly informed of the type of
element the che represents.

For example, the frames of the stream look like this:
[aac @ 0x25659c0] Elem type:0 id:0  // SCE
[aac @ 0x25659c0] Elem type:6 id:3  // FIL
[aac @ 0x25659c0] extension type: 11 len:3  // FIL payload DRC
[aac @ 0x25659c0] Elem type:6 id:9  // FIL
[aac @ 0x25659c0] extension type: 13 len:9  // FIL payload SBR
[aac @ 0x25659c0] Invalid bitstream - cannot apply SBR to element type 6
[aac @ 0x25659c0] Elem type:1 id:0  // CPE
[aac @ 0x25659c0] Elem type:6 id:f  // FIL
[aac @ 0x25659c0] extension type: 13 len:18  // FIL payload SBR
[aac @ 0x25659c0] Elem type:1 id:1  // CPE
[aac @ 0x25659c0] Elem type:6 id:f  // FIL
[aac @ 0x25659c0] extension type: 13 len:18 // FIL payload SBR
[aac @ 0x25659c0] Elem type:1 id:2  // CPE
[aac @ 0x25659c0] Elem type:6 id:f  // FIL
[aac @ 0x25659c0] extension type: 13 len:18  // FIL payload SBR
[aac @ 0x25659c0] Elem type:3 id:0  // LFE
[aac @ 0x25659c0] Elem type:4 id:0  // DSE
[aac @ 0x25659c0] Elem type:7  // END
[aac @ 0x25659c0] element type mismatch 3 != 0
[aac @ 0x25659c0] element type mismatch 0 != 6

In this case we clearly want to apply the first SBR element to the
first SCE element. But the DRC element between them means that we are
going to attempt to apply it the the first SCE element, but we are
going to tell the SBR decoder that the SCE element is a FIL element
which is nonsense.

Assuming we want to support streams like this (generated by
"Fraunhofer IIS MPEG-4 Audio Encoder 03.02.15_MPEGScbr_hdaac") the
che_prev should stay live across FIL elements.

The 2009 spec says:
> SBR extension payload for the audio object
> types AAC main, AAC SSR, AAC LC and AAC LTP
> One SBR fill element is used per AAC syntactic element
> that is to be enhanced by SBR. SBR elements are
> inserted into the raw_data_block() after the
> corresponding AAC elements. Each AAC SCE, CPE or
> independently switched CCE must be succeeded by a
> corresponding SBR element. LFE elements are decoded
> according to standard AAC procedures but must be delay
> adjusted and re-sampled to match the output sample
> rate. Given below is an example of the structure of
> syntactic elements within a raw data block of a 5.1
> (multichannel) configuration, where SBR is used
> without a CRC check.
> <SCE> <FIL <EXT_SBR_DATA(SCE)>> // center
> <CPE> <FIL <EXT_SBR_DATA(CPE)>> // front L/R
> <CPE> <FIL <EXT_SBR_DATA(CPE)>> // back L/R
> <LFE> // sub
> <END> // (end of raw data block)

If someone really wanted to be pedantic about it, they could argue
that the SBR data has to follow the SCE, CPE, or CCE, but it doesn't
have to immediately follow the element. I think this is an awkward
interpretation of the text, but the text doesn't seem to otherwise
address having a different FIL element between the SCE, CPE, or CCE
and the FIL-SBR.

I could go either way on allowing DSEs and PCEs between SCE/CPE/CCE
and SBR but am not particularly motivated to add extra logic to forbid
it when the spec is ambiguous.

Regardless of whether or not we allow the DSEs and PCEs between
SCE/CPE/CCEs and their SBR elements, we should keep elem_type_prev in
sync with che_prev. If this means renaming (perhaps to che_type_prev)
it so be it. I had previously suggested that maybe it would be better
to make the element type a member of the che struct, but the
(elem_type, elem_id) pair is essentially the address of the che so
eveywhere else we already know the type of the che before we even get
its pointer. This seems to be the awkward exception.

More information about the ffmpeg-devel mailing list