[FFmpeg-devel] AAC-Main (round 2)

Robert Swain robert.swain
Thu Nov 20 02:19:00 CET 2008


2008/11/19 Alex Converse <alex.converse at gmail.com>:
> On Thu, Nov 13, 2008 at 8:04 PM, Alex Converse <alex.converse at gmail.com> wrote:
>> On Mon, Nov 10, 2008 at 4:32 PM, Alex Converse <alex.converse at gmail.com> wrote:
>>> On Sat, Nov 8, 2008 at 2:03 PM, Alex Converse <alex.converse at gmail.com>
>>> wrote:
>>>>
>>>> On Sat, Nov 8, 2008 at 12:48 PM, Michael Niedermayer <michaelni at gmx.at>
>>>> wrote:
>>>>>
>>>>> On Fri, Nov 07, 2008 at 07:20:45PM -0500, Alex Converse wrote:
>>>>> > Hi,
>>>>> >
>>>>> > Attached are a series of patches that implements AAC Main in FFmpeg.
>>>>> > These
>>>>> > are very similar to the first round. The only major change was avoiding
>>>>> > unnecessary type punning.
>>>>> >
>>>>> > While AAC-Main is rarely used, faad2 supports it, flash claims to
>>>>> > support it
>>>>> > (I haven't tested this), and we claim to support it but do not.
>>>>> >
>>>>> > Notes:
>>>>> > 1) Frequency domain prediction is described only in ISO/IEC 13818-7 not
>>>>> > in
>>>>> > 14496-3.
>>>>>
>>>>> > 2) The prediction operation uses 16-bit floats, using 32-bit floats
>>>>> > does not
>>>>> > give adequate accuracy so emulation routines to round to 16-bit are
>>>>> > included.
>>>>>
>>>>> > 3) As only 16-bit floats are required it could be possible to store the
>>>>> > prediction state with half the memory but I'm not sure how to approach
>>>>> > that
>>>>> > situation without resorting to IEEE type punning.
>>>>> >
>>>>> > Regards,
>>>>> >
>>>>> > Alex Converse
>>>>>
>>>>> [...]
>>>>> > +static void reset_predictor_group(PredictorState * ps, int group_num)
>>>>> > {
>>>>> > +    int i;
>>>>> > +    if (group_num)
>>>>> > +        for (i = group_num-1; i < MAX_PREDICTORS; i+=30)
>>>>> > +            reset_predict_state(&ps[i]);
>>>>> > +}
>>>>>
>>>>> i think it would be clearer if the if() was moved out of the function
>>>>
>>>> ok
>>>>
>>>>>
>>>>> [...]
>>>>> > @@ -786,6 +837,95 @@ static int decode_spectrum_and_dequant(AACContext
>>>>> > * ac, float coef[1024], GetBit
>>>>> >      return 0;
>>>>> >  }
>>>>> >
>>>>> > +static av_always_inline float flt16_round(float pf) {
>>>>> > +    int exp;
>>>>> > +    pf = frexp(pf, &exp);
>>>>> > +    pf = ldexp(roundf(ldexp(pf, 8)), exp-8);
>>>>> > +    return pf;
>>>>> > +}
>>>>> > +
>>>>> > +static av_always_inline float flt16_even(float pf) {
>>>>> > +    int exp;
>>>>> > +    pf = frexpf(pf, &exp);
>>>>> > +    pf = ldexp(rintf(ldexp(pf, 8)), exp-8);
>>>>> > +    return pf;
>>>>> > +}
>>>>> > +
>>>>> > +static av_always_inline float flt16_trunc(float pf) {
>>>>> > +    int exp;
>>>>> > +    pf = frexpf(pf, &exp);
>>>>> > +    pf = ldexp(truncf(ldexp(pf, 8)), exp-8);
>>>>> > +    return pf;
>>>>> > +}
>>>>> > +
>>>>>
>>>>> are these faster or slower than the code suggested in the spec?
>>>>
>>>> The code suggested in the spec uses type puns for flt16_round and
>>>> flt16_trunc.
>>>>
>>>> flt16_even as suggested in the spec casts back and forth between float,
>>>> double, and int, uses floating point multiplies and divides rather than
>>>> ldexpf (which can take advantage of IEEE-754 on sane platforms), and uses
>>>> two ifs. I haven't benchmarked it, but I'm pretty sure mine is faster.
>>>>>
>>>>>
>>>>> > +static void predict(AACContext * ac, PredictorState * ps, float* coef,
>>>>> > int output_enable) {
>>>>>
>>>>> > +    const float a = 0.953125;
>>>>>
>>>>> 61.0/64 (could be in a comment too if the literal float is prefered ...)
>>>>>
>>>>>
>>>>> > +    const float alpha = 0.90625;
>>>>>
>>>>> 29.0/32
>>>>
>>>>
>>>> ok, do you have any idea why those values were chosen? I suppose I could
>>>> try to hunt down some of the original literature.
>>>>>
>>>>>
>>>>> > +
>>>>> > +    float e0, e1;
>>>>> > +    float pv;
>>>>> > +
>>>>> > +    float k1, k2;
>>>>> > +
>>>>> > +    if (ps->var0 <= 1)
>>>>> > +        k1 = 0;
>>>>> > +    else
>>>>> > +        k1 = ps->cor0*flt16_even(a/ps->var0);
>>>>> > +
>>>>> > +    if (ps->var1 <= 1)
>>>>> > +        k2 = 0;
>>>>> > +    else
>>>>> > +        k2 = ps->cor1*flt16_even(a/ps->var1);
>>>>> > +
>>>>> > +    pv = k1*ps->r0 + k2*ps->r1;
>>>>> > +    pv = flt16_round(pv);
>>>>> > +    if (output_enable)
>>>>> > +        *coef += pv/-1024;
>>>>> > +
>>>>> > +    e0 = *coef*-1024;
>>>
>>> This -1024 constant is no good. It's tied to MMX dsputil.
>>>
>>>>>
>>>>> > +    e1 = e0-k1*ps->r0;
>>>>> > +
>>>>>
>>>>> > +    ps->cor1 = alpha*ps->cor1 + ps->r1*e1;
>>>>> > +    ps->var1 = alpha*ps->var1 + 0.5 * (ps->r1*ps->r1 + e1*e1);
>>>>> > +    ps->cor0 = alpha*ps->cor0 + ps->r0*e0;
>>>>> > +    ps->var0 = alpha*ps->var0 + 0.5 * (ps->r0*ps->r0 + e0*e0);
>>>>> > +
>>>>> > +    ps->r1 = a*(ps->r0-k1*e0);
>>>>> > +    ps->r0 = a*e0;
>>>>> > +
>>>>>
>>>>> > +    ps->r0   = flt16_trunc(ps->r0);
>>>>> > +    ps->r1   = flt16_trunc(ps->r1);
>>>>> > +    ps->cor0 = flt16_trunc(ps->cor0);
>>>>> > +    ps->cor1 = flt16_trunc(ps->cor1);
>>>>> > +    ps->var0 = flt16_trunc(ps->var0);
>>>>> > +    ps->var1 = flt16_trunc(ps->var1);
>>>>>
>>>>> the flt16_trunc() calls could be done when the values are calculated
>>>>
>>>> I had left that there in case the consensus was that we wanted to store
>>>> thes
>>>
>>> In addition to the changes discussed, I also fixed the non-MMX dsputil
>>> scaling
>>>
>>
>> In light of suggestions from the floating point help thread, I have
>> yet another new version that uses IEEE-754 puns on x86 for the
>> rounding functions. Callgrind shows these to be significantly faster.
>> I only enabled them on x86 because I have no means to test or
>> benchmark on any other platform.
>>
>
> I split these off into a separate patch based on Rob's suggestion.

Thanks. I promise I'll check these out tomorrow morning (UK time).

Regards,
Rob




More information about the ffmpeg-devel mailing list