[FFmpeg-devel] [PATCH] AAC decoder

Michael Niedermayer michaelni
Fri May 23 19:24:59 CEST 2008


On Fri, May 23, 2008 at 01:59:41PM +0100, Robert Swain wrote:
> 2008/5/23 Robert Swain <robert.swain at gmail.com>:
> > 2008/5/23 Robert Swain <robert.swain at gmail.com>:
> >> 2008/5/23 Robert Swain <robert.swain at gmail.com>:
> >>> 2008/4/2 Michael Niedermayer <michaelni at gmx.at>:
> >>>> On Tue, Apr 01, 2008 at 04:56:48PM +0200, Andreas Öman wrote:
> >>>>> Andreas Öman wrote:
> >>>
> >>> [...]
> >>>
> >>>>> +static inline float ivquant(AACContext * ac, int a) {
> >>>>> +    static const float sign[2] = { -1., 1. };
> >>>>> +    int tmp = (a>>31);
> >>>>> +    int abs_a = (a^tmp)-tmp;
> >>>>> +    if (abs_a < sizeof(ac->ivquant_tab)/sizeof(ac->ivquant_tab[0]))
> >>>>> +        return sign[tmp+1] * ac->ivquant_tab[abs_a];
> >>>>
> >>>> What is the point of the sign splitout? It seems that it would be simpler
> >>>> to have that in the table as well.
> >>>
> >>> Kostya is in favour of removing the ivquant_tab table because it
> >>> caches only a small number of possible values and its general impact
> >>> on decoding speedup is not obvious.
> >>>
> >>> Attached is a patch that removes the ivquant_tab table and simplifies
> >>> and moves the ivquant() functionality into the calling loop and
> >>> removes the ivquant() function altogether as it isn't really needed to
> >>> wrap pow().
> >>
> >> Oops! sign * pow(abs(a), 4./3) != pow(a, 4./3) . Fixed patch attached
> >> with bit magic returned.
> >>
> >> I'll do some benchmarks too, just for good measure.
> >
> > Actually, I won't. abs(a) is normally <16 but can be up to 4351 (if I
> > understood the escape sequence decoding).
> >
> > http://article.gmane.org/gmane.comp.video.ffmpeg.soc/2002/match=ivquant
> >
> > Andreas just did another benchmark and this has a large impact on decoding time.
> >
> > I'll look at merging the sign into the table as originally suggested.
> > I suspect the table size can be reduced to 16 (vs 256) with little to
> > no impact on speed. Ignore the patch for the moment. Sorry for the
> > noise.
> 
> Well, I've done it but I'm not really convinced by the results. See
> attached patch.
> 
> I tested on an FAAC-encoded South Park episode:
> 
> new size 64
> 
> 13310 dezicycles in ivquant, 1 runs, 0 skips
> 7975 dezicycles in ivquant, 2 runs, 0 skips
> 4867 dezicycles in ivquant, 4 runs, 0 skips
> 3286 dezicycles in ivquant, 8 runs, 0 skips
> 2674 dezicycles in ivquant, 16 runs, 0 skips
> 3172 dezicycles in ivquant, 32 runs, 0 skips
> 2956 dezicycles in ivquant, 64 runs, 0 skips
> 2860 dezicycles in ivquant, 128 runs, 0 skips
> 2856 dezicycles in ivquant, 256 runs, 0 skips
> 2890 dezicycles in ivquant, 511 runs, 1 skips
> 2871 dezicycles in ivquant, 1023 runs, 1 skips
> 2946 dezicycles in ivquant, 2046 runs, 2 skips
> 3094 dezicycles in ivquant, 4094 runs, 2 skips
> 2988 dezicycles in ivquant, 8188 runs, 4 skips
> 3377 dezicycles in ivquant, 16379 runs, 5 skips
> 3652 dezicycles in ivquant, 32758 runs, 10 skips
> 3818 dezicycles in ivquant, 65522 runs, 14 skips
> 3982 dezicycles in ivquant, 131052 runs, 20 skips
> 4203 dezicycles in ivquant, 262107 runs, 37 skips
> 4215 dezicycles in ivquant, 524209 runs, 79 skips
> 4191 dezicycles in ivquant, 1048410 runs, 166 skips
> 4190 dezicycles in ivquant, 2096828 runs, 324 skips
> 
> new size 128
> 
> 7700 dezicycles in ivquant, 1 runs, 0 skips
> 5115 dezicycles in ivquant, 2 runs, 0 skips
> 3437 dezicycles in ivquant, 4 runs, 0 skips
> 2571 dezicycles in ivquant, 8 runs, 0 skips
> 2323 dezicycles in ivquant, 16 runs, 0 skips
> 2997 dezicycles in ivquant, 32 runs, 0 skips
> 2866 dezicycles in ivquant, 64 runs, 0 skips
> 2818 dezicycles in ivquant, 128 runs, 0 skips
> 2832 dezicycles in ivquant, 256 runs, 0 skips
> 2875 dezicycles in ivquant, 511 runs, 1 skips
> 2866 dezicycles in ivquant, 1023 runs, 1 skips
> 2859 dezicycles in ivquant, 2047 runs, 1 skips
> 2856 dezicycles in ivquant, 4095 runs, 1 skips
> 2869 dezicycles in ivquant, 8189 runs, 3 skips
> 2942 dezicycles in ivquant, 16379 runs, 5 skips
> 3436 dezicycles in ivquant, 32755 runs, 13 skips
> 3704 dezicycles in ivquant, 65520 runs, 16 skips
> 3925 dezicycles in ivquant, 131047 runs, 25 skips
> 4127 dezicycles in ivquant, 262090 runs, 54 skips
> 4181 dezicycles in ivquant, 524199 runs, 89 skips
> 4168 dezicycles in ivquant, 1048415 runs, 161 skips
> 4179 dezicycles in ivquant, 2096843 runs, 309 skips
> 
> new size 256
> 
> 7480 dezicycles in ivquant, 1 runs, 0 skips
> 5005 dezicycles in ivquant, 2 runs, 0 skips
> 3327 dezicycles in ivquant, 4 runs, 0 skips
> 2530 dezicycles in ivquant, 8 runs, 0 skips
> 2303 dezicycles in ivquant, 16 runs, 0 skips
> 2983 dezicycles in ivquant, 32 runs, 0 skips
> 2858 dezicycles in ivquant, 64 runs, 0 skips
> 2803 dezicycles in ivquant, 128 runs, 0 skips
> 2826 dezicycles in ivquant, 256 runs, 0 skips
> 2871 dezicycles in ivquant, 512 runs, 0 skips
> 2862 dezicycles in ivquant, 1024 runs, 0 skips
> 2860 dezicycles in ivquant, 2048 runs, 0 skips
> 2856 dezicycles in ivquant, 4096 runs, 0 skips
> 2869 dezicycles in ivquant, 8192 runs, 0 skips
> 2944 dezicycles in ivquant, 16384 runs, 0 skips
> 3510 dezicycles in ivquant, 32765 runs, 3 skips
> 3786 dezicycles in ivquant, 65525 runs, 11 skips
> 3967 dezicycles in ivquant, 131053 runs, 19 skips
> 4149 dezicycles in ivquant, 262109 runs, 35 skips
> 4270 dezicycles in ivquant, 524224 runs, 64 skips
> 4224 dezicycles in ivquant, 1048488 runs, 88 skips
> 4213 dezicycles in ivquant, 2097039 runs, 113 skips
> 
> old size 256
> 
> 5500 dezicycles in ivquant, 1 runs, 0 skips
> 3850 dezicycles in ivquant, 2 runs, 0 skips
> 2805 dezicycles in ivquant, 4 runs, 0 skips
> 2282 dezicycles in ivquant, 8 runs, 0 skips
> 2179 dezicycles in ivquant, 16 runs, 0 skips
> 2839 dezicycles in ivquant, 32 runs, 0 skips
> 2731 dezicycles in ivquant, 64 runs, 0 skips
> 2688 dezicycles in ivquant, 128 runs, 0 skips
> 2712 dezicycles in ivquant, 256 runs, 0 skips
> 2753 dezicycles in ivquant, 512 runs, 0 skips
> 2744 dezicycles in ivquant, 1024 runs, 0 skips
> 2738 dezicycles in ivquant, 2048 runs, 0 skips
> 2734 dezicycles in ivquant, 4096 runs, 0 skips
> 2747 dezicycles in ivquant, 8191 runs, 1 skips
> 2814 dezicycles in ivquant, 16382 runs, 2 skips
> 3266 dezicycles in ivquant, 32763 runs, 5 skips
> 3512 dezicycles in ivquant, 65526 runs, 10 skips
> 3716 dezicycles in ivquant, 131054 runs, 18 skips
> 3912 dezicycles in ivquant, 262107 runs, 37 skips
> 3951 dezicycles in ivquant, 524196 runs, 92 skips
> 3940 dezicycles in ivquant, 1048421 runs, 155 skips
> 3950 dezicycles in ivquant, 2096877 runs, 275 skips
> 
> The new method with size 128 worked better than  on
> http://samples.mplayerhq.hu/A-codecs/AAC/ct_faac.mp4 . If you want me
> to test more, I can. If you have any suggestions for improvements,
> they're very welcome.
> 
> Rob

> Index: aac.c
> ===================================================================
> --- aac.c	(revision 2185)
> +++ aac.c	(working copy)
> @@ -366,7 +366,7 @@
>      DECLARE_ALIGNED_16(float, sine_short_128[128]);
>      DECLARE_ALIGNED_16(float, pow2sf_tab[256]);
>      DECLARE_ALIGNED_16(float, intensity_tab[256]);
> -    DECLARE_ALIGNED_16(float, ivquant_tab[256]);
> +    DECLARE_ALIGNED_16(float, ivquant_tab[128]);
>      MDCTContext mdct;
>      MDCTContext mdct_small;
>      MDCTContext *mdct_ltp;
> @@ -890,8 +890,11 @@
>      // BIAS method instead needs values -1<x<1
>      for (i = 0; i < 256; i++)
>          ac->intensity_tab[i] = pow(0.5, (i - 100) / 4.);
> -    for (i = 0; i < sizeof(ac->ivquant_tab)/sizeof(ac->ivquant_tab[0]); i++)
> -        ac->ivquant_tab[i] = pow(i, 4./3);
> +    for (i = 0; i < sizeof(ac->ivquant_tab)/(sizeof(ac->ivquant_tab[0])<<1); i++) {
> +        int idx = i<<1;
> +        ac->ivquant_tab[idx]     =  pow(i, 4./3);
> +        ac->ivquant_tab[idx + 1] = -ac->ivquant_tab[idx];
> +    }
>  
>      if(ac->dsp.float_to_int16 == ff_float_to_int16_c) {
>          ac->add_bias = 385.0f;

> @@ -1035,13 +1038,12 @@
>  }
>  
>  static inline float ivquant(AACContext * ac, int a) {

> -    static const float sign[2] = { -1., 1. };
>      int tmp = (a>>31);
>      int abs_a = (a^tmp)-tmp;
> -    if (abs_a < sizeof(ac->ivquant_tab)/sizeof(ac->ivquant_tab[0]))
> -        return sign[tmp+1] * ac->ivquant_tab[abs_a];
> +    if (abs_a < sizeof(ac->ivquant_tab)/(sizeof(ac->ivquant_tab[0])<<1))
> +        return ac->ivquant_tab[(abs_a<<1) + !!tmp];

ehh... this should be:

if(a + 127U < 255U)
    return ivquant_tab[a + 127U];

(or other constants depending on what table size is best ...)


>      else
> -        return sign[tmp+1] * pow(abs_a, 4./3);
> +        return (2 * tmp + 1) * pow(abs_a, 4./3);

pow(fabs(a), 1./3) * a;

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Answering whether a program halts or runs forever is
on a Turing machine, in general, impossible (Turing's halting problem).
On any real computer, always possible, as a real computer has a finite number
of states N, and will either halt in less than N cycles or never halt.