[FFmpeg-devel] [PATCH] avcodec/aac_tablegen: speed up table initialization

Ganesh Ajjanagadde gajjanagadde at gmail.com
Fri Nov 27 12:42:21 CET 2015


On Fri, Nov 27, 2015 at 5:35 AM, Rostislav Pehlivanov
<atomnuker at gmail.com> wrote:
> LGTM, but could you leave (just comment it out) the old code in there
> so it's a little easier to follow?
>>         //ff_aac_pow2sf_tab[i] = pow(2, (i - POW_SF2_ZERO) / 4.0);
>>         //ff_aac_pow34sf_tab[i] = pow(ff_aac_pow2sf_tab[i], 3.0/4.0);
>
> The accuracy increase is always nice.

Done and pushed. Thanks.
BTW, do you or others think that the new performance figures are
sufficient to justify getting rid of CONFIG_HARDCODED_TABLES, the
associated #ifdef clutter, and the table-generator C file here?

>
> On Thu, 2015-11-26 at 16:31 -0500, Ganesh Ajjanagadde wrote:
>> This speeds up aac_tablegen to a ludicrous degree (~97% fewer cycles),
>> i.e. to the point where it can be argued that runtime initialization
>> can always be done instead of using hard-coded tables. The only cost
>> is essentially a trivial increase in the stack size (a 16-entry double
>> lookup table, i.e. 128 bytes).
>>
>> Even if one does not care about this, the patch also improves
>> accuracy, as detailed below.
>>
>> Performance:
>> Benchmark obtained by looping 10^4 times over ff_aac_tableinit.
>>
>> Sample benchmark (x86-64, Haswell, GNU/Linux):
>> old:
>> 1295292 decicycles in ff_aac_tableinit,     512 runs,      0 skips
>> 1275981 decicycles in ff_aac_tableinit,    1024 runs,      0 skips
>> 1272932 decicycles in ff_aac_tableinit,    2048 runs,      0 skips
>> 1262164 decicycles in ff_aac_tableinit,    4096 runs,      0 skips
>> 1256720 decicycles in ff_aac_tableinit,    8192 runs,      0 skips
>>
>> new:
>> 25691 decicycles in ff_aac_tableinit,     505 runs,      7 skips
>> 25130 decicycles in ff_aac_tableinit,    1016 runs,      8 skips
>> 25973 decicycles in ff_aac_tableinit,    2036 runs,     12 skips
>> 25911 decicycles in ff_aac_tableinit,    4078 runs,     18 skips
>> 25816 decicycles in ff_aac_tableinit,    8154 runs,     38 skips
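Taking the last row of each run above (1256720 vs 25816 decicycles), the relative cost works out as follows; this only re-derives the "~97%" figure from the numbers in this message:

```latex
\frac{25816}{1256720} \approx 0.0205,
\qquad 1 - 0.0205 \approx 0.979 \;(\text{about } 98\%\text{ fewer decicycles}),
\qquad \frac{1256720}{25816} \approx 48.7
```

In other words, on this machine the new initialization is roughly 49 times faster.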
>>
>> Accuracy:
>> The previous code was resulting in needless loss of
>> accuracy due to the pow being called in succession. As an
>> illustration
>> of this:
>> ff_aac_pow34sf_tab[3]
>> old : 0.000000000007598092294225
>> new : 0.000000000007598091426864
>> real: 0.000000000007598091778545
>>
>> rounded to float
>> old : 0.000000000007598092294225
>> new : 0.000000000007598091426864
>> real: 0.000000000007598091426864
>>
>> showing that the old value was not correctly rounded (it is one float
>> ULP, 2^-60 at this magnitude, too large). This affects a large number
>> of elements of the array.
>>
>> Patch tested with FATE.
>>
>> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>> ---
>>  libavcodec/aac_tablegen.h | 38 ++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 36 insertions(+), 2 deletions(-)
>>
>> diff --git a/libavcodec/aac_tablegen.h b/libavcodec/aac_tablegen.h
>> index 8b223f9..255723b 100644
>> --- a/libavcodec/aac_tablegen.h
>> +++ b/libavcodec/aac_tablegen.h
>> @@ -35,9 +35,43 @@ float ff_aac_pow34sf_tab[428];
>>  av_cold void ff_aac_tableinit(void)
>>  {
>>      int i;
>> +
>> +    /* 2^(i/16) for 0 <= i <= 15 */
>> +    const double exp2_lut[] = {
>> +        1.00000000000000000000,
>> +        1.04427378242741384032,
>> +        1.09050773266525765921,
>> +        1.13878863475669165370,
>> +        1.18920711500272106672,
>> +        1.24185781207348404859,
>> +        1.29683955465100966593,
>> +        1.35425554693689272830,
>> +        1.41421356237309504880,
>> +        1.47682614593949931139,
>> +        1.54221082540794082361,
>> +        1.61049033194925430818,
>> +        1.68179283050742908606,
>> +        1.75625216037329948311,
>> +        1.83400808640934246349,
>> +        1.91520656139714729387,
>> +    };
>> +    double t1 = 8.8817841970012523233890533447265625e-16; // 2^(-50)
>> +    double t2 = 3.63797880709171295166015625e-12; // 2^(-38)
>> +    int t1_inc_cur, t2_inc_cur;
>> +    int t1_inc_prev = 0;
>> +    int t2_inc_prev = 8;
>> +
>>      for (i = 0; i < 428; i++) {
>> -        ff_aac_pow2sf_tab[i] = pow(2, (i - POW_SF2_ZERO) / 4.0);
>> -        ff_aac_pow34sf_tab[i] = pow(ff_aac_pow2sf_tab[i], 3.0/4.0);
>> +        t1_inc_cur = 4 * (i % 4);
>> +        t2_inc_cur = (8 + 3*i) % 16;
>> +        if (t1_inc_cur < t1_inc_prev)
>> +            t1 *= 2;
>> +        if (t2_inc_cur < t2_inc_prev)
>> +            t2 *= 2;
>> +        ff_aac_pow2sf_tab[i] = t1 * exp2_lut[t1_inc_cur];
>> +        ff_aac_pow34sf_tab[i] = t2 * exp2_lut[t2_inc_cur];
>> +        t1_inc_prev = t1_inc_cur;
>> +        t2_inc_prev = t2_inc_cur;
>>      }
>>  }
>>  #endif /* CONFIG_HARDCODED_TABLES */

