[FFmpeg-devel] [PATCHv2] lavc/cbrt_tablegen: speed up tablegen

Tue Jan 5 17:30:57 CET 2016

On Tue, Jan 5, 2016 at 8:08 AM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
> On Tue, Jan 5, 2016 at 7:44 AM, Daniel Serpell <dserpell at gmail.com> wrote:
>> Hi!,
>>
>> El Mon, Jan 04, 2016 at 06:33:59PM -0800, Ganesh Ajjanagadde escribio:
>>> This exploits an approach based on the sieve of Eratosthenes, a popular
>>> method for generating prime numbers.
>>>
>>> Tables are identical to previous ones.
>>>
>>> Tested with FATE with/without --enable-hardcoded-tables.
>>>
>>> Sample benchmark (Haswell, GNU/Linux+gcc):
>>> prev:
>>> 7860100 decicycles in cbrt_tableinit,       1 runs,      0 skips
>>> 7777490 decicycles in cbrt_tableinit,       2 runs,      0 skips
>>> [...]
>>> 7582339 decicycles in cbrt_tableinit,     256 runs,      0 skips
>>> 7563556 decicycles in cbrt_tableinit,     512 runs,      0 skips
>>>
>>> new:
>>> 2099480 decicycles in cbrt_tableinit,       1 runs,      0 skips
>>> 2044470 decicycles in cbrt_tableinit,       2 runs,      0 skips
>>> [...]
>>> 1796544 decicycles in cbrt_tableinit,     256 runs,      0 skips
>>> 1791631 decicycles in cbrt_tableinit,     512 runs,      0 skips
>>>
>>
>> See attached code, function "test1", based on an approximation of:
>>
>>   (i+1)^(1/3) ~= i^(1/3) * ( 1 + 1/(3i) - 1/(9i) + 5/(81i) - .... )
>
> I assume 1/(3i), 1/(9i^2), etc obtained via a Taylor series at x = 0.
>
>>
>> Generated values are the same as original floats (max error in double
>> is < 4*10^-10), it is faster (and I think, simpler) than your version.
>
> Had thought of these ideas, but did not examine as I was a little
> concerned about accuracy. Thanks, will give it a spin. Or
> alternatively, you can submit a patch since you put it into action.
>
> Alternatively, one could directly expand the series for (i+1)^(4/3).
> And it may be possible to tighten the number of terms needed by
> expanding not about x = 0, but x = i to get i+1.

Scratch the x = 0 remark, I misread the code. Remainder still applies.

> Or fancier polynomial
> approximations can be used. Have you tried these?
>
>>
>> Perhaps altering the constants it could be made faster still, but it is
>> currently dominated by de division in the main loop.
>
> Thanks for letting me know.
>
>>
>>     Daniel.
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel at ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>