[FFmpeg-devel] [PATCHv2] avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm

Ganesh Ajjanagadde gajjanagadde at gmail.com
Tue Oct 20 13:54:39 CEST 2015


On Tue, Oct 20, 2015 at 12:52 AM, James Almer <jamrial at gmail.com> wrote:
> On 10/11/2015 12:45 AM, Michael Niedermayer wrote:
>> On Sat, Oct 10, 2015 at 09:58:47PM -0400, Ganesh Ajjanagadde wrote:
>>> This uses Stein's binary GCD algorithm:
>>> https://en.wikipedia.org/wiki/Binary_GCD_algorithm
>>> to get a roughly 4x speedup over Euclidean GCD on standard architectures
>>> with a compiler intrinsic for ctzll, and a roughly 2x speedup otherwise.
>>> At the moment, the compiler intrinsic is used on GCC and Clang due to
>>> its easy availability.
>>>
>>> Quick note regarding overflow: yes, subtractions on int64_t can, but the
>>> llabs takes care of that. The llabs is also guaranteed to be safe, with
>>> no annoying INT64_MIN business since INT64_MIN being a power of 2, is
>>> shifted down before being sent to llabs.
>>>
>>> The binary GCD needs ff_ctzll, an extension of ff_ctz for long long (int64_t). On
>>> GCC, this is provided by a built-in. On Microsoft, there is a
>>> BitScanForward64 analog of BitScanForward that should work; but I can't confirm.
>>> Apparently it is not available on 32 bit builds; so this may or may not
>>> work correctly. On Intel, per the documentation there is only an
>>> intrinsic for _bit_scan_forward and people have posted on forums
>>> regarding _bit_scan_forward64, but often their documentation is
>>> woeful. Again, I don't have it, so I can't test.
>>>
>>> As such, to be safe, for now only the GCC/Clang intrinsic is added, the rest
>>> use a compiled version based on the De-Bruijn method of Leiserson et al:
>>> http://supertech.csail.mit.edu/papers/debruijn.pdf.
>>>
>>> Tested with FATE, sample benchmark (x86-64, GCC 5.2.0, Haswell)
>>> with a START_TIMER and STOP_TIMER in libavutil/rationsl.c, followed by a
>>> make fate.
>>>
>>> aac-am00_88.err:
>>> builtin:
>>> 714 decicycles in av_gcd,    4095 runs,      1 skips
>>>
>>> de-bruijn:
>>> 1440 decicycles in av_gcd,    4096 runs,      0 skips
>>>
>>> previous:
>>> 2889 decicycles in av_gcd,    4096 runs,      0 skips
>>>
>>> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>>> ---
>>>  libavutil/intmath.h     | 19 +++++++++++++++++++
>>>  libavutil/mathematics.c | 26 +++++++++++++++++++++-----
>>>  2 files changed, 40 insertions(+), 5 deletions(-)
>>
>> applied
>>
>> thanks
>
> This broke fate with -ftrapv
> http://fate.ffmpeg.org/history.cgi?slot=x86_64-freebsd10-clang33-ftrapv
> http://fate.ffmpeg.org/history.cgi?slot=x86_64-openbsd5.6-gcc4.2-ftrapv
> I can reproduce this with GCC 5 on x86_64 ArchLinux as well.
>
> The problem seems to be in av_gcd() and not the De-Bruijn version of
> ff_ctzll since the gcc builtin is being used.

Don't have the time to investigate right now - revert for now unless
someone can fix it quickly.

>


More information about the ffmpeg-devel mailing list