[FFmpeg-devel] [PATCHv2] avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm
Michael Niedermayer
michael at niedermayer.cc
Sun Oct 11 05:45:05 CEST 2015
On Sat, Oct 10, 2015 at 09:58:47PM -0400, Ganesh Ajjanagadde wrote:
> This uses Stein's binary GCD algorithm:
> https://en.wikipedia.org/wiki/Binary_GCD_algorithm
> to get a roughly 4x speedup over Euclidean GCD on standard architectures
> with a compiler intrinsic for ctzll, and a roughly 2x speedup otherwise.
> At the moment, the compiler intrinsic is used on GCC and Clang due to
> its easy availability.
>
> Quick note regarding overflow: yes, subtractions on int64_t can overflow,
> but the llabs takes care of that. The llabs itself is also guaranteed to be
> safe, with no annoying INT64_MIN business: INT64_MIN, being a power of 2,
> is shifted down before being sent to llabs.
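
For readers following along, here is a minimal sketch of the approach described above. It is not the code from the patch. It assumes GCC or Clang for __builtin_ctzll, and the names sketch_abs64 and sketch_gcd_u64 are made up for illustration. Instead of shifting first and then calling llabs, this variant converts to unsigned magnitudes up front, which sidesteps the INT64_MIN case in a different way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: magnitude of x as an unsigned value.
 * Unsigned negation is well defined, so INT64_MIN maps to 2^63. */
static uint64_t sketch_abs64(int64_t x)
{
    return x < 0 ? 0 - (uint64_t)x : (uint64_t)x;
}

/* Stein's binary GCD on unsigned 64-bit values (illustrative only).
 * Assumes a GCC/Clang-style __builtin_ctzll, which is undefined for 0,
 * hence the early returns. */
static uint64_t sketch_gcd_u64(uint64_t u, uint64_t v)
{
    int zu, zv, k;

    if (u == 0)
        return v;
    if (v == 0)
        return u;

    zu = __builtin_ctzll(u);
    zv = __builtin_ctzll(v);
    k  = zu < zv ? zu : zv;   /* shared power of two in the result */

    u >>= zu;                 /* both operands are odd from here on */
    v >>= zv;

    while (u != v) {
        if (u > v) {          /* keep u <= v so v - u cannot wrap */
            uint64_t t = u;
            u = v;
            v = t;
        }
        v -= u;                    /* odd - odd is even and nonzero */
        v >>= __builtin_ctzll(v);  /* strip factors of two again */
    }
    return u << k;
}
```

A signed caller would use sketch_gcd_u64(sketch_abs64(a), sketch_abs64(b)). Note that the result can be 2^63, which does not fit in int64_t.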
>
> The binary GCD needs ff_ctzll, an extension of ff_ctz for long long
> (int64_t). On GCC, this is provided by a built-in. On Microsoft compilers,
> there is a BitScanForward64 analog of BitScanForward that should work, but
> I can't confirm this. Apparently it is not available on 32-bit builds, so
> it may or may not work correctly. On the Intel compiler, the documentation
> lists only an intrinsic for _bit_scan_forward. People have posted on forums
> regarding _bit_scan_forward64, but Intel's documentation is often woeful.
> Again, I don't have that compiler, so I can't test.
>
> As such, to be safe, for now only the GCC/Clang intrinsic is added; the rest
> use a compiled version based on the de Bruijn method of Leiserson et al.:
> http://supertech.csail.mit.edu/papers/debruijn.pdf
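
As a rough illustration of the de Bruijn method mentioned above, here is a sketch of a portable 64-bit count-trailing-zeros. It is not the fallback from the patch. The names sketch_ctz64 and sketch_ctz64_init are hypothetical, and the lookup table is built at run time from one well-known 64-bit de Bruijn constant rather than hard-coded, so the constant and the table cannot disagree.

```c
#include <assert.h>
#include <stdint.h>

/* One known 64-bit de Bruijn sequence B(2,6): every 6-bit window of it
 * (taken from the top after a left shift) is distinct. */
#define SKETCH_DEBRUIJN64 0x03f79d71b4cb0a89ULL

static uint8_t sketch_ctz_table[64];

/* Fill the table: multiplying by 2^i shifts the constant left by i, so the
 * top 6 bits identify i uniquely. Call once before sketch_ctz64(). */
static void sketch_ctz64_init(void)
{
    int i;
    for (i = 0; i < 64; i++)
        sketch_ctz_table[((1ULL << i) * SKETCH_DEBRUIJN64) >> 58] = (uint8_t)i;
}

/* Number of trailing zero bits in v. The result for v == 0 is not
 * meaningful, matching the usual contract of ctz intrinsics. */
static int sketch_ctz64(uint64_t v)
{
    uint64_t lowest = v & (0 - v);   /* isolate the lowest set bit */
    return sketch_ctz_table[(lowest * SKETCH_DEBRUIJN64) >> 58];
}
```

The cost is one AND, one negate, one multiply, one shift and one table load, with no loop or branch, which is why it makes a reasonable fallback where no intrinsic is available.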
>
> Tested with FATE. Sample benchmark (x86-64, GCC 5.2.0, Haswell) with
> START_TIMER and STOP_TIMER in libavutil/rational.c, followed by a
> make fate:
>
> aac-am00_88.err:
> builtin:
> 714 decicycles in av_gcd, 4095 runs, 1 skips
>
> de-bruijn:
> 1440 decicycles in av_gcd, 4096 runs, 0 skips
>
> previous:
> 2889 decicycles in av_gcd, 4096 runs, 0 skips
>
> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
> ---
> libavutil/intmath.h | 19 +++++++++++++++++++
> libavutil/mathematics.c | 26 +++++++++++++++++++++-----
> 2 files changed, 40 insertions(+), 5 deletions(-)
applied
thanks
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
It is dangerous to be right in matters on which the established authorities
are wrong. -- Voltaire