[FFmpeg-devel] [PATCHv2] avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm
Ganesh Ajjanagadde
gajjanag at mit.edu
Sun Oct 11 18:37:52 CEST 2015
On Sun, Oct 11, 2015 at 12:33 PM, wm4 <nfxjfg at googlemail.com> wrote:
> On Sun, 11 Oct 2015 09:59:39 -0400
> Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
>
>> On Sun, Oct 11, 2015 at 9:34 AM, wm4 <nfxjfg at googlemail.com> wrote:
>> > On Sat, 10 Oct 2015 21:58:47 -0400
>> > Ganesh Ajjanagadde <gajjanagadde at gmail.com> wrote:
>> >
>> >> This uses Stein's binary GCD algorithm:
>> >> https://en.wikipedia.org/wiki/Binary_GCD_algorithm
>> >> to get a roughly 4x speedup over Euclidean GCD on standard architectures
>> >> with a compiler intrinsic for ctzll, and a roughly 2x speedup otherwise.
>> >> At the moment, the compiler intrinsic is used on GCC and Clang due to
>> >> its easy availability.
>> >>
>> >> Quick note regarding overflow: yes, subtractions on int64_t can overflow,
>> >> but the llabs takes care of that. The llabs is also guaranteed to be safe,
>> >> with no annoying INT64_MIN business, since INT64_MIN, being a power of 2,
>> >> is shifted down before being sent to llabs.
>> >>
>> >> The binary GCD needs ff_ctzll, an extension of ff_ctz for long long (int64_t). On
>> >> GCC, this is provided by a built-in. On Microsoft, there is a
>> >> BitScanForward64 analog of BitScanForward that should work; but I can't confirm.
>> >> Apparently it is not available on 32 bit builds; so this may or may not
>> >> work correctly. On Intel, per the documentation there is only an
>> >> intrinsic for _bit_scan_forward and people have posted on forums
>> >> regarding _bit_scan_forward64, but often their documentation is
>> >> woeful. Again, I don't have it, so I can't test.
>> >>
>> >> As such, to be safe, for now only the GCC/Clang intrinsic is added; the rest
>> >> use a compiled version based on the De-Bruijn method of Leiserson et al:
>> >> http://supertech.csail.mit.edu/papers/debruijn.pdf.
>> >>
>> >> Tested with FATE, sample benchmark (x86-64, GCC 5.2.0, Haswell)
>> >> with a START_TIMER and STOP_TIMER in libavutil/rational.c, followed by a
>> >> make fate.
>> >>
>> >> aac-am00_88.err:
>> >> builtin:
>> >> 714 decicycles in av_gcd, 4095 runs, 1 skips
>> >>
>> >> de-bruijn:
>> >> 1440 decicycles in av_gcd, 4096 runs, 0 skips
>> >>
>> >> previous:
>> >> 2889 decicycles in av_gcd, 4096 runs, 0 skips
>> >>
>> >> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>> >> ---
>> >> libavutil/intmath.h | 19 +++++++++++++++++++
>> >> libavutil/mathematics.c | 26 +++++++++++++++++++++-----
>> >> 2 files changed, 40 insertions(+), 5 deletions(-)
>> >>
>> >> diff --git a/libavutil/intmath.h b/libavutil/intmath.h
>> >> index 08d54a6..b412385 100644
>> >> --- a/libavutil/intmath.h
>> >> +++ b/libavutil/intmath.h
>> >> @@ -114,6 +114,9 @@ static av_always_inline av_const int ff_log2_16bit_c(unsigned int v)
>> >> #ifndef ff_ctz
>> >> #define ff_ctz(v) __builtin_ctz(v)
>> >> #endif
>> >> +#ifndef ff_ctzll
>> >> +#define ff_ctzll(v) __builtin_ctzll(v)
>> >> +#endif
>> >> #endif
>> >> #endif
>> >>
>> >> @@ -158,6 +161,22 @@ static av_always_inline av_const int ff_ctz_c( int v )
>> >> #endif
>> >> #endif
>> >>
>> >> +#ifndef ff_ctzll
>> >> +#define ff_ctzll ff_ctzll_c
>> >> +/* We use the De-Bruijn method outlined in:
>> >> + * http://supertech.csail.mit.edu/papers/debruijn.pdf. */
>> >> +static av_always_inline av_const int ff_ctzll_c(long long v)
>> >> +{
>> >> + static const int debruijn_ctz64[64] = {
>> >> + 0, 1, 2, 53, 3, 7, 54, 27, 4, 38, 41, 8, 34, 55, 48, 28,
>> >> + 62, 5, 39, 46, 44, 42, 22, 9, 24, 35, 59, 56, 49, 18, 29, 11,
>> >> + 63, 52, 6, 26, 37, 40, 33, 47, 61, 45, 43, 21, 23, 58, 17, 10,
>> >> + 51, 25, 36, 32, 60, 20, 57, 16, 50, 31, 19, 15, 30, 14, 13, 12
>> >> + };
>> >> + return debruijn_ctz64[(uint64_t)((v & -v) * 0x022FDD63CC95386D) >> 58];
>> >> +}
>> >> +#endif
>> >> +
>> >
>> > Is this duplicated from somewhere?
>>
>> It may be obtained from a number of sources (or generated oneself from
>> the link I gave, which actually is an original source):
>> "The De Bruijn bitscan was devised in 1997, according to Donald Knuth
>> [3] by Martin Läuter, and independently by Charles Leiserson, Harald
>> Prokop and Keith H. Randall a few month later." (I use the Leiserson
>> et al reference in the code).
>>
>> The sequence is not unique; there are many available on the web:
>> https://chessprogramming.wikispaces.com/BitScan,
>> https://gist.github.com/deffi420/e700f0adefc82f28c0d7 (the sequence I
>> randomly picked).
>
> I just thought I've seen this somewhere else in libav*, but I didn't
> find anything. Ignore me.
I was actually thinking of legal/licensing issues of which I have no
knowledge. Anyway, good to see the confusion cleared up.
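
For anyone who wants to check the table without applying the patch, here is a
minimal standalone sketch (not the patch itself). It reuses the table and the
multiplier from the diff quoted above. The names sketch_ctz64 and sketch_gcd
are made up for illustration, and it works on uint64_t rather than int64_t, an
assumption made here so that the multiply wraps modulo 2^64 and no llabs
handling is needed. sketch_ctz64(0) just returns 0 and is not meaningful.

```c
#include <stdint.h>

/* Hypothetical helper: count trailing zeros of a nonzero 64-bit value using
 * the De Bruijn table and multiplier from the quoted patch. */
static int sketch_ctz64(uint64_t v)
{
    static const int debruijn_ctz64[64] = {
        0, 1, 2, 53, 3, 7, 54, 27, 4, 38, 41, 8, 34, 55, 48, 28,
        62, 5, 39, 46, 44, 42, 22, 9, 24, 35, 59, 56, 49, 18, 29, 11,
        63, 52, 6, 26, 37, 40, 33, 47, 61, 45, 43, 21, 23, 58, 17, 10,
        51, 25, 36, 32, 60, 20, 57, 16, 50, 31, 19, 15, 30, 14, 13, 12
    };
    /* v & (0 - v) isolates the lowest set bit. Multiplying the De Bruijn
     * constant by that power of two is a left shift, so the top 6 bits of
     * the (wrapping, unsigned) product identify the bit position. */
    return debruijn_ctz64[((v & (0 - v)) * UINT64_C(0x022FDD63CC95386D)) >> 58];
}

/* Hypothetical helper: Stein's binary GCD on unsigned 64-bit inputs. */
static uint64_t sketch_gcd(uint64_t a, uint64_t b)
{
    int za, zb, k;

    if (a == 0) return b;
    if (b == 0) return a;

    za = sketch_ctz64(a);
    zb = sketch_ctz64(b);
    k  = za < zb ? za : zb;   /* shared power of two, restored at the end */
    a >>= za;                 /* a is now odd */
    b >>= zb;                 /* b is now odd */

    while (a != b) {
        if (a > b) {          /* keep a <= b so the subtraction cannot wrap */
            uint64_t t = a;
            a = b;
            b = t;
        }
        b -= a;                   /* odd - odd is even and nonzero here */
        b >>= sketch_ctz64(b);    /* strip the factors of two again */
    }
    return a << k;
}
```

With these definitions, sketch_ctz64(UINT64_C(1) << i) should give i for each
i from 0 to 63, which is a quick way to validate any candidate table and
multiplier pair, and sketch_gcd(48, 18) gives 6.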