[FFmpeg-devel] [PATCH] avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm

Michael Niedermayer michael at niedermayer.cc
Sun Oct 11 01:51:01 CEST 2015


On Sun, Oct 11, 2015 at 01:14:27AM +0200, Michael Niedermayer wrote:
> On Sat, Oct 10, 2015 at 11:32:06PM +0200, Henrik Gramner wrote:
> > On Sat, Oct 10, 2015 at 11:06 PM, Ganesh Ajjanagadde
> > <gajjanagadde at gmail.com> wrote:
> > > This uses Stein's binary GCD algorithm:
> > > https://en.wikipedia.org/wiki/Binary_GCD_algorithm
> > > to get a reported 1.7-2.1x speedup over Euclidean GCD on standard architectures.
> > > Have not benchmarked, so I can't comment
> > 
> > Before you submit a patch that's supposed to make something faster,
> > you should benchmark it to verify that it is in fact faster. Do this
> > with inputs of various sizes on both 32- and 64-bit architectures and
> > both with and without compilers that support __builtin_ctzll(v).
> 
> without __builtin_ctzll() the old code seems faster in a simple test
> like:
> make -j12 && ./ffmpeg -i matrixbench_mpeg2.mpg test.mov
> 
> also it seems a simple
>     while (u != v) {
>         if (u > v)
>             FFSWAP(int64_t, v, u);
>         v -= u;
>         do {
>             v >>= 1;
>         } while(!(v & 1));
>     }
> is faster than the non builtin ctzll in its place but still not as
> fast as the current code

heres a  ff_ctzll_c with LUTs, if someone wants to play more with
it (its almost as fast as the old code when theres no __builtin_ctzll())
for the matrixbench_mpeg2 -> mov testcase

static av_always_inline int ff_ctzll_c(long long v)
+{
+    int c;
+    static const uint8_t lut[256] =
+        {
+            8,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            7,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+            4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+        };
+    c = lut[v&255];
+
+    if (c<8)
+        return c;
+
+    c = 0;
+    if (!(v & 0xffffffff)) {
+        v >>= 32;
+        c += 32;
+    }
+    if (!(v & 0xffff)) {
+        v >>= 16;
+        c += 16;
+    }
+    if (!(v & 0xff)) {
+        v >>= 8;
+        c += 8;
+    }
+    return c + lut[v&255];
+}


[...]


-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

He who knows, does not speak. He who speaks, does not know. -- Lao Tsu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20151011/583dbfb5/attachment.sig>


More information about the ffmpeg-devel mailing list