[FFmpeg-devel] [PATCH] Faster ff_sqrt()

Michael Niedermayer michaelni
Sun Jan 13 21:14:10 CET 2008


On Sun, Jan 13, 2008 at 08:49:17PM +0100, Michael Niedermayer wrote:
> On Sun, Jan 13, 2008 at 08:13:00PM +0100, Michael Niedermayer wrote:
> > On Sun, Jan 13, 2008 at 05:50:52PM +0100, Vitor Sessak wrote:
> > > Hi,
> > >
> > > I saw in the thread "Copy-on-write pages statistics" that roqaudioenc.c
> > > is a big copy-on-write memory usage offender. I think that
> > > increasing the memory usage of FFmpeg by 16kb just for a table used in a
> > > game format encoder is not a good idea. So I'm sending a new, faster
> > > ff_sqrt(), so I can remove the table completely without too much speed 
> > > loss.
> > >
> > > Unfortunately, even with the new ff_sqrt, the encoder is not as fast as 
> > > before :
> > > (for a 3 min wav file input)
> > >
> > > Before:
> > > 0m0.340s
> > >
> > > After:
> > > 0m0.556s
> > >
> > > But it's the audio encoder of a VQ codec and it takes forever to encode the 
> > > video. So I don't think a few ms more to encode the audio can make any 
> > > difference...
> > >
> > > The code for ff_sqrt() was taken from public domain code at 
> > > http://atoms.alife.co.uk/sqrt/ (Warning: original in java).
> > [...]
> > > +/**
> > > + * Fast square-root. Completely accurate for x < 2147483648 (i.e. 2^31)...
> > > + * Code from http://atoms.alife.co.uk/sqrt/
> > > + */
> > > +static inline int ff_sqrt(int x) {
> > > +    int xn;
> > >  
> > ...
> > > +    if (x >= 0x10000) {
> > > +        if (x >= 0x1000000) {
> > > +            if (x >= 0x10000000) {
> > > +                if (x >= 0x40000000) {
> > > +                    xn = ff_sqrt_tab[x >> 24] << 8;
> > > +                } else {
> > > +                    xn = ff_sqrt_tab[x >> 22] << 7;
> > > +                }
> > > +            } else {
> > > +                if (x >= 0x4000000) {
> > > +                    xn = ff_sqrt_tab[x >> 20] << 6;
> > > +                } else {
> > > +                    xn = ff_sqrt_tab[x >> 18] << 5;
> > > +                }
> > > +            }
> > >  
> > ...
> > > +            xn = (xn + 1 + (x / xn)) >> 1;
> > > +            xn = (xn + 1 + (x / xn)) >> 1;
> > > +            return ((xn * xn) > x) ? --xn : xn;
> > > +        } else {
> > > +            if (x >= 0x100000) {
> > > +                if (x >= 0x400000) {
> > > +                    xn = ff_sqrt_tab[x >> 16] << 4;
> > > +                } else {
> > > +                    xn = ff_sqrt_tab[x >> 14] << 3;
> > > +                }
> > > +            } else {
> > > +                if (x >= 0x40000) {
> > > +                    xn = ff_sqrt_tab[x >> 12] << 2;
> > > +                } else {
> > > +                    xn = ff_sqrt_tab[x >> 10] << 1;
> > > +                }
> > > +            }
> > > +
> > > +            xn = (xn + 1 + (x / xn)) >> 1;
> > > +
> > > +            return ((xn * xn) > x) ? --xn : xn;
> > >          }
> > > +    } else {
> > > +        if (x >= 0x100) {
> > > +            if (x >= 0x1000) {
> > > +                if (x >= 0x4000) {
> > > +                    xn = (ff_sqrt_tab[x >> 8]) + 1;
> > > +                } else {
> > > +                    xn = (ff_sqrt_tab[x >> 6] >> 1) + 1;
> > > +                }
> > > +            } else {
> > > +                if (x >= 0x400) {
> > > +                    xn = (ff_sqrt_tab[x >> 4] >> 2) + 1;
> > > +                } else {
> > > +                    xn = (ff_sqrt_tab[x >> 2] >> 3) + 1;
> > > +                }
> > > +            }
> > > +
> > > +            return ((xn * xn) > x) ? --xn : xn;
> > > +        } else {
> > > +            if (x >= 0) {
> > > +                return ff_sqrt_tab[x] >> 4;
> > > +            }
> > > +        }
> > >      }
> > > -    return ret;
> > > +
> > > +    return -1;
> > 
> > for a less idiotic and simpler variant i wrote a few years ago see
> > http://guru.multimedia.cx/fast-integer-square-root/
> > 
> > if anyone can simplify this further that would be welcome :)
> 
> noone? well heres a simpler variant, though possibly slower (didnt benchmark)
> 
> static inline unsigned int sqrt4(unsigned int a)
> {
>     unsigned int b;
> 
>     if(a<(1<<16)){
>         if(a<(1<<10)-3)      b=sqrt_tab[(a+ 3)>>2 ]>>3;
>         else{
>             if(a<(1<<14)-28) b=sqrt_tab[(a+28)>>6 ]>>1;
>             else             b=sqrt_tab[ a    >>8 ];
>         }
>     }else{
>         int s= (av_log2(a)-12)>>1;
>         b= sqrt_tab[a>>(2*s+6)];
>         b= (FASTDIV(a,b)>>s) + (b<<(s-2));
>     }
> 
>     if(a<b*b) b--;
> 
>     return b;
> }

next one, just reordering the if() to make smaller values faster and some
cosmetics

static inline unsigned int sqrt3(unsigned int a)
{
    unsigned int b;

    if     (a<(1<<10)- 3) b=sqrt_tab[(a+ 3)>>2 ]>>3;
    else if(a<(1<<14)-28) b=sqrt_tab[(a+28)>>6 ]>>1;
    else if(a<(1<<16)   ) b=sqrt_tab[ a    >>8 ]   ;
    else{
        int s= av_log2(a)>>1;
        b= sqrt_tab[a>>(2*s-6)];
        b= (FASTDIV(a,b)>>(s-6)) + (b<<(s-8));
    }

    return b - (a<b*b);
}

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I wish the Xiph folks would stop pretending they've got something they
do not.  Somehow I fear this will remain a wish. -- M?ns Rullg?rd
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080113/50073997/attachment.pgp>



More information about the ffmpeg-devel mailing list