[FFmpeg-devel] libavutil simd

Michael Niedermayer michaelni
Tue Oct 2 20:15:42 CEST 2007


Hi

On Tue, Oct 02, 2007 at 05:29:18PM +0200, Luca Barbato wrote:
> Michael Niedermayer wrote:
> > personally I am in favor of the simplest and least hackish solution
> > for x86 we can easily and cleanly figure out the cpu capabilities
> > 
> > for ppc and sparc there is no simple and clean way, what really is the
> > big problem with treating ppc without altivec like a different
> > architecture than ppc with altivec?
> > with x86 we can't as there are so many different variants (mmx, 3dnow, mmx2,
> > sse, sse2, ....)
> > 
> > the whole thread seems to be centered around "we must do it at runtime
> > no matter what", I just can't help but keep wondering why that is so
> > important?
> 
> certain binary distributors may have yet another headache about that...

since when do we care about certain binary distributions ;)
also they can patch the code as they want which they do anyway, and they
have the advantage of not having to solve the detection for more than 1 OS


[...]
> > 
> > PS: let me remind everyone that libavutil is supposed to be LIGHTweight,
> > simple, modular and fast
> > and I really would rather drop SIMD in libavutil completely before we
> > fill it up with some of the idiotic hacks suggested by the army of
> > bloated zombies in this thread. Many of you really sound like win32
> > users who want their ideas implemented no matter how stupid
> 
> having a 4 times faster adler doesn't sound stupid if you are going to
> use it quite often...

you are comparing naive C code against optimized altivec
plain C can easily work with 4 bytes at a time too (one byte per 16-bit
lane of a 64-bit word), try something like

for(...){
    ss0=ss1=sum0=sum1=0;
    for(...){
        s= ((uint64_t*)src)[i];
        a0=  s     & 0x00FF00FF00FF00FF;
        a1= (s>>8) & 0x00FF00FF00FF00FF;
        ss0  += sum0;
        ss1  += sum1;
        sum0 += a0;
        sum1 += a1;
    }
    tmp= sum0 + sum1;
    X += (0x0001000100010001*tmp)>>48;
                                                                    //  3000000020000000100000000
    ss0 += ss1;                                                     // 33000000220000001100000000
    ss0= (ss0&0x0000FFFF0000FFFF) + ((ss0>>16)&0x0000FFFF0000FFFF); // 33002200220011001100000000
    ss0+= ss0>>32;                                                  // 33222222221111111100000000
    ss0&= 0xFFFFFFFF;
    Y += 8*ss0;                                                     // 24,16,8,0
    Y += (0x0001000300050007*sum0 + 0x0002000400060008*sum1)>>48; //   876543218765432187654321
}

(totally untested and I am certain it does contain some bugs, it's just to
demonstrate how it can be done, and yes, it can be optimized further)
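
For anyone who wants to experiment with the idea, here is a minimal
compilable sketch of the same lane trick. It is one possible way to fill in
the details, not a tuned or reviewed implementation. All names in it
(adler32_swar, adler32_ref, load_le64, lanes, MAX_WORDS) are made up for
this example. It differs from the snippet above in a few labeled ways: the
64-bit word is assembled with byte 0 in the low bits, so the position
weights are ordered for that layout (as far as I can tell the constants
above correspond to a big-endian load), the lanes are summed by plain
shift-and-mask instead of multiply-and-shift so no carry can cross a lane,
the term "block length times the running byte sum" that Adler-32 requires
is added explicitly, and a block is capped at 23 words so the 16-bit lanes
of ss0/ss1 cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD  65521u
#define LANE_MASK  UINT64_C(0x00FF00FF00FF00FF)
/* 255 * (23*22/2) = 64515 < 65536, so the 16-bit ss lanes cannot overflow */
#define MAX_WORDS  23

/* Plain byte-at-a-time Adler-32, used for the tail and as a reference. */
static uint32_t adler32_ref(uint32_t adler, const uint8_t *src, size_t len)
{
    uint32_t a = adler & 0xFFFF, b = adler >> 16;
    for (size_t i = 0; i < len; i++) {
        a = (a + src[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Load 8 bytes with p[0] in the low bits, independent of host byte order
 * and alignment. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int k = 7; k >= 0; k--)
        v = (v << 8) | p[k];
    return v;
}

/* Weighted sum of the four 16-bit lanes of v (lane 0 is the lowest). */
static uint32_t lanes(uint64_t v, uint32_t w0, uint32_t w1,
                      uint32_t w2, uint32_t w3)
{
    return w0 * (uint32_t)( v        & 0xFFFF)
         + w1 * (uint32_t)((v >> 16) & 0xFFFF)
         + w2 * (uint32_t)((v >> 32) & 0xFFFF)
         + w3 * (uint32_t)( v >> 48);
}

static uint32_t adler32_swar(uint32_t adler, const uint8_t *src, size_t len)
{
    uint32_t a = adler & 0xFFFF, b = adler >> 16;

    while (len >= 8) {
        size_t words = len / 8;
        uint64_t sum0 = 0, sum1 = 0, ss0 = 0, ss1 = 0;

        if (words > MAX_WORDS)
            words = MAX_WORDS;
        for (size_t i = 0; i < words; i++) {
            uint64_t s = load_le64(src + 8 * i);
            ss0  += sum0;                   /* running sums of the sums */
            ss1  += sum1;
            sum0 +=  s       & LANE_MASK;   /* bytes 0,2,4,6 of each word */
            sum1 += (s >> 8) & LANE_MASK;   /* bytes 1,3,5,7 of each word */
        }
        /* b sees the old a once for every byte of the block */
        b += (uint32_t)(8 * words) * a;
        /* a byte in word w of the block counts 8*(words-1-w) times ... */
        b += 8 * (lanes(ss0, 1, 1, 1, 1) + lanes(ss1, 1, 1, 1, 1));
        /* ... plus (8 - k) times for its position k inside the word */
        b += lanes(sum0, 8, 6, 4, 2) + lanes(sum1, 7, 5, 3, 1);
        a += lanes(sum0, 1, 1, 1, 1) + lanes(sum1, 1, 1, 1, 1);
        a %= ADLER_MOD;
        b %= ADLER_MOD;

        src += 8 * words;
        len -= 8 * words;
    }
    /* fewer than 8 bytes left: finish byte by byte */
    return adler32_ref((b << 16) | a, src, len);
}
```

Usage follows the usual Adler-32 convention of starting from 1, e.g.
adler32_swar(1, buf, len), and the result can be compared against
adler32_ref(1, buf, len) on the same buffer.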


> same goes for sha1, md5, aes...

yes

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus