[FFmpeg-devel] [PATCH 1/7] x86: sbrdsp: implement SSE/SSE2 qmf_pre_shuffle
Christophe Gisquet
christophe.gisquet at gmail.com
Sat Apr 6 15:06:07 CEST 2013
(On phone)
Please note I haven't tested the avx version that Jason probably wrote on
the fly.
On 6 Apr 2013 15:00, "Michael Niedermayer" <michaelni at gmx.at> wrote:
> On Sat, Apr 06, 2013 at 10:52:08AM +0000, Christophe Gisquet wrote:
> > From 253 to 70(sse)/52(sse2) cycles on Arrandale and Win64.
> > 61/55 cycles on SandyBridge.
>
> SSE2 is 41 cycles now on sb :)
>
>
> > ---
> > libavcodec/x86/sbrdsp.asm | 54 ++++++++++++++++++++++++++++++++++++++++++++
> > libavcodec/x86/sbrdsp_init.c | 7 ++++++
> > 2 files changed, 61 insertions(+)
> >
> > diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
> > index 999e5af..f3c30d0 100644
> > --- a/libavcodec/x86/sbrdsp.asm
> > +++ b/libavcodec/x86/sbrdsp.asm
> > @@ -242,3 +242,57 @@ cglobal sbr_neg_odd_64, 1,2,4,z
> > cmp zq, r1q
> > jne .loop
> > REP_RET
> > +
> > +%macro SBR_QMF_PRE_SHUFFLE 0
> > +cglobal sbr_qmf_pre_shuffle, 1,4,7,z
> > +%define OFFSET (32*4-2*mmsize)
> > + mov r3q, OFFSET
> > + lea r1q, [zq + (32+1)*4]
> > + lea r2q, [zq + 64*4]
> > + mova m6, [ps_neg]
> > +.loop:
> > + movu m0, [r1q]
> > + movu m2, [r1q + mmsize]
> > + movu m1, [zq + r3q + 4 + mmsize]
> > + movu m3, [zq + r3q + 4]
> > +%if cpuflag(sse2)
> > +%define XOR pxor
> > +%define SHUFFLE pshufd
> > +%define UNPACKL punpckldq
> > +%define UNPACKH punpckhdq
> > +%define MOVH movq
> > +%else
> > +%define XOR xorps
> > +%define SHUFFLE shufps
> > +%define UNPACKL unpcklps
> > +%define UNPACKH unpckhps
> > +%define MOVH movlps
> > +%endif
> > +
>
> > + XOR m2, m6
> > + XOR m0, m6
> > + SHUFFLE m2, m2, q0123
> > + SHUFFLE m0, m0, q0123
>
> Doing the shuffles before the XORs is 1 cycle faster on my SandyBridge.
> If it's not faster for you, then ignore this.
>
> @@ -269,10 +269,10 @@ cglobal sbr_qmf_pre_shuffle, 1,4,7,z
> %define MOVH movlps
> %endif
>
> - XOR m2, m6
> - XOR m0, m6
> SHUFFLE m2, m2, q0123
> SHUFFLE m0, m0, q0123
> + XOR m2, m6
> + XOR m0, m6
> mova m5, m2
> mova m4, m0
> UNPACKL m2, m3
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Many things microsoft did are stupid, but not doing something just because
> microsoft did it is even more stupid. If everything ms did were stupid they
> would be bankrupt already.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
>
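The reordering suggested above (shuffle first, then XOR) only leaves the result unchanged if the XOR mask is the same in every lane. The sketch below checks that in plain C, without intrinsics. It assumes that ps_neg holds 0x80000000 in all four lanes (its definition is not shown in this thread) and that q0123 reverses the four 32-bit lanes. The helper names reverse4, xor4 and orders_agree are hypothetical and are not part of FFmpeg.

```c
#include <stdint.h>
#include <string.h>

/* Reverse four 32-bit lanes, as a shuffle with selector q0123 is assumed to do. */
static void reverse4(uint32_t v[4])
{
    uint32_t t;
    t = v[0]; v[0] = v[3]; v[3] = t;
    t = v[1]; v[1] = v[2]; v[2] = t;
}

/* XOR every lane with the matching lane of mask (sign flip for 0x80000000). */
static void xor4(uint32_t v[4], const uint32_t mask[4])
{
    for (int i = 0; i < 4; i++)
        v[i] ^= mask[i];
}

/* Return 1 if XOR-then-reverse equals reverse-then-XOR for this input and mask. */
static int orders_agree(const uint32_t in[4], const uint32_t mask[4])
{
    uint32_t a[4], b[4];
    memcpy(a, in, sizeof(a));
    memcpy(b, in, sizeof(b));
    xor4(a, mask);  reverse4(a);    /* order used in the original patch */
    reverse4(b);    xor4(b, mask);  /* order suggested in the review */
    return memcmp(a, b, sizeof(a)) == 0;
}
```

With a uniform mask the two orders agree for any input. With a mask that differs between lanes (for example one that flips only even lanes) they generally do not, so the swap would not be safe without also permuting the mask.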