[FFmpeg-devel] [PATCH 3/9] SBR DSP x86: implement SSE qmf_deint_bfly
Michael Niedermayer
michaelni at gmx.at
Fri Apr 5 15:44:44 CEST 2013
On Thu, Apr 04, 2013 at 07:45:47PM +0000, Christophe Gisquet wrote:
> From 713 to 209 cycles on Arrandale and Win64.
> Having a loop counter is a 7 cycle gain.
> Unrolling is another 7 cycle gain.
> Working in reverse scan is another 6 cycles.
> ---
> libavcodec/x86/sbrdsp.asm | 28 ++++++++++++++++++++++++++++
> libavcodec/x86/sbrdsp_init.c | 2 ++
> 2 files changed, 30 insertions(+)
>
> diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
> index 85e197a..573981a 100644
> --- a/libavcodec/x86/sbrdsp.asm
> +++ b/libavcodec/x86/sbrdsp.asm
> @@ -273,3 +273,31 @@ cglobal sbr_qmf_deint_neg, 2,3,3,v,src,vrev
> cmp vq, vrevq
> jl .loop
> REP_RET
> +
> +INIT_XMM sse
> +; sbr_qmf_deint_bfly(float *v, const float *src0, const float *src1)
> +cglobal sbr_qmf_deint_bfly, 3,5,8, v,src0,src1,vrev,c
> + mov cq, 64*4-2*mmsize
> + lea vrevq, [vq + 64*4]
> +.loop:
> + mova m0, [src0q+cq]
> + mova m1, [src1q]
> + mova m4, [src0q+cq+mmsize]
> + mova m5, [src1q+mmsize]
> + shufps m2, m0, m0, q0123
> + shufps m3, m1, m1, q0123
> + shufps m6, m4, m4, q0123
> + shufps m7, m5, m5, q0123
Replacing these with pshufd takes it from 68 to 47 cycles on
Sandy Bridge.
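
To make the equivalence concrete, here is a scalar C model of the two
shuffles. It is only a sketch: the function names are made up for
illustration, 0x1B is the immediate that q0123 encodes, and the
"copy then shuffle" expansion reflects how x86inc, as far as I
understand it, emulates the 3-operand shufps form when AVX is not
available. Note that pshufd is an SSE2 instruction, so the function
could no longer be declared under plain SSE.

```c
#include <string.h>

/* Scalar model of the 2-operand SSE form "shufps dst, src, imm8":
 * the low two result lanes are picked from dst, the high two from src. */
static void model_shufps(float dst[4], const float src[4], unsigned imm)
{
    float r[4];
    r[0] = dst[imm & 3];
    r[1] = dst[(imm >> 2) & 3];
    r[2] = src[(imm >> 4) & 3];
    r[3] = src[(imm >> 6) & 3];
    memcpy(dst, r, sizeof(r));
}

/* Scalar model of SSE2 "pshufd dst, src, imm8": every result lane is
 * picked from src, so no prior copy is needed. dst must not alias src. */
static void model_pshufd(float dst[4], const float src[4], unsigned imm)
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm >> (2 * i)) & 3];
}

/* Assumed non-AVX expansion of "shufps m2, m0, m0, q0123". */
static void reverse_via_shufps(float out[4], const float in[4])
{
    memcpy(out, in, 4 * sizeof(float)); /* movaps m2, m0        */
    model_shufps(out, in, 0x1B);        /* shufps m2, m0, q0123 */
}

/* Single-instruction alternative. */
static void reverse_via_pshufd(float out[4], const float in[4])
{
    model_pshufd(out, in, 0x1B);        /* pshufd m2, m0, q0123 */
}
```

Both helpers reverse the four lanes; the pshufd variant simply gets
there without the extra register copy per shuffle.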
> + addps m5, m2
> + subps m0, m7
> + addps m1, m6
> + subps m4, m3
> + mova [vrevq], m1
> + mova [vrevq+mmsize], m5
> + mova [vq+cq], m0
> + mova [vq+cq+mmsize], m4
> + add src1q, 2*mmsize
> + add vrevq, 2*mmsize
> + sub cq, 2*mmsize
> + jge .loop
I tried reordering the instructions but didn't see a speed gain from it.
In theory, though, memory accesses might benefit from being done in order,
that is 8 7 6 5 4 3 instead of 7 8 5 6 3 4.
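
The access pattern in question can be sketched in C as below. This only
models the [src0q+cq] / [vq+cq] side of the loop, counted in
mmsize-sized (16-byte) blocks; the helper names are hypothetical and
the second function shows the suggested order, not something that was
measured to be faster.

```c
#include <stddef.h>

enum { BLOCK = 16, TOTAL = 64 * 4 }; /* mmsize in bytes; 64 floats */

/* Order as posted: each iteration touches [c], then [c + mmsize],
 * while c counts down, giving 14 15 12 13 ... 0 1. */
static size_t order_as_posted(int *blocks)
{
    size_t n = 0;
    for (int c = TOTAL - 2 * BLOCK; c >= 0; c -= 2 * BLOCK) {
        blocks[n++] = c / BLOCK;
        blocks[n++] = c / BLOCK + 1;
    }
    return n;
}

/* Suggested order: touch [c + mmsize] first so the addresses are
 * strictly descending, giving 15 14 13 12 ... 1 0. */
static size_t order_descending(int *blocks)
{
    size_t n = 0;
    for (int c = TOTAL - 2 * BLOCK; c >= 0; c -= 2 * BLOCK) {
        blocks[n++] = c / BLOCK + 1;
        blocks[n++] = c / BLOCK;
    }
    return n;
}
```

Each function fills a caller-provided array of at least 16 ints and
returns the number of blocks visited.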
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
There will always be a question for which you do not know the correct answer.