[FFmpeg-devel] [PATCH 4/7] x86: sbrdsp: implement SSE hf_apply_noise

Michael Niedermayer michaelni at gmx.at
Sat Apr 6 15:44:19 CEST 2013


On Sat, Apr 06, 2013 at 10:52:11AM +0000, Christophe Gisquet wrote:
> From 233 to 115 (SSE) / 110 (SSE2) cycles on Arrandale and Win64.
> Replacing the multiplication by s_m[m] with an andps and an xorps using
> appropriate constant vectors is slower. Unrolling is a 15-cycle win.
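For reference, here is a minimal per-lane C sketch of the rejected alternative described above. Each phase sign is +1, 0 or -1, so s_m[m] * phi can be formed with an AND mask (zero the value when phi is 0) followed by an XOR (flip the sign bit when phi is negative) instead of a multiply. The `phi_masks` encoding and all helper names are hypothetical. The sketch models only the bit operations in one lane, not the patch's SIMD code.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret float <-> uint32_t bits (memcpy keeps this well-defined). */
static uint32_t f32_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float f32_from_bits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical per-lane constants for a phase sign phi in {+1, 0, -1}:
 *   keep: all ones if phi != 0, else zero  (the ANDPS operand)
 *   flip: sign bit if phi < 0, else zero   (the XORPS operand) */
typedef struct { uint32_t keep, flip; } phi_masks;

static phi_masks phi_to_masks(float phi)
{
    phi_masks m;
    m.keep = (phi != 0.0f) ? UINT32_C(0xFFFFFFFF) : 0u;
    m.flip = (phi < 0.0f) ? UINT32_C(0x80000000) : 0u;
    return m;
}

/* s * phi without a multiply: mask the value, then flip its sign bit.
 * Differs from s * phi only in the sign of a zero result (phi == 0, s < 0)
 * and for non-finite s. */
static float mul_by_phi_bitwise(float s, phi_masks m)
{
    return f32_from_bits((f32_bits(s) & m.keep) ^ m.flip);
}
```

Note that the commit message reports this bitwise form as slower on Arrandale than simply keeping the mulps.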
> ---
>  libavcodec/x86/sbrdsp.asm    | 145 +++++++++++++++++++++++++++++++++++++++++++
>  libavcodec/x86/sbrdsp_init.c |  32 ++++++++++
>  2 files changed, 177 insertions(+)
> 
> diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
> index 65c972e..a7998fa 100644
> --- a/libavcodec/x86/sbrdsp.asm
> +++ b/libavcodec/x86/sbrdsp.asm
> @@ -26,6 +26,12 @@ SECTION_RODATA
>  ps_mask         times 2 dd 1<<31, 0
>  ps_mask2        times 2 dd 0, 1<<31
>  ps_neg          times 4 dd 1<<31
> +ps_noise0       times 2 dd  1.0,  0.0
> +ps_noise2       times 2 dd -1.0,  0.0
> +ps_noise13      dd  0.0,  1.0, 0.0, -1.0
> +                dd  0.0, -1.0, 0.0,  1.0
> +                dd  0.0,  1.0, 0.0, -1.0
> +cextern         sbr_noise_table
>  
>  SECTION_TEXT
>
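To make the new constants easier to check, here is a minimal C model of what hf_apply_noise computes. It is loosely based on the generic C version in libavcodec/sbrdsp.c. Variants 0 and 2 use a fixed phase pair (1, 0) or (-1, 0), which matches ps_noise0/ps_noise2 repeated over two bands. Variants 1 and 3 use (0, ±1), with the imaginary sign flipping from band to band, which matches the rows of ps_noise13. Several parts of the sketch are assumptions. The row selection in `asm_constant()` is inferred from the table layout, because the code that loads it is not in the quoted hunks. `noise_table` is a hypothetical stand-in for sbr_noise_table, and the function names are illustrative only.

```c
#define NOISE_LEN 512

/* Hypothetical stand-in for sbr_noise_table; real values not reproduced. */
static float noise_table[NOISE_LEN][2];

/* Scalar model: add either the sinusoid s_m[m] (scaled by the phase signs)
 * or q_filt[m]-scaled noise to each complex band Y[m]. */
static void hf_apply_noise_ref(float (*Y)[2], const float *s_m,
                               const float *q_filt, int noise,
                               float phi_sign0, float phi_sign1, int m_max)
{
    for (int m = 0; m < m_max; m++) {
        noise = (noise + 1) & (NOISE_LEN - 1);
        if (s_m[m] != 0.0f) {
            Y[m][0] += s_m[m] * phi_sign0;
            Y[m][1] += s_m[m] * phi_sign1;
        } else {
            Y[m][0] += q_filt[m] * noise_table[noise][0];
            Y[m][1] += q_filt[m] * noise_table[noise][1];
        }
        phi_sign1 = -phi_sign1; /* imaginary sign alternates per band */
    }
}

/* Phase signs for two consecutive bands (re0, im0, re1, im1). */
static void phase_pattern(int variant, int kx, float out[4])
{
    float s0, s1;
    switch (variant) {
    case 0:  s0 =  1.0f; s1 = 0.0f; break;
    case 1:  s0 =  0.0f; s1 =   1.0f - 2.0f * (kx & 1);  break;
    case 2:  s0 = -1.0f; s1 = 0.0f; break;
    default: s0 =  0.0f; s1 = -(1.0f - 2.0f * (kx & 1)); break;
    }
    out[0] = s0; out[1] =  s1;
    out[2] = s0; out[3] = -s1;
}

/* The RODATA constants from the hunk above. */
static const float ps_noise0[4]   = {  1.0f, 0.0f,  1.0f, 0.0f };
static const float ps_noise2[4]   = { -1.0f, 0.0f, -1.0f, 0.0f };
static const float ps_noise13[12] = { 0.0f,  1.0f, 0.0f, -1.0f,
                                      0.0f, -1.0f, 0.0f,  1.0f,
                                      0.0f,  1.0f, 0.0f, -1.0f };

/* Assumed row choice: variant 1 -> row (kx & 1), variant 3 -> row (kx & 1) + 1. */
static const float *asm_constant(int variant, int kx)
{
    switch (variant) {
    case 0:  return ps_noise0;
    case 2:  return ps_noise2;
    case 1:  return ps_noise13 + 4 * (kx & 1);
    default: return ps_noise13 + 4 * ((kx & 1) + 1);
    }
}

/* 1 if the table row equals the scalar phase pattern (-0.0f == 0.0f). */
static int constant_matches(int variant, int kx)
{
    float p[4];
    const float *c = asm_constant(variant, kx);
    phase_pattern(variant, kx, p);
    for (int i = 0; i < 4; i++)
        if (p[i] != c[i])
            return 0;
    return 1;
}
```

This layout suggests why ps_noise13 has three rows. With a 16-byte offset chosen from kx & 1, variant 3 can reuse the same table, shifted down by one row.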

> @@ -358,3 +364,142 @@ SBR_QMF_DEINT_BFLY
>      
>  INIT_XMM sse2
>  SBR_QMF_DEINT_BFLY
> +
> +%if WIN64
> +%define NREGS 0
> +%else

> +%ifndef PIC

ifdef


[...]
> +%endif
> +    mulps      m1, m3 ; m1 = q_filt[m] * ff_sbr_noise_table[noise]
> +    mulps      m2, m4 ; m2 = q_filt[m] * ff_sbr_noise_table[noise]
> +    mova       m3, [s_mq + count]
> +    ; TODO: replace by a vpermd in AVX2

> +%if cpuflag(sse2)
> +    punpckhdq  m4, m3, m3
> +    punpckldq  m3, m3, m3
> +%else
> +    unpckhps   m4, m3, m3
> +    unpcklps   m3, m3, m3
> +%endif

it might make sense to add a macro to some shared header so that
punpckl/hdq get turned into unpckl/hps automatically on SSE1
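As a rough illustration of why such a macro would be safe: on 32-bit lanes, PUNPCKLDQ/PUNPCKHDQ and UNPCKLPS/UNPCKHPS perform the same interleave and move the same bits. Only the execution domain differs, which is presumably why the SSE2 path measured slightly faster. The functions below are a hypothetical lane-level model, not x86inc or FFmpeg code. When both operands are the same register, as in `punpckldq m3, m3, m3`, the interleave duplicates each s_m value into an adjacent (re, im) pair.

```c
#include <stdint.h>
#include <string.h>

/* Lane model of PUNPCKLDQ (high == 0) / PUNPCKHDQ (high == 1):
 * lo -> {a0, b0, a1, b1}, hi -> {a2, b2, a3, b3}. */
static void punpckdq_model(uint32_t out[4], const uint32_t a[4],
                           const uint32_t b[4], int high)
{
    int base = high ? 2 : 0;
    uint32_t r[4] = { a[base], b[base], a[base + 1], b[base + 1] };
    memcpy(out, r, sizeof r);
}

/* Lane model of UNPCKLPS / UNPCKHPS: the same interleave on float lanes. */
static void unpckps_model(float out[4], const float a[4],
                          const float b[4], int high)
{
    int base = high ? 2 : 0;
    float r[4] = { a[base], b[base], a[base + 1], b[base + 1] };
    memcpy(out, r, sizeof r);
}

/* 1 if both forms give bit-identical results for a = b = v,
 * i.e. "punpck?dq m, m, m" versus "unpck?ps m, m, m". */
static int unpack_forms_agree(const float v[4], int high)
{
    uint32_t vi[4], ri[4];
    float rf[4];
    memcpy(vi, v, sizeof vi);
    punpckdq_model(ri, vi, vi, high);
    unpckps_model(rf, v, v, high);
    return memcmp(ri, rf, sizeof ri) == 0;
}
```

A header macro would then only need to pick the mnemonic based on cpuflag(sse2), because the operand order and the result are the same.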


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Avoid a single point of failure, be that a person or equipment.