[FFmpeg-devel] [PATCH 1/5] x264asm: extend SBUTTERFLY to support SSE1

Michael Niedermayer michaelni at gmx.at
Mon Apr 8 14:16:07 CEST 2013


On Mon, Apr 08, 2013 at 10:04:33AM +0200, Christophe Gisquet wrote:
> 2013/4/7 Christophe Gisquet <christophe.gisquet at gmail.com>:
> > The other solution is probably to have 2 paths depending on %ifidn %2, %3
> 
> Here's a proposal for this. I haven't strongly tested it (just ran
> fate-aac without my other patches) because, in short, I can't at the
> moment.
> 
> And for the wd unpacks, I guess there is probably too much shuffling
> and shifting to do to add support for SSE1, as it kills opportunities
> for better scheduling.
> 
> --
> Christophe

>  x86util.asm |   32 +++++++++++++++++++++++++++++++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
> 4cc1ffd78cd5800397de610dc00dfbd3cdee5026  0001-x264asm-SBUTTERFLY-SSE1-and-identical-args.patch
> From a768a9352ddde88e99f6b729b70fdddc20297f5c Mon Sep 17 00:00:00 2001
> From: Christophe Gisquet <christophe.gisquet at gmail.com>
> Date: Mon, 8 Apr 2013 09:42:26 +0200
> Subject: [PATCH] x264asm: SBUTTERFLY: SSE1 and identical args
> 
> SSE1 now supports dq and qdq types of unpacking.
> Also, the output when %2 == %3 is now correct, and %3 == %4 generates an
> error.
> ---
>  libavutil/x86/x86util.asm |   32 +++++++++++++++++++++++++++++++-
>  1 files changed, 31 insertions(+), 1 deletions(-)
> 
> diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
> index 79a023f..5adb80c 100644
> --- a/libavutil/x86/x86util.asm
> +++ b/libavutil/x86/x86util.asm
> @@ -30,10 +30,40 @@
>  %include "libavutil/x86/x86inc.asm"
>  
>  %macro SBUTTERFLY 4
> -%if avx_enabled == 0
> +%ifidn %3, %4
> +  %error Third and fourth arguments must be different
> +%endif
> +%if notcpuflag(sse2) && mmsize == 16
> +  %ifidn %1, dq
> +    mova      m%4, m%2

> +    %ifidn %2, %3
> +    unpcklps  m%2, m%3
> +    unpckhps  m%4, m%3
> +    %else
> +    unpckhps  m%4, m%3

this looks flipped
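
To spell out the concern, here is a minimal sketch, assuming the two branches are meant to differ only in instruction order. When %2 and %3 name the same register, unpcklps m%2, m%3 overwrites the source before unpckhps m%4, m%3 reads it, so the high unpack has to come first in that case. With distinct registers either order gives the same result. The Python model below is only an illustration: the helpers unpcklps, unpckhps and sbutterfly_dq are made-up names, not part of x86util.asm, and each XMM register is treated as a list of four lanes.

```python
def unpcklps(dst, src):
    # Interleave the two low lanes of dst and src.
    return [dst[0], src[0], dst[1], src[1]]


def unpckhps(dst, src):
    # Interleave the two high lanes of dst and src.
    return [dst[2], src[2], dst[3], src[3]]


def sbutterfly_dq(regs, r2, r3, r4, order):
    """Model 'mova m%4, m%2' followed by the two unpacks.

    regs  : dict mapping register number -> list of 4 lanes
    order : "lh" runs unpcklps first, "hl" runs unpckhps first
    Returns (m%2, m%4) after the sequence.
    """
    regs = {k: list(v) for k, v in regs.items()}
    regs[r4] = list(regs[r2])                        # mova m%4, m%2
    for op in order:
        if op == "l":
            regs[r2] = unpcklps(regs[r2], regs[r3])  # unpcklps m%2, m%3
        else:
            regs[r4] = unpckhps(regs[r4], regs[r3])  # unpckhps m%4, m%3
    return regs[r2], regs[r4]


# Identical arguments: %2 == %3 == register 0, temporary %4 == register 1.
same = {0: [0, 1, 2, 3], 1: [0, 0, 0, 0]}
low_first = sbutterfly_dq(same, 0, 0, 1, "lh")   # order in the %ifidn branch
high_first = sbutterfly_dq(same, 0, 0, 1, "hl")  # order in the %else branch

# Distinct arguments: the order makes no difference.
diff = {0: [0, 1, 2, 3], 1: [10, 11, 12, 13], 2: [0, 0, 0, 0]}
diff_lh = sbutterfly_dq(diff, 0, 1, 2, "lh")
diff_hl = sbutterfly_dq(diff, 0, 1, 2, "hl")

print(low_first)   # high half is corrupted: ([0, 0, 1, 1], [2, 1, 3, 1])
print(high_first)  # expected result:        ([0, 0, 1, 1], [2, 2, 3, 3])
```

In this model the order used in the %ifidn %2, %3 branch produces a wrong high half, while the order used in the %else branch would be the safe one for identical arguments.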


> +    unpcklps  m%2, m%3
> +    %endif
> +  %elifidn %1, qdq
>      mova      m%4, m%2

> +    %ifidn %2, %3
> +    shufps    m%2, m%3, q1010
> +    shufps    m%4, m%3, q3232
> +    %else
> +    shufps    m%4, m%3, q1010

this too looks like the 2 alternatives are flipped
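
A minimal sketch of why the order matters for the shufps pair, assuming q1010 and q3232 are the usual x86inc.asm immediates (0x44 and 0xEE) and that %2 and %3 name the same register. Only the %ifidn %2, %3 branch is modeled, since the rest of the %else branch is not quoted here. The helpers q, shufps and sbutterfly_qdq_same are hypothetical names for this illustration, not part of x86util.asm.

```python
def q(a, b, c, d):
    # x86inc-style qABCD immediate: A selects lane 3, D selects lane 0.
    return (a << 6) | (b << 4) | (c << 2) | d


def shufps(dst, src, imm):
    # Lanes 0-1 are picked from dst, lanes 2-3 from src.
    return [dst[imm & 3], dst[(imm >> 2) & 3],
            src[(imm >> 4) & 3], src[(imm >> 6) & 3]]


def sbutterfly_qdq_same(value, low_first):
    """Model the %2 == %3 case with a separate temporary m%4.

    low_first=True follows the order in the patch's %ifidn branch:
        shufps m%2, m%3, q1010 ; shufps m%4, m%3, q3232
    low_first=False runs the same two instructions in the other order.
    Returns (m%2, m%4).
    """
    m2 = list(value)   # m%2 and m%3 are the same register
    m4 = list(m2)      # mova m%4, m%2
    if low_first:
        m2 = shufps(m2, m2, q(1, 0, 1, 0))  # also clobbers the "m%3" source
        m4 = shufps(m4, m2, q(3, 2, 3, 2))  # reads the clobbered register
    else:
        m4 = shufps(m4, m2, q(3, 2, 3, 2))  # source still intact
        m2 = shufps(m2, m2, q(1, 0, 1, 0))
    return m2, m4


as_patched = sbutterfly_qdq_same([0, 1, 2, 3], low_first=True)
reordered = sbutterfly_qdq_same([0, 1, 2, 3], low_first=False)

print(as_patched)  # ([0, 1, 0, 1], [2, 3, 0, 1])  high half is wrong
print(reordered)   # ([0, 1, 0, 1], [2, 3, 2, 3])  matches punpckhqdq semantics
```

With the high shuffle done first, m%4 is built from the still-intact source, which is what the identical-argument branch needs.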


[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus