[FFmpeg-devel] [PATCH 1/5] x264asm: extend SBUTTERFLY to support SSE1
Michael Niedermayer
michaelni at gmx.at
Sun Apr 7 22:51:16 CEST 2013
On Sun, Apr 07, 2013 at 08:20:30PM +0000, Christophe Gisquet wrote:
> This was discussed as an alternative to manipulating instructions directly.
> This version also fixes the case where %2 == %3.
> ---
> libavutil/x86/x86util.asm | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
> index 8908444..07ca768 100644
> --- a/libavutil/x86/x86util.asm
> +++ b/libavutil/x86/x86util.asm
> @@ -30,10 +30,18 @@
> %include "libavutil/x86/x86inc.asm"
>
> %macro SBUTTERFLY 4
> -%if avx_enabled == 0
> +%if notcpuflag(sse2) && mmsize == 16
> + %ifidn %1, dq
> + mova m%4, m%2
> + unpckhps m%4, m%3
> + unpcklps m%2, m%3
> + %else
> + %error Only dq unpack is supported by SBUTTERFLY on SSE1
> + %endif
> +%elif avx_enabled == 0
> mova m%4, m%2
> - punpckl%1 m%2, m%3
> punpckh%1 m%4, m%3
> + punpckl%1 m%2, m%3
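To make the two claims in the patch concrete, here is a small scalar C model of the dword case of SBUTTERFLY. It checks that the reordered sequence handles %2 == %3, and it shows why the SSE1 branch is limited to dq: unpcklps/unpckhps interleave 32-bit elements, which is bit-for-bit the same shuffle as punpckldq/punpckhdq. This is only a sketch. The `xmm` struct and all function names are hypothetical stand-ins for registers and instructions, not FFmpeg code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a 128-bit register holding four dwords. */
typedef struct { uint32_t d[4]; } xmm;

/* Models punpckldq / unpcklps: interleave the low two dwords of dst and src.
 * The result is built in a temporary first, so dst == src behaves like the
 * real instruction (all inputs are read before the destination is written). */
void unpack_lo_dq(xmm *dst, const xmm *src)
{
    xmm r = { { dst->d[0], src->d[0], dst->d[1], src->d[1] } };
    *dst = r;
}

/* Models punpckhdq / unpckhps: interleave the high two dwords. */
void unpack_hi_dq(xmm *dst, const xmm *src)
{
    xmm r = { { dst->d[2], src->d[2], dst->d[3], src->d[3] } };
    *dst = r;
}

/* Order before the patch: low unpack first, then high unpack. */
void sbutterfly_dq_old(xmm *a, const xmm *b, xmm *t)
{
    *t = *a;            /* mova     m%4, m%2                              */
    unpack_lo_dq(a, b); /* punpckl  m%2, m%3: clobbers b when %2 == %3     */
    unpack_hi_dq(t, b); /* punpckh  m%4, m%3: may read the clobbered value */
}

/* Order after the patch: high unpack first, so m%3 is still intact. */
void sbutterfly_dq_new(xmm *a, const xmm *b, xmm *t)
{
    *t = *a;            /* mova     m%4, m%2 */
    unpack_hi_dq(t, b); /* punpckh  m%4, m%3 */
    unpack_lo_dq(a, b); /* punpckl  m%2, m%3 */
}

/* Self-check of the aliased case (%2 == %3). */
void check_aliasing(void)
{
    xmm a = { { 0, 1, 2, 3 } }, t;
    sbutterfly_dq_new(&a, &a, &t);
    assert(t.d[0] == 2 && t.d[1] == 2 && t.d[2] == 3 && t.d[3] == 3);
    assert(a.d[0] == 0 && a.d[1] == 0 && a.d[2] == 1 && a.d[3] == 1);

    xmm c = { { 0, 1, 2, 3 } }, u;
    sbutterfly_dq_old(&c, &c, &u);
    /* Old order: the high half is {2,1,3,1} instead of {2,2,3,3}. */
    assert(u.d[1] == 1 && u.d[3] == 1);
}
```

With distinct registers both orders give the same result. The ordering only matters for correctness when the source operand aliases the first destination.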

Before this change, the mova and the first punpckl%1 could execute at the
same time, or the two punpck could execute at the same time after the mova.
After the patch, the first punpck (now punpckh%1) has to wait for the mova,
because it reads m%4.
So maybe this should be benchmarked to make sure it has no negative
effects; SBUTTERFLY is used a lot.
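The scheduling point above can be sketched as a read-after-write (RAW) check between the mova and the instruction that follows it, in both orders. This is a toy model under stated assumptions: the `insn` struct, the register strings, and `raw_dep` are hypothetical, and only true (RAW) dependencies are considered. Program order matters most on in-order cores. An out-of-order core may reorder the two punpck anyway, which is why a benchmark is the real test.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a two-operand x86 instruction "op dst, src".
 * punpck* also read dst; mova only reads src. */
typedef struct {
    const char *name;
    const char *dst;
    const char *src;
    int reads_dst; /* 1 for punpck*, 0 for mova */
} insn;

/* Returns 1 if b reads a register that a writes (read-after-write). */
int raw_dep(const insn *a, const insn *b)
{
    if (strcmp(a->dst, b->src) == 0)
        return 1;
    if (b->reads_dst && strcmp(a->dst, b->dst) == 0)
        return 1;
    return 0;
}

/* SBUTTERFLY with %2 = m2, %3 = m3, %4 = m4, before and after the patch. */
const insn old_order[3] = {
    { "mova",    "m4", "m2", 0 },
    { "punpckl", "m2", "m3", 1 },
    { "punpckh", "m4", "m3", 1 },
};
const insn new_order[3] = {
    { "mova",    "m4", "m2", 0 },
    { "punpckh", "m4", "m3", 1 },
    { "punpckl", "m2", "m3", 1 },
};

/* 1 if the instruction right after the mova must wait for it. */
int next_insn_waits(const insn seq[3])
{
    return raw_dep(&seq[0], &seq[1]);
}

void check_orders(void)
{
    assert(next_insn_waits(old_order) == 0); /* punpckl can pair with mova */
    assert(next_insn_waits(new_order) == 1); /* punpckh stalls on m4 */
}
```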
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
When the tyrant has disposed of foreign enemies by conquest or treaty, and
there is nothing more to fear from them, then he is always stirring up
some war or other, in order that the people may require a leader. -- Plato