[FFmpeg-devel] [PATCH] vp9: add 12bpp sse2 versions of iadst4.
Henrik Gramner
henrik at gramner.com
Wed Oct 7 16:30:13 CEST 2015
On Wed, Oct 7, 2015 at 3:59 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> diff --git a/libavcodec/x86/vp9itxfm_16bpp.asm b/libavcodec/x86/vp9itxfm_16bpp.asm
> +%macro IADST4_12BPP_1D 0
> + pand m4, m0, [pd_3fff]
> + pand m5, m1, [pd_3fff]
> + psrad m0, 14
> + psrad m1, 14
> + packssdw m5, m1
> + packssdw m4, m0
> + punpckhwd m1, m4, m5
> + punpcklwd m4, m5
> + pand m5, m2, [pd_3fff]
> + pand m6, m3, [pd_3fff]
mova m6, [pd_3fff]
> + pmaddwd m7, m5, [pw_15212_9929]
> + pmaddwd m6, m4, [pw_5283_13377]
> + pmaddwd m2, m3, [pw_15212_9929]
> + pmaddwd m0, m1, [pw_5283_13377]
mova m2, [pw_15212_9929]
mova m0, [pw_5283_13377]
> + pmaddwd m7, m5, [pw_m13377_13377]
> + pmaddwd m2, m4, [pw_13377_0]
> + pmaddwd m8, m3, [pw_m13377_13377]
> + pmaddwd m9, m1, [pw_13377_0]
mova m8, [pw_m13377_13377]
mova m9, [pw_13377_0]
> + pmaddwd m7, m5, [pw_m5283_m15212]
> + pmaddwd m6, m4, [pw_9929_13377]
> + pmaddwd m8, m3, [pw_m5283_m15212]
> + pmaddwd m9, m1, [pw_9929_13377]
mova m8, [pw_m5283_m15212]
mova m9, [pw_9929_13377]
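For context on the quoted macro: pmaddwd only multiplies 16-bit words, but 12bpp coefficients need more than 16 bits. Each 32-bit value is therefore split into a low 14-bit part (pand with pd_3fff) and a high part (psrad by 14). The halves are then packed and interleaved so that each dword lane holds a word pair from two input rows, and each pair is multiplied by a constant pair built from the VP9 sinpi constants 5283, 9929, 13377 and 15212. Each constant pair is applied once to the low halves and once to the high halves, which is why the inline suggestions load it into a register first.

The scalar C sketch below shows why the split is exact. The rest of the macro is elided here, so it assumes that the low-half sums are rounded with 8192 and shifted by 14 before being added to the high-half sums. The function names are hypothetical and not taken from FFmpeg.

```c
#include <assert.h>
#include <stdint.h>

/* One dword lane of pmaddwd: multiply two word pairs and add the products. */
static int32_t madd_pair(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Reference: round((x0 * c0 + x1 * c1) / 2^14), computed in 64 bits. */
static int32_t dot_ref(int32_t x0, int32_t x1, int16_t c0, int16_t c1)
{
    int64_t s = (int64_t)x0 * c0 + (int64_t)x1 * c1;
    return (int32_t)((s + 8192) >> 14);
}

/* Hypothetical split version using only 16x16->32 multiplies. */
static int32_t dot_split(int32_t x0, int32_t x1, int16_t c0, int16_t c1)
{
    /* Keep the high parts within int16_t, so packssdw would not saturate. */
    assert(x0 >= -(1 << 29) && x0 < (1 << 29));
    assert(x1 >= -(1 << 29) && x1 < (1 << 29));

    /* Low 14 bits (pand with pd_3fff): always in [0, 16383]. */
    int16_t lo0 = (int16_t)(x0 & 0x3fff), lo1 = (int16_t)(x1 & 0x3fff);
    /* Remaining high bits (psrad 14); assumes >> is arithmetic, like psrad. */
    int16_t hi0 = (int16_t)(x0 >> 14), hi1 = (int16_t)(x1 >> 14);

    /* The same constant pair multiplies both halves. */
    int32_t lo = madd_pair(lo0, lo1, c0, c1);
    int32_t hi = madd_pair(hi0, hi1, c0, c1);

    /* hi * 2^14 is an exact multiple of 2^14, so only lo needs rounding. */
    return hi + ((lo + 8192) >> 14);
}
```

Since x = hi * 2^14 + lo, the sum x0 * c0 + x1 * c1 equals 2^14 * H + L, and (2^14 * H + L + 8192) >> 14 equals H + ((L + 8192) >> 14). The split and reference results therefore match for every input in range. The 2^29 bound in the sketch only marks where the high halves would stop fitting in int16_t. It is not taken from the patch.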
> +%macro IADST4_12BPP_FN 4
> +INIT_XMM sse2
I'd use INIT_* when invoking the macro instead, unless there's a reason not to.
> +cglobal vp9_%1_%3_4x4_add_12, 3, 3, 10, dst, stride, block, eob
[...]
> + paddd m0, [pd_8]
> + paddd m1, [pd_8]
> + paddd m2, [pd_8]
> + paddd m3, [pd_8]
> + psrad m0, 4
> + psrad m1, 4
> + psrad m2, 4
> + psrad m3, 4
Store [pd_8] in a register.
In general, SIMD code is usually not load-bound (and modern CPUs have
two load units), so loading the same value from memory several times
is fine. It's still often a good idea to do a single load into a
register when doing so reduces code size, since each reference to a
memory constant such as [pd_8] encodes a 32-bit address displacement.
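The quoted paddd [pd_8] / psrad 4 sequence is a round-to-nearest division by 16. Adding 8 (half of 2^4) before the arithmetic shift makes (x + 8) >> 4 equal to floor(x / 16 + 1/2). The scalar C sketch below models m0..m3 as four hypothetical rows of four int32 lanes and hoists the constant into a single variable, mirroring the suggestion above. A C model shows only the arithmetic, not the encoding-size saving itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the paddd [pd_8] / psrad 4 sequence applied
 * to m0..m3, each modeled as one row of four int32 lanes. */
static void round_shift_rows(int32_t rows[4][4])
{
    /* Load the rounding constant once and reuse it for all four rows,
     * mirroring "Store [pd_8] in a register". */
    const int32_t pd_8 = 8;

    for (int r = 0; r < 4; r++) {
        for (int i = 0; i < 4; i++) {
            /* paddd wraps on overflow, but signed overflow is undefined in
             * C, so guard against it instead of modeling the wraparound. */
            assert(rows[r][i] <= INT32_MAX - pd_8);
            /* (x + 8) >> 4 rounds x / 16 to nearest, ties toward +infinity;
             * assumes >> on negative values is arithmetic, like psrad. */
            rows[r][i] = (rows[r][i] + pd_8) >> 4;
        }
    }
}
```

For example, 24 / 16 = 1.5 rounds to 2 and -24 / 16 = -1.5 rounds to -1, because ties go toward positive infinity.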