[FFmpeg-devel] [PATCH] vp9: add x86 simd (sse2/ssse3) for iadst4 10bpp functions.
Henrik Gramner
henrik at gramner.com
Tue Oct 6 20:41:26 CEST 2015
On Tue, Oct 6, 2015 at 5:42 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> +cglobal vp9_%1_%3_4x4_add_10, 3, 3, 0, dst, stride, block, eob
[...]
> + mova m0, [blockq+0*16+0]
> + mova m4, [blockq+0*16+8]
> + mova m1, [blockq+1*16+0]
> + mova m5, [blockq+1*16+8]
> + packssdw m0, m4
> + packssdw m1, m5
> + mova m2, [blockq+2*16+0]
> + mova m4, [blockq+2*16+8]
> + mova m3, [blockq+3*16+0]
> + mova m5, [blockq+3*16+8]
> + packssdw m2, m4
> + packssdw m3, m5
Use packssdw with a memory argument as the second operand instead of loading into m4/m5 first.
The mixing of MMX and SSE is quite ugly in general, but whatever works.
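For reference, here is a minimal scalar C sketch of what the 64-bit (MMX) form of packssdw computes. It assumes, based on the 16-byte stride and the 32-to-16-bit pack in the quoted code, that the block holds 16 int32 coefficients. The names sat16, packssdw_mmx, pack_row and pack_block_4x4 are hypothetical. The second operand is only read, so taking it straight from the block gives the same result as loading it into m4/m5 first.

```c
#include <stdint.h>

/* Saturate a signed 32-bit value to the int16_t range, as packssdw does. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of the 64-bit (MMX) form "packssdw dst, src":
 * dst holds two dwords, src (a register or an 8-byte memory operand)
 * holds two more. The result is four saturated words, dst's first.
 * src is only read, which is why it can come directly from memory. */
static void packssdw_mmx(int16_t out[4], const int32_t dst[2],
                         const int32_t src[2])
{
    out[0] = sat16(dst[0]);
    out[1] = sat16(dst[1]);
    out[2] = sat16(src[0]);
    out[3] = sat16(src[1]);
}

/* Pack one 16-byte line of four int32 coefficients into four int16 words,
 * mirroring the suggested form:
 *     mova     m0, [blockq+r*16+0]   ; dwords 0..1 of line r
 *     packssdw m0, [blockq+r*16+8]   ; dwords 2..3 read from memory */
static void pack_row(int16_t out[4], const int32_t *row)
{
    packssdw_mmx(out, row, row + 2);   /* +8 bytes == +2 int32 */
}

/* Hypothetical helper: pack the whole 4x4 block, one line at a time. */
void pack_block_4x4(int16_t out[4][4], const int32_t block[16])
{
    for (int r = 0; r < 4; r++)
        pack_row(out[r], block + 4 * r);
}
```

The memory fold is free of alignment concerns in the MMX form. The legacy (non-VEX) SSE form of packssdw, by contrast, requires its 128-bit memory operand to be 16-byte aligned.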