[FFmpeg-devel] [PATCH] vp9: add 12bpp sse2 versions of iadst4.

Ronald S. Bultje rsbultje at gmail.com
Mon Oct 12 16:25:01 CEST 2015


Hi,

On Wed, Oct 7, 2015 at 10:30 AM, Henrik Gramner <henrik at gramner.com> wrote:

> On Wed, Oct 7, 2015 at 3:59 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> > diff --git a/libavcodec/x86/vp9itxfm_16bpp.asm b/libavcodec/x86/vp9itxfm_16bpp.asm
>
> > +%macro IADST4_12BPP_1D 0
> > +    pand                m4, m0, [pd_3fff]
> > +    pand                m5, m1, [pd_3fff]
> > +    psrad               m0, 14
> > +    psrad               m1, 14
> > +    packssdw            m5, m1
> > +    packssdw            m4, m0
> > +    punpckhwd           m1, m4, m5
> > +    punpcklwd           m4, m5
> > +    pand                m5, m2, [pd_3fff]
> > +    pand                m6, m3, [pd_3fff]
>
> mova m6, [pd_3fff]
>
> > +    pmaddwd             m7, m5, [pw_15212_9929]
> > +    pmaddwd             m6, m4, [pw_5283_13377]
> > +    pmaddwd             m2, m3, [pw_15212_9929]
> > +    pmaddwd             m0, m1, [pw_5283_13377]
>
> mova m2, [pw_15212_9929]
> mova m0, [pw_5283_13377]
>
> > +    pmaddwd             m7, m5, [pw_m13377_13377]
> > +    pmaddwd             m2, m4, [pw_13377_0]
> > +    pmaddwd             m8, m3, [pw_m13377_13377]
> > +    pmaddwd             m9, m1, [pw_13377_0]
>
> mova m8, [pw_m13377_13377]
> mova m9, [pw_13377_0]
>
> > +    pmaddwd             m7, m5, [pw_m5283_m15212]
> > +    pmaddwd             m6, m4, [pw_9929_13377]
> > +    pmaddwd             m8, m3, [pw_m5283_m15212]
> > +    pmaddwd             m9, m1, [pw_9929_13377]
>
> mova m8, [pw_m5283_m15212]
> mova m9, [pw_9929_13377]
>

All done.
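For readers following the quoted IADST4_12BPP_1D excerpt: each 32-bit coefficient is split into its low 14 bits (pand with pd_3fff) and the remaining high part (psrad 14). Both halves are packed to words and pairs of inputs are interleaved with punpck*wd, then multiplied by word-pair constants with pmaddwd. The constants 5283, 9929, 13377 and 15212 in the pw_* tables are VP9's iadst4 sine constants, which are 14-bit fixed point. Presumably the split is needed because 12bpp coefficients no longer fit in the signed 16-bit operands pmaddwd takes. What follows is a minimal scalar C model of that arithmetic, not code from the patch. The helper names are hypothetical, and the excerpt does not show how the patch recombines the two halves. The model only demonstrates that splitting at 14 bits lets the high half's products skip the rounding shift entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane of SSE2 pmaddwd:
 * a0*b0 + a1*b1 with signed 16-bit inputs and a 32-bit sum. */
static int32_t pmaddwd_lane(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Low 14 bits (pand with 0x3fff): always in [0, 16383], so it fits in int16. */
static int16_t lo14(int32_t x)
{
    return (int16_t)(x & 0x3fff);
}

/* High part (psrad 14). Assumes >> on a negative int32_t is an arithmetic
 * shift, as on GCC/Clang/MSVC. Inputs are restricted so the result fits
 * in int16 (packssdw would saturate otherwise). */
static int16_t hi14(int32_t x)
{
    assert(x >= -(1 << 29) && x < (1 << 29));
    return (int16_t)(x >> 14);
}

/* Reference: (x0*c0 + x1*c1), rounded and shifted right by 14, in 64-bit. */
static int32_t ref_round14(int32_t x0, int32_t x1, int16_t c0, int16_t c1)
{
    int64_t sum = (int64_t)x0 * c0 + (int64_t)x1 * c1;
    return (int32_t)((sum + (1 << 13)) >> 14);
}

/* Same result using only 16x16->32 multiplies. Since x = hi*2^14 + lo,
 * the full sum equals hi_sum*2^14 + lo_sum, and hi_sum*2^14 is an exact
 * multiple of 2^14, so only lo_sum needs the rounding shift. */
static int32_t split_round14(int32_t x0, int32_t x1, int16_t c0, int16_t c1)
{
    int32_t lo_sum = pmaddwd_lane(lo14(x0), lo14(x1), c0, c1);
    int32_t hi_sum = pmaddwd_lane(hi14(x0), hi14(x1), c0, c1);
    return hi_sum + ((lo_sum + (1 << 13)) >> 14);
}
```

For example, split_round14(300000, -123457, 15212, 9929) and ref_round14(300000, -123457, 15212, 9929) should agree even though both inputs are outside the int16 range.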

> > +%macro IADST4_12BPP_FN 4
> > +INIT_XMM sse2
>
> I'd use INIT_* when invoking the macro instead, unless there's a reason not to.
>

Done also.

> > +cglobal vp9_%1_%3_4x4_add_12, 3, 3, 10, dst, stride, block, eob
> [...]
> > +    paddd               m0, [pd_8]
> > +    paddd               m1, [pd_8]
> > +    paddd               m2, [pd_8]
> > +    paddd               m3, [pd_8]
> > +    psrad               m0, 4
> > +    psrad               m1, 4
> > +    psrad               m2, 4
> > +    psrad               m3, 4
>
> Store [pd_8] in a register.


I did this, but in a separate patch that does it for all of the code at
once. Most of this code gets touched again in later patches, and rebasing
became painful, so doing it at the end seemed easier. Hope that's OK.
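The quoted epilogue adds [pd_8] and then applies psrad by 4. Each lane therefore becomes (x + 8) >> 4, which is x / 16 rounded to nearest, with ties rounded up. Below is a minimal scalar C model of that step over four "registers" of four int32 lanes each. The function name and the local pd_8 constant are hypothetical. It assumes >> on a negative int32_t is an arithmetic shift, matching psrad.

```c
#include <stdint.h>

/* Scalar model of the quoted epilogue: four registers (m0..m3) of four
 * int32 lanes each, every lane computed as (x + 8) >> 4, i.e. x / 16
 * rounded to nearest with ties rounded up. Assumes >> on a negative
 * int32_t is an arithmetic shift, matching psrad. */
static void round_shift4(int32_t m[4][4])
{
    /* The bias is set up once and reused for every register, the scalar
     * analogue of keeping [pd_8] in a register instead of repeating the
     * memory operand in each paddd. */
    const int32_t pd_8 = 8;
    for (int r = 0; r < 4; r++)
        for (int lane = 0; lane < 4; lane++)
            m[r][lane] = (m[r][lane] + pd_8) >> 4;
}
```

Note that -8 (exactly -0.5 after scaling) rounds to 0, while -9 rounds to -1. The tie-breaking follows from the floor semantics of the arithmetic shift.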

Ronald

