[FFmpeg-devel] [PATCH] avfilter/vf_overlay: add x86 SIMD

Henrik Gramner henrik at gramner.com
Tue May 1 22:47:40 EEST 2018


On Tue, May 1, 2018 at 10:02 AM, Paul B Mahol <onemda at gmail.com> wrote:
> +cglobal overlay_row_22, 6, 8, 8, 0, d, da, s, a, w, al, r, x
[...]
> +        movu        m2, [aq+2*xq]
> +        pand        m2, m3
> +        movu        m6, [aq+2*xq]
> +        pand        m6, m7
> +        psrlw       m6, 8
> +        paddw       m2, m6
> +        psrlw       m2, 1
> +        movu        m6, [aq+2*xq]
> +        pand        m6, m3
> +        paddw       m2, m6
> +        psrlw       m2, 1

I believe this can be simplified to something like (untested):

    movu        m1, [aq+2*xq]
    pandn       m2, m3, m1
    psllw       m1, 8
    pavgw       m2, m1
    pavgw       m2, m1
    psrlw       m2, 8
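
Since the sequence above is marked untested, a scalar model of one 16-bit lane can be used to compare it against the quoted patch code. This is only a sketch: it assumes m3 holds 0x00ff and m7 holds 0xff00 in every word (the constants are not visible in the quoted hunk), and the function names are made up for illustration. Under those assumptions both versions reduce to floor((3*a0 + a1) / 4), where a0 is the low byte and a1 the high byte of the word.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: m3 = 0x00ff and m7 = 0xff00 in every word (not shown in the quote). */
#define MASK_LO 0x00ffu
#define MASK_HI 0xff00u

/* One 16-bit lane of the quoted patch sequence. */
uint16_t row22_patch(uint16_t a)
{
    uint16_t m2 = (uint16_t)(a & MASK_LO);         /* pand  m2, m3            */
    uint16_t m6 = (uint16_t)((a & MASK_HI) >> 8);  /* pand  m6, m7; psrlw 8   */
    m2 = (uint16_t)((m2 + m6) >> 1);               /* paddw; psrlw m2, 1      */
    m6 = (uint16_t)(a & MASK_LO);                  /* pand  m6, m3            */
    m2 = (uint16_t)((m2 + m6) >> 1);               /* paddw; psrlw m2, 1      */
    return m2;
}

/* pavgw on one lane: rounded average, computed without 16-bit overflow. */
uint16_t pavgw_lane(uint16_t x, uint16_t y)
{
    return (uint16_t)(((uint32_t)x + y + 1) >> 1);
}

/* One 16-bit lane of the suggested sequence. */
uint16_t row22_suggested(uint16_t a)
{
    uint16_t m1 = a;                               /* movu  m1, [aq+2*xq]     */
    uint16_t m2 = (uint16_t)(~MASK_LO & m1);       /* pandn m2, m3, m1        */
    m1 = (uint16_t)(m1 << 8);                      /* psllw m1, 8             */
    m2 = pavgw_lane(m2, m1);                       /* pavgw m2, m1            */
    m2 = pavgw_lane(m2, m1);                       /* pavgw m2, m1            */
    return (uint16_t)(m2 >> 8);                    /* psrlw m2, 8             */
}

/* Compare both versions for every possible input word; returns the number checked. */
unsigned verify_row22(void)
{
    unsigned checked = 0;
    for (uint32_t a = 0; a <= 0xffff; a++) {
        assert(row22_patch((uint16_t)a) == row22_suggested((uint16_t)a));
        checked++;
    }
    return checked;
}
```

Calling verify_row22() walks all 65536 input words. The rounding bit that pavgw adds ends up below the final 8-bit shift, so it should not change the result.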

> +cglobal overlay_row_20, 6, 8, 8, 0, d, da, s, a, w, al, r, x
[...]
> +        movu        m2, [aq+2*xq]
> +        pand        m2, m3
> +        movu        m6, [aq+2*xq]
> +        pand        m6, m7
> +        psrlw       m6, 8
> +        paddw       m2, m6
> +        movu        m6, [daq+2*xq]
> +        pand        m6, m3
> +        paddw       m2, m6
> +        movu        m6, [daq+2*xq]
> +        pand        m6, m7
> +        psrlw       m6, 8
> +        paddw       m2, m6
> +        psrlw       m2, 2

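And this can be simplified to the following (untested), using pmaddubsw
with a vector of byte ones to sum each pair of adjacent bytes into a word: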

    mova        m6, [pb_1]
...
    movu        m2, [aq+2*xq]
    movu        m1, [daq+2*xq]
    pmaddubsw   m2, m6
    pmaddubsw   m1, m6
    paddw       m2, m1
    psrlw       m2, 2
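
The same kind of scalar check works for this one. Again a sketch, with hypothetical function names and the assumption that m3 = 0x00ff and m7 = 0xff00 per word. pmaddubsw treats the bytes of its first operand as unsigned and those of the second as signed, multiplies them, and adds each adjacent pair with signed saturation; with pb_1 as the second operand the pair sum is at most 510, so saturation never triggers. Note that pmaddubsw needs SSSE3.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of the quoted patch: sum the four bytes, then divide by 4. */
uint16_t row20_patch(uint16_t a, uint16_t da)
{
    uint16_t m2 = (uint16_t)((a & 0x00ff) + (a >> 8));    /* pand/psrlw/paddw on a  */
    m2 = (uint16_t)(m2 + (da & 0x00ff) + (da >> 8));      /* same for da            */
    return (uint16_t)(m2 >> 2);                           /* psrlw m2, 2            */
}

/* pmaddubsw on one lane: unsigned bytes of x times signed bytes of y,
 * adjacent products added with signed 16-bit saturation. */
int16_t pmaddubsw_lane(uint16_t x, uint16_t y)
{
    int32_t lo  = (int32_t)(x & 0xff) * (int8_t)(y & 0xff);
    int32_t hi  = (int32_t)(x >> 8)   * (int8_t)(y >> 8);
    int32_t sum = lo + hi;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* One 16-bit lane of the suggested sequence; 0x0101 models one word of pb_1. */
uint16_t row20_suggested(uint16_t a, uint16_t da)
{
    int16_t m2 = pmaddubsw_lane(a, 0x0101);               /* pmaddubsw m2, m6       */
    int16_t m1 = pmaddubsw_lane(da, 0x0101);              /* pmaddubsw m1, m6       */
    uint16_t sum = (uint16_t)(m2 + m1);                   /* paddw m2, m1           */
    return (uint16_t)(sum >> 2);                          /* psrlw m2, 2            */
}

/* Compare both versions for every a and a strided sample of da
 * (a full 2^32 sweep would be slow); returns the number of pairs checked. */
unsigned verify_row20(void)
{
    unsigned checked = 0;
    for (uint32_t a = 0; a <= 0xffff; a++) {
        for (uint32_t da = 0; da <= 0xffff; da += 251) {
            assert(row20_patch((uint16_t)a, (uint16_t)da) ==
                   row20_suggested((uint16_t)a, (uint16_t)da));
            checked++;
        }
    }
    return checked;
}
```

The da stride of 251 is arbitrary; because the two loads are summed independently, covering every a against a spread of da values should be enough to catch a modelling mistake.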

