[FFmpeg-devel] [PATCH] avfilter/vf_overlay: add x86 SIMD
Henrik Gramner
henrik at gramner.com
Tue May 1 22:47:40 EEST 2018
On Tue, May 1, 2018 at 10:02 AM, Paul B Mahol <onemda at gmail.com> wrote:
> +cglobal overlay_row_22, 6, 8, 8, 0, d, da, s, a, w, al, r, x
[...]
> + movu m2, [aq+2*xq]
> + pand m2, m3
> + movu m6, [aq+2*xq]
> + pand m6, m7
> + psrlw m6, 8
> + paddw m2, m6
> + psrlw m2, 1
> + movu m6, [aq+2*xq]
> + pand m6, m3
> + paddw m2, m6
> + psrlw m2, 1
I believe this can be simplified to something like (untested):
movu m1, [aq+2*xq]
pandn m2, m3, m1
psllw m1, 8
pavgw m2, m1
pavgw m2, m1
psrlw m2, 8
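
To sanity-check the sequence above without assembling anything, here is a rough scalar C model of a single 16-bit lane. It assumes m3 holds 0x00ff and m7 holds 0xff00 in every word (the constants are not visible in the quoted hunk), and the function names are made up for illustration. Under that assumption both sequences reduce to (3*lo + hi) >> 2, so the two extra +1 roundings from pavgw should vanish once the low 8 bits are shifted out.

```c
#include <stdint.h>

/* pavgw on one 16-bit lane: unsigned average, rounded up,
 * computed with a wider intermediate so it cannot overflow. */
static uint16_t pavgw_lane(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a + (uint32_t)b + 1) >> 1);
}

/* Sequence from the patch, assuming m3 = 0x00ff and m7 = 0xff00. */
static uint16_t patch_lane_22(uint16_t w)
{
    uint16_t lo = w & 0x00ff;                 /* pand m2, m3            */
    uint16_t hi = (uint16_t)((w & 0xff00) >> 8); /* pand m6, m7; psrlw 8 */
    uint16_t m2 = (uint16_t)((lo + hi) >> 1); /* paddw; psrlw m2, 1     */
    return (uint16_t)((m2 + lo) >> 1);        /* paddw; psrlw m2, 1     */
}

/* Suggested sequence, same assumption about m3. */
static uint16_t suggested_lane_22(uint16_t w)
{
    uint16_t m1 = w;                          /* movu  m1, [aq+2*xq]    */
    uint16_t m2 = (uint16_t)(~0x00ff & m1);   /* pandn m2, m3, m1       */
    m1 = (uint16_t)(m1 << 8);                 /* psllw m1, 8            */
    m2 = pavgw_lane(m2, m1);                  /* pavgw m2, m1           */
    m2 = pavgw_lane(m2, m1);                  /* pavgw m2, m1           */
    return (uint16_t)(m2 >> 8);               /* psrlw m2, 8            */
}

/* Exhaustively compare both sequences over every possible input word. */
static int count_mismatches_22(void)
{
    int mismatches = 0;
    for (uint32_t w = 0; w <= 0xffff; w++)
        if (patch_lane_22((uint16_t)w) != suggested_lane_22((uint16_t)w))
            mismatches++;
    return mismatches;
}
```

Calling count_mismatches_22() should return 0 in this model. That only covers the arithmetic of one lane; it says nothing about register allocation or how the result is consumed later in the loop.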
> +cglobal overlay_row_20, 6, 8, 8, 0, d, da, s, a, w, al, r, x
[...]
> + movu m2, [aq+2*xq]
> + pand m2, m3
> + movu m6, [aq+2*xq]
> + pand m6, m7
> + psrlw m6, 8
> + paddw m2, m6
> + movu m6, [daq+2*xq]
> + pand m6, m3
> + paddw m2, m6
> + movu m6, [daq+2*xq]
> + pand m6, m7
> + psrlw m6, 8
> + paddw m2, m6
> + psrlw m2, 2
And this can be simplified to (untested):
mova m6, [pb_1]
...
movu m2, [aq+2*xq]
movu m1, [daq+2*xq]
pmaddubsw m2, m6
pmaddubsw m1, m6
paddw m2, m1
psrlw m2, 2
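
The pmaddubsw trick works because multiplying each unsigned byte by a signed 1 and adding adjacent pairs gives the horizontal byte sum per word directly. Below is a rough scalar C model of one 16-bit lane, again assuming m3 = 0x00ff and m7 = 0xff00 for the patch version, and pb_1 being 0x01 in every byte. The function names are invented for illustration.

```c
#include <stdint.h>

/* Interpret a byte value 0..255 as a signed 8-bit integer. */
static int as_signed_byte(int b)
{
    return b > 127 ? b - 256 : b;
}

/* pmaddubsw on one 16-bit lane: bytes of u are unsigned, bytes of s
 * are signed; the two products are added with signed saturation. */
static int16_t pmaddubsw_lane(uint16_t u, uint16_t s)
{
    int32_t sum = (int32_t)(u & 0xff) * as_signed_byte(s & 0xff)
                + (int32_t)(u >> 8)   * as_signed_byte(s >> 8);
    if (sum >  32767) sum =  32767;
    if (sum < -32768) sum = -32768;
    return (int16_t)sum;
}

/* Sequence from the patch, assuming m3 = 0x00ff and m7 = 0xff00. */
static uint16_t patch_lane_20(uint16_t a, uint16_t da)
{
    uint16_t m2 = a & 0x00ff;                      /* pand m2, m3         */
    m2 = (uint16_t)(m2 + ((a & 0xff00) >> 8));     /* pand m7; psrlw 8    */
    m2 = (uint16_t)(m2 + (da & 0x00ff));           /* pand m6, m3; paddw  */
    m2 = (uint16_t)(m2 + ((da & 0xff00) >> 8));    /* pand m7; psrlw 8    */
    return (uint16_t)(m2 >> 2);                    /* psrlw m2, 2         */
}

/* Suggested sequence, assuming pb_1 is 0x01 in every byte. */
static uint16_t suggested_lane_20(uint16_t a, uint16_t da)
{
    const uint16_t pb_1 = 0x0101;
    uint16_t m2 = (uint16_t)pmaddubsw_lane(a,  pb_1); /* pmaddubsw m2, m6 */
    uint16_t m1 = (uint16_t)pmaddubsw_lane(da, pb_1); /* pmaddubsw m1, m6 */
    m2 = (uint16_t)(m2 + m1);                         /* paddw m2, m1     */
    return (uint16_t)(m2 >> 2);                       /* psrlw m2, 2      */
}

/* Compare both sequences for every a word against a strided set of
 * da words (step 257 visits 0x0000, 0x0101, ..., 0xffff). */
static int count_mismatches_20(void)
{
    int mismatches = 0;
    for (uint32_t a = 0; a <= 0xffff; a++)
        for (uint32_t da = 0; da <= 0xffff; da += 257)
            if (patch_lane_20((uint16_t)a, (uint16_t)da) !=
                suggested_lane_20((uint16_t)a, (uint16_t)da))
                mismatches++;
    return mismatches;
}
```

In this model count_mismatches_20() should return 0. Each pair sum is at most 510, so the signed saturation in pmaddubsw never triggers when the multiplier is 1.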