[FFmpeg-devel] [PATCH 3/4] x86/vvcdec: inter, add optical flow avx2 code
James Almer
jamrial at gmail.com
Sun Aug 18 06:19:09 EEST 2024
On 8/17/2024 10:48 PM, Nuo Mi wrote:
> + pxor m6, m6
> + phaddw m%2, m6
> + phaddw m%2, m6
Horizontal adds are slow. Can't you do this with normal adds, shifts and
a blend?
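
For illustration, a scalar C model of that idea (a sketch only, not the
patch's code). It assumes the goal of the two phaddw steps against a zeroed
register is the wrapping 16-bit sum of each group of four words, and it
ignores the vpermq/pshufd shuffling that follows. The helper names
phaddw_zero, paddw64, pack4 and sum4_shift_add are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of "phaddw d, zero" on one 128-bit lane: adjacent pairs of d are
 * summed into words 0..3; words 4..7 come from the zero operand. */
static void phaddw_zero(uint16_t d[8])
{
    uint16_t r[8] = {0};
    for (int i = 0; i < 4; i++)
        r[i] = (uint16_t)(d[2 * i] + d[2 * i + 1]);
    memcpy(d, r, sizeof(r));
}

/* Model of paddw on one 64-bit chunk: four independent wrapping adds. */
static uint64_t paddw64(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t s = (uint16_t)((a >> (16 * i)) + (b >> (16 * i)));
        r |= (uint64_t)s << (16 * i);
    }
    return r;
}

/* Pack four words into one 64-bit chunk, word 0 in the low bits. */
static uint64_t pack4(const uint16_t w[4])
{
    return (uint64_t)w[0] | (uint64_t)w[1] << 16 |
           (uint64_t)w[2] << 32 | (uint64_t)w[3] << 48;
}

/* Shift-and-add reduction: word 0 ends up holding w0+w1+w2+w3. The other
 * words hold partial sums that a later blend or mask would discard. */
static uint16_t sum4_shift_add(uint64_t x)
{
    x = paddw64(x, x >> 16); /* like psrlq 16 + paddw: word0 = w0+w1 */
    x = paddw64(x, x >> 32); /* like psrlq 32 + paddw: word0 = w0+..+w3 */
    return (uint16_t)x;
}
```

Both routes leave the same sum in word 0 of each 64-bit group. Whether the
shift/add/blend sequence is actually faster would have to be measured.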
> + vpermq m%2, m%2, q0020
> + pshufd m%2, m%2, q1120
> + pmovsxwd m%2, xmm%2 ; 4 sgxgy
> +
> + pmulld m%2, m11 ; 4 vx * sgxgy
Similarly, pmulld is very slow (ten cycles on some architectures), and
that's on top of a pmovsx.
Since you have m6 zeroed already, wouldn't pmaddwd work here? The pd_15
and pd_m15 constants would need to be changed to words, as would the
values to be clipped.
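
A scalar C model of why this can work (a sketch, not the patch's code).
pmaddwd multiplies adjacent signed word pairs and adds the two products into
one dword, so if every second word of one operand is zero, each dword is just
a sign-extended 16x16 multiply. The model assumes the sgxgy words are
interleaved with zeros (for example by unpacking against the zeroed m6) and
that vx has been clipped to a range that fits in a word; [-15, 15] is inferred
from the pd_15/pd_m15 names. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One dword lane of pmaddwd: a0*b0 + a1*b1 on signed words, kept to the
 * low 32 bits (only the all -32768 case actually wraps). */
static int32_t pmaddwd_lane(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    int64_t sum = (int64_t)a0 * b0 + (int64_t)a1 * b1;
    return (int32_t)(uint32_t)sum;
}

/* One dword lane of the current sequence: pmovsxwd sign-extends the word,
 * then pmulld keeps the low 32 bits of the 32x32 product. */
static int32_t pmovsxwd_pmulld_lane(int16_t s, int32_t vx)
{
    int64_t prod = (int64_t)s * vx;
    return (int32_t)(uint32_t)prod;
}
```

With a1 = 0, pmaddwd_lane(s, 0, vx, b1) equals pmovsxwd_pmulld_lane(s, vx)
for any word-sized vx, whatever b1 holds, which is why the constants and the
clipped values would have to become words.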
> + psrad m%2, 1