[FFmpeg-devel] [PATCH 3/4] x86/vvcdec: inter, add optical flow avx2 code
Nuo Mi
nuomi2021 at gmail.com
Tue Aug 20 16:25:22 EEST 2024
On Sun, Aug 18, 2024 at 11:18 AM James Almer <jamrial at gmail.com> wrote:
> On 8/17/2024 10:48 PM, Nuo Mi wrote:
> > + pxor m6, m6
> > + phaddw m%2, m6
> > + phaddw m%2, m6
>
> Horizontal adds are slow. Can't you do this with normal adds, shifts and
> blend?
>
> > + vpermq m%2, m%2, q0020
> > + pshufd m%2, m%2, q1120
> > + pmovsxwd m%2, xmm%2 ; 4 sgxgy
> > +
> > + pmulld m%2, m11 ; 4 vx * sgxgy
>
Hi James,
Thank you for the review.
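
For reference, here is a rough scalar C model of one way to read the "adds and shifts" suggestion. Two phaddw passes against zero leave the sum of each group of four adjacent words; the same sum can be built by shifting the packed value and adding lane-wise. The helper names (pack4, paddw64, hsum4_shift_add) are made up for illustration and are not from the patch, and the final mask or blend that isolates word 0, as well as the vpermq/pshufd lane gathering, is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four signed words into one 64-bit value, word 0 in the low bits. */
static uint64_t pack4(const int16_t a[4])
{
    uint64_t x = 0;
    for (int i = 0; i < 4; i++)
        x |= (uint64_t)(uint16_t)a[i] << (16 * i);
    return x;
}

/* Lane-wise wrapping 16-bit add on four packed words (models paddw). */
static uint64_t paddw64(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t s = (uint16_t)((a >> (16 * i)) + (b >> (16 * i)));
        r |= (uint64_t)s << (16 * i);
    }
    return r;
}

/* Sum of all four words, left in word 0, using shifts and adds only. */
static int16_t hsum4_shift_add(uint64_t x)
{
    x = paddw64(x, x >> 16); /* word 0 = a0 + a1, word 2 = a2 + a3 */
    x = paddw64(x, x >> 32); /* word 0 = a0 + a1 + a2 + a3 */
    return (int16_t)(uint16_t)x;
}

/* Usage: compare against the plain sum of the four words. */
static void hsum4_selfcheck(void)
{
    const int16_t a[4] = { 1, -2, 300, -4 };
    assert(hsum4_shift_add(pack4(a)) == 295);
}
```

The other words hold partial sums after the two steps, which is why the real code would still need a mask or blend before using the result.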
> Similarly, pmulld is super slow (Ten cycles in some architectures), and
> that's on top of a pmovsx.
>
Fixed in v2.
> Since you have m6 zeroed already, wouldn't pmaddwd work here?
Fixed.
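
As a sanity check on the idea, here is a small scalar C model of why pmaddwd can stand in for pmovsxwd followed by pmulld. It is only a sketch and it rests on two assumptions that the thread does not confirm: each source word is interleaved with a zero word (for example by unpacking against the zeroed m6), and vx fits in a signed 16-bit word. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* One dword lane of pmaddwd: multiply two pairs of signed words, add the products. */
static int32_t pmaddwd_lane(int16_t a_lo, int16_t a_hi, int16_t b_lo, int16_t b_hi)
{
    int64_t s = (int64_t)a_lo * b_lo + (int64_t)a_hi * b_hi;
    return (int32_t)(uint32_t)s; /* keep the low 32 bits, as the instruction does */
}

/* pmovsxwd + pmulld path: sign-extend the word, keep the low 32 bits of the product. */
static int32_t widen_then_mul(int16_t w, int32_t vx)
{
    return (int32_t)((uint32_t)(int32_t)w * (uint32_t)vx);
}

/* pmaddwd path: the word is paired with a zero word, so the second product vanishes. */
static int32_t madd_with_zero(int16_t w, int32_t vx)
{
    assert(vx >= INT16_MIN && vx <= INT16_MAX); /* assumed precondition */
    return pmaddwd_lane(w, 0, (int16_t)vx, (int16_t)vx);
}
```

For any word w and any vx within the signed 16-bit range, madd_with_zero(w, vx) and widen_then_mul(w, vx) return the same dword, so a single multiply-add covers both the sign extension and the 32-bit multiply.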
> The pd_15
> and pd_m15 constants would need to be changed to words, as would the
> values to be clipped.
>
We are clipping dwords here, not words.
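
To make the point concrete: pmaddwd produces dword lanes, so the value that reaches the clip is already 32 bits wide. A minimal scalar sketch of a dword clip follows, assuming pd_m15 and pd_15 hold -15 and 15 as their names suggest and that the clip is a signed max followed by a signed min (for instance pmaxsd and pminsd); neither detail is shown in the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values of the pd_m15 / pd_15 constants, inferred from their names. */
enum { CLIP_LO = -15, CLIP_HI = 15 };

/* Clip one signed dword lane to [CLIP_LO, CLIP_HI]. */
static int32_t clip_dword(int32_t v)
{
    v = v < CLIP_LO ? CLIP_LO : v;  /* signed max with the lower bound */
    return v > CLIP_HI ? CLIP_HI : v; /* signed min with the upper bound */
}

/* Usage: products well outside the word range still clip correctly. */
static void clip_selfcheck(void)
{
    assert(clip_dword(100000) == 15);
    assert(clip_dword(-100000) == -15);
    assert(clip_dword(-7) == -7);
}
```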
>
> > + psrad m%2, 1
>