[FFmpeg-devel] [PATCH 3/4] x86/vvcdec: inter, add optical flow avx2 code

Tue Aug 20 16:25:22 EEST 2024

On Sun, Aug 18, 2024 at 11:18 AM James Almer <jamrial at gmail.com> wrote:

> On 8/17/2024 10:48 PM, Nuo Mi wrote:
> > +    pxor                    m6, m6
> > +    phaddw                 m%2, m6
> > +    phaddw                 m%2, m6
>
> Horizontal adds are slow. Can't you do this with normal adds, shifts and
> blend?
>
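For reference, a minimal scalar sketch in C of one possible reading of
this suggestion. It models a single 128-bit lane as eight 16-bit words.
The function names are hypothetical and the shift amounts (16 and 32
bits within each qword) are an assumption, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "phaddw x, zero" applied twice on one 128-bit lane.
 * Words 0 and 1 end up holding x[0..3] and x[4..7] summed. */
static void hadd_twice_with_zero(const uint16_t x[8], uint16_t out[8])
{
    uint16_t t[8] = {0};
    for (int i = 0; i < 4; i++)
        t[i] = (uint16_t)(x[2 * i] + x[2 * i + 1]);   /* first phaddw */
    for (int i = 0; i < 8; i++)
        out[i] = 0;
    for (int i = 0; i < 2; i++)
        out[i] = (uint16_t)(t[2 * i] + t[2 * i + 1]); /* second phaddw */
}

/* Model of psrlq 16 + paddw, then psrlq 32 + paddw.
 * Word 0 of each qword ends up holding the sum of that qword's 4 words. */
static void shift_add(const uint16_t x[8], uint16_t out[8])
{
    uint16_t t[8];
    for (int q = 0; q < 2; q++)
        for (int i = 0; i < 4; i++)
            t[4 * q + i] = (uint16_t)(x[4 * q + i] +
                                      (i + 1 < 4 ? x[4 * q + i + 1] : 0));
    for (int q = 0; q < 2; q++)
        for (int i = 0; i < 4; i++)
            out[4 * q + i] = (uint16_t)(t[4 * q + i] +
                                        (i + 2 < 4 ? t[4 * q + i + 2] : 0));
}

/* Both variants produce the same two sums, only in different word slots. */
static void check_sums_match(const uint16_t x[8])
{
    uint16_t a[8], b[8];
    hadd_twice_with_zero(x, a);
    shift_add(x, b);
    assert(a[0] == b[0]);
    assert(a[1] == b[4]);
}
```

In this model the shift-and-add variant leaves the sums in words 0 and 4
rather than words 0 and 1, so a blend or shuffle would still be needed
to gather them.
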
> > +    vpermq                 m%2, m%2, q0020
> > +    pshufd                 m%2, m%2, q1120
> > +    pmovsxwd               m%2, xmm%2               ; 4 sgxgy
> > +
> > +    pmulld                 m%2, m11                 ; 4 vx * sgxgy
>
Hi James,
Thank you for the review.

> Similarly, pmulld is super slow (ten cycles on some architectures), and
> that's on top of a pmovsx.
>
Fixed in v2.

> Since you have m6 zeroed already, wouldn't pmaddwd work here?

Fixed.
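
To illustrate why pmaddwd can stand in for pmovsxwd + pmulld, here is a
minimal scalar sketch in C of a single dword lane. The function names
are hypothetical. It assumes vx fits in a signed word and that the odd
word of the first operand is zero (for example after interleaving with
the zeroed m6). Neither assumption is confirmed by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* One dword lane of pmaddwd: two signed 16x16 products, summed.
 * The single wraparound case (all four inputs -32768) is ignored here. */
static int32_t pmaddwd_lane(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* One lane of pmovsxwd followed by pmulld, assuming both inputs are words. */
static int32_t sext_mul_lane(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* With the odd word zeroed, pmaddwd reduces to a plain widening multiply. */
static void check_pmaddwd_matches(int16_t sgxgy, int16_t vx)
{
    assert(pmaddwd_lane(sgxgy, 0, vx, vx) == sext_mul_lane(sgxgy, vx));
}
```

The second product contributes nothing because its first factor is
zero, so the value in the odd word of the other operand does not matter.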

> The pd_15
> and pd_m15 constants would need to be changed to words, as would the
> values to be clipped.
>
We are clipping dwords, not words.
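
A minimal scalar sketch in C of why the width matters, assuming clip
bounds of -15 and 15 (inferred only from the pd_m15 and pd_15 constant
names) and hypothetical function names. It compares clipping the full
dword against clipping after plain truncation to the low word. It does
not cover a saturating narrowing, which would behave differently.

```c
#include <assert.h>
#include <stdint.h>

/* Clip at dword width, as pmaxsd + pminsd would per lane. */
static int32_t clip_dword(int32_t v, int32_t lo, int32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Signed value of the low 16 bits of v (plain truncation, no saturation). */
static int32_t low_word(int32_t v)
{
    uint16_t u = (uint16_t)v;
    return u >= 0x8000 ? (int32_t)u - 0x10000 : (int32_t)u;
}

/* Clip after truncating to a word first. */
static int32_t clip_truncated_word(int32_t v, int32_t lo, int32_t hi)
{
    return clip_dword(low_word(v), lo, hi);
}

/* The two agree whenever the value already fits in a signed word. */
static void check_agree_in_word_range(int16_t v)
{
    assert(clip_dword(v, -15, 15) == clip_truncated_word(v, -15, 15));
}
```

For an input such as 65530, which does not fit in a signed word, the
dword clip gives 15 while the truncated version sees -6 and leaves it
unchanged.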

>
> > +    psrad                  m%2, 1
>