[FFmpeg-devel] [PATCH 4/4] avfilter/vf_v360: x86 SIMD for interpolations

Henrik Gramner henrik at gramner.com
Wed Sep 4 23:47:14 EEST 2019


On Wed, Sep 4, 2019 at 10:01 PM James Almer <jamrial at gmail.com> wrote:
> On 9/4/2019 4:28 PM, Paul B Mahol wrote:
> > +        vpmulld          m3, m1, m0
> > +        vpaddd           m1, m3, m2
>
> pmulld m1, m0
> paddd  m1, m2

You could also use pmaddwd instead; it's faster than pmulld on pretty
much every CPU.
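
To illustrate the idea, here is a scalar C model of a single 32-bit lane
of pmaddwd. It assumes the multiply and add above compute something like
y * stride + x and that all three values fit in a signed 16-bit word.
Both are assumptions, and pmaddwd_lane() and offset_madd() are
hypothetical names, not anything in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane of pmaddwd: treat each operand as two
 * signed 16-bit words, multiply them pairwise, and add the two products. */
static int32_t pmaddwd_lane(uint32_t a, uint32_t b)
{
    int16_t a_lo = (int16_t)(a & 0xffff), a_hi = (int16_t)(a >> 16);
    int16_t b_lo = (int16_t)(b & 0xffff), b_hi = (int16_t)(b >> 16);
    /* Add as unsigned so the single wrapping case behaves like the hardware. */
    uint32_t sum = (uint32_t)((int32_t)a_lo * b_lo) +
                   (uint32_t)((int32_t)a_hi * b_hi);
    return (int32_t)sum;
}

/* Hypothetical helper: y * stride + x in one pmaddwd-style step.
 * (x, y) go in the two words of one operand, (1, stride) in the other. */
static int32_t offset_madd(int x, int y, int stride)
{
    /* Assumption: everything fits in a non-negative signed 16-bit word. */
    assert(x >= 0 && x <= INT16_MAX);
    assert(y >= 0 && y <= INT16_MAX);
    assert(stride >= 0 && stride <= INT16_MAX);

    uint32_t coords = (uint32_t)x | ((uint32_t)y << 16);
    uint32_t scale  = 1u | ((uint32_t)stride << 16);
    return pmaddwd_lane(coords, scale);
}
```

With x = 3, y = 2 and stride = 1920, offset_madd() returns
2 * 1920 + 3 = 3843, so the multiply and the add collapse into one
operation. If any of the values can exceed 16 bits this does not apply.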

> > +        mova             m2, m4
>
> Pointless mova. Just use m4 in the vpgatherdd below.

No, it's required. Gathers overwrite the mask register.
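
A rough scalar C model of what vpgatherdd does with its mask operand, to
show why the copy is needed. vpgatherdd_model() is a made-up name, and the
model ignores faults and partial completion. It only covers the 8-lane
dword case with a scale of 1, as in the quoted code.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of: vpgatherdd ymm_dst, [base + ymm_idx], ymm_mask
 * Lanes whose mask sign bit is set are loaded from base + idx[i] (byte
 * offsets); other lanes of dst are left unchanged. Every mask lane is
 * zero once the instruction has completed. */
static void vpgatherdd_model(int32_t dst[8], const uint8_t *base,
                             const int32_t idx[8], uint32_t mask[8])
{
    for (int i = 0; i < 8; i++) {
        if (mask[i] & 0x80000000u)   /* only the sign bit selects a lane */
            memcpy(&dst[i], base + idx[i], sizeof(dst[i]));
        mask[i] = 0;                 /* the mask register is consumed */
    }
}
```

After one call the mask is all zeros, so a second gather with the same
register would load nothing. That is why the all-ones mask in m4 has to be
copied to m2 before each gather.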

> > +        vpgatherdd       m5, [srcq + m1], m2
> > +        vextracti128    xm3, m5, 1
> > +        vpshufb          m1, m5, m6
> > +        vpshufb          m2, m3, m6
>
> You could make these two pshufb use xmm regs, since you don't care
> what's in the upper 128 bits.

Or a single ymm pshufb before the vextracti128.
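
A scalar C sketch of why that works: a ymm vpshufb shuffles each 128-bit
lane independently, so one shuffle before the extract gives the same two
halves as two shuffles after it. This assumes m6 holds the same shuffle
pattern in both lanes, which the quoted hunk doesn't show. The function
names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of pshufb on one 128-bit lane: each control byte picks a
 * source byte by its low 4 bits, or yields zero if its top bit is set. */
static void pshufb_lane(uint8_t dst[16], const uint8_t src[16],
                        const uint8_t ctl[16])
{
    uint8_t tmp[16];                 /* temporary so dst may alias src */
    for (int i = 0; i < 16; i++)
        tmp[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0f];
    memcpy(dst, tmp, sizeof(tmp));
}

/* Model of a ymm vpshufb: the two lanes are shuffled separately, each by
 * the matching lane of the control register. No byte crosses lanes. */
static void vpshufb_ymm(uint8_t dst[32], const uint8_t src[32],
                        const uint8_t ctl[32])
{
    pshufb_lane(dst,      src,      ctl);
    pshufb_lane(dst + 16, src + 16, ctl + 16);
}
```

With an identical pattern in both lanes of ctl, the low half of
vpshufb_ymm() matches shuffling the low lane on its own, and the high half
matches extracting the high lane first and then shuffling it, so the
second pshufb is not needed.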

