[FFmpeg-devel] avfilter/x86/vf_blend : add avx2 version for 8b func (WIP)

Martin Vignali martin.vignali at gmail.com
Thu Dec 14 12:16:54 EET 2017


2017-12-13 17:37 GMT+01:00 Henrik Gramner <henrik at gramner.com>:

> On Sat, Dec 9, 2017 at 1:11 PM, Martin Vignali <martin.vignali at gmail.com>
> wrote:
> > the idea in AVX2 is to load 128 bits of data (2x 64 bits),
> > then shuffle across lanes so that the two 64-bit halves end up in the
> > low part of each lane, to keep the rest of the process similar
> > to the sse version
>
> What about using pmovzxbw instead of movu + vpermq + punpcklbw?
>

You're right, this is faster (tested on grainextract, the first function
that uses intermediate 16-bit processing):

vpermq load

grainextract_c: 22162.2
grainextract_sse2: 1160.9
grainextract_avx2: 1154.2


vpmovzxbw load

grainextract_c: 22165.7
grainextract_sse2: 1155.7
grainextract_avx2: 772.9
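
For readers following the two load variants compared above, here is a minimal sketch in C intrinsics rather than the x86inc assembly used by the patch. The function names are hypothetical, the vpermq immediate 0xD8 (qword order 0,2,1,3) is an assumption about how the shuffle was written, and the `target` attribute and `__builtin_cpu_supports` are GCC/Clang specific. It only shows that both sequences yield the same 16 zero-extended words; it says nothing about speed.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

#define AVX2 __attribute__((target("avx2")))

/* movu + vpermq + punpcklbw: load 16 bytes, move the second qword to the
 * low half of the upper lane, then interleave with zero within each lane. */
AVX2 static void load_permute_unpack(const uint8_t *src, uint16_t *dst)
{
    __m256i v = _mm256_castsi128_si256(_mm_loadu_si128((const __m128i *)src));
    v = _mm256_permute4x64_epi64(v, 0xD8);   /* assumed qword order 0,2,1,3 */
    v = _mm256_unpacklo_epi8(v, _mm256_setzero_si256());
    _mm256_storeu_si256((__m256i *)dst, v);
}

/* vpmovzxbw: zero-extend 16 bytes straight into 16 words. */
AVX2 static void load_pmovzxbw(const uint8_t *src, uint16_t *dst)
{
    __m256i v = _mm256_cvtepu8_epi16(_mm_loadu_si128((const __m128i *)src));
    _mm256_storeu_si256((__m256i *)dst, v);
}

/* Returns 1 when both paths agree, or when the CPU has no AVX2. */
static int load_paths_agree(void)
{
    uint8_t src[16];
    uint16_t a[16], b[16];

    if (!__builtin_cpu_supports("avx2"))
        return 1;
    for (int i = 0; i < 16; i++)
        src[i] = (uint8_t)(i * 17);          /* covers 0..255 */
    load_permute_unpack(src, a);
    load_pmovzxbw(src, b);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

The second variant replaces three instructions with one, which is consistent with the direction of the avx2 numbers above.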


>
> > for the store, the idea is similar in the opposite way (shuffle before
> > store)
>
> You could also do vextracti128 + 128-bit packuswb instead of 256-bit
> packuswb + vpermq.
>
>
Sorry, I don't understand this part.
Do you mean a 128-bit packuswb + movh for each lane,
or something else?
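
For reference, one literal reading of the quoted suggestion can be sketched in C intrinsics as below. This is an assumption about what was meant, not something confirmed in this thread. The function names are hypothetical, the vpermq immediate 0xD8 is assumed, and the `target` attribute and `__builtin_cpu_supports` are GCC/Clang specific.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

#define AVX2 __attribute__((target("avx2")))

/* 256-bit packuswb + vpermq: pack within each lane, then gather the two
 * useful qwords into the low 128 bits before the store. */
AVX2 static void store_pack_permute(const uint16_t *src, uint8_t *dst)
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    v = _mm256_packus_epi16(v, v);
    v = _mm256_permute4x64_epi64(v, 0xD8);   /* assumed qword order 0,2,1,3 */
    _mm_storeu_si128((__m128i *)dst, _mm256_castsi256_si128(v));
}

/* vextracti128 + 128-bit packuswb: pack the low and high halves directly. */
AVX2 static void store_extract_pack(const uint16_t *src, uint8_t *dst)
{
    __m256i v  = _mm256_loadu_si256((const __m256i *)src);
    __m128i lo = _mm256_castsi256_si128(v);
    __m128i hi = _mm256_extracti128_si256(v, 1);
    _mm_storeu_si128((__m128i *)dst, _mm_packus_epi16(lo, hi));
}

/* Returns 1 when both paths agree, or when the CPU has no AVX2. */
static int store_paths_agree(void)
{
    uint16_t src[16];
    uint8_t a[16], b[16];

    if (!__builtin_cpu_supports("avx2"))
        return 1;
    for (int i = 0; i < 16; i++)
        src[i] = (uint16_t)(i * 40);         /* values above 255 saturate */
    src[15] = 0xFFFF;                        /* -1 as a signed word, clamps to 0 */
    store_pack_permute(src, a);
    store_extract_pack(src, b);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

In this reading a single 128-bit packuswb takes the low half and the extracted high half as its two operands, so no per-lane movh and no cross-lane shuffle is needed after the pack.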

Martin

