[FFmpeg-devel] [patch][OpenHEVC]added ASM functions for epel + qpel
Ronald S. Bultje
rsbultje at gmail.com
Sat Mar 1 13:44:41 CET 2014
On Sat, Mar 1, 2014 at 2:11 AM, Christophe Gisquet <christophe.gisquet at gmail.com> wrote:
> 2014-02-28 15:24 GMT+01:00 Pierre Edouard Lepere
> <Pierre-Edouard.Lepere at insa-rennes.fr>:
> > Here are 2 patches for the HEVC decoder:
> > 1) Changes in the C code for epel and qpel. It is now possible to have
> > fixed-width functions for each epel/qpel function.
> > 2) adding ASM files. each function has a fixed width and has its loop
> A very cursory look from me.
> You now have arrays that avoid unpacking the coefficients. Good.
> Please put an "align 16" (32 for avx2?) on the line before
> hevc_epel_filters_asm_8 to guarantee the coeffs addresses are aligned.
> A next step would be to do (e.g. in QPEL_FILTER) something like:
> %if ARCH_X86_64
>     movdqa m12, [rfilterq + %2q + 16]
>     %define COEFFS23 m12
> %else
>     %define COEFFS23 [rfilterq + %2q + 16]
> %endif
> But someone caring for 32-bit systems may do that in your stead.
> I also see you doing a lot of movdqu m?, [%2q+N] with -4<N<5. I think
> this qualifies for SSSE3's palignr but this might need some
> benchmarking to validate.
The multi-movu approach is also used in vp8/vp9, and I confirmed it to be
faster than palignr on Arrandale. I would be interested in explicit numbers
on other CPUs.
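
For readers comparing the two approaches, here is a minimal scalar sketch in C of what each one computes. It only models the data movement and says nothing about speed, which still needs benchmarking per CPU. The helper names (load_unaligned16, palignr_model, windows_match) are hypothetical and do not come from the patch. In real code the two operations correspond to movdqu (_mm_loadu_si128) and palignr (_mm_alignr_epi8).

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a 16-byte unaligned load, i.e. what one movdqu does. */
static void load_unaligned16(uint8_t dst[16], const uint8_t *src)
{
    memcpy(dst, src, 16);
}

/*
 * Scalar model of palignr: concatenate hi:lo (lo in the low 16 bytes),
 * shift right by `shift` bytes and keep the low 16 bytes.
 * `shift` must be in the range 0..16 for this model.
 */
static void palignr_model(uint8_t dst[16], const uint8_t hi[16],
                          const uint8_t lo[16], unsigned shift)
{
    uint8_t cat[32];

    memcpy(cat, lo, 16);
    memcpy(cat + 16, hi, 16);
    memcpy(dst, cat + shift, 16);
}

/*
 * Check that, for every byte offset n in 0..16, loading 16 bytes at
 * row + n gives the same result as combining the two adjacent 16-byte
 * blocks with the palignr model. `row` must hold at least 32 bytes.
 * Returns 1 if all windows match, 0 otherwise.
 */
static int windows_match(const uint8_t *row)
{
    uint8_t lo[16], hi[16], direct[16], combined[16];
    unsigned n;

    load_unaligned16(lo, row);
    load_unaligned16(hi, row + 16);

    for (n = 0; n <= 16; n++) {
        load_unaligned16(direct, row + n);   /* "multi-movu" style   */
        palignr_model(combined, hi, lo, n);  /* two loads + palignr  */
        if (memcmp(direct, combined, 16) != 0)
            return 0;
    }
    return 1;
}
```

The trade-off under discussion is therefore one load per window (multi-movu) against two loads total plus one shuffle-type instruction per window (palignr); both yield the same bytes.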