[FFmpeg-devel] [patch][OpenHEVC]added ASM functions for epel + qpel

Christophe Gisquet christophe.gisquet at gmail.com
Sat Mar 1 08:11:48 CET 2014


2014-02-28 15:24 GMT+01:00 Pierre Edouard Lepere
<Pierre-Edouard.Lepere at insa-rennes.fr>:
> here are 2 patches for the HEVC decoder :
> 1) changes in the C for epel and qpel. it is now possible to have fixed-width functions for each epel/qpel function.
> 2) adding ASM files. each function has a fixed width and has its loop unrolled.

A very cursory look from me.

You now have arrays that avoid unpacking the coefficients. Good.
Please put an "align 16" (32 for avx2?) on the line before
hevc_epel_filters_asm_8 to guarantee the coeffs addresses are aligned.
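
To illustrate why the alignment matters, here is a minimal C sketch of the same idea. It is not taken from the patch: sketch_coeffs, is_aligned16 and sketch_coeff_row are hypothetical names, and the coefficient values are illustrative only. It assumes a C11 compiler.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical stand-in for a table such as hevc_epel_filters_asm_8.
 * The values are illustrative only. Each row is 16 bytes, so aligning
 * the table start to 16 bytes also aligns every row. */
static alignas(16) const int8_t sketch_coeffs[2][16] = {
    { -2, 58, -2, 58, -2, 58, -2, 58, -2, 58, -2, 58, -2, 58, -2, 58 },
    { 10, -2, 10, -2, 10, -2, 10, -2, 10, -2, 10, -2, 10, -2, 10, -2 },
};

/* Returns 1 when p sits on a 16-byte boundary, which is what an
 * aligned load such as movdqa requires. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Returns one row of coefficients, checking the alignment guarantee. */
static const int8_t *sketch_coeff_row(unsigned idx)
{
    const int8_t *row;

    assert(idx < 2);
    row = sketch_coeffs[idx];
    assert(is_aligned16(row));
    return row;
}
```

Without the alignas (or the "align 16" in the asm file), the table may still happen to land on a 16-byte boundary, but nothing guarantees it.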

A next step would be to do (eg in QPEL_FILTER) something like:
%if ARCH_X86_64
movdqa           m12, [rfilterq + %2q + 16]
%define COEFFS23 m12
%else
%define COEFFS23 [rfilterq + %2q + 16]
%endif
But someone caring for 32bits systems may do that in your stead.

I also see you doing a lot of movdqu m?, [%2q+N] with -4<N<5. I think
this qualifies for SSSE3's palignr but this might need some
benchmarking to validate.
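
As a rough illustration of the palignr idea, here is a scalar C model (plain C, not SIMD code, and not from the patch). It assumes the two 16-byte blocks covering the window have been loaded once, then derives each shifted window from them instead of doing one unaligned load per offset. palignr_model and windows_match are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of PALIGNR: treat hi:lo as one 32-byte value (lo holds
 * bytes 0..15), shift it right by `shift` bytes, keep the low 16. */
static void palignr_model(uint8_t dst[16], const uint8_t hi[16],
                          const uint8_t lo[16], unsigned shift)
{
    uint8_t cat[32];

    assert(shift <= 16);
    memcpy(cat, lo, 16);
    memcpy(cat + 16, hi, 16);
    memcpy(dst, cat + shift, 16);
}

/* Returns 1 if, for each of the 8 offsets, the window derived from the
 * two block loads equals a direct 16-byte read at src + n. */
static int windows_match(const uint8_t src[32])
{
    unsigned n;

    for (n = 0; n < 8; n++) {
        uint8_t win[16];

        palignr_model(win, src + 16, src, n);
        if (memcmp(win, src + n, 16) != 0)
            return 0;
    }
    return 1;
}
```

Whether two loads plus several palignr beat eight movdqu depends on the CPU, hence the need for benchmarking.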

And that's the final comment. I don't know how you validate your
changes beyond correctness, but it is nice to provide timings to compare
before/after. If you decide to do that, include "libavutil/timer.h"
and add {START_TIMER and STOP_TIMER("some name")} around the
benchmarked function, run the program and check the decicycles
reported. It may require some logging flag on the command-line for
the timings to show up.
Make sure your CPU does not {under,over}clock across measurements by
setting an appropriate power profile.
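
The measurement pattern described above can be sketched in self-contained C as follows. This does not use libavutil: SKETCH_START_TIMER and SKETCH_STOP_TIMER are hypothetical stand-ins built on the standard clock() function, only meant to show where the real START_TIMER and STOP_TIMER would go. sketch_filter_row is a made-up function under test with illustrative coefficients.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Accumulated measurements for one benchmarked call site. */
typedef struct {
    clock_t  total;     /* summed clock() ticks */
    unsigned runs;      /* number of timed calls */
    long     checksum;  /* keeps the compiler from dropping the work */
} sketch_timer;

/* Hypothetical stand-ins for START_TIMER / STOP_TIMER("name"). */
#define SKETCH_START_TIMER clock_t sketch_t0 = clock();
#define SKETCH_STOP_TIMER(t) \
    do { (t)->total += clock() - sketch_t0; (t)->runs++; } while (0)

/* Made-up function under test: a 4-tap filter over one row. */
static void sketch_filter_row(int16_t *dst, const uint8_t *src, int width)
{
    static const int8_t c[4] = { -4, 36, 36, -4 };
    int x;

    for (x = 0; x < width; x++)
        dst[x] = (int16_t)(c[0] * src[x]     + c[1] * src[x + 1] +
                           c[2] * src[x + 2] + c[3] * src[x + 3]);
}

/* Times `iterations` calls and reports the average on stderr. */
static sketch_timer sketch_bench(unsigned iterations)
{
    sketch_timer t = { 0, 0, 0 };
    uint8_t src[64 + 3];
    int16_t dst[64];
    unsigned i;

    for (i = 0; i < sizeof(src); i++)
        src[i] = (uint8_t)i;

    for (i = 0; i < iterations; i++) {
        /* The braces of the loop body scope the timer variable. */
        SKETCH_START_TIMER
        sketch_filter_row(dst, src, 64);
        SKETCH_STOP_TIMER(&t);
        t.checksum += dst[0];
    }
    if (t.runs)
        fprintf(stderr, "%u runs, %.3f us average\n", t.runs,
                (double)t.total / CLOCKS_PER_SEC * 1e6 / t.runs);
    return t;
}
```

clock() is far coarser than a cycle counter, so this only shows the structure; for real numbers, use the macros from libavutil/timer.h inside the FFmpeg tree and compare the reported figures before and after the patch.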

