[FFmpeg-devel] [patch][OpenHEVC]added ASM functions for epel + qpel
Ronald S. Bultje
rsbultje at gmail.com
Mon Mar 3 19:32:41 CET 2014
On Mon, Mar 3, 2014 at 12:44 PM, Christophe Gisquet <
christophe.gisquet at gmail.com> wrote:
> 2014-03-03 15:23 GMT+01:00 Pierre Edouard Lepere
> <Pierre-Edouard.Lepere at insa-rennes.fr>:
> > here's a new version of the patches. The first one did not change, but
> the second changed by adding macros, diminishing substantially the code.
> I don't understand why you need to shuffle the input pixels with
> SBUTTERFLY. Anyway, I feel uncomfortable being the only reviewer when
> my reviews take like 2 minutes and are far from thorough.
> Also, one last trick would be to use pmulhrw to perform the
> rounding+shift in one instruction but that's, again, not something
> worth more wait.
I'll try to have a look tonight. From memory, the punpcklbw is to merge one
level of adds with the pmaddubsw, like this:
memory1: times 4 db coef1, coef2
memory2: times 4 db coef3, coef4
memory3: times 4 db coef5, coef6
memory4: times 4 db coef7, coef8
(load 8 pixels in m0-7 (movq), and then:)
punpcklbw m0, m1
punpcklbw m2, m3
punpcklbw m4, m5
punpcklbw m6, m7
pmaddubsw m0, [memory1] (or load [memoryX] in some register and use that)
pmaddubsw m2, [memory2]
pmaddubsw m4, [memory3]
pmaddubsw m6, [memory4]
paddw m0, m2
paddw m4, m6
paddw m0, m4
[round, shift, store]
And then use SBUTTERFLY bw, .. if loading 16 pixels instead of 8 (and
appropriate doubling of pmaddubsw/paddw calls).
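To spell out what that sequence computes per output pixel, here is a scalar C sketch. It is only an illustration under my own assumptions: the function names and the src/stride layout (tap k of pixel x at src[k * stride + x]) are made up for the example and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp to int16_t, as pmaddubsw does for each word result. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One word lane of pmaddubsw: two unsigned bytes (pixels) times two
 * signed bytes (coefficients), added with signed saturation. */
static int16_t maddubs_lane(uint8_t p0, uint8_t p1, int8_t c0, int8_t c1)
{
    return sat16((int32_t)p0 * c0 + (int32_t)p1 * c1);
}

/* One word lane of paddw: 16-bit add that wraps instead of saturating. */
static int16_t addw_lane(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}

/* Hypothetical model of the 8-tap sequence for n <= 8 output pixels.
 * punpcklbw puts (tap 2k, tap 2k+1) of each pixel side by side, so a
 * single pmaddubsw already performs the first level of adds. */
void filter8_model(const uint8_t *src, ptrdiff_t stride,
                   const int8_t coef[8], int16_t *dst, int n)
{
    assert(n >= 0 && n <= 8);
    for (int x = 0; x < n; x++) {
        const uint8_t *p = src + x;
        int16_t s01 = maddubs_lane(p[0 * stride], p[1 * stride], coef[0], coef[1]);
        int16_t s23 = maddubs_lane(p[2 * stride], p[3 * stride], coef[2], coef[3]);
        int16_t s45 = maddubs_lane(p[4 * stride], p[5 * stride], coef[4], coef[5]);
        int16_t s67 = maddubs_lane(p[6 * stride], p[7 * stride], coef[6], coef[7]);
        /* paddw m0, m2 / paddw m4, m6 / paddw m0, m4 */
        dst[x] = addw_lane(addw_lane(s01, s23), addw_lane(s45, s67));
    }
}
```

With the taps -1, 4, -11, 40, 40, -11, 4, -1 (which sum to 64) and a flat input of 100, every output word is 6400 before the round/shift step.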
I agree the final round/shift should be done using pmulhrsw and an
appropriate constant if that is feasible. I also noticed you're doing psllw
mx, 6 after some punpcklbws. Why?
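On the pmulhrsw idea, a scalar C model of why a single multiply can replace the add-rounding-constant plus psraw pair. This is a sketch only: the helper names are hypothetical, and the shift of 6 in the usage note is an assumption for illustration, not something I checked against the patch.

```c
#include <assert.h>
#include <stdint.h>

/* One word lane of pmulhrsw: ((a * b >> 14) + 1) >> 1.
 * Relies on >> of a negative int being an arithmetic shift, which is
 * what the usual compilers do (implementation-defined in ISO C). */
static int16_t mulhrs_lane(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * b;
    return (int16_t)(((prod >> 14) + 1) >> 1);
}

/* Hypothetical helper: (x + (1 << (shift - 1))) >> shift expressed as
 * one pmulhrsw with the constant 1 << (15 - shift).
 * Valid for shift in 1..15; shift 0 would need 32768, which does not
 * fit in a signed word. */
int16_t round_shift_mulhrs(int16_t x, int shift)
{
    assert(shift >= 1 && shift <= 15);
    return mulhrs_lane(x, (int16_t)(1 << (15 - shift)));
}
```

For a shift of 6 the constant is 512, and round_shift_mulhrs(x, 6) matches (x + 32) >> 6 for every 16-bit x, so the separate paddw with the rounding constant goes away.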