[FFmpeg-devel] [patch][OpenHEVC]added ASM functions for epel + qpel
Pierre Edouard Lepere
Pierre-Edouard.Lepere at insa-rennes.fr
Thu Mar 6 16:40:04 CET 2014
New patch, now all in a single, smaller file!
>> + sub srcq, 1
>Why? Just subtract one from src when you dereference it ([srcq-1]
>instead of [srcq]).
Because it's more convenient: the filters then always start at src, whether we are in the h, v or hv case.
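To make the trade-off concrete, here is a minimal scalar C sketch (not the patch's code) of the two ways to address a 4-tap filter. The function names and the `step` parameter are hypothetical; `step` would be 1 for the h case and the stride for the v case. The taps are the half-sample HEVC chroma coefficients, used here only as sample data.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-sample HEVC chroma (EPEL) taps, used only as sample data. */
static const int8_t taps[4] = { -4, 36, 36, -4 };

/* Variant A: the offset is folded into every access,
 * comparable to addressing [srcq-1] at each load. */
static int filter_inline_offset(const uint8_t *src, ptrdiff_t step)
{
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += taps[i] * src[(i - 1) * step];
    return sum;
}

/* Variant B: the caller adjusts the pointer once beforehand,
 * comparable to "sub srcq, 1", so the taps start at src itself. */
static int filter_from_start(const uint8_t *src, ptrdiff_t step)
{
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += taps[i] * src[i * step];
    return sum;
}
```

Calling `filter_from_start(p - step, step)` gives the same sum as `filter_inline_offset(p, step)`; the only difference is where the adjustment is paid for, once up front or in every address computation.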
> + EPEL_FILTER 8, mx
> + LOOP_INIT epel_h_h_2_8
>Labels with a dot are local and thus don't need the full function name as scope;
>just .loop instead of epel_h_h_2_8 is fine.
> + EPEL_LOAD 8, src, 1
> + EPEL_COMPUTE 8, 2
> + PEL_STORE2 dst, m0, m1
> + LOOP_END epel_h_h_2_8, dst, dststride, src, srcstride
> + RET
>OK, so on to the actual code. For fun, can you show the _actual disassembly_
>that all these macros eventually expand to? I wonder what they actually produce.
>I can understand the pmaddwd approach for second pass may be faster for
>half-registers, since you fill the register up to full width and save one
>instruction - but did you measure it?
>Then, for second, you're just spending instructions shuffling. I don't
>think 2a is faster than 2b, in fact I expect it to be significantly slower.
This was first done with intrinsics, where pmulhw was needed, so it simply adds too many instructions.
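For readers following the pmaddwd versus pmulhw point, the sketch below is a scalar C model of what the two instruction sequences compute on one 128-bit register. It is an illustration of the arithmetic only, not the patch's code and not a measurement; the function names are hypothetical, and the real instruction count depends on how the data is laid out in the registers.

```c
#include <stdint.h>

/* Scalar model of pmaddwd: eight signed 16-bit lanes are multiplied
 * pairwise and adjacent products are summed into four 32-bit lanes. */
static void model_pmaddwd(const int16_t a[8], const int16_t b[8],
                          int32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)a[2 * i] * b[2 * i]
               + (int32_t)a[2 * i + 1] * b[2 * i + 1];
}

/* Scalar model of the pmullw/pmulhw route: each 32-bit product is
 * obtained as separate low and high 16-bit halves, which then have to
 * be interleaved back together before the products can be added. */
static void model_mul_lo_hi(const int16_t a[8], const int16_t b[8],
                            int32_t out[4])
{
    int32_t wide[8];
    for (int i = 0; i < 8; i++) {
        int32_t p = (int32_t)a[i] * b[i];
        uint16_t lo = (uint16_t)p;                     /* pmullw result */
        uint16_t hi = (uint16_t)((uint32_t)p >> 16);   /* pmulhw result */
        wide[i] = (int32_t)(((uint32_t)hi << 16) | lo); /* punpck[lh]wd */
    }
    /* Sum adjacent products so the output layout matches pmaddwd. */
    for (int i = 0; i < 4; i++)
        out[i] = wide[2 * i] + wide[2 * i + 1];
}
```

Both models produce the same 32-bit sums; the second simply needs more steps to get there, which is the overhead described above.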
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 76675 bytes
Desc: not available