[FFmpeg-devel] [patch][OpenHEVC]added ASM functions for epel + qpel

Wed Mar 5 11:11:24 CET 2014

Hi,

>> > +%macro PEL_STORE2 3
>> +    movd           [%1q], %2
>> +%endmacro
>> +%macro PEL_STORE4 3
>> +    movq           [%1q], %2
>> +%endmacro
>> +%macro PEL_STORE6 3
>> +    movq           [%1q], %2
>> +    psrldq            %2, 8
>> +    movd         [%1q+8], %2
>> +%endmacro
>> +%macro PEL_STORE8 3
>> +    movdqa         [%1q], %2
>> +%endmacro
>> +%macro PEL_STORE12 3
>> +    movdqa         [%1q], %2
>> +    movq        [%1q+16], %3
>> +%endmacro
>> +%macro PEL_STORE16 3
>> +    PEL_STORE8        %1, %2, %3
>> +    movdqa      [%1q+16], %3
>> +%endmacro
>
>Most of these macros don't use %3, so make it take 2 arguments? Also I
>wonder if you really need macros for some of this...

this is to be able to use PEL_STORE%1, which can be from 2 (needing only 2 arguments) to 16 (needing 3 argmuents).

>> +%macro MC_PIXEL_COMPUTE16_8 0
>> +    MC_PIXEL_COMPUTE2_8
>> +    punpckhbw         m2, m0, m15
>> +    psllw             m2, 6
>> +%endmacro
>> +
>> +%macro MC_PIXEL_COMPUTE2_10 0
>> +    psllw             m1, m0, 4
>> +%endmacro
>
>So, these are only used for full-pixel MC, right? I think for these kind of
>situations, you can see that splitting MC and weighting in two functions is
>just not a good idea. I'm going to guess that the large majority of calls
>doesn't actually use weighting in practice, so you're effectively (for e.g.
>a still) unpacking to words, just to pack back to bytes afterwards.
>
>I'd strongly advise to look into merging weighting and MC together. I
>understand it's more code, but most of your code here is boilerplate and I
>feel it can be simplified a lot further. A lot of the extra code is just
>"stuff" that's not really needed.

Can this be done in another patch ? I'm going to look into it, but I'd like to have this version of ASM optimizations validated first.

>> +INIT_XMM sse4                                    ; adds ff_ and _sse4 to
>function name
>
>This looks ssse3? Are you sure it's sse4?

some are SSSE3, but others don't work in SSSE3. I need to try again on my non-SSE4 machine.

>> +cglobal hevc_put_hevc_pel_pixels2_8, 8, 14, 0 , dst, dststride, src,
>srcstride,width,height
>> +    LOOP_INIT        pixels_2_8
>> +    movdqu            m0, [srcq]            ; load data from source
>
>movd, or even pinsrw, not movdqu. Same for other functions below (w=4 =>
>movd, w=6/8 => movq).

I've done the changes, but doesn't that make things too specific ?

>I don't really think you're taking advantage of the bigger width here; this
>sequence of instructions is essentially identical to if you had a C wrapper
>for all functions larger than w=16. Make your macros more configurable and
>load directly from [srcq+16] instead of the intermediate leas, that also
>avoids needing an extra register for tsrcq/tdstq (and all instructions to
>set stuff like that up).
>
>Or just use a C wrapper and avoid all this extra cruft in the assembly. I
>think for fullpel MC, it's fine to do fullwidth in assembly, but do use it
>to its fullest extent.

I think it's more about the limit of 128 registers. Once I can use AVX2, I'll be able to use bigger steps. what do you suggest in fullpel MC to use the assembly "to the fullest extent" ?
I've also tried adding the shift into the macro, but to no avail. :/

Thanks for the feedback

-Pierre-Edouard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-added-ASM-functions-for-HEVC.patch
Type: text/x-patch
Size: 85359 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20140305/7b5ae9f3/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-changed-type-for-mx-my-from-int-to-intptr_t-for-bett.patch
Type: text/x-patch
Size: 55398 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20140305/7b5ae9f3/attachment-0001.bin>