[FFmpeg-devel] [PATCH 1/2] x86: move horizonal add macros to x86util
Henrik Gramner
henrik at gramner.com
Sat Apr 12 09:58:51 CEST 2014
On Sat, Apr 12, 2014 at 1:45 AM, James Almer <jamrial at gmail.com> wrote:
> On 11/04/14 8:14 PM, Ronald S. Bultje wrote:
>> Hi
>>
>> On Fri, Apr 11, 2014 at 7:00 PM, James Almer <jamrial at gmail.com> wrote:
>>
>>> Also port relevant AVX2/XOP optimizations from x264
>>>
>>
>> Did you get permission from them to relicense to LGPL? I know it's trivial
>> code, but better safe than sorry.
>
> No. Since we were importing changes from x264's x86inc/util when they were useful
> I assumed it was ok.
>
> I wrote the HADDD xop optimization, but not the AVX2 and HADDW xop ones. I can
> remove those two if Henrik and Jason are against this.
> I'm CCing them in any case.
>
>>
>>> +%macro HADDD 2 ; sum junk
>>> +%if sizeof%1 == 32
>>> +%define %2 xmm%2
>>> + vextracti128 %2, %1, 1
>>> +%define %1 xmm%1
>>> + paddd %1, %2
>>> +%endif
>>> +%if mmsize >= 16
>>> +%if cpuflag(xop) && sizeof%1 == 16
>>> + vphadddq %1, %1
>>> +%endif
>>> + movhlps %2, %1
>>> + paddd %1, %2
>>> +%endif
>>> +%if notcpuflag(xop)
>>> + PSHUFLW %2, %1, q0032
>>> + paddd %1, %2
>>> +%endif
>>> +%undef %1
>>> +%undef %2
>>> +%endmacro
>>> +
>>> +%macro HADDW 2 ; reg, tmp
>>> +%if cpuflag(xop) && sizeof%1 == 16
>>> + vphaddwq %1, %1
>>> + movhlps %2, %1
>>> + paddd %1, %2
>>> +%else
>>> + pmaddwd %1, [pw_1]
>>> + HADDD %1, %2
>>> +%endif
>>> +%endmacro
>>
>>
>> So, these require some comments on what they do - the naming is terrible.
>> It suggests that they act like phaddw/d, but they actually just act on the
>> lower half of the output register (or the full half of one, rather than
>> both, input registers). You probably want to make that explicit in a
>> comment, maybe even rename them just to prevent the obvious confusion.
>
> They are not supposed to behave like phaddd/w, which is why they are not called
> PHADDD/W.
>
> Not sure what kind of comment to add. And I'd rather not rename them. I don't
> want to deviate too much from x264's x86util unless necessary.
>
>>
>> Ronald
>>
>
Relicensing this as LGPL is fine with me.
Henrik
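
On the question of what comment to add: a scalar C sketch of the semantics, as I read the non-XOP path of the quoted macros for a 128-bit register, may help. HADDD leaves the sum of all four dwords in the low dword of the register (the upper lanes are junk), and HADDW first widens with pmaddwd against pw_1 and then does the same. The names haddd_model and haddw_model are hypothetical and exist only for this illustration; they are not part of x86util.

```c
#include <stdint.h>

/* Hypothetical scalar model of HADDD on an xmm register (non-XOP path).
 * All four 32-bit lanes are summed; the total lands in lane 0.
 * In the real macro the upper lanes hold leftovers and must be ignored. */
static uint32_t haddd_model(const uint32_t v[4])
{
    /* movhlps + paddd: fold the high 64 bits onto the low 64 bits */
    uint32_t lo0 = v[0] + v[2];
    uint32_t lo1 = v[1] + v[3];
    /* PSHUFLW q0032 + paddd: fold dword 1 onto dword 0 */
    return lo0 + lo1;
}

/* Hypothetical scalar model of HADDW on an xmm register (non-XOP path).
 * pmaddwd against pw_1 (eight words equal to 1) adds each adjacent pair
 * of signed 16-bit words into one 32-bit lane, then HADDD sums the lanes. */
static int32_t haddw_model(const int16_t w[8])
{
    uint32_t d[4];
    for (int i = 0; i < 4; i++)
        d[i] = (uint32_t)((int32_t)w[2 * i] + (int32_t)w[2 * i + 1]);
    return (int32_t)haddd_model(d);
}
```

So, unlike phaddd/phaddw, only the low dword of the result is meaningful after either macro, which is the behaviour a one-line comment above each macro could state.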