[FFmpeg-devel] [PATCH 1/2] x86: move horizontal add macros to x86util

Ronald S. Bultje rsbultje at gmail.com
Sat Apr 12 03:02:09 CEST 2014


Hi,

On Fri, Apr 11, 2014 at 7:45 PM, James Almer <jamrial at gmail.com> wrote:

> On 11/04/14 8:14 PM, Ronald S. Bultje wrote:
> > Hi
> >
> > On Fri, Apr 11, 2014 at 7:00 PM, James Almer <jamrial at gmail.com> wrote:
> >
> >> Also port relevant AVX2/XOP optimizations from x264
> >>
> >
> > Did you get permission from them to relicense to LGPL? I know it's trivial
> > code, but really, better safe than sorry.
>
> No. Since we were importing changes from x264's x86inc/util when they were
> useful, I assumed it was ok.
>
> I wrote the HADDD xop optimization, but not the AVX2 and HADDW xop ones. I
> can remove those two if Henrik and Jason are against this.
> I'm CCing them in any case.
>
> >
> >> +%macro HADDD 2 ; sum junk
> >> +%if sizeof%1 == 32
> >> +%define %2 xmm%2
> >> +    vextracti128 %2, %1, 1
> >> +%define %1 xmm%1
> >> +    paddd   %1, %2
> >> +%endif
> >> +%if mmsize >= 16
> >> +%if cpuflag(xop) && sizeof%1 == 16
> >> +    vphadddq %1, %1
> >> +%endif
> >> +    movhlps %2, %1
> >> +    paddd   %1, %2
> >> +%endif
> >> +%if notcpuflag(xop)
> >> +    PSHUFLW %2, %1, q0032
> >> +    paddd   %1, %2
> >> +%endif
> >> +%undef %1
> >> +%undef %2
> >> +%endmacro
> >> +
> >> +%macro HADDW 2 ; reg, tmp
> >> +%if cpuflag(xop) && sizeof%1 == 16
> >> +    vphaddwq  %1, %1
> >> +    movhlps   %2, %1
> >> +    paddd     %1, %2
> >> +%else
> >> +    pmaddwd %1, [pw_1]
> >> +    HADDD   %1, %2
> >> +%endif
> >> +%endmacro
> >
> >
> > So, these require some comments on what they do - the naming is terrible.
> > It suggests that they act like phaddw/d, but they actually just act on the
> > lower half of the output register (or the full half of one, rather than
> > both, input registers). You probably want to make that explicit in a
> > comment, maybe even rename them just to prevent the obvious confusion.
>
> They are not supposed to behave like phaddd/w, which is why they are not
> called PHADDD/W.
>
> Not sure what kind of comment to add. And I'd rather not rename them. I
> don't want to deviate too much from x264's x86util unless necessary.


Hm, you're right, I missed that the P is missing. OK, I'm fine with that
then. If Jason/Henrik don't mind (poke them on IRC maybe), the patches are OK
(the other one looked good).
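
To make the semantics discussed above concrete, here is a rough C sketch
(SSE2 intrinsics) of what the non-XOP, 128-bit path of the quoted macros
appears to compute. The names hadd_d and hadd_w are made up for illustration
and are not part of x86util or x264. The point is only that the total lands
in the low dword of the register while the other lanes hold junk, which is
different from what phaddd/phaddw do.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */
#include <stdint.h>

/* Hypothetical stand-in for HADDD on a 128-bit register without XOP:
 * sums the four 32-bit lanes; only the low lane of the result is meaningful. */
static int32_t hadd_d(__m128i v)
{
    /* movhlps tmp, v : copy the upper 64 bits of v into the lower 64 of tmp */
    __m128i tmp = _mm_castps_si128(
        _mm_movehl_ps(_mm_castsi128_ps(v), _mm_castsi128_ps(v)));
    v = _mm_add_epi32(v, tmp);            /* lane0 = v0+v2, lane1 = v1+v3 */

    /* PSHUFLW tmp, v, q0032 : move dword 1 down into dword 0 */
    tmp = _mm_shufflelo_epi16(v, _MM_SHUFFLE(0, 0, 3, 2));
    v = _mm_add_epi32(v, tmp);            /* lane0 = v0+v1+v2+v3 */

    return _mm_cvtsi128_si32(v);          /* read the low dword only */
}

/* Hypothetical stand-in for the non-XOP HADDW path: pmaddwd against a
 * vector of 16-bit ones (pw_1) widens and pair-sums the eight signed words
 * into four dwords, then the dword reduction above finishes the job. */
static int32_t hadd_w(__m128i v)
{
    v = _mm_madd_epi16(v, _mm_set1_epi16(1));
    return hadd_d(v);
}
```

For instance, hadd_d applied to the lanes 1, 2, 3, 4 gives 10, and hadd_w
applied to the words 1 through 8 gives 36. The upper lanes of the vector are
left holding partial sums or shuffled leftovers, matching the "sum junk"
comment on the macro arguments.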

Ronald

