[FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls
jamrial at gmail.com
Thu Jan 14 17:48:33 CET 2016
On 1/14/2016 1:26 PM, Ganesh Ajjanagadde wrote:
> On Thu, Jan 14, 2016 at 11:16 AM, James Almer <jamrial at gmail.com> wrote:
>> On 1/14/2016 11:12 AM, Ganesh Ajjanagadde wrote:
>>> On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner <henrik at gramner.com> wrote:
>>>> Use the x86inc syntax for FMA instructions (basically FMA4 syntax that
>>>> gets assembled as FMA3), since normal FMA3 opcodes are horrible to
>>>> read; nobody ever remembers the ordering of operands.
>>> 1. It is very easy to remember: take fmadd231pd x, y, z for instance.
>>> This means 2*3 + 1, so x = y*z + x. How the macro is more readable is
>>> beyond me, especially with some side cases that are undocumented, see
>> fmaddps dst, src1, src2, src3 is always going to be easier for anyone to
>> read, without having to think about which number belongs to which operation
>> and which operand. And it will output either FMA4 or FMA3 depending on the
>> value passed to INIT_[XY]MM.
> The fma3/fma4 thing is the only benefit. Even that is generally not a
> big deal; AMD quickly started supporting fma3.
Nobody is asking you to write an FMA4 version of this function. We're asking
you to use the x86inc FMA4-like macros for readability purposes.
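To make the operand orders under discussion concrete, here is a minimal C sketch, assuming a scalar single-lane model (the function names are hypothetical, not FFmpeg or x86inc code). It encodes Intel's definitions of the three FMA3 forms and the FMA4-like dst = src1 * src2 + src3 form, and checks the fmadd231pd example quoted above.

```c
#include <assert.h>

/* Scalar, single-lane model of the three FMA3 multiply-add forms.
 * Operand 1 is always the destination, and the digits name the order:
 * "132" means op1 = op1 * op3 + op2, and so on. Plain a * b + c is used
 * here; the hardware rounds only once, but for inputs whose product is
 * exactly representable (such as small integers) the results match. */

/* vfmadd132: a = a * c + b */
static double fmadd132(double a, double b, double c) { return a * c + b; }

/* vfmadd213: a = b * a + c */
static double fmadd213(double a, double b, double c) { return b * a + c; }

/* vfmadd231: a = b * c + a */
static double fmadd231(double a, double b, double c) { return b * c + a; }

/* FMA4-like form used by the x86inc macros: dst = src1 * src2 + src3 */
static double fmadd4(double src1, double src2, double src3)
{
    return src1 * src2 + src3;
}

/* "fmadd231pd x, y, z" is x = y * z + x, i.e. the FMA4-like
 * "fmaddpd x, y, z, x". Aborts via assert() if the two models disagree. */
static void check_231_example(double x, double y, double z)
{
    assert(fmadd231(x, y, z) == fmadd4(y, z, x));
}
```

With x = 2, y = 3, z = 5, the 231 form gives 3 * 5 + 2 = 17, while the 132 and 213 forms applied to the same three registers give 13 and 11, which is exactly the kind of slip the FMA4-like spelling avoids.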
>>> 2. If anything, the macro is harder, since it is not Intel supported,
>> Of course it won't be there; it's not defined by them. Non-destructive
>> four-operand FMA is defined by AMD.
> Of course I know this.
>>> I can't look it up at
>> Neither are any of the dozens of other compat macros in x86util, and many of
>> them are also undocumented within x86util. This point is absurd.
> How is it absurd? You expect me to use something that lacks clear
> documentation, and claim that it is "more readable". What other macros
> have/lack is irrelevant to the point.
If you want documentation for FMA4, look at AMD's docs, just like you didn't
hesitate to look at Intel's.
>>> 3. The macro does not seem to take care of the movs (if any), still
>>> requiring explicit thought on the part of the programmer.
>> Yes, and? It's not an emulation macro like the uppercase ones that become
>> several instructions. It translates a single FMA4-like instruction into
>> either an FMA4 or an FMA3 one.
>> fmaddps     xmm0, xmm0, xmm1, xmm2
>> vfmaddps    xmm0, xmm0, xmm1, xmm2   if FMA4
>> vfmadd132ps xmm0, xmm2, xmm1         if FMA3
>> If you try to use it with four different operands, it will work with FMA4
>> but not FMA3, since, as I said, it's not trying to emulate anything.
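As a sanity check of the translation shown above, the sketch below models an xmm register as four floats (a hypothetical C struct, not real intrinsics or x86inc code) and confirms that vfmadd132ps xmm0, xmm2, xmm1 computes the same lanes as vfmaddps xmm0, xmm0, xmm1, xmm2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an xmm register holding four packed floats. */
typedef struct { float lane[4]; } xmm;

/* FMA4: vfmaddps dst, s1, s2, s3  ->  dst = s1 * s2 + s3 */
static xmm vfmaddps(xmm s1, xmm s2, xmm s3)
{
    xmm r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = s1.lane[i] * s2.lane[i] + s3.lane[i];
    return r;
}

/* FMA3: vfmadd132ps dst, b, c  ->  dst = dst * c + b (dst is overwritten) */
static xmm vfmadd132ps(xmm dst, xmm b, xmm c)
{
    for (int i = 0; i < 4; i++)
        dst.lane[i] = dst.lane[i] * c.lane[i] + b.lane[i];
    return dst;
}

static bool xmm_equal(xmm a, xmm b)
{
    for (int i = 0; i < 4; i++)
        if (a.lane[i] != b.lane[i])
            return false;
    return true;
}

/* fmaddps xmm0, xmm0, xmm1, xmm2 must give the same result whether it
 * is emitted as the FMA4 or the FMA3 encoding. */
static void check_fmaddps_translation(xmm x0, xmm x1, xmm x2)
{
    xmm as_fma4 = vfmaddps(x0, x1, x2);    /* vfmaddps    xmm0, xmm0, xmm1, xmm2 */
    xmm as_fma3 = vfmadd132ps(x0, x2, x1); /* vfmadd132ps xmm0, xmm2, xmm1 */
    assert(xmm_equal(as_fma4, as_fma3));
}
```

The FMA3 version only works here because the destination is also the first multiplicand; with four distinct registers there is no single FMA3 instruction to map to.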
> Thanks for mentioning the convention, but this is an important one and,
> AFAIK, not mentioned in any documentation within FFmpeg.
>>> 4. The macro lacks documentation. In particular, it is not a thorough
>>> fma4 emulation in the spirit of
>>> Or, put in other words, IMO not good.
>> No, it's good, and it's what's done in every other asm file precisely
>> because it's more flexible and readable.
> Flexibility, yes, readability still no.
dst = src1 * src2 + src3
That's all you need to know to read an FMA4-like instruction. Are you going to
tell me that the clusterfuck that's FMA3, with varying numbers that change the
order of operations and the meaning of operands, is easier to read?
With the compat macros in x86inc, as long as the destination is the same
register as one of the three sources, it's going to output the relevant FMA3
instruction for you.
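The selection rule can be sketched in C. This is a hypothetical model working on register names as strings, not the actual x86inc macro (which is assembler preprocessor code and may choose between equivalent forms differently, e.g. for memory operands). Because FMA3 always overwrites its first operand, the destination has to be one of the three sources; otherwise a separate mov is needed first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: map the FMA4-like "fmaddps dst, s1, s2, s3"
 * (dst = s1 * s2 + s3) onto one FMA3 instruction that computes the same
 * value in place. Writes the instruction text to out and returns 0, or
 * returns -1 if no single FMA3 instruction fits (or out is too small). */
static int fma4_to_fma3(const char *dst, const char *s1, const char *s2,
                        const char *s3, char *out, size_t outlen)
{
    int n;

    assert(out != NULL && outlen > 0);
    if (strcmp(dst, s1) == 0)       /* dst = dst * s2 + s3 -> 132 form */
        n = snprintf(out, outlen, "vfmadd132ps %s, %s, %s", dst, s3, s2);
    else if (strcmp(dst, s2) == 0)  /* dst = s1 * dst + s3 -> 213 form */
        n = snprintf(out, outlen, "vfmadd213ps %s, %s, %s", dst, s1, s3);
    else if (strcmp(dst, s3) == 0)  /* dst = s1 * s2 + dst -> 231 form */
        n = snprintf(out, outlen, "vfmadd231ps %s, %s, %s", dst, s1, s2);
    else
        return -1;                  /* dst aliases no source: needs a mov */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With fmaddps xmm0, xmm0, xmm1, xmm2 this sketch produces vfmadd132ps xmm0, xmm2, xmm1, matching the FMA3 line earlier in the thread; with four distinct registers it returns -1.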
>> Especially since it allows one to write both
>> FMA4 and FMA3 functions without duplicating code.
>> ffmpeg-devel mailing list
>> ffmpeg-devel at ffmpeg.org