[FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls

Henrik Gramner henrik at gramner.com
Thu Jan 14 17:45:53 CET 2016

On Thu, Jan 14, 2016 at 5:26 PM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
> readability still no.

"<instruction> dst, mult1, mult2, add" is significantly more readable
than "<instruction_with_some_digits> src1, src2, src3" where you need
to mentally parse which source operand corresponds to which
mathematical operator depending on the order of the digits.

Compare the following two instruction sequences, which are functionally
identical (just a random example I made up on the spot):

; m0 = m2 * m4 + m0
; m1 = m2 * m1 + m3
; m2 = m2 * m3 + m4

fmaddpd m0, m2, m4, m0
fmaddpd m1, m2, m1, m3
fmaddpd m2, m2, m3, m4

vfmadd231pd m0, m2, m4
vfmadd213pd m1, m2, m3
vfmadd132pd m2, m4, m3

In the first sequence it's immediately clear at a quick glance which
registers get multiplied and which one gets added.
The second sequence, on the other hand, takes some time to parse.
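
For reference, the digits in the FMA3 mnemonics number the operands in
the order they are written: the first two digits name the operands that
get multiplied, the third names the operand that gets added, and
operand 1 is always the destination. A minimal scalar sketch of that
mapping in C follows. All function names are hypothetical, and a plain
multiply-add stands in for the fused operation, so it only models which
operand goes where, not the single rounding of a real FMA.

```c
#include <assert.h>

/* Scalar stand-ins for the three FMA3 operand orders (hypothetical helpers).
 * op1 is both the first source and the destination. A real FMA rounds once;
 * the plain multiply-add here only shows which operand goes where. */
static double fmadd132(double op1, double op2, double op3) { return op1 * op3 + op2; }
static double fmadd213(double op1, double op2, double op3) { return op2 * op1 + op3; }
static double fmadd231(double op1, double op2, double op3) { return op2 * op3 + op1; }

/* Explicit four-operand form: dst = mult1 * mult2 + add. */
static double fmadd4(double mult1, double mult2, double add) { return mult1 * mult2 + add; }

/* The digit-suffixed sequence from the example, with m[i] standing for mi. */
static void run_three_operand(double m[5])
{
    m[0] = fmadd231(m[0], m[2], m[4]); /* vfmadd231pd m0, m2, m4 */
    m[1] = fmadd213(m[1], m[2], m[3]); /* vfmadd213pd m1, m2, m3 */
    m[2] = fmadd132(m[2], m[4], m[3]); /* vfmadd132pd m2, m4, m3 */
}

/* The same sequence written with explicit multiplicands and addend. */
static void run_four_operand(double m[5])
{
    m[0] = fmadd4(m[2], m[4], m[0]);   /* fmaddpd m0, m2, m4, m0 */
    m[1] = fmadd4(m[2], m[1], m[3]);   /* fmaddpd m1, m2, m1, m3 */
    m[2] = fmadd4(m[2], m[3], m[4]);   /* fmaddpd m2, m2, m3, m4 */
}

/* Both sequences should leave every register in the same state. */
static void check_sequences_agree(void)
{
    double a[5] = {1, 2, 3, 4, 5};
    double b[5] = {1, 2, 3, 4, 5};
    run_three_operand(a);
    run_four_operand(b);
    for (int i = 0; i < 5; i++)
        assert(a[i] == b[i]);
}
```

Calling check_sequences_agree() from any test harness exercises both
forms. With m0..m4 set to 1..5, both sequences give m0 = 16, m1 = 10
and m2 = 17.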
