[FFmpeg-devel] [PATCH 3/4] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter

Henrik Gramner henrik at gramner.com
Thu Dec 8 00:17:44 EET 2016

On Wed, Dec 7, 2016 at 2:07 PM, James Darnley <jdarnley at obe.tv> wrote:
> Because a few instructions using 3 operand form should be quicker.  The
> fact that it doesn't show is no doubt down to the out of order execution
> managing to do the moves earlier than written.

Register-register moves are handled in the register renaming stage
without consuming any execution resources and are often essentially
free (on modern Intel CPUs, e.g. Haswell+ at least; I don't remember
if this applies to Sandy Bridge as well). The advantage of the
3-operand form is reduced decoding pressure and slightly denser
encoding, which can save some cache if you have a lot of
register-register moves.
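
To make the encoding-density point concrete, here is a rough sketch
comparing the bytes for one 16-bit add where both inputs must be
preserved. The register choice (xmm0, xmm1, xmm2) and the array and
function names are assumptions for illustration, not taken from the
patch, and the byte values are hand-assembled, so check them against
your assembler's output before relying on them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: dst = a + b with a and b both kept live. */

/* SSE2, 2-operand form: the destination is also a source, so a copy
 * is needed first.
 *   movdqa xmm2, xmm0
 *   paddw  xmm2, xmm1
 */
static const unsigned char sse2_seq[] = {
    0x66, 0x0F, 0x6F, 0xD0, /* movdqa xmm2, xmm0 */
    0x66, 0x0F, 0xFD, 0xD1, /* paddw  xmm2, xmm1 */
};

/* AVX, 3-operand form: separate destination, no copy.
 *   vpaddw xmm2, xmm0, xmm1
 */
static const unsigned char avx_seq[] = {
    0xC5, 0xF9, 0xFD, 0xD1, /* vpaddw xmm2, xmm0, xmm1 */
};

/* Compile-time check that the 3-operand sequence is the shorter one. */
static_assert(sizeof(avx_seq) < sizeof(sse2_seq),
              "3-operand form should encode in fewer bytes here");

/* Size in bytes of the 2-instruction SSE2 sequence. */
size_t sse2_seq_size(void) { return sizeof(sse2_seq); }

/* Size in bytes of the 1-instruction AVX sequence. */
size_t avx_seq_size(void) { return sizeof(avx_seq); }
```

In this case the decoder sees one instruction instead of two and the
code is 4 bytes shorter, even if the eliminated move itself would have
cost nothing at execution time.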
