[FFmpeg-devel] [PATCH] x86: hevc_mc: better register allocation

James Almer jamrial at gmail.com
Sat May 17 20:13:43 CEST 2014


On 17/05/14 11:58 AM, Christophe Gisquet wrote:
> Hi,
> 
> this is more a proof of concept to show that the register allocation
> can be improved. This is the first simple example I found, albeit used
> only in a few cases.
> 
> Benchmark under Win64:
> before:
> 3872 decicycles in a32, 32761 runs, 7 skips
> 2194 decicycles in a16, 32766 runs, 2 skips
> 
> after:
> 3767 decicycles in a32, 32765 runs, 3 skips
> 2119 decicycles in a16, 32767 runs, 1 skips
> 

[...]

> diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm
> index 1fae38c..89bbecd 100644
> --- a/libavcodec/x86/hevc_mc.asm
> +++ b/libavcodec/x86/hevc_mc.asm
> @@ -1098,19 +1098,24 @@ cglobal hevc_put_hevc_bi_qpel_hv%1_%2, 9, 11, 16, dst, dststride, src, srcstride
>  %endmacro
>  
>  %macro WEIGHTING_FUNCS 2
> -cglobal hevc_put_hevc_uni_w%1_%2, 8, 10, 11, dst, dststride, src, srcstride, height, denom, wx, ox, shift
> -    lea          shiftd, [denomd+14-%2]          ; shift = 14 - bitd + denom
> -    shl             oxd, %2-8                    ; ox << (bitd - 8)
> -    movd             m2, wxd        ; WX
> -    movd             m3, oxd        ; OX
> -    movd             m4, shiftd     ; shift
> +cglobal hevc_put_hevc_uni_w%1_%2, 4, 5, 7, dst, dststride, src, srcstride, height, denom, wx, ox

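Even before your refactor, the function wasn't using 11 xmm regs,
or 10 gprs for that matter.
There are tons of functions in this file requesting more than 10 gp/xmm
registers but ultimately using fewer than that. This is especially bad on
Win64, where xmm6-xmm15 are callee-saved, so every xmm reg requested
beyond the first six has to be saved in the prologue and restored in the
epilogue, even if the function never touches it.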

