[FFmpeg-devel] [PATCH] h264: assembly version of get_cabac for x86_64 with PIC

Fri Apr 13 07:13:46 CEST 2012

On Fri, 13 Apr 2012, Roland Scheidegger wrote:

> This adds a hand-optimized assembly version for get_cabac much like the
> existing one, but it works if the table offsets are RIP-relative.
> Compared to the non-RIP-relative version this adds 4 instructions
> (3 RIP-relative movs, 1 lea) and needs one extra register; two of the
> RIP-relative movs could be eliminated by using a single table with
> offsets instead.
> Since x86_64 CPUs always support cmov, this version always uses it (I don't
> care if you have a P4 Prescott, whose cmov implementation is useless).
> There is a surprisingly large performance improvement over the C version
> (larger than the generated assembly seems to suggest) in get_cabac alone: I
> measured get_cabac to be roughly 40% faster on a K8.
> There are similar functions which could get the same treatment, but they
> are less frequently used, and since this isn't very nice (we can't use the
> same assembly template), I'm focusing on this function alone for now.

> mov    ff_h264_lps_range@GOTPCREL(%%rip), "tmp2q"
> movzbl ("tmp2q", %%rcx), "range"
> mov    ff_h264_norm_shift@GOTPCREL(%%rip), "tmp2q"
> movzbl ("tmp2q", "rangeq"), %%ecx
> mov    ff_h264_mlps_state@GOTPCREL(%%rip), "tmpq"
> movzbl 128("tmpq", "retq"), "tmp"
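A minimal sketch of the kind of branch-free select the cmov remark above refers to, written as GCC-style inline asm for x86_64 with a plain C fallback elsewhere. The helper name select_if_above and its interface are hypothetical and not taken from the patch: it returns a when x > limit (compared as unsigned) and b otherwise, using cmp followed by cmova instead of a conditional jump.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns a if x > limit (unsigned), else b,
 * without a conditional branch on x86_64. */
static inline uint32_t select_if_above(uint32_t x, uint32_t limit,
                                       uint32_t a, uint32_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint32_t r = b;
    /* cmp sets the flags from x - limit; cmova copies a into r only if
     * x > limit as unsigned values (CF = 0 and ZF = 0). */
    __asm__("cmpl %2, %1\n\t"
            "cmova %3, %0"
            : "+r"(r)
            : "r"(x), "r"(limit), "r"(a)
            : "cc");
    return r;
#else
    /* Portable fallback with the same semantics. */
    return x > limit ? a : b;
#endif
}
```

Compilers will often turn the fallback `x > limit ? a : b` into a cmov by themselves; spelling it out only matters inside hand-written asm such as get_cabac, where the compiler has no say in the instruction choice.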

@GOTPCREL isn't actually necessary unless you want the application to be
able to override those symbols (which we don't).

lea    ff_h264_lps_range(%%rip), "tmp2q"
movzbl ("tmp2q", %%rcx), "range"
movzbl ff_h264_norm_shift-ff_h264_lps_range("tmp2q", "rangeq"), %%ecx
movzbl ff_h264_mlps_state-ff_h264_lps_range+128("tmp2q", "retq"), "tmp"

...Which fails to compile. Well, you can do something like that in yasm,
but I don't know how to subtract one symbol from another in inline asm.
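One possible way around the symbol subtraction, following the single-table suggestion in the quoted patch description: pack the tables into one array so that each sub-table's offset is an ordinary integer constant. The preprocessor can then paste that constant into the inline asm string as a literal displacement, so the assembler never has to compute something like ff_h264_norm_shift-ff_h264_lps_range (presumably the failure is that it cannot resolve a difference of two external symbols at assembly time). Declaring the table with hidden visibility states the "nobody overrides these symbols" assumption from the @GOTPCREL remark on the C side, so -fPIC code can reach it RIP-relative without the GOT. This is a hypothetical sketch: ex_cabac_tables, the EX_*_OFFSET macros, load_lps_range, the table sizes and the values are all illustrative, not the real H.264 tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stringify a macro's expanded value so it can be pasted into an asm string. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical layout: three small sub-tables packed into one array. The
 * offsets are plain integers, so no symbol arithmetic is needed anywhere. */
#define EX_NORM_SHIFT_OFFSET 0
#define EX_LPS_RANGE_OFFSET  4
#define EX_MLPS_STATE_OFFSET 8

/* Hidden visibility: the symbol cannot be interposed from outside the shared
 * object, so -fPIC code may address it RIP-relative without going through
 * the GOT. Values are placeholders, not the real H.264 CABAC tables. */
__attribute__((visibility("hidden")))
const uint8_t ex_cabac_tables[12] = {
    3, 2, 1, 0,     /* norm_shift */
    10, 20, 30, 40, /* lps_range  */
    1, 0, 3, 2,     /* mlps_state */
};

/* Returns ex_cabac_tables[EX_LPS_RANGE_OFFSET + idx] through one base
 * register plus a literal displacement. */
static inline unsigned load_lps_range(size_t idx)
{
    assert(idx < EX_MLPS_STATE_OFFSET - EX_LPS_RANGE_OFFSET);
#if defined(__x86_64__) && defined(__GNUC__)
    unsigned ret;
    /* Expands to e.g. "movzbl 4(%rdi,%rsi), %eax". The unused "m" operand
     * tells the compiler that the asm reads the table. */
    __asm__("movzbl " STR(EX_LPS_RANGE_OFFSET) "(%1,%2), %0"
            : "=r"(ret)
            : "r"(ex_cabac_tables), "r"(idx), "m"(ex_cabac_tables));
    return ret;
#else
    return ex_cabac_tables[EX_LPS_RANGE_OFFSET + idx];
#endif
}
```

The same base register serves every sub-table; reading mlps_state would only change the pasted displacement to STR(EX_MLPS_STATE_OFFSET).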

--Loren Merritt