[FFmpeg-devel] [PATCH] h264: assembly version of get_cabac for x86_64 with PIC
lorenm at u.washington.edu
Fri Apr 13 07:13:46 CEST 2012
On Fri, 13 Apr 2012, Roland Scheidegger wrote:
> This adds a hand-optimized assembly version for get_cabac much like the
> existing one, but it works if the table offsets are RIP-relative.
> Compared to the non-RIP-relative version this adds 4 instructions
> (3 RIP-relative movs, 1 lea) and needs one extra register, two of the
> rip-relative movs could get eliminated by using a single table and using offsets.
> Since x86_64 CPUs always support cmov, also always use this (I don't care
> if you have a P4 Prescott whose cmov implementation is useless).
> There is a surprisingly large performance improvement over the c version (more
> so than the generated assembly seems to suggest) just in get_cabac, I measured
> roughly 40% faster for get_cabac on a K8.
> There are similar functions which could get the same treatment but they
> are less frequently used, and since this isn't very nice (we can't use the
> same assembly template), focus on this function alone for now.
> mov ff_h264_lps_range@GOTPCREL(%%rip), "tmp2q"
> movzbl ("tmp2q", %%rcx), "range"
> mov ff_h264_norm_shift@GOTPCREL(%%rip), "tmp2q"
> movzbl ("tmp2q", "rangeq"), %%ecx
> mov ff_h264_mlps_state@GOTPCREL(%%rip), "tmpq"
> movzbl 128("tmpq", "retq"), "tmp"
@GOTPCREL isn't actually necessary unless you want the application to be
able to override those symbols (which we don't).
lea ff_h264_lps_range(%%rip), "tmp2q"
movzbl ("tmp2q", %%rcx), "range"
movzbl ff_h264_norm_shift-ff_h264_lps_range("tmp2q", "rangeq"), %%ecx
movzbl ff_h264_mlps_state-ff_h264_lps_range+128("tmp2q", "retq"), "tmp"
...Which fails to compile. Well, you can do something like that in yasm,
but I don't know how to subtract one symbol from another in inline asm.
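
One possible way around the symbol subtraction, sketched here under loud
assumptions: merge the tables into a single object, so the displacements become
plain integer constants that can be passed to the asm with "i" constraints, and
let the compiler form the RIP-relative address through an "m" operand. The
struct, its sizes and contents, and the lookup() helper are all hypothetical
placeholders, not FFmpeg's actual tables or code, and this is not what the
patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical merged table: names, sizes and contents are placeholders,
 * not FFmpeg's real CABAC data. */
struct cabac_tables {
    uint8_t lps_range[8];
    uint8_t norm_shift[8];
    uint8_t mlps_state[8];
};

static const struct cabac_tables tables = {
    { 10, 11, 12, 13, 14, 15, 16, 17 },
    { 20, 21, 22, 23, 24, 25, 26, 27 },
    { 30, 31, 32, 33, 34, 35, 36, 37 },
};

/* Member offsets are ordinary integer constants, so no symbol arithmetic
 * has to appear inside the asm string. */
#define NORM_SHIFT_OFF offsetof(struct cabac_tables, norm_shift)
#define MLPS_STATE_OFF offsetof(struct cabac_tables, mlps_state)

/* Load norm_shift[i] and mlps_state[j] through one base register. */
static void lookup(size_t i, size_t j, unsigned *shift, unsigned *state)
{
    assert(i < 8 && j < 8);
#if defined(__GNUC__) && defined(__x86_64__)
    const uint8_t *base;
    unsigned s, t;
    __asm__(
        /* The "m" operand lets the compiler emit the address of tables
         * itself (RIP-relative for a static object). */
        "lea    %[tab], %[base]                  \n\t"
        /* %c prints the constant without the '$' prefix, so it can be
         * used as a displacement. */
        "movzbl %c[norm](%[base], %[i]), %[s]    \n\t"
        "movzbl %c[mlps](%[base], %[j]), %[t]    \n\t"
        : [base] "=&r"(base), [s] "=&r"(s), [t] "=r"(t)
        : [tab] "m"(tables), [i] "r"(i), [j] "r"(j),
          [norm] "i"(NORM_SHIFT_OFF), [mlps] "i"(MLPS_STATE_OFF));
    *shift = s;
    *state = t;
#else
    /* Portable fallback with the same base + offset + index addressing. */
    const uint8_t *base = (const uint8_t *)&tables;
    *shift = base[NORM_SHIFT_OFF + i];
    *state = base[MLPS_STATE_OFF + j];
#endif
}
```

Calling lookup(3, 5, &s, &t) on the placeholder data reads norm_shift[3] and
mlps_state[5]. The cost is that every table has to live in one object, which is
the "single table and offsets" trade-off mentioned in the patch description.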