[FFmpeg-devel] [PATCH] h264: assembly version of get_cabac for x86_64 with PIC (v5)
derek.buitenhuis at gmail.com
Sat Apr 21 21:07:51 CEST 2012
On 21/04/2012 11:51 AM, Roland Scheidegger wrote:
> This adds a hand-optimized assembly version for get_cabac much like the
> existing one, but it works if the table offsets are RIP-relative.
> Compared to the non-RIP-relative version this adds 2 lea instructions
> and it needs one extra register.
> There is a surprisingly large performance improvement over the C version (more
> so than the generated assembly seems to suggest) just in get_cabac: I measured
> roughly 40% faster for get_cabac on a K8. However, overall the difference is
> not that big: I measured roughly 5% on a test clip on a K8 and a Core2.
> Hopefully it still compiles on 32-bit x86...
> v2: incorporated feedback from Loren Merritt to avoid rip-relative movs
> for every table, and got rid of unnecessary @GOTPCREL.
> v3: apply similar fixes to the decode_significance functions, and use
> same macro arguments for non-pic case.
> v4: prettify inline asm arguments, add a non-fast-cmov version (as I expect
> the C code to be faster otherwise, since both cmov and sbb suck hard on a
> Prescott; you can't even construct the mask with a 64-bit shift, as that's
> just as terrible - it's quite difficult to find usable instructions on that
> chip...).
> This is tested to work, but not on a P4; in theory it _should_ be fast there.
> v5: based on suggestion by Reimar Döffinger add LABEL_MANGLE macros. Should
> hopefully fix compilation on Darwin (untested).
> libavcodec/h264_cabac.c | 2 +-
> libavcodec/x86/cabac.h | 90 ++++++++++++++++++++++++++++++++++++++++----
> libavcodec/x86/h264_i386.h | 58 ++++++++++++++++++++--------
> 3 files changed, 125 insertions(+), 25 deletions(-)
I guess asking you to yasm-ify the whole thing would be insane? :P
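
For readers unfamiliar with the "mask" mentioned under v4: a branchless
select can be built from an all-ones/all-zeros mask instead of cmov. The
following is only a rough C sketch of that general idea, not the code from
the patch. The helper names lt_mask and select_u32 are hypothetical, and
whether a compiler turns this into something fast on a given chip is not
guaranteed.

```c
#include <stdint.h>

/* Hypothetical helper: returns 0xFFFFFFFF if a < b, else 0.
 * The subtraction is done in 64 bits; when a < b it wraps around and the
 * upper 32 bits become all ones, which the shift then extracts. */
static uint32_t lt_mask(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a - (uint64_t)b) >> 32);
}

/* Hypothetical helper: yields x when mask is all ones, y when mask is 0.
 * This stands in for what a cmov would otherwise do. */
static uint32_t select_u32(uint32_t mask, uint32_t x, uint32_t y)
{
    return (x & mask) | (y & ~mask);
}
```

For example, select_u32(lt_mask(a, b), x, y) picks x when a < b and y
otherwise, without a conditional branch in the source.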