[FFmpeg-devel] [PATCH] h264: assembly version of get_cabac for x86_64 with PIC (v5)

Derek Buitenhuis derek.buitenhuis at gmail.com
Sat Apr 21 21:07:51 CEST 2012

On 21/04/2012 11:51 AM, Roland Scheidegger wrote:
> This adds a hand-optimized assembly version of get_cabac, much like the
> existing one, but one that works when the table offsets are RIP-relative.
> Compared to the non-RIP-relative version, this adds 2 lea instructions
> and needs one extra register.
> There is a surprisingly large performance improvement over the C version
> in get_cabac alone (more so than the generated assembly suggests): I
> measured get_cabac as roughly 40% faster on a K8. Overall, however, the
> difference is not that big: roughly 5% on a test clip, on both a K8 and
> a Core2.
> Hopefully it still compiles on 32-bit x86...
> v2: incorporated feedback from Loren Merritt to avoid RIP-relative movs
> for every table, and got rid of unnecessary @GOTPCREL.
> v3: apply similar fixes to the decode_significance functions, and use the
> same macro arguments for the non-PIC case.
> v4: prettify inline asm arguments, and add a non-fast-cmov version, since I
> expect the C code to be faster otherwise: both cmov and sbb are very slow on
> a Prescott, and constructing the mask with a 64-bit shift instead is just as
> bad. It's quite difficult to find usable instructions on that chip.
> This is tested to work, but not on a P4; in theory it _should_ be fast there.
> v5: based on a suggestion by Reimar Döffinger, add LABEL_MANGLE macros. This
> should hopefully fix compilation on Darwin (untested).
> ---
>  libavcodec/h264_cabac.c    |    2 +-
>  libavcodec/x86/cabac.h     |   90 ++++++++++++++++++++++++++++++++++++++++----
>  libavcodec/x86/h264_i386.h |   58 ++++++++++++++++++++--------
>  3 files changed, 125 insertions(+), 25 deletions(-)

I guess asking you to yasm-ify the whole thing would be insane? :P

- Derek
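The v4 note in the quoted patch turns on how the CABAC LPS/MPS decision is made without a branch. As a rough illustration only, here is a simplified C sketch, not FFmpeg's actual get_cabac: the ToyCabac struct and the toy_decide / toy_decide_branchy names are hypothetical, and renormalization and probability-state updates are omitted. It expresses the decision as an all-ones/all-zeros mask that selects between the two outcomes. On x86 such a mask is typically materialized with cmp+sbb, or the values are picked with cmov, which are the instructions the note says are slow on a Prescott.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy state for illustration; NOT FFmpeg's CABACContext.
 * Invariant assumed here: 0 <= low < range. */
typedef struct {
    int32_t low;   /* decoder offset within the current interval */
    int32_t range; /* current interval width */
} ToyCabac;

/* Reference version: one data-dependent branch per decoded bit. */
static int toy_decide_branchy(ToyCabac *c, int mps, int32_t range_lps)
{
    int32_t range_mps = c->range - range_lps;
    if (c->low >= range_mps) {      /* LPS path */
        c->low  -= range_mps;
        c->range = range_lps;
        return !mps;
    }
    c->range = range_mps;           /* MPS path */
    return mps;
}

/* Branchless version: same result, selected through a mask that is
 * all ones (-1) on the LPS path and 0 on the MPS path.
 * Renormalization and probability-state updates are omitted. */
static int toy_decide(ToyCabac *c, int mps, int32_t range_lps)
{
    assert(range_lps > 0 && range_lps < c->range);
    int32_t range_mps = c->range - range_lps;
    int32_t mask = -(int32_t)(c->low >= range_mps); /* 0 or -1 */

    c->low  -= range_mps & mask;                             /* LPS: low -= range_mps */
    c->range = range_mps + ((range_lps - range_mps) & mask); /* LPS: range = range_lps */
    return mps ^ (mask & 1);                                 /* LPS flips the symbol */
}
```

The compare-and-negate form is used here because right-shifting a negative signed integer is implementation-defined in C; a sign-bit shift such as `(x - y) >> 31` is a common alternative where the compiler's behavior is known.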
