[FFmpeg-devel] [PATCH] h264: assembly version of get_cabac for x86_64 with PIC (v4)
michaelni at gmx.at
Sat Apr 21 01:26:54 CEST 2012
On Fri, Apr 20, 2012 at 02:10:57AM +0200, Roland Scheidegger wrote:
> This adds a hand-optimized assembly version for get_cabac much like the
> existing one, but it works if the table offsets are RIP-relative.
> Compared to the non-RIP-relative version this adds 2 lea instructions
> and it needs one extra register.
> There is a surprisingly large performance improvement over the C version (more
> so than the generated assembly seems to suggest) in get_cabac itself: I measured
> roughly 40% faster for get_cabac on a K8. Overall, however, the difference is
> not that big: I measured roughly 5% on a test clip on a K8 and a Core2.
> Hopefully it still compiles on 32-bit x86...
> v2: incorporated feedback from Loren Merritt to avoid rip-relative movs
> for every table, and got rid of unnecessary @GOTPCREL.
> v3: apply similar fixes to the decode_significance functions, and use
> same macro arguments for non-pic case.
> v4: prettify inline asm arguments, add a non-fast-cmov version (as I expect
> the C code to be faster otherwise, since both cmov and sbb suck hard on a
> Prescott; you can't even construct the mask with a 64-bit shift, as that's just
> as terrible. It's quite difficult to find usable instructions on that chip...).
> This is tested to work, but not on a P4; in theory it _should_ be fast there.
If someone has more ideas on how to improve it, that can easily be done.
Let's hope it doesn't fail on any odd platforms ...
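
For readers following the cmov/sbb/mask remark in the v4 note, here is a rough
C sketch of what "constructing the mask with a shift" means in a binary
arithmetic decoder step. It is not the code from the patch and not FFmpeg's
get_cabac: ToyCoder, sign_mask and toy_decode_step are hypothetical names, the
values are assumed to stay below 2^31, and renormalisation and context state
updates are left out entirely. It only shows how the MPS/LPS choice can be
made with a mask instead of a branch or a cmov.

```c
#include <stdint.h>

/* Hypothetical toy state, not FFmpeg's CABACContext. */
typedef struct {
    uint32_t low;   /* current offset into the interval (assumed < 2^31) */
    uint32_t range; /* current interval width (assumed < 2^31) */
} ToyCoder;

/* All-ones if bit 31 of x is set, all-zeros otherwise.
 * Built from a logical shift, so it is well defined in C. */
static inline uint32_t sign_mask(uint32_t x)
{
    return (uint32_t)(0u - (x >> 31));
}

/* One simplified decode step without a branch.
 * range_lps is the sub-interval of the less probable symbol,
 * mps is the more probable symbol (0 or 1). */
static inline int toy_decode_step(ToyCoder *c, uint32_t range_lps, int mps)
{
    uint32_t range_mps = c->range - range_lps;

    /* low - range_mps wraps to a value with bit 31 set when low < range_mps,
     * so the inverted sign mask is all-ones exactly on the LPS path. */
    uint32_t lps_mask = ~sign_mask(c->low - range_mps);

    /* LPS path: low -= range_mps, range = range_lps.
     * MPS path: low unchanged,    range = range_mps. */
    c->low  -= range_mps & lps_mask;
    c->range = range_mps + ((range_lps - range_mps) & lps_mask);

    /* Flip the symbol on the LPS path. */
    return mps ^ (int)(lps_mask & 1u);
}
```

With low = 10, range = 100 and range_lps = 30, the step takes the MPS path and
leaves range = 70; with low = 80 it takes the LPS path and ends with low = 10,
range = 30. Whether a compiler turns this into something faster than a cmov or
a branch depends on the CPU, which is the point of the v4 remark.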
Michael
GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Many that live deserve death. And some that die deserve life. Can you give
it to them? Then do not be too eager to deal out death in judgement. For
even the very wise cannot see all ends. -- Gandalf