[FFmpeg-devel] [PATCH] h264: assembly version of get_cabac for x86_64 with PIC (v4)
michaelni at gmx.at
Sat Apr 21 02:10:46 CEST 2012
On Sat, Apr 21, 2012 at 01:26:54AM +0200, Michael Niedermayer wrote:
> On Fri, Apr 20, 2012 at 02:10:57AM +0200, Roland Scheidegger wrote:
> > This adds a hand-optimized assembly version for get_cabac much like the
> > existing one, but it works if the table offsets are RIP-relative.
> > Compared to the non-RIP-relative version this adds 2 lea instructions
> > and it needs one extra register.
> > There is a surprisingly large performance improvement over the C version in
> > get_cabac itself (larger than the generated assembly would suggest): I measured
> > get_cabac to be roughly 40% faster on a K8. However, the overall difference is
> > not that big; I measured roughly 5% on a test clip on both a K8 and a Core2.
> > Hopefully it still compiles on 32-bit x86...
> > v2: incorporated feedback from Loren Merritt to avoid RIP-relative movs
> > for every table, and got rid of the unnecessary @GOTPCREL.
> > v3: apply similar fixes to the decode_significance functions, and use the
> > same macro arguments for the non-PIC case.
> > v4: prettify inline asm arguments and add a non-fast-cmov version (otherwise
> > I expect the C code to be faster, since both cmov and sbb are terribly slow on
> > a Prescott; the mask can't be built with a 64-bit shift either, as that is
> > just as slow. It's quite difficult to find usable instructions on that chip...).
> > This is tested to work, but not on a P4; in theory it _should_ be fast there.
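The v4 note refers to building a selection mask either with cmov/sbb or with a 64-bit shift. The sketch below is a hypothetical illustration of that general branchless technique, not FFmpeg's get_cabac code: the function names `mask_lt_cmp`, `mask_lt_shift`, `select_u32` and `check_masks_agree` are made up for this example. It shows two ways to get an all-ones/all-zeros mask from a comparison and how such a mask replaces a conditional move.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only; not FFmpeg's get_cabac implementation.
 * Returns an all-ones mask when x < y, zero otherwise.
 * This is the value "cmp; sbb r, r" leaves in r on x86. */
static uint32_t mask_lt_cmp(uint32_t x, uint32_t y)
{
    return (uint32_t)0 - (uint32_t)(x < y);
}

/* The same mask built from the sign bit of a 64-bit difference
 * (the shift-based variant): when x < y the subtraction wraps
 * around and bit 63 is set; otherwise the result is below 2^32. */
static uint32_t mask_lt_shift(uint32_t x, uint32_t y)
{
    uint64_t diff = (uint64_t)x - (uint64_t)y;
    return (uint32_t)0 - (uint32_t)(diff >> 63);
}

/* Branchless select: yields a where mask is all ones, b where it is zero.
 * A compiler may emit cmov for the equivalent "mask ? a : b" instead. */
static uint32_t select_u32(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) | (b & ~mask);
}

/* Self-check: both mask constructions must agree for any input pair. */
static void check_masks_agree(uint32_t x, uint32_t y)
{
    assert(mask_lt_cmp(x, y) == mask_lt_shift(x, y));
}
```

Which construction is fastest depends on the CPU; per the note above, on Prescott both the cmov/sbb route and the 64-bit shift are slow, which is why a separate non-fast-cmov path was added.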
> If someone has more ideas on how to improve it, that can easily be done.
> Let's hope it doesn't fail on any odd platforms...
It fails on Darwin.
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Frequently ignored answer #1: FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.