[FFmpeg-devel] [PATCH] h264: assembly version of get_cabac for x86_64 with PIC (v4)
michaelni at gmx.at
Sat Apr 21 02:15:46 CEST 2012
On Sat, Apr 21, 2012 at 02:10:46AM +0200, Michael Niedermayer wrote:
> On Sat, Apr 21, 2012 at 01:26:54AM +0200, Michael Niedermayer wrote:
> > On Fri, Apr 20, 2012 at 02:10:57AM +0200, Roland Scheidegger wrote:
> > > This adds a hand-optimized assembly version for get_cabac much like the
> > > existing one, but it works if the table offsets are RIP-relative.
> > > Compared to the non-RIP-relative version this adds 2 lea instructions
> > > and it needs one extra register.
> > > There is a surprisingly large performance improvement over the C version (more
> > > than the generated assembly seems to suggest) in get_cabac itself: I measured
> > > roughly 40% faster for get_cabac on a K8. Overall the difference is smaller,
> > > though: I measured roughly 5% on a test clip on a K8 and a Core2.
> > > Hopefully it still compiles on x86 32bit...
> > > v2: incorporated feedback from Loren Merritt to avoid rip-relative movs
> > > for every table, and got rid of unnecessary @GOTPCREL.
> > > v3: apply similar fixes to the decode_significance functions, and use
> > > same macro arguments for non-pic case.
> > > v4: prettify inline asm arguments, add a non-fast-cmov version (as I expect
> > > the C code to be faster otherwise, since both cmov and sbb suck hard on a
> > > Prescott; I can't even construct the mask with a 64bit shift, as that's just as
> > > terrible - it's quite difficult to find usable instructions on that chip...).
> > > This is tested to work, but not on a P4; in theory it _should_ be fast there.
> > applied
> > If anyone has more ideas on how to improve it, that can easily be done.
> > Let's hope it doesn't fail on any odd platforms ...
> fails on darwin
I'll revert it in a moment; I don't want to leave the build broken until
another solution is proposed.
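
For anyone following the v4 notes quoted above: the "construct the mask" alternative to cmov/sbb can be sketched in plain C roughly as below. This is only an illustration of the general branchless-select idea, not code from the patch; select_by_sign is a hypothetical name, and the real get_cabac works on the CABAC state rather than on two arbitrary values.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): return a when x is negative,
 * otherwise b, without a branch and without relying on cmov.
 */
static inline int32_t select_by_sign(int32_t x, int32_t a, int32_t b)
{
    /* Move the sign bit to bit 0, then negate in unsigned arithmetic:
     * the result is all ones for negative x and zero otherwise. */
    uint32_t mask = 0u - ((uint32_t)x >> 31);

    /* Keep a where the mask is set and b where it is clear. */
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}
```

Whether a compiler turns this into a shift, sbb, or cmov depends on the target and flags, so it says nothing about what is fast on a given chip.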
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
DNS cache poisoning attacks, popular search engine, Google internet authority
dont be evil, please