[FFmpeg-devel] [PATCH] h264: assembly version of get_cabac for x86_64 with PIC (v5)

Michael Niedermayer michaelni at gmx.at
Sat Apr 21 21:18:17 CEST 2012

On Sat, Apr 21, 2012 at 03:07:51PM -0400, Derek Buitenhuis wrote:
> On 21/04/2012 11:51 AM, Roland Scheidegger wrote:
> > This adds a hand-optimized assembly version for get_cabac much like the
> > existing one, but it works if the table offsets are RIP-relative.
> > Compared to the non-RIP-relative version this adds 2 lea instructions
> > and it needs one extra register.
> > There is a surprisingly large performance improvement over the C version (more
> > so than the generated assembly seems to suggest) just in get_cabac: I measured
> > roughly 40% faster for get_cabac on a K8. However, overall the difference is
> > not that big; I measured roughly 5% on a test clip on a K8 and a Core2.
> > Hopefully it still compiles on x86 32bit...
> > v2: incorporated feedback from Loren Merritt to avoid rip-relative movs
> > for every table, and got rid of unnecessary @GOTPCREL.
> > v3: apply similar fixes to the decode_significance functions, and use
> > same macro arguments for non-pic case.
> > v4: prettify inline asm arguments, add a non-fast-cmov version (as I expect
> > the C code to be faster otherwise since both cmov and sbb suck hard on a
> > Prescott, and you can't even construct the mask with a 64-bit shift as that's
> > just as terrible - it's quite difficult to find usable instructions on that
> > chip...). This is tested to work, though not on a P4; in theory it _should_
> > be fast there.
> > v5: based on suggestion by Reimar Döffinger add LABEL_MANGLE macros. Should
> > hopefully fix compilation on Darwin (untested).
> > ---
> >  libavcodec/h264_cabac.c    |    2 +-
> >  libavcodec/x86/cabac.h     |   90 ++++++++++++++++++++++++++++++++++++++++----
> >  libavcodec/x86/h264_i386.h |   58 ++++++++++++++++++++--------
> >  3 files changed, 125 insertions(+), 25 deletions(-)
> I guess asking you to yasm-ify the whole thing would be insane? :P

I don't know if it's insane, but I would guess it would be
slower due to the extra call overhead.
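
To illustrate the call-overhead point: inline asm in a header is expanded
into each caller, while a routine assembled separately with yasm sits in its
own object file and has to be reached through a real call. The sketch below
is only an illustration in plain C under stated assumptions. The names
step_inline and step_call are hypothetical, the body is a made-up stand-in
for a tiny per-bit step (not FFmpeg's get_cabac), and the noinline attribute
is a GCC/Clang extension used here to mimic an out-of-line routine.

```c
#include <stdint.h>

/* Hypothetical stand-in for a very small per-bit step; NOT FFmpeg code.
 * Branchless: mask is all ones when low >= range, otherwise zero. */
static inline uint32_t step_inline(uint32_t range, uint32_t low)
{
    uint32_t mask = -(uint32_t)(low >= range);
    return low - (range & mask);
}

/* Same computation, but forced out of line (assumption: GCC or Clang).
 * Every use now pays for a call and return and for moving the arguments
 * into ABI registers, much as a separately assembled routine would. */
#if defined(__GNUC__)
__attribute__((noinline))
#endif
uint32_t step_call(uint32_t range, uint32_t low)
{
    return step_inline(range, low);
}
```

Both functions return the same value; the difference is only in what the
compiler can do at the call site. For a body this small, the fixed cost of
the call can be a noticeable fraction of the total work, which is the
concern with moving such a function out of inline asm.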

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Many things microsoft did are stupid, but not doing something just because
microsoft did it is even more stupid. If everything ms did were stupid they
would be bankrupt already.