[FFmpeg-devel] [PATCH] VP8 arithcoder asm

Jason Garrett-Glaser darkshikari
Sun Jul 4 11:25:18 CEST 2010


Attached is some asm (currently x86_32 only -- it'll break on anything
else) for the VP8 arithcoder.  Unfortunately, with gcc 4.3 on a Core
i7 with --cpu=host, it's currently slower overall: vp56_get_rac on its
own is slightly faster, but the full tree version is a bit slower.
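
For readers without the code at hand, here is a minimal C sketch of the
kind of routine being discussed: a single-bit arithmetic decode and a
tree walk built on top of it, following the boolean decoder described in
the VP8 bitstream documentation.  All names (BoolDecoder, bool_read,
tree_read, ...) are hypothetical; this is not the FFmpeg implementation
and not the attached asm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not the FFmpeg structs or the patch. */
typedef struct {
    const uint8_t *buf;   /* next unread input byte */
    const uint8_t *end;   /* one past the last input byte */
    uint32_t value;       /* two-byte window into the coded stream */
    uint32_t range;       /* current interval size, 128..255 after renormalization */
    int bit_count;        /* shifts performed since the last byte load */
} BoolDecoder;

/* Assumption: reads past the end of the buffer yield zero bytes. */
static uint32_t next_byte(BoolDecoder *d)
{
    return d->buf < d->end ? *d->buf++ : 0;
}

static void bool_init(BoolDecoder *d, const uint8_t *buf, size_t size)
{
    assert(d && buf);
    d->buf = buf;
    d->end = buf + size;
    d->value = next_byte(d) << 8;   /* preload the first two bytes */
    d->value |= next_byte(d);
    d->range = 255;
    d->bit_count = 0;
}

/* Decode one bit; prob is the chance (out of 256) that the bit is 0. */
static int bool_read(BoolDecoder *d, uint8_t prob)
{
    uint32_t split = 1 + (((d->range - 1) * prob) >> 8);
    uint32_t bigsplit = split << 8;
    int bit;

    if (d->value >= bigsplit) {     /* upper part of the interval: 1 */
        bit = 1;
        d->range -= split;
        d->value -= bigsplit;
    } else {                        /* lower part of the interval: 0 */
        bit = 0;
        d->range = split;
    }

    /* Renormalize: shift until range is back in 128..255. */
    while (d->range < 128) {
        d->value <<= 1;
        d->range <<= 1;
        if (++d->bit_count == 8) {
            d->bit_count = 0;
            d->value |= next_byte(d);
        }
    }
    return bit;
}

/* Walk a token tree: positive entries are indices of the next node pair,
 * entries <= 0 are leaves holding the negated symbol value. */
static int tree_read(BoolDecoder *d, const int8_t *tree, const uint8_t *probs)
{
    int i = 0;
    while ((i = tree[i + bool_read(d, probs[i >> 1])]) > 0)
        ;
    return -i;
}
```

In this sketch, tree_read is the "merged" form and a caller looping over
bool_read is the "repeated call" form; both decode the same symbols, so
they can be timed against each other on identical input.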

This is rather odd, considering that the code looks a whole lot better
than what gcc generates, so there must be something stalling my code
that I'm missing, assuming my numbers are right.  It couldn't possibly
be the extra pushes and pops implied by an extern call: at least for
me, calling the vp56_rac asm function repeatedly instead of the merged
tree function is actually faster, despite vastly more stack thrashing.

It's obviously nowhere near ready to commit, so I'm mainly just
throwing it out there for people to play with.  It's far easier to
optimize working code than to write a messy asm function from scratch,
so hopefully this makes things easier for people.

Dark Shikari
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.diff
Type: application/octet-stream
Size: 10113 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100704/2fa26bee/attachment.obj>


