[FFmpeg-devel] [PATCH] VP8 MMX optimizations (MC and IDCT dc_add)
Tue Jun 22 22:07:13 CEST 2010
On Tue, Jun 22, 2010 at 12:35 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> as per $subj.
> Speed gain:
> - dc_add goes from 1800 to 1350 cycles (of which 1150 is overhead,
> measured as an empty asm func), so about 3-3.5x faster net of overhead.
> - The MC functions are each about 4-5x faster (I only measured the 4x4
> ones, the rest I assume are similarly faster but not measured).
> - Total time spent on a shell script that decodes the whole test suite
> (vp8-test-vectors-r1, files 001-017), including shell overhead and
> everything, goes from 2.3 to 2.1 seconds with these applied.
> Results are bit-identical, and this is my first MMX/etc. ever! Thanks
> to Jason for teaching me. ;-).
Some of these are suboptimal; I'll optimize them after commit. Sooner
is better than later; I take responsibility for making them perfect.
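
For reference, the overhead-corrected dc_add figure quoted above can be
checked with a few lines of Python. This is only a sketch: net_speedup is
a hypothetical helper (not part of FFmpeg), and the inputs are just the
cycle counts and wall-clock times quoted in the message.

```python
def net_speedup(before, after, overhead=0.0):
    """Return before/after speedup once a fixed overhead is subtracted.

    Hypothetical helper for illustration; not an FFmpeg function.
    """
    net_before = before - overhead
    net_after = after - overhead
    if net_before <= 0 or net_after <= 0:
        raise ValueError("overhead must be smaller than both measurements")
    return net_before / net_after


# dc_add: 1800 -> 1350 cycles, with 1150 cycles measured for an empty asm func.
dc_add_net = net_speedup(1800, 1350, overhead=1150)  # (1800-1150) / (1350-1150)
dc_add_raw = net_speedup(1800, 1350)                 # ignores the overhead

# Whole test-suite run: 2.3 s -> 2.1 s, shell overhead included.
suite = net_speedup(2.3, 2.1)

print(f"dc_add net of overhead: {dc_add_net:.2f}x")
print(f"dc_add raw ratio:       {dc_add_raw:.2f}x")
print(f"test-suite wall clock:  {suite:.2f}x")
```

The net ratio lands inside the quoted 3-3.5x range, while the raw ratio
is far smaller, which is why the empty-function overhead has to be
subtracted before comparing.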