[FFmpeg-devel] [PATCH] move h264 chromaMC x86 code to yasm
Ronald S. Bultje
rsbultje
Sun Aug 29 02:50:53 CEST 2010
Hi,
On Sat, Aug 28, 2010 at 7:14 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> as per subj. FATE passes on x86-32/64 OSX and this patch fixes
> fate-vp6 on Win64 (which currently fails because of unmarked
> clobbering of xmm registers). There's some nice optimizations that
> could be done after this is applied, e.g. adding rv40 ssse3 mc should
> be easy-as-hell, but all that is left for later.
>
> Since this doesn't change the SIMD code in any significant way, I
> didn't profile it, but I can do that if preferred.
Just to make sure:
after (yasm):
437 dezicycles in w=8, 8388464 runs, 144 skips
526 dezicycles in w=4, 524274 runs, 14 skips
436 dezicycles in w=8, 8388489 runs, 119 skips
525 dezicycles in w=4, 524277 runs, 11 skips
444 dezicycles in w=8, 8388392 runs, 216 skips
530 dezicycles in w=4, 524262 runs, 26 skips
435 dezicycles in w=8, 8388455 runs, 153 skips
522 dezicycles in w=4, 524277 runs, 11 skips
442 dezicycles in w=8, 8388452 runs, 156 skips
530 dezicycles in w=4, 524280 runs, 8 skips
before (inline asm):
454 dezicycles in w=8, 8388477 runs, 131 skips
566 dezicycles in w=4, 524276 runs, 12 skips
448 dezicycles in w=8, 8388482 runs, 126 skips
571 dezicycles in w=4, 524278 runs, 10 skips
450 dezicycles in w=8, 8388485 runs, 123 skips
568 dezicycles in w=4, 524274 runs, 14 skips
450 dezicycles in w=8, 8388466 runs, 142 skips
568 dezicycles in w=4, 524273 runs, 15 skips
449 dezicycles in w=8, 8388475 runs, 133 skips
564 dezicycles in w=4, 524266 runs, 22 skips
So it's actually slightly faster than the inline asm: averaged over the
five runs, about 2.5% for w=8 and about 7% for w=4. No idea why, I
didn't change much, if anything at all...
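
For anyone who wants to check the comparison, here is a minimal sketch that averages the timings listed above. It is not part of the patch; the variable and function names are made up for illustration, and it assumes a plain unweighted mean over the five runs is good enough, since the run counts are nearly identical.

```python
from statistics import mean

# Dezicycle figures copied from the runs listed above.
# "after" is the yasm version, "before" is the inline asm version.
after = {"w=8": [437, 436, 444, 435, 442], "w=4": [526, 525, 530, 522, 530]}
before = {"w=8": [454, 448, 450, 450, 449], "w=4": [566, 571, 568, 568, 564]}


def speedup_percent(old, new):
    """Relative reduction in the mean cycle count, in percent."""
    return 100.0 * (mean(old) - mean(new)) / mean(old)


for width in ("w=8", "w=4"):
    print(
        f"{width}: before {mean(before[width]):.1f}, "
        f"after {mean(after[width]):.1f}, "
        f"{speedup_percent(before[width], after[width]):.1f}% fewer dezicycles"
    )
```

The skip counts are ignored here; they are small relative to the run counts, so they should not change the picture much.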
Ronald