[FFmpeg-devel] [PATCH] H264 MC8 SSSE3 minor speedups

Ronald S. Bultje rsbultje
Sat Aug 21 15:39:42 CEST 2010


Hi,

Since everything else failed, and this is one of the top functions
in my profiling tests, I went for simple changes instead, so I can at
least tell myself I helped speed it up, in whatever minor way. ;-)
H264 MC8 SSSE3 loads a constant with a movq+movlhps pair, which could
also be done as a single movdqa from a 16-byte-aligned constant; that
is a few cycles faster.
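
As an illustration only (the real change is in the attached patch, which
is inline asm), the two ways of filling an XMM register with a duplicated
64-bit constant can be sketched with SSE2 intrinsics. The constant value
and all names below are hypothetical, and a compiler is free to pick
different instructions than the ones named in the comments.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */
#include <stdint.h>

/* Hypothetical constant: four 16-bit words of value 4. */
#define DEMO_CONST UINT64_C(0x0004000400040004)

/* Old layout: one 8-byte copy of the constant. */
static const uint64_t demo_const64 = DEMO_CONST;

/* New layout: the constant stored twice, 16-byte aligned (C11 _Alignas). */
_Alignas(16) static const uint64_t demo_const128[2] = { DEMO_CONST, DEMO_CONST };

/* Two steps: 8-byte load into the low half, then copy low half to high half. */
static __m128i load_two_step(void)
{
    __m128i lo = _mm_loadl_epi64((const __m128i *)&demo_const64); /* movq */
    __m128  f  = _mm_castsi128_ps(lo);
    return _mm_castps_si128(_mm_movelh_ps(f, f));                 /* movlhps */
}

/* One step: a single aligned 16-byte load. */
static __m128i load_one_step(void)
{
    return _mm_load_si128((const __m128i *)demo_const128);        /* movdqa */
}

/* Returns 1 if both paths leave identical contents in the register. */
static int loads_match(void)
{
    __m128i eq = _mm_cmpeq_epi8(load_two_step(), load_one_step());
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

The aligned load needs the constant to occupy 16 bytes instead of 8, so
the saving costs a little extra read-only data.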

before (mc8, mx==0^my==0 in h264.c around chroma_op() call):
604 dezicycles in w=8, 65535 runs, 1 skips
603 dezicycles in w=8, 131067 runs, 5 skips
606 dezicycles in w=8, 262137 runs, 7 skips
606 dezicycles in w=8, 524275 runs, 13 skips
605 dezicycles in w=8, 1048552 runs, 24 skips

after changing constant to aligned and removing movlhps:
574 dezicycles in w=8, 65529 runs, 7 skips
572 dezicycles in w=8, 131062 runs, 10 skips
570 dezicycles in w=8, 262133 runs, 11 skips
570 dezicycles in w=8, 524271 runs, 17 skips
578 dezicycles in w=8, 1048539 runs, 37 skips

Then further down, there's a case where it copies a variable that it
never actually writes to a second time, so the copy appears unnecessary.
Removing it saves a few more cycles.

before (mc8, !(mx&7)&&(my&7) in same position in h264.c):
733 dezicycles in w=8, 16383 runs, 1 skips
731 dezicycles in w=8, 32767 runs, 1 skips
729 dezicycles in w=8, 65534 runs, 2 skips
718 dezicycles in w=8, 131068 runs, 4 skips
720 dezicycles in w=8, 262136 runs, 8 skips

after removing extraneous movdqa:
687 dezicycles in w=8, 16383 runs, 1 skips
683 dezicycles in w=8, 32766 runs, 2 skips
681 dezicycles in w=8, 65534 runs, 2 skips
679 dezicycles in w=8, 131070 runs, 2 skips
672 dezicycles in w=8, 262138 runs, 6 skips
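
To put the two sets of figures side by side, here is a rough sketch of
the arithmetic. It assumes the readings are tenths of a cycle, as printed
by FFmpeg's START_TIMER/STOP_TIMER macros, and it simply averages the five
printed lines of each run, which is crude since each line is itself a
running average. The function and array names are made up for this
example.

```c
#include <stddef.h>

/* Timer readings copied from the four listings above (dezicycles). */
static const int case1_before[5] = { 604, 603, 606, 606, 605 };
static const int case1_after[5]  = { 574, 572, 570, 570, 578 };
static const int case2_before[5] = { 733, 731, 729, 718, 720 };
static const int case2_after[5]  = { 687, 683, 681, 679, 672 };

/* Arithmetic mean of n readings. */
static double mean_dezicycles(const int *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Percentage by which `after` is lower than `before`. */
static double percent_saved(double before, double after)
{
    return 100.0 * (before - after) / before;
}
```

With these inputs the means are 604.8 vs 572.8 for the first case and
726.2 vs 680.4 for the second, i.e. roughly 5.3% and 6.3% fewer cycles
per call (about 3 and 4.5 cycles respectively).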

Measured on OSX 10.6.4, i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1
(Apple Inc. build 5664), Intel Core i7.

Ronald
-------------- next part --------------
A non-text attachment was scrubbed...
Name: h264-mc8-ssse3.patch
Type: application/octet-stream
Size: 3298 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100821/3ed8432a/attachment.obj>


