[FFmpeg-devel] [PATCH] unroll loop in h264_idct_add8_sse2()

Ronald S. Bultje rsbultje at gmail.com
Sat Sep 18 19:43:51 CEST 2010


Hi,

On Sat, Sep 18, 2010 at 1:22 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> The same trick can likely be applied to add16intra as well. (It could
> likely also be done for pre-SSE2, but I doubt that's used much in
> reality...)

Patch attached. Benchmarked on Mac OS X 10.6.4 with the cathedral sample.
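For readers without the scrubbed attachment, here is a rough plain-C illustration of what "unrolling the loop" means for a per-block residual add over the 16 4x4 blocks of a macroblock. It is not the patch itself (that is x86 SIMD code and is not reproduced here); `add_block4x4`, `add16_rolled`, `add16_unrolled` and the `offset`/`nnz` layout are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: add a 4x4 block of residuals to the
 * destination pixels, clipping each result to 0..255. */
static void add_block4x4(uint8_t *dst, int stride, const int16_t *res)
{
    assert(stride >= 4); /* rows of one block must not overlap */
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            int v = dst[y * stride + x] + res[y * 4 + x];
            dst[y * stride + x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
    }
}

/* Rolled form: one iteration (counter update, compare, branch)
 * per 4x4 block; blocks without coefficients are skipped. */
static void add16_rolled(uint8_t *dst, const int *offset, int stride,
                         const int16_t *res, const uint8_t *nnz)
{
    for (int i = 0; i < 16; i++)
        if (nnz[i])
            add_block4x4(dst + offset[i], stride, res + 16 * i);
}

/* Fully unrolled form: the same 16 steps written out via a macro,
 * so every block index is a compile-time constant and the loop
 * counter and back-edge branch disappear. */
#define ADD_BLOCK(i)                                                \
    do {                                                            \
        if (nnz[i])                                                 \
            add_block4x4(dst + offset[i], stride, res + 16 * (i));  \
    } while (0)

static void add16_unrolled(uint8_t *dst, const int *offset, int stride,
                           const int16_t *res, const uint8_t *nnz)
{
    ADD_BLOCK(0);  ADD_BLOCK(1);  ADD_BLOCK(2);  ADD_BLOCK(3);
    ADD_BLOCK(4);  ADD_BLOCK(5);  ADD_BLOCK(6);  ADD_BLOCK(7);
    ADD_BLOCK(8);  ADD_BLOCK(9);  ADD_BLOCK(10); ADD_BLOCK(11);
    ADD_BLOCK(12); ADD_BLOCK(13); ADD_BLOCK(14); ADD_BLOCK(15);
}
#undef ADD_BLOCK
```

Both functions produce identical pixels; only the control flow differs. A C compiler may unroll the rolled form on its own, whereas in hand-written SIMD assembly the unrolling has to be spelled out explicitly, e.g. with an assembler macro.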

Before:
1360 dezicycles in add16intra, 262118 runs, 26 skips
1377 dezicycles in add16intra, 524240 runs, 48 skips
1432 dezicycles in add16intra, 1048485 runs, 91 skips

Time (5 runs):
8.204
8.264
8.236
8.227
8.241
(avg 8.234)

After:
1160 dezicycles in add16intra, 262137 runs, 7 skips
1169 dezicycles in add16intra, 524268 runs, 20 skips
1218 dezicycles in add16intra, 1048529 runs, 47 skips
(~15% fewer cycles per call)
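FFmpeg's START_TIMER/STOP_TIMER output reports the average in tenths of a CPU cycle ("dezicycles"), so 1360 corresponds to roughly 136 cycles per call. The ~15% figure can be checked with a small helper; the function names below are made up for this note.

```c
#include <assert.h>

/* Convert the timer's tenth-of-a-cycle units to whole cycles. */
static double dezicycles_to_cycles(double dezicycles)
{
    return dezicycles / 10.0;
}

/* Relative reduction from `before` to `after`, in percent. */
static double pct_reduction(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

For the three timer lines above, pct_reduction() gives about 14.7, 15.1 and 14.9, i.e. roughly 15% in each case.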

Time (5 runs):
8.180
8.198
8.196
8.241
8.182
(avg 8.199, i.e. ~0.4% faster overall)
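The averages and the overall gain follow directly from the five timings in each set. A minimal sketch of that arithmetic (the helper name `mean` is illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n timings; n must be positive. */
static double mean(const double *t, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / (double)n;
}
```

With the numbers above, mean() gives 8.2344 before and 8.1994 after, and (8.2344 - 8.1994) / 8.2344 is about 0.43%, consistent with the ~0.4% quoted. The function itself is only 15% faster, so the whole-decode gain is much smaller.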

Ronald
-------------- next part --------------
A non-text attachment was scrubbed...
Name: h264-idct-inline2.patch
Type: application/octet-stream
Size: 2296 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100918/714c7f2c/attachment.obj>


