[FFmpeg-devel] [RFC] optimize ff_emulated_edge_mc

Ronald S. Bultje rsbultje
Thu Dec 30 04:03:04 CET 2010


On Wed, Dec 29, 2010 at 8:06 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> emu_edge_mc looks optimizable and shows up in my profilings. A simple
> loop->memcpy makes things a lot faster already (see attached):
> after
> 6165 dezicycles in ff_emulated_edge_mc, 1048040 runs, 536 skips
> 6115 dezicycles in ff_emulated_edge_mc, 1048044 runs, 532 skips
> 6087 dezicycles in ff_emulated_edge_mc, 1048158 runs, 418 skips
> before
> 9104 dezicycles in ff_emulated_edge_mc, 1047805 runs, 771 skips
> 9131 dezicycles in ff_emulated_edge_mc, 1047866 runs, 710 skips
> 9097 dezicycles in ff_emulated_edge_mc, 1047874 runs, 702 skips

A few more changes are attached. Doing memcpy() on the top/bottom edges
brings it to about 540 cycles:

5414 dezicycles in ff_emulated_edge_mc, 1048331 runs, 245 skips

Reordering the left/right edge loop a little then brings it to about 520 cycles:

5186 dezicycles in ff_emulated_edge_mc, 1048288 runs, 288 skips
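
For illustration, the top/bottom memcpy() idea could look roughly like
the sketch below. This is not the attached patch: the function name
replicate_top_bottom() and its parameters are assumptions, and it
presumes the valid rows [start_y, end_y) have already been copied into
buf and that linesize >= block_w.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the code from the patch.
 * buf holds block_h rows of block_w pixels, linesize bytes apart.
 * Rows start_y .. end_y-1 are assumed to contain valid pixels already. */
static void replicate_top_bottom(uint8_t *buf, ptrdiff_t linesize,
                                 int block_w, int block_h,
                                 int start_y, int end_y)
{
    const uint8_t *top, *bottom;
    int y;

    assert(0 <= start_y && start_y < end_y && end_y <= block_h);

    top    = buf + start_y * linesize;        /* first valid row */
    bottom = buf + (end_y - 1) * linesize;    /* last valid row  */

    /* top edge: one memcpy() per missing row instead of a per-pixel loop */
    for (y = 0; y < start_y; y++)
        memcpy(buf + y * linesize, top, block_w);

    /* bottom edge: same, using the last valid row as the source */
    for (y = end_y; y < block_h; y++)
        memcpy(buf + y * linesize, bottom, block_w);
}
```

Source and destination rows never overlap here, so memcpy() (rather
than memmove()) is enough.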

I'm too lazy to run this multiple times.

For the left/right edge fills, I tried using memset(), but that slows
it down considerably; it appears the compiler doesn't inline it. Jason
said he saw the same on some compilers with the memcpy() trick. Which
makes me think: maybe we can emulate the inline memset() trick with
some more elaborate C code? What I'm thinking of is basically:

    edge_val *= 0x01010101U;
    while (to_write >= 4) write(edge_val);
    if (to_write & 2) write(edge_val);
    if (to_write & 1) write(edge_val);

or so (with write() storing 4, 2 and 1 bytes respectively). Also, since
most of the time is spent quite literally copying the blocks, the main
copy loop could certainly use some optimization, especially since
width is generally something like 16...
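
Spelled out as compilable C, the memset() emulation above might look
something like this. It is only a sketch: fill_edge() is a made-up
name, and whether it actually beats a non-inlined memset() would still
need measuring. The fixed-size memcpy() calls are there to get
unaligned-safe 32-bit and 16-bit stores, which compilers commonly turn
into single store instructions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write n copies of the edge pixel v at dst. */
static inline void fill_edge(uint8_t *dst, uint8_t v, size_t n)
{
    uint32_t v4 = v * 0x01010101U;   /* replicate the byte into all 4 lanes */
    uint16_t v2 = (uint16_t)v4;      /* low two lanes for the 2-byte tail   */

    while (n >= 4) {                 /* bulk: 4 bytes per iteration */
        memcpy(dst, &v4, 4);
        dst += 4;
        n   -= 4;
    }
    if (n & 2) {                     /* 2 or 3 bytes left */
        memcpy(dst, &v2, 2);
        dst += 2;
    }
    if (n & 1)                       /* final odd byte */
        *dst = v;
}
```

Since all bytes of v4 and v2 are identical, the result does not depend
on endianness.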

-------------- next part --------------
A non-text attachment was scrubbed...
Name: emu_edge_mc.patch
Type: application/octet-stream
Size: 1462 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20101229/d5cd3356/attachment.obj>
