[FFmpeg-devel] [PATCH] Make VP3/Theora Decoder Much Faster

Jason Garrett-Glaser darkshikari
Wed Dec 2 09:41:25 CET 2009

Another optimization patch attached.  Should be pretty obvious what it does.

Also, a few optimization targets for Mike:

1.  unpack_vectors is atrociously inefficient crap.  It makes up about
10% of decoding time (9.2% in the profile below) and can be made at
least twice as fast.
2.  The fragment/superblock index error checking all over the place
seems redundant and likely prevents significant future optimizations.
3.  There's no asm version of put_no_rnd_pixels8_l2 ...
4.  Motion vector handling seems to be done in a very silly fashion,
with all 16x16 partitions being treated as groups of 8x8 partitions.
This accordingly prevents faster 16x16 motion compensation functions
from being used in 16x16 partitions, despite 8x8 partitions being
5.  There's a huge amount of if(x>0) and if(y>0) and if(x<width-1) and
so forth.  Why not just pad the edges of these data structures and
eliminate the conditionals all over the place?
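
To illustrate point 5, here is a minimal sketch of the padding idea.  It
is not taken from vp3.c: the grid layout and the function names
(sum_neighbors_checked, pad_grid, sum_neighbors_padded) are hypothetical,
and it assumes that a zero border is a neutral value for whatever is being
computed, which would need checking case by case in the real decoder.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-fragment grid of w x h bytes; not the real vp3.c layout. */

/* Checked variant: every neighbour access is guarded by a bounds test. */
static int sum_neighbors_checked(const unsigned char *g, int w, int h,
                                 int x, int y)
{
    int s = 0;
    if (x > 0)     s += g[y * w + x - 1];
    if (x < w - 1) s += g[y * w + x + 1];
    if (y > 0)     s += g[(y - 1) * w + x];
    if (y < h - 1) s += g[(y + 1) * w + x];
    return s;
}

/* Copy the w x h grid into a (w+2) x (h+2) buffer with a zeroed
 * one-element border.  Returns NULL on allocation failure; the caller
 * frees the result. */
static unsigned char *pad_grid(const unsigned char *g, int w, int h)
{
    int stride = w + 2;
    unsigned char *p;

    assert(w > 0 && h > 0);
    p = calloc((size_t)stride * (size_t)(h + 2), 1);
    if (!p)
        return NULL;
    for (int y = 0; y < h; y++)
        memcpy(p + (y + 1) * stride + 1, g + y * w, (size_t)w);
    return p;
}

/* Padded variant: same result with no conditionals, because reads that
 * fall outside the frame land in the zeroed border. */
static int sum_neighbors_padded(const unsigned char *p, int w, int x, int y)
{
    int stride = w + 2;
    const unsigned char *c = p + (y + 1) * stride + (x + 1);
    return c[-1] + c[1] + c[-stride] + c[stride];
}
```

The cost is one extra row and column on each side plus a slightly larger
stride; in exchange the inner loop has no branches on x and y.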

I can think of more, but I'm lazy.  Also, here's a profile of a build
with -fno-inline-functions and -fno-inline-functions-called-once, on a
Core i7 (from before my changes, but after Mike's).  Columns are sample
count, percent of total samples, and function:

4914     22.3709  unpack_vlcs
2392     10.8896  reverse_dc_prediction
2102      9.5693  render_slice
2031      9.2461  unpack_vectors
2008      9.1414  ff_vp3_idct_put_sse2
1397      6.3598  ff_vp3_h_loop_filter_mmx2
1096      4.9895  vp3_decode_frame
959       4.3658  ff_vp3_idct_add_sse2
873       3.9743  put_pixels8_mmx
867       3.9470  ff_vp3_v_loop_filter_mmx2
782       3.5600  unpack_superblocks
525       2.3901  put_no_rnd_pixels8_l2_c
512       2.3309  apply_loop_filter
370       1.6844  unpack_modes
278       1.2656  add_pixels_clamped_mmx
251       1.1427  put_signed_pixels_clamped_mmx
182       0.8286  put_no_rnd_pixels8_y2_mmx2
160       0.7284  put_no_rnd_pixels8_x2_mmx2
112       0.5099  clear_block_sse
76        0.3460  ff_emulated_edge_mc
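
As a rough way to read these numbers against point 1, the usual
Amdahl's-law estimate can be sketched as below.  The helper name
overall_speedup is hypothetical, and the estimate assumes the profile
percentages translate directly into wall-clock time.

```c
#include <assert.h>

/* Overall speedup when a function that takes `fraction` of total run
 * time (0..1) is made `local_speedup` times faster and nothing else
 * changes. */
static double overall_speedup(double fraction, double local_speedup)
{
    assert(fraction >= 0.0 && fraction <= 1.0 && local_speedup > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

For example, overall_speedup(0.092461, 2.0) models unpack_vectors at
9.2461% being made twice as fast; that works out to a bit under 1.05x
overall, i.e. roughly 4.6% less decoding time.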

Dark Shikari
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vp3opts.diff
Type: application/octet-stream
Size: 7319 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20091202/0ca54b0a/attachment.obj>
