[Ffmpeg-devel] A question about 'dct_unquantize_h263_intra'
Tue Jan 2 03:23:00 CET 2007
I have started optimizing 'dct_unquantize_h263_intra' for ARM (armv5te). The attached
patch already improves performance ('dct_unquantize_h263_helper_armv5te' is
twice as fast as 'dct_unquantize_h263_helper_c', and there is also a visible
improvement in overall video decoding performance). This code is a
straightforward optimization of 'mpegvideo.c' (assuming only that the result of
the multiplication does not overflow 16 bits). Right now it takes about 7 cycles
to process each element. But I checked 'mpegvideo_mmx.c' and got some
more optimization ideas.
Is it safe to assume:
1. The result of 'level = level * qmul - qadd' will never overflow a signed 16-bit value?
2. DCTELEM *block is always at least 8-byte aligned?
3. Processing extra elements after block[nCoeffs] is safe (up to but not
including block[(nCoeffs + 7) / 8 * 8])?
If all of that is safe (and if I understand the mpegvideo_mmx.c code correctly, it
should be safe), it is still possible to squeeze out a bit more performance.
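
To make the three questions concrete, here is a minimal C sketch of what each
assumption means. It is not the code from the attached patch or from
'mpegvideo.c'. The function names, the 16-bit DCTELEM typedef and the 'count'
parameter (a number of elements, not a last index) are assumptions made only
for illustration, and the separate handling of the DC coefficient is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int16_t DCTELEM;  /* assumption: 16-bit coefficients, as question 1 implies */

/* Question 1: does level * qmul +/- qadd still fit in a signed 16-bit value? */
static int fits_int16(int level, int qmul, int qadd)
{
    int r;
    if (level == 0)
        return 1;  /* zero coefficients are never modified */
    r = level < 0 ? level * qmul - qadd : level * qmul + qadd;
    return r >= INT16_MIN && r <= INT16_MAX;
}

/* Question 2: is the block pointer at least 8-byte aligned? */
static int is_aligned8(const DCTELEM *block)
{
    return ((uintptr_t)block & 7) == 0;
}

/* Scalar loop over exactly 'count' elements; zeros are left untouched. */
static void unquantize_exact(DCTELEM *block, int count, int qmul, int qadd)
{
    int i;
    assert(count >= 0);
    for (i = 0; i < count; i++) {
        int level = block[i];
        if (level < 0)
            block[i] = (DCTELEM)(level * qmul - qadd);
        else if (level > 0)
            block[i] = (DCTELEM)(level * qmul + qadd);
    }
}

/* Question 3: the same loop with its bound rounded up to a multiple of 8. */
static void unquantize_padded(DCTELEM *block, int count, int qmul, int qadd)
{
    unquantize_exact(block, (count + 7) / 8 * 8, qmul, qadd);
}
```

In this sketch the padded variant gives the same result as the exact one only
if the buffer really has room for the rounded-up element count and the extra
elements are zero (zeros pass through unchanged). If the extra elements can be
nonzero, they get rescaled as well, so whether that is harmless depends on how
the caller uses them afterwards.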
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 8368 bytes
Desc: not available