[Ffmpeg-devel] [PATCH] idct8 in Altivec for H.264 decoding

Guillaume POIRIER gpoirier
Mon Oct 9 00:05:30 CEST 2006

The attached patch should provide about a 2% decoding speed-up, if I did
the math right.

This patch isn't meant to be merged as it is now: in addition to adding
the idct8 routine, it moves the TRANSPOSE8 macro to dsputil_altivec.h,
as this macro is already duplicated in vc1dsp_altivec.c, and

This patch also carries some macros that are useful in Altivec
programming. They are taken from the x264 project, and I have permission
from the author to relicense them under the LGPL.

One more thing: if the dst array is 8- or 16-byte aligned, it should be
possible to make the routine even faster. Unfortunately, I haven't
managed to write an implementation that works.

I've left in the optimized routines ALTIVEC_STORE_SUM_CLIP_ALIGN8_A (for
a 16-byte aligned *dst) and ALTIVEC_STORE_SUM_CLIP_ALIGN8_B (for a *dst
that is 8-byte aligned but _not_ 16-byte aligned) so people can have a
look at them and hopefully find what is wrong.
As far as I can see, ALTIVEC_STORE_SUM_CLIP_ALIGN8_A works as expected,
but ALTIVEC_STORE_SUM_CLIP_ALIGN8_B doesn't (that's really surprising
considering how much alike they are).
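
For anyone who wants to test the two variants, a scalar reference to
compare against may help. The following is only a minimal sketch and is
not taken from the patch: the names classify_dst, clip_uint8 and
store_sum_clip_row_ref are hypothetical, and it assumes that the
store/sum/clip step computes dst[i] = clip(dst[i] + (idct[i] >> 6)) over
one row of 8 pixels, with the rounding term already folded into the idct
values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; none of them appear in the patch. */
enum dst_align { DST_UNALIGNED, DST_ALIGN8_ONLY, DST_ALIGN16 };

/* Classify dst the way the two macros distinguish it:
 * 16-byte aligned (case A), or 8-byte but not 16-byte aligned (case B). */
static enum dst_align classify_dst(const uint8_t *dst)
{
    uintptr_t addr = (uintptr_t)dst;

    if ((addr & 15) == 0)
        return DST_ALIGN16;
    if ((addr & 7) == 0)
        return DST_ALIGN8_ONLY;
    return DST_UNALIGNED;
}

/* Saturate an int to the 0..255 range of a pixel. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar reference for one row of 8 pixels.
 * Assumed semantics: dst[i] = clip(dst[i] + (idct[i] >> 6)).
 * Relies on >> being an arithmetic shift for negative values,
 * which is the case with GCC. */
static void store_sum_clip_row_ref(uint8_t *dst, const int16_t idct[8])
{
    assert(dst != NULL && idct != NULL);

    for (int i = 0; i < 8; i++)
        dst[i] = clip_uint8(dst[i] + (idct[i] >> 6));
}
```

Running the vector macro and store_sum_clip_row_ref on copies of the
same row, once with classify_dst() returning DST_ALIGN16 and once with
DST_ALIGN8_ONLY, and then comparing the 8 output bytes (plus the 8
neighbouring bytes of the same 16-byte vector, which should stay
untouched) should show which lanes differ in the failing case.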

Anyway, comments and tests welcome.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ff_idct8_altivec.diff
Type: text/x-patch
Size: 13964 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20061009/abf7f869/attachment.bin>
