[Ffmpeg-devel] [PATCH] idct8 in Altivec for H.264 decoding

Michael Niedermayer michaelni
Mon Oct 9 10:40:05 CEST 2006


On Mon, Oct 09, 2006 at 12:05:30AM +0200, Guillaume POIRIER wrote:
> Hi,
> Attached patch should provide a 2% decoding speed-up if I do the math right.
> This patch isn't meant to be merged as it is now, as in addition to 
> adding idct8 routine, it moves TRANSPOSE8 macro to dsputil_altivec.h as 
> this macro is already duplicated in vc1dsp_altivec.c, and 
> mpegvideo_altivec.c.
> This patch also carries some macros that are useful in Altivec 
> programming. They are taken from x264 project, and I have permission 
> from the author to re-licence them in LGPL.

Could you send a separate patch for the TRANSPOSE8 move and these macros?

> One more thing: if the dst array is 8 or 16 bytes aligned, it should be 
> possible to make the routine even faster. Unfortunately, I can't manage 
> to make an implementation that works.
> I've left the optimized routines ALTIVEC_STORE_SUM_CLIP_ALIGN8_A (16 
> bytes aligned *dst) and ALTIVEC_STORE_SUM_CLIP_ALIGN8_B (8 bytes aligned 
> *dst, but _not_ 16 bytes aligned) so people can have a look at them and 
> hopefully find what is wrong.
> As far as I can see, ALTIVEC_STORE_SUM_CLIP_ALIGN8_A works as expected, 
> but ALTIVEC_STORE_SUM_CLIP_ALIGN8_B doesn't (that's really surprising 
> considering how much alike they are).

1. Check that dst is really 8-byte aligned (yes, it should be, but ...).
2. Maybe a print_vec() function which prints the contents of a vec*,
   together with a check at the end that the calculation matches what you
   expect, could help.
   My idea is something like:
    vec_u8_t dstv = vec_ld(0, dest);
    vec_st(sum8, 0, temp);    /* temp: 16-byte aligned scratch array */
    for(i=0; i<16; i++)
        if(temp[i] != dest[i])
            printf("mismatch at byte %d\n", i);
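
A rough, portable sketch of both suggestions, in case it helps. It is not taken from the patch: dst_alignment(), print_vec() and first_mismatch() are hypothetical helper names, and the code works on plain 16-byte arrays, assuming the vector has already been written to an aligned scratch buffer with vec_st as in the fragment above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns 16 if p is 16-byte aligned, 8 if it is
 * 8-byte but not 16-byte aligned, and 0 otherwise. */
static int dst_alignment(const void *p)
{
    uintptr_t a = (uintptr_t)p;

    if ((a & 15) == 0)
        return 16;
    if ((a & 7) == 0)
        return 8;
    return 0;
}

/* Hypothetical print_vec(): dumps the 16 bytes of a vector that was
 * stored into a plain byte array beforehand. */
static void print_vec(const char *name, const uint8_t *v)
{
    int i;

    printf("%s:", name);
    for (i = 0; i < 16; i++)
        printf(" %02x", v[i]);
    printf("\n");
}

/* Compares the expected 16 bytes with the computed ones and returns the
 * index of the first differing byte, or -1 if all 16 bytes match. */
static int first_mismatch(const uint8_t *expected, const uint8_t *got)
{
    int i;

    for (i = 0; i < 16; i++)
        if (expected[i] != got[i])
            return i;
    return -1;
}
```

Calling dst_alignment(dst) at the top of the routine would show which of the two ALIGN8 paths a given call should take, and running first_mismatch() on the output of the working unaligned store and of ALTIVEC_STORE_SUM_CLIP_ALIGN8_B, followed by print_vec() on both buffers, should narrow down which bytes go wrong.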

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is
