[Ffmpeg-devel] [PATCH] idct8 in Altivec for H.264 decoding

Guillaume POIRIER poirierg
Mon Oct 9 11:04:36 CEST 2006


Hi,

On 10/9/06, Michael Niedermayer <michaelni at gmx.at> wrote:
> Hi
>
> On Mon, Oct 09, 2006 at 12:05:30AM +0200, Guillaume POIRIER wrote:
> > Hi,
> > Attached patch should provide a 2% decoding speed-up if I do the math right.
> >
> > This patch isn't meant to be merged as it is now: in addition to
> > adding the idct8 routine, it moves the TRANSPOSE8 macro to
> > dsputil_altivec.h, as this macro is already duplicated in
> > vc1dsp_altivec.c and mpegvideo_altivec.c.
> >
> > This patch also carries some macros that are useful in Altivec
> > programming. They are taken from the x264 project, and I have
> > permission from the author to relicense them under the LGPL.
>
> could you send a separate patch for the TRANSPOSE move and these?

Yes, please find them attached to this mail.
I shall make an updated patch with my idct8 implementation when I have
improved it.


> > One more thing: if the dst array is 8 or 16 bytes aligned, it should be
> > possible to make the routine even faster. Unfortunately, I can't manage
> > to make an implementation that works.
> >
> > I've left the optimized routines ALTIVEC_STORE_SUM_CLIP_ALIGN8_A (16
> > bytes aligned *dst) and ALTIVEC_STORE_SUM_CLIP_ALIGN8_B (8 bytes aligned
> > *dst, but _not_ 16 bytes aligned) so people can have a look at them and
> > hopefully find what is wrong.
> > As far as I can see, ALTIVEC_STORE_SUM_CLIP_ALIGN8_A works as expected,
> > but ALTIVEC_STORE_SUM_CLIP_ALIGN8_B doesn't (that's really surprising
> > considering how much alike they are).
>
> well
> 1. check that the stuff is really 8-byte aligned (yes it should be but ...)
> 2. maybe some print_vec() function which prints the contents of a vec*
>    together with a check at the end if the calculation matches what you
>    expect could help
>    my idea is something like:
>     print_8byte(dest);
>     vec_u8_t dstv = vec_ld(0, dest);
>     print_vec(dstv);
>     ...
>     working_sum_clip();
>     print_8byte(dest);
>     vec_st(sum8, 0, temp);
>     print_8byte(temp);
>     for(i=0; i<16; i++)
>         if(temp[i] != dest[i])
>             assert(0);

I'll see what I can do. Thanks for the suggestion.
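
For anyone who wants to try the two checks above without AltiVec
hardware, here is a minimal portable C sketch. It is not code from the
patch: dst_alignment(), print_bytes() and first_mismatch() are
hypothetical helper names, and plain byte arrays stand in for vec_u8_t,
so no vec_ld()/vec_st() is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns 16 if p is 16-byte aligned, 8 if it is
 * 8-byte but not 16-byte aligned (the ALIGN8_B case), 0 otherwise. */
static int dst_alignment(const void *p)
{
    uintptr_t a = (uintptr_t)p;

    if ((a & 15) == 0)
        return 16;
    if ((a & 7) == 0)
        return 8;
    return 0;
}

/* Hypothetical stand-in for print_8byte()/print_vec(): hex dump of n bytes. */
static void print_bytes(const char *label, const uint8_t *p, size_t n)
{
    size_t i;

    printf("%s:", label);
    for (i = 0; i < n; i++)
        printf(" %02x", p[i]);
    printf("\n");
}

/* Index of the first byte where ref and out differ, or -1 if the first
 * n bytes are identical. */
static int first_mismatch(const uint8_t *ref, const uint8_t *out, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (ref[i] != out[i])
            return (int)i;
    return -1;
}
```

The idea would be to call dst_alignment(dst) before choosing the _A or
_B store path, then dump the 16 bytes produced by the known-good routine
and by the candidate one with print_bytes(), and assert that
first_mismatch() returns -1.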

Guillaume
-- 
With DADVSI (http://en.wikipedia.org/wiki/DADVSI), France finally has
a lead on the USA in selling out individuals' rights to corporations!
Vive la France!
