[Ffmpeg-devel] Using Intel's fDCT

Sat Nov 19 22:00:53 CET 2005

g. <the_ether at lycos.co.uk> writes:

> I've been trying to use Intel's fDCT from their IPP libs to see if it is 
> faster than the SSE2 one in ffmpeg. I tried simply replacing the line in
> mpegvideo_mmx_template.c
>
>     RENAMEl(ff_fdct) (block); //cant be anything else ...
>
> with Intel's function
>
>     ippiDCT8x8Fwd_16s_C1I( block );
>
> Everything runs okay (and noticeably faster), but the resulting MPEG-2
> video is a mess.
>
> The Intel routine simply performs an fDCT on an 8x8 block and writes the
> results in the same place as the original data. There is no
> initialisation required.
>
> What is going on in ff_fdct_sse2() other than a pure fDCT transform,
> and have you any tips on how I could integrate Intel's routine?

IIRC, the output from the MMX/SSE DCT functions is permuted because of
some design quirk of the CPU.  There's a flag somewhere indicating
this.  Make sure it is set correctly.
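A minimal sketch of what a coefficient permutation means in practice, assuming it is described by a 64-entry table that maps each natural-order index to its position in the permuted layout (ffmpeg's DSPContext carries a table of this kind, `idct_permutation`, but which convention the fDCT path expects should be checked in the source). `permute_block` and `make_transpose_perm` are hypothetical helpers, and the plain transpose is only an example table, not necessarily the one the MMX/SSE code uses.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: move the coefficient at natural index i to
 * position perm[i]. Assumes perm holds 64 distinct values in 0..63. */
static void permute_block(int16_t block[64], const uint8_t perm[64])
{
    int16_t tmp[64];
    memcpy(tmp, block, sizeof(tmp));
    for (int i = 0; i < 64; i++)
        block[perm[i]] = tmp[i];
}

/* Example table only: a plain transpose, so the coefficient at
 * row r, column c ends up at row c, column r. */
static void make_transpose_perm(uint8_t perm[64])
{
    for (int i = 0; i < 64; i++)
        perm[i] = (uint8_t)(((i & 7) << 3) | (i >> 3));
}
```

If the Intel routine writes coefficients in natural order while the rest of the encoder expects a permuted layout (or the other way round), a pass like `permute_block()` right after the `ippiDCT8x8Fwd_16s_C1I()` call would be where the two are reconciled.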

-- 
Måns Rullgård
mru at inprovide.com