[Ffmpeg-devel] VP3/Theora Perfection

Michael Niedermayer michaelni
Mon May 16 20:33:42 CEST 2005


Hi

On Monday 16 May 2005 15:30, Mike Melanson wrote:
> Diego Biurrun wrote:
> > What samples are you using to test?  Your last commit fixed vp31.avi and
> > all the other samples on mphq decode flawlessly now, albeit the FFmpeg
> > decoder takes 2-3 times the CPU of the binary decoder.
>
> 	Not surprising. The loop filter is quite computationally intensive
> (32-64 new multiplications per coded fragment). The original On2 source

You mean the 32 *1 and 32 *3 ones? *3 is just x+x+x or x+(x<<1), and gcc will 
make that change for you.
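
To make the point about *3 concrete, here is a minimal sketch in C. The
function names are made up for illustration and are not taken from vp3.c.
It only shows that the three spellings compute the same value, so the
source can say x * 3 and leave the choice to the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Three spellings of 3*x. The names are hypothetical, for illustration. */
static inline int32_t mul3_mul(int32_t x)
{
    return x * 3;
}

static inline int32_t mul3_add(int32_t x)
{
    return x + x + x;
}

static inline int32_t mul3_shift(int32_t x)
{
    /* Shift on the unsigned type: left-shifting a negative signed value
     * is undefined in C. Converting back assumes two's complement. */
    return x + (int32_t)((uint32_t)x << 1);
}

/* Returns 1 if all three forms agree for every x in [lo, hi]. */
static int mul3_forms_agree(int32_t lo, int32_t hi)
{
    assert(lo <= hi);
    for (int32_t x = lo; x <= hi; x++)
        if (mul3_mul(x) != mul3_add(x) || mul3_mul(x) != mul3_shift(x))
            return 0;
    return 1;
}
```

Calling mul3_forms_agree() over a small signed range, such as the range of
differences between 8-bit pixels, is enough to check the equivalence for
this kind of input.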

anyway, vp3.c is very inefficiently written
* the switch / case mess used for some vlc decoding 
* the if(get_bits1()) branch trees for the remaining vlcs
* using a 2*width*height array to store dct coefficients, which is memset(0) 
for every frame
* dequant+idct which is passed a coeff_count which is always 64 (note I didn't 
check that, but it has to be, as the code won't work if it weren't 64); actually 
the dequant should be done during bitstream decoding
* the idct uses its own API, incompatible with the idct system used in lavc
* mmx.h based asm code (slow due to gcc bugs, and problematic due to bugs in 
mmx.h)
* no slices
* the loop filter is applied after the whole frame has been decoded
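
A rough sketch of what "dequant during bitstream decoding" could look like,
in C. Everything here is assumed for illustration: the RunLevel token type,
the function names, and the flat 64-entry quant table are not the vp3.c
data structures, and the zigzag scan is left out. The first function
dequantizes in a separate pass over all 64 entries; the second multiplies
each coefficient as it is stored, so only nonzero coefficients are touched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical token: 'run' zero coefficients, then one nonzero 'level'. */
typedef struct {
    uint8_t run;
    int16_t level;
} RunLevel;

/* Two passes: store the raw levels, then multiply all 64 entries. */
static void decode_then_dequant(const RunLevel *tok, int n,
                                const int16_t quant[64], int16_t block[64])
{
    int pos = 0;
    memset(block, 0, 64 * sizeof(*block));
    for (int i = 0; i < n; i++) {
        pos += tok[i].run;
        assert(pos < 64);
        block[pos++] = tok[i].level;
    }
    for (int i = 0; i < 64; i++)
        block[i] = (int16_t)(block[i] * quant[i]);
}

/* One pass: dequantize each coefficient at the moment it is stored. */
static void decode_with_dequant(const RunLevel *tok, int n,
                                const int16_t quant[64], int16_t block[64])
{
    int pos = 0;
    memset(block, 0, 64 * sizeof(*block));
    for (int i = 0; i < n; i++) {
        pos += tok[i].run;
        assert(pos < 64);
        block[pos] = (int16_t)(tok[i].level * quant[pos]);
        pos++;
    }
}
```

Both functions fill the block with the same values; the second one just
skips the 64 multiplications per block, most of which are by zero.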


> has MMX and SSE2 optimizations that I can port over when I am confident
> that the C-based loop filter works.

Note, please do not use mmx.h. Furthermore, are you sure the original On2 
source is under an LGPL-compatible license? Maybe it is, I am just asking.
And why port instead of writing our own? The loops are relatively trivial.

[...]
-- 
Michael




