[Ffmpeg-devel] VP3/Theora Perfection

Mon May 16 23:19:20 CEST 2005

On Mon, May 16, 2005 at 02:10:29PM -0600, Mike Melanson wrote:
> >* dquant+idct which is passed a coeff_count which is always 64 (note i 
> >didn't check that but it has to be as the code won't work if it weren't 64), 
> 
> 	Hmm, I do not think that coeff_count needs to be tracked as part of 
> a fragment. It looks like last_coeff was supposed to be sent to 
> dquant+idct. The reason for this (ideally) is to select between 3 
> different IDCTs depending on the number of non-zero coeffs (On2 is 
> really proud of this since it seems to show up in all of their codecs).

Somehow I expect their specialized code is slower than our general
idct..
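
for reference, the selection described above could look roughly like the
sketch below. this is only an illustration: the enum, the name select_idct
and the thresholds 1 and 10 are assumptions, not taken from the decoder
source, and last_coeff is assumed to count coefficients in scan order up to
and including the last non-zero one.

```c
/* Hypothetical names; the three variants stand in for a DC-only IDCT,
 * a reduced IDCT for sparse blocks, and the general 8x8 IDCT. */
enum idct_kind { IDCT_DC_ONLY, IDCT_SPARSE, IDCT_FULL };

/* Pick an IDCT variant from the position of the last non-zero coefficient.
 * The thresholds (1 and 10) are illustrative assumptions. */
static enum idct_kind select_idct(int last_coeff)
{
    if (last_coeff <= 1)
        return IDCT_DC_ONLY;  /* only the DC term can be non-zero */
    if (last_coeff <= 10)
        return IDCT_SPARSE;   /* few low-frequency coefficients */
    return IDCT_FULL;         /* anything else: general transform */
}
```

whether the extra branches pay off against a single well-optimized general
idct would have to be benchmarked.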

> >* using a 2*width*height array to store dct coefficients, which is 
> >memset(0) for every frame

very bad..

> >* no slices
> >* the loop filter is applied after the whole frame has been decoded
> 
> 	To address these issues, it may be necessary to rework the render 
> process. Render slice 0 (all planes). Render slice 1, apply loop filter 
> on slice 0, dispatch slice 0. Render slice (n), apply loop filter on 
> slice (n-1), dispatch slice (n-1).

this is necessary anyway. otherwise it will be incredibly slow, doing
loop filter in a separate pass..
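
the ordering Mike describes can be sketched as below. the three per-slice
functions are hypothetical stand-ins that only record what would happen, so
the order can be checked; none of these names come from the actual decoder.

```c
#include <stdio.h>

#define MAX_EVENTS 64

/* Hypothetical event log so the call order can be inspected. */
static char events[MAX_EVENTS][32];
static int  n_events;

static void log_event(const char *what, int slice)
{
    if (n_events < MAX_EVENTS)
        snprintf(events[n_events++], sizeof(events[0]), "%s %d", what, slice);
}

/* Stand-ins for the real work (assumed names). */
static void render_slice(int s)      { log_event("render", s); }
static void loop_filter_slice(int s) { log_event("filter", s); }
static void dispatch_slice(int s)    { log_event("dispatch", s); }

/* Render slice n, then filter and dispatch slice n-1, so the loop filter
 * always runs one slice behind the renderer. */
static void decode_frame(int slice_count)
{
    int s;

    for (s = 0; s < slice_count; s++) {
        render_slice(s);
        if (s > 0) {
            loop_filter_slice(s - 1);
            dispatch_slice(s - 1);
        }
    }
    /* The last slice has no successor, so flush it explicitly. */
    if (slice_count > 0) {
        loop_filter_slice(slice_count - 1);
        dispatch_slice(slice_count - 1);
    }
}
```

calling decode_frame(3) records render 0, render 1, filter 0, dispatch 0,
render 2, filter 1, dispatch 1, filter 2, dispatch 2.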

> >note, please do not use mmx.h, 
> 
> 	Please give me a good reason. I have checked code generated from 
> mmx.h against objdump and the generated ASM is correct.

it's been found to generate really bad code, e.g. loading addresses
into registers immediately before the instruction that's going to use
the pointer. this leads to huge stalls.

> >and why port instead of writing our own, the loops are relatively trivial?
> 
> 	Maybe trivial according to you. And there is no way I am writing new 
> ASM functions using that AT&T syntax slop.

then someone else should write it.. really, though, it's easy and
logical...

rich