[Ffmpeg-devel] VP3/Theora Perfection

Tue May 17 18:52:22 CEST 2005

Rich Felker wrote:
> On Tue, May 17, 2005 at 01:55:52PM +0200, Michael Niedermayer wrote:
> 
>>>called a lot and perhaps should be inline'd. Otherwise, the actual
>>>switch/case logic should reduce to a jump table. On2's original code
>>
>>you don't seem to be aware that jump tables with unpredictable jump targets are
>>very slow
> 
> 
> Indeed. I once measured it as 110-190 cycles on my k6, and it's
> probably worse on newer intel cpus, although better but still bad on
> amd.

	I am prototyping some methods to get rid of all of those "evil" 
switch/case blocks.

>>>	Why? Dequantization is a parallelizable operation that can be optimized
>>>with SIMD instructions. That is why it is done at the same time as the
>>>optimized IDCTs.
>>
>>i prefer to multiply 2 elements without SIMD over multiplying 64 with SIMD
> 
> 
> :)

	Okay, so I'm really confused now. There are 64 DCT coefficients. All 64 
need to be dequantized. That means you have to do 64 multiplications. 
Where do you come up with the "multiply 2 elements", smart guy(s)? Are 
you talking about optimizations like checking for zero-valued coefficients 
during the decode process and skipping the multiplication?
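
	To make the question concrete, here is a rough sketch of the two 
approaches as I understand them. This is only one possible reading of 
the "multiply 2 elements" remark, not anyone's actual code. The names 
dequant_all and dequant_sparse, and the pos/level arrays standing in 
for the entropy decoder's output, are made up for illustration, and 
real VP3 dequantization has details that are not modeled here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_COEFFS 64

/* Approach 1: dequantize the whole block after decoding.
 * Always 64 multiplications, but trivially SIMD-friendly. */
static void dequant_all(int16_t block[NUM_COEFFS],
                        const int16_t qmat[NUM_COEFFS])
{
    for (int i = 0; i < NUM_COEFFS; i++)
        block[i] = (int16_t)(block[i] * qmat[i]);
}

/* Approach 2 (hypothetical): dequantize each coefficient as it is
 * decoded. pos[] and level[] stand in for what the entropy decoder
 * emits: the position and value of each non-zero coefficient.
 * Zero coefficients are never multiplied.
 * Returns the number of multiplications performed. */
static int dequant_sparse(int16_t block[NUM_COEFFS],
                          const int16_t qmat[NUM_COEFFS],
                          const uint8_t *pos, const int16_t *level,
                          int count)
{
    assert(count >= 0 && count <= NUM_COEFFS);
    memset(block, 0, NUM_COEFFS * sizeof(block[0]));
    for (int i = 0; i < count; i++) {
        assert(pos[i] < NUM_COEFFS);
        block[pos[i]] = (int16_t)(level[i] * qmat[pos[i]]);
    }
    return count;
}
```

	For a block with only 2 non-zero coefficients, the second version does 
2 scalar multiplications and produces the same block as the first, which 
does all 64.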

	Thanks...
-- 
	-Mike Melanson