[Ffmpeg-devel] VP3/Theora Perfection
Tue May 17 18:52:22 CEST 2005
Rich Felker wrote:
> On Tue, May 17, 2005 at 01:55:52PM +0200, Michael Niedermayer wrote:
>>>called a lot and perhaps should be inline'd. Otherwise, the actual
>>>switch/case logic should reduce to a jump table. On2's original code
>>you don't seem to be aware that jump tables with unpredictable jump targets are
> Indeed. I once measured it as 110-190 cycles on my k6, and it's
> probably worse on newer intel cpus, although better but still bad on
I am prototyping some methods to get rid of all of those "evil"
>>> Why? Dequantization is a parallelizable operation that can be optimized
>>>with SIMD instructions. That is why it is done at the same time as the
>>i prefer to multiply 2 elements without SIMD over multiplying 64 with SIMD
Okay, so I'm really confused now. There are 64 DCT coefficients. All 64
need to be dequantized. That means you have to do 64 multiplications.
Where do you come up with the "multiply 2 elements", smart guy(s)? Are
you talking about optimizations like checking for 0-value coeffs during
the decode process and skipping the multiplication?
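
To make the question concrete, here is a rough C sketch of the two approaches as I understand them. It assumes the coefficient decoder records the position of each nonzero coefficient while unpacking a block. The function names, the nz_pos list and the int16_t layout are all made up for illustration; none of this is taken from the ffmpeg or On2 code.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COEFFS 64

/* Dense variant: one multiplication per coefficient, 64 in total,
 * regardless of how many coefficients are actually nonzero.
 * This is the loop that maps naturally onto SIMD. */
static void dequant_dense(int16_t block[NUM_COEFFS],
                          const int16_t qmat[NUM_COEFFS])
{
    for (int i = 0; i < NUM_COEFFS; i++)
        block[i] = (int16_t)(block[i] * qmat[i]);
}

/* Sparse variant (hypothetical): nz_pos lists the positions of the
 * nonzero coefficients, as recorded during coefficient decoding.
 * Only those entries are multiplied; zeros stay zero for free.
 * Returns the number of multiplications performed. */
static int dequant_sparse(int16_t block[NUM_COEFFS],
                          const int16_t qmat[NUM_COEFFS],
                          const uint8_t *nz_pos, int nz_count)
{
    for (int i = 0; i < nz_count; i++) {
        int pos = nz_pos[i];
        assert(pos < NUM_COEFFS);
        block[pos] = (int16_t)(block[pos] * qmat[pos]);
    }
    return nz_count;
}
```

For a block with only 2 nonzero coefficients the sparse variant does 2 multiplications where the dense one does 64, and both leave the same values in the block. Whether that beats a SIMD pass over all 64 would have to be measured.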