[Ffmpeg-devel] VP3/Theora Perfection

Tue May 17 18:25:24 CEST 2005

On Tue, May 17, 2005 at 01:55:52PM +0200, Michael Niedermayer wrote:
> > called a lot and perhaps should be inline'd. Otherwise, the actual
> > switch/case logic should reduce to a jump table. On2's original code
> 
> you don't seem to be aware that jump tables with unpredictable jump targets are
> very slow

Indeed. I once measured it at 110-190 cycles on my K6, and it's
probably worse on newer Intel CPUs; on AMD it's better, but still
bad.
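
For illustration only, here is a minimal sketch in C of one common way to avoid the indirect jump when the cases of a switch differ only in constants: index a data table instead of jumping to per-case code. The token classes, base values and masks are hypothetical and are not VP3's real token set; whether a compiler lowers the switch version to a jump table depends on the compiler and its flags.

```c
#include <stdint.h>

/* Hypothetical token classes, NOT the real VP3 token set. */
enum { TOK_SMALL, TOK_MEDIUM, TOK_LARGE, TOK_HUGE, TOK_COUNT };

/* Variant A: switch/case. A compiler may lower this to an indirect
 * jump through a table, whose target is hard to predict when the
 * token values look random. */
static int decode_switch(int token, unsigned raw)
{
    switch (token) {
    case TOK_SMALL:  return 1  + (int)(raw & 1);
    case TOK_MEDIUM: return 3  + (int)(raw & 3);
    case TOK_LARGE:  return 7  + (int)(raw & 15);
    case TOK_HUGE:   return 23 + (int)(raw & 255);
    default:         return 0;
    }
}

/* Variant B: the same arithmetic driven by data tables. There is no
 * per-token jump target; the token only selects table entries.
 * The caller must pass a token in the range 0..TOK_COUNT-1. */
static const uint8_t tok_base[TOK_COUNT] = { 1, 3, 7, 23 };
static const uint8_t tok_mask[TOK_COUNT] = { 1, 3, 15, 255 };

static int decode_table(int token, unsigned raw)
{
    return tok_base[token] + (int)(raw & tok_mask[token]);
}
```

This only works when every case has the same shape. Cases that run genuinely different code still need a branch of some kind.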

> > > actually
> > > the dequant should be done during bitstream decoding
> >
> > 	Why? Dequantization is a parallelizable operation that can be optimized
> > with SIMD instructions. That is why it is done at the same time as the
> > optimized IDCTs.
> 
> i prefer to multiply 2 elements without SIMD over multiplying 64 with SIMD

:)
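
To make the trade-off in the quoted exchange concrete, here is a minimal sketch in C of the two orderings. It assumes a block where only a few coefficients are actually coded; the `CodedCoeff` struct and both function names are made up for this example and are not ffmpeg's API. The second function uses a plain scalar loop to stand in for the 64-element multiply that a SIMD routine would perform.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical coded coefficient: its position in the 8x8 block and
 * the quantized level read from the bitstream. */
typedef struct {
    uint8_t index;   /* 0..63 */
    int16_t level;   /* quantized value */
} CodedCoeff;

/* Dequantize while unpacking: one multiply per coded coefficient. */
static void dequant_while_decoding(int16_t block[64],
                                   const CodedCoeff *coeffs, int n,
                                   const int16_t qmat[64])
{
    memset(block, 0, 64 * sizeof(block[0]));
    for (int i = 0; i < n; i++) {
        int idx = coeffs[i].index;
        block[idx] = (int16_t)(coeffs[i].level * qmat[idx]);
    }
}

/* Dequantize afterwards: store the raw levels, then multiply all 64
 * entries (zeros included). This is the loop a SIMD version would
 * vectorize. */
static void dequant_whole_block(int16_t block[64],
                                const CodedCoeff *coeffs, int n,
                                const int16_t qmat[64])
{
    memset(block, 0, 64 * sizeof(block[0]));
    for (int i = 0; i < n; i++)
        block[coeffs[i].index] = coeffs[i].level;
    for (int i = 0; i < 64; i++)
        block[i] = (int16_t)(block[i] * qmat[i]);
}
```

Both produce the same block. With two coded coefficients the first does 2 multiplies and the second does 64, which is the point of the quoted remark.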

Rich