[FFmpeg-devel] [PATCH] Make VP3/Theora Decoder Much Faster

Måns Rullgård mans
Mon Dec 7 18:08:51 CET 2009

Michael Niedermayer <michaelni at gmx.at> writes:

> On Mon, Dec 07, 2009 at 04:28:24PM +0000, Loren Merritt wrote:
>> On Mon, 7 Dec 2009, Mike Melanson wrote:
>>> I'm a little surprised to realize that this functionality doesn't already 
>>> exist (been a long time since I wrote the decoder). The original VP3 
>>> decoder had IDCTs for 1- and 3-element fragments in addition to the full 
>>> flavor IDCT. I think perhaps I tried to bring them over but someone 
>>> convinced me that those other cases don't occur often enough to make it 
>>> worthwhile. Have you found a lot of fragments with 1-3 non-zero coeffs?
>> I've never examined a Theora bitstream, and I'm not about to start now.
>> However, if Theora doesn't have lots of DC-only blocks, it's either very 
>> different from every other inter-predicted DCT codec out there, or you're 
>> encoding at a ridiculously high bitrate.
>> I don't remember why I never committed such a change to mpegvideo, but it's 
>> not that it didn't help. Maybe this isn't bitexact and I never bothered to 
>> figure out why?
> I faintly remember that I tried a DC-only IDCT for mpegvideo _many_ years ago
> and back then it wasn't faster, IIRC. I don't know if I investigated why it
> wasn't faster, but the simple_idct already checks for things being 0, and the
> extra check on block_last_index isn't free either (branch mispredictions),
> and it needs more code cache ...
> But maybe things were slower for me back then due to gcc stupidity, I don't
> know that.

In addition to a special dc-only function, how about passing the index
of the last coeff to the idct?  It would simplify things in at least
some of the simd ones.

Måns Rullgård
mans at mansr.com

