[FFmpeg-devel] [PATCH] Make VP3/Theora Decoder Much Faster

Mon Dec 7 18:08:51 CET 2009

Michael Niedermayer <michaelni at gmx.at> writes:

> On Mon, Dec 07, 2009 at 04:28:24PM +0000, Loren Merritt wrote:
>> On Mon, 7 Dec 2009, Mike Melanson wrote:
>>
>>> I'm a little surprised to realize that this functionality doesn't already 
>>> exist (been a long time since I wrote the decoder). The original VP3 
>>> decoder had IDCTs for 1- and 3-element fragments in addition to the full 
>>> flavor IDCT. I think perhaps I tried to bring them over but someone 
>>> convinced me that those other cases don't occur often enough to make it 
>>> worthwhile. Have you found a lot of fragments with 1-3 non-zero coeffs?
>>
>> I've never examined a Theora bitstream, and I'm not about to start now.
>> However, if Theora doesn't have lots of DC-only blocks, it's either very 
>> different from every other inter-predicted DCT codec out there, or you're 
>> encoding at a ridiculously high bitrate.
>> I don't remember why I never committed such a change to mpegvideo, but it's 
>> not that it didn't help. Maybe this isn't bitexact and I never bothered to 
>> figure out why?
>
> I faintly remember that I tried a DC-only IDCT for mpegvideo _many_ years
> ago, and back then it wasn't faster IIRC. I don't know if I investigated why
> it wasn't faster, but the simple_idct already checks for things being 0, the
> extra check on block_last_index isn't free either (branch mispredictions),
> and it needs more code cache...
> But maybe things were slower for me back then due to gcc stupidity; I don't
> know.
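
For illustration, a DC-only fast path of the kind discussed above might look like the C sketch below. It assumes an orthonormally scaled 8x8 IDCT, whose output for a DC-only block is a flat residual of block[0] / 8. VP3/Theora's integer IDCT uses its own scaling, so the constant and the rounding rule here are assumptions. A rounding mismatch against the full IDCT is exactly the kind of thing that would break bitexactness. The names idct_dc_add, idct_add_dispatch and idct_add_fn are hypothetical, not FFmpeg's API.

```c
#include <stdint.h>

/* Hypothetical signature for a codec's general-purpose "IDCT and add" routine. */
typedef void (*idct_add_fn)(uint8_t *dest, int stride, int16_t *block);

static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* DC-only path. ASSUMES an orthonormally scaled IDCT (DC gain 1/8), so a block
 * whose only nonzero coefficient is block[0] becomes a flat residual of
 * block[0] / 8 (rounded half away from zero here), added to all 64 pixels
 * with clipping to 0..255. */
static void idct_dc_add(uint8_t *dest, int stride, int16_t *block)
{
    int c  = block[0];
    int dc = c >= 0 ? (c + 4) / 8 : -((-c + 4) / 8);

    for (int y = 0; y < 8; y++, dest += stride)
        for (int x = 0; x < 8; x++)
            dest[x] = clip_uint8(dest[x] + dc);
}

/* Hypothetical dispatcher. last_index is the scan position of the last nonzero
 * coefficient, which the entropy decoder already knows. Scan position 0 is the
 * DC coefficient; a negative value (empty block) also takes the DC path, which
 * is then a no-op because block[0] is 0. */
static void idct_add_dispatch(uint8_t *dest, int stride, int16_t *block,
                              int last_index, idct_add_fn full_idct_add)
{
    if (last_index <= 0)
        idct_dc_add(dest, stride, block);
    else
        full_idct_add(dest, stride, block);
}
```

The dispatch costs one compare per block, which is the branch Michael mentions. Whether it pays off depends on how often blocks really are DC-only.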

In addition to a special DC-only function, how about passing the index
of the last coeff to the IDCT?  It would simplify things in at least
some of the SIMD implementations.
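
A minimal sketch of one way a last-coefficient index could be used. It assumes the standard 8x8 zigzag scan (the one JPEG and MPEG use) and a separable row-then-column IDCT. A 64-entry table maps the scan index of the last nonzero coefficient to the number of leading rows that can hold nonzero input, so the row pass can skip the remaining rows, as long as it maps an all-zero row to an all-zero row. The names rows_for_last and init_rows_for_last are hypothetical.

```c
#include <stdint.h>

/* Standard 8x8 zigzag scan; ASSUMED to match the codec's scan order.
 * Entry i is the raster position (row * 8 + col) of the i-th coefficient. */
static const uint8_t zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* rows_for_last[i]: how many leading rows (1..8) can contain nonzero
 * coefficients when the last nonzero one sits at scan index i. */
static uint8_t rows_for_last[64];

static void init_rows_for_last(void)
{
    int rows = 0;
    for (int i = 0; i < 64; i++) {
        int r = (zigzag[i] >> 3) + 1; /* 1-based row of this scan position */
        if (r > rows)
            rows = r;
        rows_for_last[i] = (uint8_t)rows;
    }
}
```

In a SIMD IDCT the row count could pick a shorter code path up front, instead of testing each row for zeros as it goes.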

-- 
Måns Rullgård
mans at mansr.com