[FFmpeg-devel] [PATCH] Make VP3/Theora Decoder Much Faster

Michael Niedermayer michaelni
Mon Dec 7 17:56:41 CET 2009

On Mon, Dec 07, 2009 at 04:28:24PM +0000, Loren Merritt wrote:
> On Mon, 7 Dec 2009, Mike Melanson wrote:
>> I'm a little surprised to realize that this functionality doesn't already 
>> exist (been a long time since I wrote the decoder). The original VP3 
>> decoder had IDCTs for 1- and 3-element fragments in addition to the full 
>> flavor IDCT. I think perhaps I tried to bring them over but someone 
>> convinced me that those other cases don't occur often enough to make it 
>> worthwhile. Have you found a lot of fragments with 1-3 non-zero coeffs?
> I've never examined a Theora bitstream, and I'm not about to start now.
> However, if Theora doesn't have lots of DC-only blocks, it's either very 
> different from every other inter-predicted DCT codec out there, or you're 
> encoding at a ridiculously high bitrate.
> I don't remember why I never committed such a change to mpegvideo, but it's 
> not that it didn't help. Maybe this isn't bitexact and I never bothered to 
> figure out why?

I faintly remember that I tried a DC-only IDCT for mpegvideo _many_ years ago,
and back then it wasn't faster, IIRC. I don't know if I investigated why it
wasn't faster, but the simple_idct already checks for things being 0, and the
extra check on block_last_index isn't free either (branch mispredictions),
and it needs more code cache ...
But maybe things were slower for me back then due to gcc stupidity; I don't
know.
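
For reference, the kind of shortcut being discussed can be sketched as
below. This is only an illustration under stated assumptions, not the
mpegvideo or VP3 code: the names idct_dc_add, idct_full_add and idct_add
are hypothetical, the full path is a slow floating-point reference IDCT
rather than simple_idct, and orthonormal 8x8 scaling is assumed, so that a
DC-only block decodes to the constant dc/8.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit sample. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* cos(m*pi/16) for m = 0..8; other angles follow by symmetry. */
static const double cos_tab[9] = {
    1.0, 0.98078528040323, 0.92387953251129, 0.83146961230255,
    0.70710678118655, 0.55557023301960, 0.38268343236509,
    0.19509032201613, 0.0
};

static double cos16(int m)
{
    m %= 32;
    if (m > 16)
        m = 32 - m;                               /* cos(2pi - x) =  cos(x) */
    return m > 8 ? -cos_tab[16 - m] : cos_tab[m]; /* cos(pi - x)  = -cos(x) */
}

/* Hypothetical slow path: separable floating-point reference IDCT,
 * result added to dst, coefficient block cleared afterwards. */
static void idct_full_add(uint8_t *dst, int stride, int16_t *block)
{
    double tmp[64];
    int x, y, u, v;

    for (v = 0; v < 8; v++)            /* 1-D IDCT along each row */
        for (x = 0; x < 8; x++) {
            double s = 0.0;
            for (u = 0; u < 8; u++)
                s += (u ? 1.0 : cos_tab[4]) * block[v * 8 + u]
                     * cos16((2 * x + 1) * u);
            tmp[v * 8 + x] = s / 2.0;
        }
    for (x = 0; x < 8; x++)            /* 1-D IDCT along each column */
        for (y = 0; y < 8; y++) {
            double s = 0.0;
            for (v = 0; v < 8; v++)
                s += (v ? 1.0 : cos_tab[4]) * tmp[v * 8 + x]
                     * cos16((2 * y + 1) * v);
            s /= 2.0;
            dst[y * stride + x] = clip_u8(dst[y * stride + x]
                                  + (int)(s < 0 ? s - 0.5 : s + 0.5));
        }
    for (u = 0; u < 64; u++)
        block[u] = 0;
}

/* Hypothetical fast path: with only the DC coefficient set, every output
 * sample is the same value. Assumes >> on a negative int is an arithmetic
 * shift, as it is on common compilers. */
static void idct_dc_add(uint8_t *dst, int stride, int16_t *block)
{
    int dc = (block[0] + 4) >> 3;
    int x, y;

    for (y = 0; y < 8; y++)
        for (x = 0; x < 8; x++)
            dst[y * stride + x] = clip_u8(dst[y * stride + x] + dc);
    block[0] = 0;
}

/* The extra branch on the last non-zero index is the check whose cost
 * (mispredictions, code size) is weighed against the saved arithmetic. */
static void idct_add(uint8_t *dst, int stride, int16_t *block, int last_index)
{
    if (last_index == 0)
        idct_dc_add(dst, stride, block);
    else
        idct_full_add(dst, stride, block);
}
```

Note that the two paths round differently on ties for negative DC values
(for example dc = -4), which is one way such a shortcut can end up not
being bitexact with the full IDCT.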

All that said, if it's consistently faster now (or at least faster in
"significantly" more cases than not), then I am of course in favor of a
DC-only IDCT.

Benchmarks on 2 or 3 different CPUs would be welcome if we have volunteers
for that.

Also note that when AC prediction is used in MPEG-4, block_last_index
is set to 63, which should ruin any gain an intra DC-only IDCT has.
MPEG-2 dequantization also does evil things to coefficient 63, which
can ruin the speed gain too (note that MPEG-2 dequant can be used in MPEG-4).
And there is alternate_scan, which should also ruin the DC IDCT idea ...
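
To illustrate the coefficient 63 problem: presumably this refers to MPEG-2
mismatch control, which toggles the least significant bit of the last
coefficient whenever the sum of all dequantized coefficients is even. A
minimal sketch, with hypothetical helper names (mismatch_control and
last_nonzero are not FFmpeg functions), shows how a block that was DC-only
before dequantization stops looking DC-only afterwards:

```c
#include <stdint.h>

/* MPEG-2 style mismatch control: if the sum of all 64 dequantized
 * coefficients is even, flip the LSB of coefficient 63 (position 7,7,
 * which is index 63 in both raster and zigzag order). */
static void mismatch_control(int16_t block[64])
{
    int sum = 0;
    int i;

    for (i = 0; i < 64; i++)
        sum += block[i];
    if ((sum & 1) == 0)
        block[63] ^= 1;
}

/* Hypothetical helper: highest index holding a non-zero coefficient,
 * or -1 for an all-zero block. */
static int last_nonzero(const int16_t block[64])
{
    int i;

    for (i = 63; i >= 0; i--)
        if (block[i])
            return i;
    return -1;
}
```

With a DC value of 16 the sum is even, so coefficient 63 becomes 1 and a
"last index == 0" test no longer describes the block; with a DC value of 17
the block is left alone. A DC-only fast path therefore only triggers for
roughly half of such blocks unless the mismatch bit is handled separately.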

just some ideas for areas where optimizations can possibly still be done

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Good people do not need laws to tell them to act responsibly, while bad
people will find a way around the laws. -- Plato