[FFmpeg-devel] [RFC] [PATCH] Indicate better when transposing zigzag scantables is needed in some codecs

Christophe GISQUET christophe.gisquet
Sat Jan 19 18:50:41 CET 2008


Kostya wrote:
>> So we have to find a 3rd one. Any idea?
> 
> make a function to permutate permutation tables and merge it with
> vc1dsp_init*()

I'm not sure I understand you here. Do you want me to #include "vc1.h"
in the vc1dsp_(mmx|altivec).c files, then, in their vc1dsp_init
functions, cast the AVCodecContext priv_data to a VC1Context and permute
the (hopefully already loaded) zz scantable?

However, the 8x4 and 4x8 scantables depend on the sequence profile,
which, I'm afraid, is only determined after the dsp functions are set.

> But first of all, can you provide a benchmark for real speedup
> when using MMX version of 4x4 transform? Preferably on several samples
> from samples.mplayerhq.hu, big and small.

Among the "small" resolution samples, only the nokia sequence decodes
without problems (the others are either an old version or give incorrect
output). That sequence doesn't use the 8x4, 4x8 or 4x4 transforms.

For HD sequences (the ones I use are Robotica and Amazing Caves), the
speedup is lost in the noise: below 0.01s, well under the standard
deviation of the timings. The only figure I can give is that, according
to oprofile, the C and MMX versions take roughly 0.5% and 0.2% of the
decoding time respectively.

So the 4x4 dct doesn't appear worth concentrating on. What I'm really
after is using the macro for the 8x4 and 4x8 versions, which represent
7-10% of the decoding time.

If you want, we can drop the use of the 4x4 dct function, but I think
all other ones are worth optimizing. They will need the zz scantables to
be transposed.
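
As a rough illustration of what transposing a scantable involves (a
sketch only, not the actual patch): it assumes each table entry is a
raster index row * width + col into a width x height block, which may
not match how the vc1 tables are really laid out. The names
transpose_scantable and demo_zigzag_4x4 are hypothetical and are not
part of FFmpeg.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical demo table: the classic 4x4 zigzag order, stored as
 * raster indices (row * 4 + col). Not taken from the vc1 sources. */
static const uint8_t demo_zigzag_4x4[16] = {
     0,  1,  4,  8,
     5,  2,  3,  6,
     9, 12, 13, 10,
     7, 11, 14, 15,
};

/* Hypothetical helper: rewrite every entry of a scan table so that it
 * addresses the transposed block. An entry pointing at (row, col) in a
 * width x height block becomes one pointing at (col, row) in the
 * height x width block. The scan order itself is left unchanged. */
static void transpose_scantable(uint8_t *dst, const uint8_t *src,
                                int width, int height)
{
    for (int i = 0; i < width * height; i++) {
        assert(src[i] < width * height);
        int row = src[i] / width;
        int col = src[i] % width;
        dst[i] = (uint8_t)(col * height + row);
    }
}
```

Calling transpose_scantable(out, demo_zigzag_4x4, 4, 4) fills out with
the table to use when the transform works on transposed coefficients.
For a square block, applying it a second time gives back the original
table.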

Best regards,
-- 
Christophe GISQUET
