[FFmpeg-devel] Fix MMX dct_quantize for non zigzag_direct scans

Michael Niedermayer michaelni
Wed May 14 17:23:09 CEST 2008

On Wed, May 14, 2008 at 04:05:28PM +0100, Ramiro Polla wrote:
> Hello,
> Michael Niedermayer wrote:
>> On Tue, May 13, 2008 at 04:39:39AM +0100, Ramiro Polla wrote:
>>> Hello,
>>> If I use dct_quantize_c for the mimic encoder, all goes well. But if I
>>> use MMX dct_quantize, it doesn't work the same. The last_non_zero value
>>> returned is wrong. There is a note about it in
>>> libavcodec/mpegvideo_enc.c and it searches again for the last_non_zero
>>> coeff after all calls to dct_quantize if they're not dct_quantize_c.
>>> MMX dct_quantize uses inv_zigzag_direct16 independing of the scantable,
>>> so that's why (I think) it returns the wrong value. If I change
>>> ScanTable to also have an inverse[64] table on MMX and use that instead,
>>> it returns the right value. But MMX dct_quantize uses custom code to
>>> permute the block in the end, and with this new last_non_zero value
>>> (from inverse[]) it doesn't copy the blocks correctly. If I add
>>> non-optimized block permuting code for the cases of alternate scan, it
>>> works again.
>>> I thought maybe the current code always returned a larger last_non_zero
>>> value, but it doesn't happen all the time (although it happens most of
>>> the time).
>>> But then I got confused... Isn't that custom permute code only good for
>>> ff_zigzag_direct? Shouldn't it copy the wrong values for alternate
>>> scans? Or are the values being permuted somewhere else again?
>>> Anyways, attached patch gives same resulting file for tempete.cif
>>> encoded in mpeg4 with -flags +alt.
>> what happens with other optimized dct quantizers (non x86)
> bfin finds last_non_zero coeff using a loop. Altivec uses an inverse table 
> like the one I'm proposing to be used for MMX too. So they do return the 
> appropriate value.
> bfin and altivec optimized dct_quantize use ff_block_permute for the output 
> block (same as  dct_quantize_c). Compn confirmed on IRC that they give the 
> same output.
>> and try the regression tests with forced mmx quant (by default it uses C)
>> to ensure that the patch does not change the mmx output.
> Regression tests are consistent when MMX quantization is used (that is: no 
> bitexact, and auto idct and dct).

>> and id like to see benchmarks as well, so we can be sure this doesnt
>> slow the code down
> Adding custom permutation code for ff_alternate_vertical_scan the same way 
> it's done for ff_zigzag_direct gives these results (the source file is 
> properly cached in memory, 10 runs removing the highest and lowest):
> ./ffmpeg_g -benchmark -s 352x288 -i paris.cif -flags +alt -vcodec mpeg4 -y 
> -f rawvideo /dev/null
>       ref     new
> avg   2.7905  2.7390
> stdev 0.0318  0.0287

I do not care at all about alternate_scan speed. I care about zigzag_direct
speed! Its whats used 99% of the time.

even a 0.1% speedloss for zigzag means rejected patch!

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Old school: Use the lowest level language in which you can solve the problem
New school: Use the highest level language in which the latest supercomputer
            can solve the problem without the user falling asleep waiting.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080514/ab984339/attachment.pgp>

More information about the ffmpeg-devel mailing list