[FFmpeg-devel] [PATCH] MMX implementation of VC-1 inverse transforms

Wed Jan 16 22:16:30 CET 2008

Hi,

Considering the amount of rework needed, mostly because I overlooked the
overflows, I'll split the next patches as follows:
- first, i4x4
- then i8x8
- then i4x8 / i8x4

This mail therefore covers only the i4x4 transform. I'll start a new
thread if requested.

Michael Niedermayer wrote:
> you do not need temporary storage
> the butterflies can be implemented like:
> b+=a
> a+=a
> a-=b

I used this 'trick' as far as I could (which doesn't mean very far...)
in the current patch.

>> +static void vc1_inv_trans_8x8_mmx(DCTELEM block[64])
>> +{
>> +    transpose8x8_mmx(block);
> 
> all initial permutations (here a transpose) MUST be merged into the scantable
> all other codecs do this too! vc1 won't become an exception

Until we decide how to signal that the zigzag scantable must be
transposed when it is loaded, I've left the now-useless transpose in
place. Removing it will just be a matter of not calling the macro and
propagating the new register assignments.

>> +#define IDCT4_1D(R0, R1, R2, R3, TMP1, TMP2, TMP3, SHIFT)      \
[...]
> same as above the multiply can be done before the butterfly and
> thus 1 bias add can be avoided

The solution I came up with to avoid overflow problems, based on
(8*A+B)>>3 = A + (B>>3), doesn't seem to allow such a trick.

This solution has its share of problems:
- it forces me to perform the butterflies twice
- mm7 is wasted, but I don't know where to use it
- it's not very readable...

I hope I haven't missed too many obvious optimizations this time...
It currently clocks in at 1339 dezicycles (vs 2100 for the improved C
version), which is about 20% slower than my previous, overflowing
version. Maybe an improved version of the latter could be kept for
flags2 fast...

Best regards,
Christophe GISQUET
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vc1-dct-4x4.diff
Type: text/x-patch
Size: 4928 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080116/46418911/attachment.bin>