[FFmpeg-devel] [RFC] [PATCH] Indicate better when transposing zigzag scantables is needed in some codecs

Michael Niedermayer michaelni
Sat Jan 19 13:55:18 CET 2008

On Sat, Jan 19, 2008 at 12:28:18PM +0100, Christophe GISQUET wrote:
> > alternatively someone could try to write a c idct which is faster with
> > transposed input, i wouldnt be surprised if with 64bit HW it might be
> > possible to write a idct based on the same idea as the SIMD idcts in
> > plain C. That is working with 4 16bit values at a time in a 64bit int
> This would be an exercise in style. I'm not sure however how shifts
> (probably absolutely requiring to be processed separately so as to not
> spill into the neighbor 16 bits)

shifts are just something like this (the mask clears the bits that spilled in
from the neighboring 16bit value)
(a>>2) & 0x3FFF3FFF3FFF3FFFULL
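
A minimal sketch of that shift for an arbitrary shift count, assuming four unsigned 16bit values packed in one uint64_t with lane 0 in the low bits. The name lanes_shr and the way the mask is built are illustrative only, not taken from any existing code:

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift of four unsigned 16-bit lanes packed in a 64-bit word.
 * lanes_shr is a hypothetical helper name; shift must be in 0..15. */
static uint64_t lanes_shr(uint64_t a, unsigned shift)
{
    assert(shift < 16);
    /* (0xFFFF >> shift) replicated into all four lanes,
     * e.g. 0x3FFF3FFF3FFF3FFF for shift == 2 */
    uint64_t mask = (uint64_t)(0xFFFFu >> shift) * 0x0001000100010001ULL;
    /* The plain 64-bit shift drags low bits of each lane into the top of the
     * lane below it; the mask removes exactly those bits. */
    return (a >> shift) & mask;
}
```

This is a logical (unsigned) shift per lane, so it only matches a signed shift if the values carry a bias as described further down.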

> and 2-complement arithmetic would do.

if all numbers have 32768 added to them beforehand (that is, they are stored
as unsigned values), then
c[i]= a[i] + b[i] - 32768
c= a+b-0x8000800080008000ULL;

c[i]= a[i] - b[i] + 32768
c= a-b+0x8000800080008000ULL;
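
The two biased operations above can be sketched as follows. pack4, unpack4, add4, sub4 and BIAS4 are hypothetical names chosen for illustration, and the sketch assumes every per-lane result still fits in a signed 16bit value, since nothing here detects overflow into a neighboring lane:

```c
#include <stdint.h>

#define BIAS4 0x8000800080008000ULL  /* 32768 in each 16-bit lane */

/* Pack four signed 16-bit values as biased (unsigned) lanes, lane 0 lowest. */
static uint64_t pack4(const int16_t v[4])
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++)
        r |= (uint64_t)(uint16_t)(v[i] + 32768) << (16 * i);
    return r;
}

/* Undo pack4: extract each lane and remove the bias. */
static void unpack4(uint64_t p, int16_t v[4])
{
    for (int i = 0; i < 4; i++)
        v[i] = (int16_t)((int)((p >> (16 * i)) & 0xFFFF) - 32768);
}

/* Each input lane carries one bias, so a+b carries two and one is removed. */
static uint64_t add4(uint64_t a, uint64_t b) { return a + b - BIAS4; }

/* The biases cancel in a-b, so one has to be added back. */
static uint64_t sub4(uint64_t a, uint64_t b) { return a - b + BIAS4; }
```

Intermediate carries or borrows between lanes are harmless because 64bit arithmetic is exact modulo 2^64; only the final per-lane range matters.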

c[i]= (a[i] + b[i])>>1
c= (a&b) + (((a^b)&0xFFFEFFFEFFFEFFFEULL)>>1);

c[i]= (a[i] + b[i] + 1)>>1
c= (a|b) - (((a^b)&0xFFFEFFFEFFFEFFFEULL)>>1);
the last 2 also work when a[i] + b[i] needs 17 bits, as the sum itself is
never formed
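
As a sketch, the two averages can be wrapped like this, again assuming four unsigned 16bit lanes per uint64_t. avg4_floor, avg4_round and LSB_CLEAR are hypothetical names used only for this example:

```c
#include <stdint.h>

/* Clears bit 0 of every 16-bit lane so the following >>1 cannot move a bit
 * into the lane below. */
#define LSB_CLEAR 0xFFFEFFFEFFFEFFFEULL

/* Per-lane (a[i] + b[i]) >> 1: the common bits plus half of the differing bits. */
static uint64_t avg4_floor(uint64_t a, uint64_t b)
{
    return (a & b) + (((a ^ b) & LSB_CLEAR) >> 1);
}

/* Per-lane (a[i] + b[i] + 1) >> 1: all set bits minus half of the differing bits. */
static uint64_t avg4_round(uint64_t a, uint64_t b)
{
    return (a | b) - (((a ^ b) & LSB_CLEAR) >> 1);
}
```

Neither function produces a carry or borrow across lanes, so both inputs may use the full 16bit range. With biased inputs the result keeps the same bias of 32768.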

> But if we consider the targets:
> - x86-64 are probably better off with mmx, and also sse2 in the 8x8 case
> - ppc64 has altivec
> the remaining 64b systems are not very obvious to me (sun cpus?), and
> totally not worth the effort if I consider non-educational purposes.

yes, certainly true


Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

There will always be a question for which you do not know the correct answer.