[FFmpeg-devel] [RFC] [PATCH] Indicate better when transposing zigzag scantables is needed in some codecs

Christophe GISQUET christophe.gisquet
Sat Jan 19 12:28:18 CET 2008


Hi,

Michael Niedermayer wrote:
> until someone writes a SIMD h264/vc1 idct which is faster with a permutation
> different from the transpose, id assume that its fine to transpose all or
> none

I just looked at the ppc 8x8 code: it does the transpose twice. Granted,
removing the first transpose from that is even simpler. However, until
the ppc code is changed, it forces a distinction:
- the C version doesn't need an initial transpose, and the ppc version
  does it internally
- the mmx version needs it

pseudo-code would be:
if (idct!=idct_c && idct!=idct_ppc) // or (idct==idct_mmx)
  transpose4x4;

This is not satisfactory, as idct_ppc and idct_mmx are conditionally
compiled. Overall, in this particular case, it is a false problem, since
the ppc code should be changed first.
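
To illustrate why the pointer comparison is awkward: one way around it
would be to let each implementation advertise what input layout it
expects. This is only a sketch; IDCTImpl, wants_transposed_input and
prepare_block are hypothetical names made up for this mail, not existing
FFmpeg API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is existing FFmpeg API. */
typedef void (*idct_add_fn)(uint8_t *dst, int16_t *block, int stride);

typedef struct IDCTImpl {
    idct_add_fn idct_add;
    int wants_transposed_input; /* 1 if the routine expects a transposed block */
} IDCTImpl;

/* Swap rows and columns of a 4x4 coefficient block in place. */
static void transpose4x4(int16_t *block)
{
    for (int y = 0; y < 4; y++)
        for (int x = y + 1; x < 4; x++) {
            int16_t t = block[4 * y + x];
            block[4 * y + x] = block[4 * x + y];
            block[4 * x + y] = t;
        }
}

/* The caller tests a property of the selected implementation, so it never
 * has to name idct_ppc or idct_mmx, which may not even be compiled in. */
static void prepare_block(const IDCTImpl *impl, int16_t *block)
{
    assert(impl && block);
    if (impl->wants_transposed_input)
        transpose4x4(block);
}
```

With something like this, the arch-specific init code would set the flag
next to the function pointer, and the generic code would stay free of
#ifdefs.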

The same seems true for the 4x8 idct (an initial transpose that should be
merged into the zigzag scantable).
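
For reference, merging a transpose into the scantable can be sketched as
below for the 4x4 case. The table is the usual 4x4 zigzag order; the
function names are made up for illustration and are not the ones used in
the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Usual 4x4 zigzag order; each entry is a raster position 4*row + col. */
static const uint8_t zigzag4x4[16] = {
    0,  1,  4,  8,
    5,  2,  3,  6,
    9, 12, 13, 10,
    7, 11, 14, 15,
};

/* Build a scantable whose entries address the transposed block:
 * position (row, col) becomes (col, row). */
static void transpose_scan4x4(uint8_t dst[16], const uint8_t src[16])
{
    for (int i = 0; i < 16; i++) {
        assert(src[i] < 16);
        int row = src[i] >> 2, col = src[i] & 3;
        dst[i] = (uint8_t)(4 * col + row);
    }
}

/* Scatter coefficients, given in scan order, into a 4x4 block. */
static void scatter_coeffs(int16_t block[16], const int16_t coeffs[16],
                           const uint8_t scan[16])
{
    for (int i = 0; i < 16; i++)
        block[scan[i]] = coeffs[i];
}
```

Scattering through the transposed table yields the transposed block
directly, so the idct no longer needs an explicit transpose pass on its
input.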

However, when thinking about the 8x8 mmx idct (we should merge that
discussion back into the previous thread or split it into a new one), I
was afraid I would need to go to 32 bits and use pmaddwd, and would thus
benefit from a particular permutation that changes the memory layout.

Your reply for the 4x4 case, however, makes me reconsider the situation,
so I'll dig more before I come back to the 8x8 topic.

> alternatively someone could try to write a c idct which is faster with
> transposed input, i wouldnt be surprised if with 64bit HW it might be
> possible to write a idct based on the same idea as the SIMD idcts in
> plain C. That is working with 4 16bit values at a time in a 64bit int

This would be an exercise in style. However, I'm not sure how shifts
(which would almost certainly have to be processed separately so as not
to spill into the neighboring 16 bits) and two's-complement arithmetic
would fare.
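
To make the concern concrete, here is a rough sketch of what the basic
operations might look like with four 16-bit lanes in one 64-bit integer.
It is not taken from any existing idct; the helper names are invented, and
it only shows an add and an arithmetic right shift that stay within their
lanes.

```c
#include <assert.h>
#include <stdint.h>

#define LANE_LSB UINT64_C(0x0001000100010001)
#define LANE_MSB UINT64_C(0x8000800080008000)

/* Pack four int16_t values, lane 0 in the lowest bits. */
static uint64_t pack4(const int16_t v[4])
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++)
        r |= (uint64_t)(uint16_t)v[i] << (16 * i);
    return r;
}

/* Unpack; the uint16_t -> int16_t conversion assumes the usual
 * two's-complement wrap-around behaviour of common compilers. */
static void unpack4(uint64_t x, int16_t v[4])
{
    for (int i = 0; i < 4; i++)
        v[i] = (int16_t)(uint16_t)(x >> (16 * i));
}

/* Lane-wise add, wrapping modulo 2^16 in each lane. The low 15 bits are
 * added with the top bit cleared, so a carry can reach bit 15 of its own
 * lane but never the next lane; the top bit is then fixed up with xor. */
static uint64_t add4x16(uint64_t a, uint64_t b)
{
    uint64_t low = (a & ~LANE_MSB) + (b & ~LANE_MSB);
    return low ^ ((a ^ b) & LANE_MSB);
}

/* Lane-wise arithmetic right shift by n, 0 <= n <= 15. */
static uint64_t sra4x16(uint64_t x, int n)
{
    assert(n >= 0 && n <= 15);
    /* Bits of each lane that still belong to that lane after the shift. */
    uint64_t keep = (uint64_t)(0xFFFFu >> n) * LANE_LSB;
    /* 0xFFFF in every lane whose value is negative, 0 elsewhere. */
    uint64_t sign = ((x >> 15) & LANE_LSB) * 0xFFFFu;
    /* Drop the bits that leaked in from the neighbor, then sign-extend. */
    return ((x >> n) & keep) | (sign & ~keep);
}
```

Each operation costs several plain integer instructions instead of one
SIMD instruction, and there is no cheap equivalent of a multiply, which is
why I consider this mostly an exercise.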

But if we consider the targets:
- x86-64 is probably better off with mmx, and also sse2 in the 8x8 case
- ppc64 has altivec
the remaining 64-bit systems are not very obvious to me (Sun CPUs?), and
not worth the effort at all for anything but educational purposes.

> so i do not see any real need to change anything, the ugly solution we
> have is simple and does exactly what is needed that is if not the C idct
> do transpose

Until recently, I didn't know that ppc needed no initial transpose
because it did it internally. The point is indeed moot if those routines
get changed.

Best regards,
Christophe GISQUET



