[FFmpeg-devel] [PATCH] Split H.264 luma dc idct, implement MMX/SSE2 versions
Thu Jan 13 05:06:54 CET 2011
This patch splits the H.264 i16x16 luma-dc idct and implements it in
asm on x86. It does this by storing the DC coefficients in a separate
location initially, then scattering them at the end of the asm
function. This lets us use SIMD on the inverse transform and dequant.
The result: time spent in the inverse transform drops from 1043 to 413 dezicycles.
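For readers unfamiliar with the i16x16 DC path, here is a minimal C sketch of what such a function does. It is not the patch's code. It assumes a hypothetical layout where the 16 DC values are stored separately as dc[y*4 + x], and the 16 luma 4x4 blocks of 16 coefficients each are stored in H.264 luma4x4BlkIdx order with the DC at index 0. It also assumes a hypothetical fixed-point dequant, (v*qmul + 128) >> 8, in place of the spec's exact scaling formula. The steps are: a 4x4 inverse Hadamard on rows and then on columns, dequantization, and a scatter of each result into the DC slot of its block. Because the transform runs on a contiguous 4x4 matrix, both passes are SIMD-friendly. Only the final scatter touches scattered memory.

```c
#include <stdint.h>

/* Assumed layout: 16 blocks x 16 coefficients, DC of each block at index 0. */
enum { BLOCK_COEFFS = 16 };

/* Map a 4x4 block position (x, y) in the macroblock to luma4x4BlkIdx:
 * 8x8 quadrants in raster order, then 4x4 blocks in raster order inside each. */
static int blk_idx(int x, int y)
{
    return (y >> 1) * 8 + (x >> 1) * 4 + (y & 1) * 2 + (x & 1);
}

/* Inverse 4x4 Hadamard of the separately stored DC matrix, dequant with the
 * hypothetical qmul, then scatter into each block's DC slot.
 * Assumes >> is an arithmetic shift for negative values. */
static void luma_dc_dequant_idct(int16_t coeffs[16 * BLOCK_COEFFS],
                                 const int16_t dc[16], int qmul)
{
    int tmp[16];

    /* Row pass: 4-point Hadamard on each row of dc. */
    for (int y = 0; y < 4; y++) {
        const int16_t *r = dc + 4 * y;
        int z0 = r[0] + r[1], z1 = r[0] - r[1];
        int z2 = r[2] + r[3], z3 = r[2] - r[3];
        tmp[4 * y + 0] = z0 + z2;
        tmp[4 * y + 1] = z0 - z2;
        tmp[4 * y + 2] = z1 - z3;
        tmp[4 * y + 3] = z1 + z3;
    }

    /* Column pass, dequant and scatter. */
    for (int x = 0; x < 4; x++) {
        int z0 = tmp[x] + tmp[4 + x], z1 = tmp[x] - tmp[4 + x];
        int z2 = tmp[8 + x] + tmp[12 + x], z3 = tmp[8 + x] - tmp[12 + x];
        int col[4] = { z0 + z2, z0 - z2, z1 - z3, z1 + z3 };
        for (int y = 0; y < 4; y++)
            coeffs[blk_idx(x, y) * BLOCK_COEFFS] =
                (int16_t)((col[y] * qmul + 128) >> 8);
    }
}
```

With dc[0] = 1 and qmul = 256, every block receives a DC of 1, because the Hadamard transform spreads a lone (0,0) coefficient evenly over all 16 positions.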
1. Skip the idct_dc/dequant entirely when there are no DC coefficients. In
the current architecture we don't know this; we'd need to add an entry to
scan8 (x264 does this) or move the idct-dc call into cabac/cavlc (I'm
fine with that too). In the latter case you'd still have to modify them
to, for example, return the number of coefficients.
2. THIS PROBABLY BREAKS ARM/PPC/SIMILAR because of an extremely
stupid architectural problem in ffh264: the scantables are transposed
when an asm idct is used, but not for the C idct. So any architecture
that has an asm idct but no asm version of my idct_dc function will
probably break. The best solution would be to throw out the
non-transposed scan table entirely: it has zero benefit and just adds
complexity and binary size.
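Item 1's second option, having the cabac/cavlc residual decoder report how many DC coefficients it produced, could look roughly like the sketch below. Every function here is hypothetical. The decoder is a stub that copies levels, and the transform is a placeholder that only counts calls, standing in for the real inverse Hadamard and dequant. Skipping is safe because the transform is linear: an all-zero DC matrix would only scatter zeros, assuming the per-block DC slots are already cleared.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the CAVLC/CABAC luma-DC residual decoder: fills
 * dc[] and additionally returns the number of nonzero coefficients. */
static int decode_luma_dc_stub(int16_t dc[16], const int16_t *levels, int n)
{
    int nnz = 0;
    memset(dc, 0, 16 * sizeof(dc[0]));
    for (int i = 0; i < n && i < 16; i++) {
        dc[i] = levels[i];
        nnz += levels[i] != 0;
    }
    return nnz;
}

/* Counts how often the (placeholder) DC transform actually ran. */
static int dc_transform_calls;

/* Placeholder for the real inverse Hadamard + dequant + scatter. */
static void luma_dc_dequant_idct_stub(int16_t coeffs[256],
                                      const int16_t dc[16], int qmul)
{
    (void)coeffs;
    (void)dc;
    (void)qmul;
    dc_transform_calls++;
}

/* Decode one i16x16 macroblock's luma DC, skipping the transform when no DC
 * coefficient is nonzero. Returns the nonzero count. */
static int decode_i16x16_luma_dc(int16_t coeffs[256], const int16_t *levels,
                                 int n, int qmul)
{
    int16_t dc[16];
    int nnz = decode_luma_dc_stub(dc, levels, n);
    if (nnz)
        luma_dc_dequant_idct_stub(coeffs, dc, qmul);
    return nnz;
}
```

The design choice is to have the entropy decoder return the count it already computes, so that no extra pass over the coefficients is needed to decide whether to skip.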
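To make the transposition mismatch in item 2 concrete, here is a small sketch. It is not FFmpeg's code, and all function names are hypothetical. A scan table maps scan position to a raster index y*4 + x inside a 4x4 block. The transposed variant swaps row and column of every entry. The zigzag table below is the standard H.264 4x4 frame scan. Levels placed with the two tables land in mirrored positions, so an idct (or idct_dc) written for one convention reads the wrong coefficients if the decoder used the other.

```c
#include <stdint.h>

/* H.264 4x4 frame zigzag scan: scan position -> raster index (y*4 + x). */
static const uint8_t zigzag_scan4x4[16] = {
    0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
};

/* Swap row and column of a raster index: (y*4 + x) -> (x*4 + y). */
static uint8_t transpose_idx(uint8_t i)
{
    return (uint8_t)((i >> 2) | ((i & 3) << 2));
}

/* Build the table an idct expecting transposed coefficient storage needs. */
static void build_transposed_scan(uint8_t out[16], const uint8_t in[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = transpose_idx(in[i]);
}

/* Place decoded levels (given in scan order) into a 4x4 block via a table. */
static void place_levels(int16_t block[16], const int16_t levels[16],
                         const uint8_t scan[16])
{
    for (int i = 0; i < 16; i++)
        block[scan[i]] = levels[i];
}
```

Keeping only the transposed table, as proposed above, would mean every idct implementation, including the C one, reads coefficients in the same transposed order.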