[FFmpeg-devel] Special optimization for PS3 Cell processor
Luca Barbato
lu_zero
Wed May 9 09:37:02 CEST 2007
Sakur wrote:
> ----- IDCT put/add
> *vec_mradds(A,B,C)*:(for PPE)
> I am not sure whether this computes ((A*B+2^14)>>15) + C.
> On the SPE there is no such function, only spu_madd(A,B,C), which also
> does not saturate. Does anyone know a trick to emulate it?
/* vec_mradds (vector multiply round and add saturate)
 * ==========
 */
static inline vec_short8 vec_mradds(vec_short8 a, vec_short8 b, vec_short8 c)
{
  vec_int4 round = (vec_int4)spu_splats(0x4000);
  vec_short8 hi, lo;

  hi = (vec_short8)(spu_sl(spu_add(spu_mule(a, b), round), 1));
  lo = (vec_short8)(spu_rlmask(spu_add(spu_mulo(a, b), round), -15));

  return (vec_adds(spu_sel(hi, lo, ((vec_ushort8){0, 0xFFFF, 0, 0xFFFF,
                                                  0, 0xFFFF, 0, 0xFFFF})), c));
}
from vmx2spu.h
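
For checking an SPE port lane by lane, a scalar model of the formula quoted
above, ((A*B+2^14)>>15) + C with a saturating final add, may be handy. This is
only a sketch: the names sat16 and mradds_scalar are made up for illustration
and are not part of vmx2spu.h or any SDK header, and the code assumes that
right-shifting a negative int32_t is an arithmetic shift (true with gcc, but
implementation-defined in C).

```c
#include <stdint.h>

/* Clamp a 32-bit value to the signed 16-bit range (hypothetical helper). */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of one 16-bit lane: sat16(((a*b + 2^14) >> 15) + c).
 * Hypothetical reference function, not taken from any header. */
static int16_t mradds_scalar(int16_t a, int16_t b, int16_t c)
{
    /* The 16x16 product always fits in 32 bits, even after adding 0x4000. */
    int32_t prod = (int32_t)a * (int32_t)b;

    /* Round to nearest by adding 2^14 before dropping the low 15 bits.
     * Assumption: >> on a negative value is an arithmetic shift. */
    int32_t rounded = (prod + 0x4000) >> 15;

    /* Only the final add is saturated in this model. */
    return sat16(rounded + c);
}
```

Running the same inputs through this model and through the SPU routine should
show quickly whether the two agree, including at the extremes such as
a = b = -32768.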
>
> P.S.: According to an interesting test, AVI decoding performance on the
> Cell PPE is only 20% of that of a normal PC. Is that true?
You have a system, so test it yourself (I'm not really sure what you mean
by AVI decoding; AVI is a container). For H.264 decoding it is slightly
slower than a G5 at more or less the same clock, and twice as fast as my
G4, which currently runs at half that clock. Given that the PPU was
expected to perform _REALLY_ poorly, I'm quite impressed; it looks like
the ffmpeg code and gcc-4.3 behave surprisingly well together.
--
Luca Barbato
Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero