[FFmpeg-devel] Special optimization for PS3 Cell processor

Luca Barbato lu_zero
Wed May 9 09:37:02 CEST 2007

Sakur wrote:
> ----- IDCT put/add
> *vec_mradds(A,B,C)*:(for PPE)
> I am not sure whether this computes ((A*B+2^14)>>15) + C.
> On the SPE there is no such function, only spu_madd(A,B,C), which also
> does not saturate. Does anyone know a trick for this?

/* vec_mradds (vector multiply round and add saturate)
 * ==========
 */
static inline vec_short8 vec_mradds(vec_short8 a, vec_short8 b,
                                    vec_short8 c)
{
  vec_int4 round = (vec_int4)spu_splats(0x4000);
  vec_short8 hi, lo;

  hi = (vec_short8)(spu_sl(spu_add(spu_mule(a, b), round), 1));
  lo = (vec_short8)(spu_rlmask(spu_add(spu_mulo(a, b), round), -15));

  return (vec_adds(spu_sel(hi, lo, ((vec_ushort8){0, 0xFFFF, 0, 0xFFFF,
                                                  0, 0xFFFF, 0, 0xFFFF})), c));
}

from vmx2spu.h
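
To answer the formula question directly: yes, per 16-bit lane the AltiVec
operation is saturate(((A*B + 2^14) >> 15) + C). A minimal scalar sketch of
one lane, useful as a reference when checking an SPE port, could look like
the block below. The name mradds_scalar is made up for illustration and is
not part of vmx2spu.h or any SDK header, and the code assumes that >> on a
negative int32_t is an arithmetic shift (true for gcc, but
implementation-defined in C).

```c
#include <stdint.h>

/* Hypothetical scalar model of a single 16-bit lane of vec_mradds:
 * multiply, round, shift right by 15, add c, saturate to int16_t. */
static int16_t mradds_scalar(int16_t a, int16_t b, int16_t c)
{
    int32_t prod = (int32_t)a * (int32_t)b;

    /* Add the rounding constant 2^14, then drop the low 15 bits.
     * Assumes an arithmetic right shift for negative values. */
    int32_t rounded = (prod + 0x4000) >> 15;

    /* Do the add in 32 bits so it cannot wrap, then clamp. */
    int32_t sum = rounded + (int32_t)c;
    if (sum > INT16_MAX)
        sum = INT16_MAX;
    if (sum < INT16_MIN)
        sum = INT16_MIN;

    return (int16_t)sum;
}
```

Note that the SPU version above appears to truncate the shifted product to
16 bits before the saturating add, so the a = b = -32768 corner case may
differ from this model; for IDCT coefficients that should not matter.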

> P.S.: According to some interesting tests, AVI decoding performance on
> the Cell PPE is only 20% of that of a normal PC. Is that true?

You have a system, so test it yourself (I'm not really sure what you mean
by AVI decoding; AVI is a container, not a codec). For H.264 decoding it
performs slightly worse than a G5 at more or less the same clock, and about
twice as fast as my G4, which currently runs at half that clock. Given that
the PPU was expected to perform _REALLY_ poorly, I'm quite impressed; it
looks like the ffmpeg code and gcc-4.3 work surprisingly well together.


Luca Barbato

Gentoo/linux Gentoo/PPC
