[Ffmpeg-devel] [RFC] AltiVec optimizations, try 2

Thu Aug 3 11:44:08 CEST 2006

Guillaume POIRIER wrote:
> 
> Just out of curiosity, is it necessary to spell out vec_splat_s32 explicitly so
> that gcc uses the "splat" asm instruction? Otherwise will it allocate 64, 7,
> ... on the stack and load each register with these constants?

You want to avoid the stack entirely and have the constant generated inline
as a direct operation, since vec_splat_(s|u)(8|16|32) doesn't require any
memory access at all.

> 
> Also, as far as I understood how vec_splat_s32 works, it should be
> possible to generate a vector full of "64" with a single
> vec_splat_s32(64)...

Nope: a PPC instruction can only encode an immediate value in the range
-16 .. 15, and vec_splat_* takes an immediate, not a register.
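To make the range rule concrete, here is a minimal scalar C model (not real
AltiVec code). It assumes a 4-lane 32-bit vector and uses hypothetical helpers
`fits_splat_imm()` and `splat_s32_model()` that mirror the 5-bit signed
immediate of the splat instructions: constants such as 7 or 4 fit, 64 does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 4 x 32-bit AltiVec register (not the real type). */
typedef struct { int32_t lane[4]; } vec_s32_model;

/* The splat-immediate instructions behind vec_splat_s32 encode the value
 * in a 5-bit signed field, so only -16 .. 15 can be expressed. */
static bool fits_splat_imm(int v)
{
    return v >= -16 && v <= 15;
}

/* Scalar stand-in for vec_splat_s32(imm): copy the immediate into every
 * lane. The real intrinsic rejects an out-of-range literal at compile
 * time; this model instead leaves the lanes zeroed and returns false. */
static bool splat_s32_model(int imm, vec_s32_model *out)
{
    for (int i = 0; i < 4; i++)
        out->lane[i] = 0;
    if (!fits_splat_imm(imm))
        return false;
    for (int i = 0; i < 4; i++)
        out->lane[i] = imm;
    return true;
}
```

So a 64 cannot come from one splat; it has to be either loaded from memory
or built from in-range splats.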

> so why is it desirable to use the form with more
> instructions (more decoding bw, more dependencies, more computation unit
> slots used up)... is this an optimization specific to G4 or to Altivec
> in general?

It's a generic optimization: in AltiVec the most expensive operation is
memory access (think of it as about 3-4 times slower than any other instruction).

> 
> Or am I just too blind to see the obvious solution?
> 

not blind, just not used to it.

Ideally you want those constant values in registers without paying a visit
to memory, and then keep them there.

Since we already splatted 4 somewhere, vec_sl will just reuse that register
(as long as there are no dependencies on it too close by), so even splatting
64 costs a single arithmetic op: 4 << 4 = 64.
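The reuse trick can be sketched with the same kind of scalar model (an
illustration under stated assumptions, not the actual FFmpeg code; the names
`splat_u32_model`, `sl_model` and `make_64_model` are hypothetical). With a
register already holding 4 in every lane, one shift of that register by
itself yields 64.

```c
#include <stdint.h>

/* Hypothetical 4 x 32-bit unsigned lane model; not the real vector type. */
typedef struct { uint32_t lane[4]; } vec_u32_model;

/* Stand-in for vec_splat_u32(imm): one instruction, no memory access. */
static vec_u32_model splat_u32_model(uint32_t imm)
{
    vec_u32_model v;
    for (int i = 0; i < 4; i++)
        v.lane[i] = imm;
    return v;
}

/* Stand-in for vec_sl on 32-bit lanes: each lane of a is shifted left by
 * the low 5 bits of the matching lane of b. */
static vec_u32_model sl_model(vec_u32_model a, vec_u32_model b)
{
    vec_u32_model r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] << (b.lane[i] & 31u);
    return r;
}

/* 64 does not fit a splat immediate, but 4 does and 4 << 4 == 64, so the
 * register already holding 4 feeds both operands of a single shift. */
static vec_u32_model make_64_model(void)
{
    vec_u32_model four = splat_u32_model(4);
    return sl_model(four, four);
}
```

On real hardware the equivalent would presumably be
`vector unsigned int four = vec_splat_u32(4);` followed by
`vec_sl(four, four)`, which stays entirely in registers.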

lu

-- 

Luca Barbato

Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero