[FFmpeg-devel] [RFC] snow SSE2 optimizations

Zuxy Meng zuxy.meng
Wed Aug 29 08:31:36 CEST 2007


Hi,

2007/8/29, Michael Niedermayer <michaelni at gmx.at>:
> Hi
>
> On Tue, Aug 28, 2007 at 05:10:18PM +0200, Luca Barbato wrote:
> > Michael Niedermayer wrote:
> > > On Tue, Aug 28, 2007 at 01:09:54PM +0200, Guillaume POIRIER wrote:
> > >> Exactly. You need a CPU that has full-width (128bits) ALU to almost
> > >> guarantee that SSE will be faster. Core2 and upcoming K10 have
> > >> full-width SSE ALUs.
> > >
> > > another way to say it is that you need a cpu which has 2 mmx units and
> > > can use both for sse instructions but can only use 1 for mmx
> > >
> > > if that is a step in the correct direction well ...
> > >
> >
> > Start guessing why there is just one altivec (across 3 generations of
> > cpus) and SPU is still quite similar...
> >
> > The Intel instruction set design wasn't and isn't the smartest, and
> > they keep adding irregular changes...
>
> true, but it also isn't the most stupid; sparc-vis beats them by quite a bit,
> and i don't think a perfectly regular set is a good idea either, because
> 90% of the resulting instructions are never used by anyone, but the
> cpu must support them, which makes the cpu more complex and slower
>
> where i think intel did mess up is:
> * the mmx design which uses the floating point registers is sick
>
> * the fact that both mmx and sse have just 8 registers is sick
>  it was well known that 8 is a limiting factor in many cases
>  and with IA64 intel demonstrated that you can just as well get it wrong
>  in the opposite direction by having hundreds of registers ...
>
> * i want 8-bit shifts, signed average, pack with shift and rounding and
>  some lea-like instruction for mmx
>
> * the stack based FPU registers ...
>
> * having implicit source and destination registers for some instructions
>  like the 32x32->64 bit multiply
>
> * integer fixed point multiply (multiply + rounding + shift down) like
>  pmulhrsw but for normal integers is missing ...
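For readers unfamiliar with pmulhrsw, here is a minimal scalar C sketch of what one 16-bit lane computes, following Intel's description (full product, shift right by 14, add 1, shift right by 1). It is paired with a hypothetical 32-bit analogue of the "multiply + rounding + shift down" operation wished for above. The names mulhrs16 and mul_fixed32 are illustrative only and do not come from FFmpeg or any library. The sketch also assumes the compiler implements >> on negative values as an arithmetic shift, as mainstream compilers do.

```c
#include <stdint.h>

/* Scalar model of one lane of SSSE3 pmulhrsw: multiply two signed
 * 16-bit Q15 values, round, and keep the high part as a Q15 result.
 * Assumes >> on negative values is an arithmetic shift. */
static int16_t mulhrs16(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * b;        /* exact 32-bit product (Q30) */
    int32_t r = ((prod >> 14) + 1) >> 1;  /* round to nearest, back to Q15 */
    /* Only -32768 * -32768 gives 32768 here; the instruction returns
     * 0x8000, and converting 32768 to int16_t is implementation-defined
     * in C (common compilers wrap to -32768). */
    return (int16_t)r;
}

/* Hypothetical 32-bit analogue: fixed-point multiply with rounding.
 * 'shift' is the number of fractional bits, expected in 1..31; the
 * caller must ensure the rounded result fits in 32 bits. */
static int32_t mul_fixed32(int32_t a, int32_t b, int shift)
{
    int64_t prod = (int64_t)a * b;        /* exact 64-bit product */
    prod += (int64_t)1 << (shift - 1);    /* add one half for rounding */
    return (int32_t)(prod >> shift);      /* shift back down */
}
```

For example, mulhrs16(16384, 16384) multiplies 0.5 by 0.5 in Q15 and yields 8192 (0.25), and mul_fixed32(3 << 16, 1 << 15, 16) multiplies 3.0 by 0.5 in Q16 and yields 3 << 15 (1.5). Without a single instruction for the 32-bit case, SIMD code has to spell out the widening multiply, the rounding add and the shift separately.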

Intel said that most of its SSE4 instructions originated from
developers' suggestions; maybe you can let Intel know your ideas so
they become part of SSE5 :-)

-- 
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6



