[FFmpeg-devel] [RFC] snow SSE2 optimizations (was: Re: [FFmpeg-cvslog] r10223 - in trunk/libavcodec/i386: dsputil_mmx.c snowdsp_mmx.c)
Tue Aug 28 17:04:53 CEST 2007
On Tue, Aug 28, 2007 at 01:09:54PM +0200, Guillaume POIRIER wrote:
> On 8/28/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Tue, Aug 28, 2007 at 12:07:02AM +0200, Reimar Döffinger wrote:
> > > Right, right, I just missed a few lines of code while reading the C
> > > version, thus the confusion.
> > > Since the diff is unreadable, do you think the following is better than
> > > the current code (I mean visually, it does decode correctly after all ;-),
> > > though it is not measurably faster than the mmx code on my PC):
> > SSE2 is rarely faster than MMX, because most CPUs need 2x as long to
> > execute an SSE2 instruction as an MMX one ...
> Exactly. You need a CPU that has a full-width (128-bit) ALU to almost
> guarantee that SSE will be faster. Core2 and the upcoming K10 have
> full-width SSE ALUs.
another way to say it is that you need a cpu which has 2 mmx units and
can use both for an sse instruction but only 1 for an mmx instruction
whether that is a step in the correct direction, well ...
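
To put rough numbers on the argument above, here is a toy cost model in C.
It is only a sketch under stated assumptions: the function name simd_cycles
and its parameters are made up for illustration, and the model assumes that
one SIMD instruction occupies the ALU for ceil(op_bits / alu_bits) cycles.
Loads, stores, decode limits and multiple execution units are ignored, so
this is not a measurement of any real CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (assumption, not a benchmark): cycles the ALU is busy while
 * processing n_bytes of data with SIMD instructions that are op_bits wide
 * on an ALU that is alu_bits wide. An instruction wider than the ALU is
 * split into several passes. */
size_t simd_cycles(size_t n_bytes, unsigned op_bits, unsigned alu_bits)
{
    assert(op_bits > 0 && op_bits % 8 == 0 && alu_bits > 0);

    size_t bytes_per_insn  = op_bits / 8;
    /* number of instructions needed, rounded up */
    size_t insns           = (n_bytes + bytes_per_insn - 1) / bytes_per_insn;
    /* passes through the ALU per instruction, rounded up */
    size_t cycles_per_insn = (op_bits + alu_bits - 1) / alu_bits;

    return insns * cycles_per_insn;
}
```

For 1024 bytes this model gives 128 cycles for MMX (64-bit ops) on a 64-bit
ALU, also 128 cycles for SSE2 (128-bit ops) on the same 64-bit ALU, and 64
cycles for SSE2 on a 128-bit ALU. Under these assumptions SSE2 only pulls
ahead once the ALU is full width, which is the point made in the quoted text.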
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I hate to see young programmers poisoned by the kind of thinking
Ulrich Drepper puts forward since it is simply too narrow -- Roman Shaposhnik