[FFmpeg-devel] [RFC] snow SSE2 optimizations (was: Re: [FFmpeg-cvslog] r10223 - in trunk/libavcodec/i386: dsputil_mmx.c snowdsp_mmx.c)
Tue Aug 28 13:09:54 CEST 2007
On 8/28/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Tue, Aug 28, 2007 at 12:07:02AM +0200, Reimar Döffinger wrote:
> > Right, right, I just missed a few lines of code while reading the C
> > version, thus the confusion.
> > Since the diff is unreadable, do you think the following is better than
> > the current code (I mean visually, it does decode correctly after all ;-),
> > though it is not measurably faster than the mmx code on my PC):
> SSE2 is rarely faster than MMX, because most CPUs need 2x as long to
> execute SSE2 instructions as MMX ones ...
Exactly. You need a CPU with full-width (128-bit) ALUs to almost
guarantee that SSE will be faster. Core2 and the upcoming K10 have
full-width SSE ALUs.
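
To put rough numbers on it, here is a back-of-the-envelope sketch in C. It
is only a toy model, not a measurement: it assumes one SIMD instruction
issues per cycle and that a register wider than the ALU is processed in
several passes, and it ignores loads, stores, decode and latency. The name
simd_cycles and its parameters are made up for illustration.

```c
#include <stddef.h>

/* Toy model (an assumption, not a benchmark): estimate the ALU cycles
 * needed to process n_bytes with SIMD registers of reg_bits, on a CPU
 * whose SIMD ALU is alu_bits wide. */
size_t simd_cycles(size_t n_bytes, unsigned reg_bits, unsigned alu_bits)
{
    size_t bytes_per_insn = reg_bits / 8;
    /* number of instructions, rounding up for a partial last vector */
    size_t insns = (n_bytes + bytes_per_insn - 1) / bytes_per_insn;
    /* a register wider than the ALU is split into several passes */
    size_t passes = reg_bits > alu_bits ? reg_bits / alu_bits : 1;
    return insns * passes;
}
```

Under this model, simd_cycles(1024, 64, 64) (MMX) and
simd_cycles(1024, 128, 64) (SSE2 on a 64-bit ALU) come out the same, while
simd_cycles(1024, 128, 128) (SSE2 on a full-width ALU) is half of that.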
A soldier will fight long and hard for a bit of colored ribbon.
-- Napoleon Bonaparte