[FFmpeg-devel] [RFC] snow SSE2 optimizations (was: Re: [FFmpeg-cvslog] r10223 - in trunk/libavcodec/i386: dsputil_mmx.c snowdsp_mmx.c)
Thu Aug 30 15:23:26 CEST 2007
On Thu, Aug 30, 2007 at 01:10:59PM +0200, Reimar Döffinger wrote:
> On Tue, Aug 28, 2007 at 05:32:04AM +0200, Michael Niedermayer wrote:
> > On Tue, Aug 28, 2007 at 12:07:02AM +0200, Reimar Döffinger wrote:
> > > On Mon, Aug 27, 2007 at 11:34:44PM +0200, Michael Niedermayer wrote:
> > > > > > > also there's a shift by 4 missing here
> > > > >
> > > > > I don't think so, there is a "psraw $4, %%xmm0 \n\t"
> > > > > > further down. And I know the code is an unreadable mess. I'll try to
> > > > > > reimplement it sometime if no one else does...
> > > >
> > > > > the data after OBMC is 16-bit unsigned, the data after the IDWT is 13-bit
> > > > > signed; the white point differs by a factor of 16, so a shift by 4 is
> > > > > needed to get them on the same level before adding ...
> > >
> > > Right, right, I just missed a few lines of code while reading the C
> > > version, thus the confusion.
> > > Since the diff is unreadable, do you think the following is better than
> > > the current code (I mean visually, it does decode correctly after all ;-),
> > > though it is not measurably faster than the mmx code on my PC):
> > > SSE2 is rarely faster than MMX; that's because most CPUs need 2x as long to
> > > execute SSE2 instructions as MMX ones ...
> > > and yes, the code is MUCH more readable than before
> Can you tell which option to set (preferably for mencoder) to get a
> block width of 16?
simply don't use v4mv
Michael
GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
The worst form of inequality is to try to make unequal things equal.
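
The "shift by 4" point from the quoted exchange can be illustrated with a small scalar sketch. This is not the code from snowdsp_mmx.c or the C reference. It assumes that the factor of 16 comes from the OBMC output carrying 8 fractional bits (16-bit unsigned) while the IDWT output carries 4 fractional bits (13-bit signed). The function name combine_sample and its parameters are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from FFmpeg.
 * obmc: prediction sample, assumed to be pixel * 256 (16-bit unsigned)
 * idwt: residual sample, assumed to be pixel * 16 (13-bit signed)
 * Returns the reconstructed 8-bit pixel.
 */
static uint8_t combine_sample(uint16_t obmc, int16_t idwt)
{
    /* Shift the prediction down by 4 so both terms share the same scale. */
    int v = (obmc >> 4) + idwt;

    /* Add half of the remaining scale (16 / 2) for rounding. */
    v += 8;

    /* Clamp negatives before shifting to avoid shifting a negative value. */
    if (v < 0)
        v = 0;

    /* Drop the 4 fractional bits that both terms still carry. */
    v >>= 4;

    /* Clamp to the 8-bit pixel range. */
    if (v > 255)
        v = 255;

    return (uint8_t)v;
}
```

Under these assumptions, a prediction of 100 (stored as 25600) plus a residual of 10 (stored as 160) gives 110. Without the initial shift, the two terms would differ in scale by 16 and the sum would be meaningless.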