[FFmpeg-devel] [RFC] snow SSE2 optimizations (was: Re: [FFmpeg-cvslog] r10223 - in trunk/libavcodec/i386: dsputil_mmx.c snowdsp_mmx.c)

Thu Aug 30 15:23:26 CEST 2007

Hi

On Thu, Aug 30, 2007 at 01:10:59PM +0200, Reimar Döffinger wrote:
> Hello,
> On Tue, Aug 28, 2007 at 05:32:04AM +0200, Michael Niedermayer wrote:
> > On Tue, Aug 28, 2007 at 12:07:02AM +0200, Reimar Döffinger wrote:
> > > On Mon, Aug 27, 2007 at 11:34:44PM +0200, Michael Niedermayer wrote:
> > > > > > also there's a shift by 4 missing here
> > > > > 
> > > > > I don't think so, there is a "psraw $4, %%xmm0               \n\t"
> > > > > further down. And I know the code is an unreadable mess. I'll try to
> > > > > reimplement it sometime if no one else does...
> > > > 
> > > > the data after OBMC is 16-bit unsigned, the data after the IDWT is 13-bit
> > > > signed; the white point differs by a factor of 16, so a shift by 4 is
> > > > needed to get them on the same level before adding ...
> > > 
> > > Right, right, I just missed a few lines of code while reading the C
> > > version, thus the confusion.
> > > Since the diff is unreadable, do you think the following is better than
> > > the current code (I mean visually, it does decode correctly after all ;-),
> > > though it is not measurably faster than the mmx code on my PC):
> > 
> > SSE2 is rarely faster than MMX, because most CPUs need 2x as long to
> > execute SSE2 instructions as MMX ones ...
> > 
> > and yes the code is MUCH more readable than before
> 
> Can you tell which option to set (preferably for mencoder) to get a
> block width of 16?

Simply don't use v4mv.
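
For reference, the shift-by-4 point quoted above can be sketched in plain C
roughly like this. This is only an illustration and not the code in snow.c or
snowdsp_mmx.c: the function names are made up, and the final round/shift/clip
step assumes the sum carries 4 fractional bits, which is my reading of the
"factor of 16" remark rather than something stated in the thread.

```c
#include <stdint.h>

/* Hypothetical helper, not FFmpeg code.
 * pred16: OBMC-weighted prediction on an unsigned 16-bit scale.
 * resid:  IDWT output, signed, on a scale 16 times smaller.
 * The prediction is shifted right by 4 so both terms share the same
 * white point before they are added. */
static int combine_sample(uint16_t pred16, int16_t resid)
{
    int v = pred16 >> 4;   /* bring the prediction down to the IDWT scale */
    v += resid;            /* now the two values can be added directly */
    return v;
}

/* Assumption: the combined value has 4 fractional bits, so convert it to
 * an 8-bit pixel by rounding, shifting and clipping to 0..255. */
static uint8_t to_pixel(int v)
{
    v += 8;                /* rounding offset, half of 1 << 4 */
    if (v < 0)
        return 0;          /* clip before shifting a negative value */
    v >>= 4;
    if (v > 255)
        v = 255;
    return (uint8_t)v;
}
```

Without the shift, the prediction term would be 16 times too large relative
to the residual, which is the mismatch described above.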

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The worst form of inequality is to try to make unequal things equal.
-- Aristotle