[Ffmpeg-devel] [PATCH] Snow mmx+sse2 asm optimizations

Rich Felker dalias
Mon Feb 6 18:12:35 CET 2006


On Mon, Feb 06, 2006 at 02:12:45PM +0100, Michael Niedermayer wrote:
> 2. if you want to decrease the overhead:
> then change:
> for(){
>  func_ptr()
> }
> to
> func_mmx(){
>  for(){
>   mmx()
>  }
> }
> func_c(){
>  for(){
>   c()
>  }
> }
> 
> yeah you duplicate a few lines of code, but it's MUCH cleaner
> and if there is lots of other stuff in the loop which needs to be duplicated
> then that should be split into its own inline function ...

Nice approach.
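
For concreteness, here is a minimal sketch of that restructuring in C.
All names below (process_line_c, process_line_fast, shared_prepare,
select_line_fn) are made up for illustration and are not from the
patch; the "fast" kernel is plain C standing in for real MMX code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for a whole-line worker: one indirect call per line,
 * instead of one indirect call per element. */
typedef void (*line_fn)(int *dst, const int *src, size_t n);

/* Work common to both variants lives in one inline helper,
 * so duplicating the loop does not duplicate this logic. */
static inline int shared_prepare(int x) { return x * 2; }

/* Per-element kernels. step_fast is only a placeholder for an
 * MMX/SSE2 implementation; here it computes the same result in C. */
static inline int step_c(int x)    { return x + 1; }
static inline int step_fast(int x) { return x + 1; }

/* Plain C variant: the loop is inside the function. */
static void process_line_c(int *dst, const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = step_c(shared_prepare(src[i]));
}

/* "Optimized" variant: same loop shape, different kernel. */
static void process_line_fast(int *dst, const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = step_fast(shared_prepare(src[i]));
}

/* Pick the variant once, outside any hot loop. */
static line_fn select_line_fn(int have_fast)
{
    return have_fast ? process_line_fast : process_line_c;
}
```

The caller does select_line_fn(...) once and then calls the result per
line, so the function-pointer overhead is paid per line rather than
per sample, and the compiler is free to inline the kernel into each
loop.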

> > @@ -1409,6 +1484,121 @@
> >          spatial_compose53i_dy(&cs, buffer, width, height, stride);
> >  }
> >  
> > +static void interleave_line(DWTELEM * low, DWTELEM * high, DWTELEM *b, int width){
> > +    int i = width - 2;
> > +    
> > +    if (width & 1)
> > +    {
> > +        b[i+1] = low[(i+1)/2];
> 
> dividing signed integers by 2^x is slow due to the special case of negative
> numbers, so use a >>1 here or use unsigned

Yes, because CPUs (and now the C language, with C99, which
standardized truncation toward zero for the / operator) have a very
stupid idea of the definition of division. Rounding towards zero is
almost always the incorrect behavior; to mathematicians, division
always gives a remainder in the range [0,denom-1] (for positive
denominator). I recently ran into this same problem (although not with
powers of 2, so the only solution was adding lots of nasty
conditionals) while working on some time code that has to deal with
leap years. :(
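
To illustrate the kind of conditionals I mean, here is a rough sketch
of floor-style division and remainder on top of C99's truncating
operators. The helper names floor_div and floor_mod are my own, and
both assume a positive denominator. (For powers of 2 a right shift
usually does the same job, but note that >> on a negative signed value
is implementation-defined in C, even if most compilers use an
arithmetic shift.)

```c
#include <assert.h>

/* Quotient rounded toward negative infinity. Assumes d > 0. */
static int floor_div(int n, int d)
{
    assert(d > 0);
    int q = n / d;   /* C99: truncates toward zero */
    int r = n % d;   /* C99: zero, or same sign as n */
    /* A negative remainder means truncation rounded up; step back down. */
    return (r < 0) ? q - 1 : q;
}

/* Remainder in the range [0, d-1]. Assumes d > 0. */
static int floor_mod(int n, int d)
{
    assert(d > 0);
    int r = n % d;
    return (r < 0) ? r + d : r;
}
```

With these, floor_div(-7, 2) is -4 with floor_mod(-7, 2) equal to 1,
whereas plain -7 / 2 is -3 with -7 % 2 equal to -1. For calendar-style
code, floor_mod(-1, 400) gives 399 rather than -1, which is the
behavior you actually want when reducing a year into a cycle.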

Rich




