[PATCH] New rgb32tobgr32 (was: Re: [Ffmpeg-devel] [PATCH] have cs_test check for sigsegv at smaller widths and sigill)

Ivo ivop
Sat Apr 14 12:55:46 CEST 2007


On Saturday 14 April 2007 02:14, Michael Niedermayer wrote:
> On Fri, Apr 13, 2007 at 10:40:12PM +0200, Ivo wrote:
> > On Friday 13 April 2007 19:19, Ivo wrote:
> > Okay, let's do one at a time. Here's a new rgb32tobgr32.
> >
> > Old C code:
> > [..]
> > Avg: 71106977
> >
> > New C code:
> > [..]
> > Avg: 67607306
> >
> > Old MMX code:
> > [..]
> > Avg: 68040665
> >
> > New MMX code:
> > [..]
> > Avg: 67486036
> >
> > My CPU is an AMD Sempron 2400+.

Which is a 32-bit Sempron BTW. Not many were made I believe.

> > +	__asm __volatile(
> > +		"	"PREFETCH" (%1)		\n"
> > +		"	movq %3, %%mm7		\n"
> > +		"	pxor %4, %%mm7		\n"
> > +		"	pxor %5, %%mm7		\n"
> >
> > +		"	movq %%mm7, %%mm6	\n"
>
> this is senseless, rather use the register for something useful
> like avoiding reading %3 twice in the loop from memory

Originally it was meant to improve instruction pairing, as I didn't see any 
drop in performance from reading memory, but I suppose that is more 
noticeable on lower-end CPUs. I changed the purpose of mm6 and now avoid 
all reads from memory in the loop.

> > [..]
> > +    for (; s<end; s+=4, d+=4) {
> > +        int v = *(uint32_t *)s;
> > +        int r = v & 0xff, g = (v>>8) & 0xff, b = (v>>16) & 0xff;
> > +        *(uint32_t *)d = b + (g<<8) + (r<<16);
>
> int v = *(uint32_t *)s;
> int g = v&0xFF00;
> v &= 0xFF00FF;
> *(uint32_t *)d = (v>>16) + (v<<16) + g;
>
> 2 shift less
> 1 and less
>
> the same trick can be done with the mmx code to avoid one pand
> also all the shifts and register-register movq can be replaced
> by a pshufw on mmx2
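
For reference, here is a minimal standalone sketch of the two scalar variants
quoted above, plus a 64-bit variant that mimics in plain C what a word shuffle
such as pshufw (immediate 0xB1, swapping the 16-bit halves of each 32-bit
pixel) would do for two pixels at once. The function names are hypothetical
and this is not the code from the attached patch. It assumes, as the quoted C
code does, that the top byte of each pixel is cleared, and it uses unsigned
types so the left shift cannot overflow a signed int.

```c
#include <stdint.h>

/* Hypothetical name: extract each channel separately, then recombine. */
static uint32_t swap_rb_naive(uint32_t v)
{
    uint32_t r = v & 0xff, g = (v >> 8) & 0xff, b = (v >> 16) & 0xff;
    return b + (g << 8) + (r << 16);
}

/* Hypothetical name: keep green in place and move both outer channels
 * with one shift each (two shifts and one AND fewer than the naive form). */
static uint32_t swap_rb_masked(uint32_t v)
{
    uint32_t g = v & 0xFF00;
    v &= 0xFF00FF;
    /* The channel shifted past bit 31 is dropped by 32-bit truncation. */
    return (v >> 16) + (v << 16) + g;
}

/* Hypothetical name: two pixels packed in one 64-bit word. Swapping the
 * 16-bit halves of each 32-bit lane is what a pshufw with immediate 0xB1
 * would do in a single instruction; here it is emulated with shifts. */
static uint64_t swap_rb_2px(uint64_t v)
{
    uint64_t g  = v & 0x0000FF000000FF00ULL;  /* green stays where it is */
    uint64_t rb = v & 0x00FF00FF00FF00FFULL;  /* the two channels to swap */
    rb = ((rb >> 16) & 0x0000FFFF0000FFFFULL) |
         ((rb << 16) & 0xFFFF0000FFFF0000ULL);
    return rb | g;
}
```

All three should agree on any input; only the number of operations per pixel
differs.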

How's the following patch?

New C Code:
69985150 dezicycles in rgb32tobgr32, 1 runs, 0 skips
70566460 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67979870 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67129280 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67166970 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67337970 dezicycles in rgb32tobgr32, 1 runs, 0 skips
70481800 dezicycles in rgb32tobgr32, 1 runs, 0 skips
66668770 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67293370 dezicycles in rgb32tobgr32, 1 runs, 0 skips
68729570 dezicycles in rgb32tobgr32, 1 runs, 0 skips
Avg: 68333921

New MMX Code:
66505730 dezicycles in rgb32tobgr32, 1 runs, 0 skips
66386220 dezicycles in rgb32tobgr32, 1 runs, 0 skips
64076890 dezicycles in rgb32tobgr32, 1 runs, 0 skips
64582190 dezicycles in rgb32tobgr32, 1 runs, 0 skips
68187940 dezicycles in rgb32tobgr32, 1 runs, 0 skips
65565120 dezicycles in rgb32tobgr32, 1 runs, 0 skips
75394570 dezicycles in rgb32tobgr32, 1 runs, 0 skips
65170580 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67334190 dezicycles in rgb32tobgr32, 1 runs, 0 skips
66102720 dezicycles in rgb32tobgr32, 1 runs, 0 skips
Avg: 66930615

New MMX2 Code:
66537630 dezicycles in rgb32tobgr32, 1 runs, 0 skips
66355890 dezicycles in rgb32tobgr32, 1 runs, 0 skips
64868640 dezicycles in rgb32tobgr32, 1 runs, 0 skips
66130640 dezicycles in rgb32tobgr32, 1 runs, 0 skips
66320290 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67119610 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67560890 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67194460 dezicycles in rgb32tobgr32, 1 runs, 0 skips
64999600 dezicycles in rgb32tobgr32, 1 runs, 0 skips
65498400 dezicycles in rgb32tobgr32, 1 runs, 0 skips
Avg: 66258605
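
For anyone who wants to recheck the averages, a small sketch like the one
below reproduces them from the individual runs. The helper name and array
names are hypothetical; the sample values are copied from the C and MMX2
listings above. By these figures the MMX2 version needs roughly 3% fewer
dezicycles than the C version, which is within the spread of the single runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: integer mean of n timing samples (in dezicycles). */
static uint64_t mean_dezicycles(const uint64_t *samples, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return n ? sum / n : 0;
}

/* The ten runs listed under "New C Code". */
static const uint64_t c_runs[10] = {
    69985150, 70566460, 67979870, 67129280, 67166970,
    67337970, 70481800, 66668770, 67293370, 68729570
};

/* The ten runs listed under "New MMX2 Code". */
static const uint64_t mmx2_runs[10] = {
    66537630, 66355890, 64868640, 66130640, 66320290,
    67119610, 67560890, 67194460, 64999600, 65498400
};
```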

I indented the ifdef for MMX2 for readability's sake.

--Ivo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rgb32tobgr32.new.patch
Type: text/x-diff
Size: 2927 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070414/e5ad22d7/attachment.patch>


