[FFmpeg-devel] [PATCH] unscaled float 2 int conversion
Michael Niedermayer
michaelni
Fri May 16 13:02:35 CEST 2008
On Fri, May 16, 2008 at 12:34:01PM +0200, Michael Niedermayer wrote:
> On Fri, May 16, 2008 at 09:48:11AM +0300, Siarhei Siamashka wrote:
> > On Friday 16 May 2008, Michael Niedermayer wrote:
> >
> > [...]
> >
> > > 2nd try, now it is a P3
> > >
> > > gcc-4.3 -O2 -fno-math-errno
> > > 221951 dezicycles in conv_cast, 16254 runs, 130 skips
> > > 107203 dezicycles in conv_lrint, 16291 runs, 93 skips
> > > 103967 dezicycles in conv_bias, 16286 runs, 98 skips
> > >
> > > gcc-4.2 -O2 -fno-math-errno -lm
> > > 214423 dezicycles in conv_cast, 16250 runs, 134 skips
> > > 114627 dezicycles in conv_lrint, 16325 runs, 59 skips
> > > 53196 dezicycles in conv_bias, 16334 runs, 50 skips
> > >
> > > gcc-4.1 -O2 -fno-math-errno -lm
> > > 212703 dezicycles in conv_cast, 16258 runs, 126 skips
> > > 111271 dezicycles in conv_lrint, 16318 runs, 66 skips
> > > 84831 dezicycles in conv_bias, 16316 runs, 68 skips
> > >
> > > gcc-4.0 -O2 -fno-math-errno -lm
> > > 215119 dezicycles in conv_cast, 16274 runs, 110 skips
> > > 169588 dezicycles in conv_lrint, 16282 runs, 102 skips
> > > 53398 dezicycles in conv_bias, 16338 runs, 46 skips
> > >
> > > gcc-3.4 -O2 -fno-math-errno -lm
> > > 215642 dezicycles in conv_cast, 16221 runs, 163 skips
> > > 105947 dezicycles in conv_lrint, 16318 runs, 66 skips
> > > 48505 dezicycles in conv_bias, 16338 runs, 46 skips
> > >
> > > after a little bit of hacking on the code:
> > > 65010 dezicycles in conv_lrint, 16321 runs, 63 skips
> > >
> > > but this is still quite a bit slower
> > >
> > > So it seems the bias code is faster on P3 (P2/PPro) CPUs,
> > > which also means I won't approve its removal unless someone
> > > beats gcc-3.4 -O2 conv_bias on a P3/P2/PPro
> > >
> > > [...]
> >
> > Please also try to benchmark this alternative code (use of 16-bit FISTP)
> > on P2/P3/PPro. I did not run extensive tests, but it was even slower
> > than lrintf with gcc 4.1 on Pentium-M:
> >
> > 242987 dezicycles in conv_cast, 16378 runs, 6 skips
> > 40055 dezicycles in conv_lrint, 16382 runs, 2 skips
> > 47085 dezicycles in conv_x87_asm, 16380 runs, 4 skips
> > 866920 dezicycles in conv_x87_asm_ex, 16380 runs, 4 skips
> > 43762 dezicycles in conv_bias, 16376 runs, 8 skips
>
> P3 gcc-3.4
> 215036 dezicycles in conv_cast, 16363 runs, 21 skips
> 115577 dezicycles in conv_lrint, 16361 runs, 23 skips
> 63010 dezicycles in conv_x87_asm, 16350 runs, 34 skips
> 664136 dezicycles in conv_x87_asm_ex, 16380 runs, 4 skips
> 48501 dezicycles in conv_bias, 16357 runs, 27 skips
>
> And at that point I found a little bug in the benchmark; it should have been
> in[i]= i + i*i*0.3 - 32780;
>
> with that it's:
> 228574 dezicycles in conv_cast, 16363 runs, 21 skips
> 107110 dezicycles in conv_lrint, 16359 runs, 25 skips
> 62921 dezicycles in conv_x87_asm, 16357 runs, 27 skips
> 58373 dezicycles in conv_x87_asm_ex, 16355 runs, 29 skips
> 43850 dezicycles in conv_bias, 16352 runs, 32 skips
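
For reference, the corrected input formula quoted above can be tried in isolation. The following is only a sketch: the helper name fill_input and the buffer length are assumptions, since the benchmark harness itself is not part of this mail. It shows that the generated samples start just below -32768 and grow past 32767, so both ends of the int16 range are exercised.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original benchmark): fills a buffer
 * with the corrected input formula from the mail. The length n is an
 * assumption; the mail does not state the buffer size. */
static void fill_input(float *in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        in[i] = i + i * i * 0.3 - 32780;
}

/* Counts the samples that lie outside the int16 range and would
 * therefore have to be clipped by a saturating conversion. */
static size_t count_out_of_range(const float *in, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (in[i] < -32768.0f || in[i] > 32767.0f)
            count++;
    return count;
}
```

With 1024 samples, the first few values lie below -32768 and everything from index 466 onward lies above 32767.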
src += len;
dst += len;
len= - 2*len;
__asm__ __volatile__(
    "finit\n\t" /* dirty hack to disable floating point exceptions */
    "flds f32767\n\t"
    "flds fminus32768\n\t"
    "1:\n\t"
    "flds -4(%[src],%[len],2)\n\t"
    "flds (%[src],%[len],2)\n\t"
    "flds 4(%[src],%[len],2)\n\t"
    "flds 8(%[src],%[len],2)\n\t"
    "fcomi %%st(5), %%st(0)\n\t"
    "fcmovnbe %%st(5), %%st(0)\n\t"
    "fxch %%st(2)\n\t"
    "fcomi %%st(5), %%st(0)\n\t"
    "fcmovnbe %%st(5), %%st(0)\n\t"
    "fxch %%st(1)\n\t"
    "fcomi %%st(5), %%st(0)\n\t"
    "fcmovnbe %%st(5), %%st(0)\n\t"
    "fxch %%st(3)\n\t"
    "fcomi %%st(5), %%st(0)\n\t"
    "fcmovnbe %%st(5), %%st(0)\n\t"
    "fistps -2(%[dst],%[len])\n\t"
    "fistps 0(%[dst],%[len])\n\t"
    "fxch %%st(1)\n\t"
    "fistps 2(%[dst],%[len])\n\t"
    "fistps 4(%[dst],%[len])\n\t"
    "add $8, %[len]\n\t"
    "jnz 1b\n\t"
    "ffree %%st(0)\n\t"
    "fincstp\n\t"
    "ffree %%st(0)\n\t"
    "fincstp\n\t"
    : [dst] "+&r" (dst), [src] "+&r" (src), [len] "+&r" (len)
    :
    : "cc", "memory");
51606 dezicycles in conv_x87_asm_ex, 16354 runs, 30 skips
but that's still quite a bit behind the bias code (which we did not try to
optimize at all ...)
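
For readers who have not seen the technique, here is a minimal sketch of what a "bias" conversion usually looks like, next to a plain cast. It is not the conv_bias or conv_cast code that was benchmarked (neither is shown in this mail); the function names, the magic constant and the clipping are assumptions chosen for illustration. It assumes IEEE-754 single precision, the default round-to-nearest mode, and finite inputs with |x| < 2^22.

```c
#include <stdint.h>
#include <string.h>

/* Plain C cast: clips to the int16 range, then truncates toward zero.
 * Illustrative only; assumes a finite input. */
static int16_t conv_cast_one(float x)
{
    if (x >  32767.0f) return  32767;
    if (x < -32768.0f) return -32768;
    return (int16_t)x;
}

/* Bias trick (sketch): adding 1.5 * 2^23 moves the value into a range
 * where one float ULP equals 1, so the FPU rounds to an integer as part
 * of the addition and the low mantissa bits hold the result.
 * Assumes IEEE-754 binary32, round-to-nearest, and |x| < 2^22. */
static int16_t conv_bias_one(float x)
{
    /* volatile forces the sum to be stored as a real 32-bit float,
     * even on x87 builds that keep intermediates in extended precision */
    volatile float biased = x + 12582912.0f;   /* 1.5 * 2^23 */
    float tmp = biased;
    int32_t bits;

    memcpy(&bits, &tmp, sizeof bits);          /* reinterpret without aliasing UB */
    bits -= 0x4B400000;                        /* bit pattern of 12582912.0f */

    if (bits >  32767) bits =  32767;          /* saturate to int16 */
    if (bits < -32768) bits = -32768;
    return (int16_t)bits;
}
```

The float-to-int step here is an integer subtraction on the stored bit pattern, so it avoids both the rounding-mode switch a C cast typically costs on x87 and the FISTP instruction. Note that it rounds to nearest (ties to even), while the cast truncates.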
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I hate to see young programmers poisoned by the kind of thinking
Ulrich Drepper puts forward since it is simply too narrow -- Roman Shaposhnik