[FFmpeg-devel] [PATCH] unscaled float 2 int conversion
Michael Niedermayer
michaelni
Fri May 16 12:34:01 CEST 2008
On Fri, May 16, 2008 at 09:48:11AM +0300, Siarhei Siamashka wrote:
> On Friday 16 May 2008, Michael Niedermayer wrote:
>
> [...]
>
> > 2nd try, now it is a P3
> >
> > gcc-4.3 -O2 -fno-math-errno
> > 221951 dezicycles in conv_cast, 16254 runs, 130 skips
> > 107203 dezicycles in conv_lrint, 16291 runs, 93 skips
> > 103967 dezicycles in conv_bias, 16286 runs, 98 skips
> >
> > gcc-4.2 -O2 -fno-math-errno -lm
> > 214423 dezicycles in conv_cast, 16250 runs, 134 skips
> > 114627 dezicycles in conv_lrint, 16325 runs, 59 skips
> > 53196 dezicycles in conv_bias, 16334 runs, 50 skips
> >
> > gcc-4.1 -O2 -fno-math-errno -lm
> > 212703 dezicycles in conv_cast, 16258 runs, 126 skips
> > 111271 dezicycles in conv_lrint, 16318 runs, 66 skips
> > 84831 dezicycles in conv_bias, 16316 runs, 68 skips
> >
> > gcc-4.0 -O2 -fno-math-errno -lm
> > 215119 dezicycles in conv_cast, 16274 runs, 110 skips
> > 169588 dezicycles in conv_lrint, 16282 runs, 102 skips
> > 53398 dezicycles in conv_bias, 16338 runs, 46 skips
> >
> > gcc-3.4 -O2 -fno-math-errno -lm
> > 215642 dezicycles in conv_cast, 16221 runs, 163 skips
> > 105947 dezicycles in conv_lrint, 16318 runs, 66 skips
> > 48505 dezicycles in conv_bias, 16338 runs, 46 skips
> >
> > after a little bit of hacking on the code:
> > 65010 dezicycles in conv_lrint, 16321 runs, 63 skips
> >
> > but this is still quite a bit slower
> >
> > So it seems the bias code is faster on P3 (P2/PPro) CPUs,
> > which also means I won't approve its removal unless someone
> > beats gcc-3.4 -O2 conv_bias on a P3/P2/PPro.
> >
> > [...]
>
> Please also try to benchmark this alternative code (use of 16-bit FISTP)
> on P2/P3/PPro. I did not run extensive tests, but it was even slower
> than lrintf with gcc 4.1 on Pentium-M:
>
> 242987 dezicycles in conv_cast, 16378 runs, 6 skips
> 40055 dezicycles in conv_lrint, 16382 runs, 2 skips
> 47085 dezicycles in conv_x87_asm, 16380 runs, 4 skips
> 866920 dezicycles in conv_x87_asm_ex, 16380 runs, 4 skips
> 43762 dezicycles in conv_bias, 16376 runs, 8 skips
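Siarhei's conv_x87_asm code is not included in this mail. As a rough idea of what a 16-bit FISTP conversion looks like, here is a minimal sketch in GNU C inline assembly. The function name is hypothetical, and the non-x86 branch is only a portable stand-in (not FISTP) so the snippet compiles elsewhere. FISTP rounds with the current x87 rounding mode (round-to-nearest-even by default), and with the invalid-operation exception masked, out-of-range inputs and NaN store the 16-bit "integer indefinite" value 0x8000, so clipping to INT16_MAX would need extra code.

```c
#include <stdint.h>

/* Hypothetical helper: convert a float to int16_t with a single x87
   FISTP (16-bit integer store). */
static int16_t float_to_int16_fistp(float x)
{
    int16_t r;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "t" places x in st(0); fistps stores it as a 16-bit integer using
       the current rounding mode and pops it, hence the "st" clobber. */
    __asm__ ("fistps %0" : "=m" (r) : "t" (x) : "st");
#else
    /* Portable stand-in (not FISTP): round half to even, and map
       out-of-range values and NaN to INT16_MIN like the x87
       "integer indefinite" result. */
    if (!(x >= -32768.5f && x < 32767.5f)) {
        r = INT16_MIN;
    } else {
        int32_t t = (int32_t)x;        /* truncate toward zero */
        float frac = x - (float)t;     /* exact in this range */
        if (frac > 0.5f || (frac == 0.5f && (t & 1)))
            t++;
        else if (frac < -0.5f || (frac == -0.5f && (t & 1)))
            t--;
        r = (int16_t)t;
    }
#endif
    return r;
}
```

Note that positive and negative overflow both come back as -32768, which is one reason a plain 16-bit FISTP is not a drop-in replacement for a clipping conversion.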
P3, gcc-3.4:
215036 dezicycles in conv_cast, 16363 runs, 21 skips
115577 dezicycles in conv_lrint, 16361 runs, 23 skips
63010 dezicycles in conv_x87_asm, 16350 runs, 34 skips
664136 dezicycles in conv_x87_asm_ex, 16380 runs, 4 skips
48501 dezicycles in conv_bias, 16357 runs, 27 skips
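The benchmark source is not shown here, so the following is only a sketch of the two ideas the names suggest; the helper names and the 1.5 * 2^23 constant are my own assumptions, not the benchmark's or FFmpeg's code. A plain C cast truncates toward zero; on x87 CPUs without SSE3's FISTTP, gcc typically implements it by switching the FPU control word to round-toward-zero around each FISTP, which is expensive on P6-family chips and is a commonly cited reason a cast trails the other variants. The bias trick instead adds a large constant so that the rounded integer lands in the low mantissa bits, where an integer load can read it without touching the control word.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: truncating conversion, as a plain C cast does.
   On x87 without FISTTP, gcc usually wraps this in fnstcw/fldcw. */
static int32_t to_int_cast(float x)
{
    return (int32_t)x;
}

/* Hypothetical helper: round-to-nearest conversion via a magic bias.
   Adding 1.5 * 2^23 pins the exponent so that one unit in the last place
   equals 1.0; the FPU's rounding then leaves round(x) in the low mantissa
   bits. Valid for |x| < 2^22. */
static int32_t to_int_bias(float x)
{
    const float bias = 12582912.0f;      /* 1.5 * 2^23, bit pattern 0x4B400000 */
    float biased = x + bias;             /* rounded to float precision */
    int32_t bits;
    memcpy(&bits, &biased, sizeof bits); /* read the bits without aliasing UB */
    return bits - 0x4B400000;            /* subtract the bias's own bit pattern */
}
```

For example, to_int_cast(2.6f) gives 2 while to_int_bias(2.6f) gives 3; ties go to even under the default rounding mode, so to_int_bias(2.5f) gives 2.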
And at that point I found a small bug in the benchmark; the input should have been
in[i]= i + i*i*0.3 - 32780;
With that fixed, the results are:
228574 dezicycles in conv_cast, 16363 runs, 21 skips
107110 dezicycles in conv_lrint, 16359 runs, 25 skips
62921 dezicycles in conv_x87_asm, 16357 runs, 27 skips
58373 dezicycles in conv_x87_asm_ex, 16355 runs, 29 skips
43850 dezicycles in conv_bias, 16352 runs, 32 skips
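As a sanity check on the corrected input formula, the sketch below fills a buffer with in[i] = i + i*i*0.3 - 32780 and counts samples outside the int16 range. The helper names are hypothetical and the length n is a caller choice, since the benchmark's own array length is not given. From the formula alone, samples i = 0..4 lie below -32768 and every sample from i = 466 onward exceeds 32767.

```c
#include <stdint.h>

/* Hypothetical helper: fill in[0..n-1] with the corrected benchmark input.
   Keep n below 46341 so that i*i does not overflow int. */
static void fill_bench_input(float *in, int n)
{
    for (int i = 0; i < n; i++)
        in[i] = i + i*i*0.3 - 32780;   /* evaluated in double, stored as float */
}

/* Hypothetical helper: count samples below INT16_MIN and above INT16_MAX. */
static void count_int16_overflow(const float *in, int n, int *below, int *above)
{
    *below = 0;
    *above = 0;
    for (int i = 0; i < n; i++) {
        if (in[i] < INT16_MIN)
            (*below)++;
        else if (in[i] > INT16_MAX)
            (*above)++;
    }
}
```

With n = 1024, for example, 5 samples fall below the int16 range and 558 above it, so any length past 466 exercises both overflow directions as well as the in-range path.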
>
> Anyway, it looks like an assembly-optimized version of whichever float->int
> conversion method is chosen may still be needed for legacy x86 processors,
> as you have demonstrated very different results with different compiler
> versions :)
yes
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Freedom in capitalist society always remains about the same as it was in
ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin
