[FFmpeg-devel] [PATCH] unscaled float 2 int conversion
Siarhei Siamashka
siarhei.siamashka
Fri May 16 08:48:11 CEST 2008
On Friday 16 May 2008, Michael Niedermayer wrote:
[...]
> 2nd try, now it is a P3
>
> gcc-4.3 -O2 -fno-math-errno
> 221951 dezicycles in conv_cast, 16254 runs, 130 skips
> 107203 dezicycles in conv_lrint, 16291 runs, 93 skips
> 103967 dezicycles in conv_bias, 16286 runs, 98 skips
>
> gcc-4.2 -O2 -fno-math-errno -lm
> 214423 dezicycles in conv_cast, 16250 runs, 134 skips
> 114627 dezicycles in conv_lrint, 16325 runs, 59 skips
> 53196 dezicycles in conv_bias, 16334 runs, 50 skips
>
> gcc-4.1 -O2 -fno-math-errno -lm
> 212703 dezicycles in conv_cast, 16258 runs, 126 skips
> 111271 dezicycles in conv_lrint, 16318 runs, 66 skips
> 84831 dezicycles in conv_bias, 16316 runs, 68 skips
>
> gcc-4.0 -O2 -fno-math-errno -lm
> 215119 dezicycles in conv_cast, 16274 runs, 110 skips
> 169588 dezicycles in conv_lrint, 16282 runs, 102 skips
> 53398 dezicycles in conv_bias, 16338 runs, 46 skips
>
> gcc-3.4 -O2 -fno-math-errno -lm
> 215642 dezicycles in conv_cast, 16221 runs, 163 skips
> 105947 dezicycles in conv_lrint, 16318 runs, 66 skips
> 48505 dezicycles in conv_bias, 16338 runs, 46 skips
>
> after a little bit of hacking on the code:
> 65010 dezicycles in conv_lrint, 16321 runs, 63 skips
>
> but this is still quite a bit slower
>
> So it seems the bias code is faster on P3 (P2/PPro) CPUs,
> which also means I won't approve its removal unless someone
> beats gcc-3.4 -O2 conv_bias on a P3/P2/PPro
>
> [...]
Please also try to benchmark this alternative code (using the 16-bit FISTP)
on P2/P3/PPro. I did not run extensive tests, but it was even slower
than lrintf with gcc 4.1 on Pentium-M:
242987 dezicycles in conv_cast, 16378 runs, 6 skips
40055 dezicycles in conv_lrint, 16382 runs, 2 skips
47085 dezicycles in conv_x87_asm, 16380 runs, 4 skips
866920 dezicycles in conv_x87_asm_ex, 16380 runs, 4 skips
43762 dezicycles in conv_bias, 16376 runs, 8 skips
Anyway, it looks like an assembly-optimized version of the float->int
conversion function, whichever variant is used, may still be needed for
legacy x86 processors, as you have demonstrated very different results
with different compiler versions :)
--
Best regards,
Siarhei Siamashka
-------------- next part --------------
A non-text attachment was scrubbed...
Name: float2int_test.c
Type: text/x-csrc
Size: 4422 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080516/fbf1cfc0/attachment.c>