[FFmpeg-devel] [PATCH] unscaled float 2 int conversion
Fri May 16 12:09:47 CEST 2008
On Fri, May 16, 2008 at 09:25:09AM +0200, Benjamin Larsson wrote:
> Michael Niedermayer wrote:
> > On Thu, May 15, 2008 at 09:14:15PM +0200, Benjamin Larsson wrote:
> >> Michael Niedermayer wrote:
> >>>> Well, when I tried the last time I didn't get it to work; there was some
> >>>> overlap issue that wasn't trivial to sort out.
> >>> You just add 384 or what it was after the windowing/overlap.
> >> Just to be clear, this bias scale thing is about not having to use the
> >> fistp FPU instruction, or whatever it is called on other CPUs. To perform
> >> it you first scale down your samples to the range -1 to 1. This scaling
> >> operation is most often performed for free by scaling a suitable table
> >> somewhere. Then you add 384 so you can read the float value's bits
> >> directly as an integer. So you trade a float add against fistp, which
> >> must have been faster on some CPU (or else they wouldn't have used it).
> >> In FFmpeg we also have 3dnow, sse and altivec code that can do float to
> >> int16 conversion. I think we can agree that the simd code is faster than
> >> the bias trick on all processors that support the simd code. Then we
> >> are left with Intel cpus before P3, the Motorola G3 and various other
> >> cpus with only fpus and no simd unit. I'm pretty sure that this trick is
> >> the best when we are dealing with P2 cpus and lower but I'm not sure it
> >> is for the G3.
> >> So then we come to the matter of performance, you want benchmarks to
> >> justify changing or adding a new scaling method. As I don't have access
> >> to any machines that don't have a simd unit I can't do any usable
> >> benchmarks. But I'm quite sure that if I had access it would show that
> >> doing the bias trick would be faster. So one could argue that, well, ok,
> >> then we keep the code as it is. But my opinion is that we should scrap
> >> this anyway: it makes the code complex, it slows down the simd code
> >> (very little though) for no good reason, and it complicates the
> >> development of a proper audio api and filter system. Cpus with slow fpus
> >> should use fixed point code instead.
> >> So I propose that we start cleaning this out.
> > Ohh well, why do I always have to do the work? You could have saved me
> > some time by just saying that you won't do the benchmarks.
> What I'm saying is that I think it is irrelevant that the bias trick is
> faster on a P3 than lrint because on the P3 we have sse available. Thus
> on P3 we can beat the bias trick. What would be interesting is P2 and
> lower. But as I don't have access to any machine like that I can't make
> any relevant benchmark.
P3 = P2 + some SIMD
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Freedom in capitalist society always remains about the same as it was in
ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin
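
For readers unfamiliar with the bias trick discussed in the quoted text, here is a minimal sketch in C. It is not the FFmpeg implementation, which does not appear in this thread. The function name bias_float_to_int16 and the clamping step are assumptions made for illustration. It assumes IEEE 754 single-precision floats and samples already scaled to the range -1 to 1. Adding 384 moves every such sample into a range where one step of the float's mantissa equals 1/32768, so the integer result can be read straight out of the bit pattern with no float-to-int conversion instruction.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from FFmpeg): convert one sample, already
 * scaled to [-1.0, 1.0), to a signed 16-bit value via the bias trick.
 * Assumes IEEE 754 binary32 floats. */
static int16_t bias_float_to_int16(float sample)
{
    /* 384.0f has the bit pattern 0x43C00000. For sample in [-1, 1) the
     * sum lies in [383, 385), where one mantissa step is 2^-15, so the
     * float add also rounds the result to the nearest 1/32768. */
    float biased = sample + 384.0f;

    /* Reinterpret the bits without violating aliasing rules. */
    int32_t bits;
    memcpy(&bits, &biased, sizeof bits);

    /* Distance from the bias pattern equals sample * 32768.
     * 64-bit arithmetic avoids overflow for wildly out-of-range input. */
    int64_t v = (int64_t)bits - 0x43C00000;

    /* Clamp out-of-range input (an assumption; the thread does not say
     * how overflow should be handled). */
    if (v >  32767) v =  32767;
    if (v < -32768) v = -32768;
    return (int16_t)v;
}
```

With this sketch, 0.5f converts to 16384 and -1.0f to -32768. The only floating-point operation is the add, which is the trade the quoted text describes: a float add in place of a float-to-int store.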