[FFmpeg-devel] [PATCH] unscaled float 2 int conversion

Fri May 16 09:25:09 CEST 2008

Michael Niedermayer wrote:
> On Thu, May 15, 2008 at 09:14:15PM +0200, Benjamin Larsson wrote:
>   
>> Michael Niedermayer wrote:
>>     
>>>> Well, when I tried last time I didn't get it to work; there was an
>>>> overlap issue that wasn't trivial to sort out. 
>>>>         
>>> You just add 384, or whatever it was, after the windowing/overlap.
>>>
>>>       
>> Just to be clear, this bias/scale trick is about avoiding the x87
>> fistp instruction (store as integer and pop). To perform it you first
>> scale your samples down to the range -1 to 1. This scaling is usually
>> free, because it can be folded into a suitable table somewhere. Then you
>> add 384, after which the float's bit pattern can be read directly as an
>> integer. So you trade fistp for a float add, which must have been the
>> faster option on some CPU (or else nobody would have used it).
>>
>> In FFmpeg we also have 3DNow!, SSE and AltiVec code that can do float to
>> int16 conversion. I think we can agree that the SIMD code is faster than
>> the bias trick on all processors that support it. That leaves Intel CPUs
>> before the P3, the Motorola G3 and various other CPUs with only an FPU
>> and no SIMD unit. I'm pretty sure this trick is the best option on the
>> P2 and older, but I'm not sure it is on the G3.
>>
>> So then we come to performance: you want benchmarks to justify changing
>> or adding a scaling method. As I don't have access to any machine
>> without a SIMD unit, I can't do any useful benchmarks. I'm quite sure
>> that, if I had access, they would show the bias trick to be faster. So
>> one could argue that we should keep the code as it is. But my opinion is
>> that we should scrap it anyway: it makes the code complex, it slows down
>> the SIMD code (very slightly) for no good reason, and it complicates the
>> development of a proper audio API and filter system. CPUs with slow FPUs
>> should use fixed-point code instead.
>>
>> So I propose that we start cleaning this out.
>>     
>
> Oh well, why do I always have to do the work? You could have saved me
> some time by just saying that you won't do the benchmarks.
>   
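A minimal sketch of the bias trick described above, assuming IEEE-754 single-precision floats and the default round-to-nearest mode. The function names and the overlap-add stage are hypothetical illustrations, not FFmpeg's actual code; memcpy is used instead of a pointer cast to stay clear of strict-aliasing problems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 384.0f as an IEEE-754 single is 0x43C00000 (exponent 2^8, significand 1.5).
 * Floats in [256, 512) are spaced 2^8 / 2^23 = 2^-15 = 1/32768 apart, so
 * 384 + x keeps x * 32768, rounded to nearest, in the low mantissa bits. */
#define BIAS      384.0f
#define BIAS_BITS 0x43C00000

/* Hypothetical overlap-add stage: the 1/32768 scale is assumed to be folded
 * into the window table, and the bias is added as the last step. */
static void overlap_add_biased(float *out, const float *cur, const float *prev,
                               const float *win_scaled, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = cur[i] * win_scaled[i] + prev[i] + BIAS;
}

/* Turn one biased float into int16 without any float->int instruction:
 * reinterpret the bits, subtract the bias pattern and clip.
 * The mapping is exact while the biased value stays in [256, 512). */
static int16_t biased_float_to_int16(float biased)
{
    int32_t bits;
    memcpy(&bits, &biased, sizeof bits);
    int64_t v = (int64_t)bits - BIAS_BITS;
    if (v < -32768) v = -32768;
    if (v > 32767)  v = 32767;
    return (int16_t)v;
}
```

For example, a sample of 0.5 becomes 384.5f, whose bit pattern is 0x43C04000, giving 16384. The clip is still needed because +1.0 maps to 32768.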

What I'm saying is that I think it is irrelevant that the bias trick is 
faster than lrint on a P3, because the P3 has SSE, and with SSE we can 
beat the bias trick. What would be interesting is the P2 and older. But 
as I don't have access to any such machine, I can't make a relevant 
benchmark.

> PS: Yes, I don't give a damn what you or anyone else thinks; either
> I see benchmarks or people can go talk to a wall.
> It would have taken you less time to disable MMX*/SSE* and write
> a benchmark than to explain why it's better not to.
>   
No P2, no benchmark. Or are you suggesting that the C code always has to 
be the fastest possible, even if 99% of the time people will run the 
SIMD version?

MvH
Benjamin Larsson