[Ffmpeg-devel] Fixed vs. Floating Point AAC

Rich Felker dalias
Thu Mar 9 15:30:11 CET 2006


On Thu, Mar 09, 2006 at 02:21:09AM +0100, Michael Niedermayer wrote:
> Hi
> 
> On Wed, Mar 08, 2006 at 07:38:22PM -0500, Rich Felker wrote:
> > On Thu, Mar 09, 2006 at 12:37:05AM +0100, Michael Niedermayer wrote:
> > > But what about the dynamic range? If all samples are 1.0 (max), then a
> > > DC component would have a value of N^0.5, which for, let's say, N=1024
> > > would be 32, so we would need 21 bits. Where's the problem now? 21 bits *
> > > 21 bits = 42 bits, and that doesn't fit in 32 bits, so no fast 32*32->32
> > > bit multiplies.
> > 
> > A 32*32 multiply gives a 64-bit result. This is fast. If a CPU sucks
> > too much to give the full result, that's the particular platform's
> > problem, and users who insist on using a broken CPU arch will have to
> > deal with it being somewhat slower. x86 does it correctly, and has
> > done so ever since the 8088...
> 
> The x86 can output the 64 bits only in a single register pair (EDX:EAX)
> and also needs one of the inputs to be in a specific register (EAX).
> This is a nasty restriction which doesn't help the compiler generate fast
> code, and the instruction timings don't look favorable for this either.

The compiler already makes stupid register allocations similar to
this for everything it does. Just read the output of gcc -S...
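To make the dynamic-range arithmetic concrete, here is a small C sketch. It assumes Q15 samples (1.0 stored as 32768), which the thread does not specify; the helper names (dc_all_ones, fix_mul, magnitude_bits, isqrt32) are hypothetical. It shows the DC term for N=1024 needing 21 bits, and the usual `(int64_t)a * b` widening idiom keeping the whole product before shifting back down. GCC on 32-bit x86 typically turns that idiom into a single one-operand IMUL, i.e. the register-pair form discussed above.

```c
#include <stdint.h>

/* Assumption (not stated in the thread): Q15 fixed point, 1.0 == 1 << 15. */
#define FRAC_BITS 15
#define FIX_ONE   (INT32_C(1) << FRAC_BITS)

/* Number of bits needed for the magnitude of v (0 for v == 0). */
static int magnitude_bits(int64_t v)
{
    int bits = 0;
    if (v < 0)
        v = -v;
    while (v > 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Integer square root by linear search; adequate for small demo inputs. */
static int32_t isqrt32(int32_t x)
{
    int32_t r = 0;
    while ((int64_t)(r + 1) * (r + 1) <= x)
        r++;
    return r;
}

/* DC term of an orthonormal transform of n samples that are all 1.0:
 * the sum is n * 1.0, scaled by 1/sqrt(n), giving sqrt(n) in Q15.
 * Assumes n is a perfect square (e.g. 1024). */
static int32_t dc_all_ones(int32_t n)
{
    int64_t sum = (int64_t)n * FIX_ONE;
    return (int32_t)(sum / isqrt32(n));
}

/* Fixed-point multiply: full 32x32->64 product, then drop the FRAC_BITS
 * fractional bits. The 64-bit intermediate holds the up-to-42-bit product
 * of two 21-bit operands without overflow. */
static int64_t fix_mul(int32_t a, int32_t b)
{
    return ((int64_t)a * b) >> FRAC_BITS;
}
```

For example, dc_all_ones(1024) returns 1048576 (32.0 in Q15), which needs 21 bits. Its raw square is 2^40 (41 bits, far past 32), and fix_mul rescales it to 33554432, i.e. 1024.0 in Q15.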

> Throughput for 32*32->64 on the P4 is 1/8, for 32*32->32 it's 1/4.5, and
> for floating-point FMUL it's 1/2. I'm ignoring latency here, but the
> order is the same.

Why not compare the Athlon? The P4 is known to suck... it takes multiple
cycles just for a bit shift. Even my K6 has 1-cycle MUL/IMUL.

> For the Athlon the timings aren't clear from the docs I have, only that
> 32*32->32 seems to be 1/4, 32*32->64 worse than 1/6 if the high half of
> the result is used, and FMUL >=1/4. Also note that FMUL is DirectPath
> while IMUL is VectorPath, so IMUL cannot execute together with anything
> else while FMUL can.

Could you explain the p/q notation you're using for throughput?
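The notation is never defined in the thread. One plausible reading, assumed here and not confirmed, is reciprocal throughput: "1/k" means one such instruction can issue every k cycles. Under that reading, the P4 figures quoted above translate into issue-limited cycle counts as in this C sketch (the struct and function names are hypothetical):

```c
/* P4 figures as quoted in the thread, read (by assumption) as
 * "one instruction every k cycles" for a figure written 1/k. */
struct mul_cost {
    const char *op;
    double cycles_per_op;  /* k in "1/k" */
};

static const struct mul_cost p4_costs[] = {
    { "32*32->64 (IMUL, full result)", 8.0 },
    { "32*32->32 (IMUL, low half)",    4.5 },
    { "FMUL",                          2.0 },
};

/* Issue-limited cycle count for `count` independent multiplies,
 * ignoring latency, dependencies and all other work in the loop. */
static double issue_cycles(long count, double cycles_per_op)
{
    return (double)count * cycles_per_op;
}
```

For 1024 independent multiplies this gives 8192, 4608 and 2048 cycles respectively. Real loops will differ because of latency, dependency chains and the surrounding instructions.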

> So I think I've provided enough "proof". Your only argument seems to be
> that low-precision integer Tremor is faster than libvorbis, but AFAIK
> these are two different implementations, and I don't see how a comparison
> between them has any meaning. I could also compare libavcodec's MP3
> decoder, which mostly uses integers, against the one in MPlayer, which
> mostly uses floats; you know which is faster...

Yes. And no one's ever been able to explain why. But clearly it's
unrelated to floats, since MPlayer's version is even faster on my K6,
which has very slow floating point.

Rich