[Ffmpeg-devel] Fixed vs. Floating Point AAC

Fri Mar 10 02:33:54 CET 2006

On Thu, Mar 09, 2006 at 03:36:35PM -0800, Michel Lespinasse wrote:
> Hope this helps, though I'd be surprised if integer supporters changed
> their mind about this. In my opinion, they've been ignoring the evidence
> for 10 years already.

Enough of the false accusations. In 1996, the common desktop CPU was
at best an original Pentium, and 486s were still common in many
settings. On both, float is extremely slow compared to integer. Since
I'm sure you won't care about the 486 I'll focus on the Pentium;
everyone knows how abysmal float performance was on the 486 anyway.
On the Pentium, basic integer arithmetic had an effective cost of 1/2
cycle per instruction (it could pair with another instruction in the
second pipeline) and a latency of 1 cycle. Multiply was ~10 cycles
and divide was ~30, iirc.
Floating point had at best a throughput of one operation per cycle,
and a latency of several cycles. Multiply and divide were similar to
integer in latency, except that the CPU was free to perform other
non-float tasks at the same time. In principle this could have given
large performance gains, but it's rarely possible to interleave code
that well without hand-written asm and without interleaving two
unrelated pieces of code (i.e. most intensive float DSP functions
don't have integer stuff to do at the same time). Moreover, since
final output would always be integers, you have the additional penalty
of conversion to integer. This can be done with bias-add hacks to
obtain decent performance (but still not great); otherwise it will
take at least 20-30 cycles just to store the result, iirc.
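
For anyone who hasn't seen the bias-add hack, here is a minimal sketch
in C. It assumes IEEE-754 doubles, the default round-to-nearest mode,
and the addition being rounded to double precision; it is only valid
for inputs that fit in 32 bits after rounding. The names
double_to_int_bias and float_to_s16 are made up for illustration and
are not taken from any existing decoder.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: IEEE-754 binary64, round-to-nearest-even.
 * Adding 1.5 * 2^52 pushes the value into a range where one ulp is
 * exactly 1, so the rounded integer ends up in the low mantissa bits. */
static int32_t double_to_int_bias(double x)
{
    /* volatile forces the sum to be stored as a real double */
    volatile double biased = x + 6755399441055744.0; /* 1.5 * 2^52 */
    double tmp = biased;
    uint64_t bits;
    uint32_t low;
    int32_t result;

    memcpy(&bits, &tmp, sizeof bits);   /* reinterpret without aliasing UB */
    low = (uint32_t)bits;               /* low 32 bits of the mantissa */
    memcpy(&result, &low, sizeof result);
    return result;
}

/* Hypothetical output stage: scale a sample in [-1.0, 1.0] to 16 bits
 * and clip, as a decoder would have to do before writing PCM. */
static int16_t float_to_s16(double sample)
{
    int32_t v = double_to_int_bias(sample * 32768.0);
    if (v > 32767)  v = 32767;
    if (v < -32768) v = -32768;
    return (int16_t)v;
}
```

Note that this rounds to nearest-even instead of truncating like a C
cast does, so 3.7 becomes 4 and 2.5 becomes 2. The point is that it
replaces the slow float-to-int store with one add and one plain store.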

The evidence is clear that even on present-day CPUs, integer
arithmetic is often faster than floating point. Floating point fanboys
will cite cycle counts or isolated benchmarks, ignoring the overhead
of converting to/from float and ignoring the gains from using
vectorized integer operations with small operands. I don't want to
compare "implementation X with arithmetic switched between int and
float". What I want to see compared is: "what is the fastest possible
decoder you can make with integer arithmetic, versus the fastest
possible with float arithmetic?" Precision is a non-issue as long as
the difference cannot be detected by double-blind testing. The example
I cite is libvorbis vs. Tremor. On my K6, Tremor is several times
faster, and on Athlons it's reportedly something like 25-50% faster.
Clearly the mp3lib example runs in the opposite direction; however,
I've never seen an integer mp3 implementation that even claims to aim
for performance.
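
To make "integer operations with small operands" concrete, here is a
rough sketch of a 16-bit fixed-point (Q15) multiply and a windowing
loop built on it. This is not Tremor's actual code, and the names
q15_mul and q15_window are invented for the example. It assumes the
compiler implements >> on negative signed values as an arithmetic
shift, which the common ones do.

```c
#include <stddef.h>
#include <stdint.h>

/* Q15 format: the real value is raw / 32768, so the range is [-1, 1). */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;   /* exact Q30 product, fits in 32 bits */
    p += 1 << 14;                 /* add half an output step to round */
    p >>= 15;                     /* back to Q15 (assumes arithmetic shift) */
    if (p > 32767)                /* only (-1) * (-1) can overflow */
        p = 32767;
    return (int16_t)p;
}

/* Apply a Q15 window to a block of 16-bit samples in place. */
static void q15_window(int16_t *samples, const int16_t *window, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        samples[i] = q15_mul(samples[i], window[i]);
}
```

A loop of this shape is what packed integer instructions are good at:
MMX, for instance, multiplies four 16-bit operands per instruction,
and nothing needs converting on the way in or out.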

Rich