[Ffmpeg-devel] Fixed vs. Floating Point AAC

Fri Mar 10 02:36:54 CET 2006

On Thu, Mar 09, 2006 at 04:00:54PM -0800, Loren Merritt wrote:
> On Thu, 9 Mar 2006, Rich Felker wrote:
> >On Thu, Mar 09, 2006 at 01:07:41PM -0800, Loren Merritt wrote:
> >>On Thu, 9 Mar 2006, Michael Niedermayer wrote:
> >>
> >>>so reusing some benchmark proggy here are the results, nicely written and
> >>>source attached, feel free to design your own cpu which can do
> >>>integer multiplies faster than floating-point ones
> >>>
> >>>                      latency throughput
> >>>P3
> >>>int     32*32    ->32   4       1
> >>>int     32*32>>32->32   5.5     1/4.5
> >>>float   32*32    ->32   5       1/2
> >>>
> >>>Duron
> >>>int     32*32    ->32   4       1/2
> >>>int     32*32>>32->32   6       1/4.5
> >>>float   32*32    ->32   3.5     1
> >>>
> >>>Athlon
> >>>int     32*32    ->32   4       1/2
> >>>int     32*32>>32->32   6.5     1/5
> >>>float   32*32    ->32   3.5     1
> >>
> >>P4
> >>int     32*32    ->32   14      1/4.5
> >>int     32*32>>32->32   19      1/10.5
> >>float   32*32    ->32    7      1/2
> >
> >How can 32*32>>32 possibly be slower than 32*32? It's just a matter of
> >whether you read the result from eax or edx...
> 
> It's a matter of whether you put the source in eax and read the result 
> from edx (32*32>>32) or use any two registers you want (32*32).

Are you sure? Last I checked the x86 always used eax for the low 32
bits of a product. There is a special instruction that does not store
the high 32 bits to edx, but I don't think you get to choose where the
low 32 bits go. Apologies if I'm mistaken on this.
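
To make the two integer rows concrete, here is a minimal C sketch of what
"32*32->32" and "32*32>>32->32" compute. The names mul_lo and mul_hi are
made up for illustration and are not taken from the attached benchmark
source. As far as I know, the one-operand mul/imul forms are the ones that
leave the full 64-bit product in edx:eax, which is what the second function
needs.

```c
#include <stdint.h>

/* 32*32 -> 32: keep only the low 32 bits of the product.
 * Hypothetical helper name, for illustration only. */
int32_t mul_lo(int32_t a, int32_t b)
{
    /* Multiply as unsigned so that wraparound is well defined in C;
     * the low 32 bits are the same for signed and unsigned operands. */
    return (int32_t)((uint32_t)a * (uint32_t)b);
}

/* 32*32>>32 -> 32: keep only the high 32 bits of the 64-bit product.
 * Hypothetical helper name, for illustration only. */
int32_t mul_hi(int32_t a, int32_t b)
{
    /* Widen first, so the multiply produces all 64 bits.
     * Right-shifting a negative value is implementation-defined in C;
     * gcc implements it as an arithmetic shift. */
    return (int32_t)(((int64_t)a * b) >> 32);
}
```

For example, mul_lo(0x10000, 0x10000) is 0 and mul_hi(0x10000, 0x10000)
is 1, since the product is exactly 2^32.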

> In the latency test, the extra 2x mov are expensive.

This is a gcc bug -- not choosing good registers. Try the comparison
with asm next time.
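
A rough sketch of what an asm version of the high-half multiply could look
like with gcc extended inline asm, assuming an x86 target and a gcc-compatible
compiler. The name mul_hi_asm is hypothetical, and this is not the code from
the attached benchmark. The constraints pin one input to eax and read the
result from edx, so the compiler does not have to guess at the registers;
other targets fall back to plain C.

```c
#include <stdint.h>

/* High 32 bits of the signed 32x32 product.
 * Hypothetical helper name, for illustration only. */
int32_t mul_hi_asm(int32_t a, int32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int32_t lo, hi;
    /* One-operand imull: edx:eax = eax * operand.
     * "0"(a) places a in the same register as output 0 (eax);
     * "rm"(b) lets the compiler pick a register or a memory operand. */
    __asm__ ("imull %3"
             : "=a"(lo), "=d"(hi)
             : "0"(a), "rm"(b)
             : "cc");
    (void)lo; /* the low half is produced but not needed here */
    return hi;
#else
    /* Portable fallback for non-x86 targets. */
    return (int32_t)(((int64_t)a * b) >> 32);
#endif
}
```

Whether this removes the extra movs in the latency test depends on what the
surrounding loop does with the registers, so it would need to be measured.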

> And in the throughput 
> test, the extra reg caused spillage too.

This is an actual issue.

Rich