[Ffmpeg-devel] Fixed vs. Floating Point AAC
Rich Felker
dalias
Fri Mar 10 02:36:54 CET 2006
On Thu, Mar 09, 2006 at 04:00:54PM -0800, Loren Merritt wrote:
> On Thu, 9 Mar 2006, Rich Felker wrote:
> >On Thu, Mar 09, 2006 at 01:07:41PM -0800, Loren Merritt wrote:
> >>On Thu, 9 Mar 2006, Michael Niedermayer wrote:
> >>
> >>>so reusing some benchmark proggy here are the results, nicely written and
> >>>source attached, feel free to design your own cpu which can do
> >>>integer multiplies faster than floating-point ones
> >>>
> >>>                      latency  throughput
> >>>P3
> >>>int   32*32     ->32  4        1
> >>>int   32*32>>32 ->32  5.5      1/4.5
> >>>float 32*32     ->32  5        1/2
> >>>
> >>>Duron
> >>>int   32*32     ->32  4        1/2
> >>>int   32*32>>32 ->32  6        1/4.5
> >>>float 32*32     ->32  3.5      1
> >>>
> >>>Athlon
> >>>int   32*32     ->32  4        1/2
> >>>int   32*32>>32 ->32  6.5      1/5
> >>>float 32*32     ->32  3.5      1
> >>
> >>P4
> >>int   32*32     ->32  14       1/4.5
> >>int   32*32>>32 ->32  19       1/10.5
> >>float 32*32     ->32  7        1/2
> >
> >How can 32*32>>32 possibly be slower than 32*32? It's just a matter of
> >whether you read the result from eax or edx...
>
> It's a matter of whether you put the source in eax and read the result
> from edx (32*32>>32) or use any two registers you want (32*32).
Are you sure? Last I checked the x86 always used eax for the low 32
bits of a product. There is a special instruction that does not store
the high 32 bits to edx, but I don't think you get to choose where the
low 32 bits go. Apologies if I'm mistaken on this.
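
For reference, the two integer rows in the tables correspond roughly to
the C below. This is only a sketch: the names mul_lo32 and mul_hi32 are
made up here, and it is not the attached benchmark source. On x86 a
compiler would typically be free to use the two-operand imul (any
destination register) for the first and the one-operand mul/imul (one
source in eax, result in edx:eax) for the second, but that mapping is an
assumption about typical codegen, not something the benchmark confirms.

```c
#include <stdint.h>

/* "int 32*32 -> 32": keep only the low 32 bits of the product.
 * The multiply is done unsigned so that wraparound is well defined. */
int32_t mul_lo32(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a * (uint32_t)b);
}

/* "int 32*32>>32 -> 32": form the full 64-bit product and keep the
 * high 32 bits. Right-shifting a negative value is
 * implementation-defined in C; gcc uses an arithmetic shift. */
int32_t mul_hi32(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}
```

The high-half form is the one a fixed-point decoder cares about, since
it is what a Q31-style multiply reduces to.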
> In the latency test, the extra 2x mov are expensive.
This is a gcc bug -- not choosing good registers. Try the comparison
with asm next time.
> And in the throughput
> test, the extra reg caused spillage too.
This is an actual issue.
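
To make the latency/throughput distinction concrete, here is a rough
sketch of the two loop shapes such a benchmark would use. The function
names and the choice of four parallel chains are assumptions for
illustration, not taken from the attached source, and the timing harness
is left out. The second loop keeps four values live at once, which is
where register pressure would show up on x86 if every multiply also
ties up eax and edx.

```c
#include <stdint.h>

/* High 32 bits of the 64-bit product (hypothetical helper). */
static inline int32_t mulh(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Latency shape: each multiply depends on the previous result,
 * so the multiplies cannot overlap. */
int32_t chain_dependent(int32_t x, int32_t k, int n)
{
    for (int i = 0; i < n; i++)
        x = mulh(x, k);
    return x;
}

/* Throughput shape: four independent chains that the CPU can overlap,
 * at the cost of four live accumulators. */
void chain_independent(int32_t v[4], int32_t k, int n)
{
    int32_t a = v[0], b = v[1], c = v[2], d = v[3];
    for (int i = 0; i < n; i++) {
        a = mulh(a, k);
        b = mulh(b, k);
        c = mulh(c, k);
        d = mulh(d, k);
    }
    v[0] = a; v[1] = b; v[2] = c; v[3] = d;
}
```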
Rich