[Ffmpeg-devel] Fixed vs. Floating Point AAC

Fri Mar 10 07:08:48 CET 2006

On Thu, 9 Mar 2006, Rich Felker wrote:
> On Thu, Mar 09, 2006 at 04:00:54PM -0800, Loren Merritt wrote:
>> On Thu, 9 Mar 2006, Rich Felker wrote:
>>>
>>> How can 32*32>>32 possibly be slower than 32*32? It's just a matter of
>>> whether you read the result from eax or edx...
>>
>> It's a matter of whether you put the source in eax and read the result
>> from edx (32*32>>32) or use any two registers you want (32*32).
>
> Are you sure? Last I checked the x86 always used eax for the low 32
> bits of a product. There is a special instruction that does not store
> the high 32 bits to edx, but I don't think you get to choose where the
> low 32 bits go. Apologies if I'm mistaken on this..

"imul foo" calculates foo*eax and stores the 64-bit result in edx:eax.
"imul foo,bar" calculates foo*bar and stores the low 32 bits in foo.

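As a rough C illustration of the two operations being compared (the names mul_lo and mul_hi are made up for this example, and which imul form a compiler actually emits depends on the compiler, target and flags):

```c
#include <stdint.h>

/* 32*32: only the low 32 bits of the product are kept.
 * On x86 this can map to the two-operand "imul foo,bar",
 * which works on any pair of registers.
 * The multiply is done unsigned to avoid signed-overflow UB. */
static inline int32_t mul_lo(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a * (uint32_t)b);
}

/* 32*32>>32: the high 32 bits of the full 64-bit product.
 * On 32-bit x86 this typically needs the one-operand "imul foo",
 * which takes one source from eax and leaves the result in edx:eax.
 * Assumes >> on a negative int64_t is an arithmetic shift, which is
 * implementation-defined in C but true for the usual compilers. */
static inline int32_t mul_hi(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}
```

For example, mul_lo(0x40000000, 4) is 0 while mul_hi(0x40000000, 4) is 1, since the product is exactly 2^32.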
>> In the latency test, the extra 2x mov are expensive.
>
> This is a gcc bug -- not choosing good registers. Try the comparison
> with asm next time..

Not a bug. The benchmark code is a sequence of multiplies and nothing
else. Every one-operand imul needs a source in eax and writes edx:eax, so
you can't allocate all the variables to eax. Yes, in real code,
register allocation may help.
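
A minimal sketch of what a multiply-only benchmark of this kind might look like. This is not the actual benchmark code from the thread; chain_latency, chain_throughput and the choice of four parallel chains are assumptions made for illustration.

```c
#include <stdint.h>

/* High 32 bits of the signed 64-bit product (hypothetical helper).
 * Assumes an arithmetic right shift for negative values. */
static inline int32_t mul_hi(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Latency-style test: every multiply depends on the previous result,
 * so with the one-operand imul the value has to get from edx back
 * into a source register before the next multiply can start. */
int32_t chain_latency(int32_t x, int32_t k, int n)
{
    for (int i = 0; i < n; i++)
        x = mul_hi(x, k);
    return x;
}

/* Throughput-style test: four independent chains. All of them still
 * compete for eax/edx, so more values are live than there are
 * convenient registers, which is where spills can come from. */
void chain_throughput(int32_t v[4], int32_t k, int n)
{
    for (int i = 0; i < n; i++) {
        v[0] = mul_hi(v[0], k);
        v[1] = mul_hi(v[1], k);
        v[2] = mul_hi(v[2], k);
        v[3] = mul_hi(v[3], k);
    }
}
```

Timing these loops for mul_hi and for a low-half multiply, and looking at the generated asm, is one way to see the extra movs and spills for a given compiler.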

>> And in the throughput
>> test, the extra reg caused spillage too.
>
> This is an actual issue.

--Loren Merritt