[FFmpeg-devel] Fwd: Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Trent Piepho xyzzy
Mon Aug 27 20:43:00 CEST 2007

On Mon, 27 Aug 2007, Loren Merritt wrote:
> On Sun, 26 Aug 2007, Mike Giacomelli wrote:
> > Modern x86 chips have pipelined adders and multipliers, so the add and
> > multiply rate is the same (at least assuming they have equal numbers
> > of each).  I believe Intel has been doing this since the pentium pro
> > in the mid 90s, and AMD since the K7 in the late 90s.
> But they don't have equal numbers of each.
> Sorry, I screwed up my throughput test. mul is only 3x slower.
> Both K8 and Core2 have:
> add is latency 1, throughput 3.
> 32x32->32 mul is latency 3, throughput 1.
> 64x64->64 mul is latency 4, throughput 1.
> 32x32->64 mul is latency 3, and can't be used pipelined due to its use
> of implicit registers.
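
A back-of-the-envelope model of the figures quoted above, reading
"throughput N" as N instructions started per cycle. That reading, the
function names and the 12-operation example are assumptions of this
sketch, not something stated in the quoted mail. It only counts cycles
for a fully dependent chain versus fully independent operations and
ignores decode width, execution ports and everything else a real core
does:

```c
#include <assert.h>

/* Each operation needs the previous result, so nothing overlaps:
 * the chain costs n * latency cycles. */
static unsigned dependent_cycles(unsigned n, unsigned latency)
{
    return n * latency;
}

/* No operation depends on another. The unit starts per_cycle new
 * operations every cycle, so the last one starts at cycle
 * (n - 1) / per_cycle and its result is ready latency cycles later. */
static unsigned independent_cycles(unsigned n, unsigned latency,
                                   unsigned per_cycle)
{
    assert(per_cycle > 0);
    if (n == 0)
        return 0;
    return (n - 1) / per_cycle + latency;
}
```

With the quoted numbers for a 32x32->32 mul (latency 3, one per cycle),
dependent_cycles(12, 3) gives 36 while independent_cycles(12, 3, 1)
gives 14. That gap is why it matters whether the 32x32->64 form can be
overlapped or not.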

Doesn't register renaming mean that operations that use the same registers
can be run at the same time?  Since the registers are in fact allocated from
a much larger pool of virtual registers, the eax and edx in one instruction
are not necessarily the same registers as the eax and edx in another.
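
For reference, this is the kind of code where the question matters:
several 32x32->64 multiplies with no data dependency between them. The
sketch below is plain C with hypothetical names (mul32x32_64, dot4).
Which instructions a compiler emits for it, and whether the resulting
multiplies actually overlap on a given CPU, is the open question here
and is not something this code demonstrates by itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widening multiply, 32x32->64.  On 32-bit x86 a compiler will
 * typically use the one-operand mul for this, which implicitly
 * writes its result to edx:eax. */
static uint64_t mul32x32_64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Four widening multiplies that do not depend on each other.  Every
 * one of them names eax/edx architecturally; only the final additions
 * tie the results together.  The sum wraps modulo 2^64. */
static uint64_t dot4(const uint32_t x[4], const uint32_t y[4])
{
    assert(x != NULL && y != NULL);

    uint64_t p0 = mul32x32_64(x[0], y[0]);
    uint64_t p1 = mul32x32_64(x[1], y[1]);
    uint64_t p2 = mul32x32_64(x[2], y[2]);
    uint64_t p3 = mul32x32_64(x[3], y[3]);

    return (p0 + p1) + (p2 + p3);
}
```

If renaming gives each mul its own physical eax/edx pair, the four
products above have no true dependency on one another; timing a loop
around dot4 against a version with a single dependent chain would be
one way to check.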

