[FFmpeg-devel] Fwd: Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization
Mon Aug 27 13:51:05 CEST 2007
On Mon, Aug 27, 2007 at 04:19:16AM -0600, Loren Merritt wrote:
> On Sun, 26 Aug 2007, Mike Giacomelli wrote:
> >> As opposed to recent x86 chips, where 32x32 mul is 9 times slower than add?
> > Modern x86 chips have pipelined adders and multipliers, so the add and
> > multiply rate is the same (at least assuming they have equal numbers
> > of each). I believe Intel has been doing this since the pentium pro
> > in the mid 90s, and AMD since the K7 in the late 90s.
> But they don't have equal numbers of each.
> Sorry, I screwed up my throughput test. mul is only 3x slower.
> Both K8 and Core2 have:
> add is latency 1, throughput 3.
> 32x32->32 mul is latency 3, throughput 1.
> 64x64->64 mul is latency 4, throughput 1.
> 32x32->64 mul is latency 3, and can't be used pipelined due to its use
> of implicit registers.
add is latency 1, throughput 2
32x32->32 imul is latency 4, throughput 1.
add is latency 0.5 throughput 4 (claimed by Intel's datasheets)
add is latency 0.5-1 throughput 3 (truth: the P4 cannot execute more than 3
instructions per cycle, well actually fewer, and Intel's 0.5 cycle pipelined
adders only have a 0.5 latency if the next instruction is another very simple
instruction which doesn't mind that the high 16 bits will be available 0.5
cycles later than the low 16 bits)
32x32->32 imul is latency 14 throughput 2/9
32x32->64 (i)mul is latency 16 throughput 1/8
add is latency 1 throughput 4 (again likely not true; the P4* cannot execute
4 instructions per cycle on average, from what I remember)
32x32->32 imul is latency 10 throughput 2/5
64x64->64 imul is latency 11 throughput 2/5
32x32->64 (i)mul is latency 11 throughput ?
so your 9 times claim is not so wrong for P4*
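
To make the comparison concrete, here is a rough sketch that divides the add throughput by the multiply throughput for the figures quoted in this thread. It assumes "throughput" means instructions per cycle, which is how fractions such as 2/9 read. The group labels, the FIGURES table and the mul_slowdown helper are made up for illustration. It ignores latency and dependency chains, so it only bounds the best case for fully independent operations.

```python
from fractions import Fraction

# (add throughput, 32x32->32 multiply throughput), assumed to be in
# instructions per cycle. Labels are illustrative, not CPU names from the post.
FIGURES = {
    "K8/Core2 (quoted figures)": (Fraction(3), Fraction(1)),
    "first group in the reply": (Fraction(2), Fraction(1)),
    "P4, datasheet add rate": (Fraction(4), Fraction(2, 9)),
    "P4, realistic add rate": (Fraction(3), Fraction(2, 9)),
    "P4*, second group": (Fraction(4), Fraction(2, 5)),
}


def mul_slowdown(add_tp, mul_tp):
    """Return how many adds can issue in the time of one multiply."""
    return add_tp / mul_tp


if __name__ == "__main__":
    for name, (add_tp, mul_tp) in FIGURES.items():
        ratio = mul_slowdown(add_tp, mul_tp)
        print(f"{name}: mul is {float(ratio):g}x slower than add")
```

With these inputs the ratio is 3 for the K8/Core2 figures and between 10 and 18 for the P4 groups, which is in line with the remark that a 9x gap is not far off for P4*.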
Michael
GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
No snowflake in an avalanche ever feels responsible. -- Voltaire