[FFmpeg-devel] Fwd: Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Fri Aug 24 23:22:38 CEST 2007

On 24 August 2007, Loren Merritt wrote:

> On Thu, 23 Aug 2007, Mike Giacomelli wrote:
> >> Well, in the first part you reduce the number of multiplications. Split
> >> radix would do that too, but I guess you can make x operations faster by
> >> low-level optimizations than half of the same operations, to which the
> >> same low-level optimizations could be applied.
> >
> > Most targets without FPUs also have very slow integer multiplication.
> > On ARM, for instance, the 32x32->64 multiplies needed for fixed-point
> > ops can be 4 times slower or more than a simple add, as I recall.
>
> As opposed to recent x86 chips, where 32x32 mul is 9 times slower than add?

Moreover, at least ARM9E and ARM11 cores execute a 32x32->64 MAC in 3 cycles
(which means they also get an extra addition per multiplication for free).
And if you only need 32x16->(high 32 bits of the result), such a MAC operation
executes in a single cycle (with some result latency, though). So modern
ARM cores are pretty fast at multiplication.
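
For illustration, here is a minimal C sketch of the two operations meant
above. The function names are made up for this example and are not taken
from the patch under discussion. A compiler targeting these cores may map the
first to SMLAL and the second to SMLAWB, but that depends on the compiler and
flags, so treat the mapping as an assumption rather than a guarantee.

```c
#include <stdint.h>

/* 32x32->64 multiply-accumulate: acc += a * b, keeping the full 64-bit
 * result. Hypothetical helper name; on ARM this is the kind of expression
 * that can become a single SMLAL. */
static inline int64_t mac_32x32_64(int64_t acc, int32_t a, int32_t b)
{
    return acc + (int64_t)a * (int64_t)b;
}

/* 32x16 multiply keeping the high 32 bits of the 48-bit product, plus an
 * accumulate (compare SMLAWB). Hypothetical helper name. Assumes the
 * compiler implements >> on negative signed values as an arithmetic
 * shift, which is implementation-defined in C but usual in practice. */
static inline int32_t mac_32x16_hi32(int32_t acc, int32_t a, int16_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b; /* fits in 48 bits */
    return acc + (int32_t)(prod >> 16);
}
```

In a fixed-point FFT butterfly the second form is attractive when the twiddle
factors fit in 16 bits, since the low 16 bits of the product are dropped
anyway.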