[FFmpeg-devel] Fwd: Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Wed Aug 22 06:29:09 CEST 2007

On Tue 21 Aug 2007 17:02, Marc Hoffman pondered:
> mmh at yoda$ for i in 8 9 10 11 12 ; do echo $i; fft $i; done
> 8
> 256
> FFT32:time: 42.3 us/transform [total time=1.39 s its=32768]
> FFTR2:time: 7.0 us/transform [total time=1.82 s its=262144]
> FFT-ffmpeg:time: 6.3 us/transform [total time=1.65 s its=262144]
> FFTR4/2:time: 5.2 us/transform [total time=1.37 s its=262144]
> SRFFTR:time: 7.1 us/transform [total time=1.87 s its=262144]
> 9
> 512
> FFT32:time: 94.7 us/transform [total time=1.55 s its=16384]
> FFTR2:time: 15.0 us/transform [total time=1.96 s its=131072]
> FFT-ffmpeg:time: 14.2 us/transform [total time=1.86 s its=131072]
> FFTR4/2:time: 11.5 us/transform [total time=1.51 s its=131072]
> SRFFTR:time: 15.1 us/transform [total time=1.98 s its=131072]
> 10
> 1024
> FFT32:time: 206.6 us/transform [total time=1.69 s its=8192]
> FFTR2:time: 34.4 us/transform [total time=1.13 s its=32768]
> FFT-ffmpeg:time: 31.7 us/transform [total time=1.04 s its=32768]
> FFTR4/2:time: 26.4 us/transform [total time=1.73 s its=65536]
> SRFFTR:time: 39.1 us/transform [total time=1.28 s its=32768]
> 11
> 2048
> FFT32:time: 454.7 us/transform [total time=1.86 s its=4096]
> FFTR2:time: 86.0 us/transform [total time=1.41 s its=16384]
> FFT-ffmpeg:time: 70.1 us/transform [total time=1.15 s its=16384]
> FFTR4/2:time: 62.9 us/transform [total time=1.03 s its=16384]
> SRFFTR:time: 97.5 us/transform [total time=1.60 s its=16384]
> 12
> 4096
> FFT32:time: 1004.5 us/transform [total time=1.03 s its=1024]
> FFTR2:time: 217.5 us/transform [total time=1.78 s its=8192]
> FFT-ffmpeg:time: 155.8 us/transform [total time=1.28 s its=8192]
> FFTR4/2:time: 129.6 us/transform [total time=1.06 s its=8192]
> SRFFTR:time: 242.3 us/transform [total time=1.99 s its=8192]

Not that I want to add to the work, but I assume this was run on a desktop 
somewhere? A system with a different memory subsystem, different compiler 
optimisations, or different cache sizes/latencies could give different results.

If you send me the code, I can run it on a different embedded platform to see 
if the overall trends are much different.
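
For reference, a minimal sketch of a timing loop that would produce lines in
the same format as the quoted output. This is an assumption, not the actual
harness: the quoted totals all fall between 1 and 2 s with power-of-two
iteration counts, which is consistent with (but does not confirm) doubling the
iteration count until a run takes at least one second. The names
transform_fn, bench_result, bench, print_result and dummy_transform are all
hypothetical, and dummy_transform is only a placeholder workload, not an FFT.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical callback: performs one transform of size 1 << log2n. */
typedef void (*transform_fn)(int log2n, void *ctx);

typedef struct {
    double us_per_transform; /* average microseconds per call */
    double total_s;          /* CPU time of the final timed run, in seconds */
    long   its;              /* iterations in the final timed run */
} bench_result;

/* CPU time in seconds, using only the standard C clock(). */
static double now_s(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Assumed strategy: double the iteration count until one timed run
 * lasts at least min_s seconds (capped so the loop always ends). */
static bench_result bench(transform_fn fn, int log2n, void *ctx, double min_s)
{
    bench_result r = { 0.0, 0.0, 0 };
    long its = 1;

    assert(fn != NULL);
    for (;;) {
        double t0 = now_s();
        for (long i = 0; i < its; i++)
            fn(log2n, ctx);
        r.total_s = now_s() - t0;
        r.its = its;
        if (r.total_s >= min_s || its >= (1L << 30))
            break;
        its *= 2;
    }
    r.us_per_transform = r.total_s * 1e6 / (double)r.its;
    return r;
}

/* Prints one line in the same layout as the quoted benchmark output. */
static void print_result(const char *name, bench_result r)
{
    printf("%s:time: %.1f us/transform [total time=%.2f s its=%ld]\n",
           name, r.us_per_transform, r.total_s, r.its);
}

/* Placeholder workload (NOT an FFT): counts calls through ctx and does
 * 1 << log2n volatile additions so the loop is not optimised away. */
static void dummy_transform(int log2n, void *ctx)
{
    volatile long sink = 0;
    long n = 1L << log2n;

    for (long i = 0; i < n; i++)
        sink += i;
    *(long *)ctx += 1;
}
```

To try it, call bench(dummy_transform, 8, &calls, 1.0) with a long calls = 0
and pass the result to print_result("dummy", r). Since clock() measures CPU
time rather than wall time, a target with a coarse or missing clock() would
need a platform cycle counter substituted in now_s().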

-Robin