[FFmpeg-devel] [PATCH] activate ac3 decoder

Wed Feb 20 08:32:50 CET 2008

On 19 February 2008, Michael Niedermayer wrote:
[...]
> > > According to the following link, looks like djbfft is supposed to be
> > > public domain now:
> > > http://linux.slashdot.org/article.pl?sid=07/11/30/0430201
> > >
> > > Is it a good time to reconsider the "cannibalizing" plan again? Has
> > > anybody benchmarked it already?
> >
> > A proof of concept patch that adds djbfft support to ffmpeg (IFFT only)
> > is attached. Below are some benchmark results using fft-test utility
> > (tested on AMD Athlon-XP 2400+, 2.0GHz):
> >
> > IFFT             256   512  1024  2048   4096   8192
> > ----------------------------------------------------
> > fft ffmpeg(SIMD) 3.9   8.4  18.2  39.8  100.5  268.8
> > fft ffmpeg(C)    8.2  18.4  40.4  89.4  200.8  499.4
> > djbfft           6.3  14.2  32.1  72.8  159.7  380.2
> >
> > I can try to do benchmarks on ARM11 later (actually these are the most
> > interesting for me).
> >
> > So what next? Are there still any plans regarding FFT/IMDCT improvements?
> > I would like to start adding some ARM VFP optimizations to FFT in the
> > near future, so it would be nice to know current situation.
>
> The current situation is the same as it was. I review patches, and if they
> improve speed, are optimal (split radix), clean, simple, ...
> I approve them, else I reject them.

Fair enough, except that I don't quite understand your position on the fixed
point FFT submission that was rejected (or at least ignored) earlier. If it
works and the implementation is well hidden behind FFTContext, then who cares
if it is not efficient (yet), provided that it is acknowledged not to be the
best one and that plans exist for painlessly upgrading it in the future?

> [...]
>
> > +    if (s->tmp_buf) {
> > +        /* TODO: handle DJBFFT permute in a more optimal way, probably in-place */
> > +        for(j=0;j<np;j++) s->tmp_buf[revtab[j]] = z[j];
> > +        memcpy(z, s->tmp_buf, np * sizeof(FFTComplex));
> > +        return;
> > +    }
> > +
> >      /* reverse */
> > -    np = 1 << s->nbits;
> >      for(j=0;j<np;j++) {
> >          k = revtab[j];
> >          if (k < j) {
>
> This looks odd, are you duplicating the bit reverse code?

DJBFFT does not use bit reversal but a different kind of permutation (it is
not simply a matter of swapping pairs of values), so the old code is not
directly applicable to it. Technically, it should be possible to do the DJBFFT
permutation efficiently in place, but 'ff_fft_permute' performance is not
important for most audio codecs (and the others can be tweaked not to use it),
which is why I took the easy route of adding a temporary buffer for now.
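
To make the difference concrete, here is a minimal standalone sketch (not the
patch itself). DemoComplex, permute_swap and permute_tmp are names made up for
this example, and the table layout is an assumption modelled on the revtab
lookup in the quoted hunk. The swap loop is only valid when the table is its
own inverse, as with bit reversal; the temporary-buffer version works for any
permutation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for FFTComplex, for illustration only. */
typedef struct { float re, im; } DemoComplex;

/*
 * In-place permute by swapping pairs. This is only correct when
 * revtab[revtab[j]] == j for every j (true for bit reversal), because
 * each element is assumed to trade places with exactly one partner.
 */
void permute_swap(DemoComplex *z, const uint16_t *revtab, size_t np)
{
    for (size_t j = 0; j < np; j++) {
        size_t k = revtab[j];
        if (k < j) {
            DemoComplex t = z[k];
            z[k] = z[j];
            z[j] = t;
        }
    }
}

/*
 * Out-of-place permute through a scratch buffer of np elements.
 * Works for any permutation table, at the cost of an extra copy.
 */
void permute_tmp(DemoComplex *z, DemoComplex *tmp,
                 const uint16_t *revtab, size_t np)
{
    assert(z != tmp);               /* the scratch buffer must not alias z */
    for (size_t j = 0; j < np; j++)
        tmp[revtab[j]] = z[j];      /* element j moves to slot revtab[j] */
    memcpy(z, tmp, np * sizeof(*z));
}
```

With the 8-point bit reversal table {0,4,2,6,1,5,3,7} both functions give the
same ordering. With a table that is not its own inverse, for example the
rotation {1,2,3,0}, only permute_tmp puts every element where the table says.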

This is not a final patch, just some test code which shows that djbfft can be
integrated into the FFmpeg FFT framework relatively painlessly.

So far I like how DJBFFT performs :) It is quite a bit faster than the FFmpeg
C FFT implementation on ARM11 (I tested it both standalone and as part of the
ffvorbis decoder). I'm not providing numbers yet, because I want to check
several code variants (generic, pentium, sparc, ...) and optimization settings
first.

Benchmarks from other platforms are welcome (the patch I provided in the
previous post makes it easy to build ffmpeg/mplayer with djbfft support).

By the way, I also tried to benchmark FFTW3 (support for it can also be added
to FFTContext very easily), but the results were not too good. My guess is
that FFTW is mostly optimized for scientific use, performing FFTs on huge sets
of data, which is not our case and would be overkill for plain multimedia
decoding.

On the other hand, I find these links quite interesting:
http://lists.xiph.org/pipermail/vorbis-dev/2003-May/016206.html
http://lists.xiph.org/pipermail/vorbis-dev/2003-June/016217.html

I also wonder how split-radix would perform with SIMD optimizations, so that
it could be compared directly with the current ffmpeg FFT. It looks like
liba52 from mplayer has some 3DNOW optimizations for split-radix, though it
is a huge mess. I'll try to extract something from it to do benchmarks
(fortunately I have an AMD processor :)).

What other FFT implementations should also be considered before deciding
which one to use in ffmpeg?

If DJBFFT ends up being the final choice, it will still need SIMD
optimizations. But somebody needs to confirm its current legal status to be
completely sure. And if DJBFFT really is public domain now, can it be
relicensed to LGPL?