[FFmpeg-devel] Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Thu Aug 2 02:58:08 CEST 2007

Hi

On Wed, Aug 01, 2007 at 04:58:36PM -0400, Marc Hoffman wrote:
> On 8/1/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> > Hi
> >
> > On Mon, Jul 30, 2007 at 10:33:16PM -0400, Marc Hoffman wrote:
> > [...]
> > > I will do it if it makes sense; right now I don't see it being the most
> > > efficient. Let's get through the basic acceptance, and when you and I
> > > decide to move forward we can talk about more efficient mechanisms for
> > > general machines.  (I'm not going anywhere, so even if you accept this
> > > we can change it in the future.)
> > >
> > > The split radix is not the most efficient way to do things on the
> > > BlackFin machine.  It has to do with all the extra pointer stuff you
> > > need to maintain.  On other machines this is more efficient.  Not to
> > > go into this too much; anyway, I agreed earlier to implement this for
> > > us (ffmpeg-devel) and I will, just not right now.  I really want to see
> > > if I/we can get one audio codec to work in fixed point and achieve high
> > > quality; I think this is what you/we really care about anyway.
> >
> > I am not sure what extra pointers you are talking about.
> > If you are thinking of a recursive implementation which gets 3 pointers
> > to the 3 input parts, this is unacceptable; the current Cooley-Tukey FFT
> > also isn't written by passing pointers recursively around.
> >
> >
> 
> Just to clarify... In terms of the pure number of operations required,
> split radix is more efficient than radix-4. But an implementation on a
> specific platform may not be. For example, the BlackFin 16x16 radix-4 FFT
> kernel has 6 MAC cycles (to do 3 complex multiplies) and 4
> add/subtract cycles. Split radix in the best case improves multiplies by
> about 10% and add/subtracts by a negligible amount. Thus, the total
> theoretical improvement is 0.6 cycles (from MACs), i.e. 10 kernel
> cycles become 9.4 cycles. Thus, the total improvement is only 6%.
> Combine this with the first two stages (which should be done separately
> since they have no multiplies) and the improvement is even less.  Now the
> overhead of split radix (such as addressing) negates the cycle
> improvements altogether.
> 
> Finding the most efficient algorithm is not as simple as finding the
> algorithm with the lowest number of operations. One usually has to
> consider the peculiarities of the architecture in question.

Yes, absolutely, but the libavcodec/ architecture is ISO C, not bfin.
Anyway, I think implementing both split radix and radix-4 would cost less
time than this discussion; it would also provide a more definite
answer on which is faster on which architecture ...
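
The cycle estimate quoted above can be reproduced with a few lines of C.
This is only a sketch of that arithmetic: the 6 MAC cycles, 4 add/subtract
cycles and the roughly 10% best-case multiply saving are the figures from
the quoted mail, the function names are hypothetical, and nothing here
measures a real kernel or accounts for addressing overhead.

```c
#include <assert.h>

/*
 * Estimated cycles per butterfly kernel after a fractional saving on the
 * multiply (MAC) cycles only. Add/subtract cycles are assumed unchanged,
 * matching the "negligible" improvement stated in the quoted mail.
 * Hypothetical helper, not part of libavcodec.
 */
static double split_radix_kernel_cycles(double mac_cycles,
                                        double addsub_cycles,
                                        double mul_saving)
{
    /* A saving outside [0, 1] would not make sense as a fraction. */
    assert(mul_saving >= 0.0 && mul_saving <= 1.0);
    return mac_cycles * (1.0 - mul_saving) + addsub_cycles;
}

/* Fractional improvement when going from `before` cycles to `after`. */
static double relative_gain(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before;
}
```

With the quoted figures, split_radix_kernel_cycles(6.0, 4.0, 0.10) gives
about 9.4 cycles against the original 10, and relative_gain(10.0, 9.4)
gives about 0.06, i.e. the 6% mentioned above. Whether that survives the
extra addressing work is exactly what implementing both variants would show.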

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you really think that XML is the answer, then you definitely misunderstood
the question -- Attila Kinali