[FFmpeg-devel] [PATCH] AAC Encoder, Round 2

Sun Aug 24 20:05:54 CEST 2008

On Sun, Aug 24, 2008 at 06:45:58PM +0200, Michael Niedermayer wrote:
> On Sun, Aug 24, 2008 at 06:44:07PM +0300, Kostya wrote:
> > On Sun, Aug 24, 2008 at 04:10:12PM +0200, Michael Niedermayer wrote:
> [...]
> >  
> > > >  
> > > > > except that, I think the previous reviews have not been dealt with yet.
> > > > > That is, the various suggestions for quality improvement should be tried,
> > > > > and whatever is better should be adopted.
> > > > > Also everything that Gabriel Bouvigne suggested should be tried.
> > > > 
> > > > Err, when I find a way to download them. $20 for a three-page paper is a bit
> > > > high for me.
> > > 
> > > Forget the papers; implement what does not depend on pay-per-view papers.
> > > IIRC he said something about scalefactors and 3GPP as well.
> > 
> > He did, but that also influences psy model interface (see below). 
> 
> Anyway, I suggest that you read some of the RD papers about video coding
> (even if you also read the audio-related ones)

I will, I've got some of the recommended papers too. 

> > > >  
> > > > > I do not mind if we leave some of the harder things, like Viterbi-based window
> > > > > decision, until after svn ci, but the majority of the things suggested should
> > > > > be tried before the code is committed.
> > > > 
> > > > Comment on the interface then, or propose your own.
> > > > It will be needed to plug in any psychoacoustic model.
> > > > It would also allow finishing the encoder faster and then concentrating on
> > > > the model(s).
> > > 
> > > The split between psy and encoder is odd, to say the least.
> > > 
> > > things psy can provide IMHO
> > > * find perceptual weights per band or per coefficient used for RD
> > > * find the perceptual distortion between 2 time domain signals
> > > * find the perceptual distortion between 2 freq domain signals, possibly
> > >   just a single band or coeff
> >  
> > Since Gabriel recommended exactly that model, I've tried to implement it in the least
> > intrusive way. As you demand the highest possible quality, let's discuss how it should
> > be done.
> > 
> > My proposal (everybody uses slightly different terms, so I may get something wrong):
> 
> > 0. Initialize everything
> 
> of course ...
> 
> 
> > 1. Perform some input filtering (lowpass, highpass, stereo attenuation, whatever)
> 
> It's debatable how far this should be here or separate and outside of the
> encoder.

Indeed, that's why I haven't marked the place where it is done.

> > 2. Model decides window type (well, in distant future it can be 'undecided' and encoder
> > will try both)
> 
> > 3. Encoder performs windowing and MDCT (and grouping?)
> 
> I don't think grouping can be done at this point, at least not optimally.

Well, from my POV, you can just merge groups with similar scalefactors after
they are known.
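
A minimal sketch of that merging idea, assuming the scalefactors of each
short window are already known. The function name, the greedy strategy, and
the `tolerance` parameter are illustrative assumptions, not code from the
patch; a greedy merge like this is not guaranteed to be optimal.

```python
def group_windows(sf_per_window, tolerance=1):
    """Greedily group adjacent short windows with similar scalefactors.

    sf_per_window: one list of scalefactors per window (equal lengths).
    Returns the group lengths, which sum to len(sf_per_window).
    """
    if not sf_per_window:
        return []
    groups = [1]
    ref = sf_per_window[0]  # scalefactors of the first window in the group
    for sf in sf_per_window[1:]:
        # Largest per-band difference against the group's reference window.
        diff = max((abs(a - b) for a, b in zip(ref, sf)), default=0)
        if diff <= tolerance:
            groups[-1] += 1      # close enough: extend the current group
        else:
            groups.append(1)     # too different: start a new group
            ref = sf
    return groups


windows = [[10, 12], [10, 13], [20, 22], [20, 22],
           [21, 22], [5, 5], [5, 6], [5, 5]]
print(group_windows(windows))
```

Comparing against the first window of each group, rather than the previous
window, keeps a slow drift from pulling very different windows into one group.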

> > 4. Model calculates perceptual entropy and thresholds
> > 5. Ratecontrol module in encoder uses them to produce final thresholds
> > 5.1 maybe it will call psy model to calculate perceptual distortion for the band
> > 6. Encoder quantizes input with scalefactors
> > 7. Encoder determines and encodes band info and coefficients
> > 8. Fetch next frame and goto step 1 unless it was the last frame
> > 
> > Any ideas/suggestions/patches?
> 
> I am not sure, this is quite vague.
> 
> 
> A few points that are IMO important
> * decisions must NOT be bundled into psy models; that is, when we implement
>   3 different heuristics to choose the MDCT/window size, they must be selectable
>   independently of the remaining unrelated psy model. This also applies to
>   things like stereo attenuation coeffs, the way the low/highpass cutoff is
>   chosen, and so on ...

Then how? Select a separate module for each psy step?
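
One way to picture "a separate module for each psy step" is a table of
interchangeable functions per decision, each picked by name. This is only a
sketch: every name, threshold, and cutoff value below is a made-up
placeholder, not a tuned heuristic and not an existing FFmpeg interface.

```python
def always_long(frame, prev_frame):
    """Window decision: never switch to short windows."""
    return "long"


def energy_jump(frame, prev_frame, ratio=8.0):
    """Window decision: go short when frame energy jumps sharply."""
    e_now = sum(x * x for x in frame)
    e_prev = sum(x * x for x in prev_frame)
    return "short" if e_now > ratio * max(e_prev, 1e-9) else "long"


def fixed_cutoff(sample_rate, bitrate):
    """Lowpass choice: constant cutoff, limited by Nyquist."""
    return min(16000, sample_rate // 2)


def bitrate_scaled_cutoff(sample_rate, bitrate):
    """Lowpass choice: cutoff grows with bitrate (placeholder formula)."""
    return min(sample_rate // 2, max(4000, bitrate // 8))


# One registry per decision, so each choice is independent of the others.
WINDOW_DECIDERS = {"always_long": always_long, "energy_jump": energy_jump}
CUTOFF_CHOOSERS = {"fixed": fixed_cutoff, "bitrate_scaled": bitrate_scaled_cutoff}


def build_psy(window="energy_jump", cutoff="fixed"):
    """Assemble a set of psy decisions from independently chosen parts."""
    return {"window": WINDOW_DECIDERS[window], "cutoff": CUTOFF_CHOOSERS[cutoff]}


psy = build_psy(window="always_long", cutoff="bitrate_scaled")
print(psy["window"]([1.0] * 4, [0.0] * 4), psy["cutoff"](44100, 128000))
```

Swapping the window heuristic then never touches the cutoff choice, which is
the independence asked for above.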

> * The primary goal is the highest quality encoding; anything that would make
>   achieving this goal harder will be rejected.

Well, I can implement it in asymptotic time :)

> * coeff quantization and scalefactors must be decided based on RD.
>   It's perfectly fine to support faster alternatives in addition ...

I think that should be done in the encoder.
As I previously mentioned, I like to keep the encoder and psy model separated
and I'd like to have them working ASAP.
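
For reference, an RD decision done on the encoder side could look roughly
like the sketch below: try each candidate step size for a band and keep the
one minimising distortion + lambda * rate, with the psy model supplying only
the perceptual `weight`. The uniform quantizer and the bit-count proxy are
simplifying assumptions; real AAC uses a non-uniform quantizer and codebook
lookups, and all names here are hypothetical.

```python
def rd_choose_step(coeffs, steps, lam, weight=1.0):
    """Return (step, quantized) minimising weight*distortion + lam*rate."""
    best = None
    for step in steps:
        q = [round(c / step) for c in coeffs]
        # Perceptually weighted squared reconstruction error.
        dist = weight * sum((c - qi * step) ** 2 for c, qi in zip(coeffs, q))
        # Crude rate proxy: bigger magnitudes cost more bits, +1 per coeff.
        rate = sum(2 * abs(qi).bit_length() + 1 for qi in q)
        cost = dist + lam * rate
        if best is None or cost < best[0]:
            best = (cost, step, q)
    return best[1], best[2]


band = [10.0, -3.0, 0.4, 7.0]
candidates = [1.0, 2.0, 4.0]
print(rd_choose_step(band, candidates, lam=0.0))    # distortion only
print(rd_choose_step(band, candidates, lam=100.0))  # rate dominates
```

With `lam=0` the finest step wins; as `lam` grows the choice moves to coarser
steps, which is the knob a ratecontrol module would turn.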

As I have a working AAC encoder, I'd like to first make it fit for optimal
encoding and then perfect it piece by piece. Rewriting it from scratch would
require clear requirements too. So let's settle on some workflow scheme.
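
As a starting point for that discussion, the per-frame part of the proposed
steps (2 to 7) can be written as a skeleton that only fixes who calls whom.
All class and method names are hypothetical, and the toy bodies are stand-ins
(no real windowing, MDCT, or bitstream packing); step 1, input filtering, is
left out because its place is still open.

```python
class ToyPsy:
    def decide_window(self, frame):
        return "long"                       # step 2: window type

    def thresholds(self, coeffs, window):
        return [0.5 for _ in coeffs]        # step 4: stand-in thresholds


class ToyEncoder:
    def transform(self, frame, window):
        return list(frame)                  # step 3: stand-in for window + MDCT

    def rate_control(self, coeffs, thresholds):
        return thresholds                   # step 5: pass-through ratecontrol

    def quantize(self, coeffs, thresholds):
        # step 6: uniform quantization with step 2*threshold (toy choice)
        return [round(c / (2 * t)) for c, t in zip(coeffs, thresholds)]

    def pack(self, quantized, thresholds, window):
        return (window, quantized)          # step 7: stand-in for bitstream


def encode_stream(frames, psy, enc):
    """Run steps 2-7 for every frame; the loop itself is step 8."""
    packets = []
    for frame in frames:
        window = psy.decide_window(frame)
        coeffs = enc.transform(frame, window)
        thresholds = psy.thresholds(coeffs, window)
        final = enc.rate_control(coeffs, thresholds)
        quantized = enc.quantize(coeffs, final)
        packets.append(enc.pack(quantized, final, window))
    return packets


print(encode_stream([[1.0, 2.0], [3.0, 4.0]], ToyPsy(), ToyEncoder()))
```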
