[FFmpeg-devel] [PATCH] AAC Encoder, Round 2

Sun Aug 24 21:27:44 CEST 2008

On Sun, Aug 24, 2008 at 09:05:54PM +0300, Kostya wrote:
> On Sun, Aug 24, 2008 at 06:45:58PM +0200, Michael Niedermayer wrote:
[...]
> > > > >  
> > > > > > I do not mind if we leave some of the harder things like Viterbi-based window
> > > > > > decision until after svn ci, but the majority of the things suggested should
> > > > > > be tried before the code is committed.
> > > > > 
> > > > > Comment on interface then or propose your own.
> > > > > It will be needed to plug any psychoacoustic model.
> > > > > Also it would allow finishing the encoder faster and then concentrating on
> > > > > the model(s).
> > > > 
> > > > The split between psy and encoder is odd, to say the least.
> > > > 
> > > > things psy can provide IMHO
> > > > * find perceptual weights per band or per coefficient used for RD
> > > > * find the perceptual distortion between 2 time domain signals
> > > > * find the perceptual distortion between 2 freq domain signals, possibly
> > > >   just a single band or coeff
> > >  
> > > Since Gabriel recommended exactly that model, I've tried to implement it in the least
> > > intrusive way. As you demand the highest possible quality, let's discuss how it should
> > > be done.
> > > 
> > > My proposition (everybody uses slightly different terms, so I may get something wrong):
> > 
> > > 0. Initialize everything
> > 
> > of course ...
> > 
> > 
> > > 1. Perform some input filtering (lowpass, highpass, stereo attenuation, whatever)
> > 
> > It's debatable to what extent this should be here or separate and outside of the
> > encoder.
> 
> indeed, that's why I haven't marked the place where it is done 
>  

> > > 2. Model decides window type (well, in distant future it can be 'undecided' and encoder
> > > will try both)
> > 
> > > 3. Encoder performs windowing and MDCT (and grouping?)
> > 
> > I don't think grouping can be done at this point, at least not optimally.
> 
> well, from my POV, you can just merge groups with similar scalefactors after
> they are known

Well, you don't know the scalefactors yet ...
Besides, what is "similar"?
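
To make the question concrete, here is a minimal sketch of the "merge groups
after the scalefactors are known" idea. It is not code from the patch. It
assumes 8 short windows, a flat per-window scalefactor array, and a
hypothetical `tolerance` parameter as one possible definition of "similar";
the function name `group_windows` is also made up for illustration.

```c
#include <stdlib.h>

#define NUM_SHORT_WINDOWS 8

/*
 * Greedily merge adjacent short windows into groups.
 * sf[w * num_bands + b] is the scalefactor of band b in window w.
 * A window joins the current group if every band's scalefactor is within
 * `tolerance` of the group's first window (an assumed similarity rule).
 * Writes the group lengths to group_len and returns the number of groups.
 */
static int group_windows(const int *sf, int num_bands, int tolerance,
                         int group_len[NUM_SHORT_WINDOWS])
{
    int num_groups = 0;
    int start = 0; /* first window of the current group */

    group_len[0] = 1;
    for (int w = 1; w < NUM_SHORT_WINDOWS; w++) {
        int similar = 1;
        for (int b = 0; b < num_bands; b++) {
            if (abs(sf[w * num_bands + b] - sf[start * num_bands + b]) > tolerance) {
                similar = 0;
                break;
            }
        }
        if (similar) {
            group_len[num_groups]++;
        } else {
            num_groups++;
            group_len[num_groups] = 1;
            start = w;
        }
    }
    return num_groups + 1;
}
```

The grouping changes with the chosen tolerance, and the scalefactors
themselves would change once windows share them, which is why doing this
optimally is not straightforward.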

>  
> > > 4. Model calculates perceptual entropy and thresholds
> > > 5. Ratecontrol module in encoder uses them to produce final thresholds
> > > 5.1 maybe it will call psy model to calculate perceptual distortion for the band
> > > 6. Encoder quantizes input with scalefactors
> > > 7. Encoder determines and encodes band info and coefficients
> > > 8. Fetch next frame and goto step 1 unless it was the last frame
> > > 
> > > Any ideas/suggestions/patches?
> > 
> > I am not sure, this is quite vague
> > 
> > 
> > A few points that are IMO important
> > * decisions must NOT be bundled into psy models, that is, when we implement
> >   3 different heuristics to choose the MDCT/window size they must be choosable
> >   independently of the remaining unrelated psy model, this also applies to
> >   things like stereo attenuation coeffs, the way the low/highpass cutoff is
> >   chosen and so on ...
> 
> then how? select separate module for each psy step?

Not sure I would call it "module" but yes, in principle.

I was more thinking of
if(avctx->something == something){
}else{
}
The struct, function pointer, ... system seems a little overkill here, though.
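
A slightly fuller sketch of that if/else style, with each decision selected
by its own option so that none of them is bundled into a psy model. All
names here (`EncoderOptions`, the enum values, the thresholds) are
hypothetical and are not fields of AVCodecContext or part of the patch.

```c
/* Hypothetical option values; none of these exist in the FFmpeg API. */
enum WindowDecision { WINDOW_DECISION_FIXED_LONG, WINDOW_DECISION_ENERGY };
enum CutoffDecision { CUTOFF_NONE, CUTOFF_BITRATE };

typedef struct EncoderOptions {
    enum WindowDecision window_decision; /* chosen independently ...      */
    enum CutoffDecision cutoff_decision; /* ... of this and the psy model */
    int bit_rate;                        /* bits per second               */
    int sample_rate;                     /* Hz                            */
} EncoderOptions;

/* Returns 1 to use short windows for the frame, 0 to use a long window. */
static int choose_short_window(const EncoderOptions *o,
                               float prev_energy, float cur_energy)
{
    if (o->window_decision == WINDOW_DECISION_ENERGY) {
        /* arbitrary attack threshold, for illustration only */
        return cur_energy > 8.0f * prev_energy;
    } else {
        return 0;
    }
}

/* Returns the lowpass cutoff in Hz, never above the Nyquist frequency. */
static int choose_cutoff_hz(const EncoderOptions *o)
{
    int nyquist = o->sample_rate / 2;

    if (o->cutoff_decision == CUTOFF_BITRATE) {
        int cutoff = o->bit_rate / 4; /* arbitrary mapping, illustration only */
        return cutoff < nyquist ? cutoff : nyquist;
    } else {
        return nyquist;
    }
}
```

Changing the window heuristic here does not touch the cutoff choice, and
neither depends on which psy model computes the thresholds.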

> 
> > * The primary goal is highest quality encoding, anything that would make
> >   achieving this goal harder will be rejected.
> 
> Well, I can implement it in [...] time :)

great ;)))

> 
> > * coeff quantization and scalefactors must be decided based on RD.
> >   It's perfectly fine to support faster alternatives in addition ...
>  
> I think that should be done in encoder.

yes
IMHO the psy model should just tell the encoder how important each band is
in terms of audibility of distortion, that is, it should provide perceptual weights.
That way the psy model does not need to mess with anything AAC specific ...
and the encoder can do all the RD, bit counting, quantization, ...
Sadly this is not exactly how the simple 3GPP model is designed ...
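
A minimal sketch of that split, assuming the psy model hands the encoder one
perceptual weight per band and the encoder minimizes the usual Lagrangian
cost weight * distortion + lambda * bits over candidate scalefactors. The
`BandCandidate` struct and `pick_best_candidate` are hypothetical names, and
the distortion and bit counts are assumed to have been measured by the
encoder beforehand.

```c
#include <float.h>

/* One trial quantization of a band (hypothetical structure). */
typedef struct BandCandidate {
    int   scalefactor; /* scalefactor that was tried                  */
    float distortion;  /* squared error after quantize + dequantize   */
    int   bits;        /* bits needed to code the band this way       */
} BandCandidate;

/*
 * Returns the index of the candidate with the lowest RD cost, or -1 if
 * n is 0. `weight` comes from the psy model (larger means distortion in
 * this band is more audible); `lambda` trades bits against distortion.
 */
static int pick_best_candidate(const BandCandidate *c, int n,
                               float weight, float lambda)
{
    int   best      = -1;
    float best_cost = FLT_MAX;

    for (int i = 0; i < n; i++) {
        float cost = weight * c[i].distortion + lambda * (float)c[i].bits;
        if (cost < best_cost) {
            best_cost = cost;
            best      = i;
        }
    }
    return best;
}
```

With this interface the psy model never sees scalefactors or codebooks, and
a band with a larger weight ends up with a finer quantization for the same
lambda.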

> As I previously mentioned, I like to keep encoder and psy model separated
> and I like to have them working ASAP.
> 
> As I have a working AAC encoder, I'd like to get it into a shape fit for being
> made optimal and then perfect it piece by piece. Rewriting it from scratch will
> require clear requirements too. So let's settle on some workflow scheme.

I didn't ask for a rewrite ...

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Those who are too smart to engage in politics are punished by being
governed by those who are dumber. -- Plato 