[FFmpeg-devel] [RFC] Generic psychoacoustic model interface

Sat Aug 30 20:20:38 CEST 2008

On Sat, Aug 30, 2008 at 07:28:29PM +0300, Kostya wrote:
> On Sat, Aug 30, 2008 at 04:51:10PM +0200, Michael Niedermayer wrote:
> > On Sat, Aug 30, 2008 at 01:21:54PM +0300, Kostya wrote:
> > > On Thu, Aug 28, 2008 at 10:36:57PM +0200, Michael Niedermayer wrote:
> > > > On Thu, Aug 28, 2008 at 08:10:26PM +0300, Kostya wrote:
> > > [...]
> > > > > /**
> > > > >  * windowing related information
> > > > >  */
> > > > > typedef struct FFWindowInfo{
> > > > 
> > > > >     int window_type[2];               ///< window type (short/long/transitional, etc.) - current and previous
> > > > 
> > > > How is this "transitional" going to work with many different frame lengths?
> > > > is there 1? N*N ?
> > >  
> > > that's for AAC (i.e. it requires slightly different windowing),
> > > the encoder will set that to an internal value
> > 
> > I think the psy model should not bother with what a specific format may or
> > may not do or need.
> > There are short blocks, and there are long blocks in AAC, furthermore AAC
> > is restricted to have short blocks in consecutive multiples of 8. Other
> > codecs do not have such restrictions.
> > Also if AAC needs to specially mark long blocks before and after short
> > ones that is the problem of the AAC encoder, not the psy model.
> > The window shape of a block surely depends on the next and previous block,
> > that is not AAC specific.
>  
> would it be better to store it elsewhere, or just introduce a next window type?
> I think that with a next window type it would be obvious what
> transition type we have.

Do whatever you think is best ...
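
A minimal sketch of the "next window type" idea discussed above, assuming only two block sizes. All names here (SketchBlockSize, SketchWindowInfo, sketch_window_slopes) are hypothetical and are not taken from the RFC patch. It shows how the shape of the current block could follow from its neighbours, so that any format-specific "transitional" marking stays in the encoder:

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not part of the proposed interface. */
enum SketchBlockSize { SKETCH_BLOCK_LONG, SKETCH_BLOCK_SHORT };

typedef struct SketchWindowInfo {
    enum SketchBlockSize prev; /* size of the previous block */
    enum SketchBlockSize cur;  /* size of the current block  */
    enum SketchBlockSize next; /* size of the next block     */
} SketchWindowInfo;

typedef struct SketchWindowSlopes {
    bool short_left;  /* left half of the window uses the short slope  */
    bool short_right; /* right half of the window uses the short slope */
} SketchWindowSlopes;

/* Derive the window slopes from the neighbouring block sizes. A long block
 * next to a short one gets a short slope on that side; an encoder could map
 * this onto its own window types (assumed mapping, not specified here). */
static SketchWindowSlopes sketch_window_slopes(const SketchWindowInfo *wi)
{
    SketchWindowSlopes s;
    s.short_left  = wi->cur == SKETCH_BLOCK_SHORT || wi->prev == SKETCH_BLOCK_SHORT;
    s.short_right = wi->cur == SKETCH_BLOCK_SHORT || wi->next == SKETCH_BLOCK_SHORT;
    return s;
}
```

With prev = long, cur = long, next = short, only short_right is set, which is the kind of block AAC would mark as a start window.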

> 
> > > 
> > > [...] 
> > > > > /**
> > > > >  * Get psychoacoustic model suggestion about coding two bands as M/S
> > > > >  */
> > > > > enum FFPsyMSDecision ff_psy_suggest_ms(FFPsyContext *ctx, FFPsyBand *left, FFPsyBand *right);
> > > > 
> > > > i am a little unsure about this one, but i am not objecting ...
> > >  
> > > dropped for now, may revive later
> > > 
> > > Here's another draft - it's a psychoacoustic model interface with a
> > > partial implementation (there are some inaccuracies and debug code there,
> > > but this is an RFC, not a final patch).
> > > 
> > > I plan to use it this way with my encoder.
> > > 
> > > General flow:
> > > 
> > 
> > > init
> > > while(frame){
> > >   suggest window()
> > >   [encoder may ignore that]
> > >   set band info() = calculate thresholds for all bands with provided window type
> > 
> > so far i have no objections
> > 
> > 
> > >   psy analyze() = get distortions and weight for band quantized with a series of
> > >                   quantizers, my encoder will use that for RD-aware quantization
> > 
> > the distortion is only known after the RD "aware" quantization, the weight
> > is needed before RD "aware" quantization, so i am somewhat confused by what
> > you suggest
> 
> from the paper I've read ("Cascaded Trellis-Based Rate-Distortion Control 
> Algorithm for MPEG-4 Advanced Audio Coding" aka 01621212.pdf),
> it is suggested to calculate optimum quantizer from costs
> C = quant_distortion / threshold + lambda * bits

there is only 1 factor
C = quant_distortion*F + bits
F here would contain lambda * threshold
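
One way to read the reduction to a single factor, assuming lambda and threshold are both positive. D, T and R are shorthand introduced here for quant_distortion, threshold and bits:

```latex
\begin{align*}
C &= \frac{D}{T} + \lambda R \\
\frac{C}{\lambda} &= \frac{D}{\lambda T} + R = D \cdot F + R,
\qquad F = \frac{1}{\lambda T}
\end{align*}
```

Dividing by a positive lambda does not change which candidate has the lowest cost, so under this reading F is the only per-band weight the search needs.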

> 
> so model tries to calculate those for further Viterbi search

yes but quant_distortion and bits are from RD based quantization

basically if we ignore all optimizations
1. we have a lambda for each band (or coefficient)
2A. now each band is quantized with each scalefactor and band_type,
   that is for each 2-4 coeffs we find the vector that has the lowest
   quant_distortion/lambda + bits
2B. we now know for each band, scalefactor and band type the optimal
   distortion and number of bits.
3. we now perform the viterbi search to find all the band types and
   scalefactors for all bands using the now known distortions and bits

In practice 2A/2B may for example be executed as the viterbi search needs
to know the values, to avoid calculating things that aren't needed. It may
also be approximated and only at the very end the coeffs would be quantized,
but i would like full RD to be supported so all optimizations that lose
quality can be checked on how much they lose.
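
Step 3 above can be sketched as follows, under several simplifying assumptions: only scalefactors are searched (no band types), the per-band distortions and bits from step 2B are already in tables, and the cost of coding a scalefactor change is a made-up toy function. All names are hypothetical, and the first band's own scalefactor coding cost is ignored.

```c
#include <float.h>
#include <stdlib.h>

#define SKETCH_MAX_BANDS 64
#define SKETCH_MAX_SF    16

/* Assumed toy cost (in bits) of coding a scalefactor after the previous one;
 * a real encoder would look this up in its own code tables. */
static int sketch_sf_delta_bits(int prev_sf, int sf)
{
    return 1 + 2 * abs(sf - prev_sf);
}

/**
 * Viterbi search over scalefactor indices.
 * dist[b * nb_sf + s], bits[b * nb_sf + s]: results of step 2B for band b
 * F[b]: per-band weight applied to the distortion
 * choice[b]: receives the chosen scalefactor index for band b
 * @return total cost of the best path, or -1 if the sizes are out of range
 */
static float sketch_viterbi(int nb_bands, int nb_sf, const float *dist,
                            const int *bits, const float *F, int *choice)
{
    float cost[SKETCH_MAX_SF], next[SKETCH_MAX_SF], total;
    int   from[SKETCH_MAX_BANDS][SKETCH_MAX_SF];
    int   b, s, p, best;

    if (nb_bands < 1 || nb_bands > SKETCH_MAX_BANDS ||
        nb_sf    < 1 || nb_sf    > SKETCH_MAX_SF)
        return -1.0f;

    /* first band: no transition cost in this sketch */
    for (s = 0; s < nb_sf; s++)
        cost[s] = dist[s] * F[0] + bits[s];

    for (b = 1; b < nb_bands; b++) {
        for (s = 0; s < nb_sf; s++) {
            float local = dist[b * nb_sf + s] * F[b] + bits[b * nb_sf + s];
            next[s] = FLT_MAX;
            for (p = 0; p < nb_sf; p++) {
                float c = cost[p] + sketch_sf_delta_bits(p, s) + local;
                if (c < next[s]) {
                    next[s]    = c;
                    from[b][s] = p; /* remember the best predecessor */
                }
            }
        }
        for (s = 0; s < nb_sf; s++)
            cost[s] = next[s];
    }

    /* pick the cheapest end state and trace the path back */
    best = 0;
    for (s = 1; s < nb_sf; s++)
        if (cost[s] < cost[best])
            best = s;
    total = cost[best];
    for (b = nb_bands - 1; b >= 0; b--) {
        choice[b] = best;
        if (b > 0)
            best = from[b][best];
    }
    return total;
}
```

Filling the tables lazily from inside the inner loop, instead of precomputing them, would correspond to running 2A/2B only when the search asks for a value.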

[...]
> > [...]
> > > #ifdef ENABLE_AAC_ENCODER
> > > #include "aac.h"
> > > #include "aactab.h"
> > > 
> > > /**
> > >  * Quantize one coefficient.
> > >  * @return absolute value of the quantized coefficient
> > >  * @see 3GPP TS26.403 5.6.2 "Scalefactor determination"
> > >  */
> > > static av_always_inline int quant(float coef, const float Q)
> > > {
> > >     return av_clip((int)(pow(fabsf(coef) * Q, 0.75) + 0.4054), 0, 8191);
> > > }
> > > 
> > > static inline float psy_aac_get_approximate_quant_error(const float *c, int size,
> > >                                                         const float Q, const float IQ)
> > > {
> > 
> > I would prefer if the psy model is not full of #if AAC or if(aac)
>  
> for now that's the only implementation
> Can you suggest something cleaner?

keep the codec specific parts out of the psy model :)
and what is really needed (i hope not much) can be used through callbacks
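
A rough sketch of what the callback approach could look like, assuming the model only needs a quantization-error estimate from the encoder. The type and function names (SketchQuantErrorFn, SketchPsyContext, sketch_psy_band_error, sketch_uniform_quant_error) are hypothetical, and the uniform quantizer is a stand-in, not AAC's:

```c
#include <stddef.h>

/* Hypothetical callback type: the encoder tells the model how much error
 * quantizing a band with quantizer Q (inverse IQ) would cause. */
typedef float (*SketchQuantErrorFn)(void *opaque, const float *coefs,
                                    int size, float Q, float IQ);

typedef struct SketchPsyContext {
    SketchQuantErrorFn quant_error; /* supplied by the encoder */
    void              *opaque;      /* encoder private data    */
} SketchPsyContext;

/* Model side: no #ifdef or if (aac), only a call through the pointer. */
static float sketch_psy_band_error(const SketchPsyContext *ctx,
                                   const float *coefs, int size,
                                   float Q, float IQ)
{
    return ctx->quant_error(ctx->opaque, coefs, size, Q, IQ);
}

/* Encoder side: example callback using plain uniform rounding (assumed
 * stand-in for whatever quantizer the codec really uses). */
static float sketch_uniform_quant_error(void *opaque, const float *coefs,
                                        int size, float Q, float IQ)
{
    float err = 0.0f;
    int i;

    (void)opaque; /* unused in this example */
    for (i = 0; i < size; i++) {
        float a    = coefs[i] < 0.0f ? -coefs[i] : coefs[i];
        float rec  = (float)(int)(a * Q + 0.5f) * IQ; /* quantize, dequantize */
        float diff = a - rec;
        err += diff * diff;
    }
    return err;
}
```

An AAC encoder would register its own function in place of sketch_uniform_quant_error at init time, and the model code would stay unchanged.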

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I count him braver who overcomes his desires than him who conquers his
enemies for the hardest victory is over self. -- Aristotle