[FFmpeg-devel] [RFC] AAC Encoder

Sun Aug 17 15:06:01 CEST 2008

On Sun, Aug 17, 2008 at 02:46:47PM +0300, Kostya wrote:
> On Sat, Aug 16, 2008 at 11:06:33PM +0200, Michael Niedermayer wrote:
> > On Sat, Aug 16, 2008 at 06:00:39PM +0300, Kostya wrote:
> [...]
> > > > [...]
> > > > > /**
> > > > >  * Quantize one coefficient.
> > > > >  * @return absolute value of the quantized coefficient
> > > > >  * @see 3GPP TS26.403 5.6.2 "Scalefactor determination"
> > > > >  */
> > > > > static av_always_inline int quant(float coef, const float Q)
> > > > > {
> > > > >     return av_clip((int)(pow(fabsf(coef) * Q, 0.75) + 0.4054), 0, 8191);
> > > > > }
> > > > 
> > > > converting float to int by casting is rather slow on x86
> > > > also i do not see why the clipping against 0 is done
> > > > 
> > > > and where does the 0.4054 come from? How has this value been selected?
> > > 
> > > ask 3GPP folks, in their spec (there's a reference in the comment above)
> > > it's also called MAGIC_NUMBER.
> > 
> > ideg
> > morons
> > anyway, it's
> > 1.0 - 0.5^0.75
> > 
> > and i seriously doubt this is optimal in the psychoacoustic sense or any
> > rate distortion sense.
> > It IS optimal in the "least squares distortion but i don't care about the bits"
> > sense
> > please add a note that this constant needs to be fine-tuned with listening
> > tests or some more solid math!
> > 
> > 
> > > 
> > > as for clipping, it seemed more logical than applying FFMIN()
> > 
> > speaking of clipping, can this even overflow 8191? and if so is it even
> > correct to clip?
> > most signals do not like being clipped randomly
>  
> values > 8191 can't be coded with AAC codebook

that's fine, but you cannot just clip them and pretend they weren't larger;
correct RD code would handle such things as a side effect ...

without RD the scalefactors would need to be adjusted to prevent clipping
(with RD code it wouldn't be needed because the scale factor would be chosen
so as to minimize distortion & rate)

[...]
> [...]
> > > > >         for(ch = 0; ch < chans; ch++){
> > > > >             prev_scale = -1;
> > > > >             for(w = 0; w < cpe->ch[ch].ics.num_windows; w++){
> > > > >                 for(g = 0; g < cpe->ch[ch].ics.num_swb; g++){
> > > > >                     g2 = w*16 + g;
> > > > 
> > > > >                     cpe->ch[ch].zeroes[w][g] = pch->band[ch][g2].thr >= pch->band[ch][g2].energy;
> > > > 
> > > > how much quality is lost compared to a full RD decision? it's just a matter of
> > > > checking how many bits this would need which is likely negligible speed wise.
> > > > (assuming you can unentangle the threshold check into a distortion
> > > >  computation)
> > >  
> > > well, energy < threshold means resulting band will be zero anyway,
> > > and without that check weird values for perceptual entropy start
> > > to appear
> > 
> > i am perfectly fine with making bands that would quantize to all zero
> > "zero bands" but this again isn't optimal because a band with just a single
> > +1 coeff almost certainly would do better as "zero band" as well.
> 
> that's a question too. But for that, threshold-manipulating tricks are used

please implement a proper RD based encoder!

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I do not agree with what you have to say, but I'll defend to the death your
right to say it. -- Voltaire