[FFmpeg-devel] [RFC] AAC Encoder, now more optimal

Kostya kostya.shishkov
Sat Sep 6 19:28:48 CEST 2008


On Sat, Sep 06, 2008 at 06:53:59PM +0200, Michael Niedermayer wrote:
> On Sat, Sep 06, 2008 at 04:27:21PM +0300, Kostya wrote:
[...]
> > The issues mentioned above disallow testing. But to my ear it was better,
> > especially on transitions.
> > There are 2-3 things I have to deal with before it is suitable for SVN:
> > * M/S detection - but how to incorporate it? Should it be performed during
> > the quantizer search or after it, and how?
> > * Speed optimization
> > * Other tricks (pulse tool, TNS) - less important though
> 
> IMO inclusion in SVN requires producing equal or better quality / bitrate
> than the encoder from that paper, and better than at least one common encoder
> like faac. (Reaching the paper's results should be trivial by just implementing
> what the paper describes; any deviation from it has to be better, not worse,
> in quality.)
> The paper contains some graphs that compare it against the reference encoder
> and it should be possible to similarly generate such graphs for your encoder.
> This is a good check to ensure that things are correctly implemented.
> 
> I also think we should apply much stricter tests in the future for SOC project
> decoders, that is, a PSNR/RMS difference from the binary decoder, but ideally
> bit-identical output, to ensure that no bugs that are very hard to debug later
> sneak in.
> 
> 
> > 
> > And about quantizers search method (I document it here in hope it would
> > be easier to understand and discuss it):
> > 
> > * the code iterates over band groups (bands with the same number in different
> > windows of window group) for all window groups since they are quantized
> > with the same quantizer
> > * for each band group all quantizers are tried (actually I determine the
> > quantizers for which quantizing makes sense - i.e. outside them distortion
> > and number of bits needed to code are the same as on the boundary - and
> > search only in that range) to find out distortion and number of bits
> 
> > ** quantization and bit estimation are not optimal, since optimal ones would
> > slow down encoding even more
> 
> > ** distortion = sum of squared quantizing errors
> 
> yes, but as quantization is approximate so is that
> 
> 
> > * then the cost function is calculated:
> >   C_{q1,q2} = SUM_{w} (quanterror_w / threshold_w * lambda + bits_w) + TC(q1,q2)
> > where quanterror - sum of squared quantisation errors for band in window w,
> >       threshold  - band threshold (provided by psychoacoustic model)
> >       lambda     - rate control parameter
> >       bits       - number of bits needed to encode that quantized band
> >       TC(a,b)    - number of bits needed to encode scalefactor difference (q2-q1)
> > 
> > and the path with the minimal total cost is calculated.
> > 
> > I use several tricks to reduce computations for zero bands and to ensure
> > final quantizers will not differ by more than 60.
> > 
> > The most problematic steps are quantization and (less so) inverse quantization.
> > By replacing the inverse quantization process (x*cbrt(x)*IQ) with a table lookup
> > (with size 8192*256, so not for the final encoder), I've managed to reduce
> > coding time from 72 seconds to a mere 59 seconds. Unfortunately, it's not easy
> > to speed up quantizing.
> > 
> > But there's an idea: represent coefficients in the 'AAC domain', i.e. raise
> > them to the power 3/4 and represent the result as A * 2^(B/4) with integers,
> > so it will be easier to quantize. Do you think it's worth trying?
> 
> You have a table of vector quantizers, quantization is finding the one
> with the lowest RD, as the table contains the unquantized vectors
> as well i have difficulty mapping your problems onto it.

well, %s/quantization/scaling/g
indeed, the problem is to represent coefficients with vectors scaled by some
scalefactors and get minimum distortion.
My main problem is that optimal search takes too much time. And here
the tradeoffs begin.
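
For discussion, the search described in the quoted text can be sketched
roughly as below. This is only an illustration under assumptions, not the
encoder's code: one scalefactor is chosen per band (window groups are left out),
the bit estimate and the TC() function are hypothetical stand-ins for the real
codebook and scalefactor Huffman tables, the limit of 60 is read as a limit on
the difference between neighbouring scalefactors, and all function names are
made up. Only the quantizer shape (power 3/4, gain 2^((sf - 100) / 4), rounding
offset 0.4054) follows the usual AAC formulas.

```python
import math

SF_OFFSET = 100  # scalefactor offset in the AAC gain 2^((sf - 100) / 4)


def quantize(x, sf):
    """AAC-style quantizer: (|x| / gain)^(3/4) with the usual 0.4054 rounding offset."""
    gain = 2.0 ** ((sf - SF_OFFSET) / 4.0)
    return int((abs(x) / gain) ** 0.75 + 0.4054)


def dequantize(q, sf):
    """Inverse quantizer: q^(4/3) * gain (magnitude only)."""
    gain = 2.0 ** ((sf - SF_OFFSET) / 4.0)
    return (q ** (4.0 / 3.0)) * gain


def band_cost(coefs, threshold, sf, lam):
    """quanterror / threshold * lambda + bits for one band at scalefactor sf."""
    err = 0.0
    bits = 0.0
    for x in coefs:
        q = quantize(x, sf)
        err += (abs(x) - dequantize(q, sf)) ** 2
        # Hypothetical bit estimate; a real encoder would consult the codebooks.
        bits += 2.0 * math.log2(1 + q) + 1.0
    return err / threshold * lam + bits


def transition_bits(q1, q2):
    """Hypothetical stand-in for TC(q1, q2), the cost of coding q2 - q1."""
    return 1.0 + 2.0 * abs(q2 - q1)


def search_scalefactors(bands, thresholds, candidates, lam, max_delta=60):
    """Viterbi-style search for the scalefactor path with minimal total cost."""
    # cost[j]: best total cost of any path ending at candidates[j] in the current band
    cost = [band_cost(bands[0], thresholds[0], sf, lam) for sf in candidates]
    back = []  # back[k][i]: best predecessor index for candidates[i] in band k + 1
    for coefs, thr in zip(bands[1:], thresholds[1:]):
        new_cost, prev = [], []
        for sf in candidates:
            best_j, best = 0, math.inf
            for j, psf in enumerate(candidates):
                if abs(sf - psf) > max_delta:
                    continue  # neighbouring scalefactors too far apart
                c = cost[j] + transition_bits(psf, sf)
                if c < best:
                    best_j, best = j, c
            new_cost.append(best + band_cost(coefs, thr, sf, lam))
            prev.append(best_j)
        cost = new_cost
        back.append(prev)
    # Trace the cheapest path back from the last band to the first.
    j = min(range(len(candidates)), key=cost.__getitem__)
    total = cost[j]
    path = [candidates[j]]
    for prev in reversed(back):
        j = prev[j]
        path.append(candidates[j])
    path.reverse()
    return path, total
```

The work is (number of bands) * (number of candidates)^2 transition checks plus
one quantization pass per band and candidate, which is where the time goes;
narrowing the candidate range per band, as described in the quoted text, cuts
both terms.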
 
> And i honestly have no interest in optimizing an approximation for which
> we neither know how much speed it gains nor how much quality it loses.
> Or has the design you use here been compared in some paper against the
> optimal one?
> 
> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> Many that live deserve death. And some that die deserve life. Can you give
> it to them? Then do not be too eager to deal out death in judgement. For
> even the very wise cannot see all ends. -- Gandalf



