[FFmpeg-devel] [RFC] AAC encoder optimizations

Kostya kostya.shishkov
Sun Sep 14 16:09:56 CEST 2008


On Sun, Sep 14, 2008 at 02:34:09PM +0200, Michael Niedermayer wrote:
> On Sun, Sep 14, 2008 at 08:34:11AM +0300, Kostya wrote:
> > On Sun, Sep 14, 2008 at 02:45:02AM +0200, Michael Niedermayer wrote:
> > > On Sat, Sep 13, 2008 at 10:29:27AM +0300, Kostya wrote:
> > > > Here's the next experimental version.
> > > > I've replaced quantization and codebook determining functions
> > > > and reworked scalefactor search code.
> > > > Now it's ~30 time slower than realtime on Core2.
> > > 
> > > what about quality? 
> > > Does it beat faac?
> > > Does it reach the quality from that paper?
> >  
> > Actually its quality IMO is worse than before.
> 
> So if your encoder is worse than it was before, then question is why and
> how could it happen?
> I mean i would have expected you to test it after each change and not
> add changes that worsen quality.

It was cardinally changed since the last time where no RD was taken into
account and since it was tested mostly at work with bad headphones,
listening to the same encoded sample at home with good headphones revealed
the difference.

> When writing an encoder a significant part of the time should be spend
> testing it.
> Without carefull tests one out of 3 time a developer changes something a
> new bug is added that would have been caught by testing. Finding bugs
> through review and comparission to another encoder is alot harder.
> And in parts the reviewer does not understand well like psychoacoustics
> in my case bugs would never be found.

To put it mildly, psychoacoustics is a black magic of assigning importance
to some spectrum lines and quantizing them according to it. Depending
on shaman results may differ.
 
> >  
> > > > 
> > > > Please comment on those functions and give your suggestions.
> > > 
> > > Your code looks cleaner but i suspect its untested and more buggy
> > > than before. I hope iam wrong, and appologize for the flaming comment if
> > > i am, but if not then we arent moving any closer to svn, rather we move
> > > in circles.
> > 
> > That's true. Except I try to run it on short samples and test that it
> > actually works
> > 
> 
> > > Its not funny to spend time thinking about and suggesting some optimizations
> > > to correct code to see it replaced by totally different and wrong code.
> > > So my main comment is
> > > * total lack of communication, i really would appreciate to know what
> > >   you are planing to change, how, why, if it has been tested, and what
> > >   the results of the tests where.
> > 
> > Since I can work on it mostly on spare time during work
> 
> i understand that
> 
> 
> > I would be grateful
> > if you suggest some tests for quality.
> 
> First id like to suggest that you test functions that are changed
> individually, that is if something is supposed to be identical, so
> should its output, printf() and the diff tool or md5sum of the
> encoded output should be used whenever a change should not affect
> the output.
> 
> As next step when output does and should change, simply using
> tiny_psnr is a good idea too, not to find quality improvments as
> plain PSNR is not good here but to catch unexpected worsening of quality.
> 
> As third, you have a psychoacoustic model that provides some kind of
> psychoacoustic weights, it should not be hard to calculate wheighted
> distortion from that. and always print it so unexpected changes are
> caught
> 
> Fourth, as you know both the number of bits, and the weighted distortion
> as well as lambda, the encoder can easily calculate the per file
> rate distortion and print that as well, this should be done with code seperate
> from the actual encoding not just using the rd found by the optimizations.

I will stick to distortion-centric model for now since I don't have
a stable reference point.

> > Oh, and headphones are one of the
> > cheapest there.
> 
> Once you cant find a difference anymore, it might be a good idea that someone
> would donates some more expensive headphones to you.

Gabriel has done that already, but I don't dare to take them to work
and encoding at home is a waaay slower. 
 
> > 
> > For now I'm trying to produce slow yet optimal encoder that can be referenced
> > in the future in production of faster but worse encoder to control
> > tradeoffs.
> >  
> > > [...]
> > > > /**
> > > >  * Calculate rate distortion cost for quantizing with given codebook
> > > >  * 
> > > >  * @return quantization distortion
> > > >  */
> > > > static float quantize_band_cost(const float *in, int size, int scale_idx, int cb,
> > > >                                  const float lambda, const float uplim)
> > > > {
> > > >     const float Q = ff_aac_pow2sf_tab[200 + scale_idx - SCALE_ONE_POS + SCALE_DIV_512];
> > > >     int i, j, k;
> > > >     float cost = 0;
> > > >     const int dim = cb < FIRST_PAIR_BT ? 4 : 2;
> > > >     
> > > > START_TIMER
> > > >     if(!cb){
> > > >         for(i = 0; i < size; i++)
> > > >             cost += in[i]*in[i]*lambda;
> > > >         return cost;
> > > >     }
> > > >     for(i = 0; i < size; i += dim){
> > > >         float mincost = INFINITY;
> > > >         int minidx = 0;
> > > >         int minbits = 0;
> > > >         const float *vec = ff_aac_codebook_vectors[cb-1];
> > > >         for(j = 0; j < ff_aac_spectral_sizes[cb-1]; j++, vec += dim){
> > > >             float rd = 0.0f;
> > > >             int curbits = ff_aac_spectral_bits[cb-1][minidx];
> > > >             if(IS_CODEBOOK_UNSIGNED(cb)){
> > > >                 for(k = 0; k < dim; k++){
> > > >                     float t = fabsf(in[i+k]);
> > > >                     float di;
> > > >                     //do not code with escape sequence small values
> > > >                     if(vec[k] == 64.0f && t < 39.0f*Q){
> > > >                         rd = INFINITY;
> > > >                         break;
> > > >                     }
> > > >                     if(vec[k] == 64.0f){//FIXME: slow
> > > >                         if(t >= 165140.0f*Q){ // clipped value
> > > >                             di = t - 165140.0f;
> > > >                             curbits += 21;
> > > >                         }else{
> > > >                             int c = quant(t, 1.0/Q);
> > > >                             di = t - c*cbrt(c)*Q;
> > > >                             curbits += av_log2(c)*2 - 4 + 1;
> > > >                         }
> > > >                     }else{
> > > >                         di = t - vec[k]*Q;
> > > >                     }
> > > >                     if(vec[k] != 0.0f)
> > > >                         curbits++;
>                               ^^^^^^^^^^
> > > >                     rd += di*di*lambda;
> > > >                 }
> > > >             }else{
> > > >                 for(k = 0; k < dim; k++){
> > > >                     float di = in[i+k] - vec[k]*Q;
> > > >                     rd += di*di*lambda;
> > > >                 }            
> > > >             }
> > > >             rd += curbits;
>                   ^^^^^^^^^^^^^^
> > > >             if(rd < mincost){
> > > >                 mincost = rd;
>                       ^^^^^^^^^^^^^
> > > >                 minidx = j;
> > > >                 minbits = curbits;
> > > >             }
> > > >         }
> > > >         cost += mincost;
>               ^^^^^^^^^^^^^^^^
> > > >         if(cost >= uplim)
> > > >             return uplim;
> > > 
> > > >         minbits = 0;
> > > >         if(IS_CODEBOOK_UNSIGNED(cb))
> > > >             for(j = 0; j < dim; j++)
> > > >                 if(ff_aac_codebook_vectors[cb-1][minidx*2+j] != 0.0f)
> > > >                     minbits++;
> > > >         cost += minbits;
> > > 
> > > what is this doing? This looks strange
> > > cost already includes the bits.
> > 
> > those are sign bits 
> 
> I know, and look above you already added them so they would be
> there twice now.

oops 
 
> [...]
> > > > 
> > > > /**
> > > >  * structure used in optimal codebook search
> > > >  */
> > > > typedef struct BandCodingPath {
> > > >     int prev_idx; ///< pointer to the previous path point
> > > >     int codebook; ///< codebook for coding band run
> > > >     float cost;   ///< path cost
> > > >     int run;
> > > > } BandCodingPath;
> > > > 
> > > > /**
> > > >  * Encode band info for single window group bands.
> > > >  */
> > > > static void encode_window_bands_info(AACEncContext *s, SingleChannelElement *sce,
> > > >                                      int win, int group_len, const float lambda)
> > > > {
> > > >     BandCodingPath path[120][12];
> > > >     int w, swb, cb, start, start2, size;
> > > >     int i, j;
> > > >     const int max_sfb = sce->ics.max_sfb;
> > > >     const int run_bits = sce->ics.num_windows == 1 ? 5 : 3;
> > > >     const int run_esc = (1 << run_bits) - 1;
> > > >     int idx, ppos, count;
> > > >     int stackrun[120], stackcb[120], stack_len;
> > > > 
> > > >     start = win*128;
> > > >     for(cb = 0; cb < 12; cb++){
> > > >         path[0][cb].cost = 0.0f;
> > > >         path[0][cb].prev_idx = -1;
> > > >         path[0][cb].run = 0;
> > > >     }
> > > >     for(swb = 0; swb < max_sfb; swb++){
> > > >         start2 = start;
> > > >         size = sce->ics.swb_sizes[swb];
> > > >         if(sce->zeroes[win*16 + swb]){
> > > >             for(cb = 0; cb < 12; cb++){
> > > >                 path[swb+1][cb].prev_idx = cb;
> > > >                 path[swb+1][cb].cost = path[swb][cb].cost;
> > > >                 path[swb+1][cb].run = path[swb][cb].run + 1;
> > > >             }
> > > >         }else{
> > > >             float minrd = INFINITY;
> > > >             int mincb = 0;
> > > >             for(cb = 0; cb < 12; cb++){
> > > >                 float rd = 0.0f;
> > > >                 for(w = 0; w < group_len; w++){
> > > >                     rd += quantize_band_cost(sce->coeffs + start + w*128, size, sce->sf_idx[(win+w)*16+swb], cb, lambda, minrd);
> > > >                 }
> > > >                 path[swb+1][cb].prev_idx = cb;
> > > >                 path[swb+1][cb].cost = path[swb][cb].cost + rd;
> > > >                 path[swb+1][cb].run = path[swb][cb].run + 1;
> > > >                 if(rd < minrd){
> > > >                     minrd = rd;
> > > >                     mincb = cb;
> > > >                 }
> > > >             }
> > > >             for(cb = 0; cb < 12; cb++){
> > > >                 float cost = path[swb][cb].cost + minrd + run_bits + 4;
> > > >                 if(cost < path[swb+1][cb].cost){
> > > >                     path[swb+1][cb].prev_idx = mincb;
> > > >                     path[swb+1][cb].cost = cost;
> > > >                     path[swb+1][cb].run = 1;
> > > >                 }
> > > >             }
> > > >         }
> > > >         start += sce->ics.swb_sizes[swb];
> > > >     }
> > > 
> > > Have you tested this code against the previous cb viterbi search?
> > > do they both find the global optimum? Iam asking because this code looks
> > > wrong in several ways, but i may be missing something. Also the paper looked
> > > wrong from what i remember though not in that many ways, the algo described
> > > there for choosing cbs did seem to ignore the run escapes. This may or may not
> > > be significant, but requires tests to understand what effect ignoring
> > > run escapes has, the previous code and what i suggested did not ignore them.
> > 
> > Well, run escape can happen only for run lengths >= 31 (with 51 max)
> > for long window and >=7 of 15 bands for short windows. I will add check for
> > that and recheck code.
> 
> thanks, but note i do not think the run esc is the only bug in this code.
> the non run case and where the min is added looks quite odd.

In a nutshell, at each point it just calculates the cost of continuing the
run sequence and then tests if the switch to the codebook with minimum cost
results in less rate distortion. 

> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> I am the wisest man alive, for I know one thing, and that is that I know
> nothing. -- Socrates




More information about the ffmpeg-devel mailing list