[FFmpeg-devel] [RFC] AAC encoder optimizations

Sun Sep 14 14:34:09 CEST 2008

On Sun, Sep 14, 2008 at 08:34:11AM +0300, Kostya wrote:
> On Sun, Sep 14, 2008 at 02:45:02AM +0200, Michael Niedermayer wrote:
> > On Sat, Sep 13, 2008 at 10:29:27AM +0300, Kostya wrote:
> > > Here's the next experimental version.
> > > I've replaced quantization and codebook determining functions
> > > and reworked scalefactor search code.
> > > Now it's ~30 time slower than realtime on Core2.
> > 
> > what about quality? 
> > Does it beat faac?
> > Does it reach the quality from that paper?
>  
> Actually its quality IMO is worse than before.

So if your encoder is worse than it was before, then question is why and
how could it happen?
I mean i would have expected you to test it after each change and not
add changes that worsen quality.
When writing an encoder a significant part of the time should be spend
testing it.
Without carefull tests one out of 3 time a developer changes something a
new bug is added that would have been caught by testing. Finding bugs
through review and comparission to another encoder is alot harder.
And in parts the reviewer does not understand well like psychoacoustics
in my case bugs would never be found.

>  
> > > 
> > > Please comment on those functions and give your suggestions.
> > 
> > Your code looks cleaner but i suspect its untested and more buggy
> > than before. I hope iam wrong, and appologize for the flaming comment if
> > i am, but if not then we arent moving any closer to svn, rather we move
> > in circles.
> 
> That's true. Except I try to run it on short samples and test that it
> actually works
> 

> > Its not funny to spend time thinking about and suggesting some optimizations
> > to correct code to see it replaced by totally different and wrong code.
> > So my main comment is
> > * total lack of communication, i really would appreciate to know what
> >   you are planing to change, how, why, if it has been tested, and what
> >   the results of the tests where.
> 
> Since I can work on it mostly on spare time during work

i understand that

> I would be grateful
> if you suggest some tests for quality.

First id like to suggest that you test functions that are changed
individually, that is if something is supposed to be identical, so
should its output, printf() and the diff tool or md5sum of the
encoded output should be used whenever a change should not affect
the output.

As next step when output does and should change, simply using
tiny_psnr is a good idea too, not to find quality improvments as
plain PSNR is not good here but to catch unexpected worsening of quality.

As third, you have a psychoacoustic model that provides some kind of
psychoacoustic weights, it should not be hard to calculate wheighted
distortion from that. and always print it so unexpected changes are
caught

Fourth, as you know both the number of bits, and the weighted distortion
as well as lambda, the encoder can easily calculate the per file
rate distortion and print that as well, this should be done with code seperate
from the actual encoding not just using the rd found by the optimizations.

> Oh, and headphones are one of the
> cheapest there.

Once you cant find a difference anymore, it might be a good idea that someone
would donates some more expensive headphones to you.

> 
> For now I'm trying to produce slow yet optimal encoder that can be referenced
> in the future in production of faster but worse encoder to control
> tradeoffs.
>  
> > [...]
> > > /**
> > >  * Calculate rate distortion cost for quantizing with given codebook
> > >  * 
> > >  * @return quantization distortion
> > >  */
> > > static float quantize_band_cost(const float *in, int size, int scale_idx, int cb,
> > >                                  const float lambda, const float uplim)
> > > {
> > >     const float Q = ff_aac_pow2sf_tab[200 + scale_idx - SCALE_ONE_POS + SCALE_DIV_512];
> > >     int i, j, k;
> > >     float cost = 0;
> > >     const int dim = cb < FIRST_PAIR_BT ? 4 : 2;
> > >     
> > > START_TIMER
> > >     if(!cb){
> > >         for(i = 0; i < size; i++)
> > >             cost += in[i]*in[i]*lambda;
> > >         return cost;
> > >     }
> > >     for(i = 0; i < size; i += dim){
> > >         float mincost = INFINITY;
> > >         int minidx = 0;
> > >         int minbits = 0;
> > >         const float *vec = ff_aac_codebook_vectors[cb-1];
> > >         for(j = 0; j < ff_aac_spectral_sizes[cb-1]; j++, vec += dim){
> > >             float rd = 0.0f;
> > >             int curbits = ff_aac_spectral_bits[cb-1][minidx];
> > >             if(IS_CODEBOOK_UNSIGNED(cb)){
> > >                 for(k = 0; k < dim; k++){
> > >                     float t = fabsf(in[i+k]);
> > >                     float di;
> > >                     //do not code with escape sequence small values
> > >                     if(vec[k] == 64.0f && t < 39.0f*Q){
> > >                         rd = INFINITY;
> > >                         break;
> > >                     }
> > >                     if(vec[k] == 64.0f){//FIXME: slow
> > >                         if(t >= 165140.0f*Q){ // clipped value
> > >                             di = t - 165140.0f;
> > >                             curbits += 21;
> > >                         }else{
> > >                             int c = quant(t, 1.0/Q);
> > >                             di = t - c*cbrt(c)*Q;
> > >                             curbits += av_log2(c)*2 - 4 + 1;
> > >                         }
> > >                     }else{
> > >                         di = t - vec[k]*Q;
> > >                     }
> > >                     if(vec[k] != 0.0f)
> > >                         curbits++;
                              ^^^^^^^^^^
> > >                     rd += di*di*lambda;
> > >                 }
> > >             }else{
> > >                 for(k = 0; k < dim; k++){
> > >                     float di = in[i+k] - vec[k]*Q;
> > >                     rd += di*di*lambda;
> > >                 }            
> > >             }
> > >             rd += curbits;
                  ^^^^^^^^^^^^^^
> > >             if(rd < mincost){
> > >                 mincost = rd;
                      ^^^^^^^^^^^^^
> > >                 minidx = j;
> > >                 minbits = curbits;
> > >             }
> > >         }
> > >         cost += mincost;
              ^^^^^^^^^^^^^^^^
> > >         if(cost >= uplim)
> > >             return uplim;
> > 
> > >         minbits = 0;
> > >         if(IS_CODEBOOK_UNSIGNED(cb))
> > >             for(j = 0; j < dim; j++)
> > >                 if(ff_aac_codebook_vectors[cb-1][minidx*2+j] != 0.0f)
> > >                     minbits++;
> > >         cost += minbits;
> > 
> > what is this doing? This looks strange
> > cost already includes the bits.
> 
> those are sign bits 

I know, and look above you already added them so they would be
there twice now.

[...]
> > > 
> > > /**
> > >  * structure used in optimal codebook search
> > >  */
> > > typedef struct BandCodingPath {
> > >     int prev_idx; ///< pointer to the previous path point
> > >     int codebook; ///< codebook for coding band run
> > >     float cost;   ///< path cost
> > >     int run;
> > > } BandCodingPath;
> > > 
> > > /**
> > >  * Encode band info for single window group bands.
> > >  */
> > > static void encode_window_bands_info(AACEncContext *s, SingleChannelElement *sce,
> > >                                      int win, int group_len, const float lambda)
> > > {
> > >     BandCodingPath path[120][12];
> > >     int w, swb, cb, start, start2, size;
> > >     int i, j;
> > >     const int max_sfb = sce->ics.max_sfb;
> > >     const int run_bits = sce->ics.num_windows == 1 ? 5 : 3;
> > >     const int run_esc = (1 << run_bits) - 1;
> > >     int idx, ppos, count;
> > >     int stackrun[120], stackcb[120], stack_len;
> > > 
> > >     start = win*128;
> > >     for(cb = 0; cb < 12; cb++){
> > >         path[0][cb].cost = 0.0f;
> > >         path[0][cb].prev_idx = -1;
> > >         path[0][cb].run = 0;
> > >     }
> > >     for(swb = 0; swb < max_sfb; swb++){
> > >         start2 = start;
> > >         size = sce->ics.swb_sizes[swb];
> > >         if(sce->zeroes[win*16 + swb]){
> > >             for(cb = 0; cb < 12; cb++){
> > >                 path[swb+1][cb].prev_idx = cb;
> > >                 path[swb+1][cb].cost = path[swb][cb].cost;
> > >                 path[swb+1][cb].run = path[swb][cb].run + 1;
> > >             }
> > >         }else{
> > >             float minrd = INFINITY;
> > >             int mincb = 0;
> > >             for(cb = 0; cb < 12; cb++){
> > >                 float rd = 0.0f;
> > >                 for(w = 0; w < group_len; w++){
> > >                     rd += quantize_band_cost(sce->coeffs + start + w*128, size, sce->sf_idx[(win+w)*16+swb], cb, lambda, minrd);
> > >                 }
> > >                 path[swb+1][cb].prev_idx = cb;
> > >                 path[swb+1][cb].cost = path[swb][cb].cost + rd;
> > >                 path[swb+1][cb].run = path[swb][cb].run + 1;
> > >                 if(rd < minrd){
> > >                     minrd = rd;
> > >                     mincb = cb;
> > >                 }
> > >             }
> > >             for(cb = 0; cb < 12; cb++){
> > >                 float cost = path[swb][cb].cost + minrd + run_bits + 4;
> > >                 if(cost < path[swb+1][cb].cost){
> > >                     path[swb+1][cb].prev_idx = mincb;
> > >                     path[swb+1][cb].cost = cost;
> > >                     path[swb+1][cb].run = 1;
> > >                 }
> > >             }
> > >         }
> > >         start += sce->ics.swb_sizes[swb];
> > >     }
> > 
> > Have you tested this code against the previous cb viterbi search?
> > do they both find the global optimum? Iam asking because this code looks
> > wrong in several ways, but i may be missing something. Also the paper looked
> > wrong from what i remember though not in that many ways, the algo described
> > there for choosing cbs did seem to ignore the run escapes. This may or may not
> > be significant, but requires tests to understand what effect ignoring
> > run escapes has, the previous code and what i suggested did not ignore them.
> 
> Well, run escape can happen only for run lengths >= 31 (with 51 max)
> for long window and >=7 of 15 bands for short windows. I will add check for
> that and recheck code.

thanks, but note i do not think the run esc is the only bug in this code.
the non run case and where the min is added looks quite odd.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080914/d4a360e4/attachment.pgp>