[FFmpeg-devel] [RFC] AAC Encoder, now more optimal

Kostya kostya.shishkov
Sat Sep 6 17:36:16 CEST 2008


On Sat, Sep 06, 2008 at 03:13:54AM +0200, Michael Niedermayer wrote:
> On Fri, Sep 05, 2008 at 04:13:58PM +0300, Kostya wrote:
[...]
> ok, so only optimization related comments
> 
> [...]
> 
> > /**
> >  * Encode band info for single window group bands.
> >  */
> > static void encode_window_bands_info(AACEncContext *s, SingleChannelElement *sce,
> >                                      int win, int group_len)
> > {
> >     BandCodingPath path[64];
> >     int band_bits[64][12];
> >     int w, swb, cb, start, start2, size;
> >     int i, j;
> >     const int max_sfb = sce->ics.max_sfb;
> >     const int run_bits = sce->ics.num_windows == 1 ? 5 : 3;
> >     const int run_esc = (1 << run_bits) - 1;
> >     int bits, sbits, idx, count;
> >     int stack[64], stack_len;
> > 
> >     start = win*128;
> >     for(swb = 0; swb < max_sfb; swb++){
> >         int maxval = 0;
> >         start2 = start;
> >         size = sce->ics.swb_sizes[swb];
> >         if(sce->zeroes[win*16 + swb]){
> >             maxval = 0;
> >         }else{
> >             for(w = 0; w < group_len; w++){
> >                 for(i = start2; i < start2 + size; i++){
> >                     maxval = FFMAX(maxval, FFABS(sce->icoefs[i]));
> >                 }
> >                 start2 += 128;
> >             }
> >         }
> >         sbits = calculate_band_sign_bits(s, sce, group_len, start, size);
> >         for(cb = 0; cb < 12; cb++){
> >             if(aac_cb_info[cb].maxval < maxval){
> >                 band_bits[swb][cb] = INT_MAX;
> >             }else{
> >                 band_bits[swb][cb] = calculate_band_bits(s, sce, group_len, start, size, cb);
> >                 if(IS_CODEBOOK_UNSIGNED(cb-1)){
> >                     band_bits[swb][cb] += sbits;
> >                 }
> >             }
> >         }
> >         start += sce->ics.swb_sizes[swb];
> >     }
> >     path[0].bits = 0;
> >     for(i = 1; i <= max_sfb; i++)
> >         path[i].bits = INT_MAX;
> 
> >     for(i = 0; i < max_sfb; i++){
> >         for(cb = 0; cb < 12; cb++){
> >             int sum = 0;
> >             for(j = 1; j <= max_sfb - i; j++){
> >                 if(band_bits[i+j-1][cb] == INT_MAX)
> >                     break;
> >                 sum += band_bits[i+j-1][cb];
> >                 bits = sum + path[i].bits + run_value_bits[sce->ics.num_windows == 8][j];
> >                 if(bits < path[i+j].bits){
> >                     path[i+j].bits     = bits;
> >                     path[i+j].codebook = cb;
> >                     path[i+j].prev_idx = i;
> >                 }
> >             }
> >         }
> >     }
> 
> below should be faster, but _very_ important, check that it still finds the
> global optimum, it should. That said, always check all optimizations you do
> to ensure they do not change anything unexpected. debuging such things later
> can be very time consuming.
> 
> for(i = 0; i < max_sfb; i++){
>     for(cb = 0; cb < 12; cb++){
>         int r= last[cb].run;
>         if(run_value_bits[r] == run_value_bits[r+1]){
>             last[cb].bits += band_bits[i][cb];
>             last[cb].run++;
>             bits = band_bits[i][cb] + path[i-1].bits + run_value_bits[1];
>             if(bits < last[cb].bits){
>                 last[cb].bits= bits;
>                 last[cb].run= run;
>             }
>         }else{
>             int sum = 0;
>             last[cb].bits = INT_MAX;
>             for(run = 1; run<=i+1; run++){
>                 sum += band_bits[i-run+1][cb];
>                 bits = sum + path[i-run].bits + run_value_bits[run];
>                 if(bits < last[cb].bits){
>                     last[cb].bits= bits;
>                     last[cb].run= run;
>                 }
>             }
>         }
>         if(last[cb].bits < path[i].bits){
>             path[i].bits= last[cb].bits;
>             path[i].codebook= cb;
>             path[i].run= last[cb].run;
>         }
>     }
> }
> 
> also this should be working with rate distortion values not just bits.

IMO it should encode losslessly at this stage, so no rate distortion.

While this seems a bit faster it's significantly worse - on my test sample it's
222809 bytes against 209792 with current code. 
 
> [...]
> > typedef struct TrellisPath {
> >     float cost;
> >     int prev;
> >     int min_val;
> >     int max_val;
> > } TrellisPath;
> > 
> 
> > static void search_for_quantizers(AACEncContext *s, ChannelElement *cpe, int channels)
> > {
> >     int q, ch, w, w2, g, start = 0;
> >     int i;
> >     int qcoeffs[128];
> >     int idx;
> >     TrellisPath paths[256*121];
> >     int bandaddr[121];
> >     const float lambda = 5e-7f;
> >     int minq;
> >     float mincost;
> > 
> 
> >     for(ch = 0; ch < channels; ch++){
> 
> isnt that channels loop prettier outside the function?

I was intending to embed M/S coding decision hewe in some way.
Any tips on that?
 
> the rest of this function looks pretty messy and IMHO needs some cleanup
> before it can be optimized, optimiztaions will add more mess and that would
> not be good for a unction that should be about 1/4 the size it is.
> one obvious speed issue is running cbrt() and pow() in the very innermost
> loop though.

Since it's partially optimized already...

> Also as it is based on approximate quantization errors and bits we cannot
> try heuristics, as we have no way to test what effect the heuristics have
> thats because we lack a optimal but slow reference.

>From my debugging I can say that while bits needed to code band are
slowly decreasing in monotone way, distortion tends to vary significantly
inside quantizer groups of four (i.e. 2^0, 2^1/4, 2^2/4 and 2^3/4),
i.e. it may be minimal for quantizers X mod 4 == 3, so it's a bit hard to predict.

> a very simple optimization though that could be tried is to simple pick the
> so far best node and only consider it and all nodes that are less than 19
> bits worse, because the max scalefactor diff VLC has 19 bits the others
> cant be on the optimal path.
> 
> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> I have never wished to cater to the crowd; for what I know they do not
> approve, and what they approve I do not know. -- Epicurus




More information about the ffmpeg-devel mailing list