[FFmpeg-devel] [RFC] AAC Encoder

Michael Niedermayer michaelni
Tue Aug 12 14:14:20 CEST 2008


On Tue, Aug 12, 2008 at 08:33:15AM +0300, Kostya wrote:
> On Mon, Aug 11, 2008 at 08:03:49PM +0200, Michael Niedermayer wrote:
> > On Mon, Aug 11, 2008 at 12:18:03PM +0300, Kostya wrote:
> > > Here's my implementation of AAC encoder.
> > > 
> > > Since AAC decoder is not fully in SVN yet, I'm omitting data structures
> > > and declarations used from it, I'll sort it after decoder commit.
> > > The main difference is that psy model fills data structures and encoder
> > > writes bitstream with them, so several structures have additional members.
> > > 
> > > I've also stripped out model based on 3GPP 26.403 to make review easier.
> > 
[...]
> [...]
> > > +            }while(w < cpe->ch[channel].ics.num_windows && cpe->ch[channel].ics.group_len[w]);
> > > +        }
> > > +        if(score < best){
> > > +            best = score;
> > > +            bestcb = cb;
> > > +        }
> > > +    }
> > > +    return bestcb;
> > > +}
> > 
> > This code does not seem to consider the distortion
> > please use rate distortion not just the number of bits!
>  
> Umm, why?
> By this point psy model produced coefficients to encode and encoder
> tries to write them in a minimum number of bits.
> It's lossless operation.

We have a problem here, because this isnt optimal
It seems we agree that each bit counts the same no matter what psy says.
Maybe a example will best show the problem
lets assume we have a coeff of 11.5, the psy model decides that a change
to 10 would be ok for the given audio quality/bitrate and thus outputs 10
let us assume that storing a coefficient of 10 and one of 11 both take
7 bit, the decission to store 10 clearly was bad. OTOH it could have
been that storing 11 requires twice as many bits in which case the
decission would have been good. One simply cannot quantize values optimally
without considering the number of bits they need. This is even more true
for vector quantization based codecs than it is for scalar quantization.
it may very well be that psy thinks that both {-1,1} and {-2,0} are an
equally good representation of the exact {-1.5,0.5} but its not until
the encoding that it becomes known which of the two need fewer bits.

Id say the psy model should return an array of perceptual weights W[i]
and the bitstream encode should choose the (global) minimum of
bits[i] + distortion(W[i], coeff[i]-stored[i])
where distortion is a appropriate function whos output matches how audible
a change is, this may be a simple W[i]*(coeff[i]-stored[i])^2 but iam no
psychoacoustic expert so there may be better choices.

And of course the suggested system above needs to be compared to what you
have currenty so that we can be sure it really does sound better.

[...]
> > > +        }
> > > +        if(count){
> > > +            while(count >= esc){
> > > +                put_bits(&s->pb, bits, esc);
> > > +                count -= esc;
> > > +            }
> > > +            put_bits(&s->pb, bits, count);
> > > +        }
> > > +    }
> > > +}
> > 
> > > +
> > > +/**
> > > + * Encode scalefactors.
> > > + */
> > > +static void encode_scale_factor_data(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel)
> > 
> > s/_data/s/
> > 
> > 
> > > +{
> > > +    int off = cpe->ch[channel].mixing_gain, diff;
> > > +    int i, w;
> > > +
> > > +    for(w = 0; w < cpe->ch[channel].ics.num_windows; w++){
> > > +        if(cpe->ch[channel].ics.group_len[w]) continue;
> > > +        for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
> > > +            if(!cpe->ch[channel].zeroes[w][i]){
> > > +                diff = cpe->ch[channel].sf_idx[w][i] - off + SCALE_DIFF_ZERO;
> > 
> > > +                if(diff < 0 || diff > 120) av_log(avctx, AV_LOG_ERROR, "Scalefactor difference is too big to be coded\n");
> > 
> > assert() ?
> 
> Is it recommended to be used here?  

i do not know, question is what you mean by this check?
is it that it cannot happen? -> assert()
is it that it would break the bitstream -> bug that should be fixed
is it that it is a inevitable shortcomming of the bitstream format
   so that there is no way for an encoder to encode some kinds of
   input -> it should probably be just WARNING and more clear

[...]
> [...]
> > > +static av_always_inline float lowpass_iir_filter(LPFilterState *s, const float *coeffs, float in)
> > > +{
> > > +    memmove(s->x, s->x + 1, sizeof(s->x) - sizeof(s->x[0]));
> > > +    memmove(s->y, s->y + 1, sizeof(s->y) - sizeof(s->y[0]));
> > 
> > no we will not do memmoves in av_always_inline functions
> 
> loop then 

hmpf, thats not what i meant.
write the filter so it does not need to move data around.
i think unrolling the loop by 4 or something would allow fixed indexes
to be used.


>  
> > > +    s->x[IIR_ORDER] = in * coeffs[1];
> > > +    //FIXME: made only for 4th order filter
> > > +    s->y[IIR_ORDER] = (s->x[0] + s->x[4])*1 + (s->x[1] + s->x[3])*4 + s->x[2]*6
> > > +                    + coeffs[2]*s->y[0] + coeffs[3]*s->y[1] + coeffs[4]*s->y[2] + coeffs[5]*s->y[3];
> > > +    return s->y[IIR_ORDER];
> > > +}
> > 
> > the lowpass code belongs in a seperate file and patch and should probably
> > be applied outside the codec. Unless of course it uses some feedback from the
> > psy model ...
> > besides lowpass, things should be run through a highpass filter to remove
> > the DC component, its inaudible and expensive to represent after MDCT transform
> 
> ok, I will try to prepare such patch
> and any pointers about how to remove DC? channel->coeffs[ch][0] = 0.0f seems enough to me 

^_^;;
no, channel->coeffs[ch][0] = 0.0f is not good.
MDCT is BAD at storing DC, just try it, it will smear a flat 1.0 over all
coefficients. coeff[0] should be the biggest but its not the only one.
iam no IIR filter expert so i cannot recommand any specific one. Also this
is not overly important, iam fine if you just add a FIXME so its not forgotten.
If you do want to work on this, well (citeseer, google, wikipedia and maybe a
quick grep in AC3/AAC specs for highpass would be places to find some filter)


>  
> > [...]
> > 
> > > +void ff_aac_psy_preprocess(AACPsyContext *ctx, int16_t *audio, int16_t *dest, int tag, int type)
> > > +{
> > > +    int chans = type == ID_CPE ? 2 : 1;
> > > +    const int chstride = ctx->avctx->channels;
> > > +    int i, ch;
> > > +    float t[2];
> > > +
> > 
> > > +    if(chans == 1 || (ctx->flags & PSY_MODEL_NO_PREPROC) == PSY_MODEL_NO_PREPROC){
> > > +        for(ch = 0; ch < chans; ch++){
> > > +            for(i = 0; i < 1024; i++){
> > > +                dest[i * chstride + ch] = audio[i * chstride + ch];
> > > +            }
> > > +        }
> > 
> > memcpy()
> > 
> > 
> > > +    }else{
> > > +        for(i = 0; i < 1024; i++){
> > > +            if(ctx->flags & PSY_MODEL_NO_ST_ATT){
> > > +                for(ch = 0; ch < 2; ch++)
> > > +                    t[ch] = audio[i * chstride + ch];
> > 
> > > +            }else{
> > > +                t[0] = audio[i * chstride + 0] * (0.5 + ctx->stereo_att) + audio[i * chstride + 1] * (0.5 - ctx->stereo_att);
> > > +                t[1] = audio[i * chstride + 0] * (0.5 - ctx->stereo_att) + audio[i * chstride + 1] * (0.5 + ctx->stereo_att);
> > > +            }
> > 
> > this stereo attenuation looks like it should be in a filter before the codec
> 
> logically it is, as it's applied to raw audio before it's used by encoder
> but I will gladly move that code to libavfilter when possible 

add a FIXME at least please,ideally at the top of the file so its easily found
needs to be done when other prerequesties are available.


[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The greatest way to live with honor in this world is to be what we pretend
to be. -- Socrates
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080812/9ba18cb4/attachment.pgp>



More information about the ffmpeg-devel mailing list