[FFmpeg-devel] [PATCH] AAC Decoder - Round 2.

Sun Jun 29 13:32:09 CEST 2008

On Sun, Jun 29, 2008 at 10:42:40AM +0100, Robert Swain wrote:
> 2008/6/27 Michael Niedermayer <michaelni at gmx.at>:
> > On Fri, Jun 27, 2008 at 03:35:06PM +0100, Robert Swain wrote:
> >> 2008/6/23 Michael Niedermayer <michaelni at gmx.at>:
> >> > On Mon, Jun 23, 2008 at 02:10:56PM +0100, Robert Swain wrote:
> >> >> 2008/6/20 Michael Niedermayer <michaelni at gmx.at>:
> >> >> > On Thu, Jun 19, 2008 at 04:22:57PM +0100, Robert Swain wrote:
> >> >> > [...]
> >> >> >> +
> >> >> >> +    for (g = 0; g < ics->num_window_groups; g++) {
> >> >> >> +        for (i = 0; i < ics->max_sfb; i++) {
> >> >> >> +            if (cb[g][i] == NOISE_HCB) {
> >> >> >> +                for (group = 0; group < ics->group_len[g]; group++) {
> >> >> >> +                    float energy = 0;
> >> >> >> +                    float scale = 1.;// / (float)(offsets[i+1] - offsets[i]);
> >> >> >> +                    for (k = offsets[i]; k < offsets[i+1]; k++)
> >> >> >> +                        energy += (float)icoef[group*128+k] * icoef[group*128+k];
> >> >> >> +                    scale *= sf[g][i] / sqrt(energy);
> >> >> >
> >> >> > are you sure that the random values have to be normalized like that?
> >> >> > I suspect energy is supposed tp be a constant.
> >> >>
> >> >> That's how it is in the spec. From section 4.6.13 Perceptual Noise
> >> >> Substitution (PNS):
> >> >
> >> > Ive checked the spec before my reply, and i belive your code is wrong.
> >> >
> >> >> The energy information for percpetual noise substitution decoding is
> >> >> represented by a "noise energy" value indicating the overall power of
> >> >> the substituted spectral coefficients in steps of 1.5 dB. If noise
> >> >> substitution coding is active for a particular group and scalefactor
> >> >> band, a noise energy value is transmitted instead of the scalefactor
> >> >> of the respective channel.
> >> >
> >> > Doesnt say that the output from the random number generator should be choped
> >> > up in bands and each independantly renormalized.
> >> >
> >> > Heres what the spec says:
> >> >    /* Decode noise energies for this group */
> >> >    for (sfb=0; sfb<max_sfb; sfb++)
> >> >        if (is_noise(g,sfb))
> >> >            noise_nrg[g][sfb] = nrg += dpcm_noise_nrg[g][sfb];
> >> >    /* Do perceptual noise substitution decoding */
> >> >    for (b=0; b<window_group_length[g]; b++) {
> >> >        for (sfb=0; sfb<max_sfb; sfb++) {
> >> >            if (is_noise(g,sfb)) {
> >> >                offs = swb_offset[sfb];
> >> >                size = swb_offset[sfb+1] - offs;
> >> >                /* Generate random vector */
> >> >                gen_rand_vector( &spec[g][b][sfb][0], size );
> >> >                scale = 1/(size * sqrt(MEAN_NRG));
> >> >                scale *= 2.0^(0.25*noise_nrg [g][sfb]);
> >> >                /* Scale random vector to desired target energy */
> >> >                for (i=0; i<len; i++)
> >> >                    spec[g][b][sfb][i] *= scale;
> >> >            }
> >> >        }
> >> >    }
> >> >    ...
> >> >    The function gen_rand_vector( addr, size ) generates a vector of length <size> with signed random values of
> >> >    average energy MEAN_NRG per random value. A suitable random number generator can be realized using one
> >> >    multiplication/accumulation per random value.
> >> >
> >> > No weird renormalization!
> >> > also the size factor is commented out in our code, i guess to cancel the
> >> > incorrect normalization mostly out.
> >>
> >> OK, I understand what you mean now. I see that the current code we
> >> have calculates the energy of the band in question (hence there's no
> >> need for /size) and scales to that energy rather than scaling to
> >> 1/(size * MEAN_NRG). I have a few questions:
> >>
> >
> >> - Is the av_random() & 0xFFFF code OK or should the 32-bit random
> >> value rather be scaled to 16-bit? I suspect it should be scaled (i.e.
> >> >> 16).
> >
> > Why dont you use the whole 32bit value? Or was the array int16_t in
> > which case >>16 is as good as &0xFFFF given the random number generator
> > is good.
> 
> I might as well and I'll probably move the av_random() calls to
> quant_to_spec_tool().
> 
> >> - How does one analytically calculate the mean energy of the noise we
> >> generate such that this value can be defined as a constant and used in
> >> our code?
> >
> > How do you define energy?
> 
> For a discrete signal x_i for i = [0, N], the energy of the signal is
> sum(x_i * x_i, i=0, i=N).
> 
> > What is the distribution of the values av_random() outputs?
> 
> /** generates a random number on [0,0xffffffff]-interval */
> 
> If it claims to generate a random number on that interval then without
> stating any bias in the distribution, it should mean generates values
> in that interval with equal probability.
> 
> >> After assuming white noise and dabbling about a bit with
> >> some maths it seems it should be 1/3 * maximum possible energy i.e.
> >> #define MEAN_NRG sqrt(3.0) * 32768.0 or something like that. Is this
> >> correct?
> >
> > If you know how you define energy then you can easily check this by
> > calculating the exact nummerical energy of many values from av_random()
> > your calculation should be close to the value returned ...
> 
> I tried my sum for large values of N with x_i on the interval [0.0,
> 1.0] and it came close to 1/3.
> 
> Ideally, for the mean energy, one would use the limit as N->inf of the
> above-mentioned sum / N
> 
> > If you further assume random() has a flat equi distibuted output than
> > a simple
> > for(i=0; i<=0xFFFF; i++)
> >    find_energy(&energy, i);
> > will give you the exact value
> > ... the analytical one, assuming you solved the correct integral ;) is
> > actually just an approximation for the continuous case ...
> 
> Right, and I was just wondering if you could clarify the continuous case. :)
> 
> By the way, I've just looked at the FAAD and 3GPP implementations and
> they both scale per random vector, i.e. per band as the current code
> does. They both actually do the scaling within the random vector
> generation function though I guess FAAD is based off the 3GPP
> reference code.
> 
> So, which should we use?
> 
> - Scaling per band as in the reference code
> - Scaling with constant mean energy

the second, because
1. thats what the spec we have says
2. its faster
3. its simpler
4. its random, rescaling subsets of numbers "dynamically" is not random
   anymore.

If of course all files we had with thsis feature sounded better with the
faad/3gpp variant then we should choose the first but IIRC we dont have
any such files ...

> 
> If the latter, what precision should be used for the sqrt(3.0) * 2^31 constant?

as the output is float, a float constant seems logic ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Breaking DRM is a little like attempting to break through a door even
though the window is wide open and the only thing in the house is a bunch
of things you dont want and which you would get tomorrow for free anyway
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080629/f064dce8/attachment.pgp>