[FFmpeg-devel] [PATCH] AAC Decoder - Round 2.

Robert Swain robert.swain
Sun Jun 29 16:19:24 CEST 2008


2008/6/29 Robert Swain <robert.swain at gmail.com>:
> 2008/6/29 Michael Niedermayer <michaelni at gmx.at>:
>> On Sun, Jun 29, 2008 at 10:42:40AM +0100, Robert Swain wrote:
>>> 2008/6/27 Michael Niedermayer <michaelni at gmx.at>:
>>> > On Fri, Jun 27, 2008 at 03:35:06PM +0100, Robert Swain wrote:
>>> >> 2008/6/23 Michael Niedermayer <michaelni at gmx.at>:
>>> >> > On Mon, Jun 23, 2008 at 02:10:56PM +0100, Robert Swain wrote:
>>> >> >> 2008/6/20 Michael Niedermayer <michaelni at gmx.at>:
>>> >> >> > On Thu, Jun 19, 2008 at 04:22:57PM +0100, Robert Swain wrote:
>>> >> >> > [...]
>>> >> >> >> +
>>> >> >> >> +    for (g = 0; g < ics->num_window_groups; g++) {
>>> >> >> >> +        for (i = 0; i < ics->max_sfb; i++) {
>>> >> >> >> +            if (cb[g][i] == NOISE_HCB) {
>>> >> >> >> +                for (group = 0; group < ics->group_len[g]; group++) {
>>> >> >> >> +                    float energy = 0;
>>> >> >> >> +                    float scale = 1.;// / (float)(offsets[i+1] - offsets[i]);
>>> >> >> >> +                    for (k = offsets[i]; k < offsets[i+1]; k++)
>>> >> >> >> +                        energy += (float)icoef[group*128+k] * icoef[group*128+k];
>>> >> >> >> +                    scale *= sf[g][i] / sqrt(energy);
>>> >> >> >
>>> >> >> > are you sure that the random values have to be normalized like that?
>>> >> >> > I suspect energy is supposed tp be a constant.
>>> >> >>
>>> >> >> That's how it is in the spec. From section 4.6.13 Perceptual Noise
>>> >> >> Substitution (PNS):
>>> >> >
>>> >> > Ive checked the spec before my reply, and i belive your code is wrong.
>>> >> >
>>> >> >> The energy information for percpetual noise substitution decoding is
>>> >> >> represented by a "noise energy" value indicating the overall power of
>>> >> >> the substituted spectral coefficients in steps of 1.5 dB. If noise
>>> >> >> substitution coding is active for a particular group and scalefactor
>>> >> >> band, a noise energy value is transmitted instead of the scalefactor
>>> >> >> of the respective channel.
>>> >> >
>>> >> > Doesnt say that the output from the random number generator should be choped
>>> >> > up in bands and each independantly renormalized.
>>> >> >
>>> >> > Heres what the spec says:
>>> >> >    /* Decode noise energies for this group */
>>> >> >    for (sfb=0; sfb<max_sfb; sfb++)
>>> >> >        if (is_noise(g,sfb))
>>> >> >            noise_nrg[g][sfb] = nrg += dpcm_noise_nrg[g][sfb];
>>> >> >    /* Do perceptual noise substitution decoding */
>>> >> >    for (b=0; b<window_group_length[g]; b++) {
>>> >> >        for (sfb=0; sfb<max_sfb; sfb++) {
>>> >> >            if (is_noise(g,sfb)) {
>>> >> >                offs = swb_offset[sfb];
>>> >> >                size = swb_offset[sfb+1] - offs;
>>> >> >                /* Generate random vector */
>>> >> >                gen_rand_vector( &spec[g][b][sfb][0], size );
>>> >> >                scale = 1/(size * sqrt(MEAN_NRG));
>>> >> >                scale *= 2.0^(0.25*noise_nrg [g][sfb]);
>>> >> >                /* Scale random vector to desired target energy */
>>> >> >                for (i=0; i<len; i++)
>>> >> >                    spec[g][b][sfb][i] *= scale;
>>> >> >            }
>>> >> >        }
>>> >> >    }
>>> >> >    ...
>>> >> >    The function gen_rand_vector( addr, size ) generates a vector of length <size> with signed random values of
>>> >> >    average energy MEAN_NRG per random value. A suitable random number generator can be realized using one
>>> >> >    multiplication/accumulation per random value.
>>> >> >
>>> >> > No weird renormalization!
>>> >> > also the size factor is commented out in our code, i guess to cancel the
>>> >> > incorrect normalization mostly out.
>>> >>
>>> >> OK, I understand what you mean now. I see that the current code we
>>> >> have calculates the energy of the band in question (hence there's no
>>> >> need for /size) and scales to that energy rather than scaling to
>>> >> 1/(size * MEAN_NRG). I have a few questions:
>>> >>
>>> >
>>> >> - Is the av_random() & 0xFFFF code OK or should the 32-bit random
>>> >> value rather be scaled to 16-bit? I suspect it should be scaled (i.e.
>>> >> >> 16).
>>> >
>>> > Why dont you use the whole 32bit value? Or was the array int16_t in
>>> > which case >>16 is as good as &0xFFFF given the random number generator
>>> > is good.
>>>
>>> I might as well and I'll probably move the av_random() calls to
>>> quant_to_spec_tool().
>>>
>>> >> - How does one analytically calculate the mean energy of the noise we
>>> >> generate such that this value can be defined as a constant and used in
>>> >> our code?
>>> >
>>> > How do you define energy?
>>>
>>> For a discrete signal x_i for i = [0, N], the energy of the signal is
>>> sum(x_i * x_i, i=0, i=N).
>>>
>>> > What is the distribution of the values av_random() outputs?
>>>
>>> /** generates a random number on [0,0xffffffff]-interval */
>>>
>>> If it claims to generate a random number on that interval then without
>>> stating any bias in the distribution, it should mean generates values
>>> in that interval with equal probability.
>>>
>>> >> After assuming white noise and dabbling about a bit with
>>> >> some maths it seems it should be 1/3 * maximum possible energy i.e.
>>> >> #define MEAN_NRG sqrt(3.0) * 32768.0 or something like that. Is this
>>> >> correct?
>>> >
>>> > If you know how you define energy then you can easily check this by
>>> > calculating the exact nummerical energy of many values from av_random()
>>> > your calculation should be close to the value returned ...
>>>
>>> I tried my sum for large values of N with x_i on the interval [0.0,
>>> 1.0] and it came close to 1/3.
>>>
>>> Ideally, for the mean energy, one would use the limit as N->inf of the
>>> above-mentioned sum / N
>>>
>>> > If you further assume random() has a flat equi distibuted output than
>>> > a simple
>>> > for(i=0; i<=0xFFFF; i++)
>>> >    find_energy(&energy, i);
>>> > will give you the exact value
>>> > ... the analytical one, assuming you solved the correct integral ;) is
>>> > actually just an approximation for the continuous case ...
>>>
>>> Right, and I was just wondering if you could clarify the continuous case. :)
>>>
>>> By the way, I've just looked at the FAAD and 3GPP implementations and
>>> they both scale per random vector, i.e. per band as the current code
>>> does. They both actually do the scaling within the random vector
>>> generation function though I guess FAAD is based off the 3GPP
>>> reference code.
>>>
>>> So, which should we use?
>>>
>>> - Scaling per band as in the reference code
>>> - Scaling with constant mean energy
>>
>> the second, because
>> 1. thats what the spec we have says
>> 2. its faster
>> 3. its simpler
>> 4. its random, rescaling subsets of numbers "dynamically" is not random
>>   anymore.
>>
>> If of course all files we had with thsis feature sounded better with the
>> faad/3gpp variant then we should choose the first but IIRC we dont have
>> any such files ...
>>
>>
>>>
>>> If the latter, what precision should be used for the sqrt(3.0) * 2^31 constant?
>>
>> as the output is float, a float constant seems logic ...
>
> See attached.

Damnit. Why do I always spot small errors right after I've decided I'm
happy with a patch and have submitted it? The return of av_random is
unsigned and it needs to be signed so cast to int32_t. See attached
again. Sorry for the noise.

Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 20080629-1503-noise-use_mean_energy.diff
Type: text/x-diff
Size: 2311 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080629/89db4cb0/attachment.diff>



More information about the ffmpeg-devel mailing list