[FFmpeg-devel] [Issue 664] [PATCH] Fix AAC PNS Scaling

Alex Converse alex.converse
Tue Oct 7 21:23:50 CEST 2008


On Tue, Oct 7, 2008 at 1:01 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Mon, Oct 06, 2008 at 11:30:56PM -0400, Alex Converse wrote:
>> On Mon, Oct 6, 2008 at 10:20 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > On Mon, Oct 06, 2008 at 09:52:51PM -0400, Alex Converse wrote:
>> >> On Mon, Oct 6, 2008 at 9:39 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> > On Mon, Oct 06, 2008 at 08:52:06PM -0400, Alex Converse wrote:
>> >> >> On Mon, Oct 6, 2008 at 8:22 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> >> > On Mon, Oct 06, 2008 at 03:46:55PM -0400, Alex Converse wrote:
>> >> >> >> On Tue, Sep 30, 2008 at 11:25 PM, Alex Converse <alex.converse at gmail.com> wrote:
>> >> >> >> > Hi,
>> >> >> >> >
>> >> >> >> > The attached patch should fix AAC PNS scaling [Issue 664]. It will not
>> >> >> >> > fix PNS conformance.
>> >> >> >>
>> >> >> >> Here's a slightly updated patch (sqrtf instead of sqrt). The current
>> >> >> >> method of PNS will never conform because sample energy simply doesn't
>> >> >> >> converge to its mean fast enough. The spec explicitly states that PNS
>> >> >> >> should be normalized per band. Not doing it that way causes PNS-1
>> >> >> >> conformance to fail for 45 bands.
>> >> >> >
>> >> >> > elaborate, what part of the spec says what?
>> >> >>
>> >> >> 14496-3:2005/4.6.13.3 p184 (636 of the PDF)
>> >> >>
>> >> >> > what is PNS-1 conformance ?
>> >> >>
>> >> >> 14496-4:2004/6.6.1.2.2.4 p94 (102 PDF)
>> >> >> 14496-5/conf_pns folder
>> >> >
>> >> > do you happen to have URLs for these?
>> >> >
>> >> >
>> >> >>
>> >> >> > the part that feels a little odd is normalizing random data on arbitrary
>> >> >> > and artificial bands, this simply makes things non random.
>> >> >> > This would be most visible with really short bands of 1 or 2
>> >> >> > coeffs ...
>> >> >> > another way to see the issue is to take 100 coeffs and split them into
>> >> >> > 10 bands, if you now normalize literally these 10 bands then the 100
>> >> >> > coeffs will no longer be random at all, they will be significantly
>> >> >> > correlated. This may be inaudible, it may or may not sound better and
>> >> >> > may or may not be what the spec wants but still it feels somewhat wrong
>> >> >> > to me ...
>> >> >> >
>> >> >>
>> >> >> Ralph Sperschneider from FhG/MPEG spelled it all out:
>> >> >> http://lists.mpegif.org/pipermail/mp4-tech/2003-June/002358.html
>> >> >>
>> >> >> I'm not saying it's a smart way to design a CODEC but it's what MPEG picked.
>> >> >
>> >> > yes, so i guess the most sensible solution would be to precalculate
>> >> > a second of noise normalized to the band sizes and randomly pick from
>> >> > these.
>> >> >
>> >>
>> >> That sounds messy and overly complex. What's wrong with doing it the
>> >> way MPEG tells us to?
>> >
>> > that is what mpeg tells us to do, they do not mandate any specific way
>> > to calculate random values. And i do not like doing sqrt() per band ...
>> >
>>
>> One sqrtf() per band isn't that intense. To stick with the current
>> approach we still need to do a sqrt on the band size. We could even
>> use one of those fast 1/sqrt algorithms.
>
> we do not need to do a sqrt() on the band size, not in the current
> approach and not with the other variant. A small LUT will do fine
> considering the small number of band sizes. And even that is not
> needed in all cases ...
>

I'm attaching a functionally correct version that does one sqrtf per
band (i.e. up to 120 per frame).

I'm using the Carmack-Lomont 1/sqrtf based on the following benchmark:

With math.h sqrtf
alex at Barcelona:~/Projects/ffmpeg/14496-4$ ../ffmpeg/ffmpeg -i
mpeg4audio-conformance/compressedMp4/al18_48.mp4 -f null -
FFmpeg version SVN-r15584, Copyright (c) 2000-2008 Fabrice Bellard, et al.
  configuration: --enable-gpl
  libavutil     49.11. 0 / 49.11. 0
  libavcodec    52. 0. 0 / 52. 0. 0
  libavformat   52.22. 1 / 52.22. 1
  libavdevice   52. 1. 0 / 52. 1. 0
  built on Oct  7 2008 14:26:51, gcc: 4.3.2
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'mpeg4audio-conformance/compressedMp4/al18_48.mp4':
  Duration: 00:01:00.01, start: 0.000000, bitrate: 67 kb/s
    Stream #0.0(und): Audio: aac, 48000 Hz, mono, s16
Output #0, null, to 'pipe:':
    Stream #0.0(und): Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s
Stream mapping:
  Stream #0.0 -> #0.0
Press [q] to stop encoding
253170 dezicycles in sqrtf, 1 runs, 0 skips
245160 dezicycles in sqrtf, 2 runs, 0 skips
240997 dezicycles in sqrtf, 4 runs, 0 skips
238050 dezicycles in sqrtf, 8 runs, 0 skips
236553 dezicycles in sqrtf, 16 runs, 0 skips
235676 dezicycles in sqrtf, 32 runs, 0 skips
234884 dezicycles in sqrtf, 64 runs, 0 skips
234129 dezicycles in sqrtf, 128 runs, 0 skips
234115 dezicycles in sqrtf, 256 runs, 0 skips
234366 dezicycles in sqrtf, 512 runs, 0 skips
233892 dezicycles in sqrtf, 1024 runs, 0 skips
233624 dezicycles in sqrtf, 2047 runs, 1 skips
size=      -0kB time=59.99 bitrate=  -0.0kbits/s
video:0kB audio:5624kB global headers:0kB muxing overhead -100.000382%

With Carmack-Lomont
alex at Barcelona:~/Projects/ffmpeg/14496-4$ ../ffmpeg/ffmpeg -i
mpeg4audio-conformance/compressedMp4/al18_48.mp4 -f null -
FFmpeg version SVN-r15584, Copyright (c) 2000-2008 Fabrice Bellard, et al.
  configuration: --enable-gpl
  libavutil     49.11. 0 / 49.11. 0
  libavcodec    52. 0. 0 / 52. 0. 0
  libavformat   52.22. 1 / 52.22. 1
  libavdevice   52. 1. 0 / 52. 1. 0
  built on Oct  7 2008 14:26:51, gcc: 4.3.2
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'mpeg4audio-conformance/compressedMp4/al18_48.mp4':
  Duration: 00:01:00.01, start: 0.000000, bitrate: 67 kb/s
    Stream #0.0(und): Audio: aac, 48000 Hz, mono, s16
Output #0, null, to 'pipe:':
    Stream #0.0(und): Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s
Stream mapping:
  Stream #0.0 -> #0.0
Press [q] to stop encoding
197190 dezicycles in sqrtf, 1 runs, 0 skips
190665 dezicycles in sqrtf, 2 runs, 0 skips
187807 dezicycles in sqrtf, 4 runs, 0 skips
185737 dezicycles in sqrtf, 8 runs, 0 skips
184651 dezicycles in sqrtf, 16 runs, 0 skips
184255 dezicycles in sqrtf, 32 runs, 0 skips
183868 dezicycles in sqrtf, 64 runs, 0 skips
183976 dezicycles in sqrtf, 128 runs, 0 skips
183913 dezicycles in sqrtf, 256 runs, 0 skips
184199 dezicycles in sqrtf, 511 runs, 1 skips
184037 dezicycles in sqrtf, 1023 runs, 1 skips
183925 dezicycles in sqrtf, 2047 runs, 1 skips
size=      -0kB time=59.99 bitrate=  -0.0kbits/s
video:0kB audio:5624kB global headers:0kB muxing overhead -100.000382%

Intel(R) Core(TM)2 CPU 6600  @ 2.40GHz
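
For reference, here is a minimal sketch of the kind of routine being timed
above. It is not the code in the attached patch. It assumes the usual
bit-trick formulation with Lomont's constant 0x5f375a86 and one
Newton-Raphson step. The names fast_rsqrtf, band_energy and pns_scale_band
are made up for illustration, and gain stands for the 2^(0.25*noise_nrg)
factor, which the caller is assumed to compute. In the two runs above the
timed region drops from about 233624 to 183925 dezicycles, roughly 21% fewer.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for x > 0 (assumed constant: Lomont's 0x5f375a86). */
static float fast_rsqrtf(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);      /* reinterpret float as integer */
    bits = 0x5f375a86u - (bits >> 1);    /* initial guess from the exponent */
    memcpy(&y, &bits, sizeof y);
    y *= 1.5f - 0.5f * x * y * y;        /* one Newton-Raphson refinement */
    return y;
}

/* Sum of squares of one band of coefficients. */
static float band_energy(const float *coef, int size)
{
    float energy = 0.0f;
    for (int i = 0; i < size; i++)
        energy += coef[i] * coef[i];
    return energy;
}

/* Scale one noise band so that its energy is approximately gain * gain.
 * A band whose energy is zero is left untouched. */
static void pns_scale_band(float *coef, int size, float gain)
{
    float energy = band_energy(coef, size);
    if (energy <= 0.0f)
        return;
    float scale = gain * fast_rsqrtf(energy);   /* one rsqrt per band */
    for (int i = 0; i < size; i++)
        coef[i] *= scale;
}
```

With a single Newton step the relative error of the approximation is on the
order of 0.2%, so whether that is accurate enough for the conformance
tolerance would still need to be checked.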

>
>>
>> >
>> >> Or just sticking with what we have it sounds
>> >> fine and is fast.
>> >
>> > well if it's conformant that's fine, but it seemed to me that it is not
>>
>> It isn't but it seemed to me that strict conformance is not always
>> required for ffmpeg (RE: block switching)?
>
> the way i understood the spec when i read the relevant part last time is
> that this specific kind of block switching is invalid, thus outside the
> spec and no special behavior is mandated.
>
>
>>
>> > though i am still waiting for a quote from the spec to confirm this.
>> >
>>
>> I sent you section and page numbers. It's not a one liner. The [PNS-1]
>> section of 14496-4:2004/6.6.1.2.2.4 p94 (102 PDF) is fairly
>> straightforward.
>
> I will not pay ISO for a document just to confirm the claim.
> Luckily though i've found 14496-3:2005 on my hd (sorry, i do have a mess)
> It says:
> The noise substitution decoding process for one channel is defined by the following pseudo code:
> nrg = global_gain - NOISE_OFFSET - 256;
> for (g=0; g<num_window_groups; g++) {
>   /* Decode noise energies for this group */
>   for (sfb=0; sfb<max_sfb; sfb++) {
>        if (is_noise(g,sfb)) {
>           nrg += dpcm_noise_nrg[g][sfb];
>           noise_nrg[g][sfb] = nrg;
>        }
>    }
>     /* Do perceptual noise substitution decoding */
>     for (b=0; b<window_group_length[g]; b++) {
>         for (sfb=0; sfb<max_sfb; sfb++) {
>             if (is_noise(g,sfb)) {
>                 size = swb_offset[sfb+1] - swb_offset[sfb];
>                 /* Generate random vector */
>                 gen_rand_vector( &spec[g][b][sfb][0], size );
>                 nrg=0;
>                 for (i=0; i<size; i++) {
>                    nrg+= spec[g][b][sfb][i] * spec[g][b][sfb][i];
>                 }
>                 sqrt_nrg = sqrt (nrg);
>                 scale *= 2.0^(0.25*noise_nrg [g][sfb]) / sqrt_nrg;
>                 /* scale random vector to desired target energy */
>                 for (i=0; i<size; i++) {
>                    spec[g][b][sfb][i] *= scale;
>                 }
>             }
>         }
>     }
> }
> The constant NOISE_OFFSET is used to adapt the range of average noise energy values to the usual range of
> scalefactors and has a value of 90.
> The function gen_rand_vector( addr, size ) generates a vector of length <size> with signed random values whereas
> their sum of squares is unequal to zero. A suitable random number generator can be realized using one
> multiplication/accumulation per random value.
> ---------------
>
> So it seems the spec has been changed to use the more idiotic normalization.
> And that confirms your claim.
>
>
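
Read literally, the quoted pseudocode reuses nrg for both the DPCM
accumulator and the per-band sum of squares, and applies "scale *=" to a
value that is never initialized. A C translation would presumably keep the
accumulator separate and assign scale. Below is a minimal sketch of the
energy decoding half only, under those assumptions. The function name and the
flattened [g * max_sfb + sfb] array layout are illustrative, not taken from
the spec or from the patch.

```c
#define NOISE_OFFSET 90

/* Decode the DPCM-coded noise energies of all window groups.
 * is_noise, dpcm_noise_nrg and noise_nrg are indexed [g * max_sfb + sfb]
 * (a layout assumed for this sketch). Entries of noise_nrg that belong to
 * non-noise bands are left untouched. */
static void decode_noise_nrg(int global_gain, int num_window_groups,
                             int max_sfb, const unsigned char *is_noise,
                             const int *dpcm_noise_nrg, int *noise_nrg)
{
    /* Dedicated accumulator, never reused for the band energy sum. */
    int acc = global_gain - NOISE_OFFSET - 256;

    for (int g = 0; g < num_window_groups; g++) {
        for (int sfb = 0; sfb < max_sfb; sfb++) {
            int idx = g * max_sfb + sfb;
            if (is_noise[idx]) {
                acc += dpcm_noise_nrg[idx];
                noise_nrg[idx] = acc;
            }
        }
    }
}
```

The per-band scale would then be 2^(0.25 * noise_nrg) divided by the square
root of the band's sum of squares, as in the quoted text.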
>>
>>
>> >
>> >>
>> >> >
>> >> >>
>> >> >> >
>> >> >> >>
>> >> >> >> However with this patch there appears to be no audible difference
>> >> >> >> between the approaches.
>> >> >> >
>> >> >> >> I don't know the ideal mean energy so I'm
>> >> >> >> using the sample mean energy for 1024 iterations of the LCG.
>> >> >> >
>> >> >> > i assume cpu cycles got more expensive if people can only spare a few
>> >> >> > thousand
>> >> >> >
>> >> >>
>> >> >> How many do you propose then? I tried running it over the whole period
>> >> >> and the result seemed low. I think it's a classic case of adding too
>> >> >> many equal size floating point values.
>> >> >
>> >> > real mathematicians tend not to use floats that are subject to rounding errors
>> >> >
>> >> > try:
>> >> > for(i=min; i<=max; i++){
>> >> >    uint64_t a= i*i;
>> >> >    var += a;
>> >> >    if(var < a){
>> >> >            var2++;
>> >> >    }
>> >> > }
>> >> >
>> >>
>> >> That will only hold about 4 of the biggest values: 2^64/((2^31)^2) = 4.
>> >
>> > read the code again please, especially var2
>> > also, just to make sure you have the types correct
>> >
>> > int min= -INT32_MAX; /* same value as -(1<<31)+1, without overflowing int */
>> > int max=  INT32_MAX; /* same value as  (1<<31)-1 */
>> > int64_t i;
>> > uint64_t var=0;
>> > uint64_t var2=0;
>> >
>>
>> Why are we leaving out (int)0x80000000?
>
> because otherwise the mean would be -0.5.
>
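
To spell out the var/var2 idea: the two 64-bit words together act as one
128-bit accumulator, with var2 counting how often var wrapped around. A
minimal sketch follows; the struct and function names are made up for
illustration, and the range is passed in so that it can be tried on small
inputs first.

```c
#include <stdint.h>

/* Two 64-bit words used as one 128-bit accumulator:
 * lo corresponds to var and hi to var2 in the quoted loop. */
typedef struct { uint64_t lo, hi; } acc128;

/* Add a to the accumulator, carrying into hi when lo wraps around. */
static void acc128_add(acc128 *s, uint64_t a)
{
    s->lo += a;          /* unsigned wrap-around is well defined */
    if (s->lo < a)       /* lo wrapped, so propagate the carry */
        s->hi++;
}

/* Exact sum of i*i for min <= i <= max. Assumes |i| < 2^31 so that i*i
 * fits in 64 bits, and max < INT64_MAX so the loop counter cannot overflow. */
static acc128 sum_of_squares(int64_t min, int64_t max)
{
    acc128 s = { 0, 0 };
    for (int64_t i = min; i <= max; i++)
        acc128_add(&s, (uint64_t)(i * i));
    return s;
}
```

Calling sum_of_squares(-INT32_MAX, INT32_MAX) runs 2^32 - 1 iterations. As a
cross-check, the mean of i*i over -N..N is N(N+1)/3, which for N = 2^31 - 1
is about 1.54e18.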
> [...]
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Its not that you shouldnt use gotos but rather that you should write
> readable code and code with gotos often but not always is less readable
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aac-pns-scale-per-band.diff
Type: text/x-diff
Size: 2535 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081007/57b3af1e/attachment.diff>