[FFmpeg-devel] AAC decoder round 6

Michael Niedermayer michaelni
Mon Aug 11 00:35:27 CEST 2008


On Sun, Aug 10, 2008 at 11:11:21PM +0100, Robert Swain wrote:
> 2008/8/10 Michael Niedermayer <michaelni at gmx.at>:
[...]

> >> >> > [...]
> >> >> >
> >> >> >> @@ -330,10 +790,286 @@
> >> >> >>  }
> >> >> >>
> >> >> >>  /**
> >> >> >> - * Parse Spectral Band Replication extension data; reference: table 4.55.
> >> >> >> + * Dequantize and scale spectral data; reference: 4.6.3.3.
> >> >> >>   *
> >> >> >> + * @param   icoef       array of quantized spectral data
> >> >> >> + * @param   band_type   array of the used band type
> >> >> >> + * @param   sf          array of scalefactors or intensity stereo positions
> >> >> >> + * @param   coef        array of dequantized, scaled spectral data
> >> >> >> + */
> >> >> >> +static void dequant(AACContext * ac, float coef[1024], const int icoef[1024], float sf[120],
> >> >> >> +        const IndividualChannelStream * ics, enum BandType band_type[120]) {
> >> >> >> +    const uint16_t * offsets = ics->swb_offset;
> >> >> >> +    const int c = 1024/ics->num_window_groups;
> >> >> >> +    int g, i, group, k, idx = 0;
> >> >> >> +
> >> >> >> +    for (g = 0; g < ics->num_window_groups; g++) {
> >> >> >> +        memset(coef + g * 128 + offsets[ics->max_sfb], 0, sizeof(float)*(c - offsets[ics->max_sfb]));
> >> >> >> +        for (i = 0; i < ics->max_sfb; i++, idx++) {
> >> >> >> +            if (band_type[idx] == NOISE_BT) {
> >> >> >> +                const float scale = sf[idx] / ((offsets[i+1] - offsets[i]) * PNS_MEAN_ENERGY);
> >> >> >> +                for (group = 0; group < ics->group_len[g]; group++) {
> >> >> >> +                    for (k = offsets[i]; k < offsets[i+1]; k++)
> >> >> >> +                        coef[group*128+k] = lcg_random(&ac->random_state) * scale;
> >> >> >> +                }
> >> >> >> +            } else if (band_type[idx] != INTENSITY_BT && band_type[idx] != INTENSITY_BT2) {
> >> >> >> +                for (group = 0; group < ics->group_len[g]; group++) {
> >> >> >> +                    for (k = offsets[i]; k < offsets[i+1]; k++) {
> >> >> >> +                        coef[group*128+k] = ivquant(icoef[group*128+k]) * sf[idx];
> >> >> >> +                    }
> >> >> >> +                }
> >> >> >> +            }
> >> >> >> +        }
> >> >> >> +        coef  += ics->group_len[g]*128;
> >> >> >> +        icoef += ics->group_len[g]*128;
> >> >> >> +    }
> >> >> >> +}
> >> >> >
> >> >> > it might be better to add 128 just outside the innermost loop to avoid
> >> >> > the "group*128 +" inside
> >> >>
> >> >> They would need to be reset the end of each iteration over i. I moved
> >> >> the group loops inside the conditions recently as I recall. What do
> >> >> you suggest? Add 128 to coef and icoef through each iteration over
> >> >> group and then reset?
> >> >
> >> > I suggest you benchmark it, if it makes no speed difference they can stay
> >> > as they are.
> >>
> >> OK. On a Core Duo 1.83GHz, 5 repetitions interleaved...
> >>
> >> Unpatched:
> >> real  0m8.697s
> >> user  0m8.052s
> >> sys   0m0.494s
> >
> > and START/STOP_TIMER over the function?
> 
> It's done in a number of functions. But here are the results:
> 
> Unpatched:
> 2200 dezicycles in decode_spectrum, 4 runs, 8188 skips
> 2200 dezicycles in decode_spectrum, 4 runs, 16380 skips
> 2200 dezicycles in decode_spectrum, 4 runs, 32764 skips
> 2200 dezicycles in decode_spectrum, 4 runs, 65532 skips
> 
> 116572 dezicycles in dequant, 8186 runs, 6 skips
> 116390 dezicycles in dequant, 16371 runs, 13 skips
> 114657 dezicycles in dequant, 32741 runs, 27 skips
> 107620 dezicycles in dequant, 65499 runs, 37 skips
> 
> 102380 dezicycles in apply_mid_side_stereo, 4094 runs, 2 skips
> 102210 dezicycles in apply_mid_side_stereo, 8188 runs, 4 skips
> 101466 dezicycles in apply_mid_side_stereo, 16365 runs, 19 skips
> 96039 dezicycles in apply_mid_side_stereo, 32745 runs, 23 skips
> 
> Patched:
> 2282 dezicycles in decode_spectrum, 4 runs, 8188 skips
> 2282 dezicycles in decode_spectrum, 4 runs, 16380 skips
> 2282 dezicycles in decode_spectrum, 4 runs, 32764 skips
> 2282 dezicycles in decode_spectrum, 4 runs, 65532 skips
> 
> 132744 dezicycles in dequant, 8191 runs, 1 skips
> 133292 dezicycles in dequant, 16380 runs, 4 skips
> 130422 dezicycles in dequant, 32758 runs, 10 skips
> 124137 dezicycles in dequant, 65515 runs, 21 skips
> 
> 106972 dezicycles in apply_mid_side_stereo, 4095 runs, 1 skips
> 107052 dezicycles in apply_mid_side_stereo, 8188 runs, 4 skips
> 105295 dezicycles in apply_mid_side_stereo, 16376 runs, 8 skips
> 100982 dezicycles in apply_mid_side_stereo, 32749 runs, 19 skips
> 
> I guess it didn't like decode_spectrum() but regardless it looks like
> it's faster without the change. Did I implement it poorly?

hmm, ok leave it as it is faster


> 
> > [...]
> >> > [...]
> >> >> >> +    }
> >> >> >> +    domain = get_bits1(gb);
> >> >> >> +
> >> >> >> +    if (is_indep_coup) {
> >> >> >> +        coup->coupling_point = AFTER_IMDCT;
> >> >> >> +    } else if(domain) {
> >> >> >> +        coup->coupling_point = BETWEEN_TNS_AND_IMDCT;
> >> >> >> +    } else
> >> >> >> +        coup->coupling_point = BEFORE_TNS;
> >> >> >
> >> >> > :/ i thought the bits were at least stored together, what morons designed
> >> >> > this!?
> >> >> >
> >> >> > coup->coupling_point= get_bits1(gb);
> >> >> > ...
> >> >> > coup->coupling_point+= 2*get_bits1(gb);
> >> >> >
> >> >> > still is a possibility though that is cleaner
> >> >> > btw, what is the 4th possibility for?
> >> >>
> >> >> is_indep_coup = 1 overrides whatever domain is so the 4th case is
> >> >> redundant. Though theoretically, domain must be 1 if is_indep_coup is
> >> >> 1. is_indep_coup = 1 and domain = 0 is theoretically invalid.
> >> >>
> >> >> coup->coupling_point = 2*get_bits1(gb);
> >> >> ...
> >> >> coup->coupling_point += get_bits1(gb) && !coup->coupling_point;
> >> >> ?
> >> >
> >> > hmm, this seems unneccesarily complex
> >> > and the invalid value should be checked for with av_log&return -1
> >>
> >> It's not invalid in that it should be checked and cause an error. It's
> >> just that the value of domain is ignored when is_indep_coup is set. I
> >
> > Where does the spec say this?
> > If the spec says that one bit switches between doing it before and after A
> > and says the second switches before and after B then the order of A and B
> > implicate which combination is invalid.
> 
> ind_sw_cce_flag
> one bit indicating whether the coupled target syntax element is an
> independently switched (1) or a dependently switched (0).
> 
> cc_domain
> one bit indicating whether the coupling is performed before (0) or after (1)
> the TNS decoding of the coupled target channels
> 
> In 4.6.8.3.3 Decoding process (4.6.8.3 is Coupling channel) it states that:
> 
> "the independently switched CCE must be decoded all the way to the
> time domain (i.e. including the synthesis filterbank) before it is
> scaled and added onto the various SCE and CPE channels that it is
> coupled to in the case that window state does not match."
> 
> "A dependently switched CCE, on the other hand, must have a window
> state that matches all of the target SCE and CPE channels that it is
> coupled onto as determined by the list of cc_l and cc_r elements. In
> this case, the CCE only needs to be decoded as far as the frequency
> domain and then scaled as directed by the gain list before it is added
> to the target SCE or CPE channels."
> 
> So, the spec seems to suggest that one can couple an independently
> switched CCE in the frequency domain in the case that the window state
> matches. In these cases, I suppose the cc_domain variable would be of
> consequence but otherwise, when coupling in the time domain for
> independently switched CCEs, cc_domain should really always be 1 as
> coupling would always be conducted after TNS decoding. 

> Can all
> encoders be trusted to write the correct value? :)

who cares? its a bug that is VERY easy to debug as long as the "illegal"
value is checked for. Its much better than silently doing something with
it that may or may not be what the encoder expects. (the artifacts would
be much harder to debug)

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I wish the Xiph folks would stop pretending they've got something they
do not.  Somehow I fear this will remain a wish. -- M?ns Rullg?rd
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080811/c1a140b4/attachment.pgp>



More information about the ffmpeg-devel mailing list