[FFmpeg-devel] [PATCH] HE-AAC v1 try 5

Sat Mar 6 22:12:49 CET 2010

On Fri, Mar 05, 2010 at 07:00:14PM -0500, Alex Converse wrote:
> On Thu, Mar 4, 2010 at 5:20 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Wed, Mar 03, 2010 at 10:07:12PM -0500, Alex Converse wrote:
> [...]
> >
> >> ?avformat.h | ? ?2 +-
> >> ?utils.c ? ?| ? ?5 +++++
> >> ?2 files changed, 6 insertions(+), 1 deletion(-)
> >> 4948a33990bbe6fca0e4bda563022b3e2e03d49f ?sbr_avformat.patch
> >> From bfc85e165f3716abe08356a584dc8ac6e5593369 Mon Sep 17 00:00:00 2001
> >> From: Alex Converse <alex.converse at gmail.com>
> >> Date: Wed, 3 Mar 2010 05:17:59 -0500
> >> Subject: [PATCH] Add a workaround for backwards compatible HE-AAC signaling.
> [...]
> >> --------------1
> >
> > ok if tested
> >
> >
> 
> ACK
> 
> Thanks for the feedback, most of it was very useful. However on a few
> points I vehemently disagree.
> 
> Attached is an updated patch.
> 
> At this point we are significantly faster than faad. With your
> approval I would like to commit this and move on to close the feature
> gap with faad (work on ps).

The code is of less than ideal quality currently
Still as this is a high priority feature i dont object to it being commited
given that it is improved further in the future when people find more
issues in it. 
One thing though must be resolved first, and that are the variable names
they are all 1:1 copied from the spec and i do not know if we can legally
include this in a *GPL program in terms of copyright.
You seem to not even know with certainity what some variable names mean ...
Not to mention these are extreemly poor names on technical grounds
That is unless one of our legal experts says, this is ok

also if we did find some one selling code outside the *GPL and
that used 100% identical variable names to ffmpeg we would have a quite
strong case against him if some ffmpeg devel choose to pursue it.

[...]
> 
> >
> > [...]
> >> +static inline void remove_table_element_int16(int16_t *table, uint8_t *last_el,
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?int el)
> >> +{
> >> + ? ?memmove(table + el, table + el + 1, (*last_el - el)*sizeof(int16_t));
> >> + ? ?(*last_el)--;
> >> +}
> >
> > please use variable names that make sense, that is size or length not last_el
> > and el should be index
> >
> 
> el(ement) and last_el(ement) make plenty of sense to me.

it can very easily be understood as:

for(i=0; table[i] != el; i++)
    ;
memmove(table + i, table + i + 1, ....

i think index and length are simply clearer

[...]

> 
> >
> >
> >> + ? ? ? ? ? ?if (sbr->f_tablelim[k] == sbr->f_tablelim[k-1] ||
> >> + ? ? ? ? ? ? ? ?!in_table_int16(patch_borders, sbr->num_patches, sbr->f_tablelim[k]))
> >> + ? ? ? ? ? ? ? ?remove_table_element_int16(sbr->f_tablelim, &sbr->n_lim, k);
> >> + ? ? ? ? ? ?else if (!in_table_int16(patch_borders, sbr->num_patches, sbr->f_tablelim[k-1]))
> >> + ? ? ? ? ? ? ? ?remove_table_element_int16(sbr->f_tablelim, &sbr->n_lim, k-1);
> >> + ? ? ? ? ? ?else
> >> + ? ? ? ? ? ? ? ?k++;
> >> + ? ? ? ?}
> >
> > this code seems somewhat obfuscated
> > we have 2 tables A and B, they are combined into table C and sorted
> > then starting from 0 to size all elements from C are removed that are
> > too close to their previous element unless they where in B
> >
> > this can trivially be done in O(|C|) time your code though needs
> > O(|C|*|C|) besides that the code seems messy which probably is the
> > bigger issue.
> > note, iam not asking you to make it faster if speed is irrelevant but
> > it could be made less messy i think
> >
> 
> I've simplified the logic slightly by removing the continue.
> It looks O(|C||B|) and fairly clean to me. If you have an idea that's
> less messy, please propose it.

i would use 2 arrays, that way remove_table_element_int16 would become
in++
or
out--
and the keep case would become
*out++ = *in++

> 
[...]
> >
> > also i seriously doubt that the whole ac^k needs to be scaned for the
> > minimum element, at the least you can stop once elements are more than 1 or
> > 2 larger than the minimum, and that assumes rounding has a significant
> > effect
> >
> 
> If you wish to pursue this please provide a benchmark justifying the
> extra complexity

Its surely faster, i dont know what effect it has overall, i guess this
can be tested after commit ...

> 
> >
> > [...]
> >> +
> >> +/// Master Frequency Band Table (14496-3 sp04 p194)
> >> +static int sbr_make_f_master(AACContext *ac, SpectralBandReplication *sbr,
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? SpectrumParameters *spectrum)
> >> +{
> >> + ? ?unsigned int temp;
> >> + ? ?unsigned int start_min, stop_min;
> >> + ? ?int k;
> >> + ? ?const uint8_t *sbr_offset_ptr;
> >> + ? ?int16_t stop_dk[13];
> >> +
> >> + ? ?if (sbr->sample_rate < 32000) {
> >> + ? ? ? ?temp = 3000;
> >> + ? ?} else if (sbr->sample_rate < 64000) {
> >> + ? ? ? ?temp = 4000;
> >> + ? ?} else
> >> + ? ? ? ?temp = 5000;
> >> +
> >> + ? ?start_min = ((temp << 7) + (sbr->sample_rate >> 1)) / sbr->sample_rate;
> >> + ? ?stop_min ?= ((temp << 8) + (sbr->sample_rate >> 1)) / sbr->sample_rate;
> >> +
> >> + ? ?switch (sbr->sample_rate) {
> >> + ? ?case 16000:
> >> + ? ? ? ?sbr_offset_ptr = sbr_offset[0];
> >> + ? ? ? ?break;
> >> + ? ?case 22050:
> >> + ? ? ? ?sbr_offset_ptr = sbr_offset[1];
> >> + ? ? ? ?break;
> >> + ? ?case 24000:
> >> + ? ? ? ?sbr_offset_ptr = sbr_offset[2];
> >> + ? ? ? ?break;
> >> + ? ?case 32000:
> >> + ? ? ? ?sbr_offset_ptr = sbr_offset[3];
> >> + ? ? ? ?break;
> >> + ? ?case 44100: case 48000: case 64000:
> >> + ? ? ? ?sbr_offset_ptr = sbr_offset[4];
> >> + ? ? ? ?break;
> >> + ? ?case 88200: case 96000: case 128000: case 176400: case 192000:
> >> + ? ? ? ?sbr_offset_ptr = sbr_offset[5];
> >> + ? ? ? ?break;
> >> + ? ?default:
> >> + ? ? ? ?av_log(ac->avccontext, AV_LOG_ERROR,
> >> + ? ? ? ? ? ? ? "Unsupported sample rate for SBR: %d\n", sbr->sample_rate);
> >> + ? ? ? ?return -1;
> >> + ? ?}
> >> +
> >> + ? ?sbr->k[0] = start_min + sbr_offset_ptr[spectrum->bs_start_freq];
> >> +
> >> + ? ?if (spectrum->bs_stop_freq < 14) {
> >> + ? ? ? ?sbr->k[2] = stop_min;
> >
> >> + ? ? ? ?make_bands(stop_dk, stop_min, 64, 13);
> >> + ? ? ? ?qsort(stop_dk, 13, sizeof(stop_dk[0]), qsort_comparison_function_int16);
> >
> > just confirming, this array is not guranteed to be sorted already?
> >
> 
> If it comes out of make_bands it isn't sorted.
> 
> If you inspect the history you will see adding these sorts fixed specific bugs.

understood, though if this is speed relevant, a insertion sort or some
sort that is fast for nearly sorted data would be better than qsort()

[...]
> >
> > [...]
> >> + ? ?// temp == max number of QMF subbands
> >> + ? ?if (sbr->sample_rate <= 32000) {
> >> + ? ? ? ?temp = 48;
> >> + ? ?} else if (sbr->sample_rate == 44100) {
> >> + ? ? ? ?temp = 35;
> >> + ? ?} else if (sbr->sample_rate >= 48000)
> >> + ? ? ? ?temp = 32;
> >> +
> >> + ? ?if (sbr->k[2] - sbr->k[0] > temp) {
> >> + ? ? ? ?av_log(ac->avccontext, AV_LOG_ERROR,
> >> + ? ? ? ? ? ? ? "Invalid bitstream, too many QMF subbands: %d\n", sbr->k[2] - sbr->k[0]);
> >> + ? ? ? ?return -1;
> >> + ? ?}
> >
> >
> >
> >> +
> >> + ? ?return 0;
> >> +}
> >> +
> >> +/// High Frequency Generation - Patch Construction (14496-3 sp04 p216 fig. 4.46)
> >> +static int sbr_hf_calc_npatches(AACContext *ac, SpectralBandReplication *sbr)
> >> +{
> >> + ? ?int i, k, sb = 0;
> >
> >> + ? ?int msb = sbr->k[0];
> >> + ? ?int usb = sbr->k[4];
> >
> > what is usb
> Possibly upper subband?
> > what is msb
> Possibly middle subband?

"possibly"? you dont know what the code you submit means?

[...]
> > and the condition can probably be written wthout odd
> > maybe:
> > sb > ((msb - 1)&~1) + sbr->k[0]
> > works?
> >
> 
> Is there an advantage to eliminating odd?

it appears cleaner to me, i see no other advanatge

[...]

> >> +static void sbr_grid_copy(SBRData *dst, const SBRData *src) {
> >> + ? ?//These variables are saved from the previous frame rather than copied
> >> + ? ?dst->bs_freq_res[0] = dst->bs_freq_res[dst->bs_num_env[1]];
> >> + ? ?dst->bs_num_env[0] ?= dst->bs_num_env[1];
> >> +
> >
> >> + ? ?//These variables are read from the bitstream and therefore copied
> >> + ? ?memcpy(dst->bs_freq_res+1, src->bs_freq_res+1, sizeof(dst->bs_freq_res)-1);
> >> + ? ?memcpy(dst->bs_num_env+1, ?src->bs_num_env+1, ?sizeof(dst->bs_num_env)-1);
> >
> > the len is only correct when its a (u)int8_t
> >
> 
> And it is. What a "lucky" coincidence.

Designed with the hope it wont be changed by anyone ever
-sizeof(*dst->bs_freq_res)
would make it more robust against such change

[...]
> > [...]
> >> +static void sbr_sinusoidal_coding(SpectralBandReplication *sbr,
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?GetBitContext *gb, SBRData *ch_data)
> >> +{
> >> + ? ?int i;
> >> + ? ?for (i = 0; i < sbr->n[1]; i++)
> >> + ? ? ? ?ch_data->bs_add_harmonic[i] = get_bits1(gb);
> >> +}
> >
> > i guess this can be consdered a duplicate of all the other identical
> > cases that read a single linear array of n bits
> >
> 
> Factored out.
> Maybe functionality should be added to get_bits.h?

dunno, not really important currently

> 
> >
> > [...]
> >> +static void sbr_single_channel_element(AACContext *ac,
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? SpectralBandReplication *sbr,
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? GetBitContext *gb)
> >> +{
> >> + ? ?if (get_bits1(gb)) // bs_data_extra
> >> + ? ? ? ?skip_bits(gb, 4); // bs_reserved
> >> +
> >> + ? ?sbr_grid(ac, sbr, gb, &sbr->data[0]);
> >> + ? ?sbr_dtdf(sbr, gb, &sbr->data[0]);
> >> + ? ?sbr_invf(sbr, gb, &sbr->data[0]);
> >> + ? ?sbr_envelope(sbr, gb, &sbr->data[0], 0);
> >> + ? ?sbr_noise(sbr, gb, &sbr->data[0], 0);
> >> +
> >> + ? ?if ((sbr->data[0].bs_add_harmonic_flag = get_bits1(gb)))
> >> + ? ? ? ?sbr_sinusoidal_coding(sbr, gb, &sbr->data[0]);
> >> +}
> >
> > please name functions that primarely read bitstream data consistently
> > like read_sbr_grid()
> > so they can be distinguished from functions performing bitstream reader
> > independant calculations like apply_mdct()
> >
> 
> The fact that they take gb and the first or second parameter should be
> a huge clue.
> Never the less they have been renamed.

thank you

> 
> >
> >> +
> >> +static void sbr_channel_pair_element(AACContext *ac,
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? SpectralBandReplication *sbr,
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? GetBitContext *gb)
> >> +{
> >> + ? ?if (get_bits1(gb)) ? ?// bs_data_extra
> >> + ? ? ? ?skip_bits(gb, 8); // bs_reserved
> >> +
> >
> >> + ? ?if ((sbr->bs_coupling = get_bits1(gb))) {
> >> + ? ? ? ?sbr_grid(ac, sbr, gb, &sbr->data[0]);
> >> + ? ? ? ?sbr_grid_copy(&sbr->data[1], &sbr->data[0]);
> >> + ? ? ? ?sbr_dtdf(sbr, gb, &sbr->data[0]);
> >> + ? ? ? ?sbr_dtdf(sbr, gb, &sbr->data[1]);
> >> + ? ? ? ?sbr_invf(sbr, gb, &sbr->data[0]);
> >> + ? ? ? ?memcpy(sbr->data[1].bs_invf_mode[1], sbr->data[1].bs_invf_mode[0], sizeof(sbr->data[1].bs_invf_mode[0]));
> >> + ? ? ? ?memcpy(sbr->data[1].bs_invf_mode[0], sbr->data[0].bs_invf_mode[0], sizeof(sbr->data[1].bs_invf_mode[0]));
> >> + ? ? ? ?sbr_envelope(sbr, gb, &sbr->data[0], 0);
> >> + ? ? ? ?sbr_noise(sbr, gb, &sbr->data[0], 0);
> >> + ? ? ? ?sbr_envelope(sbr, gb, &sbr->data[1], 1);
> >> + ? ? ? ?sbr_noise(sbr, gb, &sbr->data[1], 1);
> >> + ? ?} else {
> >> + ? ? ? ?sbr_grid(ac, sbr, gb, &sbr->data[0]);
> >> + ? ? ? ?sbr_grid(ac, sbr, gb, &sbr->data[1]);
> >> + ? ? ? ?sbr_dtdf(sbr, gb, &sbr->data[0]);
> >> + ? ? ? ?sbr_dtdf(sbr, gb, &sbr->data[1]);
> >> + ? ? ? ?sbr_invf(sbr, gb, &sbr->data[0]);
> >> + ? ? ? ?sbr_invf(sbr, gb, &sbr->data[1]);
> >> + ? ? ? ?sbr_envelope(sbr, gb, &sbr->data[0], 0);
> >> + ? ? ? ?sbr_envelope(sbr, gb, &sbr->data[1], 1);
> >> + ? ? ? ?sbr_noise(sbr, gb, &sbr->data[0], 0);
> >> + ? ? ? ?sbr_noise(sbr, gb, &sbr->data[1], 1);
> >> + ? ?}
> >
> > can be simplified by factorizing things out
> >
> 
> Can be "simplified" at the expense of a more complicated control flow

yes but fewer calls reduce the object size when the calls have been inlined
by the compiler. Not a big issue at the moment if you prefer it this way

> 
> >
> > [...]
> >> +/// SBR Envelope and Noise Floor Decoding (14496-3 sp04 p201)
> >> +static void sbr_env_noise_floors(SpectralBandReplication *sbr, SBRData *ch_data,
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? int ch)
> >> +{
> >> + ? ?int delta = (ch == 1 && sbr->bs_coupling == 1) + 1;
> >> + ? ?int i, k, L;
> >> + ? ?const int temp = sbr->n[1] & 1;
> >> + ? ?for (L = 0; L < ch_data->bs_num_env[1]; L++) {
> >> + ? ? ? ?if (ch_data->bs_df_env[L]) {
> >> + ? ? ? ? ? ?// bs_freq_res[0] == bs_freq_res[bs_num_env[1]] from prev frame
> >> + ? ? ? ? ? ?if (ch_data->bs_freq_res[L + 1] == ch_data->bs_freq_res[L]) {
> >> + ? ? ? ? ? ? ? ?for (k = 0; k < sbr->n[ch_data->bs_freq_res[L + 1]]; k++)
> >> + ? ? ? ? ? ? ? ? ? ?ch_data->env_facs[L + 1][k] = ch_data->env_facs[L][k] + delta * ch_data->bs_data_env[L][k];
> >> + ? ? ? ? ? ?} else if (ch_data->bs_freq_res[L + 1]) {
> >> + ? ? ? ? ? ? ? ?for (k = 0; k < sbr->n[ch_data->bs_freq_res[L + 1]]; k++) {
> >> + ? ? ? ? ? ? ? ? ? ?i = (k + temp) >> 1; // find i such that f_tablelow[i] <= f_tablehigh[k] < f_tablelow[i + 1]
> >> + ? ? ? ? ? ? ? ? ? ?ch_data->env_facs[L + 1][k] = ch_data->env_facs[L][i] + delta * ch_data->bs_data_env[L][k];
> >> + ? ? ? ? ? ? ? ?}
> >> + ? ? ? ? ? ?} else {
> >> + ? ? ? ? ? ? ? ?for (k = 0; k < sbr->n[ch_data->bs_freq_res[L + 1]]; k++) {
> >> + ? ? ? ? ? ? ? ? ? ?i = k ? 2*k - temp : 0; // find i such that f_tablehigh[i] == f_tablelow[k]
> >> + ? ? ? ? ? ? ? ? ? ?ch_data->env_facs[L + 1][k] = ch_data->env_facs[L][i] + delta * ch_data->bs_data_env[L][k];
> >> + ? ? ? ? ? ? ? ?}
> >> + ? ? ? ? ? ?}
> >> + ? ? ? ?} else {
> >> + ? ? ? ? ? ?ch_data->env_facs[L + 1][0] = delta * ch_data->bs_data_env[L][0];
> >> + ? ? ? ? ? ?for (k = 1; k < sbr->n[ch_data->bs_freq_res[L + 1]]; k++)
> >> + ? ? ? ? ? ? ? ?ch_data->env_facs[L + 1][k] = ch_data->env_facs[L + 1][k - 1] + delta * ch_data->bs_data_env[L][k];
> >> + ? ? ? ?}
> >> + ? ?}
> >
> > please dont use l as variable name it looks like 1 in many fonts and
> > makes the code harder to read.
> > It is not reasonable to expect everyone who wishes to read the code to
> > switch fonts or run a search and replace over the code, like i just did
> > so i can review the code.
> >
> 
> Find a font that doesn't suck. l is the canonical variable for
> iterating over SBR envelopes.

I surely can, but
as said this is unreasonable to expect from every reader of ffmpeg code.
and making the code less readable because of your stubbornness seems
quite unreasonable to me.
Please change l to some other identifer

[...]
> >
> >> + ? ? ? ?for (L = 1; L <= sbr->data[0].bs_num_env[1]; L++) {
> >> + ? ? ? ? ? ?for (k = 0; k < sbr->n[sbr->data[0].bs_freq_res[L]]; k++) {
> >> + ? ? ? ? ? ? ? ?float temp1 = exp2f(sbr->data[0].env_facs[L][k] * alpha + 7.0f);
> >> + ? ? ? ? ? ? ? ?float temp2 = exp2f((pan_offset - sbr->data[1].env_facs[L][k]) * alpha);
> >
> > isnt env_facs integer here? i so this can probably be done more
> > efficiently with a LUT
> >
> 
> env_facs can get quite large. The LAV of the delta coded version is
> 60. I'm not sure of the exact bounds.

alpha can be 0.5 or 1
float uses 8bit exponents
if we assume the float does not overflow, that makes <512 entries
but i very seriously doubt that the dynamic range of any variable is that
large
the question is primarely if this is speedrelevant, if not exp2f() is fine
otherwise a table is (if faster) the better choice

[...]
> >
> >> + ? ?return -1;
> >> +}
> >> +
> >
> >> +/// Generate the subband filtered lowband
> >> +static int sbr_lf_gen(AACContext *ac, SpectralBandReplication *sbr,
> >> + ? ? ? ? ? ? ? ? ? ? ?float X_low[32][40][2], float W[2][32][32][2]) {
> >
> > nothing being const looks suspicous
> >
> 
> not being const is the easiest way to keep gcc from getting pissy
> while looking for real warnings.

gcc does not complain if all your variables are marked properly
it just complains if a subset is marked const

> "Fixed"
> 
> >
> >> + ? ?int k, L;
> >> + ? ?const int t_HFGen = 8;
> >> + ? ?const int l_f = 32;
> >> + ? ?memset(X_low, 0, 32*sizeof(*X_low));
> >> + ? ?for (k = 0; k < sbr->k[4]; k++) {
> >> + ? ? ? ?for (L = t_HFGen; L < l_f + t_HFGen; L++) {
> >> + ? ? ? ? ? ?X_low[k][L][0] = W[1][L - t_HFGen][k][0];
> >> + ? ? ? ? ? ?X_low[k][L][1] = W[1][L - t_HFGen][k][1];
> >> + ? ? ? ?}
> >> + ? ?}
> >> + ? ?for (k = 0; k < sbr->k[3]; k++) {
> >> + ? ? ? ?for (L = 0; L < t_HFGen; L++) {
> >> + ? ? ? ? ? ?X_low[k][L][0] = W[0][L + l_f - t_HFGen][k][0];
> >> + ? ? ? ? ? ?X_low[k][L][1] = W[0][L + l_f - t_HFGen][k][1];
> >> + ? ? ? ?}
> >> + ? ?}
> >
> > the memset above seems to set some elements to 0 that are never read
> 
> It avoids per k calls to memset as well as extra logic at the bottom.
> maybe rectangle filling functions can be used here?

whatever you prefer but a 10kb memset is not good if it can be reduced

> 
> >
> > also cant this copying around be avoided and "W" be accessed directly
> > by the code that uses X_low?
> >
> 
> It breaks down to a trade off of what's more offensive, the extra
> copying or all the extra logic needed to map X_low to W[0] W[1] or 0
> appropriately.

id say its more a question of whats faster&smaller than whats more
offensive

[...]
> > can that be avoided?
> > and before you reply that the *dct needs it in that order in the next
> > function, question is why is it not in that order to begin with?
> >
> 
> Also in most o the rest of the code things are used in real/im pairs
> so it makes sense to keep them next to eachother
> 

> >
> > [...]
> >> + ? ? ? ? ? ? ? ?for (i = ilb; i < iub; i++) {
> >> + ? ? ? ? ? ? ? ? ? ?sum += X_high[m + sbr->k[4]][i][0] * X_high[m + sbr->k[4]][i][0] +
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? X_high[m + sbr->k[4]][i][1] * X_high[m + sbr->k[4]][i][1];
> >> + ? ? ? ? ? ? ? ?}
> >
> > dont we have a optimized funtion for taking the dot product of a
> > real valued float vector alraedy?
> >
> 
> With alignment and length constraints that can't be guaranteed.
> Perhaps an unaligned, arbitrary length version would be a useful
> addition to dsputil.

maybe, anyway this isnt important atm

> 
> >
> >> + ? ? ? ? ? ? ? ?e_curr[L][m] = sum * recip_env_size;
> >> + ? ? ? ? ? ?}
> >> + ? ? ? ?}
> >> + ? ?} else {
> >> + ? ? ? ?int k, p;
> >> +
> >> + ? ? ? ?for (L = 0; L < ch_data->bs_num_env[1]; L++) {
> >> + ? ? ? ? ? ?const int env_size = 2 * (ch_data->t_env[L + 1] - ch_data->t_env[L]);
> >> + ? ? ? ? ? ?int ilb = ch_data->t_env[L] ? ? * 2 + ENVELOPE_ADJUSTMENT_OFFSET;
> >> + ? ? ? ? ? ?int iub = ch_data->t_env[L + 1] * 2 + ENVELOPE_ADJUSTMENT_OFFSET;
> >> + ? ? ? ? ? ?const uint16_t *table = ch_data->bs_freq_res[L + 1] ? sbr->f_tablehigh : sbr->f_tablelow;
> >> +
> >> + ? ? ? ? ? ?for (p = 0; p < sbr->n[ch_data->bs_freq_res[L + 1]]; p++) {
> >> + ? ? ? ? ? ? ? ?float sum = 0.0f;
> >> + ? ? ? ? ? ? ? ?const int den = env_size * (table[p + 1] - table[p]);
> >> +
> >> + ? ? ? ? ? ? ? ?for (k = table[p]; k < table[p + 1]; k++) {
> >
> >> + ? ? ? ? ? ? ? ? ? ?for (i = ilb; i < iub; i++) {
> >> + ? ? ? ? ? ? ? ? ? ? ? ?sum += X_high[k][i][0] * X_high[k][i][0] +
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? X_high[k][i][1] * X_high[k][i][1];
> >> + ? ? ? ? ? ? ? ? ? ?}
> >
> > duplicate
> 
> It's a simple dot product. There are also some duplicate i++s
> scattered across the code as well.

dot_product() or a call to dsp would be nicer 

> 
> >
> >
> > [...]
> >> + ? ?static const int8_t phi[2][4] = {
> >> + ? ? ? ?{ ?1, ?0, -1, ?0}, // real
> >> + ? ? ? ?{ ?0, ?1, ?0, -1}, // imaginary
> >> + ? ?};
> >> + ? ?float (*g_temp)[48] = ch_data->g_temp, (*q_temp)[48] = ch_data->q_temp;
> >> + ? ?int indexnoise = ch_data->f_indexnoise;
> >> + ? ?int indexsine ?= ch_data->f_indexsine;
> >> + ? ?memcpy(Y[0], Y[1], sizeof(Y[0]));
> >> +
> >> + ? ?if (sbr->reset) {
> >> + ? ? ? ?for (i = 0; i < h_SL; i++) {
> >> + ? ? ? ? ? ?memcpy(g_temp[i + 2*ch_data->t_env[0]], sbr->gain[0], m_max * sizeof(sbr->gain[0][0]));
> >> + ? ? ? ? ? ?memcpy(q_temp[i + 2*ch_data->t_env[0]], sbr->q_m[0], ?m_max * sizeof(sbr->q_m[0][0]));
> >> + ? ? ? ?}
> >> + ? ?} else if (h_SL) {
> >> + ? ? ? ?memcpy(g_temp[2*ch_data->t_env[0]], g_temp[2*ch_data->t_env_num_env_old], 4*sizeof(g_temp[0]));
> >> + ? ? ? ?memcpy(q_temp[2*ch_data->t_env[0]], q_temp[2*ch_data->t_env_num_env_old], 4*sizeof(q_temp[0]));
> >> + ? ?}
> >> +
> >> + ? ?for (L = 0; L < ch_data->bs_num_env[1]; L++) {
> >> + ? ? ? ?for (i = 2 * ch_data->t_env[L]; i < 2 * ch_data->t_env[L + 1]; i++) {
> >> + ? ? ? ? ? ?memcpy(g_temp[h_SL + i], sbr->gain[L], m_max * sizeof(sbr->gain[0][0]));
> >> + ? ? ? ? ? ?memcpy(q_temp[h_SL + i], sbr->q_m[L], ?m_max * sizeof(sbr->q_m[0][0]));
> >> + ? ? ? ?}
> >> + ? ?}
> >> +
> >> + ? ?for (L = 0; L < ch_data->bs_num_env[1]; L++) {
> >> + ? ? ? ?for (i = 2 * ch_data->t_env[L]; i < 2 * ch_data->t_env[L + 1]; i++) {
> >> + ? ? ? ? ? ?int phi_sign = (1 - 2*(kx & 1));
> >> +
> >> + ? ? ? ? ? ?if (h_SL && L != l_a[0] && L != l_a[1]) {
> >> + ? ? ? ? ? ? ? ?for (m = 0; m < m_max; m++) {
> >> + ? ? ? ? ? ? ? ? ? ?const int idx1 = i + h_SL;
> >> + ? ? ? ? ? ? ? ? ? ?float g_filt = 0.0f;
> >> + ? ? ? ? ? ? ? ? ? ?for (j = 0; j <= h_SL; j++)
> >> + ? ? ? ? ? ? ? ? ? ? ? ?g_filt += g_temp[idx1 - j][m] * h_smooth[j];
> >> + ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][0] =
> >> + ? ? ? ? ? ? ? ? ? ? ? ?X_high[m + kx][i + ENVELOPE_ADJUSTMENT_OFFSET][0] * g_filt;
> >> + ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][1] =
> >> + ? ? ? ? ? ? ? ? ? ? ? ?X_high[m + kx][i + ENVELOPE_ADJUSTMENT_OFFSET][1] * g_filt;
> >> + ? ? ? ? ? ? ? ?}
> >> + ? ? ? ? ? ?} else {
> >> + ? ? ? ? ? ? ? ?for (m = 0; m < m_max; m++) {
> >> + ? ? ? ? ? ? ? ? ? ?const float g_filt = g_temp[i + h_SL][m];
> >> + ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][0] =
> >> + ? ? ? ? ? ? ? ? ? ? ? ?X_high[m + kx][i + ENVELOPE_ADJUSTMENT_OFFSET][0] * g_filt;
> >> + ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][1] =
> >> + ? ? ? ? ? ? ? ? ? ? ? ?X_high[m + kx][i + ENVELOPE_ADJUSTMENT_OFFSET][1] * g_filt;
> >> + ? ? ? ? ? ? ? ?}
> >> + ? ? ? ? ? ?}
> >> +
> >
> >> + ? ? ? ? ? ?if (L != l_a[0] && L != l_a[1]) {
> >> + ? ? ? ? ? ? ? ?for (m = 0; m < m_max; m++) {
> >> + ? ? ? ? ? ? ? ? ? ?indexnoise = (indexnoise + 1) & 0x1ff;
> >> + ? ? ? ? ? ? ? ? ? ?if (sbr->s_m[L][m]) {
> >> + ? ? ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][0] +=
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?sbr->s_m[L][m] * phi[0][indexsine];
> >> + ? ? ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][1] +=
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?sbr->s_m[L][m] * (phi[1][indexsine] * phi_sign);
> >> + ? ? ? ? ? ? ? ? ? ?} else {
> >> + ? ? ? ? ? ? ? ? ? ? ? ?float q_filt;
> >> + ? ? ? ? ? ? ? ? ? ? ? ?if (h_SL) {
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?const int idx1 = i + h_SL;
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?q_filt = 0.0f;
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?for (j = 0; j <= h_SL; j++)
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?q_filt += q_temp[idx1 - j][m] * h_smooth[j];
> >> + ? ? ? ? ? ? ? ? ? ? ? ?} else {
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?q_filt = q_temp[i][m];
> >> + ? ? ? ? ? ? ? ? ? ? ? ?}
> >> + ? ? ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][0] +=
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?q_filt * sbr_noise_table[indexnoise][0];
> >> + ? ? ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][1] +=
> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?q_filt * sbr_noise_table[indexnoise][1];
> >> + ? ? ? ? ? ? ? ? ? ?}
> >> + ? ? ? ? ? ? ? ? ? ?phi_sign = -phi_sign;
> >> + ? ? ? ? ? ? ? ?}
> >> + ? ? ? ? ? ?} else {
> >> + ? ? ? ? ? ? ? ?indexnoise = (indexnoise + m_max) & 0x1ff;
> >> + ? ? ? ? ? ? ? ?for (m = 0; m < m_max; m++) {
> >> + ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][0] +=
> >> + ? ? ? ? ? ? ? ? ? ? ? ?sbr->s_m[L][m] * phi[0][indexsine];
> >> + ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][1] +=
> >> + ? ? ? ? ? ? ? ? ? ? ? ?sbr->s_m[L][m] * (phi[1][indexsine] * phi_sign);
> >> + ? ? ? ? ? ? ? ? ? ?phi_sign = -phi_sign;
> >> + ? ? ? ? ? ? ? ?}
> >> + ? ? ? ? ? ?}
> >> + ? ? ? ? ? ?indexsine = (indexsine + 1) & 3;
> >
> > all the multiplies by phi* are 0,1,-1
> > half of them are by 0
> >
> >
> 
> Additional loop unrolling can be done at a later date. This is
> sufficient for the time being.

I thought more of writting it along the lines of:
-Y[1][i][m + kx][0] += sbr->s_m[L][m] * phi[0][indexsine];
-Y[1][i][m + kx][1] += sbr->s_m[L][m] * phi[1][indexsine] * phi_sign;
+Y[1][i][m + kx][idx] += sbr->s_m[L][m] * phi_sign;

> 
> >
> > [...]
> >> +
> >> +/**
> >> + * Spectral Band Replication header - spectrum parameters that invoke a reset if they differ from the previous header.
> >> + */
> >> +typedef struct {
> >> + ? ?uint8_t bs_start_freq;
> >> + ? ?uint8_t bs_stop_freq;
> >> + ? ?uint8_t bs_xover_band;
> >> +
> >> + ? ?/**
> >> + ? ? * @defgroup bs_header_extra_1 ? ? Variables associated with bs_header_extra_1
> >> + ? ? * @{
> >> + ? ? */
> >> + ? ?uint8_t bs_freq_scale;
> >> + ? ?uint8_t bs_alter_scale;
> >> + ? ?uint8_t bs_noise_bands;
> >> + ? ?/** @} */
> >> +} SpectrumParameters;
> >> +
> >> +#define SBR_SYNTHESIS_BUF_SIZE ((1280-128)*2)
> >> +
> >> +/**
> >> + * Spectral Band Replication per channel data
> >> + */
> >> +typedef struct {
> >> + ? ?/**
> >> + ? ? * @defgroup bitstream ? ? Main bitstream data variables
> >> + ? ? * @{
> >> + ? ? */
> >> + ? ?uint8_t ? ? ? ? ? ?bs_frame_class;
> >> + ? ?uint8_t ? ? ? ? ? ?bs_add_harmonic_flag;
> >> + ? ?uint8_t ? ? ? ? ? ?bs_num_env[2];
> >> + ? ?uint8_t ? ? ? ? ? ?bs_freq_res[7];
> >> + ? ?uint8_t ? ? ? ? ? ?bs_var_bord[2];
> >> + ? ?uint8_t ? ? ? ? ? ?bs_num_rel[2];
> >> + ? ?uint8_t ? ? ? ? ? ?bs_rel_bord[2][3];
> >> + ? ?uint8_t ? ? ? ? ? ?bs_pointer;
> >> + ? ?uint8_t ? ? ? ? ? ?bs_num_noise;
> >> + ? ?uint8_t ? ? ? ? ? ?bs_df_env[5];
> >> + ? ?uint8_t ? ? ? ? ? ?bs_df_noise[2];
> >> + ? ?uint8_t ? ? ? ? ? ?bs_invf_mode[2][5];
> >> + ? ?int32_t ? ? ? ? ? ?bs_data_env[7][32];
> >> + ? ?int32_t ? ? ? ? ? ?bs_data_noise[2][5];
> >> + ? ?uint8_t ? ? ? ? ? ?bs_add_harmonic[48];
> >> + ? ?uint8_t ? ? ? ? ? ?bs_amp_res;
> >> + ? ?/** @} */
> >> +
> >> + ? ?/**
> >> + ? ? * @defgroup state ? ? ? ? State variables
> >> + ? ? * @{
> >> + ? ? */
> >> + ? ?DECLARE_ALIGNED(16, float, synthesis_filterbank_samples)[SBR_SYNTHESIS_BUF_SIZE];
> >> + ? ?DECLARE_ALIGNED(16, float, analysis_filterbank_samples) [1312];
> >> + ? ?int ? ? ? ? ? ? ? ?synthesis_filterbank_samples_offset;
> >> + ? ?int ? ? ? ? ? ? ? ?l_a[2];
> >> + ? ?float ? ? ? ? ? ? ?bw_array[2][5];
> >> + ? ?float ? ? ? ? ? ? ?W[2][32][32][2];
> >> + ? ?float ? ? ? ? ? ? ?Y[2][38][64][2];
> >> + ? ?float ? ? ? ? ? ? ?g_temp[42][48];
> >> + ? ?float ? ? ? ? ? ? ?q_temp[42][48];
> >> + ? ?uint8_t ? ? ? ? ? ?s_indexmapped[8][48];
> >> + ? ?float ? ? ? ? ? ? ?env_facs[6][48];
> >> + ? ?float ? ? ? ? ? ? ?noise_facs[3][5];
> >> + ? ?uint8_t ? ? ? ? ? ?t_env[8];
> >> + ? ?uint8_t ? ? ? ? ? ?t_env_num_env_old;
> >> + ? ?uint8_t ? ? ? ? ? ?t_q[3];
> >> + ? ?uint16_t ? ? ? ? ? f_indexnoise;
> >> + ? ?uint8_t ? ? ? ? ? ?f_indexsine;
> >> + ? ?/** @} */
> >> +} SBRData;
> >> +
> >> +/**
> >> + * Spectral Band Replication
> >> + */
> >> +typedef struct {
> >> + ? ?int32_t ? ? ? ? ? ?sample_rate;
> >> + ? ?uint8_t ? ? ? ? ? ?start;
> >> + ? ?uint8_t ? ? ? ? ? ?reset;
> >> + ? ?SpectrumParameters spectrum_params[2];
> >> + ? ?uint8_t ? ? ? ? ? ?bs_amp_res_header;
> >> + ? ?/**
> >> + ? ? * @defgroup bs_header_extra_2 ? ? variables associated with bs_header_extra_2
> >> + ? ? * @{
> >> + ? ? */
> >> + ? ?uint8_t ? ? ? ? ? ?bs_limiter_bands;
> >> + ? ?uint8_t ? ? ? ? ? ?bs_limiter_gains;
> >> + ? ?uint8_t ? ? ? ? ? ?bs_interpol_freq;
> >> + ? ?uint8_t ? ? ? ? ? ?bs_smoothing_mode;
> >> + ? ?/** @} */
> >> + ? ?uint8_t ? ? ? ? ? ?bs_coupling;
> >
> >> + ? ?uint8_t ? ? ? ? ? ?k[5]; ///< k0, k1, k2, kx', and kx respectively
> >
> > this seems never used as an array, why is it one?
> >
> 
> I don't know.
> 
> kx and kx' are together due to convention
> 

> > also why are so many variables fixed small sizes? this can cause a
> > speed loss on some architectures, and even x86 with gcc seems to have a
> > dislike for such things from what ive seen recently in h264 optims
> >
> 
> My guess is it was done like this because the SBR context is so large
> and per channel.

what speed effect does it have to change them to int (except (large) arrays) ?

> 
> >
> >> + ? ?uint8_t ? ? ? ? ? ?m[2]; ///< M' and M respectively
> >> + ? ?uint8_t ? ? ? ? ? ?n_master;
> >> + ? ?SBRData ? ? ? ? ? ?data[2];
> >> + ? ?uint8_t ? ? ? ? ? ?n[2]; ///< n_low and n_high respectively
> >> + ? ?uint8_t ? ? ? ? ? ?n_q;
> >> + ? ?uint8_t ? ? ? ? ? ?n_lim;
> >> + ? ?uint16_t ? ? ? ? ? f_master[49];
> >> + ? ?uint16_t ? ? ? ? ? f_tablelow[25];
> >> + ? ?uint16_t ? ? ? ? ? f_tablehigh[49];
> >> + ? ?uint16_t ? ? ? ? ? f_tablenoise[6];
> >> + ? ?uint16_t ? ? ? ? ? f_tablelim[29];
> > [...]
> >> + ? ?float ? ? ? ? ? ? ?X_low[32][40][2];
> >> + ? ?float ? ? ? ? ? ? ?X_high[64][40][2];
> >> + ? ?DECLARE_ALIGNED(16, float, X)[2][32][64];
> >> + ? ?float ? ? ? ? ? ? ?alpha0[64][2];
> >> + ? ?float ? ? ? ? ? ? ?alpha1[64][2];
> >> + ? ?float ? ? ? ? ? ? ?e_origmapped[7][48];
> >> + ? ?float ? ? ? ? ? ? ?q_mapped[7][48];
> >> + ? ?uint8_t ? ? ? ? ? ?s_mapped[7][48];
> >> + ? ?float ? ? ? ? ? ? ?e_curr[7][48];
> >> + ? ?float ? ? ? ? ? ? ?q_m[7][48];
> >> + ? ?float ? ? ? ? ? ? ?s_m[7][48];
> >
> > It would tremendously help code readability and thus also me reviewing
> > if these variables where things that made sense and did not require one
> > to look _each_ up in a spec that isnt freely available.
> >
> 
> I find it much easier when writing the code and fixing bugs if the
> variables do match the spec rather than having to look across some
> equivalence table.

So you dont know yourself what these variables mean.
its just 2 funny letters that are easy to find in the spec.
the problem with this is that understanding is a prerequesite for writing
a good implementation. Good both in speed as well as readability.
With h264 for example this would amount to 5 recursive calls through
functions instead of x[-1] to get the left pixel. Its clear the aac spec
is not written that poorly but still id expect quite a bit of simplifications
to become obvious if one could look at a piece of code and actually
understand what the variables used contain instead of having to look that up
for each in some equivalence table.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No great genius has ever existed without some touch of madness. -- Aristotle
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100306/7e03bfd2/attachment.pgp>