[FFmpeg-devel] [PATCH] RealAudio 14.4K encoder

Michael Niedermayer michaelni
Sat May 22 03:27:18 CEST 2010


On Sun, May 16, 2010 at 04:46:37PM +0200, Francesco Lavra wrote:
> On Mon, 2010-05-10 at 12:16 +0200, Michael Niedermayer wrote:
> > > > > +/**
> > > > > + * Calculates match score and gain of an LPC-filtered vector with respect to
> > > > > +   input data
> > > > > + * @param block array used to calculate the filtered vector
> > > > > + * @param coefs coefficients of the LPC filter
> > > > > + * @param vect original vector
> > > > > + * @param data input data
> > > > > + * @param score pointer to variable where match score is returned
> > > > > + * @param gain pointer to variable where gain is returned
> > > > > + */
> > > > > +static void get_match_score(int16_t *block, const int16_t *coefs,
> > > > > +                            const int16_t *vect, const int16_t *data,
> > > > > +                            float *score, int *gain)
> > > > > +{
> > > > > +    float c, g;
> > > > > +    int i;
> > > > > +
> > > > > +    if (ff_celp_lp_synthesis_filter(block, coefs, vect, BLOCKSIZE, LPC_ORDER, 1,
> > > > > +                                    0x800)) {
> > > > > +        *score = 0;
> > > > > +        return;
> > > > > +    }
> > > > > +    c = g = 0;
> > > > > +    for (i = 0; i < BLOCKSIZE; i++) {
> > > > > +        g += block[i] * block[i];
> > > > > +        c += data[i] * block[i];
> > > > > +    }
> > > > > +    if (!g || (c <= 0)) {
> > > > 
> > > > the !g check is redundant
> > > 
> > > Why? If a codebook vector gets zeroed by the LPC filter, g will be zero,
> > > and we don't want the match score to be NaN.
> > 
> > g can only be 0 if all block[i] are 0
> > if all block[i] are 0 then c is 0
> 
> Right, fixed.
> 
> > > > > +
> > > > > +
> > > > > +/**
> > > > > + * Performs gain quantization
> > > > > + * @param block array used to calculate filtered vectors
> > > > > + * @param lpc_coefs coefficients of the LPC filter
> > > > > + * @param cba_vect vector containing the best entry from the adaptive codebook,
> > > > > + *                 or NULL if the adaptive codebook is not used
> > > > > + * @param cb1_idx index of the best entry of the first fixed codebook
> > > > > + * @param cb2_idx index of the best entry of the second fixed codebook
> > > > > + * @param rms RMS of the reflection coefficients
> > > > > + * @param data input data
> > > > > + * @return index of the best entry of the gain table
> > > > > + */
> > > > > +static int quantize_gains(int16_t *block, const int16_t *lpc_coefs,
> > > > > +                          const int16_t *cba_vect, int cb1_idx, int cb2_idx,
> > > > > +                          unsigned int rms, const int16_t* data)
> > > > > +{
> > > > > +    float distance, best_distance;
> > > > > +    int i, n, index;
> > > > > +    unsigned int m[3];
> > > > > +    int16_t exc[BLOCKSIZE]; /**< excitation vector */
> > > > > +
> > > > > +    if (cba_vect)
> > > > > +        m[0] = (irms(cba_vect) * rms) >> 12;
> > > > > +    m[1] = (cb1_base[cb1_idx] * rms) >> 8;
> > > > > +    m[2] = (cb2_base[cb2_idx] * rms) >> 8;
> > > > > +    best_distance = -1;
> > > > 
> > > > FLOAT_MAX
> > > 
> > > If you meant MAXFLOAT, fixed.
> > 
> > i meant FLT_MAX as in the C spec
> 
> Fixed.
>  
> > > > > +    for (n = 0; n < 256; n++) {
> > > > > +        distance = 0;
> > > > > +        add_wav(exc, n, (int)cba_vect, m, cba_vect, cb1_vects[cb1_idx],
> > > > > +                cb2_vects[cb2_idx]);
> > > > > +        if (ff_celp_lp_synthesis_filter(block, lpc_coefs, exc, BLOCKSIZE,
> > > > > +                                        LPC_ORDER, 1, 0xfff))
> > > > > +            continue;
> > > > > +        for (i = 0; i < BLOCKSIZE; i++)
> > > > > +            distance += (block[i] - data[i]) * (block[i] - data[i]);
> > > > > +        if ((distance < best_distance) || (best_distance < 0)) {
> > > > > +            best_distance = distance;
> > > > > +            index = n;
> > > > > +        }
> > > > > +    }
> > > > 
> > > > id guess this could be done faster than by brute force
> > > 
> > > I can't think of any algorithm which avoids searching the entire table
> > > without risking to miss the optimal entry;
> > 
> > if you ignore rounding then add_wav does a weighted sum of 3 vectors
> > x*A + y*B + z*C
> > ff_celp_lp_synthesis_filter performs a linear transformation on the result
> > (that could be viewed as a matrix multiplication)
> > M*(x*A + y*B + z*C)
> > and it adds the past samples padded with zero and filtered similarly
> > M*(x*A + y*B + z*C) + P
> > 
> > this can also be written as: (its just the distributive law)
> > x*M*A + y*M*B + z*M*C + P
> > 
> > that is to do something like ff_celp_lp_synthesis_filter() 
> > 1. on the past samples padded with zeros
> > 2. on all the 3 code book vectors
> > but note this ff_celp_lp_synthesis_filter() must not clip values, so maybe
> > the float counterpart (ff_celp_lp_synthesis_filterf) would be easiest here
> > 
> > after that you only have to find the best x,y,z from the table
> > that minimize that (this can be done fast as well but lets look into this
> > once this optimization suceedded (or failed))
> 
> Done as you suggested, using ff_celp_lp_synthesis_filterf().
> 
> > Also once you found the best x,y,z with unclipped floats, it makes
> > sense to run something like the current brute force loop on all
> > entries that scored well in the optimized case. So we do not skip
> > considering rounding
> 
> In the attached implementation, the five best gain entries are found
> with floating point data, and for each of these entries the brute force
> method calculates the actual encoding error. Five is a tweakable
> parameter.
> 

> > > > > +    /**
> > > > > +     * TODO: orthogonalize the best entry of the adaptive codebook with the
> > > > > +     * basis vectors of the first fixed codebook, and the best entry of the
> > > > > +     * first fixed codebook with the basis vectors of the second fixed codebook.
> > > > > +     */
> > > > 
> > > > yes, also shouldnt the search be iterative instead of just one pass?
> > > 
> > > I tried inserting several iteration runs to find the optimal entries of
> > > the fixed codebooks, but rarely the entries found on the second and
> > > subsequent iterations are different from the first chioces, and in any
> > > case I couldn't hear any improvement in quality, so the iterative method
> > > doesn't seem to bring any added value.
> > 
> > did you orthogonalize the entries?
> 
> I didn't, but now I have. 

how much PSNR improvement has this brought?


> No iterative search is performed, since in the
> algorithm at http://focus.ti.com/lit/an/spra136/spra136.pdf there is no
> mention of multiple iterations.

are you serious?
because some 16 year old paper that smells a bit like a description of a low
complexity hardware encoder doesnt do it you dont even try?
hell if we where designing video encoders like that ...

dont missunderstand me here, iam perfectly ok if you dont want to try this
because its too much work or some other reason. but just because some
paper doesnt mention it uhm


[...]
> +/**
> + * Quantizes a value by searching a sorted table for the element with the
> + * nearest value
> + *
> + * @param value value to quantize
> + * @param table array containing the quantization table
> + * @param size size of the quantization table
> + * @return index of the quantization table corresponding to the element with the
> + *         nearest value
> + */
> +static int quantize(int value, const int16_t *table, unsigned int size)
> +{
> +    int error;
> +    int index;
> +    unsigned int low = 0, high = size - 1;
> +
> +    while (1) {

> +        index = (low + high) >> 1;
> +        error = table[index] - value;

declaration and initialization can be merged


> +        if (index == low)
> +            return table[high] + error > value ? low : high;
> +        if (error > 0) {
> +            high = index;
> +        } else {
> +            low = index;
> +        }
> +    }
> +}
> +
> +

> +/**
> + * Orthogonalizes a vector to another vector
> + *
> + * @param v vector to orthogonalize
> + * @param u vector against which orthogonalization is performed
> + */
> +static void orthogonalize(float *v, const float *u)

missing const


> +{
> +    int i;
> +    float num = 0, den = 0;
> +
> +    for (i = 0; i < BLOCKSIZE; i++) {
> +        num += v[i] * u[i];
> +        den += u[i] * u[i];
> +    }

> +    for (i = 0; i < BLOCKSIZE; i++)
> +        v[i] -= (num * u[i]) / den;

num /= den;
for (i = 0; i < BLOCKSIZE; i++)
    v[i] -= num * u[i];


[...]
> +/**
> + * Searches the adaptive codebook for the best entry and gain and removes its
> + * contribution from input data
> + *
> + * @param adapt_cb array from which the adaptive codebook is extracted
> + * @param work array used to calculate LPC-filtered vectors
> + * @param coefs coefficients of the LPC filter
> + * @param data input data
> + * @return index of the best entry of the adaptive codebook
> + */
> +static int adaptive_cb_search(const int16_t *adapt_cb, float *work,
> +                              const float *coefs, float *data)
> +{
> +    int i, j, best_vect;
> +    float score, gain, best_score, best_gain;
> +    float exc[BLOCKSIZE];
> +
> +    gain = best_score = 0;
> +    for (i = BLOCKSIZE / 2; i <= BUFFERSIZE; i++) {
> +        for (j = 0; j < FFMIN(BLOCKSIZE, i); j++)
> +            exc[j] = adapt_cb[BUFFERSIZE - i + j];
> +        if (i < BLOCKSIZE)
> +            for (j = 0; j < BLOCKSIZE - i; j++)
> +                exc[i + j] = adapt_cb[BUFFERSIZE - i + j];
> +        get_match_score(work, coefs, exc, NULL, NULL, data, &score, &gain);
> +        if (score > best_score) {
> +            best_score = score;
> +            best_vect = i;
> +            best_gain = gain;
> +        }
> +    }
> +    if (!best_score)
> +        return 0;
> +
> +    /**
> +     * Re-calculate the filtered vector from the vector with maximum match score
> +     * and remove its contribution from input data.
> +     */

> +    for (j = 0; j < FFMIN(BLOCKSIZE, best_vect); j++)
> +        exc[j] = adapt_cb[BUFFERSIZE - best_vect + j];
> +    if (i < BLOCKSIZE)
> +        for (j = 0; j < BLOCKSIZE - best_vect; j++)
> +            exc[best_vect + j] = adapt_cb[BUFFERSIZE - best_vect + j];

code duplication


> +    ff_celp_lp_synthesis_filterf(work, coefs, exc, BLOCKSIZE, LPC_ORDER);



> +    for (i = 0; i < BLOCKSIZE; i++)
> +        data[i] -= best_gain * work[i];
> +    return (best_vect - BLOCKSIZE / 2 + 1);
> +}
> +
> +

> +/**
> + * Searches the two fixed codebooks for the best entry and gain
> + *
> + * @param work array used to calculate LPC-filtered vectors
> + * @param coefs coefficients of the LPC filter
> + * @param data input data
> + * @param cba_idx index of the best entry of the adaptive codebook
> + * @param cb1_idx pointer to variable where the index of the best entry of the
> + *        first fixed codebook is returned
> + * @param cb2_idx pointer to variable where the index of the best entry of the
> + *        second fixed codebook is returned
> + */
> +static void fixed_cb_search(float *work, const float *coefs, float *data,
> +                            int cba_idx, int *cb1_idx, int *cb2_idx)
> +{
> +    int i, j, ortho_cb1;
> +    float score, gain, best_score, best_gain;
> +    float cba_vect[BLOCKSIZE], cb1_vect[BLOCKSIZE];
> +    float vect[BLOCKSIZE];
> +
> +    /**
> +     * The filtered vector from the adaptive codebook can be retrieved from
> +     * work, because this function is called just after adaptive_cb_search().
> +     */
> +    if (cba_idx)
> +        memcpy(cba_vect, work, sizeof(cba_vect));
> +
> +    *cb1_idx = gain = best_score = best_gain = 0;
> +    for (i = 0; i < FIXED_CB_SIZE; i++) {
> +        for (j = 0; j < BLOCKSIZE; j++)
> +            vect[j] = ff_cb1_vects[i][j];
> +        get_match_score(work, coefs, vect, cba_idx ? cba_vect: NULL, NULL, data,
> +                        &score, &gain);
> +        if (score > best_score) {
> +            best_score = score;
> +            *cb1_idx = i;
> +            best_gain = gain;
> +        }
> +    }
> +
> +    /**
> +     * Re-calculate the filtered vector from the vector with maximum match score
> +     * and remove its contribution from input data.
> +     */
> +    if (best_gain) {
> +        for (j = 0; j < BLOCKSIZE; j++)
> +            vect[j] = ff_cb1_vects[*cb1_idx][j];
> +        ff_celp_lp_synthesis_filterf(work, coefs, vect, BLOCKSIZE, LPC_ORDER);
> +        if (cba_idx)
> +            orthogonalize(work, cba_vect);
> +        for (i = 0; i < BLOCKSIZE; i++)
> +            data[i] -= best_gain * work[i];
> +        memcpy(cb1_vect, work, sizeof(cb1_vect));
> +        ortho_cb1 = 1;
> +    } else
> +        ortho_cb1 = 0;
> +

> +    *cb2_idx = best_score = best_gain = 0;
> +    for (i = 0; i < FIXED_CB_SIZE; i++) {
> +        for (j = 0; j < BLOCKSIZE; j++)
> +            vect[j] = ff_cb2_vects[i][j];
> +        get_match_score(work, coefs, vect, cba_idx ? cba_vect : NULL,
> +                        ortho_cb1 ? cb1_vect : NULL, data, &score, &gain);
> +        if (score > best_score) {
> +            best_score = score;
> +            *cb2_idx = i;
> +            best_gain = gain;
> +        }
> +    }

duplicate


> +}
> +
> +
> +/**
> + * Encode a subblock of the current frame
> + *
> + * @param ractx encoder context
> + * @param sblock_data input data of the subblock
> + * @param lpc_coefs coefficients of the LPC filter
> + * @param rms RMS of the reflection coefficients
> + * @param pb pointer to PutBitContext of the current frame
> + */
> +static void ra144_encode_subblock(RA144Context *ractx,
> +                                  const int16_t *sblock_data,
> +                                  const int16_t *lpc_coefs, unsigned int rms,
> +                                  PutBitContext *pb)
> +{
> +#define NUM_BEST_GAINS  5
> +
> +    float data[BLOCKSIZE], work[LPC_ORDER + BLOCKSIZE];
> +    float coefs[LPC_ORDER];
> +    float zero[BLOCKSIZE], cba[BLOCKSIZE], cb1[BLOCKSIZE], cb2[BLOCKSIZE];
> +    int16_t cba_vect[BLOCKSIZE], exc[BLOCKSIZE];
> +    int cba_idx, cb1_idx, cb2_idx, gain;
> +    int i, n, m[3];
> +    float g[3];
> +    float error, best_errors[NUM_BEST_GAINS];
> +    int indexes[NUM_BEST_GAINS];
> +
> +    for (i = 0; i < LPC_ORDER; i++) {
> +        work[i] = ractx->curr_sblock[BLOCKSIZE + i];
> +        coefs[i] = lpc_coefs[i] / 4096.0;

* (1/4096.0);
multiplies are faster than divides


> +    }
> +
> +    /**
> +     * Calculate the zero-input response of the LPC filter and subtract it from
> +     * input data.
> +     */
> +    memset(data, 0, sizeof(data));
> +    ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, data, BLOCKSIZE,
> +                                 LPC_ORDER);
> +    for (i = 0; i < BLOCKSIZE; i++) {
> +        zero[i] = work[LPC_ORDER + i];
> +        data[i] = sblock_data[i] - zero[i];
> +    }
> +
> +    /**
> +     * Codebook search is performed without taking into account the contribution
> +     * of the previous subblock, since it has been just subtracted from input
> +     * data.
> +     */
> +    memset(work, 0, LPC_ORDER * sizeof(*work));
> +
> +    cba_idx = adaptive_cb_search(ractx->adapt_cb, work + LPC_ORDER, coefs,
> +                                 data);
> +    if (cba_idx) {
> +        /**
> +         * The filtered vector from the adaptive codebook can be retrieved from
> +         * work, see implementation of adaptive_cb_search().
> +         */
> +        memcpy(cba, work + LPC_ORDER, sizeof(cba));
> +
> +        ff_copy_and_dup(cba_vect, ractx->adapt_cb, cba_idx + BLOCKSIZE / 2 - 1);
> +        m[0] = (ff_irms(cba_vect) * rms) >> 12;
> +    }
> +    fixed_cb_search(work + LPC_ORDER, coefs, data, cba_idx, &cb1_idx, &cb2_idx);
> +    for (i = 0; i < BLOCKSIZE; i++) {
> +        cb1[i] = ff_cb1_vects[cb1_idx][i];
> +        cb2[i] = ff_cb2_vects[cb2_idx][i];
> +    }
> +    ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, cb1, BLOCKSIZE,
> +                                 LPC_ORDER);
> +    memcpy(cb1, work + LPC_ORDER, sizeof(cb1));
> +    m[1] = (ff_cb1_base[cb1_idx] * rms) >> 8;
> +    ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, cb2, BLOCKSIZE,
> +                                 LPC_ORDER);
> +    memcpy(cb2, work + LPC_ORDER, sizeof(cb2));
> +    m[2] = (ff_cb2_base[cb2_idx] * rms) >> 8;
> +
> +    /**
> +     * Gain quantization is performed taking the NUM_BEST_GAINS best entries
> +     * obtained from floating point data and calculating for each entry the
> +     * actual encoding error with fixed point data.
> +     */
> +    for (i = 0; i < NUM_BEST_GAINS; i++) {
> +        best_errors[i] = FLT_MAX;
> +        indexes[i] = -1;
> +    }
> +    for (n = 0; n < 256; n++) {
> +        g[1] = ((ff_gain_val_tab[n][1] * m[1]) >> ff_gain_exp_tab[n]) / 4096.0;
> +        g[2] = ((ff_gain_val_tab[n][2] * m[2]) >> ff_gain_exp_tab[n]) / 4096.0;
> +        error = 0;
> +        if (cba_idx) {
> +            g[0] = ((ff_gain_val_tab[n][0] * m[0]) >> ff_gain_exp_tab[n]) /
> +                   4096.0;
> +            for (i = 0; i < BLOCKSIZE; i++) {
> +                data[i] = zero[i] + g[0] * cba[i] + g[1] * cb1[i] +
> +                          g[2] * cb2[i];
> +                error += (data[i] - sblock_data[i]) *
> +                         (data[i] - sblock_data[i]);
> +            }
> +        } else {
> +            for (i = 0; i < BLOCKSIZE; i++) {
> +                data[i] = zero[i] + g[1] * cb1[i] + g[2] * cb2[i];
> +                error += (data[i] - sblock_data[i]) *
> +                         (data[i] - sblock_data[i]);
> +            }
> +        }

> +        for (i = 0; i < NUM_BEST_GAINS; i++)
> +            if (error < best_errors[i]) {
> +                best_errors[i] = error;
> +                indexes[i] = n;
> +                break;
> +            }

this does not keep the 5 best
it only gurantees to keep the 1 best

you are testing your changes in terms of PSNR, arent you?
if not, we need to go back to the last patch and test each change
individually.
I  very much prefer naive and slow code compared to optimized but
untested and thus buggy code. we alraedy have a vorbis and aac encoder
</rant>

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100522/052fbfa8/attachment.pgp>



More information about the ffmpeg-devel mailing list