[FFmpeg-devel] [PATCH] RealAudio 14.4K encoder

Sat May 8 20:57:39 CEST 2010

On Wed, 2010-05-05 at 18:26 +0200, Michael Niedermayer wrote:
> On Sun, May 02, 2010 at 08:06:47PM +0200, Francesco Lavra wrote:
> > Hi,
> [...]
> > +    if (avctx->frame_size > 0) {
> > +        if (avctx->frame_size != NBLOCKS * BLOCKSIZE) {
> > +            av_log(avctx, AV_LOG_ERROR, "invalid block size: %d\n",
> > +                   avctx->frame_size);
> > +            return -1;
> > +        }
> > +    } else {
> > +        avctx->frame_size = NBLOCKS * BLOCKSIZE;
> > +    }
> 
> is this complexity needed?

I took that piece of code from flacenc.c, but it seems I chose the worst
file to copy from, since that check on avctx->frame_size is not done
anywhere else. Removed.

> > +/**
> > + * Quantizes a value by searching a table for the element with the nearest value
> > + * @param value value to quantize
> > + * @param table array containing the quantization table
> > + * @param size size of the quantization table
> > + * @return index of the quantization table corresponding to the element with the
> > + *         nearest value
> > + */
> 
> > +static int quantize(int value, const int16_t *table, unsigned int size)
> > +{
> > +    int i, best_index;
> > +    int err, min_err;
> > +
> > +    best_index = 0;
> > +    min_err = FFABS(value - table[0]);
> > +    for (i = 1; i < size; i++) {
> > +        err = FFABS(value - table[i]);
> > +        if (err < min_err) {
> > +            best_index = i;
> > +            min_err = err;
> > +        }
> > +    }
> > +    return best_index;
> > +}
> 
> can be done faster through binary search

Done.
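A minimal sketch of what a binary-search version of quantize() could look like. It assumes the quantization table is sorted in ascending order, which a nearest-value search by bisection requires. The name quantize_sorted() is hypothetical and this is not the code from the updated patch.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical binary-search variant of quantize(): returns the index of the
 * element of table[] closest to value. Assumes table[] is sorted in
 * ascending order and size >= 1. On a tie between the two neighbouring
 * entries the lower index wins, as with the strict '<' in the linear search.
 */
static int quantize_sorted(int value, const int16_t *table, unsigned int size)
{
    unsigned int low = 0, high = size - 1;

    /* Bisect until low and high are adjacent entries bracketing value. */
    while (high - low > 1) {
        unsigned int mid = low + (high - low) / 2;
        if (table[mid] > value)
            high = mid;
        else
            low = mid;
    }
    /* Only the two neighbours can be nearest; prefer low on a tie. */
    return abs(value - table[high]) < abs(value - table[low]) ? (int)high
                                                              : (int)low;
}
```

Bisection reduces the search from size comparisons to roughly log2(size), followed by a single comparison of the two entries surrounding value.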

> > +/**
> > + * Calculates match score and gain of an LPC-filtered vector with respect to
> > +   input data
> > + * @param block array used to calculate the filtered vector
> > + * @param coefs coefficients of the LPC filter
> > + * @param vect original vector
> > + * @param data input data
> > + * @param score pointer to variable where match score is returned
> > + * @param gain pointer to variable where gain is returned
> > + */
> > +static void get_match_score(int16_t *block, const int16_t *coefs,
> > +                            const int16_t *vect, const int16_t *data,
> > +                            float *score, int *gain)
> > +{
> > +    float c, g;
> > +    int i;
> > +
> > +    if (ff_celp_lp_synthesis_filter(block, coefs, vect, BLOCKSIZE, LPC_ORDER, 1,
> > +                                    0x800)) {
> > +        *score = 0;
> > +        return;
> > +    }
> > +    c = g = 0;
> > +    for (i = 0; i < BLOCKSIZE; i++) {
> > +        g += block[i] * block[i];
> > +        c += data[i] * block[i];
> > +    }
> > +    if (!g || (c <= 0)) {
> 
> the !g check is redundant

Why? If a codebook vector gets zeroed by the LPC filter, g will be zero,
and we don't want the match score to be NaN.
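For context, the part of get_match_score() elided above presumably turns c and g into the score and gain. A minimal sketch of the normalized-correlation criterion commonly used in CELP codebook searches (score = c*c/g, optimal gain = c/g) is shown here. The function name, the explicit length parameter and the float gain are assumptions for illustration, not the patch's code.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of a CELP-style match score:
 *   c = <data, filtered>, g = <filtered, filtered>
 *   score = c * c / g, gain = c / g
 */
static void match_score_sketch(const int16_t *filtered, const int16_t *data,
                               int n, float *score, float *gain)
{
    float c = 0.0f, g = 0.0f;

    for (int i = 0; i < n; i++) {
        g += (float)filtered[i] * filtered[i];
        c += (float)data[i] * filtered[i];
    }
    /* Reject vectors that do not correlate positively with the target.
     * The g == 0 test guards the division; note that g == 0 can only
     * happen when every filtered sample is zero, in which case c == 0. */
    if (g == 0.0f || c <= 0.0f) {
        *score = 0.0f;
        *gain = 0.0f;
        return;
    }
    *score = c * c / g;
    *gain = c / g;
}
```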

> [...]
> > +
> > +
> > +/**
> > + * Performs gain quantization
> > + * @param block array used to calculate filtered vectors
> > + * @param lpc_coefs coefficients of the LPC filter
> > + * @param cba_vect vector containing the best entry from the adaptive codebook,
> > + *                 or NULL if the adaptive codebook is not used
> > + * @param cb1_idx index of the best entry of the first fixed codebook
> > + * @param cb2_idx index of the best entry of the second fixed codebook
> > + * @param rms RMS of the reflection coefficients
> > + * @param data input data
> > + * @return index of the best entry of the gain table
> > + */
> > +static int quantize_gains(int16_t *block, const int16_t *lpc_coefs,
> > +                          const int16_t *cba_vect, int cb1_idx, int cb2_idx,
> > +                          unsigned int rms, const int16_t* data)
> > +{
> > +    float distance, best_distance;
> > +    int i, n, index;
> > +    unsigned int m[3];
> > +    int16_t exc[BLOCKSIZE]; /**< excitation vector */
> > +
> > +    if (cba_vect)
> > +        m[0] = (irms(cba_vect) * rms) >> 12;
> > +    m[1] = (cb1_base[cb1_idx] * rms) >> 8;
> > +    m[2] = (cb2_base[cb2_idx] * rms) >> 8;
> > +    best_distance = -1;
> 
> FLOAT_MAX

If you meant MAXFLOAT, fixed.
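For reference, ISO C spells the largest finite float FLT_MAX (declared in <float.h>); MAXFLOAT comes from <math.h> on X/Open (XSI) systems and is not part of ISO C. A minimal sketch of a minimum-distance loop seeded with FLT_MAX, which removes the -1 sentinel and the extra best_distance < 0 test. The flat candidate array and the name best_candidate() are hypothetical simplifications; the quoted loop instead rebuilds and filters an excitation vector for each gain index. The sketch also initializes the returned index, which the quoted loop leaves unset if every candidate is skipped.

```c
#include <float.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the index of the candidate vector (n_cand
 * vectors of len samples, stored back to back) with the smallest squared
 * error against target[]. Seeding best_distance with FLT_MAX lets any
 * finite distance win the first comparison, so no sentinel is needed.
 */
static int best_candidate(const int16_t *cands, int n_cand, int len,
                          const int16_t *target)
{
    float best_distance = FLT_MAX;
    int best_index = 0; /* defined result even if no candidate is evaluated */

    for (int n = 0; n < n_cand; n++) {
        const int16_t *v = cands + n * len;
        float distance = 0.0f;

        for (int i = 0; i < len; i++) {
            float e = (float)(v[i] - target[i]);
            distance += e * e;
        }
        if (distance < best_distance) {
            best_distance = distance;
            best_index = n;
        }
    }
    return best_index;
}
```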

> > +    for (n = 0; n < 256; n++) {
> > +        distance = 0;
> > +        add_wav(exc, n, (int)cba_vect, m, cba_vect, cb1_vects[cb1_idx],
> > +                cb2_vects[cb2_idx]);
> > +        if (ff_celp_lp_synthesis_filter(block, lpc_coefs, exc, BLOCKSIZE,
> > +                                        LPC_ORDER, 1, 0xfff))
> > +            continue;
> > +        for (i = 0; i < BLOCKSIZE; i++)
> > +            distance += (block[i] - data[i]) * (block[i] - data[i]);
> > +        if ((distance < best_distance) || (best_distance < 0)) {
> > +            best_distance = distance;
> > +            index = n;
> > +        }
> > +    }
> 
> id guess this could be done faster than by brute force

I can't think of any algorithm that avoids searching the entire table
without risking missing the optimal entry; however, I implemented an
empirical method that significantly reduces the encoding time without
audible quality degradation.

> 
> 
> [...]
> > +    /**
> > +     * TODO: orthogonalize the best entry of the adaptive codebook with the
> > +     * basis vectors of the first fixed codebook, and the best entry of the
> > +     * first fixed codebook with the basis vectors of the second fixed codebook.
> > +     */
> 
> yes, also shouldnt the search be iterative instead of just one pass?

I tried adding several iterations to the search for the optimal entries
of the fixed codebooks, but the entries found in the second and
subsequent iterations rarely differ from the first choices, and in any
case I couldn't hear any improvement in quality, so the iterative method
doesn't seem to add any value.
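One common reading of the orthogonalization in the TODO above is a single Gram-Schmidt step: subtract from a filtered candidate its projection onto a previously chosen filtered vector, so that the later codebook search only matches what the earlier choice could not. A minimal sketch, assuming float buffers; the name orthogonalize_sketch() and its signature are hypothetical and not taken from the patch.

```c
/*
 * One Gram-Schmidt step: v -= (<v,u> / <u,u>) * u, leaving v orthogonal
 * to u. If u is the zero vector there is nothing to project out, so v is
 * left unchanged.
 */
static void orthogonalize_sketch(float *v, const float *u, int len)
{
    float num = 0.0f, den = 0.0f;

    for (int i = 0; i < len; i++) {
        num += v[i] * u[i];
        den += u[i] * u[i];
    }
    if (den == 0.0f)
        return;
    num /= den;
    for (int i = 0; i < len; i++)
        v[i] -= num * u[i];
}
```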

Daniel's remark about the ff_ prefix for non-static symbols has been
addressed, so here is the updated patch series:

The first patch refactors the current RealAudio decoder code so that
ra144dec.c contains the decoder-specific code, ra144.c contains the code
that can be shared between decoder and encoder, and ra144.h contains the
declarations for ra144.c. Before applying this patch, run:
    svn mv libavcodec/ra144.c libavcodec/ra144dec.c
    svn cp libavcodec/ra144.h libavcodec/ra144.c
The second patch adds the ff_ prefix to all non-static symbols resulting
from patch #1.
The third patch contains cosmetic changes resulting from patch #2.
The fourth patch adds the function ff_subblock_synthesis() to ra144.c
and calls it from the decoder, so that more code is shared between
decoder and encoder.
The fifth patch adds the encoder.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 01_ra144enc.patch
Type: text/x-patch
Size: 86213 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100508/51517677/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 02_ra144enc.patch
Type: text/x-patch
Size: 11227 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100508/51517677/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 03_ra144enc.patch
Type: text/x-patch
Size: 1275 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100508/51517677/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 04_ra144enc.patch
Type: text/x-patch
Size: 4105 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100508/51517677/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 05_ra144enc.patch
Type: text/x-patch
Size: 22391 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100508/51517677/attachment-0004.bin>