[FFmpeg-devel] Nellymoser encoder

Michael Niedermayer michaelni
Fri Aug 29 22:36:10 CEST 2008


On Fri, Aug 29, 2008 at 08:55:23PM +0200, Bartlomiej Wolowiec wrote:
> Friday 29 August 2008 15:54:32 Michael Niedermayer wrote:
> > On Fri, Aug 29, 2008 at 03:11:59PM +0200, Bartlomiej Wolowiec wrote:
> > > Friday 29 August 2008 00:02:36 Michael Niedermayer wrote:
> > > > > +#define LUT_init_add -3134
> > > > > +#define LUT_init_size 31355 + LUT_init_add
> > > > > +static int LUT_init_table[LUT_init_size];
> > > >
> > > > I do not believe that the table needs to be that large
> > > >
> > > > > +
> > > > > +#define LUT_delta_add 11725
> > > > > +#define LUT_delta_size 12975 + LUT_delta_add
> > > > > +static int LUT_delta_table[LUT_delta_size];
> > > > > +
> > > > > +#define LUT_dequantization_mul 128.0
> > > > > +#define LUT_dequantization_add LUT_dequantization_mul * 2.7
> > > > > +#define LUT_dequantization_size (int)(LUT_dequantization_mul * 2.5 + LUT_dequantization_add)
> > > > > +#define LUT_dequantization_maxbits 6
> > > > > +static int LUT_dequantization_table[LUT_dequantization_maxbits][LUT_dequantization_size];
> > > >
> > > > Neither do I believe that this one needs to be that large.
> > > > Besides, they both can be uint8_t instead of int,
> > > >
> > > > and the tables for fewer bits don't need to be as large as the largest one.
> > >
> > > OK, I've tried to change the sizes of these arrays. Unfortunately, now I have
> > > a problem: I don't know how to simply allocate memory for
> > > LUT_dequantization_table so that the whole thing stays thread-safe.
> >
> > Drop all the messy stuff and the problems will disappear.
> 
> OK, I cleaned it up significantly. Now it looks much better.

Yes, I also like it much more now.


> 
> > > > > +
> > > > > +void apply_mdct(NellyMoserEncodeContext *s, float *in, float *coefs)
> > > > > +{
> > > > > +    DECLARE_ALIGNED_16(float, in_buff[NELLY_SAMPLES]);
> > > > > +
> > > > > +    memcpy(&in_buff[0], &in[0], NELLY_SAMPLES * sizeof(float));
> > > > > +    s->dsp.vector_fmul(in_buff, ff_sine_128, NELLY_BUF_LEN);
> > > > > +    s->dsp.vector_fmul_reverse(in_buff + NELLY_BUF_LEN, in_buff + NELLY_BUF_LEN,
> > > > > +                               ff_sine_128, NELLY_BUF_LEN);
> > > > > +    ff_mdct_calc(&s->mdct_ctx, coefs, in_buff);
> > > > > +}
> > > >
> > > > The data is copied once in encode_frame and twice here.
> > > > There is no need to copy the data 3 times.
> > > > vector_fmul can be used with a single memcpy to get the data into any
> > > > destination, and vector_fmul_reverse doesn't even need one memcpy, so
> > > > overall a single memcpy is enough.
> > >
> > > I hope you meant something similar to my solution.
> >
> > No, you still do 2 memcpy() calls, but now the code is really messy as well.
> >
> > What you should do is, for each block of samples you get from the user:
> > 1. apply one half of the window onto it with vector_fmul_reverse, using
> >    some internal buffer as the destination
> > 2. memcpy it into the 2nd destination and apply the other half of the
> >    window onto it with vector_fmul
> > 3. run the MDCT as appropriate on the internal buffers.
> 
> Hmm, I considered it, but I don't understand exactly what I should change...
> In the code I copy data two times:
> a) in encode_frame - I convert int16_t to float and copy the data to s->buf. I
> need to do it somewhere because vector_fmul requires float *. Additionally,
> part of the data is needed for the next call of encode_frame.
> b) in apply_mdct - here I think that an additional buffer is needed.
> If I understood correctly, I have to get rid of a), but how do I access the
> old data when the next call of encode_frame is performed, and how do I call
> vector_fmul on int16_t?

Have you tried setting AVCodec.sample_fmts to SAMPLE_FMT_FLT?
I think ffmpeg should support this already. If it does not work, then we can
keep int16 for now, which would imply more copying.
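
A minimal sketch of the three-step scheme described above, assuming float input (as with SAMPLE_FMT_FLT) and two MDCT input buffers used alternately. vector_fmul_c() and vector_fmul_reverse_c() are plain-C stand-ins written to match my understanding of the dsputil semantics (dst[i] *= src[i], and dst[i] = src0[i] * src1[len - 1 - i]); WindowState, window_block() and BLK are hypothetical names, and the MDCT call itself is left to the caller. Each incoming block is copied exactly once.

```c
#include <string.h>

#define BLK 128  /* plays the role of NELLY_BUF_LEN */

/* Plain-C stand-in: dst[i] *= src[i] */
static void vector_fmul_c(float *dst, const float *src, int len)
{
    int i;
    for (i = 0; i < len; i++)
        dst[i] *= src[i];
}

/* Plain-C stand-in: dst[i] = src0[i] * src1[len - 1 - i] */
static void vector_fmul_reverse_c(float *dst, const float *src0,
                                  const float *src1, int len)
{
    int i;
    for (i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}

/* Hypothetical encoder state; zero-initialize before the first block. */
typedef struct WindowState {
    float in[2][2 * BLK];  /* two MDCT input buffers, used alternately */
    int   cur;             /* buffer whose second half the next block fills */
} WindowState;

/* Windows one block x[0..BLK-1] with the rising half-window win[] and
 * returns the MDCT input buffer that this block completes. */
static const float *window_block(WindowState *st, const float *x,
                                 const float *win)
{
    float *done = st->in[st->cur];
    float *next = st->in[1 - st->cur];

    /* 1. falling half of the window, written straight into the second
     *    half of the buffer that already holds the previous block */
    vector_fmul_reverse_c(done + BLK, x, win, BLK);

    /* 2. the only copy: x becomes the first half of the other buffer,
     *    then the rising half of the window is applied in place */
    memcpy(next, x, BLK * sizeof(*x));
    vector_fmul_c(next, win, BLK);

    st->cur = 1 - st->cur;
    /* 3. the caller runs the MDCT on `done`; on the very first call its
     *    first half is still zero (no previous block yet) */
    return done;
}
```

For one NELLY_SAMPLES frame the encoder would call window_block() twice and run one 256-input MDCT on each returned buffer; the second half of the frame stays in the other buffer as the overlap for the next call, so no separate "saved" copy is needed.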


[...]

> Index: libavcodec/nellymoserenc.c
> ===================================================================
> --- libavcodec/nellymoserenc.c	(revision 0)
> +++ libavcodec/nellymoserenc.c	(revision 0)
> @@ -0,0 +1,294 @@
> +/*
> + * Nellymoser encoder
> + * This code is developed as part of Google Summer of Code 2008 Program.
> + *
> + * Copyright (c) 2008 Bartlomiej Wolowiec
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file nellymoserenc.c
> + * Nellymoser encoder
> + * by Bartlomiej Wolowiec
> + *
> + * Generic codec information: libavcodec/nellymoserdec.c
> + *
> + * Some information also from: http://www1.mplayerhq.hu/ASAO/ASAO.zip
> + *                             (Copyright Joseph Artsimovich and UAB "DKD")
> + *
> + * for more information about nellymoser format, visit:
> + * http://wiki.multimedia.cx/index.php?title=Nellymoser
> + */
> +
> +#include "nellymoser.h"
> +#include "avcodec.h"
> +#include "dsputil.h"
> +
> +#define BITSTREAM_WRITER_LE
> +#include "bitstream.h"
> +
> +#define POW_TABLE_SIZE (1<<11)
> +#define POW_TABLE_OFFSET 3
> +
> +typedef struct NellyMoserEncodeContext {
> +    AVCodecContext  *avctx;
> +    int             last_frame;

ok (that is, all the code from the empty line to here can be committed)


> +    int             bufsel;
> +    int             have_saved;
> +    DSPContext      dsp;
> +    MDCTContext     mdct_ctx;
> +    DECLARE_ALIGNED_16(float, mdct_out[NELLY_SAMPLES]);
> +    DECLARE_ALIGNED_16(float, buf[2][3 * NELLY_BUF_LEN]);     ///< sample buffer


> +} NellyMoserEncodeContext;
> +
> +static float pow_table[POW_TABLE_SIZE];     ///< -pow(2, -i / 2048.0 - 3.0);
> +
> +static const uint8_t sf_lut[96] = {
> +     0,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  3,  3,  3,  4,  4,
> +     5,  5,  5,  6,  7,  7,  8,  8,  9, 10, 11, 11, 12, 13, 13, 14,
> +    15, 15, 16, 17, 17, 18, 19, 19, 20, 21, 22, 22, 23, 24, 25, 26,
> +    27, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 37, 38, 39, 40,
> +    41, 41, 42, 43, 44, 45, 45, 46, 47, 48, 49, 50, 51, 52, 52, 53,
> +    54, 55, 55, 56, 57, 57, 58, 59, 59, 60, 60, 60, 61, 61, 61, 62,
> +};
> +
> +static const uint8_t sf_delta_lut[78] = {
> +     0,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  3,  3,  3,  4,  4,
> +     4,  5,  5,  5,  6,  6,  7,  7,  8,  8,  9, 10, 10, 11, 11, 12,
> +    13, 13, 14, 15, 16, 17, 17, 18, 19, 19, 20, 21, 21, 22, 22, 23,
> +    23, 24, 24, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 27, 27, 28,
> +    28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 29, 29, 30,
> +};
> +
> +static const uint8_t quant_lut[230] = {
> +     0,
> +
> +     0,  1,  2,
> +
> +     0,  1,  2,  3,  4,  5,  6,
> +
> +     0,  1,  1,  2,  2,  3,  3,  4,  5,  6,  7,  8,  9, 10, 11, 11,
> +    12, 13, 13, 13, 14,
> +
> +     0,  1,  1,  2,  2,  2,  3,  3,  4,  4,  5,  5,  6,  6,  7,  8,
> +     8,  9, 10, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
> +    22, 23, 23, 24, 24, 25, 25, 26, 26, 27, 27, 28, 28, 29, 29, 29,
> +    30,
> +
> +     0,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  3,  3,  3,  3,
> +     4,  4,  4,  5,  5,  5,  6,  6,  7,  7,  7,  8,  8,  9,  9,  9,
> +    10, 10, 11, 11, 11, 12, 12, 13, 13, 13, 13, 14, 14, 14, 15, 15,
> +    15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 20, 20, 20,
> +    21, 21, 22, 22, 23, 23, 24, 25, 26, 26, 27, 28, 29, 30, 31, 32,
> +    33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 42, 43, 44, 44, 45, 45,
> +    46, 47, 47, 48, 48, 49, 49, 50, 50, 50, 51, 51, 51, 52, 52, 52,
> +    53, 53, 53, 54, 54, 54, 55, 55, 55, 56, 56, 56, 57, 57, 57, 57,
> +    58, 58, 58, 58, 59, 59, 59, 59, 60, 60, 60, 60, 60, 61, 61, 61,
> +    61, 61, 61, 61, 62,
> +};
> +
> +static const float quant_lut_mul[7] = { 0.0,  0.0,  2.0,  2.0,  5.0, 12.0,  36.6 };
> +static const float quant_lut_add[7] = { 0.0,  0.0,  2.0,  7.0, 21.0, 56.0, 157.0 };
> +static const uint8_t quant_lut_offset[8] = { 0, 0, 1, 4, 11, 32, 81, 230 };

ok (yes, all the tables can be committed)
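
The quant_lut_offset[] array above reflects the earlier remark that the tables for fewer bits need not be as large as the largest: all per-bit-count uint8_t tables live in one concatenated array, and sub-table b occupies quant_lut_offset[b] .. quant_lut_offset[b + 1] - 1. A sketch of indexing such a layout, assuming clamping for out-of-range indices; sub_table_size() and quant_lookup() are hypothetical helpers, not code from the patch.

```c
#include <stdint.h>

/* Start of each sub-table inside the concatenated quant_lut[] array,
 * copied from the patch: 7 sub-tables, 230 entries in total. */
static const uint8_t quant_lut_offset[8] = { 0, 0, 1, 4, 11, 32, 81, 230 };

/* Number of entries in sub-table b (0 <= b < 7). */
static int sub_table_size(int b)
{
    return quant_lut_offset[b + 1] - quant_lut_offset[b];
}

/* Hypothetical accessor: entry idx of sub-table b, with idx clamped into
 * that sub-table; an empty sub-table yields 0. */
static uint8_t quant_lookup(const uint8_t *lut, int b, int idx)
{
    int size = sub_table_size(b);

    if (size == 0)
        return 0;
    if (idx < 0)
        idx = 0;
    else if (idx > size - 1)
        idx = size - 1;
    return lut[quant_lut_offset[b] + idx];
}
```

For comparison, the earlier LUT_dequantization_table was 6 x 665 ints ((int)(128.0 * 2.5 + 128.0 * 2.7) = 665 per row), while the concatenated uint8_t layout needs 230 bytes.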




> +
> +void apply_mdct(NellyMoserEncodeContext *s)
> +{
> +    DECLARE_ALIGNED_16(float, in_buff[NELLY_SAMPLES]);
> +
> +    memcpy(in_buff, s->buf[s->bufsel], NELLY_BUF_LEN * sizeof(float));
> +    s->dsp.vector_fmul(in_buff, ff_sine_128, NELLY_BUF_LEN);
> +    s->dsp.vector_fmul_reverse(in_buff + NELLY_BUF_LEN, s->buf[s->bufsel] + NELLY_BUF_LEN, ff_sine_128,
> +                               NELLY_BUF_LEN);
> +    ff_mdct_calc(&s->mdct_ctx, s->mdct_out, in_buff);
> +
> +    s->dsp.vector_fmul(s->buf[s->bufsel] + NELLY_BUF_LEN, ff_sine_128, NELLY_BUF_LEN);
> +    s->dsp.vector_fmul_reverse(s->buf[s->bufsel] + 2 * NELLY_BUF_LEN, s->buf[1 - s->bufsel], ff_sine_128,
> +                               NELLY_BUF_LEN);
> +    ff_mdct_calc(&s->mdct_ctx, s->mdct_out + NELLY_BUF_LEN, s->buf[s->bufsel] + NELLY_BUF_LEN);
> +}
> +

> +static av_cold int encode_init(AVCodecContext *avctx)
> +{
> +    NellyMoserEncodeContext *s = avctx->priv_data;
> +    int i;
> +
> +    if (avctx->channels != 1) {
> +        av_log(avctx, AV_LOG_ERROR, "Nellymoser supports only 1 channel\n");
> +        return -1;
> +    }

ok


> +
> +    if (avctx->sample_rate != 8000 && avctx->sample_rate != 11025 &&
> +        avctx->sample_rate != 22050 && avctx->sample_rate != 44100) {
> +        av_log(avctx, AV_LOG_ERROR, "Nellymoser works only with 8000, 11025, 22050 and 44100 sample rate\n");
> +        return -1;
> +    }

Maybe this check could be limited to normal strict_std_compliance values.
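
A sketch of one reading of this suggestion, assuming the intent is to enforce the four standard rates only when strict_std_compliance is at the normal level or stricter, and to let other rates through below that. COMPLIANCE_NORMAL is a local stand-in assumed to mirror libavcodec's FF_COMPLIANCE_NORMAL (0); check_sample_rate() is a hypothetical helper.

```c
/* Local stand-in, assumed to match libavcodec's FF_COMPLIANCE_NORMAL. */
#define COMPLIANCE_NORMAL 0

static int nelly_rate_is_standard(int rate)
{
    return rate == 8000 || rate == 11025 || rate == 22050 || rate == 44100;
}

/* Hypothetical: returns 0 if the encoder should accept `rate`, -1 if not.
 * Non-standard rates are rejected only at normal or stricter compliance. */
static int check_sample_rate(int rate, int strict_std_compliance)
{
    if (nelly_rate_is_standard(rate))
        return 0;
    return strict_std_compliance < COMPLIANCE_NORMAL ? 0 : -1;
}
```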


> +
> +    avctx->frame_size = NELLY_SAMPLES;
> +    s->avctx = avctx;
> +    ff_mdct_init(&s->mdct_ctx, 8, 0);
> +    dsputil_init(&s->dsp, avctx);
> +
> +    /* Generate overlap window */
> +    ff_sine_window_init(ff_sine_128, 128);
> +    for (i = 0; i < POW_TABLE_SIZE; i++)
> +        pow_table[i] = -pow(2, -i / 2048.0 - 3.0 + POW_TABLE_OFFSET);
> +
> +    return 0;
> +}

ok


> +
> +static av_cold int encode_end(AVCodecContext *avctx)
> +{
> +    NellyMoserEncodeContext *s = avctx->priv_data;
> +
> +    ff_mdct_end(&s->mdct_ctx);
> +    return 0;
> +}

ok


> +
> +#define find_best(val, table, LUT, LUT_add, LUT_size) \
> +    best_idx = \
> +        LUT[av_clip ((((int)val) >> 8) + LUT_add, 0, LUT_size - 1)]; \
> +    if (abs(val - table[best_idx]) > abs(val - table[best_idx + 1])) \
> +        best_idx++;

(int)some_float is slow; lrintf() should be faster.
Also, if val is a float instead of an int, then fabs() may actually be better
than abs().


> +
> +/**
> + * Encodes NELLY_SAMPLES samples. It assumes that samples contains 3 * NELLY_BUF_LEN values
> + *  @param s               encoder context
> + *  @param output          output buffer
> + *  @param output_size     size of output buffer
> + */
> +static void encode_block(NellyMoserEncodeContext *s, unsigned char *output, int output_size)
> +{
> +    PutBitContext pb;
> +    int i, band, block, best_idx, power_idx = 0;
> +    float power_val, power_candidate, coeff, coeff_sum;
> +    int band_start, band_end;
> +    float pows[NELLY_FILL_LEN];
> +    int bits[NELLY_BUF_LEN];
> +
> +    const float C = 1.0;
> +    const float D = 2.0;
> +
> +    apply_mdct(s);
> +
> +    init_put_bits(&pb, output, output_size * 8);
> +

> +    band_start = 0;
> +    band_end = ff_nelly_band_sizes_table[0];
> +    for (band = 0; band < NELLY_BANDS; band++) {
> +        coeff_sum = 0;
> +        for (i = band_start; i < band_end; i++) {
> +            //coeff_sum += s->mdct_out[i                ] * s->mdct_out[i                ]
> +            //           + s->mdct_out[i + NELLY_BUF_LEN] * s->mdct_out[i + NELLY_BUF_LEN];
> +            coeff_sum += pow(fabs(s->mdct_out[i]), D) + pow(fabs(s->mdct_out[i + NELLY_BUF_LEN]), D);
> +        }
> +        power_candidate =
> +            //log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / M_LN2;
> +            C * log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / log(D);
> +
> +        if (band) {
> +            power_candidate -= power_idx;
> +            find_best(power_candidate, ff_nelly_delta_table, sf_delta_lut, 37, 78);
> +            put_bits(&pb, 5, best_idx);
> +            power_idx += ff_nelly_delta_table[best_idx];
> +        } else {
> +            //base exponent
> +            find_best(power_candidate, ff_nelly_init_table, sf_lut, -20, 96);
> +            put_bits(&pb, 6, best_idx);
> +            power_idx = ff_nelly_init_table[best_idx];
> +        }

The choice of power_idx/best_idx values could still be optimized with a
Viterbi search. It's somewhat similar to (and simpler than) our Viterbi/trellis
ADPCM encoder.


[...]

> +AVCodec nellymoser_encoder = {
> +    .name = "nellymoser",
> +    .type = CODEC_TYPE_AUDIO,
> +    .id = CODEC_ID_NELLYMOSER,
> +    .priv_data_size = sizeof(NellyMoserEncodeContext),
> +    .init = encode_init,
> +    .encode = encode_frame,
> +    .close = encode_end,
> +    .capabilities = CODEC_CAP_SMALL_LAST_FRAME | CODEC_CAP_DELAY,
> +    .long_name = NULL_IF_CONFIG_SMALL("Nellymoser Asao Codec"),
> +};

ok

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Opposition brings concord. Out of discord comes the fairest harmony.
-- Heraclitus