[FFmpeg-devel] Nellymoser encoder

Mon Sep 1 01:18:15 CEST 2008

Sunday 31 August 2008 23:49:23 Michael Niedermayer wrote:
> On Sun, Aug 31, 2008 at 10:07:22PM +0200, Bartlomiej Wolowiec wrote:
> > Sunday 31 August 2008 15:53:23 Michael Niedermayer wrote:
> > > On Sun, Aug 31, 2008 at 01:07:15PM +0200, Bartlomiej Wolowiec wrote:
> > > > Saturday 30 August 2008 18:10:41 Michael Niedermayer wrote:
> > > > > On Sat, Aug 30, 2008 at 03:42:37PM +0200, Bartlomiej Wolowiec wrote:
> > > > > > Friday 29 August 2008 22:36:10 Michael Niedermayer wrote:
> > > > > > > > > > > > +
> > > > > > > > > > > > > +void apply_mdct(NellyMoserEncodeContext *s, float *in, float *coefs)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +    DECLARE_ALIGNED_16(float, in_buff[NELLY_SAMPLES]);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    memcpy(&in_buff[0], &in[0], NELLY_SAMPLES * sizeof(float));
> > > > > > > > > > > > > +    s->dsp.vector_fmul(in_buff, ff_sine_128, NELLY_BUF_LEN);
> > > > > > > > > > > > > +    s->dsp.vector_fmul_reverse(in_buff + NELLY_BUF_LEN, in_buff + NELLY_BUF_LEN, ff_sine_128, NELLY_BUF_LEN);
> > > > > > > > > > > > > +    ff_mdct_calc(&s->mdct_ctx, coefs, in_buff);
> > > > > > > > > > > > > +}
> > > > > > > > > > >
> > > > > > > > > > > > The data is copied once in encode_frame and twice here.
> > > > > > > > > > > > There is no need to copy the data 3 times.
> > > > > > > > > > > > vector_fmul can be used with a single memcpy to get the
> > > > > > > > > > > > data into any destination, and vector_fmul_reverse
> > > > > > > > > > > > doesn't even need 1 memcpy, so overall a single memcpy
> > > > > > > > > > > > is enough
> > > > > > > > > >
> > > > > > > > > > Hope that you meant something similar to my solution.
> > > > > > > > >
> > > > > > > > > no, you still do 2 memcpy() but now the code is really
> > > > > > > > > messy as well.
> > > > > > > > >
> > > > > > > > > > what you should do is, for each block of samples you get
> > > > > > > > > > from the user:
> > > > > > > > > > 1. apply one half of the window onto it with
> > > > > > > > > >    vector_fmul_reverse and destination of some internal buffer
> > > > > > > > > > 2. memcpy into the 2nd destination and apply the other half
> > > > > > > > > >    of the window onto it with vector_fmul
> > > > > > > > > > 3. run the mdct as appropriate on the internal buffers.
> > > > > > > >
> > > > > > > > > Hmm, I considered it, but I don't understand exactly what I
> > > > > > > > > should change... In the code I copy data two times:
> > > > > > > > > a) in encode_frame - I convert int16_t to float and copy data
> > > > > > > > >    to s->buf - I need to do it somewhere because vector_fmul
> > > > > > > > >    requires float *. Additionally, part of the data is needed
> > > > > > > > >    for the next call of encode_frame
> > > > > > > > > b) in apply_mdct - here I think that some additional part of
> > > > > > > > >    the buffer is needed.
> > > > > > > > > If I understood correctly I have to get rid of a), but how do
> > > > > > > > > I get access to old data when the next call of encode_frame is
> > > > > > > > > performed, and how do I call vector_fmul on int16_t?
> > > > > > >
> > > > > > > have you tried setting AVCodec.sample_fmts to SAMPLE_FMT_FLT ?
> > > > > > > I think ffmpeg should support this already. If it does not work
> > > > > > > > then we can keep int16 for now which would imply more
> > > > > > > > copying
> > > > > >
> > > > > > > Hmm... I tried to use SAMPLE_FMT_FLT, but something doesn't work.
> > > > > > > I only made these changes:
> > > > > >
> > > > > > float *samples = data;
> > > > > > ...
> > > > > > for (i = 0; i < avctx->frame_size; i++) {
> > > > > >     s->buf[s->bufsel][i] = samples[i]*(1<<15);
> > > > > > }
> > > > > > ...
> > > > > > > .sample_fmts = (enum SampleFormat[]){SAMPLE_FMT_FLT,SAMPLE_FMT_NONE},
> > > > >
> > > > > hmm
> > > >
> > > > Any idea? or should I leave it as it is?
> > >
> > > does PCM float work for you? if so what is the difference to your
> > > encoder?
> >
> > pcm_f32le doesn't work - because it isn't hacked in ffmpeg.c. Nellymoser
> > probably for the same reason...
>
> [...]
>
> > > > +
> > > > +    apply_mdct(s);
> > > > +
> > > >
> > > > +    init_put_bits(&pb, output, output_size * 8);
> > > > +
> > > > +    i = 0;
> > > > +    for (band = 0; band < NELLY_BANDS; band++) {
> > > > +        coeff_sum = 0;
> > > > > +        for (j = 0; j < ff_nelly_band_sizes_table[band]; i++, j++) {
> > > > > +            //coeff_sum += s->mdct_out[i                ] * s->mdct_out[i                ]
> > > > > +            //           + s->mdct_out[i + NELLY_BUF_LEN] * s->mdct_out[i + NELLY_BUF_LEN];
> > > > > +            coeff_sum += pow(fabs(s->mdct_out[i]), D) + pow(fabs(s->mdct_out[i + NELLY_BUF_LEN]), D);
> > > > > +        }
> > > > > +        cand[band] =
> > > > > +            //log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / M_LN2;
> > > > > +            C * log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / log(D);
> > >
> > > the MAX should maybe be done after the correction for D
> >
> > > I don't know exactly what you mean...
>
> > forget it, I've misread the order of the () somehow
>
> > --
> > Bartlomiej Wolowiec
> >
> > Index: nellymoserenc.c
> > ===================================================================
> > --- nellymoserenc.c	(revision 15126)
> > +++ nellymoserenc.c	(working copy)
> > @@ -45,11 +45,18 @@
> >  #define POW_TABLE_SIZE (1<<11)
> >  #define POW_TABLE_OFFSET 3
> >
> > +#undef NDEBUG
> > +#include <assert.h>
> > +
> >  typedef struct NellyMoserEncodeContext {
> >      AVCodecContext  *avctx;
> >      int             last_frame;
> > +    int             bufsel;
> >
> >
> > +    int             have_saved;
> >      DSPContext      dsp;
> >      MDCTContext     mdct_ctx;
> > +    DECLARE_ALIGNED_16(float, mdct_out[NELLY_SAMPLES]);
>
> ok
>
>
> [...]
>
> > @@ -146,6 +169,212 @@
> >      if (fabs(val - table[best_idx]) > fabs(val - table[best_idx + 1])) \
> >          best_idx++;
> >
> > +static void get_exponent_greedy(NellyMoserEncodeContext *s, float *cand, int *idx_table)
> > +{
> > +    int band, best_idx, power_idx = 0;
> > +    float power_candidate;
> > +
> > +    //base exponent
> > +    find_best(cand[0], ff_nelly_init_table, sf_lut, -20, 96);
> > +    idx_table[0] = best_idx;
> > +    power_idx = ff_nelly_init_table[best_idx];
> > +
> > +    for (band = 1; band < NELLY_BANDS; band++) {
> > +        power_candidate = cand[band] - power_idx;
> > +        find_best(power_candidate, ff_nelly_delta_table, sf_delta_lut, 37, 78);
> > +        idx_table[band] = best_idx;
> > +        power_idx += ff_nelly_delta_table[best_idx];
> > +    }
> > +}
>
> ok
>
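A minimal sketch of the greedy selection approved above, assuming small hypothetical tables passed in as parameters. nearest_index is a plain linear scan used here in place of the patch's find_best macro and its lookup tables, whose exact behaviour is not shown in this mail.

```c
#include <assert.h>

/* Index of the table entry closest to val (linear scan, first match wins
 * on ties). Hypothetical helper, not the patch's find_best macro. */
static int nearest_index(const int *table, int n, float val)
{
    int best = 0;
    float best_d = val - table[0];

    assert(n > 0);
    if (best_d < 0)
        best_d = -best_d;
    for (int i = 1; i < n; i++) {
        float d = val - table[i];
        if (d < 0)
            d = -d;
        if (d < best_d) {
            best_d = d;
            best   = i;
        }
    }
    return best;
}

/* Band 0 is coded against the init table; every later band is coded as a
 * delta against the running reconstructed power. */
static void greedy_exponents(const float *cand, int nbands,
                             const int *init_table, int n_init,
                             const int *delta_table, int n_delta,
                             int *idx_table)
{
    int power;

    idx_table[0] = nearest_index(init_table, n_init, cand[0]);
    power = init_table[idx_table[0]];
    for (int band = 1; band < nbands; band++) {
        idx_table[band] = nearest_index(delta_table, n_delta, cand[band] - power);
        power += delta_table[idx_table[band]];
    }
}
```

Because each band only looks at the error it makes itself, an early choice is never revisited; that is the limitation the trellis variant tries to address.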
> > +
> > +#define OPT_SIZE ((1<<15) + 3000)
> > +
> > +static inline float distance(float x, float y, int band)
> > +{
> > +    //return pow(fabs(x-y), 2.0);
> > +    float tmp = x - y;
> > +    return tmp * tmp;
> > +}
> > +
> > +static void get_exponent_dynamic(NellyMoserEncodeContext *s, float *cand, int *idx_table)
> > +{
> > +    int i, j, band, best_idx;
> > +    float power_candidate, best_val;
> > +
> > +    float opt[NELLY_BANDS][OPT_SIZE];
> > +    int path[NELLY_BANDS][OPT_SIZE];
> > +
> > +    for (i = 0; i < NELLY_BANDS * OPT_SIZE; i++) {
> > +        opt[0][i] = INFINITY;
> > +    }
> > +
> > +    for (i = 0; i < 64; i++) {
> > +        opt[0][ff_nelly_init_table[i]] = distance(cand[0], ff_nelly_init_table[i], 0);
> > +        path[0][ff_nelly_init_table[i]] = i;
> > +    }
> > +
> > +    for (band = 1; band < NELLY_BANDS; band++) {
> > +        int q, c = 0;
> > +        float tmp;
> > +        int idx_min, idx_max, idx;
> > +        power_candidate = cand[band];
> > +        for (q = 1000; !c && q < OPT_SIZE; q <<= 2) {
> > +            idx_min = FFMAX(0, cand[band] - q);
> > +            idx_max = FFMIN(OPT_SIZE, cand[band - 1] + q);
> > +            for (i = FFMAX(0, cand[band - 1] - q); i < FFMIN(OPT_SIZE, cand[band - 1] + q); i++) {
> > +                if ( isinf(opt[band - 1][i]) )
> > +                    continue;
> > +                for (j = 0; j < 32; j++) {
> > +                    idx = i + ff_nelly_delta_table[j];
> > +                    if (idx > idx_max)
> > +                        break;
> > +                    if (idx >= idx_min) {
> > +                        tmp = opt[band - 1][i] + distance(idx, power_candidate, band);
> > +                        if (opt[band][idx] > tmp) {
> > +                            opt[band][idx] = tmp;
> > +                            path[band][idx] = j;
> > +                            c = 1;
> > +                        }
> > +                    }
> > +                }
> > +            }
> > +        }
> > +        assert(c); //FIXME
> > +    }
> > +
> > +    best_val = INFINITY;
> > +    best_idx = -1;
> > +    band = NELLY_BANDS - 1;
> > +    for (i = 0; i < OPT_SIZE; i++) {
> > +        if (best_val > opt[band][i]) {
> > +            best_val = opt[band][i];
> > +            best_idx = i;
> > +        }
> > +    }
> > +    for (band = NELLY_BANDS - 1; band >= 0; band--) {
> > +        idx_table[band] = path[band][best_idx];
> > +        if (band) {
> > +            best_idx -= ff_nelly_delta_table[path[band][best_idx]];
> > +        }
> > +    }
> > +}
>
> this could be improved a bit but when it doesn't help quality, there's no
> point, so it's ok too
>
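The dynamic-programming search can be illustrated on a much smaller problem. The sketch below assumes hypothetical sizes and tables (NBANDS, RANGE, init_tab, delta_tab) and scans every reachable state instead of the windowed search around cand[band] used in the patch; it shows the general technique, not the patch's exact behaviour.

```c
#include <assert.h>
#include <math.h>   /* INFINITY */

#define NBANDS  3   /* hypothetical sizes, far smaller than the real tables */
#define RANGE   64
#define N_INIT  4
#define N_DELTA 4

static const int init_tab[N_INIT]   = { 0, 16, 32, 48 };
static const int delta_tab[N_DELTA] = { -8, -2, 2, 8 };

static float sq_dist(float x, float y)
{
    float d = x - y;
    return d * d;
}

/* Fills idx_table with the minimum-error index sequence, returns its error. */
static float trellis_exponents(const float *cand, int *idx_table)
{
    static float cost[NBANDS][RANGE];  /* best error reaching each state */
    static int   path[NBANDS][RANGE];  /* table index that achieved it */
    int band, v, j, best_v = -1;
    float best = INFINITY;

    for (band = 0; band < NBANDS; band++)
        for (v = 0; v < RANGE; v++)
            cost[band][v] = INFINITY;

    for (j = 0; j < N_INIT; j++) {
        cost[0][init_tab[j]] = sq_dist(cand[0], init_tab[j]);
        path[0][init_tab[j]] = j;
    }

    for (band = 1; band < NBANDS; band++)
        for (v = 0; v < RANGE; v++) {
            if (cost[band - 1][v] == INFINITY)
                continue;               /* state not reachable */
            for (j = 0; j < N_DELTA; j++) {
                int nv = v + delta_tab[j];
                float c;
                if (nv < 0 || nv >= RANGE)
                    continue;
                c = cost[band - 1][v] + sq_dist(cand[band], nv);
                if (c < cost[band][nv]) {
                    cost[band][nv] = c;
                    path[band][nv] = j;
                }
            }
        }

    /* cheapest final state, then walk the stored deltas backwards */
    for (v = 0; v < RANGE; v++)
        if (cost[NBANDS - 1][v] < best) {
            best   = cost[NBANDS - 1][v];
            best_v = v;
        }
    assert(best_v >= 0);

    for (band = NBANDS - 1; band >= 0; band--) {
        idx_table[band] = path[band][best_v];
        if (band)
            best_v -= delta_tab[idx_table[band]];
    }
    return best;
}
```

The tables are static here so that nothing large lands on the stack; with the real OPT_SIZE the two arrays would be several megabytes.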
> > +
> > +/**
> > + * Encodes NELLY_SAMPLES samples. It assumes, that samples contains 3 * NELLY_BUF_LEN values
> > + *  @param s               encoder context
> > + *  @param output          output buffer
> > + *  @param output_size     size of output buffer
> > + */
> > +static void encode_block(NellyMoserEncodeContext *s, unsigned char *output, int output_size)
> > +{
> > +    PutBitContext pb;
> > +    int i, j, band, block, best_idx, power_idx = 0;
> > +    float power_val, coeff, coeff_sum;
> > +    float pows[NELLY_FILL_LEN];
> > +    int bits[NELLY_BUF_LEN], idx_table[NELLY_BANDS];
> > +    float cand[NELLY_BANDS];
> > +
> > +    const float C = 1.0;
> > +    const float D = 2.0;
> > +
> > +    apply_mdct(s);
> > +
> > +    init_put_bits(&pb, output, output_size * 8);
> > +
> > +    i = 0;
> > +    for (band = 0; band < NELLY_BANDS; band++) {
> > +        coeff_sum = 0;
> > +        for (j = 0; j < ff_nelly_band_sizes_table[band]; i++, j++) {
> > +            //coeff_sum += s->mdct_out[i                ] * s->mdct_out[i                ]
> > +            //           + s->mdct_out[i + NELLY_BUF_LEN] * s->mdct_out[i + NELLY_BUF_LEN];
> > +            coeff_sum += pow(fabs(s->mdct_out[i]), D) + pow(fabs(s->mdct_out[i + NELLY_BUF_LEN]), D);
> > +        }
> > +        cand[band] =
> > +            //log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / M_LN2;
> > +            C * log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / log(D);
> > +    }
> > +
> > +    if (s->avctx->trellis) {
> > +        get_exponent_dynamic(s, cand, idx_table);
> > +    } else {
> > +        get_exponent_greedy(s, cand, idx_table);
> > +    }
> > +
> > +    i = 0;
> > +    for (band = 0; band < NELLY_BANDS; band++) {
> > +        if (band) {
> > +            power_idx += ff_nelly_delta_table[idx_table[band]];
> > +            put_bits(&pb, 5, idx_table[band]);
> > +        } else {
> > +            power_idx = ff_nelly_init_table[idx_table[0]];
> > +            put_bits(&pb, 6, idx_table[0]);
> > +        }
> > +        power_val = pow_table[power_idx & 0x7FF] / (1 << ((power_idx >> 11) + POW_TABLE_OFFSET));
> > +        for (j = 0; j < ff_nelly_band_sizes_table[band]; i++, j++) {
> > +            s->mdct_out[i] *= power_val;
> > +            s->mdct_out[i + NELLY_BUF_LEN] *= power_val;
> > +            pows[i] = power_idx;
> > +        }
> > +    }
> > +
> > +    ff_nelly_get_sample_bits(pows, bits);
> > +
> > +    for (block = 0; block < 2; block++) {
> > +        for (i = 0; i < NELLY_FILL_LEN; i++) {
> > +            if (bits[i] > 0) {
> > +                const float *table = ff_nelly_dequantization_table + (1 << bits[i]) - 1;
> > +                coeff = s->mdct_out[block * NELLY_BUF_LEN + i];
> > +                best_idx =
> > +                    quant_lut[av_clip (
> > +                            coeff * quant_lut_mul[bits[i]] + quant_lut_add[bits[i]],
> > +                            quant_lut_offset[bits[i]],
> > +                            quant_lut_offset[bits[i]+1] - 1
> > +                            )];
> > +                if (fabs(coeff - table[best_idx]) > fabs(coeff - table[best_idx + 1]))
> > +                    best_idx++;
> > +
> > +                put_bits(&pb, bits[i], best_idx);
> > +            }
> > +        }
> > +        if (!block)
> > +            put_bits(&pb, NELLY_HEADER_BITS + NELLY_DETAIL_BITS - put_bits_count(&pb), 0);
> > +    }
> > +}
>
> as the C/D stuff turned out to be useless you can remove that again; apart
> from that it's ok
>
> the rest of the patch is ok as well (except the #undef NDEBUG)
> unless you want to fix ffmpeg to work with floats in which case the rest
> can be simplified.
>
> [...]

I will try to find a nice solution for this problem in the coming week, but I
don't yet know the relevant parts of the code well, so I'm not sure I will be
able to fix anything. I will report on my progress.
So can I commit the whole code now?

-- 
Bartlomiej Wolowiec
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nellymoser7.patch
Type: text/x-diff
Size: 10669 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080901/23ed0f99/attachment.patch>