[FFmpeg-devel] Nellymoser encoder

Michael Niedermayer michaelni
Mon Sep 1 02:08:09 CEST 2008


On Mon, Sep 01, 2008 at 01:18:15AM +0200, Bartlomiej Wolowiec wrote:
> Sunday 31 August 2008 23:49:23 Michael Niedermayer wrote:
> > On Sun, Aug 31, 2008 at 10:07:22PM +0200, Bartlomiej Wolowiec wrote:
> > > Sunday 31 August 2008 15:53:23 Michael Niedermayer wrote:
> > > > On Sun, Aug 31, 2008 at 01:07:15PM +0200, Bartlomiej Wolowiec wrote:
> > > > > Saturday 30 August 2008 18:10:41 Michael Niedermayer wrote:
> > > > > > On Sat, Aug 30, 2008 at 03:42:37PM +0200, Bartlomiej Wolowiec wrote:
> > > > > > > Friday 29 August 2008 22:36:10 Michael Niedermayer wrote:
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void apply_mdct(NellyMoserEncodeContext *s, float *in, float *coefs)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +    DECLARE_ALIGNED_16(float, in_buff[NELLY_SAMPLES]);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    memcpy(&in_buff[0], &in[0], NELLY_SAMPLES * sizeof(float));
> > > > > > > > > > > > > +    s->dsp.vector_fmul(in_buff, ff_sine_128, NELLY_BUF_LEN);
> > > > > > > > > > > > > +    s->dsp.vector_fmul_reverse(in_buff + NELLY_BUF_LEN, in_buff + NELLY_BUF_LEN, ff_sine_128, NELLY_BUF_LEN);
> > > > > > > > > > > > > +    ff_mdct_calc(&s->mdct_ctx, coefs, in_buff);
> > > > > > > > > > > > > +}
> > > > > > > > > > > >
> > > > > > > > > > > > The data is copied once in encode_frame and twice here.
> > > > > > > > > > > > There is no need to copy the data 3 times.
> > > > > > > > > > > > vector_fmul can be used with a single memcpy to get the
> > > > > > > > > > > > data into any destination, and vector_fmul_reverse
> > > > > > > > > > > > doesn't even need 1 memcpy, so overall a single memcpy
> > > > > > > > > > > > is enough.
> > > > > > > > > > >
> > > > > > > > > > > Hope that you meant something similar to my solution.
> > > > > > > > > >
> > > > > > > > > > no, you still do 2 memcpy() calls, but now the code is really
> > > > > > > > > > messy as well.
> > > > > > > > > >
> > > > > > > > > > what you should do is, for each block of samples you get
> > > > > > > > > > from the user:
> > > > > > > > > > 1. apply one half of the window onto it with
> > > > > > > > > >    vector_fmul_reverse, writing into some internal buffer
> > > > > > > > > > 2. memcpy it into the 2nd destination and apply the other half
> > > > > > > > > >    of the window onto it with vector_fmul
> > > > > > > > > > 3. run the mdct as appropriate on the internal buffers.
> > > > > > > > >
> > > > > > > > > Hmm, I considered it, but I don't understand exactly what
> > > > > > > > > I should change... In the code I copy the data two times:
> > > > > > > > > a) in encode_frame - I convert int16_t to float and copy the data
> > > > > > > > > to s->buf. I need to do this somewhere because vector_fmul
> > > > > > > > > requires a float *. Additionally, part of the data is needed for
> > > > > > > > > the next call of encode_frame.
> > > > > > > > > b) in apply_mdct - here I think that some additional part of the
> > > > > > > > > buffer is needed.
> > > > > > > > > If I understood correctly, I have to get rid of a), but how do I get
> > > > > > > > > access to the old data when the next call of encode_frame is
> > > > > > > > > performed, and how do I call vector_fmul on int16_t?
> > > > > > > >
> > > > > > > > have you tried setting AVCodec.sample_fmts to SAMPLE_FMT_FLT?
> > > > > > > > I think ffmpeg should support this already. If it does not work,
> > > > > > > > then we can keep int16 for now, which would imply more
> > > > > > > > copying.
> > > > > > >
> > > > > > > Hmm... I tried to use SAMPLE_FMT_FLT, but something doesn't work.
> > > > > > > I made only these changes:
> > > > > > >
> > > > > > > float *samples = data;
> > > > > > > ...
> > > > > > > for (i = 0; i < avctx->frame_size; i++) {
> > > > > > >     s->buf[s->bufsel][i] = samples[i]*(1<<15);
> > > > > > > }
> > > > > > > ...
> > > > > > > .sample_fmts = (enum SampleFormat[]){SAMPLE_FMT_FLT,SAMPLE_FMT_NONE},
> > > > > >
> > > > > > hmm
> > > > >
> > > > > Any idea? Or should I leave it as it is?
> > > >
> > > > does PCM float work for you? If so, what is the difference from your
> > > > encoder?
> > >
> > > pcm_f32le doesn't work, because it isn't hacked into ffmpeg.c. Nellymoser
> > > probably fails for the same reason...
> >
> > [...]
> >
> > > > > +
> > > > > +    apply_mdct(s);
> > > > > +
> > > > >
> > > > > +    init_put_bits(&pb, output, output_size * 8);
> > > > > +
> > > > > +    i = 0;
> > > > > +    for (band = 0; band < NELLY_BANDS; band++) {
> > > > > +        coeff_sum = 0;
> > > > > +        for (j = 0; j < ff_nelly_band_sizes_table[band]; i++, j++) {
> > > > > +            //coeff_sum += s->mdct_out[i                ] * s->mdct_out[i                ]
> > > > > +            //           + s->mdct_out[i + NELLY_BUF_LEN] * s->mdct_out[i + NELLY_BUF_LEN];
> > > > > +            coeff_sum += pow(fabs(s->mdct_out[i]), D) + pow(fabs(s->mdct_out[i + NELLY_BUF_LEN]), D);
> > > > > +        }
> > > > > +        cand[band] =
> > > > > +            //log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / M_LN2;
> > > > > +            C * log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / log(D);
> > > >
> > > > the MAX should maybe be done after the correction for D
> > >
> > > I don't know what exactly you mean...
> >
> > forget it, I've misread the order of the () somehow
> >
> > > --
> > > Bartlomiej Wolowiec
> > >
> > > Index: nellymoserenc.c
> > > ===================================================================
> > > --- nellymoserenc.c	(revision 15126)
> > > +++ nellymoserenc.c	(working copy)
> > > @@ -45,11 +45,18 @@
> > >  #define POW_TABLE_SIZE (1<<11)
> > >  #define POW_TABLE_OFFSET 3
> > >
> > > +#undef NDEBUG
> > > +#include <assert.h>
> > > +
> > >  typedef struct NellyMoserEncodeContext {
> > >      AVCodecContext  *avctx;
> > >      int             last_frame;
> > > +    int             bufsel;
> > >
> > >
> > > +    int             have_saved;
> > >      DSPContext      dsp;
> > >      MDCTContext     mdct_ctx;
> > > +    DECLARE_ALIGNED_16(float, mdct_out[NELLY_SAMPLES]);
> >
> > ok
> >
> >
> > [...]
> >
> > > @@ -146,6 +169,212 @@
> > >      if (fabs(val - table[best_idx]) > fabs(val - table[best_idx + 1])) \
> > >          best_idx++;
> > >
> > > +static void get_exponent_greedy(NellyMoserEncodeContext *s, float *cand, int *idx_table)
> > > +{
> > > +    int band, best_idx, power_idx = 0;
> > > +    float power_candidate;
> > > +
> > > +    //base exponent
> > > +    find_best(cand[0], ff_nelly_init_table, sf_lut, -20, 96);
> > > +    idx_table[0] = best_idx;
> > > +    power_idx = ff_nelly_init_table[best_idx];
> > > +
> > > +    for (band = 1; band < NELLY_BANDS; band++) {
> > > +        power_candidate = cand[band] - power_idx;
> > > +        find_best(power_candidate, ff_nelly_delta_table, sf_delta_lut, 37, 78);
> > > +        idx_table[band] = best_idx;
> > > +        power_idx += ff_nelly_delta_table[best_idx];
> > > +    }
> > > +}
> >
> > ok
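get_exponent_greedy codes the first band with ff_nelly_init_table and every later band as a delta from the previous band's reconstructed exponent (power_idx), not from the previous candidate, so the error of one band is not carried into the next. A minimal sketch of that closed-loop scheme, assuming small made-up tables in place of ff_nelly_init_table and ff_nelly_delta_table and a linear nearest-entry search in place of find_best's lookup tables; exponents_greedy and nearest are hypothetical names.

```c
#include <assert.h>
#include <math.h>

/* Index of the table entry closest to val. A linear search stands in for
 * find_best(), which uses a lookup table plus a one-step correction. */
static int nearest(float val, const int *table, int n)
{
    int best = 0;
    for (int k = 1; k < n; k++)
        if (fabsf(val - table[k]) < fabsf(val - table[best]))
            best = k;
    return best;
}

/* Closed-loop greedy coding: band 0 is coded absolutely, every later band
 * as a delta from the exponent the decoder will reconstruct (power). */
static void exponents_greedy(const float *cand, int bands,
                             const int *init_tab, int n_init,
                             const int *delta_tab, int n_delta,
                             int *idx_table)
{
    int power;

    idx_table[0] = nearest(cand[0], init_tab, n_init);
    power = init_tab[idx_table[0]];
    for (int b = 1; b < bands; b++) {
        idx_table[b] = nearest(cand[b] - power, delta_tab, n_delta);
        power += delta_tab[idx_table[b]];
    }
}
```

For example, with the toy init table {0, 1000, 2000, 3000}, delta table {-500, -100, 0, 100, 500} and candidates {2100, 2250, 1700}, the chosen indices are {2, 3, 0}, which reconstruct 2000, 2100 and 1600.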
> >
> > > +
> > > +#define OPT_SIZE ((1<<15) + 3000)
> > > +
> > > +static inline float distance(float x, float y, int band)
> > > +{
> > > +    //return pow(fabs(x-y), 2.0);
> > > +    float tmp = x - y;
> > > +    return tmp * tmp;
> > > +}
> > > +
> > > +static void get_exponent_dynamic(NellyMoserEncodeContext *s, float *cand, int *idx_table)
> > > +{
> > > +    int i, j, band, best_idx;
> > > +    float power_candidate, best_val;
> > > +
> > > +    float opt[NELLY_BANDS][OPT_SIZE];
> > > +    int path[NELLY_BANDS][OPT_SIZE];
> > > +
> > > +    for (i = 0; i < NELLY_BANDS * OPT_SIZE; i++) {
> > > +        opt[0][i] = INFINITY;
> > > +    }
> > > +
> > > +    for (i = 0; i < 64; i++) {
> > > +        opt[0][ff_nelly_init_table[i]] = distance(cand[0], ff_nelly_init_table[i], 0);
> > > +        path[0][ff_nelly_init_table[i]] = i;
> > > +    }
> > > +
> > > +    for (band = 1; band < NELLY_BANDS; band++) {
> > > +        int q, c = 0;
> > > +        float tmp;
> > > +        int idx_min, idx_max, idx;
> > > +        power_candidate = cand[band];
> > > +        for (q = 1000; !c && q < OPT_SIZE; q <<= 2) {
> > > +            idx_min = FFMAX(0, cand[band] - q);
> > > +            idx_max = FFMIN(OPT_SIZE, cand[band - 1] + q);
> > > +            for (i = FFMAX(0, cand[band - 1] - q); i < FFMIN(OPT_SIZE, cand[band - 1] + q); i++) {
> > > +                if ( isinf(opt[band - 1][i]) )
> > > +                    continue;
> > > +                for (j = 0; j < 32; j++) {
> > > +                    idx = i + ff_nelly_delta_table[j];
> > > +                    if (idx > idx_max)
> > > +                        break;
> > > +                    if (idx >= idx_min) {
> > > +                        tmp = opt[band - 1][i] + distance(idx, power_candidate, band);
> > > +                        if (opt[band][idx] > tmp) {
> > > +                            opt[band][idx] = tmp;
> > > +                            path[band][idx] = j;
> > > +                            c = 1;
> > > +                        }
> > > +                    }
> > > +                }
> > > +            }
> > > +        }
> > > +        assert(c); //FIXME
> > > +    }
> > > +
> > > +    best_val = INFINITY;
> > > +    best_idx = -1;
> > > +    band = NELLY_BANDS - 1;
> > > +    for (i = 0; i < OPT_SIZE; i++) {
> > > +        if (best_val > opt[band][i]) {
> > > +            best_val = opt[band][i];
> > > +            best_idx = i;
> > > +        }
> > > +    }
> > > +    for (band = NELLY_BANDS - 1; band >= 0; band--) {
> > > +        idx_table[band] = path[band][best_idx];
> > > +        if (band) {
> > > +            best_idx -= ff_nelly_delta_table[path[band][best_idx]];
> > > +        }
> > > +    }
> > > +}
> >
> > this could be improved a bit, but if it doesn't help quality, there's no
> > point, so it's ok too
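get_exponent_dynamic (used when avctx->trellis is set) replaces the greedy choice with a Viterbi-style search: for every band and every reachable reconstructed exponent it keeps the cheapest path, then backtracks from the best end state. A compact sketch of the same idea, assuming toy tables, a small exponent range and squared error as the distance (as the patch's distance() does); unlike the patch, it scans the whole range instead of a window of width q around cand[band - 1]. exponents_dynamic, MAX_BANDS and RANGE are hypothetical names.

```c
#include <assert.h>
#include <float.h>

#define MAX_BANDS 8    /* toy limits for this sketch, not Nellymoser's */
#define RANGE     4096 /* reconstructed exponents must stay in [0, RANGE) */

/* opt[b][v]:  least total squared error over bands 0..b for any index
 *             sequence whose reconstructed exponent at band b is v
 * path[b][v]: table index used to reach that state
 * The tables are static rather than on the stack because they get large;
 * this also makes the function non-reentrant. */
static void exponents_dynamic(const float *cand, int bands,
                              const int *init_tab, int n_init,
                              const int *delta_tab, int n_delta,
                              int *idx_table)
{
    static float opt[MAX_BANDS][RANGE];
    static int   path[MAX_BANDS][RANGE];
    int b, v, k;

    assert(bands >= 1 && bands <= MAX_BANDS);
    for (b = 0; b < bands; b++)
        for (v = 0; v < RANGE; v++)
            opt[b][v] = FLT_MAX;                 /* unreachable */

    for (k = 0; k < n_init; k++) {
        float d = cand[0] - init_tab[k];
        assert(init_tab[k] >= 0 && init_tab[k] < RANGE);
        opt[0][init_tab[k]]  = d * d;
        path[0][init_tab[k]] = k;
    }

    for (b = 1; b < bands; b++)
        for (v = 0; v < RANGE; v++) {
            if (opt[b - 1][v] == FLT_MAX)
                continue;
            for (k = 0; k < n_delta; k++) {
                int   nv = v + delta_tab[k];
                float d, cost;
                if (nv < 0 || nv >= RANGE)
                    continue;
                d    = cand[b] - nv;
                cost = opt[b - 1][v] + d * d;
                if (cost < opt[b][nv]) {
                    opt[b][nv]  = cost;
                    path[b][nv] = k;
                }
            }
        }

    /* cheapest end state, then walk the stored choices backwards */
    v = 0;
    for (k = 1; k < RANGE; k++)
        if (opt[bands - 1][k] < opt[bands - 1][v])
            v = k;
    assert(opt[bands - 1][v] < FLT_MAX);
    for (b = bands - 1; b >= 0; b--) {
        idx_table[b] = path[b][v];
        if (b)
            v -= delta_tab[path[b][v]];
    }
}
```

A case where this beats the greedy search: with init table {0, 1000}, delta table {0, 50} and candidates {600, 0}, greedy picks 1000 for the first band (it is closer to 600) and can then never come back down, for a total squared error of 1160000; the trellis accepts the worse first value 0 and ends at 360000.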
> >
> > > +
> > > +/**
> > > + * Encodes NELLY_SAMPLES samples. It assumes, that samples contains 3 * NELLY_BUF_LEN values
> > > + *  @param s               encoder context
> > > + *  @param output          output buffer
> > > + *  @param output_size     size of output buffer
> > > + */
> > > +static void encode_block(NellyMoserEncodeContext *s, unsigned char *output, int output_size)
> > > +{
> > > +    PutBitContext pb;
> > > +    int i, j, band, block, best_idx, power_idx = 0;
> > > +    float power_val, coeff, coeff_sum;
> > > +    float pows[NELLY_FILL_LEN];
> > > +    int bits[NELLY_BUF_LEN], idx_table[NELLY_BANDS];
> > > +    float cand[NELLY_BANDS];
> > > +
> > > +    const float C = 1.0;
> > > +    const float D = 2.0;
> > > +
> > > +    apply_mdct(s);
> > > +
> > > +    init_put_bits(&pb, output, output_size * 8);
> > > +
> > > +    i = 0;
> > > +    for (band = 0; band < NELLY_BANDS; band++) {
> > > +        coeff_sum = 0;
> > > +        for (j = 0; j < ff_nelly_band_sizes_table[band]; i++, j++) {
> > > +            //coeff_sum += s->mdct_out[i                ] * s->mdct_out[i                ]
> > > +            //           + s->mdct_out[i + NELLY_BUF_LEN] * s->mdct_out[i + NELLY_BUF_LEN];
> > > +            coeff_sum += pow(fabs(s->mdct_out[i]), D) + pow(fabs(s->mdct_out[i + NELLY_BUF_LEN]), D);
> > > +        }
> > > +        cand[band] =
> > > +            //log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / M_LN2;
> > > +            C * log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / log(D);
> > > +    }
> > > +
> > > +    if (s->avctx->trellis) {
> > > +        get_exponent_dynamic(s, cand, idx_table);
> > > +    } else {
> > > +        get_exponent_greedy(s, cand, idx_table);
> > > +    }
> > > +
> > > +    i = 0;
> > > +    for (band = 0; band < NELLY_BANDS; band++) {
> > > +        if (band) {
> > > +            power_idx += ff_nelly_delta_table[idx_table[band]];
> > > +            put_bits(&pb, 5, idx_table[band]);
> > > +        } else {
> > > +            power_idx = ff_nelly_init_table[idx_table[0]];
> > > +            put_bits(&pb, 6, idx_table[0]);
> > > +        }
> > > +        power_val = pow_table[power_idx & 0x7FF] / (1 << ((power_idx >> 11) + POW_TABLE_OFFSET));
> > > +        for (j = 0; j < ff_nelly_band_sizes_table[band]; i++, j++) {
> > > +            s->mdct_out[i] *= power_val;
> > > +            s->mdct_out[i + NELLY_BUF_LEN] *= power_val;
> > > +            pows[i] = power_idx;
> > > +        }
> > > +    }
> > > +
> > > +    ff_nelly_get_sample_bits(pows, bits);
> > > +
> > > +    for (block = 0; block < 2; block++) {
> > > +        for (i = 0; i < NELLY_FILL_LEN; i++) {
> > > +            if (bits[i] > 0) {
> > > +                const float *table = ff_nelly_dequantization_table + (1 << bits[i]) - 1;
> > > +                coeff = s->mdct_out[block * NELLY_BUF_LEN + i];
> > > +                best_idx =
> > > +                    quant_lut[av_clip (
> > > +                            coeff * quant_lut_mul[bits[i]] + quant_lut_add[bits[i]],
> > > +                            quant_lut_offset[bits[i]],
> > > +                            quant_lut_offset[bits[i]+1] - 1
> > > +                            )];
> > > +                if (fabs(coeff - table[best_idx]) > fabs(coeff - table[best_idx + 1]))
> > > +                    best_idx++;
> > > +
> > > +                put_bits(&pb, bits[i], best_idx);
> > > +            }
> > > +        }
> > > +        if (!block)
> > > +            put_bits(&pb, NELLY_HEADER_BITS + NELLY_DETAIL_BITS - put_bits_count(&pb), 0);
> > > +    }
> > > +}
> >
> > as the C/D stuff turned out useless, you can remove it again; apart from that,
> > ok
> >
> > the rest of the patch is ok as well (except the #undef NDEBUG),
> > unless you want to fix ffmpeg to work with floats, in which case the rest
> > can be simplified.
> >
> > [...]
> 
> I will try to find a nice solution for this problem in the following week,
> but I don't yet know the necessary parts of the code well, so I don't know
> if I will be able to fix anything. I will report on my progress.

ok, maybe Peter could also help; after all, he added that (non-working) float
support.
And yes, I can confirm pcm_f32le doesn't work ...


> So now I can commit the whole code?

yes

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I do not agree with what you have to say, but I'll defend to the death your
right to say it. -- Voltaire