[FFmpeg-devel] Nellymoser encoder
Bartlomiej Wolowiec
bartek.wolowiec
Mon Sep 1 01:18:15 CEST 2008
On Sunday 31 August 2008 23:49:23 Michael Niedermayer wrote:
> On Sun, Aug 31, 2008 at 10:07:22PM +0200, Bartlomiej Wolowiec wrote:
> > On Sunday 31 August 2008 15:53:23 Michael Niedermayer wrote:
> > > On Sun, Aug 31, 2008 at 01:07:15PM +0200, Bartlomiej Wolowiec wrote:
> > > > On Saturday 30 August 2008 18:10:41 Michael Niedermayer wrote:
> > > > > On Sat, Aug 30, 2008 at 03:42:37PM +0200, Bartlomiej Wolowiec wrote:
> > > > > > On Friday 29 August 2008 22:36:10 Michael Niedermayer wrote:
> > > > > > > > > > > > +
> > > > > > > > > > > > +void apply_mdct(NellyMoserEncodeContext *s, float *in, float *coefs)
> > > > > > > > > > > > +{
> > > > > > > > > > > > + DECLARE_ALIGNED_16(float, in_buff[NELLY_SAMPLES]);
> > > > > > > > > > > > +
> > > > > > > > > > > > + memcpy(&in_buff[0], &in[0], NELLY_SAMPLES * sizeof(float));
> > > > > > > > > > > > + s->dsp.vector_fmul(in_buff, ff_sine_128, NELLY_BUF_LEN);
> > > > > > > > > > > > + s->dsp.vector_fmul_reverse(in_buff + NELLY_BUF_LEN, in_buff + NELLY_BUF_LEN, ff_sine_128, NELLY_BUF_LEN);
> > > > > > > > > > > > + ff_mdct_calc(&s->mdct_ctx, coefs, in_buff);
> > > > > > > > > > > > +}
> > > > > > > > > > >
> > > > > > > > > > > The data is copied once in encode_frame and twice here.
> > > > > > > > > > > There is no need to copy the data 3 times.
> > > > > > > > > > > vector_fmul can be used with a single memcpy to get the
> > > > > > > > > > > data into any destination, and vector_fmul_reverse
> > > > > > > > > > > doesn't even need 1 memcpy, so overall a single memcpy
> > > > > > > > > > > is enough.
> > > > > > > > > >
> > > > > > > > > > Hope that you meant something similar to my solution.
> > > > > > > > >
> > > > > > > > > no, you still do 2 memcpy() but now the code is really
> > > > > > > > > messy as well.
> > > > > > > > >
> > > > > > > > > what you should do is, for each block of samples you get
> > > > > > > > > from the user:
> > > > > > > > > 1. apply one half of the window onto it with
> > > > > > > > >    vector_fmul_reverse and destination of some internal buffer
> > > > > > > > > 2. memcpy into the 2nd destination and apply the other half
> > > > > > > > >    of the window onto it with vector_fmul
> > > > > > > > > 3. run the mdct as appropriate on the internal buffers.
> > > > > > > >
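For reference, the three steps above can be sketched in plain C roughly as
follows. This is only an illustration under assumptions: fmul_inplace() and
fmul_reverse() are hypothetical stand-ins whose semantics I inferred from how
vector_fmul and vector_fmul_reverse are called in the quoted apply_mdct(), and
window_block() and its buffer names are made up for the example. The MDCT call
itself is left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for an in-place multiply: dst[i] *= win[i]. */
static void fmul_inplace(float *dst, const float *win, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] *= win[i];
}

/* Hypothetical stand-in for a multiply by the reversed window into a
 * separate destination: dst[i] = src[i] * win[len - 1 - i]. */
static void fmul_reverse(float *dst, const float *src, const float *win, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * win[len - 1 - i];
}

/*
 * Feed one block of len input samples into two overlapping MDCT inputs.
 * cur_second_half: second half of the window that ends with this block.
 * next_first_half: first half of the window that starts with this block.
 */
static void window_block(const float *block, const float *win, int len,
                         float *cur_second_half, float *next_first_half)
{
    assert(len > 0);
    /* Step 1: falling half of the window, written straight to its
     * destination, so no copy is needed. */
    fmul_reverse(cur_second_half, block, win, len);
    /* Step 2: the only memcpy, then the rising half applied in place. */
    memcpy(next_first_half, block, len * sizeof(*block));
    fmul_inplace(next_first_half, win, len);
    /* Step 3 (not shown): run the MDCT on a buffer once both of its
     * halves have been filled. */
}
```

The input block is never modified, so the caller does not need to keep an
extra copy of it between calls.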
> > > > > > > > Hmm, I considered it, but I don't understand exactly what
> > > > > > > > I should change... In the code I copy the data twice:
> > > > > > > > a) in encode_frame - I convert int16_t to float and copy the
> > > > > > > >    data to s->buf - I need to do it somewhere because
> > > > > > > >    vector_fmul requires float *. Additionally, part of the
> > > > > > > >    data is needed for the next call of encode_frame
> > > > > > > > b) in apply_mdct - here I think that some additional part of
> > > > > > > >    the buffer is needed.
> > > > > > > > If I understood correctly I have to get rid of a), but how do
> > > > > > > > I get access to the old data when the next call of
> > > > > > > > encode_frame is performed, and how do I call vector_fmul on
> > > > > > > > int16_t?
> > > > > > >
> > > > > > > have you tried setting AVCodec.sample_fmts to SAMPLE_FMT_FLT ?
> > > > > > > I think ffmpeg should support this already. If it does not work
> > > > > > > then we can keep int16 for now, which would imply more
> > > > > > > copying
> > > > > >
> > > > > > Hmm... I tried to use SAMPLE_FMT_FLT, but something doesn't work.
> > > > > > I made only these changes:
> > > > > >
> > > > > > float *samples = data;
> > > > > > ...
> > > > > > for (i = 0; i < avctx->frame_size; i++) {
> > > > > > s->buf[s->bufsel][i] = samples[i]*(1<<15);
> > > > > > }
> > > > > > ...
> > > > > > .sample_fmts = (enum SampleFormat[]){SAMPLE_FMT_FLT,SAMPLE_FMT_NONE},
> > > > >
> > > > > hmm
> > > >
> > > > Any idea? or should I leave it as it is?
> > >
> > > does PCM float work for you? if so what is the difference to your
> > > encoder?
> >
> > pcm_f32le doesn't work - because it isn't hacked in ffmpeg.c. Nellymoser
> > probably for the same reason...
>
> [...]
>
> > > > +
> > > > + apply_mdct(s);
> > > > +
> > > >
> > > > + init_put_bits(&pb, output, output_size * 8);
> > > > +
> > > > + i = 0;
> > > > + for (band = 0; band < NELLY_BANDS; band++) {
> > > > + coeff_sum = 0;
> > > > + for (j = 0; j < ff_nelly_band_sizes_table[band]; i++, j++) {
> > > > + //coeff_sum += s->mdct_out[i ] * s->mdct_out[i ]
> > > > + // + s->mdct_out[i + NELLY_BUF_LEN] * s->mdct_out[i + NELLY_BUF_LEN];
> > > > + coeff_sum += pow(fabs(s->mdct_out[i]), D) + pow(fabs(s->mdct_out[i + NELLY_BUF_LEN]), D);
> > > > + }
> > > > + cand[band] =
> > > > + //log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / M_LN2;
> > > > + C * log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / log(D);
> > >
> > > the MAX should maybe be done after the correction for D
> >
> > I don't know what exactly you mean...
>
> forget it, I've misread the order of the () somehow
>
> > --
> > Bartlomiej Wolowiec
> >
> > Index: nellymoserenc.c
> > ===================================================================
> > --- nellymoserenc.c (revision 15126)
> > +++ nellymoserenc.c (working copy)
> > @@ -45,11 +45,18 @@
> > #define POW_TABLE_SIZE (1<<11)
> > #define POW_TABLE_OFFSET 3
> >
> > +#undef NDEBUG
> > +#include <assert.h>
> > +
> > typedef struct NellyMoserEncodeContext {
> > AVCodecContext *avctx;
> > int last_frame;
> > + int bufsel;
> >
> >
> > + int have_saved;
> > DSPContext dsp;
> > MDCTContext mdct_ctx;
> > + DECLARE_ALIGNED_16(float, mdct_out[NELLY_SAMPLES]);
>
> ok
>
>
> [...]
>
> > @@ -146,6 +169,212 @@
> > if (fabs(val - table[best_idx]) > fabs(val - table[best_idx + 1])) \
> > best_idx++;
> >
> > +static void get_exponent_greedy(NellyMoserEncodeContext *s, float *cand, int *idx_table)
> > +{
> > + int band, best_idx, power_idx = 0;
> > + float power_candidate;
> > +
> > + //base exponent
> > + find_best(cand[0], ff_nelly_init_table, sf_lut, -20, 96);
> > + idx_table[0] = best_idx;
> > + power_idx = ff_nelly_init_table[best_idx];
> > +
> > + for (band = 1; band < NELLY_BANDS; band++) {
> > + power_candidate = cand[band] - power_idx;
> > + find_best(power_candidate, ff_nelly_delta_table, sf_delta_lut, 37, 78);
> > + idx_table[band] = best_idx;
> > + power_idx += ff_nelly_delta_table[best_idx];
> > + }
> > +}
>
> ok
>
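To make the greedy search easier to follow outside the patch, here is a
self-contained sketch of the same idea: pick the base exponent closest to the
first band, then for every following band pick the delta that brings the
running power closest to that band's candidate. It is only an illustration.
nearest_index() does a plain linear scan instead of the lookup tables that
find_best uses, and the function names and table arguments are hypothetical,
not the real ff_nelly_* tables.

```c
#include <assert.h>

/* Index of the table entry closest to val (linear scan, for illustration). */
static int nearest_index(float val, const int *table, int size)
{
    int best = 0;
    assert(size > 0);
    for (int i = 1; i < size; i++) {
        float d_best = val - table[best];
        float d_i    = val - table[i];
        if (d_i * d_i < d_best * d_best)
            best = i;
    }
    return best;
}

/* Greedy choice: each band is decided on its own, never revisited. */
static void exponent_greedy(const float *cand, int nb_bands,
                            const int *init_table, int init_size,
                            const int *delta_table, int delta_size,
                            int *idx_table)
{
    int power;
    assert(nb_bands > 0);

    /* Base exponent for band 0 comes from the initial table. */
    idx_table[0] = nearest_index(cand[0], init_table, init_size);
    power = init_table[idx_table[0]];

    /* Every later band is coded as a delta against the running power. */
    for (int band = 1; band < nb_bands; band++) {
        idx_table[band] = nearest_index(cand[band] - power,
                                        delta_table, delta_size);
        power += delta_table[idx_table[band]];
    }
}
```

Because each decision is final, an early rounding choice can force larger
errors in later bands, which is what the trellis variant tries to avoid.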
> > +
> > +#define OPT_SIZE ((1<<15) + 3000)
> > +
> > +static inline float distance(float x, float y, int band)
> > +{
> > + //return pow(fabs(x-y), 2.0);
> > + float tmp = x - y;
> > + return tmp * tmp;
> > +}
> > +
> > +static void get_exponent_dynamic(NellyMoserEncodeContext *s, float *cand, int *idx_table)
> > +{
> > + int i, j, band, best_idx;
> > + float power_candidate, best_val;
> > +
> > + float opt[NELLY_BANDS][OPT_SIZE];
> > + int path[NELLY_BANDS][OPT_SIZE];
> > +
> > + for (i = 0; i < NELLY_BANDS * OPT_SIZE; i++) {
> > + opt[0][i] = INFINITY;
> > + }
> > +
> > + for (i = 0; i < 64; i++) {
> > + opt[0][ff_nelly_init_table[i]] = distance(cand[0], ff_nelly_init_table[i], 0);
> > + path[0][ff_nelly_init_table[i]] = i;
> > + }
> > +
> > + for (band = 1; band < NELLY_BANDS; band++) {
> > + int q, c = 0;
> > + float tmp;
> > + int idx_min, idx_max, idx;
> > + power_candidate = cand[band];
> > + for (q = 1000; !c && q < OPT_SIZE; q <<= 2) {
> > + idx_min = FFMAX(0, cand[band] - q);
> > + idx_max = FFMIN(OPT_SIZE, cand[band - 1] + q);
> > + for (i = FFMAX(0, cand[band - 1] - q); i < FFMIN(OPT_SIZE, cand[band - 1] + q); i++) {
> > + if ( isinf(opt[band - 1][i]) )
> > + continue;
> > + for (j = 0; j < 32; j++) {
> > + idx = i + ff_nelly_delta_table[j];
> > + if (idx > idx_max)
> > + break;
> > + if (idx >= idx_min) {
> > + tmp = opt[band - 1][i] + distance(idx, power_candidate, band);
> > + if (opt[band][idx] > tmp) {
> > + opt[band][idx] = tmp;
> > + path[band][idx] = j;
> > + c = 1;
> > + }
> > + }
> > + }
> > + }
> > + }
> > + assert(c); //FIXME
> > + }
> > +
> > + best_val = INFINITY;
> > + best_idx = -1;
> > + band = NELLY_BANDS - 1;
> > + for (i = 0; i < OPT_SIZE; i++) {
> > + if (best_val > opt[band][i]) {
> > + best_val = opt[band][i];
> > + best_idx = i;
> > + }
> > + }
> > + for (band = NELLY_BANDS - 1; band >= 0; band--) {
> > + idx_table[band] = path[band][best_idx];
> > + if (band) {
> > + best_idx -= ff_nelly_delta_table[path[band][best_idx]];
> > + }
> > + }
> > +}
>
> this could be improved a bit but when it doesn't help quality, there's no
> point, so it's ok too
>
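For comparison with the greedy version, the dynamic programming search can be
sketched on a much smaller state space like this. It is a simplified
illustration, not the code from the patch: it scans every power value instead
of a window around the candidate, uses FLT_MAX instead of INFINITY to mark
unreachable states, and MAX_BANDS, MAX_POWER and the table arguments are
hypothetical values chosen to keep the example small.

```c
#include <assert.h>
#include <float.h>

#define MAX_BANDS 8
#define MAX_POWER 64   /* illustrative state range, far smaller than OPT_SIZE */

static float sq_dist(float x, float y)
{
    float d = x - y;
    return d * d;
}

/* Fills idx_table with the minimum-error path and returns its total error. */
static float exponent_dynamic(const float *cand, int nb_bands,
                              const int *init_table, int init_size,
                              const int *delta_table, int delta_size,
                              int *idx_table)
{
    float cost[MAX_BANDS][MAX_POWER];
    int   path[MAX_BANDS][MAX_POWER];
    float best_cost = FLT_MAX;
    int   best = -1;

    assert(nb_bands > 0 && nb_bands <= MAX_BANDS);

    /* FLT_MAX marks a power value that no path has reached yet. */
    for (int b = 0; b < nb_bands; b++)
        for (int p = 0; p < MAX_POWER; p++)
            cost[b][p] = FLT_MAX;

    /* Band 0: every entry of the initial table is a possible start. */
    for (int i = 0; i < init_size; i++) {
        int p = init_table[i];
        if (p < 0 || p >= MAX_POWER)
            continue;
        cost[0][p] = sq_dist(cand[0], p);
        path[0][p] = i;
    }

    /* Later bands: extend every reachable state by every delta and keep
     * the cheapest way of arriving at each new power value. */
    for (int b = 1; b < nb_bands; b++) {
        for (int p = 0; p < MAX_POWER; p++) {
            if (cost[b - 1][p] == FLT_MAX)
                continue;
            for (int j = 0; j < delta_size; j++) {
                int q = p + delta_table[j];
                float c;
                if (q < 0 || q >= MAX_POWER)
                    continue;
                c = cost[b - 1][p] + sq_dist(cand[b], q);
                if (c < cost[b][q]) {
                    cost[b][q] = c;
                    path[b][q] = j;
                }
            }
        }
    }

    /* Cheapest final state, then walk the stored choices backwards. */
    for (int p = 0; p < MAX_POWER; p++) {
        if (cost[nb_bands - 1][p] < best_cost) {
            best_cost = cost[nb_bands - 1][p];
            best = p;
        }
    }
    assert(best >= 0);

    for (int b = nb_bands - 1; b >= 0; b--) {
        idx_table[b] = path[b][best];
        if (b)
            best -= delta_table[idx_table[b]];
    }
    return best_cost;
}
```

With initial values {10, 20}, deltas {0, 10} and candidates {14, 30}, this
search starts from 20 and adds 10, for a total squared error of 36. A
band-by-band choice would start from 10, which is closer to 14 but can then
reach at most 20 in the second band.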
> > +
> > +/**
> > + * Encodes NELLY_SAMPLES samples. It assumes, that samples contains 3 * NELLY_BUF_LEN values
> > + * @param s encoder context
> > + * @param output output buffer
> > + * @param output_size size of output buffer
> > + */
> > +static void encode_block(NellyMoserEncodeContext *s, unsigned char *output, int output_size)
> > +{
> > + PutBitContext pb;
> > + int i, j, band, block, best_idx, power_idx = 0;
> > + float power_val, coeff, coeff_sum;
> > + float pows[NELLY_FILL_LEN];
> > + int bits[NELLY_BUF_LEN], idx_table[NELLY_BANDS];
> > + float cand[NELLY_BANDS];
> > +
> > + const float C = 1.0;
> > + const float D = 2.0;
> > +
> > + apply_mdct(s);
> > +
> > + init_put_bits(&pb, output, output_size * 8);
> > +
> > + i = 0;
> > + for (band = 0; band < NELLY_BANDS; band++) {
> > + coeff_sum = 0;
> > + for (j = 0; j < ff_nelly_band_sizes_table[band]; i++, j++) {
> > + //coeff_sum += s->mdct_out[i ] * s->mdct_out[i ]
> > + // + s->mdct_out[i + NELLY_BUF_LEN] * s->mdct_out[i + NELLY_BUF_LEN];
> > + coeff_sum += pow(fabs(s->mdct_out[i]), D) + pow(fabs(s->mdct_out[i + NELLY_BUF_LEN]), D);
> > + }
> > + cand[band] =
> > + //log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / M_LN2;
> > + C * log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / log(D);
> > + }
> > +
> > + if (s->avctx->trellis) {
> > + get_exponent_dynamic(s, cand, idx_table);
> > + } else {
> > + get_exponent_greedy(s, cand, idx_table);
> > + }
> > +
> > + i = 0;
> > + for (band = 0; band < NELLY_BANDS; band++) {
> > + if (band) {
> > + power_idx += ff_nelly_delta_table[idx_table[band]];
> > + put_bits(&pb, 5, idx_table[band]);
> > + } else {
> > + power_idx = ff_nelly_init_table[idx_table[0]];
> > + put_bits(&pb, 6, idx_table[0]);
> > + }
> > + power_val = pow_table[power_idx & 0x7FF] / (1 << ((power_idx >> 11) + POW_TABLE_OFFSET));
> > + for (j = 0; j < ff_nelly_band_sizes_table[band]; i++, j++) {
> > + s->mdct_out[i] *= power_val;
> > + s->mdct_out[i + NELLY_BUF_LEN] *= power_val;
> > + pows[i] = power_idx;
> > + }
> > + }
> > +
> > + ff_nelly_get_sample_bits(pows, bits);
> > +
> > + for (block = 0; block < 2; block++) {
> > + for (i = 0; i < NELLY_FILL_LEN; i++) {
> > + if (bits[i] > 0) {
> > + const float *table = ff_nelly_dequantization_table + (1 << bits[i]) - 1;
> > + coeff = s->mdct_out[block * NELLY_BUF_LEN + i];
> > + best_idx =
> > + quant_lut[av_clip (
> > + coeff * quant_lut_mul[bits[i]] + quant_lut_add[bits[i]],
> > + quant_lut_offset[bits[i]],
> > + quant_lut_offset[bits[i]+1] - 1
> > + )];
> > + if (fabs(coeff - table[best_idx]) > fabs(coeff - table[best_idx + 1]))
> > + best_idx++;
> > +
> > + put_bits(&pb, bits[i], best_idx);
> > + }
> > + }
> > + if (!block)
> > + put_bits(&pb, NELLY_HEADER_BITS + NELLY_DETAIL_BITS - put_bits_count(&pb), 0);
> > + }
> > +}
>
> as the C/D stuff turned out to be useless you can remove it again; apart
> from that, ok
>
> the rest of the patch is ok as well (except the #undef NDEBUG)
> unless you want to fix ffmpeg to work with floats in which case the rest
> can be simplified.
>
> [...]
I will try to find a nice solution for this problem in the coming week, but I
don't yet know the relevant parts of the code well, so I can't say whether I
will be able to fix anything. I will report on my progress.
So can I commit the whole code now?
--
Bartlomiej Wolowiec
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nellymoser7.patch
Type: text/x-diff
Size: 10669 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080901/23ed0f99/attachment.patch>