[FFmpeg-devel] Nellymoser encoder

Fri Aug 29 00:02:36 CEST 2008

On Thu, Aug 28, 2008 at 08:56:24PM +0200, Bartlomiej Wolowiec wrote:
> Thursday 28 August 2008 14:43:23 Michael Niedermayer napisa?(a):
> > On Thu, Aug 28, 2008 at 12:53:50PM +0200, Bartlomiej Wolowiec wrote:
> > > Thursday 28 August 2008 00:11:20 Michael Niedermayer napisa?(a):
[...]
> > > > > +
> > > > > +/**
> > > > > + * Encodes NELLY_SAMPLES samples. It assumes, that samples contains
> > > > > 3 * NELLY_BUF_LEN values + *  @param s               encoder context
> > > > > + *  @param output          output buffer
> > > > > + *  @param output_size     size of output buffer
> > > > > + *  @param samples         input samples
> > > > > + */
> > > > > +static void encode_block(NellyMoserEncodeContext *s,
> > > > > +                         unsigned char *output, int output_size,
> > > > > float *samples) +{
> > > > > +    PutBitContext pb;
> > > > > +    int i, band, block, best_idx, power_idx = 0;
> > > > > +    float power_val, power_candidate, coeff, coeff_sum;
> > > > > +    int band_start, band_end;
> > > > > +
> > > > > +    apply_mdct(s, samples, s->mdct_out);
> > > > > +    apply_mdct(s, samples + NELLY_BUF_LEN, s->mdct_out +
> > > > > NELLY_BUF_LEN); +
> > > > > +    init_put_bits(&pb, output, output_size * 8);
> > > > > +
> > > > > +    band_start = 0;
> > > > > +    band_end = ff_nelly_band_sizes_table[0];
> > > > > +    for (band = 0; band < NELLY_BANDS; band++) {
> > > > > +        coeff_sum = 0;
> > > > > +        for (i = band_start; i < band_end; i++) {
> > > > >
> > > > > +            for (block = 0; block < 2; block++) {
> > > > > +                coeff = s->mdct_out[i + block * NELLY_BUF_LEN];
> > > > > +                coeff_sum += coeff * coeff;
> > > > > +            }
> > > >
> > > > id unroll that by hand to
> > > > coeff_sum += s->mdct_out[i                ]*s->mdct_out[i              
> > > >  ]; +s->mdct_out[i + NELLY_BUF_LEN]*s->mdct_out[i + NELLY_BUF_LEN];
> > > >
> > > > > +        }
> > > > > +        power_candidate =
> > > > > +            (log(FFMAX(64.0, coeff_sum /
> > > > > (ff_nelly_band_sizes_table[band] << 1))) - +             log(64.0)) *
> > > > > 1024.0 / M_LN2;
> > > >
> > > > log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) *
> > > > 1024.0 / M_LN2;
> > > >
> > > > also this is based on
> > > > (sum(0..N) ABS(coeff)^2/N)^(1/2)
> > > >
> > > > it would be interresting to try
> > > > C*(sum(0..N) ABS(coeff)^D/N)^(1/D) for different values of C and D
> > > >
> > > > maybe you could try
> > > > C={0.9,1.0,1.1}
> > > > D={1.9,2.0,2.1}
> > > > at first and see if any improves distortion
> > >
> > > Hmm... How should I check distortion?
> >
> > listening is one option ...
> >
> > > I've listened to few recorgings and in
> > > my opinion differences are insignificant - sometimes D=2.0 is better,
> > > sometimes D=2.3... C!=1.0 in my opinion doesn't give better effects.
> >
> > have you tried different C for D=2.3 ?
> > also what about larger D like 3.0 ?
> > besides can you post a patch so other can (if they want) also experiment
> > with this?
> 
> Enclosed patch contains C and D constants. 

thanks

> 
> > > > > +
> > > > > +        if (band) {
> > > > > +            power_candidate -= power_idx;
> > > > > +            find_best_value(power_candidate, ff_nelly_delta_table,
> > > > > 32, best_idx); +            put_bits(&pb, 5, best_idx);
> > > > > +            power_idx += ff_nelly_delta_table[best_idx];
> > > > > +        } else {
> > > > > +            //base exponent
> > > > > +            find_best_value(power_candidate, ff_nelly_init_table,
> > > > > 64, best_idx); +            put_bits(&pb, 6, best_idx);
> > > > > +            power_idx = ff_nelly_init_table[best_idx];
> > > > > +        }
> > > >
> > > > I wish i knew how to optimally assign these values, sadly i do not.
> > > > Suggestions would be welcome of course in case anyone has an idea on
> > > > how to optimally select them, the tricky part is that these not only
> > > > scale the signal, they also are the basis upon which the bits per band
> > > > and thus encoding is selected.
> > > >
> > > > Still they could be made to closer match the "power_candidate" values
> > > > from above using viterbi though arguably it would just be closer to a
> > > > guess.
> > > >
> > > > An alternative may be to just retry the whole encode_block with
> > > > slightly changed power_candidate values for each band and pick what end
> > > > up with the least distortion (that is least difference to the input
> > > > signal) This should be rather easy to try ...
> > >
> > > slightly change ? What do you exactly? mean?
> >
> > power_candidate += random
> 
> Ok, can I work on it when the rest of the code will be in svn? I think it will 
> be easier.

If kostyas psy model reaches svn before your code does then i wont approve
your code before it also uses it. That could be by above or maybe something
smarter if someone has a idea ...

> 
> > > And again a problem how to
> > > measure distortion - common difference mdct won't give a good effect.
> >
> > Well, kostyas psy model if it where finished could be used ...
> >
> > > > > +
> > > > > +        if (power_idx >= 0) {
> > > > > +            power_val = pow_table[power_idx & 0x7FF] / (1 <<
> > > > > (power_idx
> > > > >
> > > > > >> 11)); +        } else {
> > > > >
> > > > > +            power_val = -pow(2, -power_idx / 2048.0 - 3.0);
> > > > > +        }
> > > >
> > > > power_idx can be <0 ?
> > >
> > > Yes. In this encoder code it can be, possibly in original decoder too.
> >
> > pow_table[power_idx & 0x7FF] / (1 << ((power_idx >> 11)+C));
> >
> > with appropriately changed pow_table may avoid it
> >
> > [...]
> 
> It's not so easy, because & doesn't work good for negative numbers - but just 
> adding constants%0x800==0 helps :)

it works fine with negative numbers, you dont need to add constants

> 
> -- 
> Bartlomiej Wolowiec

> Index: Changelog
> ===================================================================
> --- Changelog	(wersja 15006)
> +++ Changelog	(kopia robocza)
> @@ -133,6 +133,7 @@
>  - AAC decoder
>  - floating point PCM encoder/decoder
>  - MXF muxer
> +- Nellymoser ASAO encoder
>  
>  version 0.4.9-pre1:
>  
> Index: libavcodec/Makefile
> ===================================================================
> --- libavcodec/Makefile	(wersja 15006)
> +++ libavcodec/Makefile	(kopia robocza)
> @@ -139,6 +139,7 @@
>  OBJS-$(CONFIG_MSVIDEO1_DECODER)        += msvideo1.o
>  OBJS-$(CONFIG_MSZH_DECODER)            += lcldec.o
>  OBJS-$(CONFIG_NELLYMOSER_DECODER)      += nellymoserdec.o nellymoser.o mdct.o fft.o
> +OBJS-$(CONFIG_NELLYMOSER_ENCODER)      += nellymoserenc.o nellymoser.o mdct.o fft.o
>  OBJS-$(CONFIG_NUV_DECODER)             += nuv.o rtjpeg.o
>  OBJS-$(CONFIG_PAM_ENCODER)             += pnmenc.o pnm.o
>  OBJS-$(CONFIG_PBM_ENCODER)             += pnmenc.o pnm.o
> Index: libavcodec/allcodecs.c
> ===================================================================
> --- libavcodec/allcodecs.c	(wersja 15006)
> +++ libavcodec/allcodecs.c	(kopia robocza)
> @@ -200,7 +200,7 @@
>      REGISTER_DECODER (MP3ON4, mp3on4);
>      REGISTER_DECODER (MPC7, mpc7);
>      REGISTER_DECODER (MPC8, mpc8);
> -    REGISTER_DECODER (NELLYMOSER, nellymoser);
> +    REGISTER_ENCDEC  (NELLYMOSER, nellymoser);
>      REGISTER_DECODER (QDM2, qdm2);
>      REGISTER_DECODER (RA_144, ra_144);
>      REGISTER_DECODER (RA_288, ra_288);
> Index: libavcodec/nellymoserenc.c
> ===================================================================
> --- libavcodec/nellymoserenc.c	(wersja 0)
> +++ libavcodec/nellymoserenc.c	(wersja 0)
> @@ -0,0 +1,273 @@
> +/*
> + * Nellymoser encoder
> + * This code is developed as part of Google Summer of Code 2008 Program.
> + *
> + * Copyright (c) 2008 Bartlomiej Wolowiec
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file nellymoserenc.c
> + * Nellymoser encoder
> + * by Bartlomiej Wolowiec
> + *
> + * Generic codec information: libavcodec/nellymoserdec.c
> + * Log search algorithm idea: http://www1.mplayerhq.hu/ASAO/ASAO.zip
> + * (Copyright Joseph Artsimovich and UAB "DKD")
> + *
> + * for more information about nellymoser format, visit:
> + * http://wiki.multimedia.cx/index.php?title=Nellymoser
> + */
> +
> +#include "nellymoser.h"
> +#include "avcodec.h"
> +#include "dsputil.h"
> +
> +#define BITSTREAM_WRITER_LE
> +#include "bitstream.h"
> +

> +#define MAX_POW_CACHED (1<<11)

This name is poorly choosen, there no "cache" its rather fractional and
integer parts and a table large enough to hold the fractional ones
id suggest the name POW_TABLE_SIZE, its not perfect and iam happy to hear
better suggestions but it at least gets rid of the "cache"

> +#define POW_TABLE_OFFSET 3
> +
> +typedef struct NellyMoserEncodeContext {
> +    AVCodecContext  *avctx;
> +    int             last_frame;
> +    int             bufsize;              ///< number of sample in buf

> +    int             bits[NELLY_BUF_LEN];  ///< number of bits used to encode coeff

This does not need to be in the context

> +    DSPContext      dsp;
> +    MDCTContext     mdct_ctx;
> +    DECLARE_ALIGNED_16(float, mdct_out[NELLY_SAMPLES]);
> +    DECLARE_ALIGNED_16(float, buf[2 * NELLY_SAMPLES]);     ///< sample buffer
> +} NellyMoserEncodeContext;
> +
> +static float pow_table[MAX_POW_CACHED];     ///< -pow(2, -i / 2048.0 - 3.0);
> +
> +#define LUT_init_add -3134
> +#define LUT_init_size 31355 + LUT_init_add
> +static int LUT_init_table[LUT_init_size];

i do not belive that the table needs to be that large

> +
> +#define LUT_delta_add 11725
> +#define LUT_delta_size 12975 + LUT_delta_add
> +static int LUT_delta_table[LUT_delta_size];
> +
> +#define LUT_dequantization_mul 128.0
> +#define LUT_dequantization_add LUT_dequantization_mul * 2.7
> +#define LUT_dequantization_size (int)(LUT_dequantization_mul * 2.5 + LUT_dequantization_add)
> +#define LUT_dequantization_maxbits 6
> +static int LUT_dequantization_table[LUT_dequantization_maxbits][LUT_dequantization_size];

neither do i belive that this one needs to be that large
besides they both can be uint8_t instead of int

and the tables for fewer bits dont need to be as large as the largest

> +
> +void apply_mdct(NellyMoserEncodeContext *s, float *in, float *coefs)
> +{
> +    DECLARE_ALIGNED_16(float, in_buff[NELLY_SAMPLES]);
> +
> +    memcpy(&in_buff[0], &in[0], NELLY_SAMPLES * sizeof(float));
> +    s->dsp.vector_fmul(in_buff, ff_sine_128, NELLY_BUF_LEN);
> +    s->dsp.vector_fmul_reverse(in_buff + NELLY_BUF_LEN, in_buff + NELLY_BUF_LEN, ff_sine_128, NELLY_BUF_LEN);
> +    ff_mdct_calc(&s->mdct_ctx, coefs, in_buff);
> +}

The data is copied once in encode_frame and twice here
There is no need to copy the data 3 times.
vector_fmul can be used with a singl memcpy to get the data into any
destination, and vector_fmul_reverse doesnt even need 1 memcpy, so
overall a single memcpy is enough

[...]

> +#define find_best(val, table, LUT, LUT_add, LUT_mul, LUT_size) \
> +    best_idx = \
> +        LUT[av_clip (lrintf(val * LUT_mul + LUT_add), 0, LUT_size - 1)]; \
> +    if (fabs(val - table[best_idx]) > fabs(val - table[best_idx + 1])) \
> +        best_idx++;

this can be an inline function which would be cleaner

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Old school: Use the lowest level language in which you can solve the problem
            conveniently.
New school: Use the highest level language in which the latest supercomputer
            can solve the problem without the user falling asleep waiting.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080829/341f8236/attachment.pgp>