[FFmpeg-devel] AMR-NB decoder

Mon Aug 10 14:48:54 CEST 2009

On Mon, Aug 10, 2009 at 08:42:53AM +0100, Colin McQuillan wrote:
> 2009/8/9 Måns Rullgård <mans at mansr.com>:
> > Colin McQuillan <m.niloc at googlemail.com> writes:
> >
> >> 2009/8/8 Michael Niedermayer <michaelni at gmx.at>:
> >>> On Sat, Aug 08, 2009 at 04:09:39PM +0100, Colin McQuillan wrote:
> >>>> 2009/8/8 Michael Niedermayer <michaelni at gmx.at>:
> >>>> > On Fri, Aug 07, 2009 at 08:23:53PM +0100, Colin McQuillan wrote:
> >>>> >> 2009/8/6 Michael Niedermayer <michaelni at gmx.at>:
> >>>> >> > On Wed, Aug 05, 2009 at 05:51:36PM +0100, Colin McQuillan wrote:
> >>>> >> >> Attached is a patch for an AMR-NB decoder.
> >>>>
> >>>> [...]
> >>>>
> >>>> >> > that should be a separate patch
> >>>> >>
> >>>> >> I'll leave this one until I investigate a version for sparse vectors. Attached:
> >>>> >>
> >>>> >> 1. Helper functions for gain control in floating-point codecs
> >>>> >> I couldn't find a similar fixed point function to copy the function name.
> >>>> >>
> >>>> >> 2. Floating-point version of ff_acelp_high_pass_filter
> >>>> >
> >>>> >>  acelp_vectors.c |   22 ++++++++++++++++++++++
> >>>> >>  acelp_vectors.h |   27 +++++++++++++++++++++++++++
> >>>> >>  2 files changed, 49 insertions(+)
> >>>> >> f1abbee9b62c1779fd5fb1c634d4ab4294d8611d  get-set-energyf.patch
> >>>> >> Index: libavcodec/acelp_vectors.c
> >>>> >> ===================================================================
> >>>> >> --- libavcodec/acelp_vectors.c        (revision 19606)
> >>>> >> +++ libavcodec/acelp_vectors.c        (working copy)
> >>>> >> @@ -155,3 +155,25 @@
> >>>> >>          out[i] = weight_coeff_a * in_a[i]
> >>>> >>                 + weight_coeff_b * in_b[i];
> >>>> >>  }
> >>>> >> +
> >>>> >> +float ff_energyf(const float *v, int length)
> >>>> >> +{
> >>>> >> +    float sum = 0;
> >>>> >> +    int i;
> >>>> >> +
> >>>> >> +    for (i = 0; i < length; i++)
> >>>> >> +        sum += v[i] * v[i];
> >>>> >> +
> >>>> >> +    return sum;
> >>>> >> +}
> >>>> >
> >>>> > ff_dot_productf()
> >>>>
> >>>> Do you mean that ff_energyf is redundant? I've taken it out.
> >>>
> >>> hmm well, as you say it that way, ff_energyf() could be faster due to
> >>> fewer mem reads, if that is the case in practice it could be kept
> >>
> >> ff_energyf is reliably 4% faster in my test, so I'll add it back in.
> >
> > That function has high simdicity so it should be added to dsputil and
> > simdified.
> 
> I'll try, but I didn't mean to imply that energy calculations are
> critical to performance. The slow parts of the AMR decoder are the IIR
> and FIR filters, which are already in celp_filters.c.

if the SIMD energy is not reaching an overall (whole codec) 0.1% speedup over
using a more generic SIMD dot product then it's probably not worth it and
could be omitted
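
For reference, the two scalar routines being compared can be sketched as below. This is only an illustration, not code from the patch: energy_ref and dot_product_ref are hypothetical names. The point is that the energy is the dot product of a vector with itself, read from one array instead of two.

```c
/* Hypothetical reference: sum of squares of v[0..length-1]. */
float energy_ref(const float *v, int length)
{
    float sum = 0.0f;
    int i;

    for (i = 0; i < length; i++)
        sum += v[i] * v[i];      /* one load per element */

    return sum;
}

/* Hypothetical reference: generic dot product of a and b. */
float dot_product_ref(const float *a, const float *b, int length)
{
    float sum = 0.0f;
    int i;

    for (i = 0; i < length; i++)
        sum += a[i] * b[i];      /* two loads per element */

    return sum;
}
```

Calling dot_product_ref(v, v, length) gives the same value as energy_ref(v, length), so whether a dedicated routine is worth keeping comes down to the measured speed difference.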

> 
> Attached is "Implement vector energy calculation in dsputil".

>  dsputil.c         |   12 ++++++++++++
>  dsputil.h         |    2 ++
>  x86/dsputil_mmx.c |   26 ++++++++++++++++++++++++++
>  3 files changed, 40 insertions(+)
> bfa9b2ddf24406c925efda2d4a58e3bb078e74fb  vector_energyf.patch
> Index: libavcodec/x86/dsputil_mmx.c
> ===================================================================
> --- libavcodec/x86/dsputil_mmx.c	(revision 19613)
> +++ libavcodec/x86/dsputil_mmx.c	(working copy)
> @@ -2051,6 +2051,31 @@
>      }
>  }
>  
> +static float vector_energyf_sse(const float *src, int len)
> +{
> +    float result;
> +    x86_reg i = (len - 4) * 4;
> +    __asm__ volatile(
> +        "xorps         %%xmm2, %%xmm2 \n"
> +        "1:                           \n"

> +        "movaps       (%2,%0), %%xmm0 \n"
> +        "movaps        %%xmm0, %%xmm1 \n"
> +        "mulps         %%xmm0, %%xmm1 \n"
> +        "addps         %%xmm1, %%xmm2 \n"

movaps       (%2,%0), %%xmm0
mulps         %%xmm0, %%xmm0
addps         %%xmm0, %%xmm2

> +        "sub              $16,     %0 \n"
> +        "jge 1b                       \n"

> +        "movlhps       %%xmm2, %%xmm1 \n"
> +        "addps         %%xmm2, %%xmm1 \n"
> +        "shufps $0xBB, %%xmm1, %%xmm2 \n"
> +        "addps         %%xmm1, %%xmm2 \n"
> +        "movhlps       %%xmm2, %%xmm2 \n"
> +        "movss         %%xmm2,     %1 \n"

i wonder if that's the fastest way to do it ...
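
As a point of comparison, the structure of the SSE loop can be modelled in portable C: four independent partial sums (one per lane), followed by a horizontal reduction. This is a sketch under assumptions, not a replacement for the asm. sum_of_squares_4lane is a hypothetical name, and len is assumed to be a non-negative multiple of 4 as in the patch. It says nothing about which shuffle sequence is fastest, only that the reduction needs two dependent add steps.

```c
/* Hypothetical portable model of a 4-lane SIMD sum of squares.
 * Assumes len is a non-negative multiple of 4. */
float sum_of_squares_4lane(const float *src, int len)
{
    float acc[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    float lo, hi;
    int i, lane;

    /* Main loop: each lane accumulates every fourth element. */
    for (i = 0; i < len; i += 4)
        for (lane = 0; lane < 4; lane++)
            acc[lane] += src[i + lane] * src[i + lane];

    /* Horizontal reduction: fold the high pair onto the low pair,
     * then add the two remaining values. */
    lo = acc[0] + acc[2];
    hi = acc[1] + acc[3];

    return lo + hi;
}
```

Note that the summation order differs from a plain scalar loop, so results can differ in the last bits for general float input.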

[...]

> Index: libavcodec/dsputil.h
> ===================================================================
> --- libavcodec/dsputil.h	(revision 19613)
> +++ libavcodec/dsputil.h	(working copy)
> @@ -387,6 +387,8 @@
>      void (*ac3_downmix)(float (*samples)[256], float (*matrix)[2], int out_ch, int in_ch, int len);
>      /* no alignment needed */
>      void (*flac_compute_autocorr)(const int32_t *data, int len, int lag, double *autoc);
> +    /* assume len is a multiple of 4, and arrays are 16-byte aligned */
> +    float (*vector_energyf)(const float *src, int len);

alignment requirements are supposed to be written like:
void ff_vp3_idct_put_c(uint8_t *dest/*align 8*/, int line_size, DCTELEM *block/*align 16*/);
also "/*" is not doxygen compatible and what the function does should be
more verbosely described, energy isn't a mathematically clear term, dot
product and sum of squares are.
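
Something along these lines would cover the three points (doxygen comment, inline alignment annotation, explicit description). This is only a sketch: vector_energyf_c is a hypothetical name for a plain C version, and the wording of the comment is a suggestion.

```c
/**
 * Calculate the sum of squares of a vector of floats,
 * that is, the dot product of the vector with itself.
 *
 * @param src input vector
 * @param len number of elements, must be a multiple of 4
 * @return sum of src[i] * src[i] for 0 <= i < len
 */
float vector_energyf_c(const float *src/*align 16*/, int len)
{
    float sum = 0.0f;
    int i;

    /* Plain C loop; it does not itself depend on the alignment
     * that the annotation promises to SIMD versions. */
    for (i = 0; i < len; i++)
        sum += src[i] * src[i];

    return sum;
}
```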

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein