[FFmpeg-devel] [PATCH] Add a G.722 encoder

Wed Sep 15 13:40:14 CEST 2010

On Wed, Sep 15, 2010 at 12:41:34PM +0300, Martin Storsjö wrote:
> On Tue, 14 Sep 2010, Michael Niedermayer wrote:
> 
> > On Sat, Sep 11, 2010 at 10:18:38PM +0300, Martin Storsjö wrote:
> > > On Fri, 10 Sep 2010, Martin Storsjö wrote:
> > > 
> > > > This actually turned out to work quite well, thanks! New version attached 
> > > > that does trellis for both of them at the same time.
> > > 
> > > Updated patches attached - I tuned the testing range for the lower subband 
> > > a bit to achieve even better results.
> > > 
> > > // Martin
> 
> > > +static inline int encode_high(G722Context *c, int xhigh)
> > > +{
> > > +    int diff = av_clip_int16(xhigh - c->band[1].s_predictor);
> > > +    int pred = 564 * c->band[1].scale_factor >> 10;
> > 
> > *141 >> 8
> 
> Fixed
> 
> > > +    int index = diff >= 0 ? (diff < pred) + 2 : diff >= -pred;
> > > +
> > > +    update_high_predictor(&c->band[1], c->band[1].scale_factor *
> > > +                          high_inv_quant[index] >> 10, index);
> > > +    return index;
> > > +}
> > > +
> > > +static inline int encode_low(const struct G722Band* state, int xlow)
> > > +{
> > > +    int diff  = av_clip_int16(xlow - state->s_predictor);
> > > +    int limit = diff >= 0 ? diff : -(diff + 1);
> > 
> > > +    int i = 0;
> > > +    while (i < 29 && limit >= (low_quant[i] * state->scale_factor) >> 10)
> > > +        i++;
> > 
> > that doesnt look efficient
> > limit >= (low_quant[i] * state->scale_factor) >> 10)
> > can be changed to
> > C > low_quant[i]
> 
> Hmm, do you mean something like this?
> 
>     limit = (limit << 10) / state->scale_factor;
>     while (i < 29 && limit >= low_quant[i])
>         i++;

Something like this, yes, but with correct rounding.
You could also try to just get rid of the >>:
(limit+1)<<10 > low_quant[i] * state->scale_factor

To get the rounding correct, use your brain and try to move each
operation exactly from one side of the >= to the other.
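
For illustration, here is a minimal sketch (not part of the patch) that
checks the shift-free condition above against the original one by brute
force. The function names, the sample quantizer values and the tested
scale_factor range are assumptions made up for this check.

```c
#include <assert.h>

/* Loop condition as written in the patch: scale, shift down, compare. */
static int cond_shift(int limit, int quant, int scale_factor)
{
    return limit >= (quant * scale_factor) >> 10;
}

/* Same condition with the shift moved to the other side:
 * limit >= floor(p / 1024)  <=>  limit + 1 > p / 1024
 *                           <=>  (limit + 1) * 1024 > p   (p >= 0) */
static int cond_noshift(int limit, int quant, int scale_factor)
{
    return ((limit + 1) << 10) > quant * scale_factor;
}

/* Brute-force comparison over assumed input ranges;
 * returns the number of mismatches found. */
static int count_cond_mismatches(void)
{
    static const int quants[] = { 35, 150, 650, 1458, 2919 }; /* illustrative only */
    int mismatches = 0;
    for (int limit = 0; limit <= 32767; limit++)
        for (int q = 0; q < 5; q++)
            for (int sf = 1; sf <= 4096; sf += 37)
                if (cond_shift(limit, quants[q], sf) !=
                    cond_noshift(limit, quants[q], sf))
                    mismatches++;
    return mismatches;
}
```

The +1 is what makes it exact: since >> rounds down, "limit >= x >> 10"
holds whenever limit + 1 is strictly above the unrounded value.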


> 
> This makes the results differ slightly from the reference test vectors 
> (which perhaps in itself is acceptable, but a bitexact mode to test 
> against the reference may still be useful). This actually made it slower,
> 
> 1015 dezicycles in encode_low, 1048518 runs, 58 skips
> 
> vs
> 
> 963 dezicycles in encode_low, 1048494 runs, 82 skips

As the result differs, we don't know if the loop runs the same number of
iterations, but it's of course possible that the division is too slow for
this to work out.


> 
> initially.
> 
> > also a LUT could be tried if this matters speed wise
> 
> Hmm, what would this LUT contain? The output depends both on the current 
> 16-bit diff and the current scale factor, so one single 64k LUT isn't 
> enough.

As said, this is equivalent to C > low_quant[i]. You just have to calculate C
exactly and not assume these are real values from math lectures: they are
integers, and >> and / can round down.

Also, the i<29 can likely be removed by making sure the table contains an
appropriate value at the end.
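
A sketch of how this could look, under stated assumptions: the table
below holds made-up ascending thresholds (not the real low_quant
values), it has 8 entries instead of 29, and scale_factor is assumed to
be at least 1. The names search_ref and search_fast are hypothetical.
C is computed with a rounded-up division so that it matches the original
comparison exactly, and an INT_MAX sentinel replaces the bound check.

```c
#include <assert.h>
#include <limits.h>

#define NB_THRESHOLDS 8 /* hypothetical; the loop in the patch uses 29 */

/* Made-up ascending thresholds plus one sentinel that C can never exceed. */
static const int quant_tab[NB_THRESHOLDS + 1] = {
    35, 72, 110, 150, 190, 233, 276, 323, INT_MAX
};

/* Reference: the loop from the patch, with the explicit bound check. */
static int search_ref(int limit, int scale_factor)
{
    int i = 0;
    while (i < NB_THRESHOLDS &&
           limit >= (quant_tab[i] * scale_factor) >> 10)
        i++;
    return i;
}

/* One division up front, then a plain compare per table entry.
 * With N = (limit + 1) << 10 and s = scale_factor > 0:
 *   limit >= (q * s) >> 10  <=>  q * s < N  <=>  q < ceil(N / s) */
static int search_fast(int limit, int scale_factor)
{
    int c = (((limit + 1) << 10) + scale_factor - 1) / scale_factor;
    int i = 0;
    while (c > quant_tab[i]) /* the sentinel terminates the loop */
        i++;
    return i;
}

/* Compare both searches over assumed input ranges. */
static int count_search_mismatches(void)
{
    int mismatches = 0;
    for (int limit = 0; limit <= 32767; limit++)
        for (int sf = 1; sf <= 4096; sf += 13)
            if (search_ref(limit, sf) != search_fast(limit, sf))
                mismatches++;
    return mismatches;
}
```

Whether the single division pays off compared to one multiply and shift
per iteration would still have to be benchmarked.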


[...]
> +static inline int encode_high(G722Context *c, int xhigh)
> +{
> +    int diff = av_clip_int16(xhigh - c->band[1].s_predictor);
> +    int pred = 141 * c->band[1].scale_factor >> 8;
> +    int index = diff >= 0 ? (diff < pred) + 2 : diff >= -pred;
> +
> +    update_high_predictor(&c->band[1], c->band[1].scale_factor *
> +                          high_inv_quant[index] >> 10, index);
> +    return index;
> +}
> +
> +static inline int encode_low(const struct G722Band* state, int xlow)
> +{
> +    int diff  = av_clip_int16(xlow - state->s_predictor);

> +    int limit = diff >= 0 ? diff : -(diff + 1);

That's
diff ^ (diff>>31)
I think.

Maybe that can also be used in encode_high() to get rid of the branch.
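
A minimal sketch of both ideas, assuming a 32-bit int and an arithmetic
right shift of negative values (which C leaves implementation-defined).
The helper names are hypothetical, and the branchless index is only one
possible way to write it, not necessarily what the patch should use.

```c
#include <assert.h>

/* Branchy form from the patch. */
static int limit_branch(int diff)
{
    return diff >= 0 ? diff : -(diff + 1);
}

/* diff >> 31 is 0 for diff >= 0 and -1 (all ones) for diff < 0,
 * so the XOR is either diff itself or ~diff == -(diff + 1). */
static int limit_xor(int diff)
{
    return diff ^ (diff >> 31);
}

/* Index selection as written in encode_high(). */
static int index_high_branch(int diff, int pred)
{
    return diff >= 0 ? (diff < pred) + 2 : diff >= -pred;
}

/* For diff < 0: diff >= -pred  <=>  -diff - 1 < pred  <=>  ~diff < pred,
 * so a single compare on the folded value covers both signs. */
static int index_high_xor(int diff, int pred)
{
    return (limit_xor(diff) < pred) + 2 * (diff >= 0);
}

/* Exhaustive check over the 16-bit diff range and sampled pred values. */
static int count_xor_mismatches(void)
{
    int mismatches = 0;
    for (int diff = -32768; diff <= 32767; diff++) {
        if (limit_branch(diff) != limit_xor(diff))
            mismatches++;
        for (int pred = 0; pred <= 4000; pred += 17)
            if (index_high_branch(diff, pred) != index_high_xor(diff, pred))
                mismatches++;
    }
    return mismatches;
}
```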


> +    int i = 0;
> +    while (i < 29 && limit >= (low_quant[i] * state->scale_factor) >> 10)
> +        i++;
> +    return (diff < 0 ? (i < 2 ? 63 : 33) : 61) - i;
> +}
> +
> +static int g722_encode_frame(AVCodecContext *avctx,
> +                             uint8_t *dst, int buf_size, void *data)
> +{
> +    G722Context *c = avctx->priv_data;
> +    const int16_t *samples = data;
> +    int i;
> +

> +    for (i = 0; i < buf_size/2; i++) {

Use buf_size>>1 here; /2 with signed variables is not as fast.
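
To illustrate the difference (a sketch with hypothetical helper names,
assuming an arithmetic right shift for negative values): signed division
must round toward zero, so the compiler has to correct for negative
inputs, while the shift is a single operation. For non-negative values
such as buf_size the two give the same result.

```c
#include <assert.h>

/* Signed division by 2 rounds toward zero. */
static int half_div(int n)
{
    return n / 2;
}

/* Arithmetic shift by 1 rounds toward minus infinity. */
static int half_shift(int n)
{
    return n >> 1;
}

/* Both agree for every non-negative 16-bit value. */
static int count_half_mismatches(void)
{
    int mismatches = 0;
    for (int n = 0; n <= 65535; n++)
        if (half_div(n) != half_shift(n))
            mismatches++;
    return mismatches;
}
```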


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact