[FFmpeg-devel] [PATCH] adpcm: Reset the ssd back to zero more often

Tue Nov 23 05:32:04 CET 2010

On Mon, Nov 22, 2010 at 10:45:29AM +0200, Martin Storsjö wrote:
> On Mon, 22 Nov 2010, Michael Niedermayer wrote:
> 
> > On Thu, Nov 18, 2010 at 04:01:31PM +0200, Martin Storsjo wrote:
> > > If using very large trellis sizes (e.g. -trellis 15), the frontier
> > > is so large that the difference between the best and the worst
> > > trellis node in the frontier is large enough to cause wraparound.
> > > 
> > > Resetting at (1<<20) is enough to avoid the issue at -trellis 16
> > > with the sample I'm testing, therefore doing the resets at (1<<18)
> > > to have some extra safety margin.
> > > 
> > > This doesn't incur any noticeable slowdown.
> > > ---
> > > I noticed this while testing the optimizations that Michael suggested
> > > (using a hash to find out which samples to skip), which made the
> > > encoder fast enough to actually be able to try out the maximum
> > > trellis levels.
> > > 
> > >  libavcodec/adpcm.c |    2 +-
> > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/libavcodec/adpcm.c b/libavcodec/adpcm.c
> > > index 56eb602..f6e3cb7 100644
> > > --- a/libavcodec/adpcm.c
> > > +++ b/libavcodec/adpcm.c
> > > @@ -444,7 +444,7 @@ static void adpcm_compress_trellis(AVCodecContext *avctx, const short *samples,
> > >          nodes_next = u;
> > >  
> > >          // prevent overflow
> > > -        if(nodes[0]->ssd > (1<<28)) {
> > > +        if(nodes[0]->ssd > (1<<18)) {
> > >              for(j=1; j<frontier && nodes[j]; j++)
> > >                  nodes[j]->ssd -= nodes[0]->ssd;
> > >              nodes[0]->ssd = 0;
> > 
> > What if this is always performed and the subtraction is folded into some
> > existing code?
> 
> Even if doing this each round, that doesn't help when the SSD of the 
> best node is much much smaller than the SSD of the worst node, since we 
> can't ever subtract more than the SSD of the best one. Theoretically, the 
> SSD of the best one could be at near-zero all the time, and the worst one 
> could get an added 65535^2 each round, overflowing almost instantly, while 
> there isn't anything that could be subtracted from all nodes.
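
To make the limit of the subtraction concrete, here is a minimal sketch. It
assumes a 32-bit unsigned ssd accumulator; the Node type and both helper
names are made up for illustration and are not the encoder's actual code.
It shows that after renormalizing against the best node, a worst node that
picked up one 65535^2 term cannot take a second one without wrapping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a trellis node; only the accumulated
 * squared error matters for this illustration. */
typedef struct {
    uint32_t ssd;
} Node;

/* Subtract the best node's ssd from every node, in the spirit of the
 * existing overflow guard. nodes[0] is assumed to be the best node. */
static void renormalize(Node *nodes, int n)
{
    assert(n > 0);
    uint32_t best = nodes[0].ssd;
    for (int i = 0; i < n; i++)
        nodes[i].ssd -= best;
}

/* Returns 1 if adding err to ssd would wrap a 32-bit accumulator. */
static int add_would_wrap(uint32_t ssd, uint32_t err)
{
    return ssd > UINT32_MAX - err;
}
```

Since 65535^2 = 4294836225 is only 131071 short of 2^32, the gap between
best and worst node can exceed the 32-bit range after two rounds, no matter
how often the best node's ssd is subtracted.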

Someone could try to write asm macros for doing 64-bit add and compare on
x86-32. Especially as a proof of concept this should be quite trivial, and if
that isn't slower, then gcc likely messed up the 64-bit code.
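
As a rough sketch of what such macros would have to compute, the following
plain C splits the accumulator into two 32-bit halves, the way an x86-32
register pair would hold it. The Ssd64 type and the function names are
hypothetical, and this is a reference for checking results, not the asm
itself and not a benchmark.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit accumulator split into two 32-bit halves,
 * mirroring the register pair that x86-32 code would use. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} Ssd64;

/* Add a 32-bit error term; the carry out of the low half goes into the
 * high half (the C equivalent of an add/adc pair). */
static Ssd64 ssd64_add(Ssd64 a, uint32_t err)
{
    Ssd64 r;
    r.lo = a.lo + err;
    r.hi = a.hi + (r.lo < err);   /* carry if the low half wrapped */
    return r;
}

/* Nonzero if a < b: compare the high halves first, then the low halves. */
static int ssd64_less(Ssd64 a, Ssd64 b)
{
    if (a.hi != b.hi)
        return a.hi < b.hi;
    return a.lo < b.lo;
}

/* Convert to a native 64-bit value, for checking against compiler output. */
static uint64_t ssd64_to_u64(Ssd64 a)
{
    return ((uint64_t)a.hi << 32) | a.lo;
}

/* Sanity check: four worst-case terms must match native 64-bit arithmetic. */
static void ssd64_selfcheck(void)
{
    Ssd64 s = { 0, 0 };
    uint64_t ref = 0;
    for (int i = 0; i < 4; i++) {
        s = ssd64_add(s, 65535u * 65535u);
        ref += 65535u * 65535u;
    }
    assert(ssd64_to_u64(s) == ref);
}
```

An asm version could be validated against ssd64_to_u64() in the same way
before timing it against what gcc generates for a plain uint64_t.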

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

There will always be a question for which you do not know the correct answer.