[FFmpeg-devel] [PATCH] adpcm: Collapse nodes with similar state only after checking ssd

Michael Niedermayer michaelni
Wed Nov 17 00:29:58 CET 2010


On Mon, Nov 15, 2010 at 03:47:20PM +0200, Martin Storsj? wrote:
> Hi,
> 
> In the attached patch, I shuffled around some checks in the adpcm trellis 
> code, moving the costly check for collapsing nodes with similar state to 
> after checking if the ssd is worse than the one to be replaced.
> 
> With this patch in place, the run time for -trellis 8 on one test sample 
> drops from 158 to 79 seconds, on my machine.
> 
> Runtime vs PSNR graphs are available at 
> http://albin.abo.fi/~mstorsjo/adpcm-graphs2/. For the music1 and music2 
> samples, this gives quite clear improvments in performance. For the 
> trailer sample, I notice that all three algorithms seem to converge to the 
> same output quality at the end.
> 
> // Martin
>  adpcm.c |   28 ++++++++++++++++------------
>  1 file changed, 16 insertions(+), 12 deletions(-)
> a8a50c076fb95bc35637592b1a192d374e4862a8  0001-adpcm-Collapse-nodes-with-similar-state-only-after-c.patch
> From 66a48591cd79d9b4269efe272fd7591bb5036975 Mon Sep 17 00:00:00 2001
> From: Martin Storsjo <martin at martin.st>
> Date: Sun, 14 Nov 2010 12:41:06 +0200
> Subject: [PATCH] adpcm: Collapse nodes with similar state only after checking the ssd
> 
> If the heap is full, insertion often is skipped if the ssd is larger
> than for the node to be replaced.
> 
> Since we know the size of the heap, we don't need to check if
> node_next[k] is non-null in each iteration.
> 
> For -trellis 8, this lowers the run time from 158 to 79 seconds,
> for a 30 second 44 kHz mono sample, on my machine.
> ---
>  libavcodec/adpcm.c |   28 ++++++++++++++++------------
>  1 files changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/libavcodec/adpcm.c b/libavcodec/adpcm.c
> index 56eb602..f37a104 100644
> --- a/libavcodec/adpcm.c
> +++ b/libavcodec/adpcm.c
> @@ -370,29 +370,33 @@ static void adpcm_compress_trellis(AVCodecContext *avctx, const short *samples,
>  #define STORE_NODE(NAME, STEP_INDEX)\
>                      int d;\
>                      uint32_t ssd;\
> -                    int pos;\
> +                    int pos, heap_size = FFMIN(heap_pos, frontier);\
>                      TrellisNode *u;\
>                      dec_sample = av_clip_int16(dec_sample);\
>                      d = sample - dec_sample;\
>                      ssd = nodes[j]->ssd + d*d;\
> -                    /* Collapse any two states with the same previous sample value. \
> -                     * One could also distinguish states by step and by 2nd to last
> -                     * sample, but the effects of that are negligible. */\
> -                    for(k=0; k<frontier && nodes_next[k]; k++) {\
> -                        if(dec_sample == nodes_next[k]->sample1) {\
> -                            assert(ssd >= nodes_next[k]->ssd);\
> -                            goto next_##NAME;\
> -                        }\
> -                    }\
>                      if (heap_pos < frontier) {\
> -                        pos = heap_pos++;\
> +                        pos = heap_pos;\
>                      } else {\
>                          /* Try to replace one of the leaf nodes with the new \
>                           * one, but try a different slot each time. */\
> -                        pos = (frontier >> 1) + (heap_pos++ & ((frontier >> 1) - 1));\
> +                        pos = (frontier >> 1) + (heap_pos & ((frontier >> 1) - 1));\
>                          if (ssd > nodes_next[pos]->ssd)\
>                              goto next_##NAME;\
>                      }\
> +                    /* Collapse any two states with the same previous sample value. \
> +                     * One could also distinguish states by step and by 2nd to last \
> +                     * sample, but the effects of that are negligible. \
> +                     * Since nodes in the previous generation are iterated \
> +                     * through a heap, they're roughly ordered from better to \
> +                     * worse, but not strictly ordered. Therefore, an earlier \
> +                     * node with the same sample value is better in most cases \
> +                     * (and thus the current is skipped), but not strictly \
> +                     * in all cases. */ \
> +                    for (k = 0; k < heap_size; k++)\
> +                        if (dec_sample == nodes_next[k]->sample1)\
> +                            goto next_##NAME;\

should this not check ssd too, so that better ones are kept?

also a hashtable could be used to speed this up in theory
like:
h=&hash[dec_sample];
if(h->generation == generation && h->ssd < ssd){
    goto next_##NAME;
}
h->generation= generation;
h->ssd= ssd

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I have never wished to cater to the crowd; for what I know they do not
approve, and what they approve I do not know. -- Epicurus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20101117/51573f47/attachment.pgp>



More information about the ffmpeg-devel mailing list