[FFmpeg-devel] [PATCH] FFV1 speed tweaks

Michael Niedermayer michaelni
Sat Aug 9 02:38:37 CEST 2008


On Fri, Aug 08, 2008 at 03:29:51PM -0600, Jason Garrett-Glaser wrote:
> On the order of 7-10 clocks faster per pixel, out of 120-160 clocks
> total, for encoding and decoding.
> 
> Dark Shikari
> 
> Index: libavcodec/ffv1.c
> ===================================================================
> --- libavcodec/ffv1.c	(revision 14661)
> +++ libavcodec/ffv1.c	(working copy)
> @@ -354,10 +354,10 @@
>  static inline int encode_line(FFV1Context *s, int w, int_fast16_t
> *sample[2], int plane_index, int bits){
>      PlaneContext * const p= &s->plane[plane_index];
>      RangeCoder * const c= &s->c;
> -    int x;
>      int run_index= s->run_index;
>      int run_count=0;
>      int run_mode=0;
> +    int_fast16_t *cur_sample[3] = {sample[0],sample[1],sample[2]};
> 
>      if(s->ac){
>          if(c->bytestream_end - c->bytestream < w*20){
> @@ -370,18 +370,17 @@
>              return -1;
>          }
>      }
> +    for(; w>0; w--){
> +        int diff, context, sign;
> 
> -    for(x=0; x<w; x++){
> -        int diff, context;
> +        context= get_context(s, cur_sample[0], cur_sample[1], cur_sample[2]);
> +        diff= cur_sample[0][0] - predict(cur_sample[0], cur_sample[1]);
> +
> +        /* Negate context and diff if context is negative */
> +        sign = context >> 31;
> +        context = (sign ^ context) - sign;
> +        diff = (sign ^ diff) - sign;
> 
> -        context= get_context(s, sample[0]+x, sample[1]+x, sample[2]+x);
> -        diff= sample[0][x] - predict(sample[0]+x, sample[1]+x);
> -
> -        if(context < 0){
> -            context = -context;
> -            diff= -diff;
> -        }
> -
>          diff= fold(diff, bits);
> 
>          if(s->ac){
> @@ -413,6 +412,9 @@
>              if(run_mode == 0)
>                  put_vlc_symbol(&s->pb, &p->vlc_state[context], diff, bits);
>          }
> +        cur_sample[0]++;
> +        cur_sample[1]++;
> +        cur_sample[2]++;

The change to cur_sample and the sign handling seem independant. They should
be in seperate and benchmarked patches.
I have no choice but to reject such combined patches (this isnt the first ...),
and iam affraid this could make you angry which is not what i want at all.
Its a matter of less than 5 min to split this and running a benchmark 3 times
which each should barely take a minute with a short file.
Combining several changes and benchmarking them together leads to chunks of
code being commited that are slower then the original ...

Also as the sign part ended up worse on ARM, maybe setting arch/cpu/tune flags
correctly would help x86 by making it use cmov without changing the code.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The educated differ from the uneducated as much as the living from the
dead. -- Aristotle 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080809/2b6ae334/attachment.pgp>



More information about the ffmpeg-devel mailing list