[FFmpeg-devel] [PATCH] h264_cabac.c: branchless (amvd>2)+(amvd>32)

Måns Rullgård mans
Fri Feb 26 17:10:06 CET 2010


"Zhou Zongyi"<zhouzy at os.pku.edu.cn> writes:

> Hi Michael,
>
> in commit 22032:
>>switch back to (amvd>2)+(amvd>32), its 5 cpu cycles faster now.
>
> On x86 it seems gcc uses the following way to get (amvd>2)
> xor reg, reg
> cmp reg, 2
> setg regb
>
> This introduces partial register access, which is slow on most CPUs.
>
> Here is my patch, saving one instruction and no partial register access.
>
> Index: libavcodec/h264_cabac.c
> ===================================================================
> --- libavcodec/h264_cabac.c (revision 22075)
> +++ libavcodec/h264_cabac.c (working copy)
> @@ -912,10 +912,12 @@
>  static int decode_cabac_mb_mvd( H264Context *h, int ctxbase, int amvd, int *mvda) {
>      int mvd;
>
> -    if(!get_cabac(&h->cabac, &h->cabac_state[ctxbase+(amvd>2)+(amvd>32)])){
> +#define SHIFT (sizeof(int)*4-1)

This should be INT_BIT-1; sizeof(int)*4-1 is only 15 for a 32-bit int.

> +    if(!get_cabac(&h->cabac, &h->cabac_state[ctxbase+((amvd-3)>>SHIFT)+((amvd-33)>>SHIFT)+2])){
>          *mvda= 0;
>          return 0;
>      }
> +#undef SHIFT
>
>      mvd= 1;
>      ctxbase+= 3;

I checked the effect of this change on various targets, and most of
them seem to either benefit or stay unchanged.
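
For reference, a minimal sketch of why the two index expressions agree.
The function names ctx_inc_cmp and ctx_inc_shift are made up for
illustration and do not appear in h264_cabac.c. The sketch assumes that
>> on a negative int is an arithmetic shift, which C leaves
implementation-defined, and that amvd is a small non-negative value so
the subtractions cannot overflow.

```c
#include <limits.h>

/* Comparison form, as in the code before the patch. */
static int ctx_inc_cmp(int amvd)
{
    return (amvd > 2) + (amvd > 32);
}

/*
 * Branchless form, as in the patch but with the full-width shift.
 * Assumption: >> on a negative int replicates the sign bit, so
 * (amvd - 3) >> shift is -1 when amvd < 3 and 0 otherwise.
 * The two terms sum to -2, -1 or 0, hence the final +2.
 */
static int ctx_inc_shift(int amvd)
{
    const int shift = (int)(sizeof(int) * CHAR_BIT) - 1;

    return ((amvd - 3) >> shift) + ((amvd - 33) >> shift) + 2;
}
```

Both functions return 0 for amvd <= 2, 1 for amvd in 3..32, and 2 for
amvd >= 33. Whether the shift form is actually faster depends on the
compiler and the target.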

-- 
Måns Rullgård
mans at mansr.com


