[FFmpeg-devel] [PATCH] h264_cabac.c: branchless (amvd>2)+(amvd>32)

Fri Feb 26 16:28:32 CET 2010

Hi Michael,

In commit 22032 you wrote:
>switch back to (amvd>2)+(amvd>32), its 5 cpu cycles faster now.

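On x86, gcc seems to compute (amvd>2) with the following sequence:
    xor  reg, reg      ; clear the result register
    cmp  amvd, 2       ; compare the input against 2
    setg regb          ; write only the low byte of reg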

This introduces partial register access, which is slow on most CPUs.

Here is my patch, which saves one instruction and avoids the partial register access.

Index: libavcodec/h264_cabac.c
===================================================================

--- libavcodec/h264_cabac.c (revision 22075)
+++ libavcodec/h264_cabac.c (working copy)
@@ -912,10 +912,12 @@
 static int decode_cabac_mb_mvd( H264Context *h, int ctxbase, int amvd, int *mvda) {
     int mvd;
 
-    if(!get_cabac(&h->cabac, &h->cabac_state[ctxbase+(amvd>2)+(amvd>32)])){
+#define SHIFT (sizeof(int)*8-1)
+    if(!get_cabac(&h->cabac, &h->cabac_state[ctxbase+((amvd-3)>>SHIFT)+((amvd-33)>>SHIFT)+2])){
         *mvda= 0;
         return 0;
     }
+#undef SHIFT
 
     mvd= 1;
     ctxbase+= 3;
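
As a sanity check of the identity behind the patch, here is a minimal standalone sketch (not part of the patch). The names ctx_offset_cmp, ctx_offset_shift, check_equivalence and SIGN_SHIFT are made up for illustration. It assumes that >> on a negative int is an arithmetic shift, which C leaves implementation-defined but gcc provides on x86.

```c
#include <assert.h>
#include <limits.h>

/* Shift count that leaves only the sign: 31 for a 32-bit int. */
#define SIGN_SHIFT ((int)(sizeof(int) * CHAR_BIT) - 1)

/* Comparison form: 0 for amvd <= 2, 1 for 3..32, 2 for amvd >= 33. */
static int ctx_offset_cmp(int amvd)
{
    return (amvd > 2) + (amvd > 32);
}

/* Shift form: each term is -1 when the difference is negative and 0
 * otherwise (ASSUMES arithmetic right shift), so adding 2 gives the
 * same 0/1/2 result as the comparison form. */
static int ctx_offset_shift(int amvd)
{
    return ((amvd - 3) >> SIGN_SHIFT) + ((amvd - 33) >> SIGN_SHIFT) + 2;
}

/* Compare both forms for every amvd in [0, limit). */
static void check_equivalence(int limit)
{
    for (int amvd = 0; amvd < limit; amvd++)
        assert(ctx_offset_cmp(amvd) == ctx_offset_shift(amvd));
}
```

check_equivalence(1 << 16) should complete without tripping the assert on a compiler with arithmetic right shift.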

Regards,

ZZ