[FFmpeg-devel] FASTDIV macro

Sun Nov 9 15:49:17 CET 2008

On Sunday 09 November 2008, Måns Rullgård wrote:
[...]
> > A very crude synthetic benchmark program is attached. Of course, it
> > does not take into account possible cache misses on the table
> > access, nor the fact that we may sometimes need to use expressions
> > like "b==1 ? a : FASTDIV(a, b)".
> >
> > The results are the following:
> >
> > --- Pentium-M, gcc 4.3.2 (-O2) ---
> > normaldiv(-1896828497) : time=2.195s
> > fastdiv_c(-1896828497) : time=0.564s
> > fastdiv_asm_x86(-1896828497) : time=0.416s
> >
> > --- Core2 (64-bit), gcc 4.1.2 (-O2) ---
> > normaldiv(-1896828497) : time=0.681s
> > fastdiv_c(-1896828497) : time=0.183s
> > fastdiv_asm_x86(-1896828497) : time=0.222s
>
> So plain C is faster than asm on Core2?  Did you look at the generated
> code?

The inner loops are the following:

fastdiv_c:
  46:   8b 04 95 00 00 00 00    mov    0x0(,%rdx,4),%eax
  4d:   48 ff c2                inc    %rdx
  50:   48 0f af c6             imul   %rsi,%rax
  54:   48 c1 e8 20             shr    $0x20,%rax
  58:   01 c1                   add    %eax,%ecx
  5a:   48 81 fa fe 00 00 00    cmp    $0xfe,%rdx
  61:   75 e3                   jne    46 <fastdiv_c+0x6>

fastdiv_asm_x86:
  86:   89 f0                   mov    %esi,%eax
  88:   f7 24 8d 00 00 00 00    mull   0x0(,%rcx,4)
  8f:   48 ff c1                inc    %rcx
  92:   01 d7                   add    %edx,%edi
  94:   48 81 f9 fe 00 00 00    cmp    $0xfe,%rcx
  9b:   75 e9                   jne    86 <fastdiv_asm_x86+0x6>
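
For readers without the attachment, here is a rough C sketch of what a
loop of this shape computes: a table load, a widening 32x32 multiply,
the high 32 bits of the product, and an accumulate. This is not the
attached benchmark. The names inverse_table, init_inverse_table,
safe_div, sum_fastdiv, sum_normaldiv and MAX_DIVISOR are made up for
illustration, and the table contents (ceil(2^32 / b)) are an assumption
about how the reciprocal table is filled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table size, chosen only for this sketch. */
#define MAX_DIVISOR 255

/* Assumed layout: inverse_table[b] == ceil(2^32 / b) for b >= 2. */
static uint32_t inverse_table[MAX_DIVISOR + 1];

static void init_inverse_table(void)
{
    inverse_table[0] = 0;
    /* 2^32 does not fit in 32 bits, so the entry for b == 1 is one too
     * small and FASTDIV(a, 1) comes out as a - 1 for a > 0. */
    inverse_table[1] = UINT32_MAX;
    for (uint32_t b = 2; b <= MAX_DIVISOR; b++)
        inverse_table[b] = (uint32_t)(((UINT64_C(1) << 32) + b - 1) / b);
}

/* Widening multiply by the reciprocal, then keep the high 32 bits. */
#define FASTDIV(a, b) \
    ((uint32_t)(((uint64_t)(a) * inverse_table[b]) >> 32))

/* The guarded form mentioned in the quoted text above. */
static uint32_t safe_div(uint32_t a, uint32_t b)
{
    assert(b >= 1 && b <= MAX_DIVISOR);
    return b == 1 ? a : FASTDIV(a, b);
}

/* Accumulate quotients over a range of divisors using the table. */
static uint32_t sum_fastdiv(uint32_t a)
{
    uint32_t sum = 0;
    for (uint32_t b = 2; b <= MAX_DIVISOR; b++)
        sum += FASTDIV(a, b);
    return sum;
}

/* Same accumulation with the ordinary division operator. */
static uint32_t sum_normaldiv(uint32_t a)
{
    uint32_t sum = 0;
    for (uint32_t b = 2; b <= MAX_DIVISOR; b++)
        sum += a / b;
    return sum;
}
```

With this table the result is only guaranteed to match a / b while
a * (b - 1) stays below 2^32, so the sketch is meant for small
dividends, and b == 1 has to go through something like safe_div().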

-- 
Best regards,
Siarhei Siamashka