[FFmpeg-devel] FASTDIV macro
Siarhei Siamashka
siarhei.siamashka
Sun Nov 9 15:49:17 CET 2008
On Sunday 09 November 2008, Måns Rullgård wrote:
[...]
> > A very crude synthetic benchmarking program is attached. Of
> > course it does not take into account possible cache misses on the
> > table access, nor the fact that we may sometimes need to use
> > expressions like "b==1 ? a : FASTDIV(a, b)".
> >
> > The results are the following:
> >
> > --- Pentium-M, gcc 4.3.2 (-O2) ---
> > normaldiv(-1896828497) : time=2.195s
> > fastdiv_c(-1896828497) : time=0.564s
> > fastdiv_asm_x86(-1896828497) : time=0.416s
> >
> > --- Core2 (64-bit), gcc 4.1.2 (-O2) ---
> > normaldiv(-1896828497) : time=0.681s
> > fastdiv_c(-1896828497) : time=0.183s
> > fastdiv_asm_x86(-1896828497) : time=0.222s
>
> So plain C is faster than asm on Core2? Did you look at the generated
> code?

The inner loop is the following:

fastdiv_c:
  46:   8b 04 95 00 00 00 00    mov    0x0(,%rdx,4),%eax
  4d:   48 ff c2                inc    %rdx
  50:   48 0f af c6             imul   %rsi,%rax
  54:   48 c1 e8 20             shr    $0x20,%rax
  58:   01 c1                   add    %eax,%ecx
  5a:   48 81 fa fe 00 00 00    cmp    $0xfe,%rdx
  61:   75 e3                   jne    46 <fastdiv_c+0x6>

fastdiv_asm_x86:
  86:   89 f0                   mov    %esi,%eax
  88:   f7 24 8d 00 00 00 00    mull   0x0(,%rcx,4)
  8f:   48 ff c1                inc    %rcx
  92:   01 d7                   add    %edx,%edi
  94:   48 81 f9 fe 00 00 00    cmp    $0xfe,%rcx
  9b:   75 e9                   jne    86 <fastdiv_asm_x86+0x6>
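
For readers without the attachment, here is a minimal sketch of the kind of table-based division both loops above perform: load a 32-bit reciprocal from a table, multiply, and keep the upper 32 bits of the product. It is not the benchmark program itself. All names (inverse_tab, init_inverse_tab, fastdiv_sketch, div_small, count_mismatches) are hypothetical, and the table contents (ceil(2^32 / b) for b >= 2, 0xFFFFFFFF for b == 1) are an assumption about how such a table is usually built, not a copy of the real one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table: inverse_tab[b] approximates 2^32 / b for 1 <= b <= 255. */
static uint32_t inverse_tab[256];

/* Fill the table once; ordinary division is fine here, it is off the hot path. */
static void init_inverse_tab(void)
{
    inverse_tab[1] = UINT32_MAX;  /* 2^32 itself does not fit in 32 bits */
    for (uint32_t b = 2; b < 256; b++)
        inverse_tab[b] = (uint32_t)(((UINT64_C(1) << 32) + b - 1) / b);  /* ceil(2^32 / b) */
}

/* Division as one 32x32 -> 64 multiply plus a shift (the mull / imul+shr above). */
static uint32_t fastdiv_sketch(uint32_t a, uint32_t b)
{
    assert(b >= 1 && b < 256);
    return (uint32_t)(((uint64_t)a * inverse_tab[b]) >> 32);
}

/* With the table entry for 1 clamped to 0xFFFFFFFF, b == 1 needs special care. */
static uint32_t div_small(uint32_t a, uint32_t b)
{
    return b == 1 ? a : fastdiv_sketch(a, b);
}

/* Count (a, b) pairs in lo <= a <= hi, 1 <= b <= 255 where div_small != a / b. */
static unsigned count_mismatches(uint32_t lo, uint32_t hi)
{
    unsigned bad = 0;
    for (uint32_t a = lo; a <= hi; a++)
        for (uint32_t b = 1; b < 256; b++)
            bad += div_small(a, b) != a / b;
    return bad;
}
```

Under these assumptions the result is exact for a < 2^24 and 2 <= b <= 255, because the rounding error in the reciprocal stays below 1/b. For b == 1 the clamped entry makes fastdiv_sketch(a, 1) return a - 1 for any nonzero a, which is the reason a wrapper like "b==1 ? a : FASTDIV(a, b)" can be needed. Call init_inverse_tab() once before using the other functions.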
--
Best regards,
Siarhei Siamashka