[FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
Clément Bœsch
u at pkh.me
Tue Oct 13 08:45:01 CEST 2015
On Tue, Oct 13, 2015 at 12:31:10AM -0400, Ganesh Ajjanagadde wrote:
> On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
> > On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos <cehoyos at ag.or.at> wrote:
> >> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
> >>
> >>> Bench from libavfilter/astats on a 15 min clip.
> >>
> >> I believe that your test would indicate that the
> >> old variant is faster or that no result can be
> >> given which is what my tests show.
>
> Also, how you can possibly believe that the old variant is faster is
> beyond me given the astonishing amount of work by Intel, Red Hat, and
> others to create the absolutely best performing libc.
>
> Just have a look at
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281,
> it gives an idea of the extreme lengths they go to.
>
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_fabs.c;hb=HEAD
[/tmp]☭ cat a.c
#include <math.h>
#include <stdlib.h>
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))
double f1d(double x) { return fabs(x); }
double f2d(double x) { return FFABS(x); }
int f1i(int x) { return abs(x); }
int f2i(int x) { return FFABS(x); }
[/tmp]☭ gcc -O2 -c a.c && objdump -d -Mintel a.o
a.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <f1d>:
0: f2 0f 10 0d 00 00 00 movsd xmm1,QWORD PTR [rip+0x0] # 8 <f1d+0x8>
7: 00
8: 66 0f 54 c1 andpd xmm0,xmm1
c: c3 ret
d: 0f 1f 00 nop DWORD PTR [rax]
0000000000000010 <f2d>:
10: 66 0f 2e 05 00 00 00 ucomisd xmm0,QWORD PTR [rip+0x0] # 18 <f2d+0x8>
17: 00
18: 72 06 jb 20 <f2d+0x10>
1a: f3 c3 repz ret
1c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
20: f2 0f 10 0d 00 00 00 movsd xmm1,QWORD PTR [rip+0x0] # 28 <f2d+0x18>
27: 00
28: 66 0f 57 c1 xorpd xmm0,xmm1
2c: c3 ret
2d: 0f 1f 00 nop DWORD PTR [rax]
0000000000000030 <f1i>:
30: 89 fa mov edx,edi
32: 89 f8 mov eax,edi
34: c1 fa 1f sar edx,0x1f
37: 31 d0 xor eax,edx
39: 29 d0 sub eax,edx
3b: c3 ret
3c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
0000000000000040 <f2i>:
40: 89 fa mov edx,edi
42: 89 f8 mov eax,edi
44: c1 fa 1f sar edx,0x1f
47: 31 d0 xor eax,edx
49: 29 d0 sub eax,edx
4b: c3 ret
[/tmp]☭
So fabs() is inlined by the compiler (gcc 5.2.0 here), while abs() is
essentially identical to FFABS().
I have similar results with clang (3.7.0).
Conclusion: using fabs() looks better with at least recent versions of clang
and GCC on x86-64 (but may introduce slight behaviour changes?)
To be more rigorous, it would be interesting to compare on different arch &
compilers, but changing FFABS() with fabs() sounds OK to me.
--
Clément B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20151013/373fed57/attachment.sig>
More information about the ffmpeg-devel
mailing list