[FFmpeg-devel] [PATCH] swr/resample: use fma when it is faster

Sun Dec 13 23:55:43 CET 2015

On Sun, Dec 13, 2015 at 5:47 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Sun, Dec 13, 2015 at 4:59 PM, Ganesh Ajjanagadde <gajjanagadde at gmail.com>
> wrote:
>>
>> fma is faster on architectures that provide a native CPU instruction
>> for it. This can be detected via the optional ISO C macro FP_FAST_FMA.
>> In the x86 lineup the instruction came fairly late (from Haswell
>> onwards, so the macro is absent unless an appropriate -march is
>> passed), but numerous other architectures support it:
>> https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
>>
>> Concretely, one can expect a ~15-25% speedup, which is of course
>> heavily architecture-dependent.
>>
>> This patch also ensures that as people migrate to newer CPUs, the
>> benefit will slowly trickle in.
>>
>> I doubt this will cause build failures on broken libm's since I can't
>> imagine a platform where FP_FAST_FMA is defined but the function fma is
>> absent.
>>
>> Sample benchmark (x86-64, Haswell, GNU/Linux under -march=native)
>>
>> old:
>> 515828458 decicycles in build_filter (loop 1000),    1024 runs,      0 skips
>>
>> new (fma):
>> 435866377 decicycles in build_filter (loop 1000),    1024 runs,      0 skips
>>
>> Tested with FATE.
>>
>> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>> ---
>>  libswresample/resample.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/libswresample/resample.c b/libswresample/resample.c
>> index 34eb4c0..e61d4c5 100644
>> --- a/libswresample/resample.c
>> +++ b/libswresample/resample.c
>> @@ -33,8 +33,12 @@ static inline double eval_poly(const double *coeff, int size, double x) {
>>      double sum = coeff[size-1];
>>      int i;
>>      for (i = size-2; i >= 0; --i) {
>> +#ifdef FP_FAST_FMA
>> +        sum = fma(sum, x, coeff[i]);
>> +#else
>>          sum *= x;
>>          sum += coeff[i];
>> +#endif
>>      }
>>      return sum;
>>  }
>> --
>> 2.6.4
>
>
> Nope, this is not how we do CPU-specific optimizations. Check example
> implementations in libswresample/x86/*.asm and the related init functions
> plus macros to check for runtime cpu support in libswresample/x86/*_init.c.
> You want to follow that pattern.

No, this is not x86-specific. This is generic code. If I did such a
maneuver, the benefits would apply only to x86, an inferior outcome.
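
To make the compile-time nature of the check concrete, here is a minimal
standalone sketch of the same guard outside of libswresample. It is an
illustration only, not the patch itself: eval_poly mirrors the function in
the diff, while built_with_fast_fma is a hypothetical helper added here just
to report which branch the compiler selected.

```c
#include <math.h>   /* declares fma(); defines FP_FAST_FMA if fma is fast on the target */

/* Horner evaluation of coeff[0] + coeff[1]*x + ... + coeff[size-1]*x^(size-1).
 * Mirrors eval_poly from the patch; assumes size >= 1. */
static inline double eval_poly(const double *coeff, int size, double x)
{
    double sum = coeff[size - 1];
    int i;
    for (i = size - 2; i >= 0; --i) {
#ifdef FP_FAST_FMA
        /* Fused multiply-add: sum * x + coeff[i] with a single rounding. */
        sum = fma(sum, x, coeff[i]);
#else
        /* Generic fallback: separate multiply and add. */
        sum *= x;
        sum += coeff[i];
#endif
    }
    return sum;
}

/* Hypothetical helper (not in the patch): returns 1 if the fma branch
 * was compiled in, 0 if the fallback was. */
static inline int built_with_fast_fma(void)
{
#ifdef FP_FAST_FMA
    return 1;
#else
    return 0;
#endif
}
```

For coeff = {1, 2, 3} and x = 2 both branches evaluate 1 + 2*2 + 3*4. Which
branch is built depends only on the compiler and target flags, for instance
whether a suitable -march is passed on x86, and not on any x86-specific
source code.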

>
> Ronald