[FFmpeg-devel] [PATCH] swr/resample: use fma when it is faster

Ronald S. Bultje rsbultje at gmail.com
Sun Dec 13 23:47:43 CET 2015


Hi,

On Sun, Dec 13, 2015 at 4:59 PM, Ganesh Ajjanagadde <gajjanagadde at gmail.com> wrote:

> fma is faster on architectures that provide a native CPU instruction
> for it. This can be tested with FP_FAST_FMA, a macro that ISO C
> optionally defines. Although it arrived fairly late in the x86 lineup
> (from Haswell onwards, and hence is absent unless an appropriate -march
> is passed), numerous other architectures support it:
> https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation.
>
> Concretely, one can expect a speedup of roughly 15-25%, though this is
> of course heavily architecture dependent.
>
> This patch also ensures that as people migrate to newer CPUs, the
> benefit will slowly trickle in.
>
> I doubt this will cause build failures with broken libm implementations,
> since I can't imagine a platform where FP_FAST_FMA is defined but the
> function fma is absent.
>
> Sample benchmark (x86-64, Haswell, GNU/Linux under -march=native)
>
> old:
> 515828458 decicycles in build_filter (loop 1000),    1024 runs,      0 skips
>
> new (fma):
> 435866377 decicycles in build_filter (loop 1000),    1024 runs,      0 skips
>
> Tested with FATE.
>
> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
> ---
>  libswresample/resample.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/libswresample/resample.c b/libswresample/resample.c
> index 34eb4c0..e61d4c5 100644
> --- a/libswresample/resample.c
> +++ b/libswresample/resample.c
> @@ -33,8 +33,12 @@ static inline double eval_poly(const double *coeff, int size, double x) {
>      double sum = coeff[size-1];
>      int i;
>      for (i = size-2; i >= 0; --i) {
> +#ifdef FP_FAST_FMA
> +        sum = fma(sum, x, coeff[i]);
> +#else
>          sum *= x;
>          sum += coeff[i];
> +#endif
>      }
>      return sum;
>  }
> --
> 2.6.4


Nope, this is not how we do CPU-specific optimizations. See the example
implementations in libswresample/x86/*.asm, along with the related init
functions and the macros that check for runtime CPU support in
libswresample/x86/*_init.c. You want to follow that pattern.
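
For illustration, the general shape of that pattern (a portable C fallback, an optimized variant, and an init function that picks one at runtime through a function pointer) can be sketched in plain C. This is only a sketch under stated assumptions, not the libswresample code: the names eval_poly_fn, eval_poly_c, eval_poly_fma3, eval_poly_init and SKETCH_X86_64 are hypothetical, the optimized variant uses a compiler builtin rather than hand-written asm, and the CPU check uses the GCC/Clang builtin __builtin_cpu_supports() in place of FFmpeg's own CPU-flag helpers.

```c
/* Signature shared by every variant of the routine (hypothetical name). */
typedef double (*eval_poly_fn)(const double *coeff, int size, double x);

/* Portable fallback: Horner's rule with a separate multiply and add. */
static double eval_poly_c(const double *coeff, int size, double x)
{
    double sum = coeff[size - 1];
    for (int i = size - 2; i >= 0; --i) {
        sum *= x;
        sum += coeff[i];
    }
    return sum;
}

#if defined(__GNUC__) && defined(__x86_64__)
#define SKETCH_X86_64 1
/* FMA variant. The target attribute enables FMA code generation for this
 * one function, so the rest of the file still builds for a baseline CPU.
 * This stands in for what would be an asm routine in the real tree. */
__attribute__((target("fma")))
static double eval_poly_fma3(const double *coeff, int size, double x)
{
    double sum = coeff[size - 1];
    for (int i = size - 2; i >= 0; --i)
        sum = __builtin_fma(sum, x, coeff[i]);
    return sum;
}
#else
#define SKETCH_X86_64 0
#endif

/* Callers go through this pointer; it defaults to the C version. */
static eval_poly_fn eval_poly = eval_poly_c;

/* Run once at startup: switch to the FMA variant only if the CPU that is
 * actually executing the program reports support for it. */
static void eval_poly_init(void)
{
    eval_poly = eval_poly_c;
#if SKETCH_X86_64
    __builtin_cpu_init();
    if (__builtin_cpu_supports("fma"))
        eval_poly = eval_poly_fma3;
#endif
}
```

A caller would run eval_poly_init() once and then call eval_poly(coeff, size, x) as usual. The point of the indirection is that a single binary uses the fast path on CPUs that have the instruction and the fallback elsewhere, independent of the -march it was built with. Note that a fused multiply-add rounds once instead of twice, so the two variants may differ in the last bit for some inputs.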

Ronald

