[FFmpeg-devel] [PATCH 0/3] synth filter float ASM

James Almer jamrial at gmail.com
Sat Mar 1 04:32:06 CET 2014

Here are some extra implementations that extend Christophe's work.

The first one (SSE) could very well replace SSE2, since the only real difference
between them is one extra mova.
I benched it a bit and saw no measurable difference in speed between the two.

The second patch is an AVX implementation using ymm registers.
In my tests it was about 30 cycles faster than SSE2 on a Sandy Bridge CPU
(150 cycles vs. 180 cycles).

I don't have proper numbers for the third patch since I could only test it on an
AMD rig, where functions using ymm registers tend to perform poorly.
It still beat the AVX version by a decent margin, though, so Haswell should see
a nice boost from it.

I could add an FMA4 version using xmm registers, which, unlike these ymm-based
AVX/FMA3 versions, would benefit AMD users. Thoughts?

James Almer (3):
  x86/synth_filter: add synth_filter_sse
  x86/synth_filter: add synth_filter_avx
  x86/synth_filter: add synth_filter_fma3

 libavcodec/x86/dcadsp.asm    | 84 ++++++++++++++++++++++++++++----------------
 libavcodec/x86/dcadsp_init.c | 51 ++++++++++++++++++---------
 2 files changed, 88 insertions(+), 47 deletions(-)

