[FFmpeg-devel] [PATCH 0/3] synth filter float ASM
jamrial at gmail.com
Sat Mar 1 04:32:06 CET 2014
Here are some extra implementations that extend Christophe's work.
The first one (SSE) could very well replace SSE2, considering the only real difference
is one extra mova.
I benched a bit and there didn't seem to be any difference in speed at all between the two.
The second patch is an AVX implementation using ymm registers.
In my tests it was about 30 cycles faster than SSE2 on a Sandy Bridge CPU (150 vs. 180
cycles).
I don't have proper numbers for the third patch since I could only test on an AMD
rig, where functions using ymm registers tend to have subpar performance.
It still beat the AVX version by a decent margin, though, so Haswell should see
a nice boost with it.
I could add an FMA4 version using xmm registers, which would benefit AMD users
unlike these AVX/FMA3 ymm ones. Thoughts?
James Almer (3):
x86/synth_filter: add synth_filter_sse
x86/synth_filter: add synth_filter_avx
x86/synth_filter: add synth_filter_fma3
libavcodec/x86/dcadsp.asm | 84 ++++++++++++++++++++++++++++----------------
libavcodec/x86/dcadsp_init.c | 51 ++++++++++++++++++---------
2 files changed, 88 insertions(+), 47 deletions(-)