[FFmpeg-devel] [PATCH 0/3] synth filter float ASM
christophe.gisquet at gmail.com
Sat Mar 1 08:22:13 CET 2014
2014-03-01 4:32 GMT+01:00 James Almer <jamrial at gmail.com>:
> Here are some extra implementations that extend Christophe's work.
Thanks for this, it looks very nice.
> The first one (SSE) could very well replace SSE2 considering the only difference
> is in essence one extra mova.
> I benched a bit and there didn't seem to be any difference in speed at all between
> the two.
Actually, I would have written an SSE-only version (there was some
difference for me though), but I remember people wanting no further
SSE asm when there is an SSE2 version, to the point that it made my
life simpler to do what they asked rather than argue with them. I hope
you'll be spared the trouble.
> Second patch is an implementation of AVX using ymm registers.
> In my tests it was about 30 cycles faster than SSE2 on a Sandy Bridge CPU, 150
> cycles vs 180 cycles.
Nice, maybe update the patch comment with this reference number,
because it underlines that it is a roughly 17% reduction in cycles,
which is not small.
I don't have comments on the asm otherwise, as I don't know AVX. I
know your code passes fate-dts, so that should be ok.
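
For reference, here is a rough sketch of the arithmetic behind the speedup
figure, using the 150 vs 180 cycle counts quoted above. The function names
are illustrative only and are not part of FFmpeg; the result depends on
which of the two usual conventions (cycles saved vs. throughput ratio) is
meant.

```python
def cycle_reduction(old_cycles: float, new_cycles: float) -> float:
    """Fraction of cycles saved, relative to the old implementation."""
    return (old_cycles - new_cycles) / old_cycles


def speedup(old_cycles: float, new_cycles: float) -> float:
    """Throughput ratio: how many times faster the new implementation is."""
    return old_cycles / new_cycles


# Cycle counts reported in the quoted benchmark (Sandy Bridge).
sse2_cycles = 180
avx_cycles = 150

print(f"{cycle_reduction(sse2_cycles, avx_cycles):.1%} fewer cycles")  # 16.7% fewer cycles
print(f"{speedup(sse2_cycles, avx_cycles):.2f}x speedup")              # 1.20x speedup
```

Either number would do in the patch comment, as long as it says which
convention is used.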