[FFmpeg-devel] [PATCH 0/3] synth filter float ASM
James Almer
jamrial at gmail.com
Sat Mar 1 09:35:24 CET 2014
On 01/03/14 4:22 AM, Christophe Gisquet wrote:
> Hi,
>
> 2014-03-01 4:32 GMT+01:00 James Almer <jamrial at gmail.com>:
>> Here are some extra implementations that extend Christophe's work.
>
> Thanks for this, it looks very nice.
>
>> The first one (SSE) could very well replace SSE2 considering the only difference
>> is in essence one extra mova.
>> I benched a bit and there didn't seem to be any difference in speed at all between
>> the two.
>
> Actually, I would have written an SSE-only version (there was some
> difference for me though), but I remember people wanting no further
> SSE asm when there is a SSE2 version, up to the point that it made my
> life simpler doing what they asked rather than argue with them. I hope
> you'll be saved the trouble.
Having both SSE and SSE2 is pointless on x64, which is why I made it x86 only.
The function is pretty much SSE for that matter. pxor/xorps work exactly the same
to zero the registers, and so do pshufd/shufps to spread the 32 bits of data across
registers.
The only difference is in one pshufd/shufps case, since the former takes one source
and the latter takes two (dst doubling as one of them), so for a memory source the
extra movaps was needed to get the same results as pshufd.
I didn't notice a performance hit from those extra movaps, but if you or others do,
then maybe it's better to keep both versions.
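To illustrate the pshufd/shufps point, here is a rough scalar model in C of how
the two instructions pick their lanes. It is only a sketch, not the code from the
patch: the vec128 type and the model_* function names are made up for this
example, and a register is modeled as four 32-bit lanes with lane 0 lowest.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit XMM register: four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } vec128;

/* pshufd dst, src, imm8: every result lane is selected from src. */
static vec128 model_pshufd(vec128 src, unsigned imm8)
{
    vec128 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = src.lane[(imm8 >> (2 * i)) & 3];
    return r;
}

/* shufps dst, src, imm8: lanes 0-1 are selected from dst (its old value),
 * lanes 2-3 are selected from src. */
static vec128 model_shufps(vec128 dst, vec128 src, unsigned imm8)
{
    vec128 r;
    r.lane[0] = dst.lane[(imm8 >> 0) & 3];
    r.lane[1] = dst.lane[(imm8 >> 2) & 3];
    r.lane[2] = src.lane[(imm8 >> 4) & 3];
    r.lane[3] = src.lane[(imm8 >> 6) & 3];
    return r;
}

/* SSE-only stand-in for "pshufd dst, [mem], imm8":
 * load the memory operand first, then shuffle the register with itself. */
static vec128 model_movaps_shufps(const uint32_t *mem, unsigned imm8)
{
    vec128 dst;
    memcpy(dst.lane, mem, sizeof dst.lane); /* movaps dst, [mem]     */
    return model_shufps(dst, dst, imm8);    /* shufps dst, dst, imm8 */
}
```

With both shufps operands set to the same register, the model returns the same
lanes as pshufd for any imm8, which is why the only cost of the SSE version is
the extra load.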
>
>> Second patch is an implementation of AVX using ymm registers.
>> In my tests it was about 30 cycles faster than SSE2 on a Sandy Bridge CPU, 150
>> cycles vs 180 cycles.
>
> Nice, maybe update the patch comment with this reference number,
> because it underlines it is a 15% speedup, which is not small.
Personally I was expecting a bigger boost than that, considering the main loop is
being run only once on x64 and twice on x86, compared to two and four times
respectively with SSE2. But I guess things aren't as linear as I thought.
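For reference, the arithmetic behind those numbers can be sketched as below. The
helper names are made up for this example; the only inputs are the 180 and 150
cycle counts quoted above.

```c
/* Fraction of cycles saved when going from old_cycles to new_cycles. */
static double cycles_saved(double old_cycles, double new_cycles)
{
    return (old_cycles - new_cycles) / old_cycles;
}

/* Speedup factor: how many times faster the new version runs. */
static double speedup_factor(double old_cycles, double new_cycles)
{
    return old_cycles / new_cycles;
}
```

cycles_saved(180, 150) comes to roughly 0.167 and speedup_factor(180, 150) to
1.2, well short of the 2x that halving the number of loop passes might naively
suggest.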
> I don't have comments on the asm otherwise, as I don't know avx. I
> know your code passes fate-dts so that should be ok.
>