[Libav-user] a little performance/optimisation headbreaker :)

Fri Feb 15 16:48:13 CET 2013

Thanks, Claudio!

On Feb 15, 2013, at 16:33, Claudio Freire wrote:

> gcc 4.7 is clever enough to generate SSE code by itself. Maybe that's
> what you're experiencing. I guess compiler flags do matter too.

I haven't passed -ftree-vectorize explicitly (rather, I tried with and without it, and it made no difference), but you're right: -fno-tree-vectorize brings back the 2x speed advantage of the hand-coded SSE version. Amazing, I never really saw a lot of benefit from the tree-vectoriser before!
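
For anyone wanting to reproduce the effect, here is a minimal sketch of the kind of loop the tree-vectoriser can handle. The function name saxpy and the file name saxpy.c are made up for illustration and are not taken from the code under discussion. As far as I know, gcc of this vintage switches the vectoriser on at -O3, which would explain why adding -ftree-vectorize by hand changes nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (not from the code under discussion):
 * y[i] += a * x[i] over n floats.
 * 'restrict' promises the compiler that x and y do not overlap,
 * which makes the loop easier to auto-vectorise.
 *
 * Two builds to time against each other (gcc):
 *   gcc -std=c99 -O3 -c saxpy.c                      vectoriser on
 *   gcc -std=c99 -O3 -fno-tree-vectorize -c saxpy.c  vectoriser off
 */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Timing the same scalar source under both builds separates what the compiler contributes from what the hand-written SSE contributes.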

If it wasn't clear, I didn't hand-code the SSE version myself, so comparing the versions will be like looking for the relative differences between the works of two post-modern art schools ;)
I've run the code through Shark, though, and that showed a clear difference in load, to the disadvantage of the SSE version.

> gcc, which tends to inhibit many of its other optimizations. Why don't
> you try gcc's vector primitives instead?

Which ones? As in the few lines with intrinsics for MSVC, which also compile under gcc but show no speed advantage or disadvantage with gcc?
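
If "vector primitives" means gcc's vector extensions (the vector_size attribute) rather than the MSVC-style intrinsics, a minimal sketch could look like the following. This is my reading of the suggestion, not something confirmed in the thread; the type name v4sf and the function saxpy_v4 are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four packed floats (16 bytes); on x86 gcc can keep this in an SSE register. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical kernel: y[i] += a * x[i], four lanes at a time. */
void saxpy_v4(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    const v4sf va = { a, a, a, a };      /* broadcast the scalar to all lanes */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        v4sf vx, vy;
        memcpy(&vx, x + i, sizeof vx);   /* load without assuming alignment */
        memcpy(&vy, y + i, sizeof vy);
        vy += va * vx;                   /* element-wise multiply, then add */
        memcpy(y + i, &vy, sizeof vy);   /* store without assuming alignment */
    }
    for (; i < n; ++i)                   /* scalar tail for the leftovers */
        y[i] += a * x[i];
}
```

The arithmetic is written with ordinary operators on a vector type, so the compiler still sees normal C expressions and remains free to schedule and optimise around them.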

BTW, this does raise the question of why ffmpeg's build process uses -fno-tree-vectorize ... maybe that's no longer required with today's compilers?

R.