[Libav-user] gcc auto-vectorisation

Mon Feb 25 23:34:53 CET 2013

On Feb 25, 2013, at 20:19, Claudio Freire wrote:

> That's because __builtin_assume_aligned isn't being called (most
> likely, didn't check). That results in **far** sub-optimal
> vectorization. I don't know about the failing tests though.

I doubt that call (or rather, token?) is required on OS X, where heap allocations and the stack are already aligned. I know of one case where its absence didn't prevent a very substantial performance gain, but I haven't checked whether that always holds.
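For reference, this is roughly what the hint under discussion looks like in a loop. It is only a minimal sketch: the function name scale_add, the 16-byte figure and the buffer layout are my own assumptions for illustration and are not taken from the libav sources. __builtin_assume_aligned itself is a GCC builtin (4.7 and later, also accepted by clang) that returns its pointer argument and lets the compiler assume the stated alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only; not a libav function.
 * Computes dst[i] += gain * src[i] for n floats. */
void scale_add(float *restrict dst, const float *restrict src,
               float gain, size_t n)
{
    /* Assumed precondition: both buffers start on a 16-byte boundary.
     * The hint below is a promise to the compiler, so check it in
     * debug builds. */
    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);

#if defined(__GNUC__)
    /* Tell the vectoriser it may use aligned loads and stores and
     * skip the peeling it would otherwise emit for unknown alignment. */
    float *d = __builtin_assume_aligned(dst, 16);
    const float *s = __builtin_assume_aligned(src, 16);
#else
    /* Other compilers: no hint, same behaviour. */
    float *d = dst;
    const float *s = src;
#endif

    for (size_t i = 0; i < n; i++)
        d[i] += gain * s[i];
}
```

Building this with -O3 (or -O2 -ftree-vectorize) and a vectoriser report flag (-ftree-vectorizer-verbose=2 on older GCC 4.x, -fopt-info-vec on newer releases) should show whether the loop was vectorised with and without the hint.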

I have a list of the loops that were vectorised (hard to read, as I build with -j4 :)). I more or less expect those loops to sit in places where vectorisation doesn't change the overall performance picture: either the containing functions don't do any significant work (think initialisation), or they simply aren't called at all (catch-all cases that have no hand-coded mmx/sse/... counterpart and that the test suite doesn't trigger). If that hunch is correct, the absence of a performance gain isn't surprising.

I guess I ought to repeat the comparison in x86_64 mode...

R.