[Libav-user] gcc auto-vectorisation

Claudio Freire klaussfreire at gmail.com
Wed Feb 27 17:37:32 CET 2013

On Wed, Feb 27, 2013 at 1:31 PM, "René J.V. Bertin" <rjvbertin at gmail.com> wrote:
> For me this settles the question: better to stick with not using auto-vectorisation, especially since it causes a few tests to fail.
> I have yet to test my modifications on MS Windows, but I'd be willing to post a patch for this option (though I admit it would annoy me to have to adapt my cross-platform HR timing routines to ffmpeg naming conventions :( )
> Detailed benchmark results: (32 bit, MMX/SSE code, -fno-tree-vectorize)
>                    samples          user t        kernel t          real t           CPU %
> Video decode  :      85166         27.0846s        2.48361s        13.5333s        218.484%

Wait... 200%... what's your hardware like?

If by any chance you have Hyper-Threading enabled (which is quite
likely), then I bet that's where the penalty is coming from: there's
only one SIMD execution unit per physical core, so two hardware
threads on the same core get no real parallelization of SIMD code,
whereas float code can run in parallel with hand-optimized SIMD code
or other integer code.
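
For reference, the CPU % column in the quoted benchmark looks like
(user + kernel) / real. That formula is my assumption, inferred from the
numbers rather than taken from the timing routines themselves. A quick
sketch with the figures from the "Video decode" row:

```python
# Figures copied from the quoted "Video decode" benchmark row.
user_s = 27.0846    # user CPU time, seconds
kernel_s = 2.48361  # kernel CPU time, seconds
real_s = 13.5333    # wall-clock time, seconds


def cpu_percent(user, kernel, real):
    """CPU time consumed as a percentage of wall-clock time.

    Assumed formula: 100 * (user + kernel) / real. It is inferred from the
    quoted table, not from the actual benchmark code.
    """
    return 100.0 * (user + kernel) / real


pct = cpu_percent(user_s, kernel_s, real_s)
print(f"CPU usage: {pct:.1f}%")
```

Anything above 100% means that, on average, more than one hardware
thread was busy at a time, which is why the figure raises the question
about the hardware.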
