[FFmpeg-devel] [flamefest-start] A little something on MMX/SSE intrinsics

Måns Rullgård mans
Wed Feb 27 23:59:52 CET 2008

Michael Niedermayer <michaelni at gmx.at> writes:

> On Wed, Feb 27, 2008 at 09:33:09PM +0000, Måns Rullgård wrote:
>> Michael Niedermayer <michaelni at gmx.at> writes:
>> > On Wed, Feb 27, 2008 at 03:29:56PM -0500, Alexander Strange wrote:
>> >> I don't think anyone can get Altivec asm to work better than
>> >> intrinsics on more than one CPU - PPC is really, really
>> >> scheduling-sensitive, especially the G5 and Cell.
>> >
>> > Until I see benchmarks, I'd guess gcc+intrinsics will be slower than
>> > unscheduled, naively written asm()
>> That depends on the CPU.  Some CPUs are quite particular about
>> instruction scheduling.
> That is true but can gcc schedule instructions properly on these cpus?

Well, setting the right -march/-mcpu flags can make a huge difference.

> Also the real question is can gcc beat a human in instruction scheduling ;)

Probably not, but that's not the point.  By that reasoning, we should
be writing all code in assembler, and have one version for every
variant of every CPU.

In theory, a compiler can schedule intrinsics according to the
currently targeted CPU.  Plain assembler cannot be reordered to
improve scheduling.
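
As a rough illustration of the point above, here is a minimal sketch of a
routine written with SSE2 intrinsics. The function name add_u16 and its
multiple-of-8 length requirement are assumptions made for this example, not
code from FFmpeg. The compiler is free to order the loads, the add and the
store according to whatever -march/-mtune target it is given, whereas the
same instructions inside an asm() block would stay in the order they were
written. Whether a given compiler version actually exploits that freedom
well is a separate question and has to be measured.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] with 16-bit wraparound.
 * Assumption for this sketch: n is a multiple of 8. */
void add_u16(uint16_t *dst, const uint16_t *a, const uint16_t *b, size_t n)
{
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i += 8) {
        /* Unaligned loads of eight 16-bit lanes each. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Lane-wise wrapping add; the compiler chooses the final
         * instruction order for the CPU it is targeting. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi16(va, vb));
    }
#else
    /* Scalar fallback so the sketch also builds on non-x86 targets. */
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint16_t)(a[i] + b[i]);
#endif
}
```

Building the same source with different -march or -mtune values and
comparing the generated assembly (gcc -S) is one way to see how much
rescheduling, if any, a particular compiler performs.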

>> >> I guess you can always try, though, but don't do anything to
>> >> discourage people who know altivec from adding more. There's still a
>> >> lot missing from H.264.
>> >
>> > Code is either well written or should be rejected.
>> > Intrinsics != well written.
>> That's where you're wrong.  Code using intrinsics can be well-written.
> If the compiler generated optimal code to begin with there would be
> no need for asm/intrinsics. OTOH if it does not, using intrinsics is
> not that smart.

As long as we're using the C language, expressing algorithms in a way
that gives a compiler even a remote chance to identify possibilities
for SIMD optimisation is next to impossible.  If we entertain the
notion of moving to a different high-level language, that language
would probably have features resembling these oh-so-hated intrinsics.
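
A small sketch of why this is hard, using saturating unsigned 8-bit
addition as the example (the function names are hypothetical and chosen
only for illustration). C has no saturating-add operator, so the scalar
version has to spell the clamp out and leave it to the compiler to
recognise the pattern. The SSE2 intrinsic _mm_adds_epu8 states the
intent directly. Whether a particular compiler turns the scalar loop into
the same instruction is not something this sketch settles.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Plain C: the saturation has to be written as widen, add, clamp. */
void add_sat_u8_c(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = sum > 255 ? 255 : (uint8_t)sum;
    }
}

/* Intrinsics: one call expresses "saturating add of 16 unsigned bytes". */
void add_sat_u8_simd(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Remaining elements (or all of them without SSE2) use the C path. */
    add_sat_u8_c(dst + i, a + i, b + i, n - i);
}
```

Both functions should produce identical output for any input; the
difference is only in how much the compiler has to infer.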

>> The problem is not the code, but the compiler.
>> I agree that if the most commonly used compilers can't compile
>> intrinsics properly, plain assembler should be used.  I have no idea
>> whether this is the case for Altivec, and neither do you.
> I do know that gcc does quite stupid things on x86, be it when compiling C
> code or intrinsics. And I know that gcc is generally better at compiling
> x86 code than code for other, less common architectures. Combining these
> strongly suggests that the gap between intrinsics and asm will
> be bigger on PPC than on x86, not smaller.
> Of course you are correct that I do not strictly "know" it. It's just VERY
> likely.

It is wrong to talk about the probability of something that has a
definite value, even if it has not yet been measured.  Probabilities
would be appropriate if speculating over the abilities of future GCC

> Also, one can always write asm code that is as fast as intrinsic
> code; it's not necessarily possible to write intrinsics code that is
> as fast as asm.

One can write assembler that is as fast as intrinsics for *one* CPU
variant.  Even a moderately clever compiler may well compile the
intrinsics into code outperforming code that was hand-tuned for the
wrong CPU.

I believe that gcc generates bad code from intrinsics if people say
so, and since this appears to be the case, we should avoid using them
for now.

What I disagree with is concluding from the current shortcomings of
gcc that intrinsics are inherently bad.  The idea is good.
Unfortunately, the gcc developers have done a remarkably bad job of
implementing it.

Extending your argument, one could say that a lot of software is bad,
therefore computers are bad.  Fortunately, badness propagates only one

Måns Rullgård
mans at mansr.com
