[FFmpeg-devel] [flamefest-start] A little something on MMX/SSE intrinsics
Måns Rullgård
mans
Wed Feb 27 23:59:52 CET 2008
Michael Niedermayer <michaelni at gmx.at> writes:
> On Wed, Feb 27, 2008 at 09:33:09PM +0000, Måns Rullgård wrote:
>> Michael Niedermayer <michaelni at gmx.at> writes:
>>
>> > On Wed, Feb 27, 2008 at 03:29:56PM -0500, Alexander Strange wrote:
>> >> I don't think anyone can get Altivec asm to work better than
>> >> intrinsics on more than one CPU - PPC is really, really
>> >> scheduling-sensitive, especially the G5 and Cell.
>> >
>> > Until I see benchmarks, I'd guess gcc+intrinsics will be slower than
>> > unscheduled, naively written asm().
>>
>> That depends on the CPU. Some CPUs are quite particular about
>> instruction scheduling.
>
> That is true, but can gcc schedule instructions properly on these CPUs?
Well, setting the right -march/-mcpu flags can make a huge difference.
> Also, the real question is whether gcc can beat a human at instruction scheduling ;)
Probably not, but that's not the point. By that reasoning, we should
be writing all code in assembler, and have one version for every
variant of every CPU.
In theory, a compiler can schedule the instructions generated from
intrinsics for whichever CPU is currently targeted. Plain assembler
cannot be reordered by the compiler to improve scheduling.
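As a rough illustration of this scheduling point (the example is not from the thread), here is a minimal C sketch assuming an x86 target with SSE2; the function name avg_pixels16 is hypothetical and is not FFmpeg code. The intrinsics _mm_loadu_si128, _mm_avg_epu8 and _mm_storeu_si128 come from <emmintrin.h>. Because each one is an ordinary C expression, the compiler chooses the registers and the instruction order for the CPU selected with -march/-mtune, whereas the instructions inside an asm() statement are emitted in the order they were written. A scalar fallback keeps the sketch compiling on non-SSE2 targets.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not FFmpeg code):
 * dst[k] = (a[k] + b[k] + 1) >> 1 for nblocks blocks of 16 bytes each. */
void avg_pixels16(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                  size_t nblocks)
{
    for (size_t blk = 0; blk < nblocks; blk++) {
        size_t off = 16 * blk;
#if defined(__SSE2__)
        /* Each intrinsic is an ordinary expression: the compiler allocates
         * registers and may interleave these loads and stores with other
         * work, depending on the targeted CPU. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + off));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + off));
        /* _mm_avg_epu8 (pavgb): rounded average of unsigned bytes. */
        _mm_storeu_si128((__m128i *)(dst + off), _mm_avg_epu8(va, vb));
#else
        /* Scalar fallback for targets without SSE2. */
        for (size_t i = 0; i < 16; i++)
            dst[off + i] = (uint8_t)((a[off + i] + b[off + i] + 1) >> 1);
#endif
    }
}
```

For example, with a[0] == 0 and b[0] == 255, avg_pixels16 stores (0 + 255 + 1) >> 1 == 128 in dst[0]. Comparing the compiler's output for different -mtune values is one way to see whether it actually exploits this freedom.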
>> >> I guess you can always try, though, but don't do anything to
>> >> discourage people who know altivec from adding more. There's still a
>> >> lot missing from H.264.
>> >
>> > Code is either well written or should be rejected.
>> > Intrinsics != well written.
>>
>> That's where you're wrong. Code using intrinsics can be well-written.
>
> If the compiler generated optimal code to begin with, there would be
> no need for asm/intrinsics. OTOH, if it does not, using intrinsics is
> not that smart.
As long as we're using the C language, expressing algorithms in a way
that gives a compiler even a remote chance to identify opportunities
for SIMD optimisation is next to impossible. If we entertain the
notion of moving to a different high-level language, that language
would probably have features resembling these oh-so-hated intrinsics.
>> The problem is not the code, but the compiler.
>>
>> I agree that if the most commonly used compilers can't compile
>> intrinsics properly, plain assembler should be used. I have no idea
>> whether this is the case for Altivec, and neither do you.
>
> I do know that gcc does quite stupid things on x86, whether compiling C
> code or intrinsics. And I know that gcc is generally better at compiling
> x86 code than code for other, less common architectures. Combining these
> strongly suggests that the gap between intrinsics and asm will
> be bigger on PPC than on x86, not smaller.
> Of course you are correct that I do not strictly "know" it. It's just VERY
> likely.
It is wrong to talk about the probability of something that has a
definite value, even if it has not yet been measured. Probabilities
would be appropriate if speculating over the abilities of future GCC
versions.
> Also, one can always write asm code that is as fast as intrinsics
> code; it's not necessarily possible to write intrinsics code that is
> as fast as asm.
One can write assembler that is as fast as intrinsics for *one* CPU
variant. Even a moderately clever compiler may well compile the
intrinsics into code outperforming code that was hand-tuned for the
wrong CPU.
I am willing to believe people who say that gcc generates bad code
from intrinsics, and since this appears to be the case, we should
avoid using them for now.
What I disagree with is concluding from the current shortcomings of
gcc that intrinsics are inherently bad. The idea is good.
Unfortunately, the gcc developers have done a remarkably bad job of
implementing it.
Extending your argument, one could say that a lot of software is bad,
therefore computers are bad. Fortunately, badness propagates only one
way.
--
Måns Rullgård
mans at mansr.com