[FFmpeg-devel] Inline ASM vs. Intrinsics

Måns Rullgård mans
Fri May 11 22:10:26 CEST 2007

Michael Niedermayer <michaelni at gmx.at> writes:

> Hi
> On Fri, May 11, 2007 at 07:44:53PM +0100, Måns Rullgård wrote:
>> Michael Niedermayer <michaelni at gmx.at> writes:
>> > Hi
>> >
>> > On Fri, May 11, 2007 at 10:22:32AM -0400, Dave Dodge wrote:
>> >> On Fri, May 11, 2007 at 02:06:11PM +0200, Guillaume POIRIER wrote:
>> >> > Exactly. I wrongly assumed that the "register" keyword was honored
>> >> > with xmm/mm intrinsics, but I was wrong. It's simply ignored by ICC. I
>> >> > don't know about GCC.
>> >> 
>> >> According to its documentation gcc also ignores the "register" storage
>> >> class specifier, except in a few special cases:
>> >> 
>> >>   - when using asm in a declaration to explicitly specify which register.
>> >>   - when using -O0.
>> >>   - when using setjmp on certain rare target platforms.
>> >> 
>> >> Aside: on IA64 icc supports only intrinsics -- no inline assembly.  On
>> >> the one hand IA64 assembly is so painful that you'd rarely want to
>> >> write it manually anyway; but the downside is that the intrinsics
>> >
>> > IA64 is a complete failure with and without intrinsics
>> IA64 is a complete commercial failure.  Its performance is far better
>> than any x86-based CPU of the same time period.  The main reason it
>> failed was lack of good x86 emulation, and people insisting on
>> continuing to run the same old rubbish non-portable software.
> really?  can you point to some benchmarks? (not from intel of
> course) i thought it was significantly slower than comparable CPUs
> (same time period and same price range) even when both run natively
> compiled code (with similarly good compilers of course)

Sorry, I don't have any benchmarks handy.  I just remember playing
with one in uni, and it was quite fast compared to the P3 machines
also available, particularly for floating point.

> and even if it was faster it was conceptually flawed (=misdesigned)
> it tried to move things from runtime to compiletime which are not
> "constant" but change depending on the data the code works on
> and from what i remember it's a nightmare for a compiler to generate
> good code for it ...

And you're suggesting the x86 is not conceptually flawed?  Hanging on
to a 30-year-old design when much better ways are known seems like an
unusually bad idea to me.

Måns Rullgård
mans at mansr.com
