[FFmpeg-devel] Inline ASM vs. Intrinsics

Michael Niedermayer michaelni
Fri May 11 22:12:35 CEST 2007


Hi

On Fri, May 11, 2007 at 09:10:26PM +0100, Måns Rullgård wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
> 
> > Hi
> >
> > On Fri, May 11, 2007 at 07:44:53PM +0100, Måns Rullgård wrote:
> >> Michael Niedermayer <michaelni at gmx.at> writes:
> >> 
> >> > Hi
> >> >
> >> > On Fri, May 11, 2007 at 10:22:32AM -0400, Dave Dodge wrote:
> >> >> On Fri, May 11, 2007 at 02:06:11PM +0200, Guillaume POIRIER wrote:
> >> >> > Exactly. I wrongly assumed that the "register" keyword was honored
> >> >> > with xmm/mm intrinsics. It's simply ignored by ICC. I
> >> >> > don't know about GCC.
> >> >> 
> >> >> According to its documentation gcc also ignores the "register" storage
> >> >> class specifier, except in a few special cases:
> >> >> 
> >> >>   - when using asm in a declaration to explicitly specify which register.
> >> >>   - when using -O0.
> >> >>   - when using setjmp on certain rare target platforms.
> >> >> 
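A minimal sketch of the first case in the list above, assuming GCC or
Clang on x86-64. The function name add_pinned and the choice of r12 are
hypothetical, picked only for illustration. A plain "register" is just a
hint, whereas a local register variable declared with asm is placed in
the named register when it is used as an operand of an extended asm
statement.

```c
/* Returns a + b.
 * x86-64 path: "acc" is bound to r12 via an explicit register variable,
 * so the inline asm below sees it in that register.
 * Fallback path: plain "register", which the compiler is free to ignore. */
static long add_pinned(long a, long b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* Explicit register: honoured where acc is an asm operand. */
    register long acc __asm__("r12") = a;

    /* AT&T syntax: add src, dst  =>  acc += b. Flags are clobbered. */
    __asm__ ("add %1, %0" : "+r"(acc) : "r"(b) : "cc");
    return acc;
#else
    /* Only a hint; no particular register is guaranteed. */
    register long acc = a;
    acc += b;
    return acc;
#endif
}
```

Calling add_pinned(2, 3) returns 5 on either path. On x86-64, compiling
with gcc -O2 -S and reading the assembly should show the add operating
on %r12.
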
> >> >> Aside: on IA64 icc supports only intrinsics -- no inline assembly.  On
> >> >> the one hand IA64 assembly is so painful that you'd rarely want to
> >> >> write it manually anyway; but the downside is that the intrinsics
> >> >
> >> > IA64 is a complete failure with and without intrinsics
> >> 
> >> IA64 is a complete commercial failure.  Its performance is far better
> >> than any x86-based CPU of the same time period.  The main reason it
> >> failed was lack of good x86 emulation, and people insisting on
> >> continuing to run the same old rubbish non-portable software.
> >
> > Really? Can you point to some benchmarks (not from Intel, of
> > course)? I thought it was significantly slower than comparable CPUs
> > (same time period and same price range), even when both run natively
> > compiled code (with similarly good compilers, of course).
> 
> Sorry, I don't have any benchmarks handy.  I just remember playing
> with one in uni, and it was quite fast compared to the P3 machines
> also available, particularly for floating point.

ok


> 
> > And even if it was faster, it was conceptually flawed (= misdesigned):
> > it tried to move decisions from run time to compile time that are not
> > "constant" but change depending on the data the code works on.
> > And from what I remember, it's a nightmare for a compiler to generate
> > good code for it ...
> 
> And you're suggesting the x86 is not conceptually flawed?  Hanging on
> to a 30-year old design when much better ways are known seems like an
> unusually bad idea to me.

Sure, x86 is very flawed too.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Freedom in capitalist society always remains about the same as it was in
ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin