[Ffmpeg-devel] [PATCH] SSE counterpart of ff_imdct_calc_3dn2

Guillaume Poirier gpoirier
Thu Aug 24 18:35:07 CEST 2006


\o/ Rich is back to flaming mode!

Rich Felker wrote:
> On Thu, Aug 24, 2006 at 09:59:37AM +0200, Guillaume POIRIER wrote:
>>>Intrinsics are also gcc4-specific
>>False, they existed in 3.4 and I think in 3.3 also (I don't know about
>>earlier releases, but for sure 2.95 does not support them).
> Only gcc4 and later have the 3dnow intrinsics.

I wasn't specifically talking about 3dnow intrinsics...
You should probably back up your claims, Rich, or make it clear that you
made something up.

>>Also, ICC is able to process these intrinsics, whereas it has a hard
>>time with inline asm.
> Supporting ICC would be nice, but you can always compile with asm
> disabled.. Any viable compiler for high-performance needs to have full
> inline asm available, not just a limited set of intrinsics for vector
> ops.

ICC does support inline asm; it just does not support _GCC_'s inline asm
syntax as well as it supports intrinsics.

>>Rich, you should really consider that some people aren't willing to spend
>>their youth on writing killer hand-tuned asm code.
> It takes maybe 5-10 minutes more to write the obvious handwritten asm
> than to write the code with intrinsics, and performance should be same
> or better. If you want to make it even faster you may spend somewhat
> longer but your claims of "spending their youth" are exaggerated and
> misleading.

Well, you forgot to consider several things:
- appropriate register allocation (gcc may not be too good at that, but
  it's still easier to write code with named variables than with
  anonymous register names)
- appropriate scheduling (fair enough, GCC is not all that good at that,
  but ICC is better)
- appropriate clobbering of inputs
Ah, I almost forgot:
- writing a 2nd version of the code that _takes advantage_ of x86-64
  (using REG_xx is cheating as you limit yourself to just half of the

>>PS: yes, I totally made up the above figures
> Obviously.

Well, you seem to do the same, so why shouldn't I?

BTW: I do not claim that intrinsics are the perfect solution for
writing SIMD code. I'm just saying that they provide a different
approach to writing such code, with its own advantages and disadvantages.
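
To make the comparison concrete, here is a minimal sketch of the same
element-wise float multiply written both ways. It is only an illustration,
not code from the patch: the function names mul_intrin and mul_asm are
hypothetical, the asm version assumes GCC-style extended asm with AT&T
syntax, and both fall back to a plain scalar loop when SSE is not
available. Note how the intrinsics version uses named variables and leaves
register allocation to the compiler, while the asm version has to pick
registers by hand and declare them as clobbered.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_mul_ps, ... */
#endif

/* Hypothetical helper: dst[i] = a[i] * b[i], written with intrinsics.
 * The compiler chooses the XMM registers and schedules the loads. */
static void mul_intrin(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_mul_ps(va, vb));
    }
#endif
    for (; i < n; i++)                     /* scalar tail (or full fallback) */
        dst[i] = a[i] * b[i];
}

/* Same operation with GCC-style extended inline asm (an assumption: this
 * syntax is what GCC accepts; other compilers may not). Registers are
 * picked by hand and must be listed as clobbers, together with "memory"
 * because the asm reads and writes through the pointers. */
static void mul_asm(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__GNUC__) && defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __asm__ volatile(
            "movups (%1), %%xmm0   \n\t"   /* xmm0 = a[i..i+3]      */
            "movups (%2), %%xmm1   \n\t"   /* xmm1 = b[i..i+3]      */
            "mulps  %%xmm1, %%xmm0 \n\t"   /* xmm0 *= xmm1          */
            "movups %%xmm0, (%0)   \n\t"   /* dst[i..i+3] = xmm0    */
            :
            : "r"(dst + i), "r"(a + i), "r"(b + i)
            : "xmm0", "xmm1", "memory");
    }
#endif
    for (; i < n; i++)                     /* scalar tail (or full fallback) */
        dst[i] = a[i] * b[i];
}
```

Both functions should produce identical results for any n; the difference
is in who does the register allocation and scheduling, and in how portable
the source is across compilers.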

