[FFmpeg-devel] [RFC] clobbers for XMM registers

Ramiro Polla ramiro.polla
Thu Sep 30 19:48:37 CEST 2010

2010/9/30 Måns Rullgård <mans at mansr.com>:
> Ramiro Polla <ramiro.polla at gmail.com> writes:
>> 2010/9/30 Måns Rullgård <mans at mansr.com>:
>>> Alexander Strange <astrange at ithinksw.com> writes:
>>>> On Thursday, September 30, 2010, Måns Rullgård <mans at mansr.com> wrote:
>>>>> "Ronald S. Bultje" <rsbultje at gmail.com> writes:
>>>>>> 2010/9/28 Måns Rullgård <mans at mansr.com>:
>>>>>>> Michael Niedermayer <michaelni at gmx.at> writes:
>>>>>>>> On Tue, Sep 28, 2010 at 09:36:40AM -0400, Ronald S. Bultje wrote:
>>>>>>>>> On Tue, Sep 28, 2010 at 8:34 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>>>>>>>>> > you want to execute code from vp3dsp_sse2.c on a pre SSE cpu?
>>>>>>>>> All _sse2 files are templates files that are included in dsputil_mmx.c
>>>>>>>>> or similar.
>>>>>>>> we could add the flags to dsputil_mmx then
>>>>>>> That would allow the compiler to use SSE instructions in functions
>>>>>>> that should be MMX only.
>>>>>> I'm gonna start kicking this subject until it's solved. Come on guys,
>>>>>> keep this moving. Why don't we make it (the clobbering) a macro and
>>>>>> only enable this on x86-64. Don't forget all xmm registers are
>>>>>> caller-save on x86-32 and x86-64 has no issues with marking clobbers
>>>>> The issue is not fundamentally about caller vs callee saved
>>>>> registers.  It is about telling the compiler which registers are
>>>>> clobbered, so that it can save and restore them if necessary.
>>>>> The missing clobber lists caused the FFT to fail with suncc, despite
>>>>> all the used registers being caller-saved.  Apparently the compiler
>>>>> was using them for something outside the asm block.
>>>>>> (and even if it did, -msse is fine, there is no single x86-64 CPU that
>>>>>> does not support SSE). We could consider making it as simple as :::
>>>>>> CLOBBER_IF_X86_64("%xmm6", "%xmm7",) "%eax" which evaluates to the
>>>>>> string in it (including commas) on x86-64 and nothing on x86-32 (and
>>>>>> omit the comma if that's the only thing in the clobberlist).
>>>>> We obviously need a conditional of some kind, but it should be tested
>>>>> in configure and applied whenever the compiler recognises xmm registers.
>>>>> It is, however, not quite as straightforward as you make it out.
>>>>> Stray commas are not allowed, nor is an empty list.
>>>>> One possible solution is to have the macro always include "cc".  Most
>>>>> of the asm blocks do clobber the condition flags, and for any that do
>>>>> not, it is unlikely to make any difference.  It also seems that
>>>>> including the stack pointer in the clobber list is ignored, although
>>>>> relying on this seems dubious at best.
>>>> asm blocks always clobber cc whether or not you put it in the list, so
>>>> the "cc" clobber is a no-op.
>>> In that case always adding it is certainly harmless, and allows a
>>> single macro to be used.
>> What about
>> #    define XMM_CLOBBERS(a, ...) __VA_ARGS__
>> #else
>> #    define XMM_CLOBBERS(a, ...) a
>> #endif
>> to be used as in lavc/x86/fft_sse.c:
>>         :"+r"(j), "+r"(k)
>>         :"r"(output+n4), "r"(output+n4*3),
>>          "m"(*m1m1m1m1)
>>         XMM_CLOBBERS(, : "%xmm0", "%xmm1", "%xmm7")
>>     );
> That falls over if any other clobbers are needed.

If any other clobbers are needed they could be written before the macro.
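
A minimal sketch of that variant with another clobber written before the macro. The HAVE_XMM_CLOBBERS switch is an assumption standing in for whatever configure would define (here it is simply derived from __SSE__), and double_it() is a hypothetical function for illustration, not FFmpeg code:

```c
/* Assumption: HAVE_XMM_CLOBBERS stands in for a configure-provided switch.
 * For this sketch it is derived from the compiler's SSE support. */
#ifndef HAVE_XMM_CLOBBERS
#    if defined(__GNUC__) && defined(__SSE__)
#        define HAVE_XMM_CLOBBERS 1
#    else
#        define HAVE_XMM_CLOBBERS 0
#    endif
#endif

#if HAVE_XMM_CLOBBERS
#    define XMM_CLOBBERS(a, ...) __VA_ARGS__
#else
#    define XMM_CLOBBERS(a, ...) a
#endif

/* Hypothetical example: doubles x, touching xmm0 only when SSE is available. */
static int double_it(int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile (
#if HAVE_XMM_CLOBBERS
        "xorps %%xmm0, %%xmm0 \n\t"   /* scratch use of an XMM register */
#endif
        "addl  %0, %0         \n\t"   /* x = x + x */
        : "+r"(x)
        :
        /* "memory" comes first; the macro appends , "%xmm0" or nothing */
        : "memory" XMM_CLOBBERS(, , "%xmm0")
    );
#else
    x += x;                           /* plain C fallback on other targets */
#endif
    return x;
}
```

With the switch on, the clobber list expands to `: "memory" , "%xmm0"`; with it off, it expands to `: "memory"`. The first (empty) macro argument is what the disabled case evaluates to, so the separator has to travel inside the variadic part.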

> Here's my idea:
> #   define XMM_CLOBBERS(...) "cc", __VA_ARGS__
> #else
> #   define XMM_CLOBBERS(...) "cc"
> #endif
> [...]
> __asm__ ("..." ::: XMM_CLOBBERS("%xmm0", "%xmm1"), "other");
> This macro can be called anywhere in a clobber list, and it requires
> no obscure syntax at call site.  A comment about "cc" being a no-op
> might be in order, of course.

If you don't mind having "cc" there always, that seems fine to me too.
Should I send another patch or can you commit directly?
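
For reference, a minimal sketch of the always-"cc" variant as it might look in use. HAVE_XMM_CLOBBERS is again an assumed configure-style switch (derived from __SSE__ here only so the sketch builds on its own), and add_one() is a hypothetical function, not FFmpeg code:

```c
/* Assumption: HAVE_XMM_CLOBBERS stands in for a configure-provided switch.
 * For this sketch it is derived from the compiler's SSE support. */
#ifndef HAVE_XMM_CLOBBERS
#   if defined(__GNUC__) && defined(__SSE__)
#       define HAVE_XMM_CLOBBERS 1
#   else
#       define HAVE_XMM_CLOBBERS 0
#   endif
#endif

/* "cc" is always listed so the expansion is never empty; per the thread,
 * asm blocks on x86 are treated as clobbering the flags anyway. */
#if HAVE_XMM_CLOBBERS
#   define XMM_CLOBBERS(...) "cc", __VA_ARGS__
#else
#   define XMM_CLOBBERS(...) "cc"
#endif

/* Hypothetical example: adds 1 to x, touching xmm0/xmm1 when SSE is available. */
static int add_one(int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile (
#if HAVE_XMM_CLOBBERS
        "xorps %%xmm0, %%xmm0 \n\t"   /* scratch use of XMM registers */
        "xorps %%xmm1, %%xmm1 \n\t"
#endif
        "addl  $1, %0         \n\t"   /* x = x + 1 */
        : "+r"(x)
        :
        /* the macro can sit anywhere in the list; "memory" follows it here */
        : XMM_CLOBBERS("%xmm0", "%xmm1"), "memory"
    );
#else
    x += 1;                           /* plain C fallback on other targets */
#endif
    return x;
}
```

Because the expansion always yields at least one string, ordinary commas work on either side of the macro and no separator needs to be passed as an argument.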
