[FFmpeg-devel] [PATCH] some SIMD write-combining for h264

Loren Merritt lorenm
Sun Jan 17 23:53:48 CET 2010


On 17 Jan 2010, Alexander Strange wrote:
> On 16 Jan 2010, Måns Rullgård wrote:
>> On 16 Jan 2010, Alexander Strange wrote:
>>
>>> +#define AV_COPY128 AV_COPY128
>>> +static inline void AV_COPY128(void *d, const void *s)
>>> +{
>>> +    typedef struct {uint64_t i[2];} v;
>>> +
>>> +    __asm__("movaps   %1, %%xmm0  \n\t"
>>> +            "movaps   %%xmm0, %0  \n\t"
>>> +            : "=m"(*(v*)d)
>>> +            : "m" (*(const v*)s)
>>> +            : "xmm0");
>>> +}
>>
>> I would have done something like this instead of the typedef:
>>
>>    struct { uint64_t i[2]; } *vd = d, *vs = s;
>>
>> Or maybe like this:
>>
>>    uint64_t (*vd)[2] = d;
>>    uint64_t (*vs)[2] = s;
>>
>> Examining asm is probably the best way of choosing.
>
> It doesn't affect it as long as it's the right size (in fact, even that doesn't seem to matter, char* gives 1 insn worse asm).

A wrong size will look like it works until you get a miscompilation, because
gcc is free to reorder a load/store to the part of the argument that you
haven't told it you're accessing.
The same goes for a wrong type, if your struct doesn't contain the same int
type as the other accesses to the same data.
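
To make that concrete, here is a minimal sketch (not part of the patch) of a
16-byte copy whose "m" operands cover the full 16 bytes and whose struct holds
the same integer type (uint64_t) as the surrounding code. The names copy128_t,
copy128 and copy128_selftest are hypothetical; the memcpy fallback for
non-SSE targets is an assumption added only so the sketch builds everywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type: exactly 16 bytes, built from the same integer type
 * (uint64_t) that the other accesses to the data use. */
typedef struct { uint64_t i[2]; } copy128_t;

static inline void copy128(void *d, const void *s)
{
    /* movaps faults on unaligned addresses, so check in debug builds. */
    assert(((uintptr_t)d & 15) == 0 && ((uintptr_t)s & 15) == 0);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__SSE__))
    /* Each "m" operand is a full copy128_t, so gcc knows all 16 bytes
     * are read from s and written to d and will not move other loads or
     * stores to those bytes across the asm. */
    __asm__("movaps   %1, %%xmm0  \n\t"
            "movaps   %%xmm0, %0  \n\t"
            : "=m"(*(copy128_t *)d)
            : "m" (*(const copy128_t *)s)
            : "xmm0");
#else
    /* Assumed fallback for targets without SSE. */
    memcpy(d, s, sizeof(copy128_t));
#endif
}

/* Returns 1 if both 64-bit halves arrive intact, 0 otherwise. */
static int copy128_selftest(void)
{
    _Alignas(16) uint64_t src[2] = { 0x0123456789abcdefULL,
                                     0xfedcba9876543210ULL };
    _Alignas(16) uint64_t dst[2] = { 0, 0 };

    copy128(dst, src);
    return dst[0] == src[0] && dst[1] == src[1];
}
```

If the operands were declared as char or as a single uint64_t instead, the asm
would still move 16 bytes, but gcc would only be told about part of them, which
is the situation described above.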

--Loren Merritt


