[FFmpeg-devel] [PATCH] Add check for Athlon64 and similar AMD processors with slow SSE2.
Ronald S. Bultje
Sun Feb 6 02:46:23 CET 2011
On Fri, Feb 4, 2011 at 1:03 PM, Justin Ruggles <justin.ruggles at gmail.com> wrote:
> On 02/04/2011 12:27 PM, Ronald S. Bultje wrote:
>> I'm not against the original idea of reusing SSE2SLOW, just make sure
>> it's properly documented.
>> - SSE2 - CPU supports good SSE2
>> - SSE2SLOW (core1 etc.) - CPU supports SSE2 in theory but it's almost
>> always slower - only set SSE2 functions if explicitly tested to be
>> faster
>> - SSE2|SSE2SLOW (athlon64 etc.) - CPU supports SSE2 but it's
>> occasionally slower - don't set SSE2 functions if explicitly tested to
>> be slower
>> And I thought that's what your patch did.
> It did. But I think it made one of the flag checks more complicated.
> all sse2:
> flags & (SSE2 | SSE2SLOW)
> exclude core 1 only:
> flags & SSE2
> exclude core 1 and athlon64:
> (flags & SSE2) && !(flags & SSE2SLOW)
> (flags & (SSE2 | SSE2SLOW)) ^ SSE2SLOW
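The second form should be:
(flags & (SSE2|SSE2SLOW)) == SSE2
(^ SSE2SLOW only flips the slow bit, so the result is still non-zero
whenever the SSE2 bit is set, and Athlon64 is not excluded. Note the
outer parentheses: in C, == binds tighter than &.)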
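A minimal sketch of why the XOR form does not work for this case, comparing it against the equality form. The bit values and the function names xor_form and eq_form are placeholders made up for illustration; they are not the real FFmpeg constants or helpers.

```c
/* Placeholder bit values for illustration only; not the real FFmpeg constants. */
#define SSE2     0x1
#define SSE2SLOW 0x2

/* XOR form from the quoted mail: true if any bit survives the flip. */
static int xor_form(int flags)
{
    return ((flags & (SSE2 | SSE2SLOW)) ^ SSE2SLOW) != 0;
}

/* Equality form: true only if SSE2 is set and SSE2SLOW is clear. */
static int eq_form(int flags)
{
    return (flags & (SSE2 | SSE2SLOW)) == SSE2;
}
```

With both bits set (the athlon64 case), xor_form still returns 1 because the SSE2 bit is left over after the flip, while eq_form returns 0. With only SSE2SLOW set (the core1 case), both return 0.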
> exclude athlon64 only:
> (flags & (SSE2 | SSE2SLOW)) && !(flags & SSE2 && flags & SSE2SLOW)
> (flags & (SSE2 | SSE2SLOW)) ^ (SSE2 | SSE2SLOW)
> The first 3 are self-explanatory, but the last case is not.
I don't think it matters. When would you ever want to exclude
Athlon64, but not Core1?
> With an added flag for AMD it becomes:
> (flags & (SSE2 | SSE2SLOW | AMDSSE2SLOW)) && !(flags & AMDSSE2SLOW)
> (flags & (SSE2 | SSE2SLOW | AMDSSE2SLOW)) ^ AMDSSE2SLOW
> If the first way seems ok to anyone else and/or the case is probably not
> common enough to worry about, then I can resend the first patch with the
> AMD vendor string check.
I preferred that one, but please document the above carefully so it's
clear how to use it.
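As a rough idea of what such documentation could look like, here is a sketch of the three useful checks as commented helpers. The bit values and the helper names (sse2_any, sse2_default, sse2_fast_only) are hypothetical and only serve the illustration; the real patch may express this differently.

```c
/* Placeholder bit values for illustration only; not the real FFmpeg constants. */
#define SSE2     0x1
#define SSE2SLOW 0x2

/*
 * Flag combinations, as described in this thread:
 *   SSE2            - CPU supports good SSE2
 *   SSE2SLOW        - SSE2 is almost always slower (core1 etc.)
 *   SSE2 | SSE2SLOW - SSE2 is occasionally slower (athlon64 etc.)
 */

/* All SSE2 CPUs: for functions tested to be faster even on core1. */
static int sse2_any(int flags)
{
    return (flags & (SSE2 | SSE2SLOW)) != 0;
}

/* Exclude core1 only: the usual check for an SSE2 function. */
static int sse2_default(int flags)
{
    return (flags & SSE2) != 0;
}

/* Exclude core1 and athlon64: for functions tested to be slower on athlon64. */
static int sse2_fast_only(int flags)
{
    return (flags & (SSE2 | SSE2SLOW)) == SSE2;
}
```

An init function would then pick one of the three per SSE2 function, depending on what benchmarking showed for it.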