[FFmpeg-devel] [PATCH] Add check for Athlon64 and similar AMD processors with slow SSE2.
Fri Feb 4 19:03:25 CET 2011
On 02/04/2011 12:27 PM, Ronald S. Bultje wrote:
> On Thu, Feb 3, 2011 at 7:04 PM, Justin Ruggles <justin.ruggles at gmail.com> wrote:
>> On 02/03/2011 01:58 AM, Jason Garrett-Glaser wrote:
>>> On Wed, Feb 2, 2011 at 3:26 PM, Justin Ruggles <justin.ruggles at gmail.com> wrote:
>>>> This was ported from x264 so we need permission to relicense from
>>>> Loren, Jason, or whoever added this particular check in x264.
>>> That's one line of code, it's too trivial to matter.
>>> But keep in mind that SSE2SLOW seems to mean something different in
>>> ffmpeg. It's already used for the case of Core 1, where SSE2 is
>>> ALMOST ALWAYS slower than MMX. But on Athlon 64, it's mostly faster,
>>> just slower in some cases.
>> Yes, but it's still possible to reuse this flag because we disable
>> AV_CPU_FLAG_SSE2 in the case of Core 1. Although it would make things
>> slightly more complex in the case where you want to enable an SSE2
>> function on Core 1 but disable it on Athlon64. So I'll send a new patch
>> that adds a separate flag.
> I'm not against the original idea of reusing SSE2SLOW, just make sure
> it's properly documented.
> - SSE2 - CPU supports good SSE2
> - SSE2SLOW (core1 etc.) - CPU supports SSE2 in theory but it's almost
> always slower - only set SSE2 functions if explicitly tested to be
> faster
> - SSE2|SSE2SLOW (athlon64 etc.) - CPU supports SSE2 but it's
> occasionally slower - don't set SSE2 functions if explicitly tested to
> be slower
> And I thought that's what your patch did.
It did. But I think it made one of the flag checks more complicated.

exclude nothing (any CPU with SSE2):
    flags & (SSE2 | SSE2SLOW)

exclude core 1 only:
    flags & SSE2

exclude core 1 and athlon64:
    (flags & SSE2) && !(flags & SSE2SLOW)
    (flags & (SSE2 | SSE2SLOW)) ^ SSE2SLOW

exclude athlon64 only:
    (flags & (SSE2 | SSE2SLOW)) && !(flags & SSE2 && flags & SSE2SLOW)
    (flags & (SSE2 | SSE2SLOW)) ^ (SSE2 | SSE2SLOW)

The first 3 are self-explanatory, but the last case is not. With an
added flag for AMD it becomes:
(flags & (SSE2 | SSE2SLOW | AMDSSE2SLOW)) && !(flags & AMDSSE2SLOW)
(flags & (SSE2 | SSE2SLOW | AMDSSE2SLOW)) ^ AMDSSE2SLOW
If reusing SSE2SLOW seems ok to everyone else, or the athlon64-only case
is not common enough to worry about, then I can resend the first patch
with the AMD vendor string check.