[FFmpeg-devel] [PATCH] Port extra x264 CPU detection code

Michael Niedermayer michaelni
Sat Jan 10 03:21:43 CET 2009


On Wed, Jan 07, 2009 at 11:46:59AM -0500, Jason Garrett-Glaser wrote:
> This patch adds two features from x264:
> 
> 1.  Completely disable SSE2 on Core 1 and Pentium-M CPUs (detected
> using family/model/stepping).  These CPUs are so slow at SSE2 that it
> is almost universally slower than MMX.  Even with x264's enormous
> library of asm functions, only a single one turned out to be faster on
> SSE2 than MMX, and only by a few clocks, so we simply pretend that
> these CPUs do not have SSE2 at all.
> 
> Yes, these CPUs have much slower SSE2 than even the Athlon 64.
> 
> 2.  Replace 3DNOW with SSE2_IS_SLOW when used for that purpose.  This
> is because the Phenom has 3DNOW, but it isn't slow at SSE2.
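
A minimal sketch of point 2, assuming a dispatcher that chooses between MMX and SSE2 versions of a function. The CPU_* flag values and pick_impl() are hypothetical names for illustration, not FFmpeg's actual FF_MM_* constants or API. The point is that the decision tests a dedicated "SSE2 is slow" bit instead of using 3DNow! as a proxy:

```c
/* Hypothetical flag values for illustration only;
 * these are not FFmpeg's real FF_MM_* constants. */
#define CPU_MMX          0x1
#define CPU_SSE2         0x2
#define CPU_3DNOW        0x4
#define CPU_SSE2_IS_SLOW 0x8

enum Impl { IMPL_C, IMPL_MMX, IMPL_SSE2 };

/* Pick an implementation from the detected CPU flags.
 * 3DNow! plays no part in the decision: only the explicit
 * "slow SSE2" bit can demote an SSE2-capable CPU to MMX. */
static enum Impl pick_impl(int flags)
{
    if ((flags & CPU_SSE2) && !(flags & CPU_SSE2_IS_SLOW))
        return IMPL_SSE2;
    if (flags & CPU_MMX)
        return IMPL_MMX;
    return IMPL_C;
}
```

With these assumed flags, a Phenom-like CPU (MMX, SSE2, 3DNow!) gets the SSE2 version, while an Athlon 64-like CPU that also carries CPU_SSE2_IS_SLOW falls back to MMX.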

[...]

> @@ -75,7 +76,7 @@
>      if (a == c)
>          return 0; /* CPUID not supported */
>  
> -    cpuid(0, max_std_level, ebx, ecx, edx);
> +    cpuid( 0, max_std_level, vendor[0], vendor[2], vendor[1] );
             ^                                                 ^ 
Please don't add these spaces.


>  
>      if(max_std_level >= 1){
>          cpuid(1, eax, ebx, ecx, std_caps);
> @@ -90,9 +91,21 @@
>          if (ecx & 1)
>              rval |= FF_MM_SSE3;
>          if (ecx & 0x00000200 )
> -            rval |= FF_MM_SSSE3
> +            rval |= FF_MM_SSSE3;
>  #endif
> -                  ;

This will break compilation when HAVE_SSE is not set.
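
A sketch of why the semicolon placement matters, assuming the flag assignment is one expression statement that starts before the conditional block (that part of the file is not quoted here, so the structure is an assumption). HAVE_SSE_DEMO, the FLAG_* values, and detect() are hypothetical stand-ins for HAVE_SSE and the FF_MM_* flags:

```c
/* Hypothetical stand-ins for HAVE_SSE and the FF_MM_* flags. */
#ifndef HAVE_SSE_DEMO
#define HAVE_SSE_DEMO 0
#endif
#define FLAG_MMX  0x1
#define FLAG_SSE  0x2
#define FLAG_SSE2 0x4

static int detect(int has_sse2)
{
    int rval = 0;
    (void)has_sse2; /* unused when HAVE_SSE_DEMO is 0 */

    rval |= FLAG_MMX
#if HAVE_SSE_DEMO
            | FLAG_SSE;
    if (has_sse2)
        rval |= FLAG_SSE2
#endif
            ; /* terminates whichever statement is still open */

    return rval;
}
```

If the final semicolon is moved inside the conditional block, the first statement is left unterminated whenever the block is compiled out, and the build fails.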


> +        if( !strcmp((char*)vendor, "GenuineIntel") ){
> +            int family, model, stepping;
> +            family = ((eax>>8)&0xf) + ((eax>>20)&0xff);
> +            model  = ((eax>>4)&0xf) + ((eax>>12)&0xf0);
> +            stepping = eax&0xf;
> +            /* 6/9 (pentium-m "banias"), 6/13 (pentium-m "dothan"), and 6/14 (core1 "yonah")
> +             * theoretically support sse2, but it's significantly slower than mmx for
> +             * basically all functions, so let's just pretend they don't. */
> +            if( family==6 && (model==9 || model==13 || model==14) ){
> +                rval &= ~FF_MM_SSE2;
> +                assert(!(rval&FF_MM_SSSE3));
> +            }
> +        }
>      }
>  
>      cpuid(0x80000000, max_ext_level, ebx, ecx, edx);

I am not entirely happy about lying about the supported feature set.
I am not rejecting this, though; rather, I abstain from approving it.
If the others think this is ok, then so am I; if not, then not.
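
For reference, the family/model test in the quoted hunk can be sketched as a standalone function. The bit arithmetic is copied from the patch; the CpuSignature type and the function names are hypothetical and exist only for this illustration. The input is the EAX value returned by CPUID leaf 1:

```c
#include <stdint.h>

/* Hypothetical container for the decoded CPUID leaf 1 signature. */
typedef struct {
    int family;
    int model;
    int stepping;
} CpuSignature;

/* Decode EAX from CPUID leaf 1 with the same arithmetic as the patch:
 * base family plus extended family, and base model plus the extended
 * model shifted into the high nibble. */
static CpuSignature decode_signature(uint32_t eax)
{
    CpuSignature s;
    s.family   = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
    s.model    = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
    s.stepping = eax & 0xf;
    return s;
}

/* Nonzero for the Intel models the patch treats as having no SSE2:
 * 6/9 (Banias), 6/13 (Dothan) and 6/14 (Yonah). The vendor check
 * for "GenuineIntel" is left to the caller. */
static int sse2_is_hidden(uint32_t eax)
{
    CpuSignature s = decode_signature(eax);
    return s.family == 6 &&
           (s.model == 9 || s.model == 13 || s.model == 14);
}
```

For example, a signature of 0x6D8 decodes to family 6, model 13, stepping 8 and is masked, while 0x10676 decodes to family 6, model 23 and is left alone.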

the rest looks ok

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Democracy is the form of government in which you can choose your dictator