[FFmpeg-devel] [PATCH] Fix apply_welch_window_sse2 compilation on Mac OS X/x86

Pierre d'Herbemont pdherbemont
Thu Oct 18 15:00:28 CEST 2007

On Oct 18, 2007, at 1:17 PM, Loren Merritt wrote:

> The whole point of splitting the asm block was to allow gcc to spill
> registers in between, because it doesn't have 6 general regs free. And
> look at your own disassembly: it did. So you jump from the 2nd asm block
> to the 1st without running the appropriate spilling code, and run the 1st
> block with the register values from the 2nd. Then you run the
> initialization code for the 2nd block again, which gcc expected to only
> run once.

Sorry, I missed that. Thanks for the explanation.

> BTW, spilling shouldn't be needed. It's possible to write the loop with 5
> regs, but that's slower than 6 if you have 6. Ideally I'd be able to write
> the loop part in C and gcc would use 5 or 6 regs for addressing depending
> on what's available, but that's not what happens in practice.

I get your point. Here is an attempt at the 5-register version. I guess
it's not far from Thorsten Jordan's version.

Basically, the loop gains two neg instructions and loses one sub, so it
is slower. It would be nice to keep the 6-register version around for
when 6 registers are available, or to rewrite the loop in C as you proposed.
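
To make the addressing trick easier to follow, here is a rough plain C
sketch of the two loop shapes. It is only an illustration, not the code in
dsputil_mmx.c: welch_weight(), window_two_counters() and
window_one_counter() are hypothetical names, the weight formula is an
assumed Welch-style parabola, and the indices are in elements rather than
bytes. Both functions only handle the mirrored pairs, so the middle sample
of an odd-length buffer is left unwritten.

```c
#include <stdint.h>

/* Hypothetical helper: parabolic (Welch-style) weight for sample k of len. */
static double welch_weight(int k, int len)
{
    double x = 2.0 * k / (len - 1.0) - 1.0;
    return 1.0 - x * x;
}

/* Two counters: i walks up to 0 for the lower half, j walks down for the
 * mirrored upper half. This corresponds to the 6-register loop shape. */
static void window_two_counters(const int32_t *data, int len, double *out)
{
    int n2 = len >> 1;
    const int32_t *lo_in = data + n2, *hi_in = data + len - 1 - n2;
    double *lo_out = out + n2, *hi_out = out + len - 1 - n2;
    long i, j;

    for (i = -n2, j = n2; i < 0; i++, j--) {
        double w = welch_weight(n2 + (int)i, len);
        lo_out[i] = lo_in[i] * w;
        hi_out[j] = hi_in[j] * w;
    }
}

/* One counter: since j == -i on every iteration, negate i to reach the
 * mirrored sample, then negate it back before the increment. This frees
 * one register at the cost of two extra negations per iteration. */
static void window_one_counter(const int32_t *data, int len, double *out)
{
    int n2 = len >> 1;
    const int32_t *lo_in = data + n2, *hi_in = data + len - 1 - n2;
    double *lo_out = out + n2, *hi_out = out + len - 1 - n2;
    long i;

    for (i = -n2; i < 0; i++) {
        double w = welch_weight(n2 + (int)i, len);
        lo_out[i] = lo_in[i] * w;
        i = -i;                     /* first negation: i now equals j */
        hi_out[i] = hi_in[i] * w;
        i = -i;                     /* second negation: restore i */
    }
}
```

Both functions write the same values. The second one just trades the
extra counter for the two negations, which is the same trade the patch
below makes in the asm loop.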


Index: libavcodec/i386/dsputil_mmx.c
--- libavcodec/i386/dsputil_mmx.c       (revision 10759)
+++ libavcodec/i386/dsputil_mmx.c       (working copy)
@@ -2967,7 +2967,6 @@
      double c = 2.0 / (len-1.0);
      int n2 = len>>1;
      long i = -n2*sizeof(int32_t);
-    long j =  n2*sizeof(int32_t);
      asm volatile(
          "movsd   %0,     %%xmm7 \n\t"
          "movapd  %1,     %%xmm6 \n\t"
@@ -2985,17 +2984,18 @@
          "movapd   %%xmm6,  %%xmm0   \n\t"\
          "subpd    %%xmm1,  %%xmm0   \n\t"\
          "pshufd   $0x4e,   %%xmm0, %%xmm1 \n\t"\
-        "cvtpi2pd (%4,%0), %%xmm2   \n\t"\
-        "cvtpi2pd (%5,%1), %%xmm3   \n\t"\
+        "cvtpi2pd (%3,%0), %%xmm2   \n\t"\
          "mulpd    %%xmm0,  %%xmm2   \n\t"\
+        "movapd   %%xmm2, (%1,%0,2) \n\t"\
+        "neg      %0                \n\t"\
+        "cvtpi2pd (%4,%0), %%xmm3   \n\t"\
          "mulpd    %%xmm1,  %%xmm3   \n\t"\
-        "movapd   %%xmm2, (%2,%0,2) \n\t"\
-        MOVPD"    %%xmm3, (%3,%1,2) \n\t"\
+        MOVPD"    %%xmm3, (%2,%0,2) \n\t"\
          "subpd    %%xmm5,  %%xmm7   \n\t"\
-        "sub      $8,      %1       \n\t"\
+        "neg      %0                \n\t"\
          "add      $8,      %0       \n\t"\
          "jl 1b                      \n\t"\
-        :"+&r"(i), "+&r"(j)\
+        :"+&r"(i)\
          :"r"(w_data+n2), "r"(w_data+len-2-n2),\
           "r"(data+n2), "r"(data+len-2-n2)\
