[FFmpeg-devel] r9017 breaks WMA decoding on Intel Macs

Trent Piepho xyzzy
Fri Jun 1 05:55:01 CEST 2007

On Fri, 1 Jun 2007, Michael Niedermayer wrote:
> On Thu, May 31, 2007 at 05:30:16PM -0700, Trent Piepho wrote:
> [...]
> > > so i think you will agree with me that gcc does not gain any knowledge
> > > of the read/written memory locatons by introducing extra operands in this
> > > case ...
> > >
> > > and in the case that you agree we are back to the original:
> > > current code (breaks old buggy apple assembler) vs. more operands which might
> >
> > And uses incorrect syntax.
> maybe, its not overly relevant as we will likely not keep that code like
> it is, but still iam curious, could you quote the part of the manual which
> says its incorrect?

	"Infix operators" take two arguments, one on either side.

> > > break with old gcc vs. writing the loop in asm which will be faster,
> > > more compatible and wont depend on the moon phases when gcc was
> > > released
> >
> > I think you'll find writing that loop in asm to not be so easy.
> > You'll
> > need to have a lot of operands to the asm block,
> no, the 4 arguments passed to the function are sufficient
> iam not saying it should be implemented like that iam just proofing
> you wrong ...

Unless you want to hard code the structure offsets, and assume they don't
change with different compiler options, you'll need to add some "i"
arguments for them.  If you want to allocate stack space, you'll need to
make sure that none of the operands uses esp, or you can't access the
operand while the stack has been moved.  And allocating pointers on the
stack in an asm block in a way that works on both ia32 and x86-64 will be
hard to say the least.

> > And since it "might" break with old gcc it's impossible to do it.
> no even if the above false claims where true this is not true
> it would have to be tested like any other solution and if it works
> there would be no problem, or are you arguing that the solution we
> choose should not be tested?
> anyway i dont understand why every thread in which you participate
> degenerates into such trolling
> may i suggest that you reread your replies and think about the
> correctness of your statements before hitting the send button?

I think it's you who must disagree with anything I say, because I'm the one
who said it.  If the existing code just used two asm arguments when it
accessed two different elements of an array, and I posted a patch to get
rid of one and use a "4+%0" displacement, you would argue the exact
opposite point.

In fact, I'll do just that, it wasn't hard to find code that does it "my"
way and change it to "your" way.  It will be interesting to see the mental
gymnastics you use to prove I'm wrong both times.

Index: i386/dsputil_mmx.c
--- i386/dsputil_mmx.c	(revision 9165)
+++ i386/dsputil_mmx.c	(working copy)
@@ -2717,31 +2717,31 @@
                 "pmullw %%mm5, %%mm2 \n\t" // (s-dx)*dy
                 "pmullw %%mm4, %%mm1 \n\t" // dx*(s-dy)

-                "movd   %4,    %%mm5 \n\t"
-                "movd   %3,    %%mm4 \n\t"
+                "movd 1+%2,    %%mm5 \n\t"
+                "movd   %2,    %%mm4 \n\t"
                 "punpcklbw %%mm7, %%mm5 \n\t"
                 "punpcklbw %%mm7, %%mm4 \n\t"
                 "pmullw %%mm5, %%mm3 \n\t" // src[1,1] * dx*dy
                 "pmullw %%mm4, %%mm2 \n\t" // src[0,1] * (s-dx)*dy

-                "movd   %2,    %%mm5 \n\t"
+                "movd 1+%1,    %%mm5 \n\t"
                 "movd   %1,    %%mm4 \n\t"
                 "punpcklbw %%mm7, %%mm5 \n\t"
                 "punpcklbw %%mm7, %%mm4 \n\t"
                 "pmullw %%mm5, %%mm1 \n\t" // src[1,0] * dx*(s-dy)
                 "pmullw %%mm4, %%mm0 \n\t" // src[0,0] * (s-dx)*(s-dy)
-                "paddw  %5,    %%mm1 \n\t"
+                "paddw  %3,    %%mm1 \n\t"
                 "paddw  %%mm3, %%mm2 \n\t"
                 "paddw  %%mm1, %%mm0 \n\t"
                 "paddw  %%mm2, %%mm0 \n\t"

-                "psrlw    %6,    %%mm0 \n\t"
+                "psrlw    %4,    %%mm0 \n\t"
                 "packuswb %%mm0, %%mm0 \n\t"
                 "movd     %%mm0, %0    \n\t"

                 : "=m"(dst[x+y*stride])
-                : "m"(src[0]), "m"(src[1]),
-                  "m"(src[stride]), "m"(src[stride+1]),
+                : "m"(src[0]),
+                  "m"(src[stride]),
                   "m"(*r4), "m"(shift2)
             src += stride;
@@ -3139,11 +3139,11 @@
     for(i=0; i<len; i+=4) {
         asm volatile(
             "pf2id       %1, %%mm0 \n\t"
-            "pf2id       %2, %%mm1 \n\t"
+            "pf2id     8+%1, %%mm1 \n\t"
             "packssdw %%mm1, %%mm0 \n\t"
             "movq     %%mm0, %0    \n\t"
-            :"m"(src[i]), "m"(src[i+2])
+            :"m"(src[i])
     asm volatile("femms");
@@ -3153,11 +3153,11 @@
     for(i=0; i<len; i+=4) {
         asm volatile(
             "cvtps2pi    %1, %%mm0 \n\t"
-            "cvtps2pi    %2, %%mm1 \n\t"
+            "cvtps2pi  8+%1, %%mm1 \n\t"
             "packssdw %%mm1, %%mm0 \n\t"
             "movq     %%mm0, %0    \n\t"
-            :"m"(src[i]), "m"(src[i+2])
+            :"m"(src[i])
     asm volatile("emms");

More information about the ffmpeg-devel mailing list