[Ffmpeg-devel] [RFC] svq1 very slow encoding

Sat Mar 31 11:06:57 CEST 2007

On Sat, 31 Mar 2007, Loren Merritt wrote:
> On Fri, 30 Mar 2007, Trent Piepho wrote:
> > On Thu, 29 Mar 2007, Loren Merritt wrote:
> >
> > static int ssd_int8_vs_int16_mmx(int8_t *pix1, int16_t *pix2, int size){
> > +    int sum;
> > +    long i=size;
> > +    asm volatile(
> > ...
> > +        "movd %%mm4, %1 \n"
> > +        :"+r"(i), "=r"(sum)
> > +        :"r"(pix1), "r"(pix2)
> > +    );
> > +    return sum;
> >
> > Shouldn't that be "+&r"(i)?
>
> Maybe. I've never used earlyclobber, so if it is needed then there's a
> whole bunch of theoretically incorrect asm in lavc.

In theory it's needed here.  In practice, with this exact code, i is also an
input, and gcc can't assume that i holds the same value as pix1 or pix2, so
it won't be able to choose the same register.  But if you changed it just a
bit:

+    int sum;
+    long i=size;
+    asm volatile(
+        "sub $8, %0 \n"
...
+        "movd %%mm4, %1 \n"
+        "test $0x07, %4 \n"   // anything that uses %4
+        :"+r"(i), "=r"(sum)
+        :"r"(pix1), "r"(pix2), "r"(size)
+    );

That will miscompile: i is initialized from size, so gcc knows %0 and %4
start with the same value and will almost certainly choose the same register
for both.
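
A minimal sketch of the fix, assuming GCC or Clang extended asm.  The
function split8 and its parameters are made up for illustration and are not
part of the patch; the plain C branch only keeps the sketch buildable on
non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the patch: stores size - 8 in *rest and
 * returns size & 7.  The asm modifies %0 before it reads %2, which is the
 * situation earlyclobber exists for. */
int split8(int size, long *rest)
{
    long i = size;   /* same value as the "size" input on entry */
    int low;

    assert(rest != NULL);
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile(
        "sub $8, %0 \n\t"   /* %0 is written first ...               */
        "mov %2, %1 \n\t"   /* ... and %2 is read only afterwards    */
        "and $7, %1 \n\t"
        : "+&r"(i),         /* '&' keeps %0 out of any input register */
          "=r"(low)         /* written only once the last input is read */
        : "r"(size)
        : "cc"
    );
#else
    /* Portable fallback with the same result. */
    i -= 8;
    low = size & 7;
#endif
    *rest = i;
    return low;
}
```

With "+r"(i) instead of "+&r"(i), gcc would be free to put i and size in one
register, and the mov would then read size - 8 rather than size.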

> > On x86-64, could "int sum" be put in a 64-bit register?  Which would
> > generate something like "movd %mm4, %rax".  Don't have a 64-bit system, but
> > can you use movd with a 64-bit general purpose register?  If you can, isn't
> > it still wrong, since %rax will have garbage in the top 32 bits?
>
> int is 32bit, and the register name generated by a bare %1 is the
> same size as the value or variable it's associated with.

gcc will use 32-bit registers for 16 or 8 bit variables.

> All 32bit ops zero out the high bits of the destination. And even if they
> didn't (e.g. if sum was 16bit), gcc will add any necessary extension.

So movd %mm0, %rax will zero the high bits?  Because movd %mm0, %mm1 won't.
Or do you mean that movd %mm0, %eax will zero the high bits of %rax?
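
For the second reading, a small sketch, assuming GCC or Clang on x86-64,
where a write to a 32-bit general purpose register zero-extends into the
full 64-bit register.  It uses a plain movl rather than movd so that no MMX
state is involved; write_low32 and check_zero_extension are hypothetical
names, and %k0 is the operand modifier that prints the 32-bit name of the
register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fill a 64-bit register with ones, then write only
 * its low 32 bits and return the whole register. */
uint64_t write_low32(uint32_t v)
{
    uint64_t reg = UINT64_C(0xFFFFFFFFFFFFFFFF);
#if defined(__x86_64__)
    /* %k0 names the 32-bit half of operand 0, e.g. %eax for %rax. */
    __asm__("movl %1, %k0" : "+r"(reg) : "r"(v));
#else
    /* Fallback for other targets: model the zero extension in C. */
    reg = v;
#endif
    return reg;
}

/* Usage: the high 32 bits come back as zero, not as the old ones. */
void check_zero_extension(void)
{
    assert(write_low32(UINT32_C(0xDEADBEEF)) == UINT64_C(0x00000000DEADBEEF));
}
```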

I know with 16-bit outputs, gcc always insists on zeroing the top 16 bits
after the asm block.  I don't remember a good way to avoid this when you
know the result is already zeroed.