[Ffmpeg-devel] [MacIntel] Testers welcome!

Mon Aug 14 16:06:57 CEST 2006

On Mon, 14 Aug 2006, Marco Manfredini wrote:
> On Sunday 13 August 2006 04:46, John Dalgliesh wrote:
>> If you want to continue the investigation instead, then please be my
>> guest! :)
>
> Hi John,
>
> It looks like the compiler has a personal feud with 'i' in
> ff_snow_horizontal_compose97i_sse and tries to kill it as best as he can.

Well in general if the compiler wants to optimise it out I would say 
that's a good thing...

> Getting i in again is not so easy:
>
> For example, none of these work
[snip]
> These work:
[snip]
> my copy of ffmpeg passes the tests after this modification.

OK I think you are missing a couple of steps of explanation here. Why do 
you care about that variable? Or are you implying that because it has 
reclaimed the stack space originally intended for 'i', that's why the 
var-len array isn't 16-byte aligned - despite the optimiser's assumption? 
If so, then please say so explicitly; it's not at all clear to me that this 
is the reasoning you're following.

> Potentially bad is, that both solutions force 'i' into memory after
> declaration.

Yes, if it affects the optimisation of the routine, I don't think it'll be 
acceptable.

My main question is: Did my workaround work for you? The code obviously 
already does not expect temp_buf to be aligned, that's why it calculates 
temp at some offset into temp_buf. I think if the reason for the problem 
is found and addressed, then a simple patch like that workaround would be 
accepted.
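
For reference, here is a minimal sketch of the two ways of computing temp, 
pulled out of the routine so they can be compared side by side. The names 
elem_t, align_old, align_new and aligned_in_range are made up for this 
illustration, elem_t is assumed to be a 4-byte stand-in for DWTELEM, and the 
casts use uintptr_t rather than the (int) cast in the original line. On paper 
the two expressions give the same pointer whenever the buffer is 4-byte 
aligned, so any difference in behaviour would come from what the compiler 
assumes about temp_buf, not from the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for DWTELEM; assumed to be a 4-byte integer here. */
typedef int32_t elem_t;

/* Original expression: skip 4 elements, then step back by the number of
 * elements the buffer start lies past a 16-byte boundary. */
static elem_t *align_old(elem_t *buf)
{
    return buf + 4 - (((uintptr_t)buf & 0xF) >> 2);
}

/* Workaround: take the address of element 4 and round it down to a
 * multiple of 16. */
static elem_t *align_new(elem_t *buf)
{
    return (elem_t *)((uintptr_t)&buf[4] & ~(uintptr_t)0xF);
}

/* Returns 1 if p is 16-byte aligned and points somewhere in buf[1..4],
 * i.e. still inside the 4 elements of padding. */
static int aligned_in_range(elem_t *buf, elem_t *p)
{
    uintptr_t lo = (uintptr_t)(buf + 1);
    uintptr_t hi = (uintptr_t)(buf + 4);
    uintptr_t a  = (uintptr_t)p;

    assert(((uintptr_t)buf & 0x3) == 0); /* precondition: 4-byte aligned */
    return (a & 0xF) == 0 && a >= lo && a <= hi;
}
```

Calling both functions on the same 4-byte-aligned buffer should return equal 
pointers, which is why the workaround is only interesting if the compiler is 
folding away the "& 0xF" in the original form.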

The approach should be to either reproduce the behaviour in a simpler test 
case, and figure out which compilers it breaks on, or find the gcc bug / 
patch / changelog entry where it is fixed. It doesn't happen for me with 
gcc4.0.3 on linux ... but there are too many variations there to say that 
it has been fixed by 4.0.3.
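
As a starting point for such a test case, something along these lines might 
do. This is only a sketch: the function name vla_misalignment is made up, the 
array uses int where the real code uses DWTELEM, and there is no guarantee 
that this small function triggers the same behaviour as the real routine. It 
just reports how far the start of a variable-length array, sized like 
temp_buf, lies past a 16-byte boundary.

```c
#include <stdint.h>

/* Hypothetical probe: returns the offset (0..15) of a variable-length
 * array's start from the previous 16-byte boundary. */
unsigned vla_misalignment(int width)
{
    /* Same size expression as temp_buf in the patch context. */
    volatile int buf[(width >> 1) + 4];

    buf[0] = width; /* touch the array so it is not optimised away */
    return (unsigned)((uintptr_t)&buf[0] & 0xF);
}
```

Building this with each compiler and optimisation level of interest and 
printing the result for a few widths would show whether the array really 
ends up 16-byte aligned on that setup.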

I've attached my workaround as a patch to this email so you don't have to 
go hunting for it.

> Marco

{P^/
-------------- next part --------------
Index: snowdsp_mmx.c
===================================================================

--- snowdsp_mmx.c	(revision 5992)
+++ snowdsp_mmx.c	(working copy)
@@ -25,7 +25,7 @@
     const int w2= (width+1)>>1;
     // SSE2 code runs faster with pointers aligned on a 32-byte boundary.
     DWTELEM temp_buf[(width>>1) + 4];
-    DWTELEM * const temp = temp_buf + 4 - (((int)temp_buf & 0xF) >> 2);
+    DWTELEM * const temp = (DWTELEM*)(((intptr_t)&temp_buf[4])&~0xF);
     const int w_l= (width>>1);
     const int w_r= w2 - 1;
     int i;