[FFmpeg-devel] [PATCH] Fix ff_imdct_calc_sse() on gcc-4.6.

Sun Jan 30 11:23:21 CET 2011

On Jan 30, 2011, at 5:13 AM, Reimar Döffinger wrote:

> On Sun, Jan 30, 2011 at 01:22:05AM -0800, Alex Converse wrote:
>> On Sun, Jan 30, 2011 at 1:08 AM, Alex Converse <alex.converse at gmail.com> wrote:
>>> 
>>> Gcc 4.6 only preserves the first value when using a vector with an "m"
>>> constraint.
>>> ---
>>>  libavcodec/x86/fft_sse.c |    4 ++--
>>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>> 
>>> 
>> 
>> oops this generates an extra indirection. Those of you who like to
>> defend inline asm, please step up and make some suggestions.
> 
> Use what would be your _only_ option (conceptually, not in implementation
> of course) if you didn't use inline asm:
> MANGLE and change DECLARE_ALIGNED to DECLARE_ASM_CONST
> There's also the option of a gcc bug report, I have some doubts that is
> a valid optimization (though there are constraints to make gcc load
> directly into a xmm register, but both that and the current code have
> needlessly unpredictable performance).

It is a valid optimization. (*m1m1m1m1) has type int and accesses the first element of the array only, so the rest is unused and can be removed. Adding __attribute__((used)) will not stop gcc from trying to do this.
The proper fix is to declare it as an SSE vector, since it is one, either by using the intrinsic headers or direct gcc-isms:

typedef int i4 __attribute__((vector_size(16)));
i4 m1m1m1m1 = (i4){ 1 << 31, 1 << 31, 1 << 31, 1 << 31 };

Using MANGLE will of course fix it too.
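
For illustration, here is a minimal sketch of how a vector-typed constant can be passed to inline asm through an "m" constraint. It is not the actual fft_sse.c code. The helper name negate4 and the plain C fallback are made up for this example, and INT32_MIN stands in for 1 << 31 (same bit pattern). Because the operand's type covers all 16 bytes, the compiler has to assume the asm reads the whole vector:

```c
#include <stdint.h>
#include <string.h>

/* 16-byte vector of four ints (GCC/Clang vector extension). */
typedef int i4 __attribute__((vector_size(16)));

/* Sign-bit mask for four packed floats. The vector type makes the
 * object 16 bytes wide and 16-byte aligned, which xorps requires for
 * a memory operand. */
static const i4 m1m1m1m1 = { INT32_MIN, INT32_MIN, INT32_MIN, INT32_MIN };

/* Hypothetical helper: dst[i] = -src[i] for i = 0..3. */
static void negate4(float *dst, const float *src)
{
#if defined(__GNUC__) && defined(__SSE__)
    __asm__ volatile(
        "movups %1, %%xmm0 \n\t"   /* unaligned load of four floats    */
        "xorps  %2, %%xmm0 \n\t"   /* flip the sign bit of each lane   */
        "movups %%xmm0, %0 \n\t"   /* unaligned store of the result    */
        : "=m"(*(float (*)[4])dst)
        : "m"(*(const float (*)[4])src),
          "m"(m1m1m1m1)            /* operand type spans all 16 bytes  */
        : "xmm0");
#else
    /* Portable fallback for targets without SSE: flip the sign bit
     * of each float through its integer representation. */
    for (int i = 0; i < 4; i++) {
        uint32_t bits;
        memcpy(&bits, &src[i], sizeof bits);
        bits ^= UINT32_C(1) << 31;
        memcpy(&dst[i], &bits, sizeof bits);
    }
#endif
}
```

Calling negate4(out, in) with in = { 1, -2, 3.5, -4.25 } should leave { -1, 2, -3.5, 4.25 } in out. The src and dst operands are cast to arrays of four floats for the same reason: a plain *src would only describe the first element.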

BTW, I noticed the bgr24ToUV_mmx template in swscale has the same bug with its use of ff_bgr24toUV.