[FFmpeg-devel] [PATCH] Fix for DECLARE_ALIGNED_8 macro on ARM (also prevent gcc 3.4.x ICE)

Sat Feb 16 14:49:13 CET 2008

Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> Hello All,
>
> Currently DECLARE_ALIGNED_8 macro is quite misleading on ARM (it tries to
> enforce 4 byte alignment contarary to what a developer reading the 
> ffmpeg sources could naturally expect):
>
> ...
> /* This is to use 4 bytes read to the IDCT pointers for some 'zero'
>    line optimizations */
> #define DECLARE_ALIGNED_8(t, v) DECLARE_ALIGNED(4, t, v)
> #define STRIDE_ALIGN 4
> ...
>
> And it really causes problems with gcc 3.4.x:
> http://lists.mplayerhq.hu/pipermail/ffmpeg-user/2005-August/000930.html
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22177
>
> The offending line is the following:
>     DECLARE_ALIGNED_8 (uint64_t, aligned_bak[stride]);
>
> Apparently gcc got confused when asked to align uint64_t array at 4 byte
> boundary and died with ICE. Earlier I just thought 'WTF?' and removed
> DECLARE_ALIGNED_8 around that array (gcc should be clever enough to align 
> that array properly without any help anyway). Now it is clear that
> DECLARE_ALIGNED_8 macro definition triggered this problem.
>
> Modern ARM cores have instructions for reading/writing 64-bit data
> and require strict 8 byte alignment for using them. According to ARM
> manual, using LDRD instruction with the data that is not aligned at
> 8 byte boundary is UNPREDICTABLE and IMPLEMENTATION
> DEFINED. Unfortunately, both ARM cores that I have available (ARM9E
> and ARM11) require just 4 byte alignment for these instructions
> (ARM11 is faster when used with 8 byte aligned data though). So if
> there were any other problems related to the use of this macro, I
> could not observe them in my tests.
>
> Anyway, I think it is better to fix this macro even though the
> immediate visible effect is only the "out of the box" support for
> legacy compiler (gcc 3.4.x).

I agree.  Since some ARM variants might require strict alignment for
64-bit loads, we must provide this.

> The comment in the code was removed by patch as I don't see direct
> relation of this macro and IDCT, 16 byte alignment is supposedly
> guaranteed for IDCT according to the comments from dsputil.h, anyway
> feel free to prove me wrong:
>     void (*idct)(DCTELEM *block/* align 16*/);
>
> Index: libavcodec/dsputil.h
> ===================================================================
> --- libavcodec/dsputil.h	(revision 12120)
> +++ libavcodec/dsputil.h	(working copy)
> @@ -535,10 +535,8 @@
>  
>  #elif defined(ARCH_ARMV4L)
>  
> -/* This is to use 4 bytes read to the IDCT pointers for some 'zero'
> -   line optimizations */
> -#define DECLARE_ALIGNED_8(t, v) DECLARE_ALIGNED(4, t, v)
> -#define STRIDE_ALIGN 4
> +#define DECLARE_ALIGNED_8(t, v) DECLARE_ALIGNED(8, t, v)
> +#define STRIDE_ALIGN 8

Looks good to me.

-- 
M?ns Rullg?rd
mans at mansr.com