[Ffmpeg-devel] Re: [Ffmpeg-cvslog] r8420 - trunk/libavcodec/dv.c

Rich Felker dalias
Mon Mar 26 07:10:16 CEST 2007

On Mon, Mar 26, 2007 at 07:33:01AM +0300, Uoti Urpala wrote:
> On Sun, 2007-03-25 at 20:57 -0700, Roman Shaposhnik wrote:
> >   Such files, for example, would be generated by gcc when you do -Os.
> gcc did that with -Os earlier, but I think that is considered to have
> been a bug in -Os. -Os does not change -mpreferred-stack-boundary in
> 4.1.2, and probably will not change it in future versions either.

And how do you have any guarantee about what version of gcc the code
calling libavcodec was compiled with? You don't. Even if you did,
-mpreferred-stack-boundary 4 would be in my CFLAGS for most code (or
more likely, my gcc spec file).
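
Here is a minimal C sketch of the hazard. It assumes GCC or Clang, because it relies on GNU `__attribute__` and an empty `asm` statement, and the function name is hypothetical. The function reports how far a stack buffer declared with 16-byte alignment actually sits from a 16-byte boundary. On 32-bit x86 without stack realignment, gcc trusts the incoming stack pointer to carry the ABI's alignment, so a caller built with a smaller boundary could make the result nonzero. A correctly aligned caller should always see 0.

```c
#include <stdint.h>

/* Hypothetical helper: returns the low 4 bits of the address of a
 * stack buffer declared 16-byte aligned. Nonzero would mean the caller
 * broke the 16-byte assumption and nothing realigned the stack. */
unsigned aligned_local_misalignment(void)
{
    float buf[4] __attribute__((aligned(16)));
    uintptr_t addr = (uintptr_t)buf;

    /* Empty asm hides addr from the optimizer, which could otherwise
     * fold the check to 0 from the declared alignment alone. */
    __asm__("" : "+r"(addr));

    return (unsigned)(addr & 15u);
}
```

The empty `asm` matters. Without it, the compiler may fold the check to 0 because it already "knows" the buffer is aligned. That same trust is what breaks aligned SSE loads and stores when the caller misbehaves.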

> -mstackrealign
>     Realign the stack at entry.  On the Intel x86, the -mstackrealign
>     option will generate an alternate prologue and epilogue that
>     realigns the runtime stack.  This supports mixing legacy codes that

Thanks. This is the first constructive post you've made in the whole
thread. Sadly, I doubt this works on all supported gcc versions. Can
you find when it was added? (And before you flame, I'm not just talking
about 2.95 - I don't care about this for myself because I don't use
SSE anyway - but about 3.x and early 4.x.)

>     keep a 4-byte aligned stack with modern codes that keep a 16-byte
>     stack for SSE compatibility.  The alternate prologue and epilogue
>     are slower and bigger than the regular ones, and the alternate
>     prologue requires an extra scratch register; this lowers the number of
>     registers available if used in conjunction with the "regparm"
>     attribute.  The -mstackrealign option is incompatible with the

Typical gcc propaganda-language documentation... In reality regparm is
not used (totally nonstandard ABI) and thus the scratch register for
prologue is irrelevant. I suspect the overhead is 2-3 opcodes, making
it irrelevant for the large sorts of functions that need aligned stack
variables. Or... does this option generate the (useless) prologue even
in functions that don't want or need the alignment?

> If you compile with the above option then I assume the code will work
> with any calling stack alignment but possibly with a noticeable
> performance penalty.

Feel free to post benchmarks. I suspect the difference will be
impossible to measure unless gcc is stupid like I speculated in my
last sentence above.

> Setting the corresponding function attribute only
> for functions that can be entry points from an external application
> would be more work but should eliminate most of the performance penalty.

This is a hideous hideous hack and should not be needed if
-mstackrealign is implemented correctly. Whether it is, I don't know.
Just putting the attribute on functions that use aligned stack
variables, on the other hand, would be fine. But putting it on all
externally-callable functions is a most disgusting case of nasty hacks
polluting the entire codebase.
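
The two placements debated here can be sketched in C, assuming GCC or Clang on x86. `REALIGN_STACK`, `sum_block`, and `hypothetical_public_decode` are illustrative names, not libavcodec code. The attribute is applied only on 32-bit x86, because the x86-64 ABI already keeps the stack 16-byte aligned at calls.

```c
/* Hypothetical macro: request a realigning prologue on 32-bit x86 only;
 * expands to nothing elsewhere. */
#if defined(__GNUC__) && defined(__i386__)
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Option B: only the function that owns an aligned stack buffer gets
 * the realigning prologue, so its callers may run on any alignment. */
static REALIGN_STACK int sum_block(const short *in, int n)
{
    /* SIMD code would access this with aligned loads such as movdqa. */
    short tmp[8] __attribute__((aligned(16)));
    int sum = 0;

    for (int i = 0; i < 8; i++) {
        tmp[i] = (i < n) ? in[i] : 0;   /* zero-pad past n */
        sum += tmp[i];
    }
    return sum;
}

/* Option A, for comparison: the attribute would go on every exported
 * entry point like this wrapper instead, and sum_block would carry none. */
int hypothetical_public_decode(const short *in, int n)
{
    return sum_block(in, n);
}
```

With option B, only `sum_block` pays for the realigning prologue, and only when it is called. Option A would move the attribute onto every exported function, which is the codebase-wide change objected to above.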

