[Ffmpeg-devel] Re: [Ffmpeg-cvslog] r8420 - trunk/libavcodec/dv.c

Trent Piepho xyzzy
Mon Mar 26 15:24:59 CEST 2007

On Mon, 26 Mar 2007, Rich Felker wrote:
> you find when it was added? (And before you flame I'm not just talking
> about 2.95 - I don't care about this for myself because I don't use
> SSE anyway - but about 3.x and early 4.x.)

It's new in gcc 4.2.

> >     keep a 4-byte aligned stack with modern codes that keep a 16-byte
> >     stack for SSE compatibility.  The alternate prologue and epilogue
> >     are slower and bigger than the regular ones, and the alternate
> >     prologue requires an extra scratch register; this lowers the number of
> >     registers available if used in conjunction with the "regparm"
> >     attribute.  The -mstackrealign option is incompatible with the
> Typical gcc propaganda-language documentation... In reality regparm is
> not used (totally nonstandard ABI) and thus the scratch register for
> prologue is irrelevant. I suspect the overhead is 2-3 opcodes, making
> it irrelevant for the large sorts of functions that need aligned stack
> variables. Or... does this option generate the (useless) prologue even
> in functions that don't want or need the alignment?

It's a little more complex than that.

First, regparm _is_ used.  It's used by the Linux kernel by default.  It's
also entirely possible to declare only certain internal functions regparm
with function attributes, so it can still be used even though the ABI is
non-standard.  regparm isn't an all or nothing proposition.

In fact, gcc can do this automatically for static functions.  It will
inline them or call them with non-standard calling conventions if it thinks
that will result in better code.
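
As a minimal sketch of declaring only certain internal functions regparm: the names FAST_CALL, sum3 and public_sum below are made up for illustration, and the macro falls back to nothing on targets other than 32-bit x86, where the attribute does not apply.

```c
#include <assert.h>

/* regparm is a GCC extension for 32-bit x86. On any other target this
 * sketch defines the macro away so the file still compiles (an
 * assumption of the example, not something gcc does for you). */
#if defined(__GNUC__) && defined(__i386__)
#define FAST_CALL __attribute__((regparm(3)))
#else
#define FAST_CALL
#endif

/* Hypothetical internal helper: with regparm(3) the first three integer
 * arguments are passed in eax, edx and ecx instead of on the stack. */
static FAST_CALL int sum3(int a, int b, int c)
{
    return a + b + c;
}

/* Hypothetical public entry point: it keeps the default calling
 * convention, so external callers never see the non-standard ABI. */
int public_sum(int a, int b, int c)
{
    return sum3(a, b, c);
}
```

Only sum3 uses the non-standard convention; anything linking against public_sum is unaffected.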

Secondly, using the stack re-alignment will typically cost an extra
register for the entire function, not just the prologue.

Normally, both the function's arguments and local variables can be
addressed relative to ebp (or esp when using omit-frame-pointer).  When the
stack is re-aligned, the function arguments are before the alignment while
the locals are after the alignment.  They can't be addressed with the same
register; it takes two.
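
To make the arithmetic concrete, here is a small model of a realigning prologue. It is only a sketch: the function names, the 32-byte frame size and the stack addresses are invented for the example, and it models the pointer math rather than anything gcc actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Round a stack pointer down to a 16-byte boundary, as masking the
 * low four bits of esp would. */
uint32_t realign16(uint32_t sp)
{
    return sp & ~(uint32_t)15;
}

/* Distance from the base of the realigned locals up to the first stack
 * argument, given the value of esp at function entry. */
uint32_t arg_offset_from_locals(uint32_t entry_sp, uint32_t frame_size)
{
    /* At entry esp points at the return address; the first argument
     * sits 4 bytes above it. */
    uint32_t arg_addr = entry_sp + 4;

    /* Locals are allocated below the entry esp, then rounded down. */
    uint32_t locals = realign16(entry_sp - frame_size);

    return arg_addr - locals;
}
```

With a 32-byte frame, entry values of esp that differ only in their 4-byte alignment (0xbffff000, 0xbffff004, 0xbffff008, 0xbffff00c) give offsets of 36, 40, 44 and 48. The distance between arguments and locals is not a compile-time constant, which is why one base register cannot reach both.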

> > If you compile with the above option then I assume the code will work
> > with any calling stack alignment but possibly with a noticeable
> > performance penalty.
> Feel free to post benchmarks. I suspect the difference will be
> impossible to measure unless gcc is stupid like I speculated in my
> last sentence above..

The purpose of the patch to gcc was to allow library or kernel entry points
to be compiled with stack re-alignment.  gcc already had the ability to
re-align main(); the patch just added this to other functions.  It does not
automatically raise the stack alignment only where it is needed, i.e. when a
function has stack variables that need greater than default alignment.
That would be a nice feature, but it's not something gcc can do.  There is
a phrase I've heard on this list, "patches welcome," that is apropos here.
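
For a single entry point, the per-function counterpart of -mstackrealign is the force_align_arg_pointer attribute. A rough sketch, assuming a GCC-compatible compiler; the macro and function names are hypothetical, and the macro is defined away on non-x86 targets:

```c
#include <assert.h>
#include <stdint.h>

/* force_align_arg_pointer is an x86 function attribute. Other targets
 * get an empty macro (an assumption of this sketch). */
#if defined(__i386__) || defined(__x86_64__)
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Hypothetical library entry point that may be called with a stack that
 * is only 4-byte aligned. The attribute requests the realigning
 * prologue for this one function. Returns 1 if the local buffer ended
 * up on a 16-byte boundary. */
REALIGN_STACK int entry_point_buffer_is_aligned(void)
{
    float buf[4] __attribute__((aligned(16))) = { 1.0f, 2.0f, 3.0f, 4.0f };

    return ((uintptr_t)buf % 16) == 0;
}
```

The cost described above (the alternate prologue and the extra register) is then paid only by the functions marked this way, not by everything in the file.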
