[FFmpeg-devel] [PATCH] update doc/optimization.txt

Ronald S. Bultje rsbultje
Tue Sep 21 02:03:29 CEST 2010

Hi Reimar,

On Mon, Sep 20, 2010 at 7:34 PM, Reimar Döffinger
<Reimar.Doeffinger at gmx.de> wrote:
> While the "obvious" failures may be gone, proper clobber lists
> for inline asm are still necessary and some parts like
> merging asm block would have been quite independent of how
> you write it.

I partly agree with this. I did try the clobber-list approach first, by
the way, but this broke FATE on various BSDs and possibly other
systems. We could argue that we don't need to mark clobbers on
Unix-like systems, but I believe that is just asking for disaster
to hit us later on. We need something that works on all systems.
That's why I chose yasm as a tool, rather than inline asm. But this
could be fixed in inline asm also, I agree, and many functions still
need to be fixed. (I'm willing to convert each of these to yasm,
slowly and with proper testing, but I don't feel that'll be much
appreciated.) If someone will do the work to properly mark these
clobbers in inline asm and get that patch into SVN, I'm
totally fine with that. But the work needs to be done by someone.
That's as far as marking clobbers is concerned.
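
For readers unfamiliar with clobber lists, here is a minimal sketch of
what "marking clobbers" means in GCC-style inline asm. The function
name double_via_scratch() is made up for illustration and is not taken
from FFmpeg; the asm path is only compiled on x86 with a GCC-compatible
compiler, and a plain C fallback is used elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper (not FFmpeg code): doubles a value while using
 * EAX as a scratch register. The third colon-separated section is the
 * clobber list: it tells the compiler that EAX and the condition codes
 * are overwritten, so it must not keep live values in them. */
static int32_t double_via_scratch(int32_t in)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int32_t out;
    __asm__ volatile(
        "movl %1, %%eax    \n\t"  /* copy the input into the scratch register */
        "addl %%eax, %%eax \n\t"  /* double it (this also modifies the flags) */
        "movl %%eax, %0    \n\t"  /* store the result in the output operand */
        : "=r"(out)               /* outputs */
        : "r"(in)                 /* inputs */
        : "eax", "cc");           /* clobbers: scratch register and flags */
    return out;
#else
    /* Portable fallback so the sketch builds on other targets. */
    return in * 2;
#endif
}
```

If "eax" were left out of the clobber list, the compiler would be free
to keep an unrelated live value in that register across the asm
statement, which is the kind of silent breakage discussed above.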

Now, as for merging blocks, I don't think merging asm blocks is as
easy as you make it sound. Many of the asm blocks that I've seen are
mixed with C, e.g. like this:

for (x=0;x<n;x++)
if (some_cond)

and so on. Converting these to a single asm block is about as much
effort in yasm as it is in inline asm. Why? It requires good testing,
especially performance testing, because gcc is quite unpredictable in
what it decides to unroll, and some of these loops are highly
speed-critical. Think e.g. of a loop where each iteration accesses a
lookup table with the counter as index: when the loop is unrolled, the
complete table can suddenly be "inlined" in the code (see
h264_idct_add8_sse2() for an example). (Note how I'm complimenting gcc
here, it actually did better than I did in my first try.)
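
The lookup-table point can be sketched in plain C. The table contents
and the function names below are invented for illustration and are not
the actual h264_idct_add8_sse2() code; the sketch only shows how
unrolling turns indexed table loads into constants.

```c
#include <stdint.h>

/* Hypothetical offset table, indexed by the loop counter. */
static const int block_offset[4] = { 0, 4, 32, 36 };

/* Rolled form: every iteration loads block_offset[i] from memory. */
static void add_rolled(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < 4; i++)
        dst[block_offset[i]] += src[i];
}

/* Unrolled form: the counter is a constant in each statement, so the
 * table entries become immediate offsets and the table lookup is gone.
 * This is the transformation a compiler may apply on its own. */
static void add_unrolled(uint8_t *dst, const uint8_t *src)
{
    dst[0]  += src[0];
    dst[4]  += src[1];
    dst[32] += src[2];
    dst[36] += src[3];
}
```

A hand-written asm loop that keeps the counter in a register has to do
the table load on every iteration, so it can end up slower than the
compiler's unrolled output unless it is unrolled by hand as well.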

However, we _have_ to convert the C loop to asm (either inline or
yasm), because otherwise it's not a single block. We thus _have_ to do
all this manual work: test performance before and after, and make sure
we didn't introduce a bug, get outsmarted by gcc, or re-roll an
otherwise unrolled loop. (And the cases where we were smarter than gcc
are a gain for us.) In the end, _this_ is where most of the time is
spent, and that's the same whether you do it in inline asm or in yasm.
Retyping a list of instructions from one syntax into another really
doesn't take that much time.

> I think Michaels concern is that the chances for future optimization
> (e.g. by inlining more MMX/SSE functions, which particularly
> for x86_64 often should not be so hard) may have been decreased
> quite a bit.

If a function is inlined, or we believe inlining is preferred over a
function pointer, then obviously inline asm is better, and we'll keep
it (e.g. all the arithcoder asm in cabac.h, or same for the VP8

For the functions that I converted, I don't believe any of them will
be inlined. In fact, I believe we'll use more function pointers
because of new instruction sets (e.g. Intel's new AVX). These we can't
inline if we want to keep supporting "older" x86-64 CPUs (which we
most likely do).
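
As a rough sketch of why new instruction sets push towards function
pointers: the implementation is picked once at runtime from the CPU's
capabilities, so the call site cannot have a specific version inlined.
All names below (CPU_FLAG_AVX, add_fn, select_add and the placeholder
AVX function) are hypothetical, and how the flags get detected is
deliberately left out.

```c
#include <stddef.h>

/* Hypothetical capability bit; a real project would fill this in from
 * its own CPU detection code. */
#define CPU_FLAG_AVX 0x1u

typedef void (*add_fn)(float *dst, const float *src, size_t n);

/* Baseline version that runs on every CPU. */
static void add_c(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Stand-in for a hand-written AVX version (e.g. assembled with yasm).
 * It simply forwards to the C code so the sketch runs anywhere. */
static void add_avx_placeholder(float *dst, const float *src, size_t n)
{
    add_c(dst, src, n);
}

/* Chosen once at init time; later calls go through the pointer, which
 * keeps the same binary working on CPUs without AVX. */
static add_fn select_add(unsigned cpu_flags)
{
    return (cpu_flags & CPU_FLAG_AVX) ? add_avx_placeholder : add_c;
}
```

Typical use would be something like add_fn add = select_add(flags);
followed by add(dst, src, n); in the hot path.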

> Please don't feel like I'm bashing you, I find my quest to
> support nasm at least as questionable, I'd just like
> you to get a chance to see my line of thought.

I understand the concern and share some of it.

(And I hope nasm will eventually be supported, yasm is basically just
nasm-ng or so.)

