[FFmpeg-devel] [PATCH] ff_scalarproduct_float_sse

Michael Niedermayer michaelni
Wed Jan 20 22:53:20 CET 2010

On Wed, Jan 20, 2010 at 09:31:14PM +0000, Måns Rullgård wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
> > On Wed, Jan 20, 2010 at 02:48:57PM +0000, Måns Rullgård wrote:
> >> Michael Niedermayer <michaelni at gmx.at> writes:
> >> 
> >> > On Tue, Jan 19, 2010 at 11:42:40PM -0500, Alex Converse wrote:
> >> >> This causes a >50% decrease in SBR decode time.
> >> >> 
> >> >> For the time being it can help in the other places where
> >> >> scalarproduct_float() is used.
> >> >> 
> >> >> Regards,
> >> >> Alex Converse
> >> >
> >> >>  dsputil_mmx.c    |    5 +++++
> >> >>  dsputil_yasm.asm |   25 +++++++++++++++++++++++++
> >> >
> >> > Would you mind avoiding yasm and using gcc asm instead?
> >> >
> >> > I have no problem with yasm as such but gcc asm is more portable and
> >> > can be integrated with C code if we ever want that.
> >> 
> >> I have to disagree.  Just look at how many FATE targets broke with
> >> your change to h264_loop_filter_strength_mmx2 yesterday.  Several
> >> compilers are still failing to build it.
> >
> > What we had is called a syntax error; yasm won't do any better
> > if you make such errors, though yasm would fail more consistently, I guess.
> There was no syntax error.  A syntax error would have had gcc say
> "syntax error", which it didn't.  In fact, it compiled just fine on
> x86_64, only failing mysteriously on x86_32.  David then fixed it with
> gcc, leaving only icc and suncc failing.

8+1*(%blah) is a syntax error
so is
Some versions of gas support it, and depending on luck you might end up
with 8+1*%m being substituted to 8+1*123(%blah), which isn't a syntax
error. Still, David's code was not correct; this has nothing to do with gcc
inline asm.
Break the rules for yasm and it fails as well.
That said, it's gas, not gcc, for which it is a syntax error.

> > What we had before was too many complex memory operands; yasm does not
> > support that in the first place.
> Eh what?  Yasm is an assembler.  You do your own register allocation
> there.  That is why it is superior, among other reasons.

Nothing stops you from allocating your registers in gcc yourself either.
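
As a minimal sketch of that point (this is not the code from the patch), gcc
inline asm can name its SSE registers by hand and declare them as clobbers,
leaving only the pointers and the loop offset to the compiler. The function
names, the unaligned movups loads, and the requirement that len be a positive
multiple of 4 are all assumptions made for illustration:

```c
#include <assert.h>

/* Plain C reference: sum of v1[i] * v2[i]. */
static float scalarproduct_float_c(const float *v1, const float *v2, int len)
{
    float sum = 0.0f;
    for (int i = 0; i < len; i++)
        sum += v1[i] * v2[i];
    return sum;
}

#if defined(__GNUC__) && defined(__x86_64__)
/* Hypothetical inline-asm variant. xmm0, xmm1 and xmm2 are picked by hand
 * and listed as clobbers; gcc only allocates the general-purpose registers
 * that hold the two end pointers, the offset and the output pointer. */
static float scalarproduct_float_inline(const float *v1, const float *v2, int len)
{
    float out[4];
    long long off = -(long long)len * 4; /* negative byte offset, counts up to 0 */

    assert(len > 0 && len % 4 == 0);     /* assumption of this sketch */

    __asm__ volatile(
        "xorps   %%xmm0, %%xmm0          \n\t" /* four partial sums = 0  */
        "1:                              \n\t"
        "movups  (%[a],%[off]), %%xmm1   \n\t" /* load 4 floats from v1  */
        "movups  (%[b],%[off]), %%xmm2   \n\t" /* load 4 floats from v2  */
        "mulps   %%xmm2, %%xmm1          \n\t"
        "addps   %%xmm1, %%xmm0          \n\t"
        "add     $16, %[off]             \n\t"
        "js      1b                      \n\t" /* loop while off < 0     */
        "movups  %%xmm0, (%[out])        \n\t" /* store the partial sums */
        : [off] "+r"(off)
        : [a] "r"(v1 + len), [b] "r"(v2 + len), [out] "r"(out)
        : "xmm0", "xmm1", "xmm2", "memory", "cc");

    /* The horizontal add is done in C to keep the asm short. */
    return (out[0] + out[1]) + (out[2] + out[3]);
}
#else
/* Other compilers and targets fall back to the C reference. */
static float scalarproduct_float_inline(const float *v1, const float *v2, int len)
{
    return scalarproduct_float_c(v1, v2, len);
}
#endif
```

Because the partial sums are accumulated per lane, the result can differ from
the C loop in the last bits for general input; for small integer-valued data
both give the same answer.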

> > Summary: h264_loop_filter_strength_mmx2() is poorly implemented, having
> > loops in C mixed with asm that expects the compiler to figure out how
> > to address complex pointers + - several indexes. That's not how gcc asm
> > should be written IMHO. I don't think I wrote the original function; I just
> > fixed a bug in it related to B-frames. Ideally one should rewrite it with
> > all the loops integrated into the asm; this would likely also make it
> > faster and closer to how it would look in yasm.
> So you want to use gcc as an assembler with the world's ugliest
> syntax?

No, I want two things mainly:
1. I want to be able to avoid the call overhead for some functions
that are called often (Jason's idea of interleaving block decode with IDCT is
an example).
2. I don't want to maintain some of the more convolutedly optimized pieces of
yasm code. Yasm has powerful macro support, and I think people overuse it,
rendering code hard to understand for someone who is not the author.

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Good people do not need laws to tell them to act responsibly, while bad
people will find a way around the laws. -- Plato