[FFmpeg-devel] [PATCH] update doc/optimization.txt
Tue Sep 21 16:14:23 CEST 2010
On Tue, Sep 21, 2010 at 03:08:51PM +0100, Måns Rullgård wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
> > On Tue, Sep 21, 2010 at 09:48:43AM -0400, Ronald S. Bultje wrote:
> >> Hi,
> >> On Tue, Sep 21, 2010 at 9:30 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> > On Tue, Sep 21, 2010 at 05:37:40AM -0700, Jason Garrett-Glaser wrote:
> >> >> > interesting strawman argument
> >> >> > no one was talking about cases that cannot easily be done in inline asm
> >> >> > not that calling from inline would be impossible or anything, but I surely
> >> >> > agree that for these 0.1% of asm yasm is likely the better choice
> >> >>
> >> >> You mean this 99%. It's only 0.1% because you don't think about
> >> >> optimizations that can't be done under your current system.
> >> >
> >> > please elaborate on what other optimizations are possible in yasm that cannot
> >> > be done in inline asm.
> >> The biggest one is that I can create a "double-width" version in SSE*
> >> (usually SSE2) and a "single-width" version in MMX* (usually MMX2) of
> >> a function (e.g. subpel MC, weighted prediction, intra prediction, or
> >> something) in a single go. I don't need to write the function twice.
> >> Optimizing one will optimize both. This is incredibly handy if you're
> >> writing new asm code.
> > that's just a source difference through macro/preprocessor use, not an
> > optimization that yasm can do that inline cannot.
> > And actually it's unlikely that this is optimal. SSE2 and later CPUs are
> > unlikely to have the same optimal instruction sequence that pre-SSE2 CPUs had,
> > so 2 functions do make sense.
> All SSE2 CPUs execute out of order,
I am not disputing that
> so the exact sequence doesn't matter.
I do dispute that.
Try to reorder instructions and benchmark them; you'd be surprised.
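
For readers following the "one source, two widths" point quoted above, here is a minimal C sketch of the idea. It is not the yasm macro code under discussion and uses no SIMD at all: the macro name DEFINE_AVG_ROWS, the function names, and the rounding-average operation are all hypothetical, chosen only to show a single function body instantiated for an 8-byte step (standing in for an MMX-width variant) and a 16-byte step (standing in for an SSE2-width variant).

```c
#include <stdint.h>

/*
 * Hypothetical template: rounding average of two pixel rows, processed
 * WIDTH bytes per outer step. Any tail shorter than WIDTH is left
 * untouched, as a fixed-width SIMD loop without a scalar tail would do.
 */
#define DEFINE_AVG_ROWS(name, WIDTH)                                      \
static void name(uint8_t *dst, const uint8_t *a, const uint8_t *b, int n) \
{                                                                         \
    for (int i = 0; i + (WIDTH) <= n; i += (WIDTH))                       \
        for (int j = 0; j < (WIDTH); j++)                                 \
            dst[i + j] = (uint8_t)((a[i + j] + b[i + j] + 1) >> 1);       \
}

/* One body, two instantiations: editing the macro changes both. */
DEFINE_AVG_ROWS(avg_rows_w8, 8)   /* stand-in for a "single-width" version */
DEFINE_AVG_ROWS(avg_rows_w16, 16) /* stand-in for a "double-width" version */
```

This only illustrates the source-sharing side of the argument. Whether the same instruction sequence is actually optimal at both widths is exactly what is disputed above, and only a benchmark on the target CPUs can settle it.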
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
It is dangerous to be right in matters on which the established authorities
are wrong. -- Voltaire