[FFmpeg-devel] h264 speed regression after PAFF

Michael Niedermayer michaelni
Sat Oct 13 03:08:12 CEST 2007


Hi

On Sat, Oct 13, 2007 at 01:57:44AM +0300, Uoti Urpala wrote:
> On Fri, 2007-10-12 at 08:51 -0400, Jeff Downs wrote:
> > On Fri, 12 Oct 2007, Andreas Öman wrote:
> > > After the PAFF code got in h264 is about 3% slower for non-interlaced
> > > content. See test below.
> 
> You need to be careful when benchmarking though. I've seen code changes
> in other translation units which did not involve any code running during
> the benchmark cause 3% speed changes in h264 decoding. I haven't found
> out the cause but I guess the most likely reason is some cache alignment
> effect. So just because a change makes the benchmark run 3% slower or
> faster does not necessarily mean that real h264 decoding performance
> changed either way.
> 
> > If anyone doesn't need interlace support and wants speedup, they can undef 
> > interlaced support for now (not terribly tested, but should behave as 
> > before).
> 
> I've been using MPlayer with ALLOW_INTERLACE undefined in h264.h, and
> it's been working fine.
> 
> 
> I'm attaching some speedup patches I've used. I cleaned them up a bit
> but they're still not meant to be directly committable. At some earlier
> point they had a significantly bigger effect. Maybe latest gcc 4.2
> updates got smarter or the current code just doesn't happen to trigger
> the worst inlining decisions.
> 

> Patch 1 adds av_noinline to some functions in dsputil_mmx.c.

this seems to affect only mpeg4 ASP related functions, not h.264


> 
> Patch 2 marks various h264 functions as either av_always_inline or
> av_noinline. Many of those changes probably have no effect; I didn't try
> to minimize the amount of changed functions.

this looks very interesting, though someone should split it and benchmark
each change individually
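
For anyone splitting the patch, a minimal sketch of what the two kinds of
annotation do, assuming GCC-style attributes. The my_always_inline and
my_noinline macros and the three functions are hypothetical stand-ins for
illustration, not code taken from the patch or from h264.c:

```c
#include <stdio.h>

/* Hypothetical stand-ins for av_always_inline / av_noinline.
 * GCC/Clang attribute syntax is assumed; other compilers get a plain fallback. */
#if defined(__GNUC__)
#define my_always_inline __attribute__((always_inline)) inline
#define my_noinline      __attribute__((noinline))
#else
#define my_always_inline inline
#define my_noinline
#endif

/* Tiny helper called in an inner loop: force it to be inlined. */
static my_always_inline int clip_uint8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return v;
}

/* Rarely taken path: keep it out of line so it does not bloat the caller. */
static my_noinline int report_error(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
    return -1;
}

/* Caller: the hot loop gets the inlined helper, the error path stays a call. */
static int process_row(const int *src, unsigned char *dst, int n)
{
    if (n <= 0)
        return report_error("empty row");
    for (int i = 0; i < n; i++)
        dst[i] = (unsigned char)clip_uint8(src[i]);
    return 0;
}
```

Benchmarking one annotated function at a time should show which of these
decisions the compiler was actually getting wrong on its own.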


> 
> Patch 3 cleans up some of the asm in cabac.h (the HAVE_FAST_CMOV case I
> use on my own machine). It's not primarily intended as a speedup patch
> but did seem to make the code a bit faster. It adds proper dependencies
> to the asm statements and removes the hacks used to work around the lack
> of those (MANGLE, "memory" clobbers, attribute_used, volatile
> qualifiers). There are a few ways in which that can allow slightly better
> code generation. "memory" clobbers can prevent optimization. Using
> "+m"(*c) instead of "r"(c) where c is H264Context->cabac allows
> addressing fields directly as offsets from the H264Context pointer
> (which is probably already in a register) instead of first calculating
> H264Context->cabac outside the asm.

this one breaks gcc 2.95 so it cannot be used in its current form
also I've seen some tabs in there, which aren't allowed in ffmpeg svn
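
To illustrate the constraint change being discussed, here is a simplified
sketch, not the cabac.h code. The Ctx struct, its fields and both functions
are made up for the example; it assumes GCC extended asm on x86 and falls
back to plain C elsewhere:

```c
#include <stdint.h>

/* Hypothetical stand-in for a decoder context with an embedded sub-struct. */
typedef struct Ctx {
    int unrelated;
    struct { uint32_t low; uint32_t range; } cabac;
} Ctx;

/* Old style: only the pointer is passed in a register, so the compiler has
 * to be told about the memory access with volatile and a "memory" clobber. */
static void add_range_clobber(Ctx *h)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    void *c = &h->cabac;                /* address computed outside the asm */
    __asm__ volatile(
        "movl 4(%0), %%eax \n\t"        /* eax = cabac.range (offset 4)     */
        "addl %%eax, (%0)  \n\t"        /* cabac.low += eax  (offset 0)     */
        :
        : "r"(c)
        : "eax", "cc", "memory");
#else
    h->cabac.low += h->cabac.range;
#endif
}

/* New style: the touched field is a "+m" operand, so no "memory" clobber or
 * volatile is needed and the compiler can address it relative to h. */
static void add_range_operands(Ctx *h)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__(
        "addl %1, %0"
        : "+m"(h->cabac.low)
        : "r"(h->cabac.range)
        : "cc");
#else
    h->cabac.low += h->cabac.range;
#endif
}
```

Both functions compute the same result; the second just gives the compiler
enough information to schedule and address things itself.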

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

it is not once nor twice but times without number that the same ideas make
their appearance in the world. -- Aristotle