[FFmpeg-devel] h264 speed regression after PAFF

Uoti Urpala uoti.urpala
Sat Oct 13 00:57:44 CEST 2007

On Fri, 2007-10-12 at 08:51 -0400, Jeff Downs wrote:
> On Fri, 12 Oct 2007, Andreas Öman wrote:
> > After the PAFF code got in h264 is about 3% slower for non-interlaced
> > content. See test below.

You need to be careful when benchmarking though. I've seen changes to
other translation units, involving no code that runs during the
benchmark, cause 3% speed changes in h264 decoding. I haven't found
out the cause but I guess the most likely reason is some cache alignment
effect. So just because a change makes the benchmark run 3% slower or
faster does not necessarily mean that real h264 decoding performance
changed either way.

> If anyone doesn't need interlace support and wants speedup, they can undef 
> interlaced support for now (not terribly tested, but should behave as 
> before).

I've been using MPlayer with ALLOW_INTERLACE undefined in h264.h, and
it's been working fine.
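
As a rough illustration of why undefining such a switch can help: a
minimal sketch, assuming the flag checks go through a macro that
collapses to a constant when the feature is compiled out. All names
below (DEMO_ALLOW_INTERLACE, DemoContext, DEMO_MB_FIELD, demo_row_step)
are hypothetical and are not taken from h264.h.

```c
/* Hypothetical names throughout; this is not FFmpeg's h264.h. */

/* Leave this undefined to compile the field-decoding path out. */
/* #define DEMO_ALLOW_INTERLACE */

typedef struct DemoContext {
    int field_flag;   /* nonzero when the current block is field coded */
} DemoContext;

#ifdef DEMO_ALLOW_INTERLACE
#define DEMO_MB_FIELD(ctx) ((ctx)->field_flag)  /* checked at run time */
#else
#define DEMO_MB_FIELD(ctx) 0                    /* constant, so the branch is dead code */
#endif

/* Returns the row step: 2 when decoding a field, otherwise 1. */
static int demo_row_step(const DemoContext *ctx)
{
    (void)ctx;  /* avoids an unused-parameter warning when the macro is 0 */
    if (DEMO_MB_FIELD(ctx))
        return 2;
    return 1;
}
```

With the switch undefined the compiler sees "if (0)" and can drop the
branch entirely, so the flag is never loaded or tested.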

I'm attaching some speedup patches I've used. I cleaned them up a bit
but they're still not meant to be directly committable. At some earlier
point they had a significantly bigger effect. Maybe the latest gcc 4.2
updates got smarter or the current code just doesn't happen to trigger
the worst inlining decisions.

Patch 1 adds av_noinline to some functions in dsputil_mmx.c.

Patch 2 marks various h264 functions as either av_always_inline or
av_noinline. Many of those changes probably have no effect; I didn't try
to minimize the number of changed functions.
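
For readers unfamiliar with this kind of annotation, here is a minimal
sketch of the idea, assuming GCC-style attributes. The demo_* macros and
functions are hypothetical stand-ins written for this example; they are
not the contents of the patch.

```c
/* Hypothetical stand-ins for av_always_inline / av_noinline. */
#if defined(__GNUC__)
#define demo_always_inline __attribute__((always_inline)) inline
#define demo_noinline      __attribute__((noinline))
#else
#define demo_always_inline inline
#define demo_noinline
#endif

/* Tiny helper used in an inner loop: force it into every caller. */
static demo_always_inline int demo_clip_uint8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* Larger routine: keep a single out-of-line copy so that it does not
 * get duplicated into (and bloat) each of its callers. */
static demo_noinline int demo_sum_clipped(const int *src, int n)
{
    int i, sum = 0;
    for (i = 0; i < n; i++)
        sum += demo_clip_uint8(src[i]);
    return sum;
}
```

The point is only to take the decision away from the compiler's
heuristics in both directions; which functions benefit has to be found
by measurement.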

Patch 3 cleans up some of the asm in cabac.h (the HAVE_FAST_CMOV case I
use on my own machine). It's not primarily intended as a speedup patch
but did seem to make the code a bit faster. It adds proper dependencies
to the asm statements and removes the hacks used to work around the lack
of those (MANGLE, "memory" clobbers, attribute_used, volatile
qualifiers). There are a few ways in which that can allow slightly better
code generation. "memory" clobbers can prevent optimization. Using
"+m"(*c) instead of "r"(c) where c is H264Context->cabac allows
addressing fields directly as offsets from the H264Context pointer
(which is probably already in a register) instead of first calculating
H264Context->cabac outside the asm.
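
The difference between the two styles can be sketched roughly as
follows. This is a simplified illustration, not code from the patch: the
DemoCabac struct and both functions are hypothetical, the asm is x86
only (with a plain C fallback elsewhere), and it uses a per-field
"+m"(c->range) operand rather than "+m"(*c) to keep the example short.

```c
#include <stddef.h>

/* Hypothetical two-field context, standing in for the real CABAC state. */
typedef struct DemoCabac {
    int low;
    int range;
} DemoCabac;

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))

/* Workaround style: the pointer is passed in a register and the field is
 * addressed by offset inside the asm. The compiler cannot see which
 * memory is touched, so a blanket "memory" clobber (and volatile) is
 * needed, which forces it to discard values it had cached in registers. */
static void demo_inc_range_clobber(DemoCabac *c)
{
    __asm__ volatile ("addl $1, %c1(%0)"
                      :
                      : "r"(c), "i"(offsetof(DemoCabac, range))
                      : "memory", "cc");
}

/* Dependency style: the field is a read-write memory operand, so the
 * compiler knows exactly what is modified and is free to pick the
 * addressing mode itself. No "memory" clobber or volatile is needed. */
static void demo_inc_range_memop(DemoCabac *c)
{
    __asm__ ("addl $1, %0" : "+m"(c->range) : : "cc");
}

#else

/* Portable fallback so the example still builds on other targets. */
static void demo_inc_range_clobber(DemoCabac *c) { c->range += 1; }
static void demo_inc_range_memop(DemoCabac *c)   { c->range += 1; }

#endif
```

Both functions increment the same field; what differs is how much the
compiler is allowed to know about it.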

Patch 4 just turns off interlace support (obviously not good for

Timings playing a CABAC h264 file in MPlayer:
standard FFmpeg: 13.46
patch 4: 13.31
patches 1-3: 13.31
patches 1-4: 12.89

For some reason the patches now have a much larger effect together than
separately. Also even completely disabling interlacing does not make a
3% speed difference, so the recent interlace changes did not cause a 3%
slowdown on my machine (unless they affect speed even with interlace
support disabled, but OTOH the speed with those patches applied has not
changed much recently so that doesn't seem likely).
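
To put numbers on that, the relative changes can be worked out from the
timings listed above (units as in the list, which does not state them).
A small sketch; the function names are made up for this example:

```c
#include <stdio.h>

/* Percentage reduction in run time relative to the baseline timing. */
static double demo_speedup_percent(double baseline, double patched)
{
    return 100.0 * (baseline - patched) / baseline;
}

/* Prints the relative change for each patched configuration. */
static void demo_print_speedups(void)
{
    const double baseline = 13.46;  /* standard FFmpeg */

    printf("patch 4:     %.2f%%\n", demo_speedup_percent(baseline, 13.31));
    printf("patches 1-3: %.2f%%\n", demo_speedup_percent(baseline, 13.31));
    printf("patches 1-4: %.2f%%\n", demo_speedup_percent(baseline, 12.89));
}
```

That works out to roughly 1.1% for patch 4 alone and for patches 1-3
alone, against roughly 4.2% for all four together, i.e. about twice the
sum of the separate gains.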
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1-disable_dsputil_inlines.diff
Type: text/x-patch
Size: 17271 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20071013/296b279c/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2-force_h264_inlining.diff
Type: text/x-patch
Size: 17537 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20071013/296b279c/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 3-cleanup_cabac_asm.diff
Type: text/x-patch
Size: 10535 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20071013/296b279c/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 4-disable_interlace_support
Type: text/x-patch
Size: 355 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20071013/296b279c/attachment-0003.bin>