[Ffmpeg-devel] [PATCH] h264 optimization: common case hl_decode_mb

Michael Niedermayer michaelni
Fri Feb 23 12:25:09 CET 2007


On Fri, Feb 23, 2007 at 02:08:26AM -0500, Alexander Strange wrote:
> I noticed that hl_decode_mb is near the top of profiling the h264  
> decoder and is full of huge conditionals.
> This patch copies the function, with a new version that runs for the  
> common case: no interlacing, grayscale decoding disabled, not  
> encoding, and not decoding SVQ3.
> It has a very small, but significant speed gain on my test video,  
> which is 1080p and 1.2MBit with I/P frames:
> BENCHMARKs: VC:  25.189s VO:   1.906s A:   0.000s Sys:   0.181s =  27.277s
> BENCHMARKs: VC:  25.188s VO:   1.889s A:   0.000s Sys:   0.180s =  27.257s
> BENCHMARKs: VC:  25.195s VO:   1.897s A:   0.000s Sys:   0.181s =  27.273s
> BENCHMARKs: VC:  25.192s VO:   1.898s A:   0.000s Sys:   0.182s =  27.271s
> avg 25.191 +/- .003162
> BENCHMARKs: VC:  24.926s VO:   1.903s A:   0.000s Sys:   0.182s =  27.010s
> BENCHMARKs: VC:  24.927s VO:   1.903s A:   0.000s Sys:   0.182s =  27.012s
> BENCHMARKs: VC:  24.926s VO:   1.900s A:   0.000s Sys:   0.182s =  27.008s
> BENCHMARKs: VC:  24.924s VO:   1.898s A:   0.000s Sys:   0.181s =  27.003s
> avg 24.9258 +/- .001258

nice :)
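
For reference, the size of the gain can be worked out from the VC times
quoted above. A small sketch in C; the helper and array names are made up
for illustration and are not part of the patch:

```c
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double demo_mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Fraction of time saved going from "before" to "after". */
static double demo_relative_gain(double before, double after)
{
    return (before - after) / before;
}

/* VC times in seconds, copied from the quoted BENCHMARKs lines. */
static const double demo_vc_before[] = { 25.189, 25.188, 25.195, 25.192 };
static const double demo_vc_after[]  = { 24.926, 24.927, 24.926, 24.924 };
```

Feeding the two means into demo_relative_gain() gives a gain of roughly 1%
in video decoding time, which is small but well outside the run-to-run
spread of a few milliseconds.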

> This is a 2.16GHz Intel Core Duo, so I expect most other people will  
> see a bigger change.
> hl_decode_mb_simple is 880 instructions vs. 2018 for the general one.
> _simple inlines backup_mb_border and xchg_mb_border, which still have  
> checks for grayscale. For some reason when I removed them it actually  
> got slower. I guess this is because it gives gcc's register allocator  
> more live variables at once?
> Any comments on this are appreciated.

ok, first, tabs are forbidden in svn
second, could you try something like:

static always_inline void hl_decode_mb_internal(H264Context *h, int complex){
    /* interlacing and other complex code, only reached when complex != 0 */
    if( ...
}

static void hl_decode_mb_simple(H264Context *h){
    hl_decode_mb_internal(h, 0);
}

static void hl_decode_mb_complex(H264Context *h){
    hl_decode_mb_internal(h, 1);
}

that prevents code duplication (which is definitely bad for the already pretty
large h264.c)
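
To spell out why this works: the flag is a literal constant at each call
site, so once the shared body is force-inlined the compiler can fold every
test on it and emit two specialized functions from one source. A
self-contained sketch of the pattern follows. DemoContext, the demo_*
names and the stride logic are stand-ins invented for illustration, not
the real H264Context or decoder code, and the macro assumes a GCC-style
attribute:

```c
/* Hypothetical stand-in for H264Context; the real struct is far larger. */
typedef struct DemoContext {
    int interlaced;   /* 1 if the current macroblock is field-coded */
    int linesize;     /* bytes per picture row */
} DemoContext;

#if defined(__GNUC__)
#    define demo_always_inline inline __attribute__((always_inline))
#else
#    define demo_always_inline inline
#endif

/* Shared body. "complex" is a compile-time constant at every call site,
 * so after inlining the compiler can drop the branches that test it. */
static demo_always_inline int demo_decode_mb_internal(const DemoContext *c,
                                                      int complex)
{
    int stride = c->linesize;
    if (complex && c->interlaced)
        stride *= 2;    /* illustrative: field rows are two lines apart */
    return stride;
}

/* Common case: every "complex" branch folds away. */
static int demo_decode_mb_simple(const DemoContext *c)
{
    return demo_decode_mb_internal(c, 0);
}

/* General case: keeps all the checks. */
static int demo_decode_mb_complex(const DemoContext *c)
{
    return demo_decode_mb_internal(c, 1);
}

/* Pick the specialized version once per macroblock. */
static int demo_decode_mb(const DemoContext *c)
{
    if (c->interlaced)
        return demo_decode_mb_complex(c);
    return demo_decode_mb_simple(c);
}
```

The dispatch costs one branch per macroblock, and the source stays in one
place, so a fix to the shared body reaches both versions.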

or even keeping a single hl_decode_mb() but splitting the mbaff out
into other av_noinline functions (though this might have a negative
impact on the mbaff speed?)
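
A rough sketch of that second variant, again with invented names (DemoMb,
demo_*) and a GCC-style attribute assumed in place of the av_noinline
macro: the rare path lives in a separate out-of-line function, so the hot
function stays small, and the rare case pays for an extra call.

```c
#if defined(__GNUC__)
#    define demo_noinline __attribute__((noinline))
#else
#    define demo_noinline
#endif

/* Hypothetical per-macroblock state, for illustration only. */
typedef struct DemoMb {
    int mbaff;      /* 1 if the macroblock needs the rare path */
    int linesize;   /* bytes per picture row */
} DemoMb;

/* Rare path, kept out of line so it does not bloat the hot function. */
static demo_noinline int demo_mbaff_stride(const DemoMb *mb)
{
    return mb->linesize * 2;
}

/* Hot function: one cheap test, then either the short common path or a
 * call into the out-of-line helper. */
static int demo_row_stride(const DemoMb *mb)
{
    if (mb->mbaff)
        return demo_mbaff_stride(mb);   /* call overhead only when rare */
    return mb->linesize;
}
```

The extra call in the rare case is where the possible mbaff slowdown
would come from.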

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates