[FFmpeg-devel] [PATCH] avoid mb_xy recalculation in h264

Alexander Strange astrange
Thu May 8 04:42:04 CEST 2008

On May 7, 2008, at 3:15 PM, Michael Niedermayer wrote:

> On Wed, May 07, 2008 at 02:49:20PM -0400, Alexander Strange wrote:
>> On May 7, 2008, at 8:53 AM, Michael Niedermayer wrote:
>>> On Wed, May 07, 2008 at 03:05:47AM -0400, Alexander Strange wrote:
>>>> This line appears in a lot of h264.c functions:
>>>> const int mb_xy= s->mb_x + s->mb_y*s->mb_stride;
>>>> It only changes once per MB, so we can add it to the context and  
>>>> only
>>>> recalculate it in decode_slice().
>>>> I put it at the end of H264Context, but it could be moved up  
>>>> earlier if
>>>> someone likes it there. The position doesn't seem to affect  
>>>> performance
>>>> much; it can't be moved next to mb_x/mb_y since that would put it  
>>>> in
>>>> MpegEncContext, and none of the other codecs use it.
>>>> Patches:
>>>> 1- does that, updating it for MBAFF when needed
>>>> 2- removes some newly unused variables
>>>> 3- does the same thing to svq3.c
>>>> Before: avg 8.799s max 8.8s min 8.794s
>>>> After: avg 8.645s max 8.651s min 8.641s
>>> Knowing the cpu and compiler would also be usefull.
>> gcc 4.0.1, core 2 x86-32 Darwin
>> I guess it's about the same as the usual distro compiler, but this  
>> isn't
>> really a sensitive messing-with-the-register-allocator patch.
> How does the change compare to a splited (litterally duplicated)
> decode_cabac_residual()
> with the mb_xy only calculated for the "cat"s which actually need it?
> That is one for cat=0/3 and one for the others
> or
> with the mb_xy calc moved in the if(cat=0/3) which use it?

Actually, the compiled decode_cabac_residual is barely different at  
all. gcc loads h->mb_xy, immediately spills it to the same place in  
the stack, and the rest of the function is the same. I suppose  
changing mb_xy to h->mb_xy for the DC cats might be an improvement.

Diffing the asm shows:
* less imull
* hl_motion is different (mb_x load moved into an if())
* write_back_intra_pred_mode() is inlined and removed
* slightly different stack sizes (hard to know if that means anything)

Making write_back_intra_pred_mode() always_inline doesn't change cavlc  
time, but explains about half the speed change for cabac.

> Iam not at all against putting mb_xy in the context but i somehow  
> doubt
> that this is what causes the speed difference.
> Also it would be very nice if you had a few more compilers installed
> if you want to do more optimizations & benchmarks.

Yeah, I'll make a linux vm for gcc 2/3/the other ccs - the 64bit one I  
have now doesn't support so many compilers. If they don't inline that  
after the patch (4.4 inlines it as it is) I'll mess with always_inline  
on some of the smaller functions.

More information about the ffmpeg-devel mailing list