[Ffmpeg-devel] [PATCH] h264 - simplify cabac/luma cbp

Alexander Strange astrange
Wed Mar 7 09:49:41 CET 2007


On Mar 1, 2007, at 10:13 AM, Michael Niedermayer wrote:

> please redo the -benchmark test; the difference of 0.5% is fairly small
> and I think 5 runs on each are too few to be sure. Also I am concerned
> that the START_TIMER test reports contradicting results.
> Also add START_TIMER outside decode_mb_cabac() to reduce the chance of
> it affecting the generated code, that is
>
> START_TIMER
> decode_mb_cabac
> STOP_TIMER
>
> also please try
>
> START_TIMER
> decode_mb_cabac
> if(!h->prev_mb_skipped){
>     STOP_TIMER
> }
>
> and maybe both with av_noinline decode_mb_cabac

I reran the tests with timers around the call to decode_mb_cabac()
(where they were before) and around decode_cabac_mb_cbp_luma() (which
is the one that actually changed). The first set of numbers is without
the patch, the second with it.

3863 dezicycles in decode_cabac_mb_cbp_luma, 524223 runs, 65 skips
26126 dezicycles in decode_mb_cabac, 1048231 runs, 345 skips
3849 dezicycles in decode_cabac_mb_cbp_luma, 1048453 runs, 123 skips
28011 dezicycles in decode_mb_cabac, 2096030 runs, 1122 skips
3857 dezicycles in decode_cabac_mb_cbp_luma, 2096927 runs, 225 skips
29255 dezicycles in decode_mb_cabac, 4192384 runs, 1920 skips
3857 dezicycles in decode_cabac_mb_cbp_luma, 4193961 runs, 343 skips

3733 dezicycles in decode_cabac_mb_cbp_luma, 524239 runs, 49 skips
26347 dezicycles in decode_mb_cabac, 1048321 runs, 255 skips
3728 dezicycles in decode_cabac_mb_cbp_luma, 1048495 runs, 81 skips
28233 dezicycles in decode_mb_cabac, 2096209 runs, 943 skips
3732 dezicycles in decode_cabac_mb_cbp_luma, 2097013 runs, 139 skips
29471 dezicycles in decode_mb_cabac, 4192594 runs, 1710 skips
3727 dezicycles in decode_cabac_mb_cbp_luma, 4194045 runs, 259 skips

Results with STOP_TIMER placed inside if(!h->prev_mb_skipped) are the
same, with a lot more skips. Disabling one or the other timer lowers
the numbers for the remaining one, but the ratio between the two sets
stays the same.
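
For reference, the two timer placements can be sketched roughly like
this. This is only an illustration: the SKETCH_* macros, SketchCtx and
decode_mb_stub() are made-up stand-ins built on clock(), not the real
START_TIMER/STOP_TIMER macros or the real decoder, and the "every 4th
macroblock is skipped" rule is just dummy data.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-ins for START_TIMER/STOP_TIMER, built on clock().
 * They are not FFmpeg's macros; they only mimic where the timers sit. */
typedef struct TimerStat {
    clock_t total; /* summed clock() ticks over all recorded runs */
    long    runs;  /* number of recorded runs */
} TimerStat;

#define SKETCH_START_TIMER clock_t sketch_t0 = clock();
#define SKETCH_STOP_TIMER(st) \
    do { (st)->total += clock() - sketch_t0; (st)->runs++; } while (0)

/* Keep the timed function out of line so the timer placement is less
 * likely to change the code generated for it (GCC/Clang attribute). */
#if defined(__GNUC__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

/* Stand-in for the decoder context; only prev_mb_skipped is from the thread. */
typedef struct SketchCtx {
    int      prev_mb_skipped;
    unsigned work;
} SketchCtx;

/* Dummy replacement for decode_mb_cabac(): marks every 4th block as skipped. */
static SKETCH_NOINLINE void decode_mb_stub(SketchCtx *c, int mb_index)
{
    c->work += (unsigned)mb_index * 2654435761u; /* dummy work */
    c->prev_mb_skipped = (mb_index % 4 == 0);
}

/* Times every call into `all`, and only non-skipped calls into `non_skipped`.
 * Both stops share one loop for brevity; a real measurement would build
 * with one placement at a time. */
static void run_timer_sketch(int n_mbs, TimerStat *all, TimerStat *non_skipped)
{
    SketchCtx c = { 0, 0 };
    assert(n_mbs >= 0);
    for (int i = 0; i < n_mbs; i++) {
        SKETCH_START_TIMER
        decode_mb_stub(&c, i);
        SKETCH_STOP_TIMER(all);
        if (!c.prev_mb_skipped) {
            SKETCH_STOP_TIMER(non_skipped);
        }
    }
}
```

Calling run_timer_sketch(8, &all, &non_skipped) with zeroed stats
records 8 runs in the first and 6 in the second, since the stub marks
macroblocks 0 and 4 as skipped.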

So decode_cabac_mb_cbp_luma() did get faster, but decode_mb_cabac() as
a whole did not: unrelated code then slowed down to compensate.
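
To put numbers on that, here is a small sketch of the arithmetic using
the last reading of each function from the two sets above. It assumes
the first set is the unpatched build and the second the patched one;
percent_change(), report_change() and report_thread_numbers() are
helper names made up for this example.

```c
#include <stdio.h>

/* Percent change from `before` to `after`; negative means fewer dezicycles. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Prints one comparison line for a timed function. */
static void report_change(const char *name, double before, double after)
{
    printf("%-28s %6.0f -> %6.0f  (%+.2f%%)\n",
           name, before, after, percent_change(before, after));
}

/* Last reading of each function, taken from the two sets quoted above.
 * Which set is which build is an assumption (first = unpatched). */
static void report_thread_numbers(void)
{
    report_change("decode_cabac_mb_cbp_luma", 3857.0, 3727.0);
    report_change("decode_mb_cabac", 29255.0, 29471.0);
}
```

Under that assumption the cbp_luma timer drops by roughly 3.4%, while
the decode_mb_cabac timer rises by roughly 0.7%.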

Since at least that part of it speeds up, I think the rest is due to
unrelated side effects of gcc's optimizations, which move code blocks
around quite a lot in large branchy functions. I'll try simplifying
the rest of the control flow and hopefully that will fix it.



