[Ffmpeg-devel] benchmark of different CABAC routines

Uoti Urpala uoti.urpala
Wed Oct 11 20:30:24 CEST 2006

Results on an AMD 2500+ for revision 6658.
These were done with the skip functionality in START/STOP_TIMER disabled
and repeated a few times, because some versions seemed to have
consistently higher skip counts and I wanted to rule out biased results
from that. The skip functionality probably isn't reliable in this case
anyway, since there is no reason to expect every decode_cabac_residual
call to take a similar amount of time.

branchless cmov: 4890
branchless no-cmov: 4834
non-branchless: 4553
non-branchless C: 5058
non-branchless C, modified: 4996
branchless C: 5440

The modified non-branchless C version has

	uint8_t tmp = s + 2;
	if (tmp < 126)
	    s = tmp;
	*state = s;

instead of

	*state= ff_h264_mps_state[s];

Writing it that way, instead of the earlier (and slower)
"s += 2; if (s < 128) *state = s", makes gcc use cmov instead of a
branch and is faster.
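
For anyone who wants to check the two forms against each other, here is a
minimal standalone sketch. It assumes the table advances the state by 2 and
saturates, so that entries 124..127 map to themselves; the real contents of
ff_h264_mps_state are not shown in this mail, so mps_state_sketch and the
function names below are hypothetical stand-ins, not code from the tree.

```c
#include <stdint.h>

/* Hypothetical stand-in for ff_h264_mps_state (an assumption, not the
 * real table): advance by 2, entries 124..127 map to themselves. */
static uint8_t mps_state_sketch[128];

static void init_mps_state_sketch(void)
{
    for (int s = 0; s < 128; s++)
        mps_state_sketch[s] = (uint8_t)(s < 124 ? s + 2 : s);
}

/* Table-lookup form, as in "*state= ff_h264_mps_state[s];". */
static void update_state_table(uint8_t *state)
{
    *state = mps_state_sketch[*state];
}

/* Modified form from this mail: the store is unconditional and only
 * the value is selected, which lets a compiler emit cmov. */
static void update_state_select(uint8_t *state)
{
    uint8_t s = *state;
    uint8_t tmp = s + 2;
    if (tmp < 126)
        s = tmp;
    *state = s;
}

/* Returns 1 if both forms give the same result for all states 0..127. */
static int update_forms_agree(void)
{
    init_mps_state_sketch();
    for (int s = 0; s < 128; s++) {
        uint8_t a = (uint8_t)s, b = (uint8_t)s;
        update_state_table(&a);
        update_state_select(&b);
        if (a != b)
            return 0;
    }
    return 1;
}
```

The likely reason the earlier form was slower is that it made the store itself
conditional, and a compiler generally cannot turn a conditional store into an
unconditional one on its own. Whether a given gcc version actually emits cmov
for update_state_select depends on the version and flags, so check the
generated assembly (gcc -O2 -S) rather than taking this sketch's word for it.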
