[Ffmpeg-devel] benchmark of different CABAC routines
Wed Oct 11 20:30:24 CEST 2006
Results on AMD 2500+ for revision 6658.
I ran these with the skip functionality in START/STOP_TIMER disabled and
repeated each run a few times. Some versions seemed to have consistently
higher skip counts, and I wanted to rule out results biased by that. The
skip functionality probably isn't reliable in this case anyway, since
there is no reason to expect every decode_cabac_residual call to take a
similar amount of time.
branchless cmov: 4890
branchless no-cmov: 4834
non-branchless C: 5058
non-branchless C, modified: 4996
branchless C: 5440
The modified non-branchless C version has:

    uint8_t tmp = s + 2;
    if (tmp < 126)
        s = tmp;
    *state = s;
Writing it that way, instead of the earlier (and slower)
"s += 2; if (s < 128) *state = s", makes gcc use cmov instead of a
branch, which is faster.
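For comparison, here is a minimal sketch of the two forms side by side as standalone functions. The function names, and the assumption that s is simply loaded from *state beforehand, are mine and not from the actual source. The bounds (128 and 126) are copied exactly as quoted above. The point it illustrates is that the earlier form makes the store itself conditional, while the modified form only makes a register value conditional and always stores. x86 cmov cannot conditionally write to memory, so only the second shape is a natural fit for it. Whether a given gcc version actually emits cmov depends on the version and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Earlier form, as quoted: the store to *state is itself conditional. */
void update_conditional_store(uint8_t *state)
{
    uint8_t s = *state;      /* assumption: s comes straight from *state */
    s += 2;
    if (s < 128)
        *state = s;
}

/* Modified form, as quoted: only the value of s is conditional,
 * and the store to *state always happens. */
void update_unconditional_store(uint8_t *state)
{
    uint8_t s = *state;      /* assumption: s comes straight from *state */
    uint8_t tmp = s + 2;
    if (tmp < 126)
        s = tmp;
    *state = s;
}

/* Tiny usage check (hypothetical helper, not part of the benchmark). */
void demo_updates(void)
{
    uint8_t a = 10, b = 10;
    update_conditional_store(&a);
    update_unconditional_store(&b);
    assert(a == 12);
    assert(b == 12);
}
```

Compiling both with "gcc -O2 -S" and comparing the assembly is one way to see which form ends up with a branch on a given compiler.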