[FFmpeg-devel] [PATCHv2] avutil/lls: speed up performance of solve_lls

Wed Nov 25 04:13:22 CET 2015

This is a trivial rewrite of the loops that results in better
prefetching and associated cache efficiency. Essentially, the problem is
that modern prefetching logic is based on finite state Markov memory, a reasonable
assumption that is used elsewhere in CPU's in for instance branch
predictors.

Surrounding loops all iterate forward through the array, making the
predictor think of prefetching in the forward direction, but the
intermediate loop is unnecessarily in the backward direction.

Speedup is nontrivial. Benchmarks obtained by 10^6 iterations within
solve_lls, with START/STOP_TIMER. File is tests/data/fate/flac-16-lpc-cholesky.err.
Hardware: x86-64, Haswell, GNU/Linux.

new:
  17291 decicycles in solve_lls, 2096706 runs,    446 skips
  17255 decicycles in solve_lls, 4193657 runs,    647 skips
  17231 decicycles in solve_lls, 8384997 runs,   3611 skips
  17189 decicycles in solve_lls,16771010 runs,   6206 skips
  17132 decicycles in solve_lls,33544757 runs,   9675 skips
  17092 decicycles in solve_lls,67092404 runs,  16460 skips
  17058 decicycles in solve_lls,134188213 runs,  29515 skips

old:
  18009 decicycles in solve_lls, 2096665 runs,    487 skips
  17805 decicycles in solve_lls, 4193320 runs,    984 skips
  17779 decicycles in solve_lls, 8386855 runs,   1753 skips
  18289 decicycles in solve_lls,16774280 runs,   2936 skips
  18158 decicycles in solve_lls,33548104 runs,   6328 skips
  18420 decicycles in solve_lls,67091793 runs,  17071 skips
  18310 decicycles in solve_lls,134187219 runs,  30509 skips

Reviewed-by: Michael Niedermayer <michael at niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
---
 libavutil/lls.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavutil/lls.c b/libavutil/lls.c
index f77043b..4330697 100644
--- a/libavutil/lls.c
+++ b/libavutil/lls.c
@@ -55,7 +55,7 @@ void avpriv_solve_lls(LLSModel *m, double threshold, unsigned short min_order)
         for (j = i; j < count; j++) {
             double sum = covar[i][j];
 
-            for (k = i - 1; k >= 0; k--)
+            for (k = 0; k <= i-1; k++)
                 sum -= factor[i][k] * factor[j][k];
 
             if (i == j) {
@@ -71,7 +71,7 @@ void avpriv_solve_lls(LLSModel *m, double threshold, unsigned short min_order)
     for (i = 0; i < count; i++) {
         double sum = covar_y[i + 1];
 
-        for (k = i - 1; k >= 0; k--)
+        for (k = 0; k <= i-1; k++)
             sum -= factor[i][k] * m->coeff[0][k];
 
         m->coeff[0][i] = sum / factor[i][i];
-- 
2.6.2