[FFmpeg-devel] [PATCH] Channels correlation

Thu Oct 29 17:47:30 CET 2009

On Thu, Oct 29, 2009 at 05:08:51PM +0100, Nicolas George wrote:
> > I'd expect that it would be faster to swap those loops around, with the
> > n1/n2 stuff innermost since that means that m[...] could be kept in a
> > register in the innermost loop and also since nc1 and nc2 are probably
> > much smaller values this would have less branch mis-predictions.
> 
> I think you slightly misread the code, since each cell of m is accessed
> exactly once in these loops. What these loops do exactly is:
> 	for all 0 <= i < n1 and 0 <= j < n1
> 		M1[i, j] += d1[i] * d1[j];
> 	for all 0 <= i < n2 and 0 <= j < n1
> 		M2[i, j] += d2[i] * d1[j];

There is a loop over the samples around that, so what you do is
    loop over samples
        loop over channel combinations
What I suggested trying is
    loop over channel combinations
        loop over samples
The latter order should cause no extra cache pressure due to m, since
each m[...] can stay in a register across the inner loop. It does
however put more pressure on the cache, because the samples are read
once per channel combination, so it could end up being slower.
It would however be much better if a planar channel layout was used -
it might be worth investigating if you get better results if you split
the interleaved channels into one array per channel at some other place.
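
A minimal sketch of that planar variant, to make the loop order concrete.
It is not taken from the patch: the helper names deinterleave() and
correlate_planar(), the use of double for the samples, and the layout of
the planar buffer (one channel after the other) are all assumptions made
for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: split n interleaved frames of nc channels into
 * nc planar arrays of n samples each, stored back to back in dst. */
static void deinterleave(double *dst, const double *src, size_t n, size_t nc)
{
    for (size_t s = 0; s < n; s++)
        for (size_t c = 0; c < nc; c++)
            dst[c * n + s] = src[s * nc + c];
}

/* Hypothetical helper: m[i * ncb + j] += sum over s of a_i[s] * b_j[s].
 * Channel combinations are the outer loops and samples the inner loop,
 * so the running sum stays in a register and both inputs are read
 * sequentially. */
static void correlate_planar(double *m,
                             const double *a, size_t nca,
                             const double *b, size_t ncb, size_t n)
{
    for (size_t i = 0; i < nca; i++) {
        const double *ai = a + i * n;
        for (size_t j = 0; j < ncb; j++) {
            const double *bj = b + j * n;
            double acc = 0.0;
            for (size_t s = 0; s < n; s++)
                acc += ai[s] * bj[s];
            m[i * ncb + j] += acc;
        }
    }
}
```

With this shape, M1 would come from correlate_planar(m1, p1, nc1, p1, nc1, n)
and M2 from correlate_planar(m2, p2, nc2, p1, nc1, n), where p1 and p2 are
the de-interleaved d1 and d2. Whether the extra de-interleaving pass pays
off would need to be benchmarked.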

> +    while (n1 > 0 && n2 > 0) {
> +        double *m = state->matr_io;
> +        for (i = 0; i < nc1; i++)
> +            for (j = 0; j < nc1; j++)
> +                *(m++) += d1[i] * d1[j];
> +        for (i = 0; i < nc2; i++)
> +            for (j = 0; j < nc1; j++)
> +                *(m++) += d2[i] * d1[j];
> +        d1 += nc1;
> +        d2 += nc2;
> +        n1--;
> +        n2--;
> +    }

Even if you don't want to do that, you could at least do something like
the code below, provided you ensure that nc1 != 0 (probably a good idea
anyway). It frees the registers otherwise used for n1 and n2
(alternatively, just use a single variable n = FFMIN(n1, n2)):
/* one end pointer replaces the two counters n1 and n2; needs nc1 != 0 */
d1_end = d1 + nc1 * FFMIN(n1, n2);
while (d1 < d1_end) {
    double *m1 = state->matr_io;             /* d1 x d1 block */
    double *m2 = state->matr_io + nc1 * nc1; /* d1 x d2 block */
    for (i = 0; i < nc1; i++) {
        double d1val = d1[i];                /* loaded once per row */
        for (j = 0; j < nc1; j++)
            *m1++ += d1val * d1[j];
        for (j = 0; j < nc2; j++)
            *m2++ += d1val * d2[j];
    }
    d1 += nc1;
    d2 += nc2;
}

Better variable names are very welcome. Note that this version
transposes the correlation matrix: the d1/d2 block is stored as
nc1 x nc2 instead of nc2 x nc1.
And are nc1 and nc2 allowed to differ? I guess yes, but I am not sure.
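
To illustrate the transposition, here is a small self-contained sketch
that puts both loop orders into functions so they can be compared on the
same input. The function names, the flat matrix argument m (standing in
for state->matr_io), the use of double for the samples, and the single
frame count n (standing in for FFMIN(n1, n2)) are assumptions for the
sake of the example, not part of the patch.

```c
#include <stddef.h>

/* Loop order from the quoted patch: an nc1 x nc1 block followed by an
 * nc2 x nc1 block, both row-major. */
static void accumulate_patch(double *m, const double *d1, const double *d2,
                             int nc1, int nc2, int n)
{
    while (n-- > 0) {
        double *p = m;
        for (int i = 0; i < nc1; i++)
            for (int j = 0; j < nc1; j++)
                *p++ += d1[i] * d1[j];
        for (int i = 0; i < nc2; i++)
            for (int j = 0; j < nc1; j++)
                *p++ += d2[i] * d1[j];
        d1 += nc1;
        d2 += nc2;
    }
}

/* Suggested loop order: d1[i] is loaded once per row, and the second
 * block is stored as nc1 x nc2, i.e. transposed. Requires nc1 != 0. */
static void accumulate_suggested(double *m, const double *d1,
                                 const double *d2, int nc1, int nc2, int n)
{
    const double *d1_end = d1 + (size_t)nc1 * n;
    while (d1 < d1_end) {
        double *m1 = m;
        double *m2 = m + nc1 * nc1;
        for (int i = 0; i < nc1; i++) {
            double d1val = d1[i];
            for (int j = 0; j < nc1; j++)
                *m1++ += d1val * d1[j];
            for (int j = 0; j < nc2; j++)
                *m2++ += d1val * d2[j];
        }
        d1 += nc1;
        d2 += nc2;
    }
}
```

Both functions write nc1 * nc1 + nc1 * nc2 values, so the buffer size is
unchanged even when nc1 != nc2. Only the indexing of the second block
differs: element [i][j] of the suggested layout corresponds to element
[j][i] of the layout in the patch, so any code reading the matrix later
would have to be adjusted.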