[FFmpeg-devel] [PATCH] Channels correlation
Reimar Döffinger
Reimar.Doeffinger
Thu Oct 29 17:47:30 CET 2009
On Thu, Oct 29, 2009 at 05:08:51PM +0100, Nicolas George wrote:
> > I'd expect that it would be faster to swap those loops around, with the
> > n1/n2 stuff innermost since that means that m[...] could be kept in a
> > register in the innermost loop and also since nc1 and nc2 are probably
> > much smaller values this would have less branch mis-predictions.
>
> I think you slightly misread the code, since each cell of m is accessed
> exactly once in these loops. What these loops do exactly is:
> for all 0 <= i < n1 and 0 <= j < n1
> M1[i, j] += d1[i] * d1[j];
> for all 0 <= i < n2 and 0 <= j < n1
> M2[i, j] += d2[i] * d1[j];
There is a loop over the samples around that, so you do
    loop over samples
        loop over channel combinations
I suggested to try
    loop over channel combinations
        loop over samples
The latter way should cause no extra cache pressure due to m, though
it has a higher cache pressure due to reading the samples once per
channel combination, so it could be worse.
It would however be much better if a planar channel layout were used -
it might be worth investigating whether you get better results if you split
the interleaved channels into one array per channel at some other place.
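
To illustrate the planar variant, here is a minimal sketch, not taken from the patch. The names deinterleave and correlate_planar, their parameters and the row-major layout of m are all assumptions made for this example. It splits the interleaved samples into one array per channel and then runs the channel combinations as the outer loops, so each matrix cell is summed in a local variable and written back once.

```c
#include <stddef.h>

/* Hypothetical helper (not from the patch): split n interleaved frames of
 * nc channels into nc contiguous planes, with planar[c * n + k] holding
 * channel c of frame k. */
static void deinterleave(const double *in, double *planar, size_t n, size_t nc)
{
    for (size_t k = 0; k < n; k++)
        for (size_t c = 0; c < nc; c++)
            planar[c * n + k] = in[k * nc + c];
}

/* Channel combinations outermost, samples innermost.
 * a holds nca planes and b holds ncb planes, each of n samples.
 * m is an nca x ncb row-major matrix (assumed layout) that is added to. */
static void correlate_planar(const double *a, size_t nca,
                             const double *b, size_t ncb,
                             size_t n, double *m)
{
    for (size_t i = 0; i < nca; i++) {
        for (size_t j = 0; j < ncb; j++) {
            const double *pa = a + i * n;
            const double *pb = b + j * n;
            double acc = 0.0;           /* can stay in a register */
            for (size_t k = 0; k < n; k++)
                acc += pa[k] * pb[k];
            m[i * ncb + j] += acc;      /* one write per matrix cell */
        }
    }
}
```

Whether this is actually faster depends on the cost of the extra deinterleaving pass and on how many channels there are, so it would need to be benchmarked.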
> +    while (n1 > 0 && n2 > 0) {
> +        double *m = state->matr_io;
> +        for (i = 0; i < nc1; i++)
> +            for (j = 0; j < nc1; j++)
> +                *(m++) += d1[i] * d1[j];
> +        for (i = 0; i < nc2; i++)
> +            for (j = 0; j < nc1; j++)
> +                *(m++) += d2[i] * d1[j];
> +        d1 += nc1;
> +        d2 += nc2;
> +        n1--;
> +        n2--;
> +    }
Though even if you don't want to do that, you could at least,
if you ensure that nc1 != 0 (probably a good idea anyway), do
something like below, freeing the registers otherwise used for n1 and n2
(or just use a variable n = FFMIN(n1, n2)):
    d1_end = d1 + nc1 * FFMIN(n1, n2);
    while (d1 < d1_end) {
        double *m1 = state->matr_io;
        double *m2 = state->matr_io + nc1 * nc1;
        for (i = 0; i < nc1; i++) {
            double d1val = d1[i];
            for (j = 0; j < nc1; j++)
                *m1++ += d1val * d1[j];
            for (j = 0; j < nc2; j++)
                *m2++ += d1val * d2[j];
        }
        d1 += nc1;
        d2 += nc2;
    }
Better variable names are very welcome, and note that it transposes
the correlation matrix.
And are nc1 and nc2 allowed to differ? I guess yes, but I am not sure.
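
For reference, the loop above can be wrapped into a self-contained function to check the resulting layout. This is only a sketch under assumptions: the function name accumulate_correlation, its signature and the use of a plain matr pointer in place of state->matr_io are made up for the example and are not from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper around the suggested loop. matr must hold
 * nc1 * nc1 + nc1 * nc2 doubles: first the nc1 x nc1 block for d1 * d1,
 * then the nc1 x nc2 block for d1 * d2 (transposed relative to the patch,
 * which stores that block as nc2 x nc1). */
static void accumulate_correlation(double *matr,
                                   const double *d1, size_t nc1, size_t n1,
                                   const double *d2, size_t nc2, size_t n2)
{
    size_t n = n1 < n2 ? n1 : n2;       /* same role as FFMIN(n1, n2) */
    const double *d1_end;

    assert(nc1 > 0);                    /* the loop condition relies on nc1 != 0 */
    d1_end = d1 + nc1 * n;
    while (d1 < d1_end) {
        double *m1 = matr;
        double *m2 = matr + nc1 * nc1;
        for (size_t i = 0; i < nc1; i++) {
            double d1val = d1[i];       /* loaded once per row */
            for (size_t j = 0; j < nc1; j++)
                *m1++ += d1val * d1[j];
            for (size_t j = 0; j < nc2; j++)
                *m2++ += d1val * d2[j];
        }
        d1 += nc1;
        d2 += nc2;
    }
}
```

With two frames d1 = {1, 2, 3, 4} and d2 = {5, 7, 6, 8} (nc1 = nc2 = 2), the second block comes out as 23, 31, 34, 46, whereas the original loop order would store 23, 34, 31, 46. Any code reading the matrix afterwards would have to account for that.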