[FFmpeg-devel] [PATCH] Channels correlation

Thu Oct 29 20:02:55 CET 2009

On Octidi 8 Brumaire, year CCXVIII, Reimar Döffinger wrote:
> There is a loop over the samples around that, so you do
> loop over samples
>    loop over channel combinations
> I suggested to try
> loop over channel combinations
>     loop over samples
> The latter way should cause no extra cache pressure due to m, though
> it has a higher cache pressure due to reading the samples once per
> channel combination so it could be worse.
> It would however be much better if a planar channel layout was used -
> it might be worth investigating if you get better results if you split
> the interleaved channels into one array per channel at some other place.

OK, I had not understood that. I tried inverting the loops that way, but
it results in a significant slowdown.
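
For illustration, the two loop orders under discussion can be sketched as
below. This is a minimal sketch, not the code from the attached patch: the
function names, the int16_t sample type, the single sample count n and the
row-major nc1 x nc2 accumulator m are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (hypothetical): d1 and d2 hold interleaved samples,
 * m is an nc1 x nc2 row-major accumulator zeroed by the caller. */

/* Samples outermost: every sample frame is read exactly once,
 * but all nc1 * nc2 accumulators are live in the inner loops. */
void correl_samples_outer(int64_t *m, const int16_t *d1, int nc1,
                          const int16_t *d2, int nc2, size_t n)
{
    assert(nc1 > 0 && nc2 > 0);
    for (size_t s = 0; s < n; s++)
        for (int i = 0; i < nc1; i++)
            for (int j = 0; j < nc2; j++)
                m[i * nc2 + j] += (int64_t)d1[s * nc1 + i] * d2[s * nc2 + j];
}

/* Channel pairs outermost: one accumulator at a time, but the
 * samples are read again for every (i, j) combination. */
void correl_pairs_outer(int64_t *m, const int16_t *d1, int nc1,
                        const int16_t *d2, int nc2, size_t n)
{
    assert(nc1 > 0 && nc2 > 0);
    for (int i = 0; i < nc1; i++)
        for (int j = 0; j < nc2; j++) {
            int64_t acc = 0;
            for (size_t s = 0; s < n; s++)
                acc += (int64_t)d1[s * nc1 + i] * d2[s * nc2 + j];
            m[i * nc2 + j] += acc;
        }
}
```

Both functions compute the same sums; only the memory access pattern
differs, which is why their relative speed has to be measured rather than
predicted.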

> Though even if you don't want to do that, at least you could
> if you ensure that nc1 != 0 (probably a good idea anyway) do
> something like below, freeing the registers otherwise used for n1 and n2
> (or just use a variable n = FFMIN(n1, n2) )
> d1_end = d1 + nc1 * FFMIN(n1, n2);

I tried several variations on that theme, but none showed any improvement
over the current code, and some simple changes that should obviously
speed things up ended up slowing them down.

I think that I am interfering with the compiler optimizations, and that any
benchmark results would be highly sensitive to the version of the compiler.

I quite like the current version because it makes the symmetrical role of
the various variables (d1 and d2, n1 and n2, i and j) obvious.
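
For reference, the end-pointer variant suggested in the quoted text might
look like the sketch below. It is only an assumption about the shape of
such a loop, not the patch's code: the function name, the int16_t sample
type and the layout of m are hypothetical, and FFMIN is defined locally
here to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the usual minimum macro. */
#define FFMIN(a, b) ((a) > (b) ? (b) : (a))

/* Hypothetical signature: d1 holds n1 frames of nc1 interleaved
 * channels, d2 holds n2 frames of nc2 interleaved channels, and m is
 * an nc1 x nc2 row-major accumulator zeroed by the caller. */
void correl_ptr_end(int64_t *m, const int16_t *d1, int nc1, size_t n1,
                    const int16_t *d2, int nc2, size_t n2)
{
    assert(nc1 > 0 && nc2 > 0);

    /* A single end pointer replaces the two counters n1 and n2. */
    const int16_t *d1_end = d1 + nc1 * FFMIN(n1, n2);

    for (; d1 < d1_end; d1 += nc1, d2 += nc2)
        for (int i = 0; i < nc1; i++)
            for (int j = 0; j < nc2; j++)
                m[i * nc2 + j] += (int64_t)d1[i] * d2[j];
}
```

The loop condition now depends on d1 alone, which is exactly what gives up
the symmetry between d1 and d2 mentioned above.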

> And are nc1 and nc2 allowed to differ? I guess yes, but I am not sure.

Yes. The typical use for this code would be nc1 = 6 and nc2 = 2, to know how
surround content has been downmixed to stereo.

> No point in calculating more than one half of a self-correlation
> (and thus symmetric) matrix.

I did that optimization, as it shows a small but visible speedup.

On the other hand, for some reason, computing the lower half of the matrix
instead causes a big slowdown. Again, I think this is too close to the
compiler's optimization process to draw any conclusion: I expect that,
depending on the architecture and compiler version, almost any version can
be the fastest or the slowest.
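
As an illustration of the half-matrix optimization, here is a minimal
sketch for the self-correlation case. The names, the int16_t sample type
and the separate mirroring step are assumptions made for the example, not
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulate only the upper triangle (j >= i) of the nc x nc
 * self-correlation matrix of n interleaved frames in d.
 * m is row-major and zeroed by the caller. */
void self_correl_upper(int64_t *m, const int16_t *d, int nc, size_t n)
{
    assert(nc > 0);
    for (size_t s = 0; s < n; s++, d += nc)
        for (int i = 0; i < nc; i++)
            for (int j = i; j < nc; j++)
                m[i * nc + j] += (int64_t)d[i] * d[j];
}

/* Fill the lower triangle from the upper one, since m[j][i] == m[i][j]. */
void mirror_upper_to_lower(int64_t *m, int nc)
{
    for (int i = 0; i < nc; i++)
        for (int j = i + 1; j < nc; j++)
            m[j * nc + i] = m[i * nc + j];
}
```

For nc channels this performs nc * (nc + 1) / 2 multiplications per frame
instead of nc * nc; the mirroring is done once, after the last frame.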

I also changed some variable names, hopefully for the better.

Regards,

-- 
  Nicolas George
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ffmpeg-correl-20091029c.diff
Type: text/x-diff
Size: 12822 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20091029/69da935a/attachment.diff>