[FFmpeg-devel] [PATCH 04/12] Add vector_fmul_matrix to dsputil
Måns Rullgård
mans
Wed Oct 21 00:06:09 CEST 2009
M?ns Rullg?rd <mans at mansr.com> writes:
> Michael Niedermayer <michaelni at gmx.at> writes:
>
>> On Sun, Oct 18, 2009 at 11:29:22PM +0100, M?ns Rullg?rd wrote:
>>> Michael Niedermayer <michaelni at gmx.at> writes:
>>>
>>> > On Sun, Oct 18, 2009 at 10:13:20PM +0100, M?ns Rullg?rd wrote:
>>> >> Michael Niedermayer <michaelni at gmx.at> writes:
>>> >>
>>> >> > On Sun, Oct 18, 2009 at 09:17:48PM +0100, M?ns Rullg?rd wrote:
>>> >> >> Michael Niedermayer <michaelni at gmx.at> writes:
>>> >> > [...]
>>> >> >> >> + }
>>> >> >> >> + } else {
>>> >> >> >> + for (i = 0; i < len; i++) {
>>> >> >> >> + const float *m = mtx;
>>> >> >> >> + for (j = 0; j < w; j++) {
>>> >> >> >> + float s = 0;
>>> >> >> >
>>> >> >> >> + for (k = 0; k < w; k++)
>>> >> >> >> + s += v[k][i] * *m++;
>>> >> >> >
>>> >> >> > this is quite inefficient because for(k) v[k][i] needs 2
>>> >> >> > memory reads a flat 2d array would be better
>>> >> >>
>>> >> >> And how will the data magically transform itself into such a layout?
>>> >> >
>>> >> > What is the a reason that the data is not in that layout? If
>>> >> > the awnser is that some decoder is implemenetd that way then
>>> >> > my next question is, would there be a disadvanatge in
>>> >> > changing it?
>>> >>
>>> >> Many of the audio decoders allocate the channels separately. I didn't
>>> >> write them, so I can't say how difficult it would be to change that.
>>> >
>>> > for many channels it should even be faster to memcpy them instead of the
>>> > double dereferences
>>> > memcpy needs O(w*len)
>>> > the dereferences are O(w*w*len)
>>>
>>> I don't expect w to be greater than 8.
>>> It will probably be 2 or 6 in most cases.
>>
>> for 6 channels we have 36 dereferences, a cpy copying just
>> 1 value at a time needs 6 reads and 6 writes to get rid of these 36
>> at that naive instruction counting level, it seems my suggesting
>> with copy is faster than yours without
>
> Can you please explain exactly what you're thinking of. I thought you
> were saying the audio channel data was to be moved such that all the
> channels would be contiguous in memory instead of passing pointer to
> each. Copying it into such a layout would require w*len operations,
> not w*w, and I still don't see how that would be massively more
> efficient. I also don't understand what 36 unnecessary dereferences
> you're talking about. The entire matrix must of course be read for
> each sample. We are doing len [1 x w]*[w x w] matrix multiplications.
Ping. I'd like to get this finished.
--
M?ns Rullg?rd
mans at mansr.com
More information about the ffmpeg-devel
mailing list