[FFmpeg-devel] [PATCH 04/12] Add vector_fmul_matrix to dsputil

Mon Oct 19 01:04:06 CEST 2009

On Sun, Oct 18, 2009 at 11:29:22PM +0100, M?ns Rullg?rd wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
> 
> > On Sun, Oct 18, 2009 at 10:13:20PM +0100, M?ns Rullg?rd wrote:
> >> Michael Niedermayer <michaelni at gmx.at> writes:
> >> 
> >> > On Sun, Oct 18, 2009 at 09:17:48PM +0100, M?ns Rullg?rd wrote:
> >> >> Michael Niedermayer <michaelni at gmx.at> writes:
> >> > [...]
> >> >> >> +        }
> >> >> >> +    } else {
> >> >> >> +        for (i = 0; i < len; i++) {
> >> >> >> +            const float *m = mtx;
> >> >> >> +            for (j = 0; j < w; j++) {
> >> >> >> +                float s = 0;
> >> >> >
> >> >> >> +                for (k = 0; k < w; k++)
> >> >> >> +                    s += v[k][i] * *m++;
> >> >> >
> >> >> > this is quite inefficient because for(k) v[k][i] needs 2 memory reads
> >> >> > a flat 2d array would be better
> >> >> 
> >> >> And how will the data magically transform itself into such a layout?
> >> >
> >> > What is the a reason that the data is not in that layout?
> >> > If the awnser is that some decoder is implemenetd that way then my next
> >> > question is, would there be a disadvanatge in changing it?
> >> 
> >> Many of the audio decoders allocate the channels separately.  I didn't
> >> write them, so I can't say how difficult it would be to change that.
> >
> > for many channels it should even be faster to memcpy them instead of the
> > double dereferences
> > memcpy needs O(w*len)
> > the dereferences are O(w*w*len)
> 
> Please stop being ridiculous.

Please stop being rude

>  I don't expect w to be greater than 8.
> It will probably be 2 or 6 in most cases.

for 6 channels we have 36 dereferences, a cpy copying just
1 value at a time needs 6 reads and 6 writes to get rid of these 36
at that naive instruction counting level, it seems my suggesting
with copy is faster than yours without

> 
> Which do you prefer, getting 99% of the speed now, with little effort,
> or 99.5% of the speed at some indeterminate future time, when someone
> has rewritten all the code to use some other data layout, and you have
> reviewed and approved the rewrite?  

let us take 99.2% speed now with what i suggested and some code copying
the inefficiently stored data into more efficiently.
This allows decoders to be adapted one by one (or none ...) and each
can then just drop the extra step reaching 99.5%

> Your absurd requirements often
> make people give up and do nothing at all, which is usually the worst
> of all alternatives.  Is that the way you like it?

no, i like it if people talk with me if they see a problem, you do now
thats great. But it seems at least in the case here its just a
misunderstanding, i am not requireing any decoders to be changed for
this code to be commited.

> 
> > also, maybe mtx would be more convenient for SIMD if its transposed
> > before the function 
> 
> That's quite possible, but we can change that later.

true but the later its done, the more code has to be changed at once.
Now its just a unused dsp function that would need a trvial change.

Later if the decoders are adapted to the current API not only would
some possibly have to be changed back it also all woud have to be done
at once as things would break otherwise
I do think its better to decide
on which way the matrix is better stored now.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Old school: Use the lowest level language in which you can solve the problem
            conveniently.
New school: Use the highest level language in which the latest supercomputer
            can solve the problem without the user falling asleep waiting.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20091019/1a25426e/attachment.pgp>