[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try#6
Tue Aug 28 06:04:36 CEST 2007
On Tue, Aug 28, 2007 at 01:07:03AM +0200, Balatoni Denes wrote:
> On Monday 27 August 2007 at 23:25, Michael Niedermayer wrote:
> > i suspect that this can be improved by slightly changing
> > some coefficients or bias ...
> What bias?
hmmm darn i hate sparc ...
i meant the bias which is added before shifting right at the end of the
well just forget that ...
> Change the coefficients how? I would be interested to know (really,
> I am curious).
well, like adding or subtracting 1 from a coefficient and running dct-test
to see whether the error gets smaller
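A rough sketch of what such a coefficient search could look like, in plain C rather than VIS assembly. Everything here is an assumption made for illustration: the 14-bit precision (SHIFT), the 1-D 8-point transform, the pseudo-random input range and the helper names are hypothetical, and the error measure is a stand-in for what dct-test reports, not its actual code.

```c
#include <assert.h>
#include <stdint.h>

#define N      8
#define SHIFT  14      /* assumed fixed-point precision, not taken from the patch */
#define TRIALS 2000    /* number of pseudo-random test vectors */

/* cos(k*pi/16) for k = 0..7 */
static const double COS16[N] = {
    1.0,              0.98078528040323, 0.92387953251129, 0.83146961230255,
    0.70710678118655, 0.55557023301960, 0.38268343236509, 0.19509032201613
};

/* Fold the angle (2x+1)*u*pi/16 into a table index 1..7 plus a sign (u = 1..7). */
static int fold(int x, int u, int *sign)
{
    int m = ((2 * x + 1) * u) % 32;          /* one period is 32 units of pi/16 */
    *sign = 1;
    if (m > 16) m = 32 - m;                  /* cos(2*pi - a) =  cos(a) */
    if (m > 8) { m = 16 - m; *sign = -1; }   /* cos(pi - a)   = -cos(a) */
    return m;
}

/* Round cos(k*pi/16) * 2^SHIFT to integers; coef[0] is never used. */
static void init_coefficients(int coef[N])
{
    for (int k = 0; k < N; k++)
        coef[k] = (int)(COS16[k] * (1 << SHIFT) + 0.5);
}

/* Integer 1-D IDCT. The bias is added before the final right shift.
 * Relies on >> being an arithmetic shift for negative sums. */
static void idct_fixed(const int coef[N], int bias, const int16_t in[N], int out[N])
{
    assert(bias >= 0);
    for (int x = 0; x < N; x++) {
        int sum = coef[4] * in[0];           /* DC term: sqrt(1/2) = cos(pi/4) */
        for (int u = 1; u < N; u++) {
            int sign, m = fold(x, u, &sign);
            sum += sign * coef[m] * in[u];
        }
        out[x] = (sum + bias) >> (SHIFT + 1);
    }
}

/* Floating-point reference for the same transform. */
static void idct_ref(const int16_t in[N], double out[N])
{
    for (int x = 0; x < N; x++) {
        double sum = COS16[4] * in[0];
        for (int u = 1; u < N; u++) {
            int sign, m = fold(x, u, &sign);
            sum += sign * COS16[m] * in[u];
        }
        out[x] = 0.5 * sum;
    }
}

/* Sum of squared errors against the reference over a fixed input set. */
static double total_sq_error(const int coef[N], int bias)
{
    uint32_t state = 1;                      /* fixed seed: same inputs every call */
    double err = 0.0;
    for (int t = 0; t < TRIALS; t++) {
        int16_t in[N];
        int fixed[N];
        double ref[N];
        for (int u = 0; u < N; u++) {
            state = state * 1103515245u + 12345u;
            in[u] = (int16_t)((int)((state >> 16) & 0x1FF) - 256);
        }
        idct_fixed(coef, bias, in, fixed);
        idct_ref(in, ref);
        for (int x = 0; x < N; x++) {
            double d = fixed[x] - ref[x];
            err += d * d;
        }
    }
    return err;
}

/* Try +-1 on each coefficient and keep a change only if the error drops. */
static double tune_coefficients(int coef[N], int bias)
{
    double best = total_sq_error(coef, bias);
    for (int k = 1; k < N; k++) {
        for (int delta = -1; delta <= 1; delta += 2) {
            coef[k] += delta;
            double err = total_sq_error(coef, bias);
            if (err < best)
                best = err;                  /* keep the perturbed coefficient */
            else
                coef[k] -= delta;            /* revert */
        }
    }
    return best;
}
```

Calling init_coefficients() and then tune_coefficients(coef, 1 << SHIFT) returns the lowest error found; the same loop could be wrapped around the bias as well. Whether any coefficient actually moves depends on the test vectors and the precision.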
> > > > > ok, but then you should move the for up so it's not immediately before
> > > > > a fcmpd using its result
> > > >
> > > > Ok, done.
> > >
> > > Well, I moved them back, because it broke sparse matrices.
> > well you do have to change the used register of course
> There are only two registers left. The code would look like crap.
> > well, your current code mixes the even and odd calculations, thus it would
> > require twice as many intermediates. a proper implementation would not,
> > and thus would only need 4 registers to accumulate values until the
> > butterfly. also 1 register would become available after each column, thus
> > 0. column 9 registers available
> > 2. column 6 registers available
> > 4. column 7 registers available
> > 6. column 8 registers available
> > 1. column 9 registers available
> > 3. column 6 registers available
> > 5. column 7 registers available
> > 7. column 8 registers available
> Ok, I understand what you mean. I did some calculations. On the UltraSPARC III
> (4 clock latency) about 14 clocks would be spent waiting - that's not too
> bad, that's still an 18 clock speed improvement. However on the UltraSPARC T2
> (Niagara 2, 6 clock latency) about 36 clocks would be spent waiting - that
> would be slower than before the rewrite. So it's a bad idea.
well, and what if you combine the code for 2 columns? that is, 2 even ones
or 2 odd ones, not an even/odd mix ...
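For readers following the register argument, here is a minimal scalar C sketch of the even/odd split being discussed. It is not the VIS code from the patch; the function names, the floating-point arithmetic and the 1-D 8-point layout are assumptions chosen to keep the example short. The point it illustrates is that the even-indexed inputs only ever feed 4 accumulators, the odd-indexed inputs feed another 4, and the two halves meet only in the final butterfly.

```c
#include <assert.h>

#define N 8

/* cos(k*pi/16) for k = 0..7 */
static const double COS16[N] = {
    1.0,              0.98078528040323, 0.92387953251129, 0.83146961230255,
    0.70710678118655, 0.55557023301960, 0.38268343236509, 0.19509032201613
};

/* Basis value c(u) * cos((2x+1)*u*pi/16), with c(0) = sqrt(1/2). */
static double basis(int x, int u)
{
    if (u == 0)
        return COS16[4];                     /* sqrt(1/2) = cos(pi/4) */
    int m = ((2 * x + 1) * u) % 32;          /* angle in units of pi/16 */
    int sign = 1;
    if (m > 16) m = 32 - m;                  /* cos(2*pi - a) =  cos(a) */
    if (m > 8) { m = 16 - m; sign = -1; }    /* cos(pi - a)   = -cos(a) */
    return sign * COS16[m];
}

/* Direct form: every output sums over all 8 inputs. */
static void idct_direct(const double in[N], double out[N])
{
    for (int x = 0; x < N; x++) {
        double sum = 0.0;
        for (int u = 0; u < N; u++)
            sum += basis(x, u) * in[u];
        out[x] = 0.5 * sum;
    }
}

/* Split form: 4 even and 4 odd accumulators, merged by one butterfly. */
static void idct_split(const double in[N], double out[N])
{
    double even[4] = {0.0, 0.0, 0.0, 0.0};
    double odd[4]  = {0.0, 0.0, 0.0, 0.0};

    for (int u = 0; u < N; u += 2)           /* inputs 0, 2, 4, 6 */
        for (int x = 0; x < 4; x++)
            even[x] += basis(x, u) * in[u];

    for (int u = 1; u < N; u += 2)           /* inputs 1, 3, 5, 7 */
        for (int x = 0; x < 4; x++)
            odd[x] += basis(x, u) * in[u];

    for (int x = 0; x < 4; x++) {            /* butterfly */
        out[x]     = 0.5 * (even[x] + odd[x]);
        out[7 - x] = 0.5 * (even[x] - odd[x]);
    }
}
```

The butterfly works because the basis for output 7-x equals the basis for output x on even inputs and its negation on odd inputs, so only outputs 0..3 need to be accumulated.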
> > > Anyhow, do as you wish, I am off to have dinner
> > this decision is easy, patch rejected
> You forgot to give a good reason, because your argument seems flawed.
the code is suboptimal speedwise and you try to convince me that it can't
be improved instead of trying to improve the code
your code does a lot of stores which are followed by loads. many of them
can be avoided with no changes to the available registers, yet you don't.
you rather concentrate on arguing what in your opinion can't be done
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
It is dangerous to be right in matters on which the established authorities
are wrong. -- Voltaire