[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try#6
Tue Aug 28 01:07:03 CEST 2007
On Monday 27 August 2007 at 23:25, Michael Niedermayer wrote:
> i suspect that this can be improved by slightly changing
> some coefficients or bias ...
What bias? Change the coefficients how? I would be interested to know (really,
I am curious).
> > > > ok, but then you should move the for up so it's not immediately before
> > > > a fcmpd using its result
> > >
> > > Ok, done.
> > Well, I moved them back, because it broke sparse matrices.
> well you do have to change the used register of course
There are only two registers left. The code would look like crap.
> well, your current code mixes the even and odd calculations, so it would
> require twice as many intermediates. a proper implementation would not,
> and thus would only need 4 registers to accumulate values until the
> butterfly. also, 1 register would become available after each column, thus:
> 0. column 9 registers available
> 2. column 6 registers available
> 4. column 7 registers available
> 6. column 8 registers available
> 1. column 9 registers available
> 3. column 6 registers available
> 5. column 7 registers available
> 7. column 8 registers available
Ok, I understand what you mean. I did some calculations. On the UltraSPARC III
(4 clock latency) about 14 clocks would be spent waiting - that's not too
bad; that's still an 18 clock speed improvement. However, on the UltraSPARC T2
(Niagara 2, 6 clock latency) about 36 clocks would be spent waiting - that
would be slower than before the rewrite. So it's a bad idea.
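
The arithmetic behind those two figures can be written out as a small
sketch. It assumes that the reordering saves the same gross number of
clocks on both CPUs and that only the stall count differs; the 32-clock
gross figure is inferred from 18 + 14 and is not a measurement.
net_gain_clocks is a hypothetical helper name.

```c
/* Net clocks gained by the reordering: gross saving minus clocks lost
 * waiting on instruction latency. Both inputs are estimates. */
static int net_gain_clocks(int gross_saving, int stall_clocks)
{
    return gross_saving - stall_clocks;
}

/* Assumed gross saving, inferred from the UltraSPARC III figures:
 * an 18 clock net gain after 14 stall clocks implies 18 + 14 = 32. */
enum { ASSUMED_GROSS_SAVING = 32 };
```

Under that assumption, net_gain_clocks(ASSUMED_GROSS_SAVING, 14) gives the
18 clocks quoted for the UltraSPARC III, while
net_gain_clocks(ASSUMED_GROSS_SAVING, 36) comes out negative for the
UltraSPARC T2, i.e. slower than before the rewrite.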
> > Anyhow, do as you wish, I am off to have dinner
> this decision is easy, patch rejected
You forgot to give a good reason, because your argument seems flawed.