[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try#6

Balatoni Denes dbalatoni
Tue Aug 28 01:07:03 CEST 2007


Hi!

On Monday 27 August 2007 at 23:25, Michael Niedermayer wrote:
> i suspect that this can be improved by slightly changing
> some coefficients or bias ...

What bias? Change the coefficients how? I would be interested to know (really, 
I am curious).

> > > > ok, but then you should move the for up so it's not immediately before
> > > > a fcmpd using its result
> > >
> > > Ok, done.
> >
> > Well, I moved them back, because it broke sparse matrices.
>
> well you do have to change the used register of course

There are only two registers left. The code would look like crap.

> well, your current code mixes the even and odd calculations, thus it would
> require twice as many intermediates. a proper implementation would not,
> and thus would only need 4 registers to accumulate values until the
> butterfly. also 1 register would become available after each column, thus
>
> 0. column 9 registers available
> 2. column 6 registers available
> 4. column 7 registers available
> 6. column 8 registers available
> 1. column 9 registers available
> 3. column 6 registers available
> 5. column 7 registers available
> 7. column 8 registers available
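
For readers following the even/odd argument, here is a minimal sketch in plain C of one way to read it. It is not the VIS code from the patch, and it uses floating point rather than the fixed-point coefficients of simple_idct. The names idct8_direct, idct8_even_odd, cos16 and COS16 are made up for this illustration. The even input coefficients (0, 2, 4, 6) are accumulated into four values, the odd ones (1, 3, 5, 7) into four more, and a final butterfly produces all eight outputs.

```c
/* cos(m * pi / 16) for m = 0..8; other angles follow by symmetry. */
static const double COS16[9] = {
    1.0,
    0.98078528040323044,
    0.92387953251128674,
    0.83146961230254524,
    0.70710678118654752,
    0.55557023301960222,
    0.38268343236508977,
    0.19509032201612827,
    0.0
};

/* cos(m * pi / 16) for any non-negative integer m (hypothetical helper). */
static double cos16(int m)
{
    m %= 32;                /* period 2*pi is 32 steps of pi/16 */
    if (m > 16)
        m = 32 - m;         /* cos(2*pi - x) == cos(x) */
    return (m > 8) ? -COS16[16 - m] : COS16[m];  /* cos(pi - x) == -cos(x) */
}

/* Reference: straightforward 8-point inverse DCT, 64 multiplies. */
static void idct8_direct(const double in[8], double out[8])
{
    for (int n = 0; n < 8; n++) {
        double sum = 0.0;
        for (int k = 0; k < 8; k++) {
            double scale = (k == 0) ? COS16[4] : 1.0;  /* 1/sqrt(2) for DC */
            sum += scale * in[k] * cos16((2 * n + 1) * k);
        }
        out[n] = 0.5 * sum;
    }
}

/* Even/odd split: 4 accumulators per half, then one butterfly. */
static void idct8_even_odd(const double in[8], double out[8])
{
    double even[4] = { 0.0, 0.0, 0.0, 0.0 };
    double odd[4]  = { 0.0, 0.0, 0.0, 0.0 };

    /* Coefficients 0, 2, 4, 6 contribute symmetrically to out[n], out[7-n]. */
    for (int k = 0; k < 8; k += 2) {
        double scale = (k == 0) ? COS16[4] : 1.0;
        for (int n = 0; n < 4; n++)
            even[n] += scale * in[k] * cos16((2 * n + 1) * k);
    }

    /* Coefficients 1, 3, 5, 7 contribute with opposite sign to out[7-n]. */
    for (int k = 1; k < 8; k += 2)
        for (int n = 0; n < 4; n++)
            odd[n] += in[k] * cos16((2 * n + 1) * k);

    /* Butterfly: sum gives the first half, difference the mirrored half. */
    for (int n = 0; n < 4; n++) {
        out[n]     = 0.5 * (even[n] + odd[n]);
        out[7 - n] = 0.5 * (even[n] - odd[n]);
    }
}
```

Both functions should give the same result up to rounding. The split version only ever has four values in flight per half, which is the register saving the quoted text refers to. Whether that maps onto the register counts listed above depends on the actual VIS code.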

Ok, I understand what you mean. I did some calculations. On the UltraSPARC III 
(4 clock latency) about 14 clocks would be spent waiting - that's not too 
bad, that's still an 18 clock speed improvement. However, on the UltraSPARC T2 
(Niagara 2, 6 clock latency) about 36 clocks would be spent waiting - that 
would be slower than before the rewrite. So it's a bad idea.
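
The kind of estimate above can be illustrated with a toy model. This is only a sketch under strong assumptions: a single-issue, in-order pipeline, a fixed result latency, and a fixed number of independent accumulator chains issued round-robin. The function stall_cycles and its parameters are hypothetical, and the model is not expected to reproduce the 14 and 36 clock figures, which come from the real instruction schedule.

```c
/*
 * Toy model (assumption: single-issue, in-order, fixed latency).
 * `chains` independent accumulators are updated round-robin, `length`
 * times each. Every update depends on the previous update of the same
 * chain, whose result becomes usable `latency` cycles after issue.
 * Returns the number of cycles spent waiting, or -1 on bad input.
 */
static int stall_cycles(int latency, int chains, int length)
{
    int ready[16] = { 0 };  /* cycle at which each chain's value is usable */
    int t = 0;              /* current issue cycle */

    if (latency < 1 || chains < 1 || chains > 16 || length < 0)
        return -1;

    for (int i = 0; i < length; i++) {
        for (int c = 0; c < chains; c++) {
            if (t < ready[c])
                t = ready[c];        /* wait for the previous result */
            ready[c] = t + latency;  /* this result is usable later */
            t++;                     /* one instruction issued per cycle */
        }
    }
    return t - chains * length;      /* total cycles minus useful issues */
}
```

In this model, four interleaved chains fully hide a 4 clock latency, while the same schedule with a 6 clock latency waits two cycles per round after the first. That matches the general point: a schedule with few registers in flight costs more on the CPU with the longer latency.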

> > Anyhow, do as you wish, I am off to have dinner
>
> this decision is easy, patch rejected

You forgot to give a good reason, because your argument seems flawed.

bye
Denes



