[FFmpeg-devel] [PATCH] SPARC VIS simple_idct

Balatoni Denes dbalatoni
Sat Aug 25 01:20:45 CEST 2007

Hi Michael!

On Friday 24 August 2007 at 23:59, Michael Niedermayer wrote:
> ok, but then you should move the for up so it's not immediately before
> a fcmpd using its result

Ok, done.

> there are 32 64-bit registers, these should be enough to do the idct without
> an intermediate store-load
> the whole 8x8 block needs 16 registers, 7 for the constant coefficients
> that leaves 9 available

It would be slower. In the idct's current form there are 8 independent 
VIS instructions one after another, so instruction latency is not a 
problem. If you only use 9 registers, then good luck with latency. 
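
To illustrate the latency argument, here is a rough sketch in C. It is only
a toy model: the names (model_cycles, model_stalls), the one-instruction-per-
cycle in-order pipeline and the latency value are all assumptions made for
illustration, not measurements of any real SPARC core. The 16-bit coefficient
size is inferred from the quoted "8x8 block needs 16 registers".

```c
#include <assert.h>

/* Register budget from the quoted review: 32 64-bit registers in total. */
enum {
    TOTAL_REGS   = 32,
    COEFF_BITS   = 16,                       /* inferred: 64 coeffs in 16 regs */
    BLOCK_REGS   = 8 * 8 * COEFF_BITS / 64,  /* 8x8 block -> 16 registers */
    CONST_REGS   = 7,                        /* constant coefficients */
    SCRATCH_REGS = TOTAL_REGS - BLOCK_REGS - CONST_REGS  /* 9 left over */
};

/*
 * Toy model (an assumption, not a description of real hardware):
 * one instruction is issued per cycle, in order, and a result can be
 * consumed `latency` cycles after its producer was issued. The work is
 * `stages` dependent steps of `width` independent instructions each, where
 * instruction i of a step consumes the result of instruction i of the
 * previous step. Consecutive steps then start max(width, latency) cycles
 * apart.
 */
static int model_cycles(int stages, int width, int latency)
{
    assert(stages > 0 && width > 0 && latency > 0);
    int spacing = width > latency ? width : latency;
    return stages * spacing;
}

/* Cycles lost to waiting, compared with issuing back to back. */
static int model_stalls(int stages, int width, int latency)
{
    return model_cycles(stages, width, latency) - stages * width;
}
```

With a hypothetical latency of 4 cycles, 64 instructions arranged as 8 steps
of 8 independent instructions give model_stalls(8, 8, 4) == 0, while the same
64 instructions arranged as 32 steps of 2 give model_stalls(32, 2, 4) == 64.
Fewer free registers means fewer independent instructions in flight, which is
the trade-off being argued here.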

> if you think that this patch will be accepted due to you whining about how much
> time you have spent on it already then you live in some strange fantasy world

It's not about time anymore. I didn't like your suggestions, mainly because 
they either 1) are impossible or 2) make a convoluted mess of the code (which 
is quite simple now) for minimal gain. I liked your ideas up to now 
(including trying 8 bit coefficients and the negate thing), and after (or maybe 
before) I stopped whining I did implement them. But not this time.

> either you make the improvements (or argue why it's not possible or wouldn't
> make sense ...) or your patch won't be applied
> also keep in mind i have as well spent considerable time on this already
> (reading the sparc asm manuals, reviewing your code, ...) and don't complain
> hey i even have learned a lot about sparcs
> if i accept half optimal/working patches because the author has not enough
> time or interest to implement it properly then ffmpeg would be half broken
> and running at half the speed it does now
> it's not the "it's just 0.X% overall", your changes in the last 2 days made
> your idct 40% faster or so (it was around 1000 and then 1400/sec IIRC)
> if i would accept patches in general which
> are 40% slower than they could be then ffmpeg as a whole would be 40%
> slower than it could be ...
> also about the code becoming messy, i am sure this can be avoided by
> properly implementing it

I think what you are suggesting are such low level optimizations that the 
assembly code will lose its clean structure. I know the idct could be made 
a few percent faster (using your ideas), but I am not interested in this 
kind of optimization - because, again, it clutters up the code no matter 
what you do, and the gain is - although probably measurable - little.

I attached a patch with the for moved further up, away from the fcmpd.


PS: there is a version of this idct that is half as fast but accurate (32 bit 
multiplies) - I am wondering if maybe that would make more sense in ffmpeg. 
Or there could be three sparc idcts: one slow (but faster than the C version) 
and accurate, one faster but less accurate, and then the mlib one (fastest, very 
inaccurate, mpeg4 routinely turns pink while viewing). Maybe it's not a good 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: simple_idct_vis_try5.diff
Type: text/x-diff
Size: 21987 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070825/9c1f4de9/attachment.diff>
