[FFmpeg-devel] [PATCH] SPARC VIS simple_idct
Sat Aug 25 01:20:45 CEST 2007
On Friday 24 August 2007 at 23:59, Michael Niedermayer wrote:
> ok, but then you should move the for up so it's not immediately before
> an fcmpd using its result
> there are 32 64-bit registers; these should be enough to do the idct without
> an intermediate store-load
> the whole 8x8 block needs 16 registers, 7 for the constant coefficients
> that leaves 9 available
It would be slower. In its current form, the idct has 8 independent VIS
instructions one after another, so instruction latency is not a problem.
If you only use 9 registers, then good luck with latency.
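The latency argument can be illustrated with a generic C sketch. This is not
the VIS code from the patch: the function names and the dot-product workload
are assumptions chosen only to show the idea. With one accumulator, every add
waits for the previous one; with eight, consecutive operations are independent
and a pipelined CPU can overlap them. A compiler may or may not transform one
form into the other, so treat the speed difference as something to measure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One dependency chain: each add depends on the result of the previous add. */
uint32_t dot_one_chain(const uint16_t *a, const uint16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (uint32_t)a[i] * b[i];
    return acc;
}

/* Eight independent chains: acc[0]..acc[7] never depend on each other
 * inside the main loop, which needs eight live registers instead of one. */
uint32_t dot_eight_chains(const uint16_t *a, const uint16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    uint32_t acc[8] = {0};
    size_t i = 0;

    for (; i + 8 <= n; i += 8)
        for (int k = 0; k < 8; k++)
            acc[k] += (uint32_t)a[i + k] * b[i + k];

    /* Leftover elements when n is not a multiple of 8. */
    for (; i < n; i++)
        acc[0] += (uint32_t)a[i] * b[i];

    /* Unsigned addition wraps modulo 2^32, so the order of summation
     * does not change the result. */
    uint32_t total = 0;
    for (int k = 0; k < 8; k++)
        total += acc[k];
    return total;
}
```

Both functions return the same value; the second one trades registers for
independence between operations, which is the trade-off under discussion.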
> if you think that this patch will be accepted due to you whining about how
> much time you have spent on it already then you live in some strange fantasy
> world
It's not about time anymore. I didn't like your suggestions, mainly because
either they are 1) impossible or 2) make a convoluted mess of the code (which
is quite simple now) for minimal gain. I liked your ideas up to now
(including trying 8 bit coefficients and the negate thing), and after (maybe
before) I stopped whining I did implement them. But not this time.
> either you make the improvements (or argue why it's not possible or wouldn't
> make sense ...) or your patch won't be applied
> also keep in mind i have as well spent considerable time on this already
> (reading the sparc asm manuals, reviewing your code, ...) and don't complain
> hey i even have learned a lot about sparcs
> if i accept half optimal/working patches because the author has not enough
> time or interest to implement it properly then ffmpeg would be half broken
> and running at half the speed it does now
> it's not the "it's just 0.X% overall" case; your changes in the last 2 days
> made your idct 40% faster or so (it was around 1000 and then 1400/sec IIRC)
> if i would accept patches in general which
> are 40% slower than they could be then ffmpeg as a whole would be 40%
> slower than it could be ...
> also about the code becoming messy, i am sure this can be avoided by
> properly implementing it
I think what you are suggesting are such low-level optimizations that the
assembly code will lose its clean structure. I know the idct could be made
a few percent faster (using your ideas), but I am not interested in this
kind of optimization, because, again, it clutters up the code no matter
what you do, and the gain is little, although probably measurable.
I attached a patch, with the for moved further behind the fcmpd.
PS: there is a version of this idct that is half as fast but accurate (32-bit
multiplies); I am wondering if maybe that would make more sense in ffmpeg.
Or there could be three sparc idcts: one slow (but faster than the C version)
and accurate, one faster but less accurate, and then the mlib one (fastest, very
inaccurate, mpeg4 routinely turns pink while viewing). Maybe it's not a good
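The speed versus accuracy trade-off between wider and narrower multiplies can
be sketched in plain C. This is a generic fixed-point illustration, not the
VIS instructions and not the code in the patch: the Q15 coefficient format,
the function names, and the way the narrow path drops bits are all
assumptions made for the example.

```c
#include <stdint.h>

/* Wide path: keep the exact 32-bit product and round once at the end.
 * c is a Q15 coefficient (32768 would represent 1.0).
 * Assumes arithmetic right shift for negative values, which common
 * compilers provide but the C standard leaves implementation-defined. */
int16_t mul_q15_wide(int16_t x, int16_t c)
{
    int32_t p = (int32_t)x * c;
    return (int16_t)((p + (1 << 14)) >> 15);
}

/* Narrow path: keep only the upper 16 bits of the product, then rescale.
 * One bit of precision is lost and nothing is rounded, so results can
 * differ from the wide path by a small amount per multiply. */
int16_t mul_q15_narrow(int16_t x, int16_t c)
{
    int32_t p = (int32_t)x * c;
    return (int16_t)((p >> 16) * 2);
}
```

In an idct many such multiplies feed into each other, so small per-multiply
differences can add up across the 8x8 block.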
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 21987 bytes
Desc: not available