[FFmpeg-devel] [PATCH 04/10] lavc/flacenc: add sse4 version of the lpc encoder
christophe.gisquet at gmail.com
Thu Feb 13 16:09:37 CET 2014
2014-02-13 James Darnley <james.darnley at gmail.com>:
> On 2014-02-12 12:41, Christophe Gisquet wrote:
> I managed to reduce the function to 5 auto-loaded args. It doesn't much
> matter where r5mp (shift) really is, as I only use it once; after that I
> can use r5 as I want. That means I don't need r7 on x64, so I have
> dropped that down to 7 registers.
> More reductions don't seem worth the amount of code *I think* I would
> have to add (it is a lot!) to ensure correct loading on all 3 platforms.
> With the length of these functions, I don't think avoiding storing 1
> more register would save much time at all.
Yes, and they are not worth more of your time. It's mostly a trick I
was taught on this mailing list to shave off the last precious cycles,
and those kinds of functions end up never being touched again.
> Actually, I reuse r2 (len) on x86 to hold one of my j offsets above,
> so I couldn't do this trick without working around that. I will have a
> go at it and see how much extra code it requires.
It depends mostly on the number of iterations of each loop, and it's
probably a negligible benefit. To me, the function already looks good
enough, whatever you end up doing.
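For context on the register discussion above, here is a hypothetical scalar C sketch of what an LPC residual encoder computes. It assumes a six-argument layout (res, smp, len, order, coefs, shift), which would make shift the sixth argument (r5/r5mp in x86inc numbering); that layout and the name lpc_encode_ref are assumptions for illustration, not the code from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference (not the patch's code). Assumed argument
 * order: res, smp, len, order, coefs, shift, so shift would be r5. */
static void lpc_encode_ref(int32_t *res, const int32_t *smp, int len,
                           int order, const int32_t *coefs, int shift)
{
    assert(shift >= 0 && shift < 32);

    /* The first 'order' samples have no full history; copy them as-is. */
    for (int i = 0; i < order; i++)
        res[i] = smp[i];

    for (int i = order; i < len; i++) {
        /* Prediction: weighted sum of the previous 'order' samples. */
        int32_t p = 0;
        for (int j = 0; j < order; j++)
            p += coefs[j] * smp[i - j - 1];
        /* Residual = sample minus the prediction scaled down by shift. */
        res[i] = smp[i] - (p >> shift);
    }
}
```

With smp = {10, 20, 30, 45, 50}, coefs = {4, -2}, order 2 and shift 1, the residual comes out as {10, 20, 0, 5, -10}. The shift count is the same for every sample, so in a SIMD version it only needs to be moved into a vector register once, before the loop; after that, the general-purpose register that held it is free, which is the reuse of r5 described in the quoted message.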