[FFmpeg-devel] [PATCH] SIMD-optimized exponent_min() for ac3enc
Mon Jan 17 14:10:43 CET 2011
On 01/16/2011 09:19 PM, Loren Merritt wrote:
> On Sun, 16 Jan 2011, Justin Ruggles wrote:
>> Reversing the outer loop seems unrelated to what you've mentioned. I
>> don't see how it helps. Is it actually faster to have an extra add
>> instead of an offset in the load and store?
> The point was to make expq point to the base of the current inner loop.
> Any change in addressing of the outer loop is a side-effect, and isn't
> supposed to affect speed.
OK, I think I've got it now.
I was stuck on the idea of reading exp first and then comparing it against
the following blocks, until I finally realized the order doesn't matter.
Now the inner loop starts at exp+offset and ends at exp, so sub+jae works fine.
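
For anyone following along, here is a rough C sketch of that loop shape. It
is not the code from the patch, only an illustration under a few assumptions:
the name exponent_min_sketch and the BLOCK_STRIDE constant are made up for
this example, and I'm assuming the exponents of consecutive blocks sit 256
bytes apart. Since min() doesn't care about order, the read of exp itself can
simply be the last iteration of the countdown, so the loop needs only a
subtract and a sign test.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed distance in bytes between the same coefficient in consecutive
 * blocks; hypothetical constant for this sketch. */
#define BLOCK_STRIDE 256

/* For each coefficient, store the minimum exponent over the current block
 * and the next num_reuse_blocks blocks back into the current block. */
static void exponent_min_sketch(uint8_t *exp, int num_reuse_blocks,
                                int nb_coefs)
{
    if (num_reuse_blocks <= 0)
        return;

    for (int i = 0; i < nb_coefs; i++) {
        /* Start at the farthest block: exp + offset. */
        ptrdiff_t offset = (ptrdiff_t)num_reuse_blocks * BLOCK_STRIDE;
        uint8_t min_exp = exp[i + offset];

        /* Step back one block at a time. The final pass has offset == 0,
         * so it reads exp[i] itself. This is the C analogue of
         * "sub offset, stride" followed by "jae". */
        while ((offset -= BLOCK_STRIDE) >= 0) {
            uint8_t e = exp[i + offset];
            if (e < min_exp)
                min_exp = e;
        }
        exp[i] = min_exp;
    }
}
```

The point of counting down is that the loop condition falls out of the
subtraction itself, so no separate compare against an end pointer is needed.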
New patch attached. The best benchmark results are pretty much the same, but
on average the new version is more consistently faster.