[FFmpeg-devel] [PATCH] SIMD-optimized exponent_min() for ac3enc

Justin Ruggles justin.ruggles
Mon Jan 17 14:10:43 CET 2011


On 01/16/2011 09:19 PM, Loren Merritt wrote:

> On Sun, 16 Jan 2011, Justin Ruggles wrote:
>>
>> Reversing the outer loop seems unrelated to what you've mentioned.  I
>> don't see how it helps.  Is it actually faster to have an extra add
>> instead of an offset in the load and store?
> 
> The point was to make expq point to the base of the current inner loop. 
> Any change in addressing of the outer loop is a side-effect, and isn't 
> supposed to affect speed.


ok, I think I've got it now.

I was stuck on reading exp first and then comparing the following blocks,
until I finally realized the order doesn't matter.  Now the inner loop
starts at exp+offset and ends at exp, so sub+jae works fine.
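
For anyone following along, here is a rough scalar C sketch of that loop
shape.  It is not the attached patch (which is SIMD asm).  The function
name, the BLOCK_STRIDE value of 256 and the one-coefficient-at-a-time
outer loop are assumptions made for illustration only.  The inner loop
starts at base+offset, steps the offset down by one block each pass, and
stops once the subtraction goes below zero, which is the condition that
sub+jae tests in asm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed spacing, in bytes, between the exponent sets of consecutive blocks. */
#define BLOCK_STRIDE 256

/* Hypothetical scalar stand-in for the SIMD routine; not the attached patch. */
static void exponent_min_sketch(uint8_t *exp, int num_reuse_blocks, int nb_coefs)
{
    assert(num_reuse_blocks >= 0 && nb_coefs >= 0);

    for (int i = 0; i < nb_coefs; i++) {
        /* base plays the role of expq: the base of the current inner loop */
        uint8_t *base = exp + i;
        /* start at the last reuse block and walk back towards base */
        ptrdiff_t offset = (ptrdiff_t)num_reuse_blocks * BLOCK_STRIDE;
        uint8_t min_exp = base[offset];

        do {
            if (base[offset] < min_exp)
                min_exp = base[offset];
            offset -= BLOCK_STRIDE;   /* asm: sub offset, stride     */
        } while (offset >= 0);        /* asm: jae while no borrow    */

        /* offset 0 (the first block) was the last element compared */
        *base = min_exp;
    }
}
```

Since min is commutative, visiting the first block last gives the same
result as reading it first, and counting the offset down to zero lets the
subtraction double as the loop test.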

New patch attached.  The best-case benchmarks are pretty much the same,
but the average speed is faster more consistently.

Thanks,
Justin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ac3_exponent_min.patch
Type: text/x-patch
Size: 12780 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110117/785c74b7/attachment.bin>


