[FFmpeg-devel] [PATCH] Add x86-optimized versions of exponent_min().
Loren Merritt
lorenm
Thu Feb 3 06:05:23 CET 2011
On Wed, 2 Feb 2011, Justin Ruggles wrote:
> On 01/31/2011 03:19 PM, Ronald S. Bultje wrote:
>> On Mon, Jan 31, 2011 at 2:53 PM, Loren Merritt wrote:
>>>
>>> I usually blame such weird results on code alignment, but I have no
>>> systematic way to fix them.
>>
>> Same here, try adding an ALIGN <num> (8 or 16) directly before a loop
>> statement, or disassemble before/after and see where alignment could
>> cause issues.
>
> Thanks for the suggestion. Below is a chart of the results for
> adding ALIGN 8 and ALIGN 16 before each of the 2 loops.
>
> LOOP1/LOOP2   MMX     MMX2    SSE2
> -----------------------------------
> NONE/NONE   : 5270    5283    2757
> NONE/8      : 5200    5077    2644
> NONE/16     : 5723    3961    2161
> 8/NONE      : 5214    5339    2787
> 8/8         : 5198*   5083    2722
> 8/16        : 5936    3902    2128
> 16/NONE     : 6613    4788    2580
> 16/8        : 5490    3702    2020
> 16/16       : 5474    3680*   2000*
Other things that affect instruction size/count, and therefore alignment,
include:
* compiling for x86_32 vs x86_64-unix vs win64
* register size (d vs q as per my previous patch)
* whether PIC is enabled (not relevant this time because this function
  doesn't use any static consts)
* and sometimes not only the mod16 or mod64 alignment matters, but also
  the difference in memory address between this function and the rest of the
  library.
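To make the arithmetic concrete, here is a minimal Python sketch (not part of the patch; the byte offsets are invented for illustration) of how much padding an ALIGN directive would have to emit before a loop. It also shows how a few extra bytes of instruction encoding earlier in the function change that amount.

```python
def padding_needed(offset, align):
    """Bytes of padding needed to move `offset` up to the next multiple of `align`."""
    return -offset % align

# Hypothetical loop-start offsets within a function, in bytes.
# The second is the first shifted by 3 bytes, as could happen if earlier
# instructions were encoded slightly longer in another build configuration.
for loop_offset in (0x35, 0x38):
    for align in (8, 16):
        pad = padding_needed(loop_offset, align)
        print(f"offset {loop_offset:#x}, ALIGN {align}: {pad} padding bytes")
```

A 3-byte shift upstream is enough to change both the loop's natural alignment and the number of NOP bytes executed on the way into it.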
While this isn't as bad as gcc's random code generator, don't assume
that the optimum you found in one configuration will be non-pessimal in
the others.
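Justin's chart already hints at this within a single build. The Python sketch below copies his numbers and picks the best alignment pair per function version, assuming lower is better, as the asterisks suggest. The helper name is mine.

```python
# Numbers copied from the chart; keys are (LOOP1, LOOP2) alignment settings.
results = {
    ("NONE", "NONE"): {"MMX": 5270, "MMX2": 5283, "SSE2": 2757},
    ("NONE", "8"):    {"MMX": 5200, "MMX2": 5077, "SSE2": 2644},
    ("NONE", "16"):   {"MMX": 5723, "MMX2": 3961, "SSE2": 2161},
    ("8", "NONE"):    {"MMX": 5214, "MMX2": 5339, "SSE2": 2787},
    ("8", "8"):       {"MMX": 5198, "MMX2": 5083, "SSE2": 2722},
    ("8", "16"):      {"MMX": 5936, "MMX2": 3902, "SSE2": 2128},
    ("16", "NONE"):   {"MMX": 6613, "MMX2": 4788, "SSE2": 2580},
    ("16", "8"):      {"MMX": 5490, "MMX2": 3702, "SSE2": 2020},
    ("16", "16"):     {"MMX": 5474, "MMX2": 3680, "SSE2": 2000},
}

def best_per_version(results):
    """Return {version: (alignment_pair, timing)} with the lowest timing per version."""
    best = {}
    for config, timings in results.items():
        for version, timing in timings.items():
            if version not in best or timing < best[version][1]:
                best[version] = (config, timing)
    return best

best = best_per_version(results)
baseline = results[("NONE", "NONE")]
for version, (config, timing) in best.items():
    print(f"{version}: best at {'/'.join(config)} = {timing} "
          f"(unaligned: {baseline[version]})")
```

MMX2 and SSE2 are fastest at 16/16, while MMX is fastest at 8/8 and is slower at 16/16 than with no alignment at all.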
If there is a single optimal place to add a single optimal number of NOPs,
great. But often when I run into alignment weirdness, there is no such
solution, and the best I can do is poke it with a stick until I find some
combination of instructions that isn't so sensitive to alignment.
--Loren Merritt