[FFmpeg-devel] [PATCH] Add x86-optimized versions of exponent_min().
Ronald S. Bultje
Thu Feb 3 19:53:22 CET 2011
On Thu, Feb 3, 2011 at 1:35 PM, Justin Ruggles <justin.ruggles at gmail.com> wrote:
> On 02/03/2011 12:05 AM, Loren Merritt wrote:
>> On Wed, 2 Feb 2011, Justin Ruggles wrote:
>>> On 01/31/2011 03:19 PM, Ronald S. Bultje wrote:
>>>> On Mon, Jan 31, 2011 at 2:53 PM, Loren Merritt wrote:
>>>>> I usually blame such weird results on code alignment, but I have no
>>>>> systematic way to fix them.
>>>> Same here, try adding an ALIGN <num> (8 or 16) directly before a loop
>>>> statement, or disassemble before/after and see where alignment could
>>>> cause issues.
>>> Thanks for the suggestion. Below is a chart of the results for
>>> adding ALIGN 8 and ALIGN 16 before each of the 2 loops.
>>> LOOP1/LOOP2    MMX     MMX2    SSE2
>>>   NONE/NONE :  5270    5283    2757
>>>      NONE/8 :  5200    5077    2644
>>>     NONE/16 :  5723    3961    2161
>>>      8/NONE :  5214    5339    2787
>>>         8/8 :  5198*   5083    2722
>>>        8/16 :  5936    3902    2128
>>>     16/NONE :  6613    4788    2580
>>>        16/8 :  5490    3702    2020
>>>       16/16 :  5474    3680*   2000*
>> Other things that affect instruction size/count and therefore alignment
>> * compiling for x86_32 vs x86_64-unix vs win64
>> * register size (d vs q as per my previous patch)
>> * whether PIC is enabled (not relevant this time because this function
>> doesn't use any static consts)
> Doesn't yasm take these into account when using ALIGN?
>> * and sometimes not only the mod16 or mod64 alignment matters, but also
>> the difference in memory address between this function and the rest of the
>> While this isn't as bad as gcc's random code generator, don't assume
>> that the optimum you found in one configuration will be non-pessimal in
>> the others.
>> If there is a single optimal place to add a single optimal number of NOPs,
>> great. But often when I run into alignment weirdness, there is no such
>> solution, and the best I can do is poke it with a stick until I find some
>> combination of instructions that isn't so sensitive to alignment.
> I don't have much to poke around with as far as using different
> instructions in this case. So should we just accept what is an obvious
> bad case on one configuration because there is a chance that fixing it
> is worse in another?
> Even the worst case versions are 80-90% faster than the C version in the
> tested configuration (x86_64 unix). Is it likely that the worst case
> will be much slower in another?
I don't think that's what Loren meant. Your measures are fine and show
that alignment matters. Patch is therefore fine.
He just means that it's not always that easy, if you have PIC/non-PIC
codepaths, etc, then it may take a little more effort. But that's not
the case here, so this patch is fine now.
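
To make the "alignment matters" point concrete, here is a minimal sketch that tabulates the chart quoted above. The numbers are copied from Justin's chart as posted (the thread does not state their units; lower is assumed to be better, which matches the entries he starred). The names RESULTS, COLUMNS, best_per_column and spread_percent are made up for this illustration and are not part of the patch.

```python
# Timings copied from the quoted chart, keyed by LOOP1/LOOP2 alignment.
# Units are not stated in the thread; lower is assumed to be better.
RESULTS = {
    "NONE/NONE": (5270, 5283, 2757),
    "NONE/8":    (5200, 5077, 2644),
    "NONE/16":   (5723, 3961, 2161),
    "8/NONE":    (5214, 5339, 2787),
    "8/8":       (5198, 5083, 2722),
    "8/16":      (5936, 3902, 2128),
    "16/NONE":   (6613, 4788, 2580),
    "16/8":      (5490, 3702, 2020),
    "16/16":     (5474, 3680, 2000),
}
COLUMNS = ("MMX", "MMX2", "SSE2")


def best_per_column(results=RESULTS):
    """Return {column: (alignment label, timing)} for the lowest timing."""
    best = {}
    for i, col in enumerate(COLUMNS):
        label = min(results, key=lambda k: results[k][i])
        best[col] = (label, results[label][i])
    return best


def spread_percent(results=RESULTS):
    """Return {column: how much slower the worst row is than the best, in %}."""
    spread = {}
    for i, col in enumerate(COLUMNS):
        values = [row[i] for row in results.values()]
        spread[col] = 100.0 * (max(values) - min(values)) / min(values)
    return spread


if __name__ == "__main__":
    for col, (label, timing) in best_per_column().items():
        print(f"{col}: best at {label} ({timing})")
    for col, pct in spread_percent().items():
        print(f"{col}: worst is {pct:.1f}% slower than best")
```

This only summarizes one configuration (x86_64 unix); per Loren's caveat, the ranking may differ on x86_32 or win64.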