[FFmpeg-devel] [PATCH] Add x86-optimized versions of exponent_min().
Måns Rullgård
mans
Fri Feb 4 03:30:39 CET 2011
Justin Ruggles <justin.ruggles at gmail.com> writes:
> On 02/03/2011 07:13 PM, Justin Ruggles wrote:
>
>> On 02/03/2011 06:47 PM, Loren Merritt wrote:
>>
>>> On Thu, 3 Feb 2011, Justin Ruggles wrote:
>>>> So should we just accept what is an obvious bad case on one
>>>> configuration because there is a chance that fixing it is worse
>>>> in another?
>>>
>>> My expectation of the effect of this fix on the performance of the
>>> configurations you haven't benchmarked, is positive. If you don't want to
>>> benchmark them, I won't reject this patch on those grounds.
>>>
>>> I am merely saying that as long as you haven't identified the actual
>>> cause of the slowdowns, as long as performance is still random unto you,
>>> making decisions based on a thorough benchmark of only one compiler
>>> configuration is generalizing from one data point.
>>>
>>>> Even the worst case versions are 80-90% faster than the C version in the
>>>> tested configuration (x86_64 unix). Is it likely that the worst case
>>>> will be much slower in another?
>>>
>>> Not more than 40% slower. (Some confidence since on this question your
>>> benchmark counts as 24 data points, not 1.)
>>
>>
>> I can recompile with "--extra-cflags=-m32 --extra-ldflags=-m32" and add
>> 24 more data points if you think this would be useful.
>
> Results for x86_32:
>
> LOOP1/LOOP2    MMX     MMX2    SSE2
> ------------------------------------
> NONE/NONE   :  5150    4640    2735
> NONE/8      :  5240    3716    2343
> NONE/16     :  5270    3713*   2360
> 8/NONE      :  5123    3765    2899
> 8/8         :  4970    5295    2793
> 8/16        :  5911    4361    2469
> 16/NONE     :  4902*   4860    2696
> 16/8        :  5381    3922    2228
> 16/16       :  5382    3954    2226*
>
> And again, the results for x86_64:
>
> LOOP1/LOOP2    MMX     MMX2    SSE2
> ------------------------------------
> NONE/NONE   :  5270    5283    2757
> NONE/8      :  5200    5077    2644
> NONE/16     :  5723    3961    2161
> 8/NONE      :  5214    5339    2787
> 8/8         :  5198*   5083    2722
> 8/16        :  5936    3902    2128
> 16/NONE     :  6613    4788    2580
> 16/8        :  5490    3702    2020
> 16/16       :  5474    3680*   2000*
>
> So this is definitely not conclusive. :(
>
> One thing that is consistent is that no matter what the alignment of the
> first loop is, increasing the alignment for the 2nd loop gives better
> results for mmx2 and sse2.
>
> I would be ok with doing nothing for mmx since it is wildly inconsistent
> and either only aligning the 2nd loop for mmx2 and sse2 or aligning both
> loops.
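
The trend described above can be checked mechanically against the two tables. The following is a minimal Python sketch, not part of the patch: the names TIMINGS and loop2_exceptions are made up for illustration, and "better" is assumed to mean a lower number, which is how the asterisks in the tables appear to be used. It lists every (arch, column, LOOP1) row where stepping the LOOP2 alignment NONE -> 8 -> 16 does not give non-increasing timings.

```python
# Timings copied from the two tables above (assumed: lower is better).
# Keys are (LOOP1 alignment, LOOP2 alignment); values are (MMX, MMX2, SSE2).
TIMINGS = {
    "x86_32": {
        ("NONE", "NONE"): (5150, 4640, 2735),
        ("NONE", "8"):    (5240, 3716, 2343),
        ("NONE", "16"):   (5270, 3713, 2360),
        ("8", "NONE"):    (5123, 3765, 2899),
        ("8", "8"):       (4970, 5295, 2793),
        ("8", "16"):      (5911, 4361, 2469),
        ("16", "NONE"):   (4902, 4860, 2696),
        ("16", "8"):      (5381, 3922, 2228),
        ("16", "16"):     (5382, 3954, 2226),
    },
    "x86_64": {
        ("NONE", "NONE"): (5270, 5283, 2757),
        ("NONE", "8"):    (5200, 5077, 2644),
        ("NONE", "16"):   (5723, 3961, 2161),
        ("8", "NONE"):    (5214, 5339, 2787),
        ("8", "8"):       (5198, 5083, 2722),
        ("8", "16"):      (5936, 3902, 2128),
        ("16", "NONE"):   (6613, 4788, 2580),
        ("16", "8"):      (5490, 3702, 2020),
        ("16", "16"):     (5474, 3680, 2000),
    },
}
COLUMNS = ("MMX", "MMX2", "SSE2")
ALIGNMENTS = ("NONE", "8", "16")


def loop2_exceptions(timings, columns=("MMX2", "SSE2")):
    """Return (arch, column, loop1) triples where raising the LOOP2
    alignment NONE -> 8 -> 16 does not give non-increasing timings."""
    exceptions = []
    for arch, table in timings.items():
        for col in columns:
            idx = COLUMNS.index(col)
            for loop1 in ALIGNMENTS:
                series = [table[(loop1, loop2)][idx] for loop2 in ALIGNMENTS]
                # Any step where the timing goes up breaks the trend.
                if any(b > a for a, b in zip(series, series[1:])):
                    exceptions.append((arch, col, loop1))
    return exceptions


if __name__ == "__main__":
    for arch, col, loop1 in loop2_exceptions(TIMINGS):
        print(f"{arch} {col} LOOP1={loop1}: not monotonic in LOOP2 alignment")
```

On the numbers quoted here, this reports no exceptions for x86_64 and three for x86_32 (MMX2 with LOOP1 at 8 and at 16, SSE2 with LOOP1 at NONE), so the trend is strict only on x86_64.
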
All x86_64 CPUs have SSE2, so the MMX(2) performance there doesn't
really matter.
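
If only SSE2 is considered on x86_64, picking an alignment reduces to sorting one column. A small Python sketch under that assumption, with the SSE2 figures copied from the x86_64 table above (SSE2_X86_64 and rank_by_time are illustrative names, and lower is assumed to be better):

```python
# SSE2 column of the x86_64 table, keyed by (LOOP1, LOOP2) alignment.
SSE2_X86_64 = {
    ("NONE", "NONE"): 2757,
    ("NONE", "8"):    2644,
    ("NONE", "16"):   2161,
    ("8", "NONE"):    2787,
    ("8", "8"):       2722,
    ("8", "16"):      2128,
    ("16", "NONE"):   2580,
    ("16", "8"):      2020,
    ("16", "16"):     2000,
}


def rank_by_time(column):
    """Return ((loop1, loop2), time) pairs sorted from fastest to slowest."""
    return sorted(column.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for (loop1, loop2), time in rank_by_time(SSE2_X86_64):
        print(f"{loop1}/{loop2}: {time}")
```

With these figures the 16/16 entry sorts first and 8/NONE last.
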
--
Måns Rullgård
mans at mansr.com