[FFmpeg-devel] [PATCH] Optimisations for av_log2 and integer clip functions
Wed Jan 13 23:32:15 CET 2010
Jason Garrett-Glaser <darkshikari at gmail.com> writes:
> 2010/1/13 Måns Rullgård <mans at mansr.com>:
>> Jason Garrett-Glaser <darkshikari at gmail.com> writes:
>>>> +#define av_log2 av_log2
>>>> +static inline av_const int av_log2(unsigned int v)
>>>> +    return v ? 31 - __builtin_clz(v) : 0;
>>>> +#ifndef av_log2_16bit
>>>> +#define av_log2_16bit av_log2
>>> Won't ^31 be faster?  "31 - X" requires an extra mov on x86.
>> Maybe.  The subtraction might play nicer with the way it's used in
>> e.g. golomb.h. ?I'd be surprised if gcc could figure out such bit
>> magic by itself.
>>> Also, __builtin_clz/ctz maps to bsr/bsf, which are extraordinarily
>>> slow on Athlons.
>> Fabulous.  So what shall we do?  List the CPUs with good clz support
>> in configure like we do with cmov et al?
> For BSR (BSF is similar but not always identical):
> PPro/P2/P3/PM: 2 uops
> Core 2: 2/1 (latency/recip throughput)
> Core i7: 3/1
> Pentium 4: 4/2
> Pentium 4E: 16/4 (WHAT THE FUCKETY FUCK?!)
> Atom: 16/? (DIE INTEL DIE)
> Via Nano: 3/2
> Athlon K7: 9/9
> Athlon K8 (A64): 10/10
> Athlon K10 (Phenom): 4/3 (note: has SSE4a's LZCNT which is very
> similar and is 2/1)
ARM (all I checked): 1/1
> Isn't it awesome?
That doesn't answer my question.
mans at mansr.com