[FFmpeg-devel] [PATCH] libavutil: add x86 optimized av_popcount

James Almer jamrial at gmail.com
Wed Feb 25 18:29:11 CET 2015


On 25/02/15 12:43 PM, Clément Bœsch wrote:
> On Tue, Feb 24, 2015 at 10:05:24PM -0300, James Almer wrote:
>> Signed-off-by: James Almer <jamrial at gmail.com>
>> ---
>> I decided to go the configure route since other features (cmov, clz) also do
>> it, but if preferred this could instead be done with a new intmath.h header
>> in the x86/ folder containing something like
>>
>> #if defined(__GNUC__) && defined(__POPCNT__)
>>     #define av_popcount   __builtin_popcount
>> #if ARCH_X86_64
>>     #define av_popcount64 __builtin_popcountll
>> #endif
>> #endif
>>
>> That would give a cleaner compile-time check.
>>
>>  configure           | 12 ++++++++++--
>>  libavutil/intmath.h | 13 +++++++++++++
>>  2 files changed, 23 insertions(+), 2 deletions(-)
>>
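A minimal standalone sketch of the compile-time alternative described above, assuming GCC or clang (both define __POPCNT__ when the target ISA includes POPCNT, e.g. with -mpopcnt). The names popcount32, popcount64 and the loop fallbacks are hypothetical stand-ins, not FFmpeg's av_popcount/av_popcount64 or av_popcount_c. The ARCH_X86_64 guard from the snippet is dropped because it comes from FFmpeg's configure and would not exist in a standalone file.

```c
#include <stdint.h>

/* Hypothetical portable fallbacks (Kernighan's method): each iteration
 * clears the lowest set bit, so the loop runs popcount(x) times. */
static inline int popcount32_fallback(uint32_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

static inline int popcount64_fallback(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Compile-time selection: only use the builtins when the compiler knows
 * POPCNT is available, otherwise fall back to the plain C version. */
#if defined(__GNUC__) && defined(__POPCNT__)
#   define popcount32(x) __builtin_popcount(x)
#   define popcount64(x) __builtin_popcountll(x)
#else
#   define popcount32(x) popcount32_fallback(x)
#   define popcount64(x) popcount64_fallback(x)
#endif
```

The __POPCNT__ guard matters because without POPCNT enabled, gcc lowers the builtin to a function call (presumably a libgcc helper) rather than a single instruction, as the popcount_gcc listing below shows.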
> 
> For the record, the builtin implementation looks like this here:
> 
> 0000000000000000 <av_popcount_c>:
>    0:   89 f8                   mov    %edi,%eax
>    2:   d1 e8                   shr    %eax
>    4:   25 55 55 55 55          and    $0x55555555,%eax
>    9:   29 c7                   sub    %eax,%edi
>    b:   89 fa                   mov    %edi,%edx
>    d:   c1 ef 02                shr    $0x2,%edi
>   10:   81 e2 33 33 33 33       and    $0x33333333,%edx
>   16:   81 e7 33 33 33 33       and    $0x33333333,%edi
>   1c:   8d 04 17                lea    (%rdi,%rdx,1),%eax
>   1f:   89 c2                   mov    %eax,%edx
>   21:   c1 ea 04                shr    $0x4,%edx
>   24:   01 d0                   add    %edx,%eax
>   26:   25 0f 0f 0f 0f          and    $0xf0f0f0f,%eax
>   2b:   89 c2                   mov    %eax,%edx
>   2d:   c1 ea 08                shr    $0x8,%edx
>   30:   01 d0                   add    %edx,%eax
>   32:   89 c2                   mov    %eax,%edx
>   34:   c1 ea 10                shr    $0x10,%edx
>   37:   01 d0                   add    %edx,%eax
>   39:   83 e0 3f                and    $0x3f,%eax
>   3c:   c3                      retq   
>   3d:   0f 1f 00                nopl   (%rax)
> 
> 0000000000000040 <popcount_gcc>:
>   40:   48 83 ec 08             sub    $0x8,%rsp
>   44:   89 ff                   mov    %edi,%edi
>   46:   e8 00 00 00 00          callq  4b <popcount_gcc+0xb>
>   4b:   48 83 c4 08             add    $0x8,%rsp
>   4f:   c3                      retq   
> 
> 0000000000000040 <popcount_clang>:
>   40:   89 f8                   mov    %edi,%eax
>   42:   d1 e8                   shr    %eax
>   44:   25 55 55 55 55          and    $0x55555555,%eax
>   49:   29 c7                   sub    %eax,%edi
>   4b:   89 f8                   mov    %edi,%eax
>   4d:   25 33 33 33 33          and    $0x33333333,%eax
>   52:   c1 ef 02                shr    $0x2,%edi
>   55:   81 e7 33 33 33 33       and    $0x33333333,%edi
>   5b:   01 c7                   add    %eax,%edi
>   5d:   89 f8                   mov    %edi,%eax
>   5f:   c1 e8 04                shr    $0x4,%eax
>   62:   01 f8                   add    %edi,%eax
>   64:   25 0f 0f 0f 0f          and    $0xf0f0f0f,%eax
>   69:   69 c0 01 01 01 01       imul   $0x1010101,%eax,%eax
>   6f:   c1 e8 18                shr    $0x18,%eax
>   72:   c3                      retq   
> 
> There may be relevant "optimizations" here worth applying to our reference code.
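A C sketch of the two reductions visible in the listings, reconstructed from the disassembly rather than copied from FFmpeg's source (function names are hypothetical). Both share the same first three SWAR steps; they differ only in how the four per-byte counts are summed. gcc's av_popcount_c output does two shift/add pairs and masks with 0x3f, while clang folds that into one multiply by 0x01010101 that accumulates all byte counts into the top byte. Whether the multiply is actually faster depends on the target's imul latency, so it would need benchmarking.

```c
#include <stdint.h>

/* Shift-and-add horizontal sum, as in the av_popcount_c listing. */
static inline int popcount32_shift(uint32_t x)
{
    x -= (x >> 1) & 0x55555555;                     /* 2-bit field counts */
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333); /* 4-bit field counts */
    x = (x + (x >> 4)) & 0x0F0F0F0F;                /* per-byte counts    */
    x += x >> 8;                                    /* pairwise byte sums */
    return (x + (x >> 16)) & 0x3F;                  /* total fits 6 bits  */
}

/* Multiply-based horizontal sum, as in the popcount_clang listing. */
static inline int popcount32_mul(uint32_t x)
{
    x -= (x >> 1) & 0x55555555;
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    x = (x + (x >> 4)) & 0x0F0F0F0F;
    /* Top byte of the product is b0+b1+b2+b3 (at most 32, no overflow). */
    return (int)((x * 0x01010101u) >> 24);
}
```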

What code does clang generate for av_popcount64_c, or for its 64-bit builtin?
We're currently calling av_popcount_c twice from within av_popcount64_c,
when on x86_64 CPUs we could probably take advantage of the 64-bit GPRs.
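A minimal sketch of what using the 64-bit GPRs could look like, assuming a 64-bit target: the same SWAR steps with 64-bit masks, so the whole value stays in one register instead of being split into two 32-bit halves. The name popcount64_swar is hypothetical and this is not FFmpeg's av_popcount64_c.

```c
#include <stdint.h>

/* Hypothetical single-register 64-bit SWAR popcount. */
static inline int popcount64_swar(uint64_t x)
{
    x -= (x >> 1) & 0x5555555555555555ULL;              /* 2-bit counts  */
    x = (x & 0x3333333333333333ULL)
      + ((x >> 2) & 0x3333333333333333ULL);             /* 4-bit counts  */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;         /* per-byte      */
    /* Sum the eight byte counts into the top byte (at most 64, fits). */
    return (int)((x * 0x0101010101010101ULL) >> 56);
}
```

On 32-bit targets the 64-bit multiply would itself be split, so this is only likely to pay off where 64-bit registers are native.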

> 
> [...]
> 