[FFmpeg-devel] 6c6ac9cb "avutil/x86/intmath: Use tzcnt in place of bsf."

Hans Wennborg hans at chromium.org
Tue Nov 17 19:59:43 CET 2015

Hi Matt,

Thanks for the quick reply.

On Mon, Nov 16, 2015 at 11:02 PM, Matt Oliver <protogonoi at gmail.com> wrote:
> On 17 November 2015 at 12:12, Hans Wennborg <hans at chromium.org> wrote:
>> This commit [1] is causing problems when compiling with Clang on Windows:
>> ..\..\third_party\ffmpeg\libavutil/x86/intmath.h(53,33) :  error:
>> always_inline function '__tzcnt_u32' requires target feature 'bmi',
>> but would be inlined into function 'ff_ctzll_x86' that is compiled
>> without support for 'bmi'
>>     return ((uint32_t)v == 0) ? _tzcnt_u32((uint32_t)(v >> 32)) + 32 :
>> _tzcnt_u32((uint32_t)v);
>>                                 ^
>> Essentially the compiler is saying that it won't allow using this
>> intrinsic unless compiling for a target that supports BMI.
>> Is there a performance reason for using __tzcnt_u32 instead of
>> _BitScanForward, or was it mainly to simplify the code?
>> We're working around this in Chromium by #define'ing __tzcnt_u32 to
>> __builtin_ctz at the moment, but it would be good if we could find a
>> nicer solution that could be applied upstream.
>> Cheers,
>> Hans
>>  [1].
>> https://github.com/FFmpeg/FFmpeg/commit/6c6ac9cb17c4944514bde833f2fa8aa8dafa974a
> tzcnt was used instead of bsf as it has performance advantages on any cpu
> that supports the tzcnt instruction. Although tzcnt is a newer instruction
> that is part of the BMI instruction set the actual opcode generated by this
> instruction is the equivalent to 'rep bsf'. So this instruction will still
> execute on any older cpu that doesnt support tzcnt as just the bsf
> instruction instead. So it was used as it provides optimum performance on
> both newer and older cpus.
> Clang is clearly not allowing this particular optimisation. I dont know of a
> way to ignore this using a command line option so probably the best way is
> to just disable this code when using clang. This will go back to using the
> previous behavior which assuming clang on windows still defines the gcc
> version check macros will use the __builtin_ctz.
> The attached patch should fix it but I dont have clang for windows to test
> it.

This got fixed in Clang instead:
http://llvm.org/viewvc/llvm-project?rev=253358&view=rev It will now
allow the intrinsic even if the target doesn't have BMI.

Sorry for the noise.


More information about the ffmpeg-devel mailing list