[FFmpeg-devel] [PATCH] x86/intmath: allow the source operand for icc/icl ff_ctzll to be a memory location

Matt Oliver protogonoi at gmail.com
Wed Oct 28 08:50:00 CET 2015


On 26 October 2015 at 04:46, James Almer <jamrial at gmail.com> wrote:

> On 10/25/2015 1:39 PM, Matt Oliver wrote:
> > On 26 October 2015 at 00:43, Hendrik Leppkes <h.leppkes at gmail.com>
> wrote:
> >
> >> On Sun, Oct 25, 2015 at 12:30 PM, Matt Oliver <protogonoi at gmail.com>
> >> wrote:
> >>> On 25 October 2015 at 22:16, Hendrik Leppkes <h.leppkes at gmail.com>
> >> wrote:
> >>>
> >>>> On Sun, Oct 25, 2015 at 11:26 AM, Matt Oliver <protogonoi at gmail.com>
> >>>> wrote:
> >>>>> On 25 October 2015 at 12:22, Ganesh Ajjanagadde <gajjanag at mit.edu>
> >>>> wrote:
> >>>>>
> >>>>>> On Sat, Oct 24, 2015 at 7:03 PM, James Almer <jamrial at gmail.com>
> >> wrote:
> >>>>>>> On 10/24/2015 7:48 PM, Ganesh Ajjanagadde wrote:
> >>>>>>>> On Sat, Oct 24, 2015 at 6:08 PM, James Almer <jamrial at gmail.com>
> >>>> wrote:
> >>>>>>>>> This gives the compiler some flexibility
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: James Almer <jamrial at gmail.com>
> >>>>>>>>> ---
> >>>>>>>>>  libavutil/x86/intmath.h | 2 +-
> >>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/libavutil/x86/intmath.h b/libavutil/x86/intmath.h
> >>>>>>>>> index 7881e3c..10d3e7f 100644
> >>>>>>>>> --- a/libavutil/x86/intmath.h
> >>>>>>>>> +++ b/libavutil/x86/intmath.h
> >>>>>>>>> @@ -36,7 +36,7 @@ static av_always_inline av_const int
> >>>>>> ff_ctzll_x86(long long v)
> >>>>>>>>>  {
> >>>>>>>>>  #   if ARCH_X86_64
> >>>>>>>>>      uint64_t c;
> >>>>>>>>> -    __asm__("bsfq %1,%0" : "=r" (c) : "r" (v));
> >>>>>>>>> +    __asm__ ("bsfq %1,%0" : "=r" (c) : "rm" (v));
> >>>>>>>>>      return c;
> >>>>>>>>>  #   else
> >>>>>>>>>      return ((uint32_t)v == 0) ? _bit_scan_forward((uint32_t)(v
> >>>>
> >>>>>> 32)) + 32 : _bit_scan_forward((uint32_t)v);
> >>>>>>>>> --
> >>>>>>>>> 2.6.2
> >>>>>>>>
> >>>>>>>> This question may be silly, but can you clarify when this asm
> >> code is
> >>>>>>>> used instead of __builtin_ctzll? For instance, I think on
> >> GNU/Linux,
> >>>>>>>> x86-64, sufficiently modern GCC (even with asm enabled), we should
> >>>>>>>> always use the builtin: the compiler knows best what to do with
> >> its
> >>>>>>>> builtin.
> >>>>>>>
> >>>>>>> This is ICC/ICL, not GCC.
> >>>>>>
> >>>>>> Ah, now I recall that _bit_scan_forward64 was not always available
> on
> >>>>>> ICC. Anyway, this is just a heads up to whoever uses ICC/ICL and the
> >>>>>> like: you may want to find out to which versions this built in
> >>>>>> applies.
> >>>>>
> >>>>>
> >>>>> bit_scan_forward64 isnt available on ICL at all, hence the use of
> asm.
> >>>>>
> >>>>
> >>>> Intels intrinsic guide lists _BitScanForward64 as the intrinsic to
> use,
> >>>> however.
> >>>>
> >>>>
> >>
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=Other&expand=373
> >>>
> >>>
> >>>  _BitScanForward64 is a msvc intrinsic only available on windows
> >> platforms
> >>> so not usable with ICC. The native asm implementation also performs
> >> better
> >>> with ICL hence its use.
> >>
> >> The "Intel(R) C++ Compiler User and Reference Guide" would care to
> >> disagree with you.
> >>
> >
> > Well ill be... I dont have ICC on hand to check it but ill take Intels
> word
> > for it. Its interesting that Intel decided to support that intrinsic as
> > opposed to just extending there own version to 64b (i.e.
> > _bit_scan_forward64). _BitScanForward64 as far as I know was introduced
> by
> > msvc and made available in "winnt.h" hence the different naming
> convention
> > to other intrinsics. That said the inline asm version is still preferable
> > with Intel as the intrinsic passes the result via pointer which when
> tested
> > with msvc11 and 12 crt's resulted in horrible performance hits as the
> > compiler didnt optimise out the memory access in order to keep the result
> > in register (works fine with newer ICL 16 and msvcrt14 though).
>
> Tried ICC 13 x86_64 from https://gcc.godbolt.org/ and it doesn't recognize
> _BitScanForward64, or _BitScanForward for that matter.
>

I didnt know about godbolt, definitely a nice tool, thanks for checking.
I was a little surprised as I swore I checked those (and others) intrinsics
several years ago and they were windows only. Atleast now I know my memory
is not completely shot.


> >
> > Using tzcnt would be the more interesting option. Changing the asm to
> tzcnt
> > works with ICL. But also the use of the tzcnt intrinsic (_tzcnt_u64)
> would
> > be possible with both intel and msvc. I dont have anything older than
> > Haswell or Piledriver to double check but I have seen its use in several
> > other projects without issue so in theory it shouldnt be a problem.
> >
> > If tzcnt is preferable I can make up a patch to convert the intel and
> msvc
> > versions to use the required intrinsic.
>

> IIRC tzcnt (rep bsf) is only slower than plain bsf on really ancient
> machines, e.g. 80486 and such, so there's not really any downside of
> using it.


Well in that case the optimizations can be heavily simplified to those in
the attached patch.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-avutil-x86-intmath-Use-tzcnt-in-place-of-bsf.patch
Type: application/octet-stream
Size: 2479 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20151028/70d5b14a/attachment.obj>


More information about the ffmpeg-devel mailing list