[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.
Alan Kelly
alankelly at google.com
Fri Jul 16 17:46:09 EEST 2021
On Fri, Jul 16, 2021 at 4:02 PM James Almer <jamrial at gmail.com> wrote:
> On 7/16/2021 10:44 AM, Alan Kelly wrote:
> > Broadwell and later, and Zen 3 and later, have fast gather instructions.
> > ---
> > Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the
> > email thread.
>
> I was very explicit about this not being ok. We're not disabling all ymm
> usage for Haswell just for one or two swscale functions using gathers.
>
> Let's go with Lynne's latest suggestion: don't change the flags at all,
> and use gathers on Haswell, same as on other arches, by checking the
> AVX2_FAST flag.
>
> > libavutil/cpu.h | 1 +
> > libavutil/x86/cpu.c | 11 ++++++++++-
> > 2 files changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/libavutil/cpu.h b/libavutil/cpu.h
> > index c069076439..ec3073d021 100644
> > --- a/libavutil/cpu.h
> > +++ b/libavutil/cpu.h
> > @@ -113,6 +113,7 @@ void av_force_cpu_count(int count);
> > * av_set_cpu_flags_mask(), then this function will behave as if AVX is not
> > * present.
> > */
> > +
> > size_t av_cpu_max_align(void);
> >
> > #endif /* AVUTIL_CPU_H */
> > diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
> > index bcd41a50a2..158e2170c4 100644
> > --- a/libavutil/x86/cpu.c
> > +++ b/libavutil/x86/cpu.c
> > @@ -146,8 +146,17 @@ int ff_get_cpu_flags_x86(void)
> > if (max_std_level >= 7) {
> > cpuid(7, eax, ebx, ecx, edx);
> > #if HAVE_AVX2
> > - if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
> > + if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)){
> > rval |= AV_CPU_FLAG_AVX2;
> > +
> > + cpuid(1, eax, ebx, ecx, std_caps);
> > + family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
> > + model = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
> > + // Haswell and earlier has slow gather
> > + if(family == 6 && model < 70)
> > + rval |= AV_CPU_FLAG_AVXSLOW;
> > + }
> > +
> > #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
> > if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
> > if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000)
> >

OK, apologies for the misunderstanding. In that case, part 1 of this patch
set is not required. Part 2 remains valid, with the function protected by
EXTERNAL_AVX2_FAST. Should part 2 be resubmitted as a standalone patch, or
is it OK as is?