[FFmpeg-devel] [PATCH] swscale/x86/output.asm: add x86-optimized planer gbr yuv2anyX functions
Paul B Mahol
onemda at gmail.com
Wed Oct 27 10:58:59 EEST 2021
On Wed, Oct 27, 2021 at 9:51 AM Mark Reid <mindmark at gmail.com> wrote:
> On Monday, October 25, 2021, Michael Niedermayer <michael at niedermayer.cc>
> wrote:
>
> > On Sun, Oct 24, 2021 at 09:09:52PM -0700, mindmark at gmail.com wrote:
> > > From: Mark Reid <mindmark at gmail.com>
> > >
> > > yuv2gbrp_full_X_4_512_c: 12096.6
> > > yuv2gbrp_full_X_4_512_sse2: 10782.6
> > > yuv2gbrp_full_X_4_512_sse4: 5143.6
> > > yuv2gbrp_full_X_4_512_avx2: 3000.1
> > > yuv2gbrap_full_X_4_512_c: 15463.1
> > > yuv2gbrap_full_X_4_512_sse2: 14296.6
> > > yuv2gbrap_full_X_4_512_sse4: 6319.1
> > > yuv2gbrap_full_X_4_512_avx2: 3554.1
> > > yuv2gbrp9be_full_X_4_512_c: 14281.6
> > > yuv2gbrp9be_full_X_4_512_sse2: 11206.1
> > > yuv2gbrp9be_full_X_4_512_sse4: 5033.6
> > > yuv2gbrp9be_full_X_4_512_avx2: 3012.6
> > > yuv2gbrp9le_full_X_4_512_c: 12688.6
> > > yuv2gbrp9le_full_X_4_512_sse2: 10914.1
> > > yuv2gbrp9le_full_X_4_512_sse4: 5144.6
> > > yuv2gbrp9le_full_X_4_512_avx2: 3014.6
> > > yuv2gbrp10be_full_X_4_512_c: 14257.6
> > > yuv2gbrp10be_full_X_4_512_sse2: 11089.6
> > > yuv2gbrp10be_full_X_4_512_sse4: 5039.1
> > > yuv2gbrp10be_full_X_4_512_avx2: 3001.1
> > > yuv2gbrp10le_full_X_4_512_c: 12098.6
> > > yuv2gbrp10le_full_X_4_512_sse2: 10884.1
> > > yuv2gbrp10le_full_X_4_512_sse4: 5138.1
> > > yuv2gbrp10le_full_X_4_512_avx2: 2999.6
> > > yuv2gbrap10be_full_X_4_512_c: 18549.6
> > > yuv2gbrap10be_full_X_4_512_sse2: 14538.6
> > > yuv2gbrap10be_full_X_4_512_sse4: 6292.6
> > > yuv2gbrap10be_full_X_4_512_avx2: 3583.6
> > > yuv2gbrap10le_full_X_4_512_c: 16631.1
> > > yuv2gbrap10le_full_X_4_512_sse2: 14190.6
> > > yuv2gbrap10le_full_X_4_512_sse4: 6348.1
> > > yuv2gbrap10le_full_X_4_512_avx2: 3554.6
> > > yuv2gbrp12be_full_X_4_512_c: 13555.1
> > > yuv2gbrp12be_full_X_4_512_sse2: 10952.1
> > > yuv2gbrp12be_full_X_4_512_sse4: 5137.6
> > > yuv2gbrp12be_full_X_4_512_avx2: 3009.6
> > > yuv2gbrp12le_full_X_4_512_c: 12082.6
> > > yuv2gbrp12le_full_X_4_512_sse2: 10891.1
> > > yuv2gbrp12le_full_X_4_512_sse4: 5184.1
> > > yuv2gbrp12le_full_X_4_512_avx2: 3011.1
> > > yuv2gbrap12be_full_X_4_512_c: 18689.6
> > > yuv2gbrap12be_full_X_4_512_sse2: 14522.6
> > > yuv2gbrap12be_full_X_4_512_sse4: 6237.6
> > > yuv2gbrap12be_full_X_4_512_avx2: 3585.6
> > > yuv2gbrap12le_full_X_4_512_c: 16760.6
> > > yuv2gbrap12le_full_X_4_512_sse2: 14202.1
> > > yuv2gbrap12le_full_X_4_512_sse4: 6252.1
> > > yuv2gbrap12le_full_X_4_512_avx2: 3591.1
> > > yuv2gbrp14be_full_X_4_512_c: 13555.6
> > > yuv2gbrp14be_full_X_4_512_sse2: 10949.1
> > > yuv2gbrp14be_full_X_4_512_sse4: 5185.1
> > > yuv2gbrp14be_full_X_4_512_avx2: 3012.1
> > > yuv2gbrp14le_full_X_4_512_c: 12068.1
> > > yuv2gbrp14le_full_X_4_512_sse2: 10883.6
> > > yuv2gbrp14le_full_X_4_512_sse4: 5145.1
> > > yuv2gbrp14le_full_X_4_512_avx2: 3007.1
> > > yuv2gbrp16be_full_X_4_512_c: 12383.6
> > > yuv2gbrp16be_full_X_4_512_sse2: 8230.6
> > > yuv2gbrp16be_full_X_4_512_sse4: 4765.6
> > > yuv2gbrp16be_full_X_4_512_avx2: 2742.6
> > > yuv2gbrp16le_full_X_4_512_c: 10906.1
> > > yuv2gbrp16le_full_X_4_512_sse2: 28732.1
> > > yuv2gbrp16le_full_X_4_512_sse4: 4709.6
> > > yuv2gbrp16le_full_X_4_512_avx2: 2753.1
> > > yuv2gbrap16be_full_X_4_512_c: 15472.6
> > > yuv2gbrap16be_full_X_4_512_sse2: 11021.6
> > > yuv2gbrap16be_full_X_4_512_sse4: 5487.6
> > > yuv2gbrap16be_full_X_4_512_avx2: 3143.6
> > > yuv2gbrap16le_full_X_4_512_c: 13668.6
> > > yuv2gbrap16le_full_X_4_512_sse2: 10562.1
> > > yuv2gbrap16le_full_X_4_512_sse4: 5506.6
> > > yuv2gbrap16le_full_X_4_512_avx2: 3149.6
> > > yuv2gbrpf32be_full_X_4_512_c: 15471.1
> > > yuv2gbrpf32be_full_X_4_512_sse2: 8524.6
> > > yuv2gbrpf32be_full_X_4_512_sse4: 4559.1
> > > yuv2gbrpf32be_full_X_4_512_avx2: 2388.1
> > > yuv2gbrpf32le_full_X_4_512_c: 14247.6
> > > yuv2gbrpf32le_full_X_4_512_sse2: 7600.6
> > > yuv2gbrpf32le_full_X_4_512_sse4: 4385.6
> > > yuv2gbrpf32le_full_X_4_512_avx2: 2258.6
> > > yuv2gbrapf32be_full_X_4_512_c: 18412.1
> > > yuv2gbrapf32be_full_X_4_512_sse2: 11353.6
> > > yuv2gbrapf32be_full_X_4_512_sse4: 5807.1
> > > yuv2gbrapf32be_full_X_4_512_avx2: 2928.1
> > > yuv2gbrapf32le_full_X_4_512_c: 16485.1
> > > yuv2gbrapf32le_full_X_4_512_sse2: 10202.1
> > > yuv2gbrapf32le_full_X_4_512_sse4: 5571.6
> > > yuv2gbrapf32le_full_X_4_512_avx2: 2847.6
> > >
> > >
> > > ---
> > > libswscale/x86/output.asm | 440 +++++++++++++++++++++++++++++++++++++-
> > > libswscale/x86/swscale.c | 99 +++++++++
> > > tests/checkasm/Makefile | 2 +-
> > > tests/checkasm/checkasm.c | 1 +
> > > tests/checkasm/checkasm.h | 1 +
> > > tests/checkasm/sw_gbrp.c | 198 +++++++++++++++++
> > > tests/fate/checkasm.mak | 1 +
> > > 7 files changed, 740 insertions(+), 2 deletions(-)
> > > create mode 100644 tests/checkasm/sw_gbrp.c
> >
> > seems to work
> > asm review left to people who worked with asm more recently than me
> >
> >
> Thanks for taking the time to test. I was planning on doing the planar
> input ones next and adding the missing unscaled floating-point rgb2rgb
> functions.
>
>
> > also, if you or anyone wants a random idea for swscale improvements:
> > we are missing a direct yuv->yuv converter for converting between different
> > yuv colorspaces; atm these are handled with an rgb intermediate
> >
> >
> Like what the vf_colormatrix filter does?
>
Nope, that is a very, very bad filter.
Look at the vf_colorspace filter instead; it does it correctly.
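
For illustration only, here is a minimal sketch of what "direct yuv->yuv" could mean at the matrix level: the YUV->RGB matrix of the source and the RGB->YUV matrix of the destination are multiplied once, so each pixel needs a single 3x3 multiply instead of a round trip through RGB. The names (LumaCoeffs, yuv_to_yuv_matrix, and so on) are hypothetical and are not swscale or vf_colorspace API. The sketch assumes normalized full-range values (Y in [0,1], Cb/Cr in [-0.5,0.5]) and only changes the matrix coefficients; it does not handle limited range, differing primaries, or transfer functions, which a correct converter also has to deal with.

```c
#include <assert.h>

/* Luma coefficients Kr and Kb; Kg is derived as 1 - Kr - Kb. */
typedef struct { double kr, kb; } LumaCoeffs;

/* Standard constants from BT.601 and BT.709. */
static const LumaCoeffs BT601 = { 0.299,  0.114  };
static const LumaCoeffs BT709 = { 0.2126, 0.0722 };

/* Rows produce Y, Cb, Cr from columns R, G, B. */
static void rgb_to_yuv_matrix(LumaCoeffs c, double m[3][3])
{
    double kg = 1.0 - c.kr - c.kb;
    assert(kg > 0.0);
    /* Y = Kr*R + Kg*G + Kb*B */
    m[0][0] = c.kr;  m[0][1] = kg;  m[0][2] = c.kb;
    /* Cb = (B - Y) / (2 * (1 - Kb)) */
    m[1][0] = -c.kr / (2.0 * (1.0 - c.kb));
    m[1][1] = -kg   / (2.0 * (1.0 - c.kb));
    m[1][2] = 0.5;
    /* Cr = (R - Y) / (2 * (1 - Kr)) */
    m[2][0] = 0.5;
    m[2][1] = -kg   / (2.0 * (1.0 - c.kr));
    m[2][2] = -c.kb / (2.0 * (1.0 - c.kr));
}

/* Rows produce R, G, B from columns Y, Cb, Cr (inverse of the above). */
static void yuv_to_rgb_matrix(LumaCoeffs c, double m[3][3])
{
    double kg = 1.0 - c.kr - c.kb;
    assert(kg > 0.0);
    m[0][0] = 1.0;  m[0][1] = 0.0;  m[0][2] = 2.0 * (1.0 - c.kr);
    m[1][0] = 1.0;
    m[1][1] = -2.0 * c.kb * (1.0 - c.kb) / kg;
    m[1][2] = -2.0 * c.kr * (1.0 - c.kr) / kg;
    m[2][0] = 1.0;  m[2][1] = 2.0 * (1.0 - c.kb);  m[2][2] = 0.0;
}

/* out = a * b */
static void mat3_mul(const double a[3][3], const double b[3][3],
                     double out[3][3])
{
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            out[i][j] = 0.0;
            for (int k = 0; k < 3; k++)
                out[i][j] += a[i][k] * b[k][j];
        }
}

/* Compose one matrix that maps src YCbCr directly to dst YCbCr. */
static void yuv_to_yuv_matrix(LumaCoeffs src, LumaCoeffs dst,
                              double out[3][3])
{
    double to_rgb[3][3], to_yuv[3][3];
    yuv_to_rgb_matrix(src, to_rgb);
    rgb_to_yuv_matrix(dst, to_yuv);
    mat3_mul(to_yuv, to_rgb, out);
}
```

Calling yuv_to_yuv_matrix(BT601, BT709, m) would give the combined coefficients; with identical source and destination the result is the identity matrix, and the first column is always (1, 0, 0), since neutral gray has zero chroma in both systems.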
>
>
> > thx
> >
> > [...]
> > --
> > Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >
> > If you fake or manipulate statistics in a paper in physics you will never
> > get a job again.
> > If you fake or manipulate statistics in a paper in medicine you will get
> > a job for life at the pharma industry.
> >