[FFmpeg-devel] libavcodec/utvideodsp : add avx2 version
martin.vignali at gmail.com
Tue Nov 21 21:09:35 EET 2017
> > Checkasm result (Kaby Lake, os 10.12)
> > restore_rgb_planes_c: 8371.0
> > restore_rgb_planes_sse2: 6583.7
> > restore_rgb_planes_avx2: 3596.5
> > restore_rgb_planes10_c: 16735.7
> > restore_rgb_planes10_sse2: 11478.5
> > restore_rgb_planes10_avx2: 7193.7
> Curious, on my Haswell (mingw-w64, Win10) I get
> restore_rgb_planes_c: 79500.7
> restore_rgb_planes_sse2: 6872.7
> restore_rgb_planes_avx2: 6715.7
> restore_rgb_planes10_c: 91394.7
> restore_rgb_planes10_sse2: 14494.0
> restore_rgb_planes10_avx2: 13468.7
> I checked again, and I get the same kind of results as before.
Strange that the speed improvement is so small on Haswell.
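To put numbers on that difference, here is a minimal C sketch that turns the checkasm timings quoted above into speedup ratios. The struct and function names are made up for illustration; only the timing values come from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* One row of checkasm timings (lower is faster). The values below are
 * copied from the results quoted in this thread; the struct itself is
 * only an illustration. */
struct timing {
    const char *cpu;
    const char *func;
    double c, sse2, avx2;
};

static const struct timing timings[] = {
    { "Kaby Lake", "restore_rgb_planes",    8371.0,  6583.7,  3596.5 },
    { "Kaby Lake", "restore_rgb_planes10", 16735.7, 11478.5,  7193.7 },
    { "Haswell",   "restore_rgb_planes",   79500.7,  6872.7,  6715.7 },
    { "Haswell",   "restore_rgb_planes10", 91394.7, 14494.0, 13468.7 },
};

static const size_t nb_timings = sizeof(timings) / sizeof(timings[0]);

/* Speedup of an optimized version relative to a baseline:
 * a value of 2.0 means the optimized version takes half the time. */
static double speedup(double baseline, double optimized)
{
    assert(optimized > 0.0);
    return baseline / optimized;
}

/* Speedup of the AVX2 version over the SSE2 version for one row. */
static double avx2_over_sse2(const struct timing *t)
{
    return speedup(t->sse2, t->avx2);
}
```

Calling avx2_over_sse2() on each row gives roughly 1.8x and 1.6x on Kaby Lake, but only about 1.02x and 1.08x on Haswell.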
> > Pass fate test for me
> > 0001-checkasm-add-utvideodsp-test :
> > I'm not entirely sure of mine, for this checkasm,
> > 0002-libavcodec-x86-utvideodsp-make-macro-for-func
> > Code reorganization
> > 0003-libavcodec-utvideodsp-add-avx2-version-for-the-dsp
> > AVX2 version
> > 0004-libavcodec-x86-utvideodsp.asm-cosmetic
> > Cosmetic
> > Martin
> > Jokyo Images
> Sorry I missed this set. The asm changes look simple and good. The only
> thing I'd have done was to make sure the constants were wide enough to
> avoid having to use vpbroadcast instructions.
> I noticed for that matter that said constants already exist in
> constants.c, so I just made it use them instead.
Thanks for all the fixes.
Your comment about using vpbroadcast to load constants
seems similar to an earlier comment by James Darnley (in the discussion
"libavcodec/bswapdsp : add AVX2 for bswap_buf").
Here I used the same approach as Henrik Gramner in the exr_dsp.predictor func
(but I'm OK with modifying that part if needed).
Do you think we need to replace all
vbroadcasti128 mm, [constants]
mova mm, [constants]
with your method? (For exr_dsp the answer is probably yes, because it
also uses pb_80; I will send a patch for that.)
If so, is it better to widen the constant in the asm (for example, for bswapdsp)
pb_bswap32: times 2 db 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
or to add a constant (if it does not already exist) to constants.c/h?
This case will probably be common for AVX2 versions of DSP functions.
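To illustrate what the "times 2" version buys, here is a scalar C model of the idea (a sketch only, not FFmpeg code; all names below are hypothetical). A 256-bit byte shuffle works on each 128-bit lane independently, so if the 16-byte pattern is already stored twice, the 32-byte mask can be loaded with a plain full-width load instead of broadcasting a 16-byte constant to both lanes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 32-byte shuffle mask: the 16-byte bswap32 pattern stored once per
 * 128-bit lane, mirroring the "times 2 db 3, 2, 1, 0, ..." line above.
 * The name is hypothetical. */
static const uint8_t pb_bswap32_wide[32] = {
    3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12,
    3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12,
};

/* Scalar model of a 256-bit per-lane byte shuffle: each 16-byte lane is
 * shuffled on its own, the low 4 bits of a mask byte pick a source byte
 * inside the same lane, and a set top bit yields zero. */
static void shuffle_bytes_per_lane(uint8_t dst[32], const uint8_t src[32],
                                   const uint8_t mask[32])
{
    uint8_t tmp[32];

    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 16; i++) {
            uint8_t m = mask[lane * 16 + i];
            tmp[lane * 16 + i] = (m & 0x80) ? 0 : src[lane * 16 + (m & 0x0F)];
        }
    }
    memcpy(dst, tmp, sizeof(tmp));
}

/* Hypothetical helper: byte-swap eight 32-bit words in one "register". */
static void bswap32_x8(uint32_t dst[8], const uint32_t src[8])
{
    uint8_t in[32], out[32];

    memcpy(in, src, sizeof(in));
    shuffle_bytes_per_lane(out, in, pb_bswap32_wide);
    memcpy(dst, out, sizeof(out));
}
```

With a 16-byte mask, the second lane would need its own copy of the pattern at run time, which is what the broadcast does; the wide constant moves that duplication into the data section at the cost of 16 extra bytes.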
> The checkasm test is a bit ugly and could use some cosmetics, though.
Apart from one thing (the WIDTH_PADDED calculation is strange: I don't remember
why I wrote it that way, and it only works by "luck"; it needs to be WIDTH + 16),
do you think it needs more modification (considering your recent patches)?