[FFmpeg-devel] [PATCH v3] swscale/x86/rgb2rgb: add AVX512ICL versions of shuffle_bytes

James Almer jamrial at gmail.com
Mon Feb 3 15:20:26 EET 2025


On 1/29/2025 10:03 AM, Shreesh Adiga wrote:
> Hi Andreas,
> 
> I am not sure that is needed. I can add the data observed on my machine
> (AMD 7950x Zen 4), but I think it will vary from machine to machine. The
> speedup is expected to be around 2x compared to AVX2, and there is no core
> change apart from processing the scalar loop with masked instructions.
> 
> The data is not entirely consistent with my expectations. All the shuffle
> variants do equivalent work, yet the speedups in the report vary between
> them.
> 
> shuffle_bytes_0321_c:                                   56.5 ( 1.00x)
> shuffle_bytes_0321_ssse3:                               15.2 ( 3.70x)
> shuffle_bytes_0321_avx2:                                10.2 ( 5.51x)
> shuffle_bytes_0321_avx512icl:                            9.2 ( 6.11x)
> shuffle_bytes_1230_c:                                   84.5 ( 1.00x)
> shuffle_bytes_1230_ssse3:                               14.2 ( 5.93x)
> shuffle_bytes_1230_avx2:                                15.2 ( 5.54x)
> shuffle_bytes_1230_avx512icl:                           11.2 ( 7.51x)
> shuffle_bytes_2103_c:                                   48.5 ( 1.00x)
> shuffle_bytes_2103_ssse3:                               21.2 ( 2.28x)
> shuffle_bytes_2103_avx2:                                13.8 ( 3.53x)
> shuffle_bytes_2103_avx512icl:                            9.2 ( 5.24x)
> shuffle_bytes_3012_c:                                   84.5 ( 1.00x)
> shuffle_bytes_3012_ssse3:                               14.2 ( 5.93x)
> shuffle_bytes_3012_avx2:                                16.2 ( 5.20x)
> shuffle_bytes_3012_avx512icl:                           10.2 ( 8.24x)
> shuffle_bytes_3210_c:                                   89.2 ( 1.00x)
> shuffle_bytes_3210_ssse3:                               24.2 ( 3.68x)
> shuffle_bytes_3210_avx2:                                16.2 ( 5.49x)
> shuffle_bytes_3210_avx512icl:                            9.2 ( 9.65x)
> 
> I can add the details to the commit message if you confirm that it is needed.
> 
> Thanks,
> Shreesh

Added the benchmarks and pushed the patch. Thanks.
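
For readers unfamiliar with the "masked instructions" remark quoted above,
here is a minimal sketch in portable C of the idea. It is not the code from
the patch, which is x86 assembly. It assumes that the digits in a name such as
0321 give the source byte index for each output byte within a 4-byte pixel.
The helper names (shuffle_bytes_scalar, shuffle_block_masked,
shuffle_bytes_masked_tail) are made up for illustration. Instead of finishing
the leftover bytes with a scalar loop, the tail is handled by one more
64-byte step in which only the lanes selected by a bit mask are loaded and
stored.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 64 /* width of one 512-bit register, in bytes */

/* Scalar reference: for each whole 4-byte pixel, output byte k is input
 * byte idx[k]. For the "0321" variant, idx would be {0, 3, 2, 1}. */
static void shuffle_bytes_scalar(const uint8_t *src, uint8_t *dst,
                                 size_t size, const uint8_t idx[4])
{
    for (size_t i = 0; i + 4 <= size; i += 4)
        for (int k = 0; k < 4; k++)
            dst[i + k] = src[i + idx[k]];
}

/* Emulates one masked 64-byte step in plain C (hypothetical helper).
 * Lanes whose mask bit is clear are neither read from src nor written
 * to dst, so the step is safe at the end of a buffer. */
static void shuffle_block_masked(const uint8_t *src, uint8_t *dst,
                                 uint64_t mask, const uint8_t idx[4])
{
    uint8_t in[VEC_BYTES] = {0};
    uint8_t out[VEC_BYTES];

    for (int j = 0; j < VEC_BYTES; j++)      /* masked load, others zeroed */
        if ((mask >> j) & 1)
            in[j] = src[j];
    for (int j = 0; j < VEC_BYTES; j++)      /* in-register byte shuffle */
        out[j] = in[(j & ~3) + idx[j & 3]];
    for (int j = 0; j < VEC_BYTES; j++)      /* masked store */
        if ((mask >> j) & 1)
            dst[j] = out[j];
}

/* Full blocks use an all-ones mask; the remainder uses a mask with the
 * low `rem` bits set, replacing a separate scalar tail loop. */
static void shuffle_bytes_masked_tail(const uint8_t *src, uint8_t *dst,
                                      size_t size, const uint8_t idx[4])
{
    size_t i = 0;

    size &= ~(size_t)3;                      /* whole pixels only */
    for (; i + VEC_BYTES <= size; i += VEC_BYTES)
        shuffle_block_masked(src + i, dst + i, UINT64_MAX, idx);

    size_t rem = size - i;                   /* 0..60, multiple of 4 */
    if (rem)
        shuffle_block_masked(src + i, dst + i,
                             (UINT64_C(1) << rem) - 1, idx);
}
```

Because the remainder is always a multiple of 4, the shuffle inside the
masked step only ever reads lanes that were actually loaded, so the result
should match the scalar reference byte for byte.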
