[FFmpeg-devel] [PATCH v3] swscale/x86/rgb2rgb: add AVX512ICL versions of shuffle_bytes
James Almer
jamrial at gmail.com
Mon Feb 3 15:20:26 EET 2025
On 1/29/2025 10:03 AM, Shreesh Adiga wrote:
> Hi Andreas,
>
> I am not sure if that is needed. I can add the data observed on my machine
> (AMD 7950X, Zen 4), but I think this will vary from machine to machine. It is
> expected to be around 2x compared to AVX2, and there is no core change apart
> from processing the scalar loop with masked instructions.
>
> The data doesn't look entirely consistent with my expectations. All the
> shuffle variants do equivalent work, yet the speedups in the report are not
> consistent.
>
> shuffle_bytes_0321_c:         56.5 ( 1.00x)
> shuffle_bytes_0321_ssse3:     15.2 ( 3.70x)
> shuffle_bytes_0321_avx2:      10.2 ( 5.51x)
> shuffle_bytes_0321_avx512icl:  9.2 ( 6.11x)
> shuffle_bytes_1230_c:         84.5 ( 1.00x)
> shuffle_bytes_1230_ssse3:     14.2 ( 5.93x)
> shuffle_bytes_1230_avx2:      15.2 ( 5.54x)
> shuffle_bytes_1230_avx512icl: 11.2 ( 7.51x)
> shuffle_bytes_2103_c:         48.5 ( 1.00x)
> shuffle_bytes_2103_ssse3:     21.2 ( 2.28x)
> shuffle_bytes_2103_avx2:      13.8 ( 3.53x)
> shuffle_bytes_2103_avx512icl:  9.2 ( 5.24x)
> shuffle_bytes_3012_c:         84.5 ( 1.00x)
> shuffle_bytes_3012_ssse3:     14.2 ( 5.93x)
> shuffle_bytes_3012_avx2:      16.2 ( 5.20x)
> shuffle_bytes_3012_avx512icl: 10.2 ( 8.24x)
> shuffle_bytes_3210_c:         89.2 ( 1.00x)
> shuffle_bytes_3210_ssse3:     24.2 ( 3.68x)
> shuffle_bytes_3210_avx2:      16.2 ( 5.49x)
> shuffle_bytes_3210_avx512icl:  9.2 ( 9.65x)
>
> I can add the details to commit message if you can confirm if it is needed.
>
> Thanks,
> Shreesh
Added the benchmarks and pushed the patch. Thanks.
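
For readers unfamiliar with the "masked instructions" mentioned in the quoted
message, here is a minimal portable C sketch of the idea. It is not the code
from the patch (which is x86 assembly). The function names, the 64-byte block
size, and the software emulation of a per-byte mask are all assumptions made
for illustration. It uses the 2103 order, taken here to mean that output byte
k of each 4-byte group comes from input byte "2103"[k].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 64 /* bytes per step, mirroring one 512-bit register (assumption) */

/* Scalar reference: swap bytes 0 and 2 of every 4-byte group ("2103"). */
static void shuffle_2103_ref(const uint8_t *src, uint8_t *dst, size_t size)
{
    assert(size % 4 == 0);
    for (size_t i = 0; i < size; i += 4) {
        dst[i + 0] = src[i + 2];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 0];
        dst[i + 3] = src[i + 3];
    }
}

/* Hypothetical helper: one block step where only lanes whose mask bit is
 * set are read from src and written to dst. This emulates in software what
 * a masked load, byte shuffle and masked store would do in hardware. */
static void shuffle_block_masked(const uint8_t *src, uint8_t *dst, uint64_t mask)
{
    static const uint8_t perm[4] = { 2, 1, 0, 3 };
    uint8_t in[BLOCK] = { 0 }, out[BLOCK];

    for (int i = 0; i < BLOCK; i++)      /* masked load */
        if ((mask >> i) & 1)
            in[i] = src[i];
    for (int i = 0; i < BLOCK; i++)      /* per-group byte shuffle */
        out[i] = in[(i & ~3) + perm[i & 3]];
    for (int i = 0; i < BLOCK; i++)      /* masked store */
        if ((mask >> i) & 1)
            dst[i] = out[i];
}

/* Full blocks use an all-ones mask; the remainder (fewer than 64 bytes)
 * reuses the same step with a partial mask instead of a scalar loop. */
static void shuffle_2103_blocks(const uint8_t *src, uint8_t *dst, size_t size)
{
    size_t i = 0;

    assert(size % 4 == 0);
    for (; i + BLOCK <= size; i += BLOCK)
        shuffle_block_masked(src + i, dst + i, UINT64_MAX);

    size_t rem = size - i;               /* 0..63 bytes left */
    if (rem)
        shuffle_block_masked(src + i, dst + i, (UINT64_C(1) << rem) - 1);
}
```

Calling shuffle_2103_blocks() on a 140-byte buffer takes two full blocks and
then one partial step with the low 12 mask bits set. Bytes past the end of the
buffer are neither read nor written.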