[FFmpeg-devel] [PATCH 2/2] avcodec/aarch64/ac3dsp_neon.S: Optimize ac3_sum_square_butterfly_int32_neon
Martin Storsjö
martin at martin.st
Sun Mar 2 01:07:45 EET 2025
On Fri, 28 Feb 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
> Before and after:
>
> A78
> ac3_sum_square_bufferfly_int32_neon: 484.8 ( 2.00x)
> ac3_sum_square_bufferfly_int32_neon: 468.2 ( 2.08x)
>
> A72
> ac3_sum_square_bufferfly_int32_neon: 793.6 ( 1.26x)
> ac3_sum_square_bufferfly_int32_neon: 527.3 ( 1.92x)
> ---
> Instead of calculating a^2, b^2, (a+b)^2 and (a-b)^2, calculate only
> a^2, b^2 and 2*a*b in each iteration and derive the latter parts from
> these three at the end.
This patch looks good to me, thanks!
// Martin
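
For readers following the thread, the algebra in the quoted commit message can be sketched in plain C. This is only an illustration of the identity (a+b)^2 = a^2 + 2*a*b + b^2 and (a-b)^2 = a^2 - 2*a*b + b^2, not the NEON code from the patch; the function names, the argument types and the 64-bit accumulators are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar version: four multiply-accumulates per element. */
static void sum_square_butterfly_naive(int64_t sum[4], const int32_t *a,
                                       const int32_t *b, size_t len)
{
    sum[0] = sum[1] = sum[2] = sum[3] = 0;
    for (size_t i = 0; i < len; i++) {
        int64_t x = a[i], y = b[i];
        sum[0] += x * x;             /* a^2     */
        sum[1] += y * y;             /* b^2     */
        sum[2] += (x + y) * (x + y); /* (a+b)^2 */
        sum[3] += (x - y) * (x - y); /* (a-b)^2 */
    }
}

/* Hypothetical version of the idea in the commit message: accumulate only
 * a^2, b^2 and 2*a*b in the loop, then derive the last two sums once. */
static void sum_square_butterfly_derived(int64_t sum[4], const int32_t *a,
                                         const int32_t *b, size_t len)
{
    int64_t aa = 0, bb = 0, ab2 = 0;
    for (size_t i = 0; i < len; i++) {
        int64_t x = a[i], y = b[i];
        aa  += x * x;     /* a^2   */
        bb  += y * y;     /* b^2   */
        ab2 += 2 * x * y; /* 2*a*b */
    }
    sum[0] = aa;
    sum[1] = bb;
    sum[2] = aa + bb + ab2; /* sum of (a+b)^2 */
    sum[3] = aa + bb - ab2; /* sum of (a-b)^2 */
}
```

As long as the accumulators do not overflow, both functions produce identical results, since the derivation is exact in integer arithmetic. The second one does three multiply-accumulates per element instead of four and moves the additions and subtractions out of the loop.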