[FFmpeg-devel] [PATCH 2/7] avcodec/aarch64/mpegvideoencdsp: add neon implementations for pix_sum and pix_norm1

Ramiro Polla ramiro.polla at gmail.com
Sun Aug 18 23:20:07 EEST 2024


On Sun, Aug 18, 2024 at 10:13 PM Ramiro Polla <ramiro.polla at gmail.com> wrote:
>
>                    A53             A76
> pix_norm1_c:     519.2           231.5
> pix_norm1_neon:  195.0 ( 2.66x)   44.2 ( 5.24x)
> pix_sum_c:       344.5           242.2
> pix_sum_neon:    119.0 ( 2.89x)   41.7 ( 5.81x)

This new patchset no longer uses unrolled loops. Although checkasm
reported the unrolled versions to be faster, Linux perf shows that the
non-unrolled versions are faster in a real encoding use case.
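
For readers who don't have the C code at hand, here is a rough sketch of
what the two functions compute, written as plain non-unrolled loops. This
is an illustration only, not the code from the patch: the names
pix_sum_ref and pix_norm1_ref are made up, and the fixed 16x16 block size
and the (pixel pointer, stride) signature are assumptions based on how the
C reference versions are usually described.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum of all pixel values in a 16x16 block.
 * 'stride' is the distance in bytes between the starts of two rows. */
static int pix_sum_ref(const uint8_t *pix, ptrdiff_t stride)
{
    int sum = 0;

    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += pix[x];
        pix += stride;          /* advance to the next row */
    }
    return sum;
}

/* Hypothetical helper: sum of squared pixel values in a 16x16 block.
 * Worst case is 255 * 255 * 256 = 16646400, which fits in an int. */
static int pix_norm1_ref(const uint8_t *pix, ptrdiff_t stride)
{
    int sum = 0;

    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += pix[x] * pix[x];
        pix += stride;          /* advance to the next row */
    }
    return sum;
}
```

Both loops do very little work per row, so loop overhead and code size can
matter as much as raw throughput. That is one plausible reason why an
isolated benchmark and a full encode could disagree about unrolling.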

