[FFmpeg-devel] [PATCH] lavc/aarch64: Add neon implementation for sse16
Martin Storsjö
martin at martin.st
Thu Aug 4 10:46:05 EEST 2022
On Mon, 25 Jul 2022, Hubert Mazur wrote:
> Provide neon implementation for sse16 function.
>
> Performance comparison tests are shown below.
> - sse_0_c: 273.0
> - sse_0_neon: 48.2
>
> Benchmarks and tests run with checkasm tool on AWS Graviton 3.
>
> Signed-off-by: Hubert Mazur <hum at semihalf.com>
> ---
> libavcodec/aarch64/me_cmp_init_aarch64.c | 4 ++
> libavcodec/aarch64/me_cmp_neon.S | 82 ++++++++++++++++++++++++
> 2 files changed, 86 insertions(+)
> +// iterate by one
> +2:
> +
> + ld1 {v0.16b}, [x1], x3 // Load pix1
> + ld1 {v1.16b}, [x2], x3 // Load pix2
> +
> + uabd v30.16b, v0.16b, v1.16b
> + umull v29.8h, v0.8b, v1.8b
> + umull2 v28.8h, v0.16b, v1.16b
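The umull/umull2 here should probably be using v30 (the absolute difference
from uabd) for both operands instead of v0/v1; as written, this multiplies
pix1 by pix2 rather than squaring the difference.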
The whole codepath for non-modulo-4 heights is untested in practice. You
can apply the patches from
https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=7028 to make
checkasm test it, so please make sure that the uncommon codepaths in the
patches do work too.
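
To illustrate the umull remark above, here is a minimal scalar sketch in C. It assumes the usual definition of sse16 (the sum of squared pixel differences over 16 columns and h rows). Both function names are hypothetical and are not the actual FFmpeg functions. The second function also assumes that the products are accumulated afterwards, which the quoted excerpt does not show.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: what each row of the tail loop should add. */
static int sse16_ref(const uint8_t *pix1, const uint8_t *pix2,
                     ptrdiff_t stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            /* Counterpart of uabd: per-byte absolute difference. */
            int d = pix1[x] > pix2[x] ? pix1[x] - pix2[x]
                                      : pix2[x] - pix1[x];
            /* Counterpart of umull/umull2 with v30 as both operands. */
            sum += d * d;
        }
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}

/* Hypothetical model of the quoted tail loop: umull on v0/v1 multiplies
 * the two source pixels, so the uabd result is never used. */
static int sse16_tail_as_posted(const uint8_t *pix1, const uint8_t *pix2,
                                ptrdiff_t stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            sum += pix1[x] * pix2[x];
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

For two identical blocks the reference returns 0, while the second variant returns a nonzero sum for any non-black input. That mismatch is the kind of thing checkasm would flag once it exercises heights that are not a multiple of 4.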
// Martin