[FFmpeg-devel] [PATCH v2 1/1] lavc/aarch64: add some neon pix_abs functions
Swinney, Jonathan
jswinney at amazon.com
Tue Apr 26 01:43:25 EEST 2022
Thanks to Michael and Martin for your reviews on several of my patches. I've made many of the changes you requested, but I'm not yet ready to resubmit the patches. I'll be out of the office until next week and will submit updated versions then. Thanks!
--
Jonathan Swinney
On 4/15/22, 11:45 AM, "ffmpeg-devel on behalf of Michael Niedermayer" <ffmpeg-devel-bounces at ffmpeg.org on behalf of michael at niedermayer.cc> wrote:
On Thu, Apr 14, 2022 at 04:22:58PM +0000, Swinney, Jonathan wrote:
> - ff_pix_abs16_neon
> - ff_pix_abs16_xy2_neon
>
> In direct micro benchmarks of these ff functions versus their C implementations,
> these functions performed as follows on AWS Graviton 2:
>
> ff_pix_abs16_neon:
> c: benchmark ran 100000 iterations in 0.955383 seconds
> ff: benchmark ran 100000 iterations in 0.097669 seconds
>
> ff_pix_abs16_xy2_neon:
> c: benchmark ran 100000 iterations in 1.916759 seconds
> ff: benchmark ran 100000 iterations in 0.370729 seconds
>
> Signed-off-by: Jonathan Swinney <jswinney at amazon.com>
> ---
> libavcodec/aarch64/Makefile | 2 +
> libavcodec/aarch64/me_cmp_init_aarch64.c | 39 +++++
> libavcodec/aarch64/me_cmp_neon.S | 209 +++++++++++++++++++++++
> libavcodec/me_cmp.c | 2 +
> libavcodec/me_cmp.h | 1 +
> libavcodec/x86/me_cmp.asm | 7 +
> libavcodec/x86/me_cmp_init.c | 3 +
> tests/checkasm/Makefile | 2 +-
> tests/checkasm/checkasm.c | 1 +
> tests/checkasm/checkasm.h | 1 +
> tests/checkasm/motion.c | 155 +++++++++++++++++
> 11 files changed, 421 insertions(+), 1 deletion(-)
> create mode 100644 libavcodec/aarch64/me_cmp_init_aarch64.c
> create mode 100644 libavcodec/aarch64/me_cmp_neon.S
> create mode 100644 tests/checkasm/motion.c
>
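For readers unfamiliar with the functions named above: pix_abs16 is a sum of absolute differences (SAD) over a block 16 pixels wide and h rows tall. Below is a minimal scalar sketch of that operation, not the code from the patch or from libavcodec. The name sad16_ref and its simplified signature are hypothetical (the real functions also take a context pointer), and the xy2 variant, which compares against a half-pel interpolated reference, is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference for a 16-wide sum of absolute
 * differences. This is an illustration only, not FFmpeg's C code. */
static int sad16_ref(const uint8_t *pix1, const uint8_t *pix2,
                     ptrdiff_t stride, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        /* Accumulate |pix1[x] - pix2[x]| across one 16-pixel row. */
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - pix2[x]);

        /* Both blocks are assumed to share the same row stride. */
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

A NEON version would typically process one full 16-byte row per vector operation, which is presumably where the reported speedup comes from.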
[...]
> diff --git a/libavcodec/x86/me_cmp.asm b/libavcodec/x86/me_cmp.asm
> index ad06d485ab..f73b9f9161 100644
> --- a/libavcodec/x86/me_cmp.asm
> +++ b/libavcodec/x86/me_cmp.asm
> @@ -255,6 +255,7 @@ hadamard8x8_diff %+ SUFFIX:
>
> HSUM m0, m1, eax
> and rax, 0xFFFF
> + emms
> ret
>
> hadamard8_16_wrapper 0, 14
> @@ -345,6 +346,7 @@ cglobal sse%1, 5,5,8, v, pix1, pix2, lsize, h
>
> HADDD m7, m1
> movd eax, m7 ; return value
> + emms
> RET
> %endmacro
On which ARM chip did you test this?
[...]
> diff --git a/libavcodec/x86/me_cmp_init.c b/libavcodec/x86/me_cmp_init.c
> index 9af911bb88..b330868a38 100644
> --- a/libavcodec/x86/me_cmp_init.c
> +++ b/libavcodec/x86/me_cmp_init.c
> @@ -186,6 +186,8 @@ static int vsad_intra16_mmx(MpegEncContext *v, uint8_t *pix, uint8_t *dummy,
> : "r" (stride), "m" (h)
> : "%ecx");
>
> + emms_c();
> +
> return tmp & 0xFFFF;
> }
> #undef SUM
> @@ -418,6 +420,7 @@ static inline int sum_mmx(void)
> "paddw %%mm0, %%mm6 \n\t"
> "movd %%mm6, %0 \n\t"
> : "=r" (ret));
> + emms_c();
> return ret & 0xFFFF;
> }
hmmm
Also, before the patch:
checkasm: all 6153 tests passed
After it:
checkasm: all 3198 tests passed
That's on x86-64.
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact