[FFmpeg-devel] [PATCH 2/7] avcodec/aarch64/mpegvideoencdsp: add neon implementations for pix_sum and pix_norm1
Ramiro Polla
ramiro.polla at gmail.com
Mon Aug 19 00:58:12 EEST 2024
On Sun, Aug 18, 2024 at 10:43 PM Martin Storsjö <martin at martin.st> wrote:
> On Sun, 18 Aug 2024, Ramiro Polla wrote:
>
> > A53 A76
> > pix_norm1_c: 519.2 231.5
> > pix_norm1_neon: 195.0 ( 2.66x) 44.2 ( 5.24x)
> > pix_sum_c: 344.5 242.2
> > pix_sum_neon: 119.0 ( 2.89x) 41.7 ( 5.81x)
> > ---
>
> Hmm, those speedups on the A53 look quite small. I guess that's because
> this isn't unrolled at all, as you mention. Especially for A53, I would
> expect unrolling to have a very large effect here. But it sounds weird if
> you say perf indicates that it is slower in real world use. Yes, unrolling
> does make the code use more space and makes the I-cache less efficient,
> but in this case it would only be a difference of like 2 instructions?
These are the checkasm benchmarks I got for the unrolled version with
manual instruction ordering to give better results on the A53 (patch
attached for reference):
A53 A76
pix_norm1_c: 519.0 231.7
pix_norm1_neon: 140.0 ( 3.71x) 41.5 ( 5.58x)
pix_norm1_dotprod: 17.2 (13.47x)
pix_sum_c: 347.2 242.0
pix_sum_neon: 72.0 ( 4.82x) 21.0 (11.52x)
I had tested the real world case on the A76, but not on the A53. I
spent a couple of hours with perf trying to find the source of the
discrepancy but I couldn't find anything conclusive. I need to learn
more about how to test cache misses.
I just tested again with the following command:
$ taskset -c 2 ./ffmpeg_g -benchmark -f lavfi -i
"testsrc2=size=1920x1080" -vcodec mpeg4 -q 31 -vframes 100 -f rawvideo
-y /dev/null
The entire test was about 1% faster unrolled on A53, but about 1%
slower unrolled on A76 (I had the Raspberry Pi 5 in mind for these
optimizations, so I preferred choosing the version that was faster on
the A76). I wonder if there is any way we could check at runtime. The
problem is also that I don't even know for certain what is causing
this.
> > diff --git a/libavcodec/aarch64/mpegvideoencdsp_neon.S b/libavcodec/aarch64/mpegvideoencdsp_neon.S
> > new file mode 100644
> > index 0000000000..89e50e29b3
> > --- /dev/null
> > +++ b/libavcodec/aarch64/mpegvideoencdsp_neon.S
> > @@ -0,0 +1,67 @@
> > +/*
> > + * Copyright (c) 2024 Ramiro Polla
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#include "libavutil/aarch64/asm.S"
> > +
> > +function ff_pix_sum16_neon, export=1
> > +// x0 const uint8_t *pix
> > +// x1 int line_size
> > +
> > + sxtw x1, w1
> > + movi v0.16b, #0
> > + mov w2, #16
> > +
> > +1:
> > + ld1 { v1.16b }, [x0], x1
>
> Nit; we usually don't have these {} written with spaces inside of the
> braces, same below.
Oops, I should check my other neon code then...
> > + subs w2, w2, #1
> > + uadalp v0.8h, v1.16b
> > + b.ne 1b
> > +
> > + uaddlp v0.4s, v0.8h
> > + uaddlv d0, v0.4s
>
> Couldn't this be aggregated with just one instruction, "uaddlv s0, v0.8h"?
> There's no need to widen it to 64 bit as we're truncating the returned
> value to 32 bit anyway.
Yes, that works. I'll fix it in the next iteration.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aarch64_mpegvideoencdsp_unrolled.patch
Type: text/x-patch
Size: 11076 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20240818/364d61e8/attachment.bin>
More information about the ffmpeg-devel
mailing list