[FFmpeg-devel] [PATCH 1/4] lavc/aarch64: new optimization for 8-bit hevc_epel_v
Martin Storsjö
martin at martin.st
Tue Oct 31 14:17:16 EET 2023
On Thu, 26 Oct 2023, Logan.Lyu wrote:
> And I missed submitting a commit that was earlier than these four commits,
> which caused the corrupted whitespace problem. Now I have recreated these
> patches.
>
> In addition, I rebased it to ensure that these patches can be successfully
> applied on the latest master branch.
>
> Please check again, thank you.
Thanks, now these were possible to apply, and they looked mostly ok, so I
touched up the last details I noticed and pushed them.
Things I noticed and fixed before pushing:
A bunch of minor cosmetics: you had minor misindentations in a few places
(that were copy-pasted around in lots of places), which I fixed like this:
ld1 {v18.16b}, [x1], x2
.macro calc src0, src1, src2, src3
- ld1 {\src3\().16b}, [x1], x2
+ ld1 {\src3\().16b}, [x1], x2
movi v4.8h, #0
movi v5.8h, #0
calc_epelb v4, \src0, \src1, \src2, \src3
@@ -461,7 +461,7 @@ function ff_hevc_put_hevc_epel_v64_8_neon, export=1
.endm
1: calc_all16
.purgem calc
-2: ld1 {v8.8b-v11.8b}, [sp]
+2: ld1 {v8.8b-v11.8b}, [sp]
add sp, sp, #32
ret
The first patch, with mostly small trivial functions, can probably be
scheduled better for in-order cores. I'll send a patch if I can make them
measurably faster.
In almost every patch, you have loads/stores to the stack; you use the
fused stack decrement nicely everywhere possible, but for the loading,
you're almost always lacking the fused stack increment. I've fixed it now
for this patchset, but please do keep this in mind and fix it up before
submitting any further patches. I've fixed that up like this:
bl X(ff_hevc_put_hevc_epel_h4_8_neon_i8mm)
- ldp x5, x30, [sp]
ldp x0, x3, [sp, #16]
- add sp, sp, #32
+ ldp x5, x30, [sp], #32
load_epel_filterh x5, x4
(In many places.)
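For reference, a minimal sketch of the whole pattern in plain GNU as syntax, with the fused decrement on the first store and the fused increment on the last load. The labels example_wrapper and example_callee are hypothetical and only serve the illustration; the 32-byte frame and register choice mirror the excerpt above, and the prologue is an assumption since the excerpt only shows the epilogue.

```asm
        .text
        .globl  example_wrapper

// Hypothetical callee, only here so the sketch assembles on its own.
example_callee:
        ret

// Hypothetical wrapper that saves four registers around a call.
example_wrapper:
        stp     x5, x30, [sp, #-32]!    // pre-index: sp -= 32, then store at [sp]
        stp     x0, x3,  [sp, #16]      // upper half of the 32-byte frame
        bl      example_callee
        ldp     x0, x3,  [sp, #16]      // reload upper half while sp is unchanged
        ldp     x5, x30, [sp], #32      // post-index: load from [sp], then sp += 32
        ret
```

The loads at fixed offsets go first, so that the final ldp can both restore x5/x30 and release the frame in one instruction, making the separate "add sp, sp, #32" unnecessary.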
In one place, you wrote below the stack pointer before decrementing it.
That's ok on OSes with a defined red zone, but we shouldn't need to assume
that; I've fixed that like this:
function ff_hevc_put_hevc_qpel_v48_8_neon, export=1
- stp x5, x30, [sp, #-16]
- stp x0, x1, [sp, #-32]
stp x2, x3, [sp, #-48]!
+ stp x0, x1, [sp, #16]
+ stp x5, x30, [sp, #32]
I'll push the patchset with these changes soon.
// Martin