[FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm
Rémi Denis-Courmont
remi at remlab.net
Fri Dec 22 17:34:45 EET 2023
On Friday, 22 December 2023, 3.34.39 EET, flow gg wrote:
> func ff_decorrelate_sm_rvv, zve32x
> 1:
> vsetvli t0, a2, e32, m8, ta, ma
> vle32.v v8, (a1)
> sub a2, a2, t0
> vle32.v v0, (a0)
> vssra.vi v8, v8, 1
> vsub.vv v16, v0, v8
> vse32.v v16, (a0)
> sh2add a0, t0, a0
> vadd.vv v16, v0, v8
> vse32.v v16, (a1)
> sh2add a1, t0, a1
> bnez a2, 1b
> ret
> endfunc
>
> Is this what you meant? With this version, or when using vsra instead,
> some tests fail and the result is off by 1. I'm not sure where the
> problem is.
No, I meant something like this, but it turns out slightly slower anyway.
Saving the data dependency is not worth adding an instruction.
func ff_decorrelate_sm_rvv, zve32x
csrwi vxrm, 0
1:
vsetvli t0, a2, e32, m8, ta, ma
vle32.v v8, (a1)
sub a2, a2, t0
vle32.v v0, (a0)
vsra.vi v16, v8, 1
vssra.vi v8, v8, 1
vsub.vv v16, v0, v16
vadd.vv v8, v0, v8
vse32.v v16, (a0)
sh2add a0, t0, a0
vse32.v v8, (a1)
sh2add a1, t0, a1
bnez a2, 1b
ret
endfunc
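
The off-by-one can be reproduced in scalar C. The sketch below is only an
illustration: decorrelate_sm_ref is an assumed model of the scalar fallback
(a = p1 - (p2 >> 1); p1 = a; p2 += a), not a copy of it, and the helper names
sra1, ssra1_rnu, decorrelate_sm_two_shifts and self_check are hypothetical.
It shows that the new p2 needs the shift rounded up (vssra.vi with vxrm = 0)
while the new p1 needs the plain shift (vsra.vi), so using a single shift for
both is off by 1 whenever p2 is odd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed scalar reference (a model of the C fallback, not a copy of it). */
static void decorrelate_sm_ref(int32_t *p1, int32_t *p2, int length)
{
    for (int i = 0; i < length; i++) {
        /* Unsigned arithmetic so that overflow wraps instead of being UB. */
        uint32_t a = (uint32_t)p1[i] - (uint32_t)(p2[i] >> 1);
        p1[i] = (int32_t)a;
        p2[i] = (int32_t)((uint32_t)p2[i] + a);
    }
}

/* Model of vsra.vi x, 1: arithmetic shift, rounds toward minus infinity.
 * Assumes the compiler implements >> on negative values arithmetically. */
static int32_t sra1(int32_t x)
{
    return x >> 1;
}

/* Model of vssra.vi x, 1 with vxrm = 0 (round-to-nearest-up):
 * the bit shifted out is added back to the result. */
static int32_t ssra1_rnu(int32_t x)
{
    return (x >> 1) + (x & 1);
}

/* Two-shift formulation mirroring the vector loop above. */
static void decorrelate_sm_two_shifts(int32_t *p1, int32_t *p2, int length)
{
    for (int i = 0; i < length; i++) {
        uint32_t a  = (uint32_t)p1[i];
        uint32_t lo = (uint32_t)sra1(p2[i]);      /* vsra.vi  v16, v8, 1 */
        uint32_t hi = (uint32_t)ssra1_rnu(p2[i]); /* vssra.vi v8,  v8, 1 */
        p1[i] = (int32_t)(a - lo);                /* vsub.vv */
        p2[i] = (int32_t)(a + hi);                /* vadd.vv */
    }
}

/* Compare both formulations over a few even, odd and negative inputs. */
static void self_check(void)
{
    static const int32_t s[] = { 0, 1, 2, 7, -1, -3, -8, INT32_MAX, INT32_MIN };
    const size_t n = sizeof(s) / sizeof(s[0]);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            int32_t r1 = s[i], r2 = s[j];
            int32_t t1 = s[i], t2 = s[j];

            decorrelate_sm_ref(&r1, &r2, 1);
            decorrelate_sm_two_shifts(&t1, &t2, 1);
            assert(r1 == t1);
            assert(r2 == t2);
        }
    }
}
```

For p2 = 7, the plain shift gives 3 and the rounded shift gives 4, which is
the difference of 1 seen in the failing tests.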
--
雷米‧德尼-库尔蒙
http://www.remlab.net/