[FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans
Rémi Denis-Courmont
remi at remlab.net
Mon Dec 4 17:15:41 EET 2023
On Monday, 4 December 2023, 10:48:56 EET, flow gg wrote:
> > Probably missing VLENB checks.
>
> Changed.
>
> > You can multiply by 3, 5 or 9 with shift-and-add. By 12 with shift-and-add
> > then shift, and by 17 with shift then add. You don't need multiplications.
>
> Changed.
>
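For illustration, the shift-and-add decompositions mentioned above can be written out in scalar C. This is only a sketch of the arithmetic, not code from the patch: the helper names (shl, mul3, mul5, mul9, mul12, mul17) are hypothetical, and the vector version would apply the same shifts and adds per element.

```c
#include <stdint.h>

/* Hypothetical helpers, not taken from the patch. */

/* Left shift done on the unsigned representation, so that negative
 * inputs do not trigger undefined behaviour in C. */
static inline int32_t shl(int32_t x, unsigned n)
{
    return (int32_t)((uint32_t)x << n);
}

/* Wrapping addition, again via unsigned arithmetic. */
static inline int32_t add(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* x * 3 = (x << 1) + x : shift-and-add */
static inline int32_t mul3(int32_t x)  { return add(shl(x, 1), x); }

/* x * 5 = (x << 2) + x : shift-and-add */
static inline int32_t mul5(int32_t x)  { return add(shl(x, 2), x); }

/* x * 9 = (x << 3) + x : shift-and-add */
static inline int32_t mul9(int32_t x)  { return add(shl(x, 3), x); }

/* x * 12 = ((x << 1) + x) << 2 : shift-and-add, then shift */
static inline int32_t mul12(int32_t x) { return shl(add(shl(x, 1), x), 2); }

/* x * 17 = (x << 4) + x : shift, then add */
static inline int32_t mul17(int32_t x) { return add(shl(x, 4), x); }
```

Each helper gives the same result as the plain multiplication as long as the product fits in 32 bits.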
> > Do you really need to splat? Can't .vx or .wx be used instead?
>
> Okay, for example in ff_vc1_inv_trans_8x8_dc_rvv
>
> + vsetvli zero, t0, e8, m2, ta, ma
> + vwaddu.vx v4, v0, zero
> + vsetvli zero, t0, e16, m4, ta, ma
> + vadd.vx v4, v4, t2
> - vsetvli zero, t0, e16, m4, ta, ma
> - vmv.v.x v4, t2
> - vsetvli zero, t0, e8, m2, ta, ma
> - vwaddu.wv v4, v4, v0
>
> But the speed has slowed down slightly on the c910,
> I'm not sure if I should modify it.
OK, unfortunately, there is no widening addition with a wide scalar operand. But
you can do zero-extension followed by addition here. In the end, I doubt that you
can reasonably optimise whilst working with a C910-based board. This function's
performance deviates too much on non-conformant hardware.
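
As a rough scalar model of the two orderings in the quoted diff, the sketch below shows that they compute the same values: one splats the DC term into the wide destination and then does a widening add of the 8-bit pixels, the other zero-extends the pixels first and then adds the scalar. The function names and the use of int16_t for the wide elements are assumptions made for illustration only. Nothing here says anything about speed on any particular core.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar models, not taken from the patch. */

/* Models "vmv.v.x" followed by "vwaddu.wv":
 * fill the 16-bit destination with dc, then add the 8-bit sources. */
static void add_dc_splat(int16_t *dst, const uint8_t *src,
                         int16_t dc, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = dc;                          /* splat */
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(dst[i] + src[i]);  /* wide + narrow */
}

/* Models "vwaddu.vx ..., zero" followed by "vadd.vx":
 * zero-extend the 8-bit sources to 16 bits, then add the scalar dc. */
static void add_dc_zext(int16_t *dst, const uint8_t *src,
                        int16_t dc, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];                      /* zero-extension */
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(dst[i] + dc);      /* vector + scalar */
}
```

Both functions leave identical contents in dst for the same inputs, so the choice between them is purely a matter of which instruction sequence runs faster on the target.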
--
レミ・デニ-クールモン
http://www.remlab.net/