[FFmpeg-devel] [PATCH 2/2] lavc/flacdsp: implement wasted32 DSP function for VSX on POWER

Sean McGovern gseanmcg at gmail.com
Sat Jul 6 23:00:47 EEST 2024


Hi,


On Thu, Jul 4, 2024, 13:54 Rémi Denis-Courmont <remi at remlab.net> wrote:

> On Thursday, 4 July 2024, 19:26:19 EEST, Sean McGovern wrote:
> > Is that correlated with the comment above re: len? Or is it more general
> > that I should unroll until I've exhausted the available vector registers?
>
> You should unroll if it improves bandwidth.
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/
>

After adding a second set of load/left-shift/store operations, further
unrolling gave diminishing or no returns. I'll send the updated version later.
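
For illustration only, the shape of a 2x unroll can be sketched in portable C.
This is not the VSX code from the patch: the function name is hypothetical,
each group of four int32_t elements stands in for one 128-bit vector
load/shift/store, and the scalar remainder loop is an addition of this sketch.
It also assumes 0 <= wasted < 32.

```c
#include <stdint.h>

#define LANES 4 /* int32_t elements in one 128-bit vector */

/* Hypothetical scalar stand-in for a vector routine unrolled by two. */
static void wasted32_unroll2(int32_t *decoded, int wasted, int len)
{
    int i = 0;

    /* Main loop: two independent load / left-shift / store groups per pass. */
    for (; i + 2 * LANES <= len; i += 2 * LANES) {
        for (int j = 0; j < LANES; j++)             /* first "vector" */
            decoded[i + j] =
                (int32_t)((uint32_t)decoded[i + j] << wasted);
        for (int j = 0; j < LANES; j++)             /* second "vector" */
            decoded[i + LANES + j] =
                (int32_t)((uint32_t)decoded[i + LANES + j] << wasted);
    }

    /* Remainder: the elements that do not fill a full unrolled pass.
     * The shift is done on the unsigned value to avoid undefined
     * behaviour when shifting negative samples. */
    for (; i < len; i++)
        decoded[i] = (int32_t)((uint32_t)decoded[i] << wasted);
}
```

Because the two groups do not depend on each other, the CPU can overlap
their loads and stores; once memory bandwidth is the limit, adding a third
or fourth group would not be expected to help.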

Does wasted32 (and, by extension, wasted33) not have to worry about loop
tails? I noticed that the other vectorized versions don't do anything special
in that regard.
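
To make the concern concrete: a loop that always processes whole vectors and
has no tail handling touches len rounded up to the vector width, so it writes
past len whenever len is not a multiple of that width. The helper below is
hypothetical and only does the arithmetic. Whether the decoder's buffer is
padded enough to make that overshoot safe is not established in this thread.

```c
#include <stddef.h>

/* Hypothetical helper: number of elements a tail-less loop touches when it
 * always processes whole vectors of `lanes` elements.
 * `lanes` must be a power of two. */
static size_t elems_touched(size_t len, size_t lanes)
{
    return (len + lanes - 1) & ~(lanes - 1);
}
```

With four int32_t lanes per vector, a len of 10 means 12 elements are read
and written, so the buffer would need at least two elements of slack.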

-- Sean McGovern
