[FFmpeg-devel] [RFC] flac_wasted32 vector implementation for VSX on ppc64le

Thu Jun 27 01:01:07 EEST 2024

Hi,

On Thu, Jun 6, 2024, 12:51 Sean McGovern <gseanmcg at gmail.com> wrote:

>
>
> On Thu, Jun 6, 2024, 05:53 Rémi Denis-Courmont <remi at remlab.net> wrote:
>
>>
>>
>> On 6 June 2024 10:43:05 GMT+03:00, Sean McGovern <gseanmcg at gmail.com>
>> wrote:
>> >Hi,
>> >
>> >Attached inline is a _non-working_ implementation of flac_wasted32 for
>> >VSX developed on a POWER9 in little-endian mode but probably just as
>> >usable on POWER{8,10}.
>> >
>> >I'm not sure why what is probably one of the simplest DSP functions in
>> >lavc does not work for me. I imagine it is something endian-related,
>> >even though IBM's documentation for vec_sl()[1] does not suggest any
>> >endian dependence.
>>
>> Mixing up bytes and elements in the iterator. But you should be able to
>> track this down with gdb or good ol' printf().
>>
>> >Here's my code:
>> >
>> >#define VSX_STRIDE 16
>> >
>> >void ff_flac_wasted32_vsx(int32_t *decoded, int wasted, int len)
>> >{
>> >   register vec_s32 vec1;
>> >   register vec_u32 vec2 = { wasted, wasted, wasted, wasted };
>>
>> There should be an instruction to splat a scalar to a vector. Better yet
>> use vector-scalar shift, if VSX has it.
>>
>
> In the POWER ISA, vec_splat() only accepts an immediate, so I think this
> is the only way to do it in flac_wasted32.
>
>
>> >   register vec_s32 shifted;
>> >
>> >   for (int i = 0; i < len; i += VSX_STRIDE) {
>> >       vec1 = vec_vsx_ld(i, decoded);
>> >       shifted = vec_sl(vec1, vec2);
>> >       vec_vsx_st(shifted, i, decoded);
>> >   }
>> >}
>> >
>> >Anyone with experience with AltiVec or VSX see something obvious I am
>> >missing?
>> >
>> >-- Sean McGovern
>> >
>> >[1] https://www.ibm.com/docs/en/xl-c-and-cpp-linux/16.1.1?topic=functions-vec-sl
>> >_______________________________________________
>> >ffmpeg-devel mailing list
>> >ffmpeg-devel at ffmpeg.org
>> >https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>> >
>> >To unsubscribe, visit link above, or email
>> >ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
>> >
>>
>
I need to correct myself here: it turns out there is a way after all.
vec_splat() only accepts an immediate, but vec_splats()[1] takes a scalar
and is what I need instead.

Thanks for the tips, I have a working version of wasted32 for VSX now. I'll
tackle wasted33 next and then submit them both.

-- Sean McGovern

[1] https://www.ibm.com/docs/en/xl-c-and-cpp-linux/16.1.1?topic=functions-vec-splats
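
A minimal sketch of how the two fixes discussed in this thread could fit
together, assuming GCC or Clang with <altivec.h>; it is not the patch that
was eventually written. The loop counter stays in elements and is scaled to
a byte offset for vec_vsx_ld()/vec_vsx_st(), and the shift count is
broadcast with vec_splats(). The names flac_wasted32_ref and
flac_wasted32_sketch are hypothetical, the scalar tail loop and the non-VSX
fallback are additions for illustration, and the vector path has not been
verified on real POWER hardware here.

```c
#include <stdint.h>

#if defined(__VSX__)
#include <altivec.h>
#endif

/* Scalar reference: shift every sample left by `wasted` bits. */
static void flac_wasted32_ref(int32_t *decoded, int wasted, int len)
{
    for (int i = 0; i < len; i++)
        decoded[i] = (int32_t)((uint32_t)decoded[i] << wasted);
}

/* Hypothetical sketch, not the submitted FFmpeg implementation. */
static void flac_wasted32_sketch(int32_t *decoded, int wasted, int len)
{
    int i = 0; /* counts int32_t elements, never bytes */

#if defined(__VSX__)
    /* vec_splats() broadcasts a run-time scalar to all four lanes. */
    const __vector unsigned int shift = vec_splats((unsigned int)wasted);

    /* Four 32-bit lanes per 16-byte vector. */
    for (; i + 4 <= len; i += 4) {
        /* The load/store offset argument is in bytes, so scale it. */
        const long off = (long)i * (long)sizeof(int32_t);
        __vector signed int v = vec_vsx_ld(off, decoded);

        v = vec_sl(v, shift);
        vec_vsx_st(v, off, decoded);
    }
#endif

    /* Scalar tail; this is the whole loop when VSX is unavailable. */
    for (; i < len; i++)
        decoded[i] = (int32_t)((uint32_t)decoded[i] << wasted);
}
```

In the original loop, i advanced by 16 and was passed directly as the byte
offset while being compared against len, which counts elements, so only
about the first len bytes (a quarter of the buffer) were ever shifted.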