[FFmpeg-trac] #5568(swscale:open): POWER8 VSX vectorization libswscale/swscale.c
FFmpeg
trac at avcodec.org
Tue May 7 10:22:17 EEST 2019
#5568: POWER8 VSX vectorization libswscale/swscale.c
-------------------------------------+-----------------------------------
Reporter: edelsohn | Owner:
Type: enhancement | Status: open
Priority: wish | Component: swscale
Version: git-master | Resolution:
Keywords: bounty vsx | Blocked By:
Blocking: | Reproduced by developer: 0
Analyzed by developer: 0 |
-------------------------------------+-----------------------------------
Comment (by cand):
The swscale.c funcs with x86 versions now have ppc versions.
Speedups:
hyscale_fast: 4.27
hcscale_fast: 4.48 (x86 MMX is 4.8)
hScale8To19: 2.26 (x86 SSE2 is 2.32)
hScale16To19: 2 (x86 SSE2 is 2.37)
hScale16To15: 2.06
Where both were measured, the ppc speedups are roughly 3% to 16% behind
the x86 ones. I think this is a good result, since the ppc code is generic
for all filter sizes, while x86 goes to great lengths to get the best
performance. The _fast MMX versions use runtime-generated in-memory code,
while the SSE2 hscale funcs have hardcoded versions for specific filter
sizes (one of which was hit by my test case, so it's possible the generic
SSE2 version is slower than the generic ppc one).
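
To illustrate the generic vs. hardcoded-size distinction, here is a minimal
scalar sketch in plain C. It is not the ppc or x86 code from the tree: the
function names (hscale_generic, hscale_size4, hscale_dispatch) are
hypothetical, and the 8-bit input, 14-bit coefficient scale and >> 7 shift
are assumptions modelled on the usual 8-to-15-bit C reference path.

```c
#include <stdint.h>

/* Clamp to the 15-bit output range (assumed output format). */
static int16_t clip15(int val)
{
    return (int16_t)(val > 32767 ? 32767 : val);
}

/* Generic version: one inner loop that handles any filterSize. */
static void hscale_generic(int16_t *dst, int dstW, const uint8_t *src,
                           const int16_t *filter, const int32_t *filterPos,
                           int filterSize)
{
    for (int i = 0; i < dstW; i++) {
        const uint8_t *s = src + filterPos[i];
        const int16_t *f = filter + filterSize * i;
        int val = 0;
        for (int j = 0; j < filterSize; j++)
            val += s[j] * f[j];
        dst[i] = clip15(val >> 7);
    }
}

/* Specialized version: inner loop fully unrolled for filterSize == 4. */
static void hscale_size4(int16_t *dst, int dstW, const uint8_t *src,
                         const int16_t *filter, const int32_t *filterPos)
{
    for (int i = 0; i < dstW; i++) {
        const uint8_t *s = src + filterPos[i];
        const int16_t *f = filter + 4 * i;
        int val = s[0] * f[0] + s[1] * f[1] + s[2] * f[2] + s[3] * f[3];
        dst[i] = clip15(val >> 7);
    }
}

/* Dispatcher: use the hardcoded path when it applies, else the generic one. */
static void hscale_dispatch(int16_t *dst, int dstW, const uint8_t *src,
                            const int16_t *filter, const int32_t *filterPos,
                            int filterSize)
{
    if (filterSize == 4)
        hscale_size4(dst, dstW, src, filter, filterPos);
    else
        hscale_generic(dst, dstW, src, filter, filterPos, filterSize);
}
```

Both paths produce the same output for a 4-tap filter. The specialized one
only removes the inner loop overhead, which is why a benchmark that happens
to hit a hardcoded size can favour the side that has such special cases.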
I didn't see a need to touch the one existing ppc hscale func mentioned in
the above comments. It was already fast, and wouldn't benefit from the
newer instructions. It already uses the VSX unaligned loads on VSX
platforms, so it's not Altivec-only.
--
Ticket URL: <https://trac.ffmpeg.org/ticket/5568#comment:40>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker