[FFmpeg-trac] #5568(swscale:open): POWER8 VSX vectorization libswscale/swscale.c

Tue May 7 10:22:17 EEST 2019

#5568: POWER8 VSX vectorization libswscale/swscale.c
-------------------------------------+-----------------------------------
             Reporter:  edelsohn     |                    Owner:
                 Type:  enhancement  |                   Status:  open
             Priority:  wish         |                Component:  swscale
              Version:  git-master   |               Resolution:
             Keywords:  bounty vsx   |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-----------------------------------

Comment (by cand):

 The swscale.c funcs with x86 versions now have ppc versions.

 Speedups:
 hyscale_fast:  4.27x
 hcscale_fast:  4.48x (x86 MMX: 4.8x)
 hScale8To19:   2.26x (x86 SSE2: 2.32x)
 hScale16To19:  2.00x (x86 SSE2: 2.37x)
 hScale16To15:  2.06x
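 The two _fast functions are the fast-bilinear horizontal scalers: each
 output pixel linearly interpolates between two neighbouring source pixels,
 with the source position stepped in 16.16 fixed point. The scalar sketch
 that follows illustrates this for luma (hcscale_fast applies the same step
 to the two chroma planes). The name hyscale_fast_sketch is hypothetical, the
 sketch assumes srcW >= 1, and it handles the right edge explicitly so it
 never reads past src[srcW - 1]. It is an illustration, not the ppc or MMX
 code.

```c
#include <stdint.h>

/* Hypothetical scalar sketch of a fast-bilinear horizontal luma scaler.
 * xInc is the source step per output pixel in 16.16 fixed point.
 * Output samples are 15-bit: an 8-bit input value v becomes v << 7.
 * Assumes srcW >= 1. */
static void hyscale_fast_sketch(int16_t *dst, int dstWidth,
                                const uint8_t *src, int srcW, int xInc)
{
    unsigned int xpos = 0;
    for (int i = 0; i < dstWidth; i++) {
        unsigned int xx = xpos >> 16;              /* integer source index */
        int xalpha = (int)((xpos & 0xFFFF) >> 9);  /* 7-bit blend weight, 0..127 */
        if (xx >= (unsigned int)(srcW - 1)) {
            /* At or past the last source pixel: replicate the edge sample
             * instead of reading src[srcW]. */
            dst[i] = (int16_t)(src[srcW - 1] << 7);
        } else {
            /* Linear blend of src[xx] and src[xx + 1], scaled by 128. */
            dst[i] = (int16_t)((src[xx] << 7) + (src[xx + 1] - src[xx]) * xalpha);
        }
        xpos += (unsigned int)xInc;
    }
}
```

 With src = {0, 100, 200} and xInc = 0x8000 (a 2x upscale), the five
 outputs are 0, 6400, 12800, 19200 and 25600, i.e. 0, 50, 100, 150 and 200
 shifted left by 7.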

 We're close to the x86 versions: about 3% behind for hScale8To19, 7% for
 hcscale_fast and 16% for hScale16To19. I think this is a good result,
 since the ppc code is generic for all filter sizes, while x86 goes to
 great lengths to get the best performance. The _fast MMX versions use
 code generated in memory at runtime, while the SSE2 hscale funcs have
 hardcoded versions for specific filter sizes (my test case hit one of
 them, so it's possible the generic SSE2 version is slower than the
 generic ppc one).
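 To make the generic-versus-hardcoded trade-off concrete, here is a scalar
 sketch of an 8-bit to 19-bit horizontal filter, loosely modeled on the
 layout swscale's C reference is assumed to use: filterSize coefficients per
 output pixel, filterPos[i] giving the first source pixel, then a shift
 right by 3 and a clamp to 19 bits. The function names, the filterSize == 4
 specialization and the pick_hscale dispatcher are all hypothetical; they
 are not the SSE2 or VSX code, only an illustration of one loop that handles
 any filter size next to a fixed-size variant the compiler can fully unroll.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a filtered sum to the 19-bit intermediate range (upper bound only). */
static int32_t clip19(int32_t v)
{
    const int32_t max = (1 << 19) - 1;
    return v > max ? max : v;
}

/* Generic path: works for any filterSize. */
static void hscale8to19_generic(int32_t *dst, int dstW, const uint8_t *src,
                                const int16_t *filter,
                                const int32_t *filterPos, int filterSize)
{
    for (int i = 0; i < dstW; i++) {
        const uint8_t *s = src + filterPos[i];
        const int16_t *f = filter + (size_t)filterSize * i;
        int32_t val = 0;
        for (int j = 0; j < filterSize; j++)
            val += s[j] * f[j];
        /* 8-bit samples times coefficients summing to 1 << 14 give 22 bits;
         * >> 3 leaves 19. */
        dst[i] = clip19(val >> 3);
    }
}

/* Specialized path: filterSize fixed at 4, inner loop fully unrolled.
 * filterSize is kept only so both paths share one signature. */
static void hscale8to19_fs4(int32_t *dst, int dstW, const uint8_t *src,
                            const int16_t *filter,
                            const int32_t *filterPos, int filterSize)
{
    (void)filterSize;
    for (int i = 0; i < dstW; i++) {
        const uint8_t *s = src + filterPos[i];
        const int16_t *f = filter + 4 * (size_t)i;
        int32_t val = s[0] * f[0] + s[1] * f[1] + s[2] * f[2] + s[3] * f[3];
        dst[i] = clip19(val >> 3);
    }
}

typedef void (*hscale_fn)(int32_t *dst, int dstW, const uint8_t *src,
                          const int16_t *filter, const int32_t *filterPos,
                          int filterSize);

/* Pick an implementation once, based on the filter size. */
static hscale_fn pick_hscale(int filterSize)
{
    return filterSize == 4 ? hscale8to19_fs4 : hscale8to19_generic;
}
```

 Both paths produce identical output; the fixed-size one only wins on speed
 for the sizes it covers, which is why a benchmark that happens to hit such
 a size can flatter the hand-tuned side.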

 I didn't see a need to touch the one existing ppc hscale func mentioned in
 the earlier comments. It was already fast and wouldn't benefit from the
 newer instructions; it also already uses VSX unaligned loads on VSX
 platforms, so it isn't Altivec-only.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/5568#comment:40>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker