[FFmpeg-devel] [PATCH] PPC64: Add versions of functions in libswscale/input.c optimized for POWER8 VSX SIMD.

Dan Parrot dan.parrot at mail.com
Wed Jul 6 15:28:27 EEST 2016


On Wed, 2016-07-06 at 09:07 +0200, Hendrik Leppkes wrote:
> On Wed, Jul 6, 2016 at 4:37 AM, Dan Parrot <dan.parrot at mail.com> wrote:
> > Finish providing SIMD versions for POWER8 VSX of functions in libswscale/input.c. That should allow trac ticket #5570 to be closed.
> > The speedups obtained for the functions are:
> >
> > abgrToA_c               1.19
> > bgr24ToUV_c             1.23
> > bgr24ToUV_half_c        1.37
> > bgr24ToY_c_vsx          1.43
> > nv12ToUV_c              1.05
> > nv21ToUV_c              1.06
> > planar_rgb_to_uv        1.25
> > planar_rgb_to_y         1.26
> > rgb24ToUV_c             1.11
> > rgb24ToUV_half_c        1.10
> > rgb24ToY_c              0.92
> > rgbaToA_c               0.88
> > uyvyToUV_c              1.05
> > uyvyToY_c               1.15
> > yuy2ToUV_c              1.07
> > yuy2ToY_c               1.17
> > yvy2ToUV_c              1.05
> 
> SIMD implementations that in the best case improve the speed by 43%
> (and in some cases are *slower*) seem barely worth it. One would expect
> a proper SIMD implementation to offer 100% or higher increases, at
> least that's the general expectation on x86 with SSE/AVX.
It sounds like you have either forgotten or never learned a very basic
principle of computer architecture. I recommend the text by Patterson
and Hennessy. The principle is Amdahl's Law. Before you start throwing
numbers around, make sure you understand what was being parallelized.
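
For reference, a minimal sketch of Amdahl's Law in C. The function name
amdahl_speedup and the inputs used below are illustrative assumptions,
not measurements from this patch: p is the fraction of a function's run
time that is vectorized, and s is the speedup of that fraction alone.

```c
/*
 * Amdahl's Law: overall speedup when a fraction p of the run time
 * (0 <= p <= 1) is accelerated by a factor s (s > 0) and the
 * remaining (1 - p) runs at its original speed.
 *
 * Hypothetical helper for illustration; not part of libswscale.
 */
double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}
```

With the assumed inputs p = 0.4 and s = 4.0 this gives about 1.43
overall, and no value of s can push that case past 1 / 0.6, about 1.67.
The serial remainder (loads, stores, packing and unpacking) bounds the
gain no matter how fast the vector part is.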

> So the question here is - is that VSX being bad, or the intrinsics
> being bad? How would the speedup be in proper hand-written ASM?
> If hand-written ASM can give us the usual 100-200% improvements we would
> expect from SIMD, then this is what should generally be favored.
I am not going to write assembly just so you get a nice fuzzy feeling. If that's a deal-breaker, so be it.

> Also, one further thought:
> From the commit message, it sounds like you might only be doing this
> for the bounty in #5570, do you plan to maintain these optimizations
> in the future?

Unless you are a mind reader, STFU about my motivation in writing code.

One other thing: why didn't this come up when the earlier patch was
submitted and applied?


