[FFmpeg-devel] [PATCH] PPC64: Add versions of functions in libswscale/input.c optimized for POWER8 VSX SIMD.
Dan Parrot
dan.parrot at mail.com
Wed Jul 6 15:28:27 EEST 2016
On Wed, 2016-07-06 at 09:07 +0200, Hendrik Leppkes wrote:
> On Wed, Jul 6, 2016 at 4:37 AM, Dan Parrot <dan.parrot at mail.com> wrote:
> > Finish providing SIMD versions for POWER8 VSX of the functions in libswscale/input.c. That should allow trac ticket #5570 to be closed.
> > The speedups obtained for the functions are:
> >
> > abgrToA_c 1.19
> > bgr24ToUV_c 1.23
> > bgr24ToUV_half_c 1.37
> > bgr24ToY_c_vsx 1.43
> > nv12ToUV_c 1.05
> > nv21ToUV_c 1.06
> > planar_rgb_to_uv 1.25
> > planar_rgb_to_y 1.26
> > rgb24ToUV_c 1.11
> > rgb24ToUV_half_c 1.10
> > rgb24ToY_c 0.92
> > rgbaToA_c 0.88
> > uyvyToUV_c 1.05
> > uyvyToY_c 1.15
> > yuy2ToUV_c 1.07
> > yuy2ToY_c 1.17
> > yvy2ToUV_c 1.05
>
> SIMD implementations that in the best case improve the speed by 43%
> (and in some cases are *slower*) seem barely worth it. One would expect
> a proper SIMD implementation to offer 100% or higher increases, at
> least that's the general expectation on x86 with SSE/AVX.
It sounds like you have either forgotten or never learned a very basic
principle of computer architecture. I recommend the text by Patterson
and Hennessy. The principle is Amdahl's Law. Before you start throwing
numbers around, make sure you understand what was being parallelized.
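
For reference, Amdahl's Law says that if a fraction p of the original run time is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal sketch in C follows; the function names amdahl_speedup and amdahl_fraction are hypothetical helpers for illustration and are not part of libswscale or of the patch.

```c
#include <assert.h>

/* Amdahl's Law: overall speedup when a fraction p (0..1) of the original
 * run time is accelerated by a factor s and the rest is left unchanged. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

/* Inverse form: the fraction of run time that would have to be accelerated
 * by a factor s in order to observe the given overall speedup. */
double amdahl_fraction(double overall, double s)
{
    assert(overall > 0.0);
    assert(s > 0.0 && s != 1.0);
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s);
}
```

As a purely illustrative case (assumed numbers, not measurements from this patch): if only 35% of a function's run time is vectorizable and that part runs 8 times faster, amdahl_speedup(0.35, 8.0) gives roughly 1.44 overall, no matter how well the vector code itself is written.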
> So the question here is - is that VSX being bad, or the intrinsics
> being bad? How would the speedup be in proper hand-written ASM?
> If hand-written ASM can give us the usual 100-200% improvements we would
> expect from SIMD, then this is what should generally be favored.
I am not going to write assembly just so you get a nice fuzzy feeling. If that's a deal-breaker, so be it.
> Also, one further thought:
> From the commit message, it sounds like you might only be doing this
> for the bounty in #5570, do you plan to maintain these optimizations
> in the future?
Unless you are a mind reader, STFU about my motivation in writing code.
One other thing: why didn't this come up when the earlier patch was
submitted and applied?