[FFmpeg-devel] [PATCH] PPC64: Add versions of functions in libswscale/input.c optimized for POWER8 VSX SIMD.

Dan Parrot dan.parrot at mail.com
Tue Jul 5 05:29:46 EEST 2016


On Mon, 2016-07-04 at 16:30 +0000, Carl Eugen Hoyos wrote:
> Dan Parrot <dan.parrot <at> mail.com> writes:
> 
> > > Did you test if using ffmpeg -benchmark -f rawvideo -i /dev/zero... 
> > > showed different results?
> > > I believe this should be both easier and faster to test.
> >
> > Sorry, I don't understand what that command line just above 
> > is trying to achieve. Could you elaborate?
> 
> Instead of running the whole fate suite that takes long and 
> does not test libswscale for most commands, just test an 
> ffmpeg command line that only tests libswscale:
> $ ffmpeg -benchmark -f rawvideo -pix_fmt rgb24 
> -i /dev/zero -pix_fmt yuv420p -f null -vframes 10000 -

$ ./ffmpeg -benchmark -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero
-pix_fmt yuv420p -f null -vframes 1000 -

frame= 1000 fps= 16 q=-0.0 Lsize=N/A time=00:00:40.00 bitrate=N/A
speed=0.632x    
video:477kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB
muxing overhead: unknown
bench: utime=62.794s
bench: maxrss=21184kB


> vs
> 
> $ ffmpeg -cpuflags 0 -benchmark -f rawvideo -pix_fmt rgb24 
> -i /dev/zero -pix_fmt yuv420p -f null -vframes 10000 -

$ ./ffmpeg -cpuflags 0 -benchmark -f rawvideo -pix_fmt rgb24 -s hd1080
-i /dev/zero -pix_fmt yuv420p -f null -vframes 1000 -

frame= 1000 fps= 12 q=-0.0 Lsize=N/A time=00:00:40.00 bitrate=N/A
speed=0.479x    
video:477kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB
muxing overhead: unknown
bench: utime=82.918s
bench: maxrss=21120kB
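
As a rough sanity check, the two runs above can be compared with a few lines of Python. This is only a sketch: the variable and function names are made up for illustration, and the numbers are copied by hand from the "bench: utime" and "fps" lines of the two outputs (default cpuflags vs. -cpuflags 0).

```python
# Figures copied by hand from the two benchmark outputs above.
utime_default = 62.794   # seconds of user time, default cpuflags
utime_flags0 = 82.918    # seconds of user time, -cpuflags 0
fps_default = 16         # reported fps, default cpuflags
fps_flags0 = 12          # reported fps, -cpuflags 0


def speedup(baseline, optimized):
    """How many times faster the optimized run is than the baseline."""
    return baseline / optimized


def percent_saved(baseline, optimized):
    """Share of the baseline time that the optimized run avoids, in percent."""
    return 100.0 * (baseline - optimized) / baseline


utime_speedup = speedup(utime_flags0, utime_default)
utime_saved = percent_saved(utime_flags0, utime_default)
fps_ratio = fps_default / fps_flags0

print(f"utime speedup: {utime_speedup:.2f}x")   # 1.32x
print(f"utime saved:   {utime_saved:.1f}%")     # 24.3%
print(f"fps ratio:     {fps_ratio:.2f}x")       # 1.33x
```

Both measures agree on roughly a 1.3x gain for the whole conversion. Since the fps figures are rounded to integers in the ffmpeg output, the utime ratio is the more precise of the two.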

> [...]
> 
> > Surprisingly, gcc is producing some badly suboptimal assembly.
> 
> Just to make sure I don't misunderstand:
> Does this mean intrinsics are suboptimal to write assembly 
> code?

So, the latest version of GCC does produce more efficient assembly.

To recap: GCC 5.3.1 produces assembly that does not take full advantage
of PPC64 POWER8 SIMD instructions. GCC 6.1.1 is much better and produces
shorter sequences that do use SIMD assembly instructions.

> > > Can you confirm with START_TIMER / STOP_TIMER that there is no 
> > > gain?
> >
> > SystemTap probes provide identical functionality by measuring 
> > deltas between function entry and function return.
> 
> Sorry, I don't understand:
> Did you test with both methods to verify that they provide 
> the same results?

> Note that if it turns out that START_TIMER / STOP_TIMER 
> cannot be used on ppc64 (le) this would be important 
> information for us.

These start/stop macros are the last issue I have outstanding. I hope to
be done in a few hours.



