[Libav-user] gcc auto-vectorisation

"René J.V. Bertin" rjvbertin at gmail.com
Wed Feb 27 17:31:37 CET 2013


I've added a benchmarking option to ffmpeg that breaks down the time spent in the various stages without dumping everything immediately, as -benchmark_all does. The results are quite instructive. It turns out that there is actually a slight penalty to running auto-vectorised code: about 6% in wall-clock time for the 32 bit build and about 3% for the 64 bit build in the runs below (probably not large enough to be explained by suboptimal vectorisation caused by unmet alignment assumptions). There's also a huge chunk of work that isn't being benchmarked at all, and it is multithreaded. I haven't gone so far as to delve into ffmpeg.c to figure out what it corresponds to, though. The format conversion(s), maybe?

For me this settles the question: I'd better stick to not using auto-vectorisation, especially since it causes a few tests to fail.
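
For readers who haven't looked at what -ftree-vectorize acts on, here is a minimal sketch of the kind of loop it targets. The function is hypothetical and not taken from FFmpeg. The compiler generally cannot know how dst, a and b are aligned, so it has to fall back on unaligned loads or extra prologue/epilogue code, which is the sort of overhead the alignment remark above refers to.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example (not FFmpeg code): saturating add of two byte
 * buffers. The loop has no cross-iteration dependency and 'restrict'
 * promises that the buffers do not overlap, so GCC is allowed to turn
 * it into SIMD code when -ftree-vectorize is active (it is part of -O3).
 *
 * Nothing here tells the compiler how the pointers are aligned, so any
 * vector code it emits must cope with arbitrary alignment.
 */
void add_saturate_u8(uint8_t *restrict dst,
                     const uint8_t *restrict a,
                     const uint8_t *restrict b,
                     size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = sum > 255u ? 255u : (uint8_t)sum;   /* clamp to 8 bits */
    }
}
```

Whether a given loop actually gets vectorised depends on the GCC version and flags. The vectoriser can report what it did (-ftree-vectorizer-verbose=N on older releases such as 4.7, -fopt-info-vec on newer ones), which is one way to check before blaming or crediting it for a timing difference.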

I have yet to test my modifications on MS Windows, but I'd be willing to post a patch for this option (though I admit it would annoy me to have to adapt my cross-platform high-resolution (HR) timing routines to ffmpeg naming conventions :( )
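
To give an idea of what such per-stage timing involves, here is a minimal POSIX-only sketch. It is not the actual patch: StageTimer, stage_begin(), stage_end() and stage_cpu_percent() are hypothetical names, and it assumes process-wide CPU time is what gets measured. The CPU % formula, (user + kernel) / real, does match the columns in the tables below.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose clock_gettime()/getrusage() in strict -std= modes */
#endif
#include <time.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical per-stage accumulator; names are illustrative only. */
typedef struct {
    unsigned long samples;       /* number of timed calls */
    double user, kernel, real;   /* accumulated seconds */
    double u0, k0, r0;           /* snapshot taken by stage_begin() */
} StageTimer;

static double timeval_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Read process-wide user/kernel CPU time and a monotonic wall clock. */
static void take_snapshot(double *u, double *k, double *r)
{
    struct rusage ru;
    struct timespec ts;

    getrusage(RUSAGE_SELF, &ru);          /* sums over all threads */
    clock_gettime(CLOCK_MONOTONIC, &ts);
    *u = timeval_seconds(ru.ru_utime);
    *k = timeval_seconds(ru.ru_stime);
    *r = (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Call right before the stage to be timed (e.g. one decode call). */
void stage_begin(StageTimer *t)
{
    take_snapshot(&t->u0, &t->k0, &t->r0);
}

/* Call right after the stage; adds the elapsed times to the totals. */
void stage_end(StageTimer *t)
{
    double u, k, r;

    take_snapshot(&u, &k, &r);
    t->user   += u - t->u0;
    t->kernel += k - t->k0;
    t->real   += r - t->r0;
    t->samples++;
}

/* CPU load of the stage: CPU seconds consumed per wall-clock second. */
double stage_cpu_percent(const StageTimer *t)
{
    return t->real > 0.0 ? 100.0 * (t->user + t->kernel) / t->real : 0.0;
}
```

Because RUSAGE_SELF covers every thread in the process, CPU % goes above 100% as soon as worker threads are busy during the timed stage. A Windows port would need different calls (for instance GetProcessTimes and QueryPerformanceCounter), which is where the cross-platform part comes in.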

Benchmark results (intended to focus on decoding for playback; I'm surprised encoding to rawvideo is so expensive):
> time /usr/local/FFmpeg/trunk/bin/ffmpeg-rjvb -benchmark_most -y -v 0 -i ~/Desktop/Downloads/SOA4ep11.flv -pix_fmt argb -vcodec rawvideo -acodec pcm_f32le -f mov /dev/null; \
time /usr/local/FFmpeg/trunk.vect/bin/ffmpeg-rjvb -benchmark_most -y -v 0 -i ~/Desktop/Downloads/SOA4ep11.flv -pix_fmt argb -vcodec rawvideo -acodec pcm_f32le -f mov /dev/null ; \
time /usr/local/FFmpeg/trunk.O0/bin/ffmpeg-rjvb -benchmark_most -y -v 0 -i ~/Desktop/Downloads/SOA4ep11.flv -pix_fmt argb -vcodec rawvideo -acodec pcm_f32le -f mov /dev/null ; \
time /usr/local/FFmpeg/trunk.O0vect/bin/ffmpeg-rjvb -benchmark_most -y -v 0 -i ~/Desktop/Downloads/SOA4ep11.flv -pix_fmt argb -vcodec rawvideo -acodec pcm_f32le -f mov /dev/null
Detailed benchmark results: (32 bit, MMX/SSE code, -fno-tree-vectorize)
                   samples          user t        kernel t          real t           CPU %
Video decode  :      85166         27.0846s        2.48361s        13.5333s        218.484%
Audio decode  :     152971         10.5851s       0.189161s        4.71418s         228.55%
Video encode  :      85164         38.3081s       0.304017s        19.4358s        198.665%
Audio encode  :     152969         1.12343s       0.141641s       0.581738s        217.465%
Failed loops  :          1               0s          1e-06s     8.64995e-07s       115.608%
Weighed totals:   476271/5         15.4539s       0.604725s        7.59638s        211.398%
Overall execution timing:
              :           1         233.666s        6.46592s        108.363s          221.6%
233.673 user_cpu 6.472 kernel_cpu 1:48.37 total_time 221.5%CPU {0W 0X 0D 0K 21553152M 37F 12625R 0I 0O 0r 0s 0k 0w 213203c}
Detailed benchmark results: (32 bit, MMX/SSE code, -ftree-vectorize)
                   samples          user t        kernel t          real t           CPU %
Video decode  :      85166         27.9066s        2.62058s        13.9246s        219.232%
Audio decode  :     152971         11.0481s       0.201142s         4.9342s        227.985%
Video encode  :      85164         40.3674s        0.33645s        20.4187s        199.346%
Audio encode  :     152969         1.23643s       0.150545s       0.602971s        230.023%
Failed loops  :          1               0s              0s      1.012e-06s              0%
Weighed totals:   476271/5         16.1541s       0.641726s        7.91958s        212.079%
Overall execution timing:
              :           1          246.41s         6.8878s        114.681s        220.872%
246.418 user_cpu 6.894 kernel_cpu 1:54.69 total_time 220.8%CPU {0W 0X 0D 0K 21592064M 0F 12679R 0I 0O 0r 0s 0k 0w 216849c}
Detailed benchmark results: (64 bit, no MMX/SSE code, -fno-tree-vectorize)
                   samples          user t        kernel t          real t           CPU %
Video decode  :      85166         199.297s        3.36215s        49.5899s         408.67%
Audio decode  :     152971         29.3016s        0.32553s        8.54242s        346.823%
Video encode  :      85164         73.5307s       0.530001s        23.1734s        319.594%
Audio encode  :     152969         2.49674s       0.203737s       0.718678s        375.756%
Failed loops  :          1           1e-06s          1e-06s       1.07e-06s        186.915%
Weighed totals:   476271/5         58.9994s       0.865978s        15.9858s         374.49%
Overall execution timing:
              :           1         535.404s        9.23317s        155.785s        349.607%
535.408 user_cpu 9.239 kernel_cpu 2:35.79 total_time 349.5%CPU {0W 0X 0D 0K 22816768M 220F 13492R 0I 0O 0r 0s 0k 0w 470931c}
Detailed benchmark results: (64 bit, no MMX/SSE code, -ftree-vectorize)
                   samples          user t        kernel t          real t           CPU %
Video decode  :      85166         213.686s        3.43406s        53.0596s        409.201%
Audio decode  :     152971         30.5476s       0.328917s        8.79987s        350.874%
Video encode  :      85164           74.08s        0.51521s        23.2998s        320.153%
Audio encode  :     152969         2.47226s       0.202298s       0.745479s        358.771%
Failed loops  :          1               0s          1e-06s     1.24599e-06s       80.2573%
Weighed totals:   476271/5         62.0631s       0.876818s        16.7202s         376.43%
Overall execution timing:
              :           1         558.639s         9.3095s        160.509s        353.842%
558.643 user_cpu 9.318 kernel_cpu 2:40.51 total_time 353.8%CPU {0W 0X 0D 0K 22867968M 195F 13538R 0I 0O 0r 0s 0k 0w 480915c}
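
For anyone wanting to check the arithmetic, the two helpers below are a small sketch with hypothetical names. The first reproduces the "Weighed totals" row on the assumption that it is a sample-weighted mean of the per-stage times, which is consistent with the figures above (the 476271 is the sum of the five sample counts). The second gives the relative slowdown between two runs.

```c
#include <stddef.h>

/*
 * Sample-weighted mean of per-stage times. Assumption: this is how the
 * "Weighed totals" row is computed; it reproduces the printed values.
 */
double weighted_mean(const double *seconds,
                     const unsigned long *samples,
                     size_t n)
{
    double sum = 0.0;
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++) {
        sum   += seconds[i] * (double)samples[i];
        total += samples[i];
    }
    return total ? sum / (double)total : 0.0;
}

/* Slowdown of 'candidate' relative to 'baseline', in percent. */
double relative_overhead_percent(double baseline, double candidate)
{
    return 100.0 * (candidate - baseline) / baseline;
}
```

Feeding in the user times of the first table (27.0846, 10.5851, 38.3081, 1.12343 and 0 s with their sample counts) gives about 15.454 s, matching the printed 15.4539 s. For the overall real times, 108.363 s against 114.681 s is a slowdown of about 5.8% for the 32 bit build, and 155.785 s against 160.509 s is about 3.0% for the 64 bit build.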

The test video:
> /usr/local/FFmpeg/trunk/bin/ffprobe ~/Desktop/Downloads/SOA4ep11.flv
ffprobe version N-50309-gaf0e814 Copyright (c) 2007-2013 the FFmpeg developers
  built on Feb 25 2013 19:48:25 with gcc 4.7.2 (MacPorts gcc47 4.7.2_2+universal)
  configuration: --prefix=/usr/local/FFmpeg/trunk --target-os=darwin --enable-shared --enable-static --enable-gpl --enable-nonfree --enable-libfreetype --enable-pthreads --enable-yasm --disable-doc --cpu=core2 --enable-debug=1 --disable-stripping --enable-ffmpeg --enable-ffprobe --disable-ffplay --enable-hwaccels --enable-libx264 --cc=gcc-mp-4.7 --disable-outdev=sdl
  libavutil      52. 17.103 / 52. 17.103
  libavcodec     54. 92.100 / 54. 92.100
  libavformat    54. 63.100 / 54. 63.100
  libavdevice    54.  3.103 / 54.  3.103
  libavfilter     3. 41.100 /  3. 41.100
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  2.100 / 52.  2.100
Input #0, flv, from '/Users/bertin/Desktop/Downloads/SOA4ep11.flv':
  Metadata:
    canSeekToEnd    : false
    hasCuePoints    : false
    hasVideo        : true
    videosize       : 101806465
    lasttimestamp   : 3552
    hasMetadata     : true
    hasKeyframes    : true
    metadatacreator : inlet media FLVTool2 v1.0.6 - http://www.inlet-media.de/flvtool2
    hasAudio        : true
    audiodelay      : 0
    lastkeyframetimestamp: 3539
    datasize        : 139300867
    audiosize       : 37480392
  Duration: 00:59:11.99, start: 0.042000, bitrate: 315 kb/s
    Stream #0:0: Video: h264 (High), yuv420p, 624x352 [SAR 1:1 DAR 39:22], 232 kb/s, 23.98 tbr, 1k tbn, 47.95 tbc
    Stream #0:1: Audio: aac, 44100 Hz, stereo, fltp, 82 kb/s


