[FFmpeg-trac] #7690(undetermined:new): FFmpeg QSV decode + VPP performance is just a fraction of what one gets with VA-API and MediaSDK

FFmpeg trac at avcodec.org
Wed Oct 16 11:18:03 EEST 2019

#7690: FFmpeg QSV decode + VPP performance is just a fraction of what one gets
with VA-API and MediaSDK
             Reporter:  eero-t       |                    Owner:
                 Type:  defect       |                   Status:  new
             Priority:  normal       |                Component:
                                     |  undetermined
              Version:  git-master   |               Resolution:
             Keywords:  qsv          |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |

Comment (by fulinjie):

 Hi eero-t:

 The performance evaluation may be a bit confusing, and there is a related
 discussion in MSDK about this performance issue:

 > With 8-bit 1920x540 HEVC decode, QSV is clearly faster than VA-API:
 > {{{
 > ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD128 -c:v hevc_qsv -i 1920x540_60_yuv420p_4800.h265 -f null -
 > ...
 > ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i 1920x540_60_yuv420p_4800.h265 -y -f null -
 > }}}

 The comparison made by the command lines above is not fair.

 For VAAPI, "-f null -" means there is no copy from the video surface to
 system memory. For QSV, even if "-f null -" is set, there is still a memory
 copy from the video surface to system memory internally in MSDK:

 1) The app initializes MSDK to produce system memory.
 2) MSDK internally decodes to video memory and then internally makes a
 copy from video memory to system memory. This can be done by software or
 by GPUCopy.
 3) The application gets system memory.
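
 One way to make the two measurements comparable is to force the same
 copy from video memory to system memory in both paths. The sketch below
 only prints two candidate command lines (it does not run FFmpeg), so it
 can be inspected on a machine without the hardware. It assumes that
 "-hwaccel_output_format" keeps the decoded frames in video memory and
 that "hwdownload,format=nv12" then performs the explicit copy. The
 INPUT and DEVICE variables and the helper function names are
 placeholders for illustration, and whether the QSV path accepts exactly
 these options depends on the FFmpeg build.

```shell
#!/bin/sh
# Dry-run sketch: print like-for-like decode commands; nothing is executed.
INPUT="1920x540_60_yuv420p_4800.h265"   # sample name taken from the ticket
DEVICE="/dev/dri/renderD128"            # render node used in the ticket

# Explicit copy from video memory to system memory, identical for both backends.
DOWNLOAD="hwdownload,format=nv12"

vaapi_cmd() {
    echo "ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -vaapi_device $DEVICE -i $INPUT -vf $DOWNLOAD -f null -"
}

qsv_cmd() {
    echo "ffmpeg -hwaccel qsv -hwaccel_output_format qsv -qsv_device $DEVICE -c:v hevc_qsv -i $INPUT -vf $DOWNLOAD -f null -"
}

vaapi_cmd
qsv_cmd
```

 Both printed commands end in the same filter chain, so any remaining
 difference in throughput should come from the decode path rather than
 from one side skipping the copy.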

 That is the root cause of:
 > * Resolution impacts whether doing (HEVC) decoding is slower with QSV or VA-API backends
 > * In larger resolutions, VPP operations with QSV backend are slower than with VA-API

 (VAAPI runs without the copy, but QSV runs with it.)

 The performance gap is related to copying from video memory to system
 memory.
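
 A rough sense of why the copy matters more at larger resolutions comes
 from the amount of data moved per frame. The sketch below computes the
 size of one 8-bit NV12 frame, ignoring row padding and alignment. Only
 1920x540 comes from the ticket; the other two resolutions are
 illustrative assumptions, and the helper name is hypothetical.

```shell
#!/bin/sh
# Bytes in one 8-bit NV12 (4:2:0) frame: a full-size luma plane plus a
# half-size interleaved chroma plane. Row padding and alignment are ignored.
nv12_frame_bytes() {
    echo $(( $1 * $2 * 3 / 2 ))
}

for res in 1920x540 1920x1080 3840x2160; do
    w=${res%x*}   # text before the "x"
    h=${res#*x}   # text after the "x"
    echo "$res: $(nv12_frame_bytes "$w" "$h") bytes per frame"
done
```

 The per-frame copy grows linearly with the pixel count, so a path that
 copies every frame falls further behind a path that does not as the
 resolution increases.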

 For QSV, the options for best performance may be:
 1. GPUCopy for tiled surface data (like nv12).
 2. hwmap=mode=direct (if possible) for linear surface data: derive the
 data in the surface and use it directly to avoid any memory copy.
 3. hwdownload
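
 The two filter-based options can be compared with command lines that
 differ only in the filter chain. The sketch below prints such commands
 without running them. The "hwmap=mode=direct" string is taken from the
 list above; appending "format=nv12" and using
 "-hwaccel_output_format qsv" are assumptions, and whether a direct
 mapping succeeds depends on the driver and the surface layout. GPUCopy
 is an MSDK-side setting and is not covered here. The qsv_cmd helper is
 hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: print QSV decode commands that differ only in the filter
# used to make the decoded frame readable from system memory.
INPUT="1920x540_60_yuv420p_4800.h265"   # sample name taken from the ticket
DEVICE="/dev/dri/renderD128"            # render node used in the ticket

qsv_cmd() {
    # $1 is the filter chain placed after the decoder.
    echo "ffmpeg -hwaccel qsv -hwaccel_output_format qsv -qsv_device $DEVICE -c:v hevc_qsv -i $INPUT -vf $1 -f null -"
}

qsv_cmd "hwmap=mode=direct,format=nv12"   # map the surface in place, if the driver allows it
qsv_cmd "hwdownload,format=nv12"          # copy into a system-memory frame
```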

 You can compare the results when the output is actually written to
 /dev/null to evaluate the

Ticket URL: <https://trac.ffmpeg.org/ticket/7690#comment:15>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker
