[FFmpeg-devel] [PATCH 3/6] lavc/qsv: Enable hwaccel qsv_vidmem.

Wed Sep 14 20:33:20 EEST 2016

>>  ffmpeg_qsv.c              | 636 +++++++++++++++++++++++++++++++++++++++++++++-
>>  libavcodec/qsv.h          |   3 +
>>  libavcodec/qsv_internal.h |   2 +
>>  libavcodec/qsvdec.c       |   5 +-
>>  libavcodec/qsvenc.c       |   2 +
>>  8 files changed, 649 insertions(+), 5 deletions(-)
>>
>
> This is a giant patch that doesnt even begin to describe what it does.
> So, whats it good for? We can already do transcoding of video from QSV
> decoder to QSV encoder all in GPU memory without 600+ lines of new
> code. Admittedly it currently has a few issues, but those could be
> fixed, but why do we need 600 new lines of code?

1.      At the GPU level, all frames are processed in tiled mode (which
we call video memory mode). Tiled memory cannot be read or written
directly by the CPU. The frame buffers must be allocated via
vaCreateSurfaces. Any non-tiled memory must be copied to tiled memory
before GPU acceleration can be used. That copy is done internally by
MediaSDK.

2.      In the current implementation, the frame buffers are allocated
by ffmpeg in linear mode (which we call system memory). The QSV
decoder's output and the QSV encoder's input are both set to system
memory mode (e.g. iopattern = MFX_IOPATTERN_OUT_SYSTEM_MEMORY in the QSV
decoder), so MediaSDK performs 2 memory copies per frame: one from video
memory to system memory on output from the HW decoder, and another from
system memory to video memory when feeding the HW encoder. This
decreases transcoding performance greatly, especially at high
resolutions such as 1080p and 4K.
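
To give a rough feel for the cost of the 2 copies described above, here
is a small sketch that estimates how many bytes are copied per second.
It assumes 8-bit NV12 frames (1.5 bytes per pixel) and ignores row
padding; the function names and the 30 fps figure used below are only
illustrative and are not taken from the patches.

```c
#include <stdint.h>

/* Bytes in one 8-bit NV12 frame: a full-resolution luma plane plus a
 * half-height interleaved chroma plane, i.e. 1.5 bytes per pixel.
 * Assumption: no row padding / pitch alignment is counted. */
static uint64_t nv12_frame_bytes(uint32_t width, uint32_t height)
{
    return (uint64_t)width * height * 3 / 2;
}

/* Bytes copied per second when every frame is copied `copies` times
 * (2 in the system memory path: decoder output and encoder input). */
static uint64_t copy_bytes_per_second(uint32_t width, uint32_t height,
                                      uint32_t fps, uint32_t copies)
{
    return nv12_frame_bytes(width, height) * fps * copies;
}
```

With these assumptions, copy_bytes_per_second(1920, 1080, 30, 2) gives
186,624,000 bytes per second, and copy_bytes_per_second(3840, 2160, 30, 2)
gives 746,496,000 bytes per second, which is why the effect grows with
resolution.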

3.      The patches avoid these additional memory copies when all
modules in the transcoding pipeline can be accelerated by the GPU. To
achieve this, iopattern must be set to video memory, and an external
allocator must be implemented as MediaSDK requires and then set on the
QSV codec. Most of the 600 lines in the patches are the code that
implements the external allocator. The patches also add some code to
check whether all modules in the transcoding pipeline can be
accelerated by the GPU, so that the transcoder can select video memory
or system memory automatically.
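
The automatic selection described above can be sketched roughly as
follows. This is not the code from the patches: struct stage, enum
mem_mode and select_mem_mode() are hypothetical names used only to
illustrate the rule that video memory is chosen only when every module
in the pipeline is GPU accelerated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two memory modes. In real code these
 * would map to the MFX_IOPATTERN_*_SYSTEM_MEMORY and
 * MFX_IOPATTERN_*_VIDEO_MEMORY flags. */
enum mem_mode { MEM_SYSTEM, MEM_VIDEO };

/* Hypothetical description of one module in the transcoding pipeline
 * (decoder, filter, encoder, ...). */
struct stage {
    const char *name;
    bool gpu_accelerated;
};

/* Choose video memory only if every stage runs on the GPU. A single
 * software stage needs CPU-readable frames, so fall back to system
 * memory. An empty pipeline also falls back to system memory. */
static enum mem_mode select_mem_mode(const struct stage *stages, size_t n)
{
    if (n == 0)
        return MEM_SYSTEM;
    for (size_t i = 0; i < n; i++) {
        if (!stages[i].gpu_accelerated)
            return MEM_SYSTEM;
    }
    return MEM_VIDEO;
}
```

For example, a pipeline of a QSV decoder followed by a QSV encoder
would select MEM_VIDEO, while inserting a software filter between them
would select MEM_SYSTEM.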

4.      In our tests, the patches improve transcoding performance by
about 20% or more, depending on resolution, and reach the performance
stated in the QSV specification.