[FFmpeg-devel] [PATCH 3/6] lavc/qsv: Enable hwaccel qsv_vidmem.
Nablet Developer
sdk at nablet.com
Wed Sep 14 20:33:20 EEST 2016
>> ffmpeg_qsv.c | 636 +++++++++++++++++++++++++++++++++++++++++++++-
>> libavcodec/qsv.h | 3 +
>> libavcodec/qsv_internal.h | 2 +
>> libavcodec/qsvdec.c | 5 +-
>> libavcodec/qsvenc.c | 2 +
>> 8 files changed, 649 insertions(+), 5 deletions(-)
>>
>
> This is a giant patch that doesnt even begin to describe what it does.
> So, whats it good for? We can already do transcoding of video from QSV
> decoder to QSV encoder all in GPU memory without 600+ lines of new
> code. Admittedly it currently has a few issues, but those could be
> fixed, but why do we need 600 new lines of code?
1. At the GPU level, all frames are processed in tiled mode (we call
this video memory mode), which cannot be read or written by the CPU
directly. The frame buffers must be allocated via vaCreateSurfaces. Any
non-tiled memory must be copied to tiled memory when using GPU
acceleration. This copying is done internally by MediaSDK.
2. In the current implementation, the frame buffers are allocated by
ffmpeg in linear mode (we call this system memory); the QSV decoder's
output and the QSV encoder's input are both set to system memory mode
(e.g. iopattern = MFX_IOPATTERN_OUT_SYSTEM_MEMORY in the QSV decoder).
As a result, MediaSDK performs two memory copies: one from video memory
to system memory on output from the HW decoder, and another from system
memory to video memory when feeding the HW encoder. This greatly
reduces transcoding performance, especially at high resolutions such as
1080p and 4K.
3. The patches avoid these additional memory copies when all modules
in the transcoding pipeline can be accelerated by the GPU. To achieve
this, iopattern must be set to video memory, and an external allocator
must be implemented as MediaSDK requires and passed to the QSV codec.
Most of the 600 lines in the patches implement the external allocator.
The patches also add code to check whether all modules in the
transcoding pipeline can be accelerated by the GPU, so that the
transcoder can select video memory or system memory automatically.
4. In our tests, the patches improve transcoding performance by about
20% or more, depending on resolution, and reach the performance stated
in the QSV specification.
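
To make the selection logic in point 3 concrete, here is a minimal
sketch of the idea, not the code from the patch. The Stage and MemMode
types and both function names are hypothetical and exist only for
illustration. In real code the two modes would correspond to choosing
MFX_IOPATTERN_OUT_VIDEO_MEMORY / MFX_IOPATTERN_IN_VIDEO_MEMORY versus
their *_SYSTEM_MEMORY counterparts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one pipeline stage; not an FFmpeg or MediaSDK type. */
typedef struct Stage {
    const char *name;
    bool gpu_capable; /* true if the stage can work on video-memory surfaces */
} Stage;

/* Hypothetical stand-in for the system/video memory iopattern choice. */
typedef enum MemMode { MEM_SYSTEM, MEM_VIDEO } MemMode;

/* Video memory is usable only when every stage can be accelerated by the
 * GPU. A single CPU-only stage (for example a software filter) forces the
 * whole pipeline back to system memory. */
MemMode select_mem_mode(const Stage *stages, size_t n)
{
    assert(stages != NULL || n == 0);
    if (n == 0)
        return MEM_SYSTEM;
    for (size_t i = 0; i < n; i++) {
        if (!stages[i].gpu_capable)
            return MEM_SYSTEM;
    }
    return MEM_VIDEO;
}

/* Extra copies in a decoder -> encoder transcode, as described in point 2:
 * system memory mode costs one download after the decoder and one upload
 * before the encoder; video memory mode avoids both. */
int extra_copies(MemMode mode)
{
    return mode == MEM_VIDEO ? 0 : 2;
}
```

With a decoder and encoder that are both GPU capable, select_mem_mode
returns MEM_VIDEO and extra_copies reports 0. Adding a software-only
stage between them falls back to MEM_SYSTEM and both copies return.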