[FFmpeg-devel] [PATCH] hwcontext_vaapi: use the special UC copy for downloading frames.

Jun Zhao mypopydev at gmail.com
Wed Apr 12 05:19:26 EEST 2017



On 2017/4/12 5:00, Mark Thompson wrote:
> On 11/04/17 12:26, Mark Thompson wrote:
>> On 11/04/17 08:30, Jun Zhao wrote:
>>> From 9bab458006369f427fa2f4c6248ee89329e81067 Mon Sep 17 00:00:00 2001
>>> From: Jun Zhao <jun.zhao at intel.com>
>>> Date: Tue, 11 Apr 2017 14:37:07 +0800
>>> Subject: [PATCH] hwcontext_vaapi: use the special UC copy for downloading
>>>  frames.
>>>
>>> Use the SSE4 UC copy function for copying image data from GPU-mapped
>>> memory; see https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers
>>>
>>> Before this change, the VA-API hwaccel decoder copied image data from
>>> GPU-mapped memory using vaCreateImage/vaGetImage/av_frame_copy; it now
>>> uses vaDeriveImage/av_image_copy_uc_from.
>>>
>>> Decoding a 3000-frame 1080p H.264 stream on an Intel(R) Core(TM)
>>> i5-6260U CPU @ 1.80GHz, the CPU usage and decode fps are as follows:
>>>
>>> 1. Software decoder.
>>> ./ffmpeg -i ./skyfall2-trailer.mp4 -f null /dev/null
>>>
>>> CPU: 80%, fps: 334fps
>>>
>>> 2a. vaCreateImage/vaGetImage/av_frame_copy
>>> ./ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i skyfall2-trailer.mp4 -f null /dev/null
>>>
>>> CPU: 12%, fps: 147fps
>>>
>>> 2b. vaDeriveImage/av_image_copy_uc_from
>>> ./ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i skyfall2-trailer.mp4 -f null /dev/null
>>>
>>> CPU: 23%, fps: 628fps
>>>
>>> Signed-off-by: Jun Zhao <jun.zhao at intel.com>
>>> ---
>>
>> This change was considered in libav when the UC copy function was introduced (<https://lists.libav.org/pipermail/libav-devel/2016-August/078826.html>, <https://lists.libav.org/pipermail/libav-devel/2016-August/078825.html>), but was not in the end applied.
>>
>> The reasons for this were:
>>
>> * It had much worse performance on the low-power cores - try your benchmark above on Braswell.
> 
> Running on a Braswell N3700, input is 38072 frames of 1920x1080 H.264.
> 
> No download at all:        520fps,   52s CPU
> Before patch, 4 threads:   107fps,  237s CPU
> Before patch, 1 thread:     90fps,  233s CPU
> After patch, 4 threads:     30fps, 1294s CPU
> After patch, 1 thread:      28fps, 1305s CPU
> 
> 

I will try to reproduce this on Braswell (BSW).
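
For scale, the Braswell figures quoted above can be converted into CPU time per frame. This is only a rough sketch: it takes the reported numbers as-is, and subtracting the "no download" run to estimate the cost of the download path alone is an assumption made here, not something stated in the thread. The function and variable names are illustrative.

```python
# Figures quoted from the Braswell N3700 run above
# (38072 frames of 1920x1080 H.264).
FRAMES = 38072

# label -> (reported fps, reported CPU seconds)
RUNS = {
    "no download": (520, 52),
    "before patch, 4 threads": (107, 237),
    "before patch, 1 thread": (90, 233),
    "after patch, 4 threads": (30, 1294),
    "after patch, 1 thread": (28, 1305),
}


def cpu_ms_per_frame(cpu_seconds, frames=FRAMES):
    """Average CPU time spent per decoded frame, in milliseconds."""
    return 1000.0 * cpu_seconds / frames


def download_cost_ms(label):
    """CPU ms per frame above the 'no download' baseline.

    Assumption: everything above the baseline is attributed to the
    download path, which ignores any other differences between runs.
    """
    baseline = cpu_ms_per_frame(RUNS["no download"][1])
    return cpu_ms_per_frame(RUNS[label][1]) - baseline


if __name__ == "__main__":
    for label, (fps, cpu_s) in RUNS.items():
        print(f"{label:26s} {fps:4d} fps  "
              f"{cpu_ms_per_frame(cpu_s):6.2f} ms CPU/frame")
    for label in RUNS:
        if label != "no download":
            print(f"{label:26s} download cost "
                  f"{download_cost_ms(label):6.2f} ms CPU/frame")
```

On these numbers the after-patch runs use more than five times the total CPU time of the before-patch runs on that machine, which is consistent with the much lower fps reported.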

> - Mark

