[FFmpeg-devel] have some major changes for nvenc support

Agatha Hu ahu at nvidia.com
Thu Nov 5 09:23:04 CET 2015


Nvidia recently did some work on improving nvenc performance. It 
includes a lot of changes, so I am attaching the patch instead of 
sending it inline.

Here are the explanations:
1) The first main change is a new nvresize filter (1:N, one input, 
multiple outputs) that does the resizing in hardware. During our 
internal 1:N encoding tests we found that swscale becomes the 
bottleneck, so we use a CUDA kernel instead.
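
To illustrate the 1:N idea, here is a minimal CPU sketch in C: one 
source plane is read and several differently sized outputs are written. 
It is not the kernel from the patch. The ScaledPlane struct, the 
function name and the nearest-neighbour sampling are all assumptions 
made for illustration; on the GPU each output pixel would typically be 
handled by one thread instead of a loop iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one output of the 1:N resize. */
typedef struct {
    uint8_t *data;   /* destination plane, tightly packed (pitch == width) */
    int      width;
    int      height;
} ScaledPlane;

/* Nearest-neighbour resize of one tightly packed 8-bit plane into n_out
 * outputs. The source is read once per output pixel; no intermediate
 * copies are made between the outputs. */
static void resize_plane_1_to_n(const uint8_t *src, int src_w, int src_h,
                                ScaledPlane *outs, int n_out)
{
    assert(src && outs && src_w > 0 && src_h > 0);
    for (int i = 0; i < n_out; i++) {
        ScaledPlane *o = &outs[i];
        for (int y = 0; y < o->height; y++) {
            /* Map the output row back to a source row. */
            int sy = (int)((int64_t)y * src_h / o->height);
            for (int x = 0; x < o->width; x++) {
                /* Map the output column back to a source column. */
                int sx = (int)((int64_t)x * src_w / o->width);
                o->data[(size_t)y * o->width + x] =
                    src[(size_t)sy * src_w + sx];
            }
        }
    }
}
```

A caller would fill an array of ScaledPlane entries, one per target 
resolution, and call resize_plane_1_to_n() once per input plane.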

2) We use the AVFrame::opaque field to store a custom ffnvinfo 
structure to avoid expensive CPU<->GPU transfers. Without it, the 
workflow would be:
CPU AVFrame input-->copy to GPU-->do CUDA resizing-->copy to CPU 
AVFrame-->copy to GPU-->NVENC encoding.
Now it becomes:
CPU AVFrame input-->copy to GPU-->do CUDA resizing-->NVENC encoding.
Our strategy is to check whether AVFrame::opaque is non-null AND its 
first 128 bytes match a particular GUID. If so, AVFrame::opaque is a 
valid ffnvinfo structure and we read the GPU address directly from it 
instead of copying data from the AVFrame.
The nvresize filter has a -readback parameter. If it is set to 0, the 
resized result is not copied back to the CPU; this is mainly meant for 
the case where the filter is connected to an NVENC encoder. If it is 
set to 1, the resized result is still copied back to the AVFrame so 
that it stays compatible with other components.
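
The opaque check described above can be sketched in plain C as follows. 
This is only an illustration: the field layout of ffnvinfo, the tag 
value FFNV_TAG and the helper names are hypothetical and not taken from 
the patch. The tag length follows the 128-byte figure given above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FFNV_TAG_SIZE 128  /* tag length as stated in the text (assumption) */

/* Hypothetical layout; the real ffnvinfo in the patch may differ. */
typedef struct {
    uint8_t   tag[FFNV_TAG_SIZE]; /* must equal FFNV_TAG to be trusted */
    uintptr_t dev_ptr;            /* device address (a CUdeviceptr in practice) */
    size_t    pitch;              /* row pitch of the device buffer in bytes */
} ffnvinfo;

/* Placeholder tag value for illustration; remaining bytes are zero. */
static const uint8_t FFNV_TAG[FFNV_TAG_SIZE] = {
    0x4e, 0x56, 0x49, 0x4e, 0x46, 0x4f
};

/* Producer side (e.g. the resize filter): stamp the tag and fill fields. */
static void ffnvinfo_init(ffnvinfo *info, uintptr_t dev_ptr, size_t pitch)
{
    assert(info);
    memset(info, 0, sizeof(*info));
    memcpy(info->tag, FFNV_TAG, FFNV_TAG_SIZE);
    info->dev_ptr = dev_ptr;
    info->pitch   = pitch;
}

/* Consumer side (e.g. the encoder): return the struct if opaque is
 * non-null and starts with the tag, otherwise NULL so the caller falls
 * back to copying from the CPU frame. */
static const ffnvinfo *ffnvinfo_from_opaque(const void *opaque)
{
    if (!opaque)
        return NULL;
    if (memcmp(opaque, FFNV_TAG, FFNV_TAG_SIZE) != 0)
        return NULL;
    return (const ffnvinfo *)opaque;
}
```

Note that such a check reads FFNV_TAG_SIZE bytes from whatever opaque 
points to, so it is only safe when every other user of the opaque field 
points it at an object of at least that size.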

3) Because we are now using CUDA addresses, the input buffer becomes 
CUDA external memory. We replaced NvEncCreateInputBuffer with 
cuMemAllocPitch+NvEncRegisterInputBuffer, and 
NvEncLock/UnlockInputBuffer with NvEncMap/UnmapInputBuffer.

4) Using CUDA input exposed some driver bugs, e.g. nvenc generates 
corrupted chroma plane data if the buffer format is YUV420P. A fixed 
driver will be released soon, but for backwards compatibility we 
decided to convert YUV420P to NV12 explicitly with a CUDA kernel in 
nvenc.c. Even the fixed driver still contains a YUV420P->NV12 
conversion kernel; the only difference is that there the kernel ships 
with the driver, whereas here we do it within nvenc.c.
For the same reason, YUV444P is removed temporarily: there is a bug 
with CUDA input. Once the fix is released, we should enable the support 
again.
We chose to keep backwards support for YUV420P because it is much more 
popular than YUV444P.
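
For readers unfamiliar with the two layouts, here is a CPU reference in 
C of what a YUV420P->NV12 conversion computes. It is a sketch, not the 
CUDA kernel from the patch, and the function name and parameters are 
made up for illustration. The luma plane is the same in both formats; 
NV12 differs only in storing chroma as a single plane of interleaved 
U,V byte pairs instead of two separate planes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* CPU reference for planar YUV420P -> semi-planar NV12.
 * width and height must be even; each pitch is a row stride in bytes. */
static void yuv420p_to_nv12(const uint8_t *src_y, int src_y_pitch,
                            const uint8_t *src_u, const uint8_t *src_v,
                            int src_c_pitch,
                            uint8_t *dst_y, uint8_t *dst_uv, int dst_pitch,
                            int width, int height)
{
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);

    /* Luma is identical in both formats: copy row by row. */
    for (int y = 0; y < height; y++)
        memcpy(dst_y + (size_t)y * dst_pitch,
               src_y + (size_t)y * src_y_pitch, (size_t)width);

    /* Chroma: interleave the two quarter-size planes as U0 V0 U1 V1 ... */
    for (int y = 0; y < height / 2; y++) {
        const uint8_t *u  = src_u  + (size_t)y * src_c_pitch;
        const uint8_t *v  = src_v  + (size_t)y * src_c_pitch;
        uint8_t       *uv = dst_uv + (size_t)y * dst_pitch;
        for (int x = 0; x < width / 2; x++) {
            uv[2 * x]     = u[x];
            uv[2 * x + 1] = v[x];
        }
    }
}
```

In a GPU version the two loops map naturally onto a 2D thread grid, 
with one thread per luma byte or per U,V pair.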

5) Finally, we moved most of the CUDA typedefs/functions/helpers to 
cudautils.h/c.

A typical use case is:
ffmpeg -y -i $1 $2 $3 -filter_complex \


         -map [out0] -an -vcodec nvenc_h264 -preset slow -profile:v main -async 1 -b:v 200M -bufsize 200M -maxrate 200M -refs 1 -bf 2 $1_1080p.mp4 \
         -map [out1] -an -vcodec nvenc_h264 -preset slow -profile:v main -async 1 -b:v 100M -bufsize 100M -maxrate 100M -refs 1 -bf 2 $1_720p.mp4 \
         -map [out2] -an -vcodec nvenc_h264 -preset slow -profile:v main -async 1 -b:v  50M -bufsize  50M -maxrate  50M -refs 1 -bf 2 $1_480p.mp4 \
         -map [out3] -an -vcodec nvenc_h264 -preset slow -profile:v main -async 1 -b:v  25M -bufsize  25M -maxrate  25M -refs 1 -bf 2 $1_wvga.mp4 \
         -map [out4] -an -vcodec nvenc_h264 -preset slow -profile:v main -async 1 -b:v  10M -bufsize  10M -maxrate  10M -refs 1 -bf 2 $1_cif.mp4

Agatha Hu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-combined-cuda-resize-yuv420-fix-remove-yuv444-add-AQ_v6.0.patch
Type: text/x-patch
Size: 108910 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20151105/2f9dde51/attachment.bin>
