[FFmpeg-devel] [PATCH] avcodec/nvenc: Add support for H.265 encoding

Ali KIZIL alikizil at gmail.com
Thu Mar 26 22:41:46 CET 2015


Philip Langdale <philipl <at> overt.org> writes:

> 
> On 2015-03-26 04:30, Ali KIZIL wrote:
> > 
> > It works fine now Phil. One more comment:
> > 
> > I have a GTX 980. It can encode at up to 30-33 fps for a 4K 60fps raw
> > YUV input file using the nvenc_h265 encoder in FFmpeg. At first it
> > looked to me like a lack of performance in the card. However, after I
> > split the video in two with the crop filter:
> > 
> > /opt/ffmpeghw/bin/ffmpeg -video_size 3840x2160 -framerate 50 -i
> > /Projects/YUV/soccer.yuv -vcodec nvenc_h265 -an -filter:v
> > "crop=in_w:in_h/2:0:0" -r 50 -g 50 -preset hp -f hevc top.hevc
> > 
> > /opt/ffmpeghw/bin/ffmpeg -video_size 3840x2160 -framerate 50 -i
> > /Projects/YUV/soccer.yuv -vcodec nvenc_h265 -an -filter:v
> > "crop=in_w:in_h/2:0:in_h/2" -r 50 -g 50 -preset hp -f hevc bottom.hevc
> > 
> > When I run them at the same time, both can be encoded at 50 fps. I
> > tried to join the output files with padding, but FFmpeg needs to
> > re-encode to do that, which makes no sense here.
> > 
> > Do you have any comments or ideas on how to use the full performance
> > of the card from a single ffmpeg nvenc_h265 instance?
> > 
> > Additional note: GTX cards can support up to 2 HEVC encodes at the
> > same time (as a limitation).
> 
> I honestly don't know. The hardware performance may not scale linearly
> with frame size, so you might see a disproportionate slowdown past a
> certain size, perhaps reflecting the need to use multiple buffers, etc.
> 
> Do you see any evidence that you're CPU bound? That might happen if
> our buffer management is too inefficient, but I'd be surprised.
> 
> --phil
> 

CPU is fine. The server has 2 x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz,
with MemTotal: 49413456 kB and MemFree: 32030320 kB, so memory is not an
issue either. Here is the top output during the run:

top - 23:39:18 up 1 day, 21 min,  2 users,  load average: 0.08, 0.03, 0.05
Tasks: 371 total,   3 running, 368 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 : 29.1 us, 20.3 sy,  0.0 ni, 50.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu27 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu28 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu29 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu30 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu31 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  49413456 total, 19607392 used, 29806064 free,   106188 buffers
KiB Swap: 50282492 total,        0 used, 50282492 free. 16826488 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 9563 root      20   0 70.432g 2.003g 1.948g R  49.2  4.3   0:08.02 ffmpeg
  735 root      20   0       0      0      0 S   0.3  0.0   5:49.83 blackmagic
 9600 root      20   0   22240   1844   1112 R   0.3  0.0   0:00.02 top
    1 root      20   0   33696   2960   1472 S   0.0  0.0   0:08.37 init


FFmpeg output is:

ffmpeg version N-71096-g2139e58 Copyright (c) 2000-2015 the FFmpeg developers
  built with gcc 4.8 (Ubuntu 4.8.2-19ubuntu1)
  configuration: --prefix=/opt/ffmpeghw --extra-cflags=-I/opt/ffmpeghw/include --extra-ldflags=-L/opt/ffmpeghw/lib --bindir=/opt/ffmpeghw/bin --extra-libs=-ldl --enable-libx264 --enable-libx265     --enable-libvpx --enable-libfdk-aac --enable-nonfree --enable-gpl --enable-nvenc
  libavutil      54. 20.101 / 54. 20.101
  libavcodec     56. 30.100 / 56. 30.100
  libavformat    56. 26.101 / 56. 26.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 13.101 /  5. 13.101
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  3.100 / 53.  3.100
[rawvideo @ 0x38051a0] Estimating duration from bitrate, this may be inaccurate
Input #0, rawvideo, from '/Projects/YUV/soccer.yuv':
  Duration: 00:01:53.74, start: 0.000000, bitrate: 681695 kb/s
    Stream #0:0: Video: rawvideo, 1 reference frame (I420 / 0x30323449), yuv420p, 3840x2160, 681672 kb/s, 50 tbr, 50 tbn, 50 tbc
[graph 0 input from stream 0:0 @ 0x38050e0] w:3840 h:2160 pixfmt:yuv420p tb:1/50 fr:50/1 sar:0/1 sws_param:flags=2
[auto-inserted scaler 0 @ 0x37f1160] w:iw h:ih flags:'0x4' interl:0
[format @ 0x37fa9c0] auto-inserting filter 'auto-inserted scaler 0' between the filter 'Parsed_null_0' and the filter 'format'
[auto-inserted scaler 0 @ 0x37f1160] w:3840 h:2160 fmt:yuv420p sar:0/1 -> w:3840 h:2160 fmt:nv12 sar:0/1 flags:0x4
[nvenc_h265 @ 0x3807900] 1 CUDA capable devices found
[nvenc_h265 @ 0x3807900] [ GPU #0 - < GeForce GTX 980 > has Compute SM 5.2, NVENC Available ]
[nvenc_h265 @ 0x3807900] Nvenc initialized successfully
SOME ADDITIONAL CODE FOR DEBUGGING
ctx->init_encode_params.version               = -804976880
ctx->init_encode_params.encodeWidth           = 3840
ctx->init_encode_params.encodeHeight          = 2160
ctx->init_encode_params.darWidth              = 3840
ctx->init_encode_params.darHeight             = 2160
ctx->init_encode_params.frameRateNum          = 50
ctx->init_encode_params.frameRateDen          = 1
ctx->init_encode_params.enableEncodeAsync     = 0
ctx->init_encode_params.enablePTD             = 1
ctx->init_encode_params.reportSliceOffsets    = 0
ctx->init_encode_params.enableSubFrameWrite   = 0
ctx->init_encode_params.enableExternalMEHints = 0
ctx->init_encode_params.privDataSize          = 0
ctx->init_encode_params.enableExternalMEHints = 0
ctx->init_encode_params.maxEncodeWidth        = 3840
ctx->init_encode_params.maxEncodeHeight       = 2160

ctx->init_encode_params.gopLength             = 12
ctx->init_encode_params.frameIntervalP        = 1
ctx->init_encode_params.monoChromeEncoding    = 0
ctx->init_encode_params.frameFieldMode        = 1
ctx->init_encode_params.mvPrecision           = 3

encodeConfig.level                             = 0
encodeConfig.tier                              = 0
encodeConfig.minCUSize                         = 2
encodeConfig.maxCUSize                         = 3
encodeConfig.useConstrainedIntraPred           = 0
encodeConfig.disableDeblockAcrossSliceBoundary = 0
encodeConfig.outputBufferingPeriodSEI          = 0
encodeConfig.outputPictureTimingSEI            = 0
encodeConfig.outputAUD                         = 0
encodeConfig.enableLTR                         = 0
encodeConfig.disableSPSPPS                     = 0
encodeConfig.repeatSPSPPS                      = 1
encodeConfig.enableIntraRefresh                = 0
encodeConfig.idrPeriod                         = 12
encodeConfig.intraRefreshPeriod                = 0
encodeConfig.intraRefreshCnt                   = 0
encodeConfig.maxNumRefFramesInDPB              = 1
encodeConfig.ltrNumFrames                      = 0
encodeConfig.vpsId                             = 0
encodeConfig.spsId                             = 0
encodeConfig.ppsId                             = 0
encodeConfig.sliceMode                         = 0
encodeConfig.sliceModeData                     = 0
encodeConfig.maxTemporalLayersMinus1           = 0

rc_param.constQP      = 28
rc_param.averageBitRate = 0
rc_param.maxBitRate     = 0
rc_param.vbvBufferSize  = 0
[mpegts @ 0x3806820] muxrate VBR, pcr every 5 pkts, sdt every 200, pat/pmt every 40 pkts
Output #0, mpegts, to 'out.ts':
  Metadata:
    encoder         : Lavf56.26.101
    Stream #0:0: Video: hevc (nvenc_h265), 1 reference frame, nv12, 3840x2160, q=-1--1, 50 fps, 90k tbn, 50 tbc
    Metadata:
      encoder         : Lavc56.30.100 nvenc_h265
Stream mapping:
  Stream #0:0 -> #0:0 (rawvideo (native) -> hevc (nvenc_h265))
Press [q] to stop, [?] for help
frame=  765 fps= 29 q=0.0 Lsize=  137360kB time=00:00:15.30 bitrate=73545.9kbits/s
video:127337kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 7.871264%
Input file #0 (/Projects/YUV/soccer.yuv):
  Input stream #0:0 (video): 765 packets read (9517824000 bytes); 765 frames decoded;
  Total: 765 packets (9517824000 bytes) demuxed
Output file #0 (out.ts):
  Output stream #0:0 (video): 765 frames encoded; 765 packets muxed (130392951 bytes);
  Total: 765 packets (130392951 bytes) muxed
[nvenc_h265 @ 0x3807900] Nvenc unloaded
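
As a cross-check on the input side of that log, here is a minimal sketch that recomputes the raw frame size from the reported geometry and compares it with the byte and frame counts ffmpeg printed. It assumes 8-bit yuv420p input, as the log reports; the function names are only illustrative and the read rates are plain arithmetic, not measured values.

```python
# Geometry and counters copied from the ffmpeg log above.
WIDTH, HEIGHT = 3840, 2160
FRAMES_READ = 765
BYTES_READ = 9_517_824_000


def yuv420p_frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit yuv420p frame: a full-size luma plane plus
    two chroma planes at a quarter of the luma size each."""
    return width * height * 3 // 2


frame_bytes = yuv420p_frame_bytes(WIDTH, HEIGHT)


def read_rate_mb_per_s(fps: float) -> float:
    """Raw input read rate (1 MB = 10**6 bytes) needed to sustain `fps`."""
    return frame_bytes * fps / 1e6


print("bytes per frame (computed):", frame_bytes)
print("bytes per frame (from log):", BYTES_READ // FRAMES_READ)
print(f"read rate at 29 fps: {read_rate_mb_per_s(29):.1f} MB/s")
print(f"read rate at 50 fps: {read_rate_mb_per_s(50):.1f} MB/s")
```

The two per-frame figures agree, so the demuxer is reading whole frames as expected. The cropped runs quoted earlier each read full frames at 50 fps, which suggests that input reading is not the limiting factor, though the log alone does not prove it.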

I think you are right: performance does not scale linearly with frame
rate and frame size. In a few days I will be able to test with a
higher-end GM2xx card, and I will let you know.
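
To put a number on the non-linear scaling, here is a minimal sketch comparing aggregate pixel throughput in the two setups. It uses the 29 fps from the single 3840x2160 run in the log and the 50 fps per instance reported for the two 3840x1080 crops; the 50 fps figure comes from the earlier message, not from a log shown here, and the helper name is only illustrative.

```python
def pixel_rate(width: int, height: int, fps: float, instances: int = 1) -> float:
    """Pixels encoded per second, summed over `instances` parallel encodes."""
    return width * height * fps * instances


# One full-frame encode at the frame rate shown in the ffmpeg log.
single_4k = pixel_rate(3840, 2160, 29)

# Two half-height encodes running at the same time, 50 fps each.
split_halves = pixel_rate(3840, 1080, 50, instances=2)

speedup = split_halves / single_4k

print(f"single 4K encode : {single_4k / 1e6:.1f} Mpixel/s")
print(f"two half encodes : {split_halves / 1e6:.1f} Mpixel/s")
print(f"ratio            : {speedup:.2f}x")
```

Under these assumptions the split setup pushes roughly 1.7 times as many pixels per second through the card as the single instance does, which is the gap a single nvenc_h265 instance leaves unused.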


