[FFmpeg-user] Compiling ffmpeg with NVENC and NVIDIA GRID K1

Steven Liu lingjiujianke at gmail.com
Thu Aug 27 10:03:21 CEST 2015


2015-08-27 15:08 GMT+08:00 Steven Liu <lingjiujianke at gmail.com>:

>
>
> 2015-08-27 14:52 GMT+08:00 Steven Liu <lingjiujianke at gmail.com>:
>
>>
>> 2015-06-29 18:12 GMT+08:00 Klaus Schürmann <ks at mediabeam.com>:
>>
>>> Hello,
>>>
>>> I compiled ffmpeg with nvenc support. The compile process worked without
>>> any error. But when I try to convert a file with nvenc I get the error
>>> message "[nvenc @ 0x39dc1c0] CreateInputBuffer failed".
>>>
>>> Can somebody help me to fix this problem?
>>>
>>> Best Regards
>>> Klaus Schuermann
>>>
>>> OS: Ubuntu 14.04.2 LTS
>>> NVidia driver: 346
>>>
>>> Here is the complete output of the convert job:
>>>
>>> root@video-convert1:~/ffmpeg_sources/ffmpeg_libnvenc# ffmpeg -i
>>> /media/testfile.mkv -r 60 -s 1024x768 -vcodec nvenc -b:v 5750k testfile.mp4
>>> ffmpeg version N-73133-gd7e224e Copyright (c) 2000-2015 the FFmpeg
>>> developers
>>>   built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04)
>>>   configuration: --prefix=/root/ffmpeg_build --pkg-config-flags=--static
>>> --extra-cflags=-I/root/ffmpeg_build/include
>>> --extra-ldflags=-L/root/ffmpeg_build/lib --bindir=/root/bin --enable-gpl
>>> --enable-libass --enable-libfdk-aac --enable-libfreetype
>>> --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis
>>> --enable-libvpx --enable-libx264 --enable-libx265 --enable-nvenc
>>> --enable-nonfree
>>>   libavutil      54. 27.100 / 54. 27.100
>>>   libavcodec     56. 44.101 / 56. 44.101
>>>   libavformat    56. 38.101 / 56. 38.101
>>>   libavdevice    56.  4.100 / 56.  4.100
>>>   libavfilter     5. 18.100 /  5. 18.100
>>>   libswscale      3.  1.101 /  3.  1.101
>>>   libswresample   1.  2.100 /  1.  2.100
>>>   libpostproc    53.  3.100 / 53.  3.100
>>> Input #0, matroska,webm, from '/media/testfile.mkv':
>>>   Metadata:
>>>     encoder         : libebml v1.3.0 + libmatroska v1.4.1
>>>     creation_time   : 2014-09-29 00:31:12
>>>   Duration: 00:21:03.51, start: 0.000000, bitrate: 3015 kb/s
>>>     Stream #0:0(eng): Video: h264 (High), yuv420p(tv,
>>> bt709/unknown/unknown), 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr,
>>> 1k tbn, 47.95 tbc (default)
>>>     Stream #0:1: Audio: ac3, 48000 Hz, 5.1(side), fltp, 448 kb/s (default)
>>> [nvenc @ 0x39dc1c0] CreateInputBuffer failed
>>> Output #0, mp4, to 'testfile.mp4':
>>>   Metadata:
>>>     encoder         : libebml v1.3.0 + libmatroska v1.4.1
>>>     Stream #0:0(eng): Video: h264, none, q=2-31, 128 kb/s, SAR 4:3 DAR
>>> 0:0, 60 fps (default)
>>>     Metadata:
>>>       encoder         : Lavc56.44.101 nvenc
>>>     Stream #0:1: Audio: aac, 0 channels, 128 kb/s (default)
>>>     Metadata:
>>>       encoder         : Lavc56.44.101 libfdk_aac
>>> Stream mapping:
>>>   Stream #0:0 -> #0:0 (h264 (native) -> h264 (nvenc))
>>>   Stream #0:1 -> #0:1 (ac3 (native) -> aac (libfdk_aac))
>>> Error while opening encoder for output stream #0:0 - maybe incorrect
>>> parameters such as bit_rate, rate, width or height
>>>
>>> Output of devicequery:
>>>
>>> root@video-convert1:~#
>>> NVIDIA_CUDA-7.0_Samples/1_Utilities/deviceQuery/deviceQuery
>>> NVIDIA_CUDA-7.0_Samples/1_Utilities/deviceQuery/deviceQuery Starting...
>>>
>>>  CUDA Device Query (Runtime API) version (CUDART static linking)
>>>
>>> Detected 4 CUDA Capable device(s)
>>>
>>> Device 0: "GRID K1"
>>>   CUDA Driver Version / Runtime Version          7.0 / 7.0
>>>   CUDA Capability Major/Minor version number:    3.0
>>>   Total amount of global memory:                 4096 MBytes (4294770688
>>> bytes)
>>>   ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
>>>   GPU Max Clock rate:                            850 MHz (0.85 GHz)
>>>   Memory Clock rate:                             891 Mhz
>>>   Memory Bus Width:                              128-bit
>>>   L2 Cache Size:                                 262144 bytes
>>>   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536,
>>> 65536), 3D=(4096, 4096, 4096)
>>>   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
>>>   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048
>>> layers
>>>   Total amount of constant memory:               65536 bytes
>>>   Total amount of shared memory per block:       49152 bytes
>>>   Total number of registers available per block: 65536
>>>   Warp size:                                     32
>>>   Maximum number of threads per multiprocessor:  2048
>>>   Maximum number of threads per block:           1024
>>>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>>>   Max dimension size of a grid size    (x,y,z): (2147483647, 65535,
>>> 65535)
>>>   Maximum memory pitch:                          2147483647 bytes
>>>   Texture alignment:                             512 bytes
>>>   Concurrent copy and kernel execution:          Yes with 1 copy
>>> engine(s)
>>>   Run time limit on kernels:                     No
>>>   Integrated GPU sharing Host Memory:            No
>>>   Support host page-locked memory mapping:       Yes
>>>   Alignment requirement for Surfaces:            Yes
>>>   Device has ECC support:                        Disabled
>>>   Device supports Unified Addressing (UVA):      Yes
>>>   Device PCI Domain ID / Bus ID / location ID:   0 / 132 / 0
>>>   Compute Mode:
>>>      < Default (multiple host threads can use ::cudaSetDevice() with
>>> device simultaneously) >
>>>
>>> Device 1: "GRID K1"
>>>   CUDA Driver Version / Runtime Version          7.0 / 7.0
>>>   CUDA Capability Major/Minor version number:    3.0
>>>   Total amount of global memory:                 4096 MBytes (4294770688
>>> bytes)
>>>   ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
>>>   GPU Max Clock rate:                            850 MHz (0.85 GHz)
>>>   Memory Clock rate:                             891 Mhz
>>>   Memory Bus Width:                              128-bit
>>>   L2 Cache Size:                                 262144 bytes
>>>   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536,
>>> 65536), 3D=(4096, 4096, 4096)
>>>   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
>>>   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048
>>> layers
>>>   Total amount of constant memory:               65536 bytes
>>>   Total amount of shared memory per block:       49152 bytes
>>>   Total number of registers available per block: 65536
>>>   Warp size:                                     32
>>>   Maximum number of threads per multiprocessor:  2048
>>>   Maximum number of threads per block:           1024
>>>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>>>   Max dimension size of a grid size    (x,y,z): (2147483647, 65535,
>>> 65535)
>>>   Maximum memory pitch:                          2147483647 bytes
>>>   Texture alignment:                             512 bytes
>>>   Concurrent copy and kernel execution:          Yes with 1 copy
>>> engine(s)
>>>   Run time limit on kernels:                     No
>>>   Integrated GPU sharing Host Memory:            No
>>>   Support host page-locked memory mapping:       Yes
>>>   Alignment requirement for Surfaces:            Yes
>>>   Device has ECC support:                        Disabled
>>>   Device supports Unified Addressing (UVA):      Yes
>>>   Device PCI Domain ID / Bus ID / location ID:   0 / 133 / 0
>>>   Compute Mode:
>>>      < Default (multiple host threads can use ::cudaSetDevice() with
>>> device simultaneously) >
>>>
>>> Device 2: "GRID K1"
>>>   CUDA Driver Version / Runtime Version          7.0 / 7.0
>>>   CUDA Capability Major/Minor version number:    3.0
>>>   Total amount of global memory:                 4096 MBytes (4294770688
>>> bytes)
>>>   ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
>>>   GPU Max Clock rate:                            850 MHz (0.85 GHz)
>>>   Memory Clock rate:                             891 Mhz
>>>   Memory Bus Width:                              128-bit
>>>   L2 Cache Size:                                 262144 bytes
>>>   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536,
>>> 65536), 3D=(4096, 4096, 4096)
>>>   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
>>>   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048
>>> layers
>>>   Total amount of constant memory:               65536 bytes
>>>   Total amount of shared memory per block:       49152 bytes
>>>   Total number of registers available per block: 65536
>>>   Warp size:                                     32
>>>   Maximum number of threads per multiprocessor:  2048
>>>   Maximum number of threads per block:           1024
>>>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>>>   Max dimension size of a grid size    (x,y,z): (2147483647, 65535,
>>> 65535)
>>>   Maximum memory pitch:                          2147483647 bytes
>>>   Texture alignment:                             512 bytes
>>>   Concurrent copy and kernel execution:          Yes with 1 copy
>>> engine(s)
>>>   Run time limit on kernels:                     No
>>>   Integrated GPU sharing Host Memory:            No
>>>   Support host page-locked memory mapping:       Yes
>>>   Alignment requirement for Surfaces:            Yes
>>>   Device has ECC support:                        Disabled
>>>   Device supports Unified Addressing (UVA):      Yes
>>>   Device PCI Domain ID / Bus ID / location ID:   0 / 134 / 0
>>>   Compute Mode:
>>>      < Default (multiple host threads can use ::cudaSetDevice() with
>>> device simultaneously) >
>>>
>>> Device 3: "GRID K1"
>>>   CUDA Driver Version / Runtime Version          7.0 / 7.0
>>>   CUDA Capability Major/Minor version number:    3.0
>>>   Total amount of global memory:                 4096 MBytes (4294770688
>>> bytes)
>>>   ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
>>>   GPU Max Clock rate:                            850 MHz (0.85 GHz)
>>>   Memory Clock rate:                             891 Mhz
>>>   Memory Bus Width:                              128-bit
>>>   L2 Cache Size:                                 262144 bytes
>>>   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536,
>>> 65536), 3D=(4096, 4096, 4096)
>>>   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
>>>   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048
>>> layers
>>>   Total amount of constant memory:               65536 bytes
>>>   Total amount of shared memory per block:       49152 bytes
>>>   Total number of registers available per block: 65536
>>>   Warp size:                                     32
>>>   Maximum number of threads per multiprocessor:  2048
>>>   Maximum number of threads per block:           1024
>>>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>>>   Max dimension size of a grid size    (x,y,z): (2147483647, 65535,
>>> 65535)
>>>   Maximum memory pitch:                          2147483647 bytes
>>>   Texture alignment:                             512 bytes
>>>   Concurrent copy and kernel execution:          Yes with 1 copy
>>> engine(s)
>>>   Run time limit on kernels:                     No
>>>   Integrated GPU sharing Host Memory:            No
>>>   Support host page-locked memory mapping:       Yes
>>>   Alignment requirement for Surfaces:            Yes
>>>   Device has ECC support:                        Disabled
>>>   Device supports Unified Addressing (UVA):      Yes
>>>   Device PCI Domain ID / Bus ID / location ID:   0 / 135 / 0
>>>   Compute Mode:
>>>      < Default (multiple host threads can use ::cudaSetDevice() with
>>> device simultaneously) >
>>> > Peer access from GRID K1 (GPU0) -> GRID K1 (GPU1) : Yes
>>> > Peer access from GRID K1 (GPU0) -> GRID K1 (GPU2) : Yes
>>> > Peer access from GRID K1 (GPU0) -> GRID K1 (GPU3) : Yes
>>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU1) : No
>>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU2) : Yes
>>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU3) : Yes
>>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU1) : Yes
>>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU2) : No
>>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU3) : Yes
>>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU0) : Yes
>>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU1) : No
>>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU2) : Yes
>>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU0) : Yes
>>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU1) : Yes
>>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU2) : No
>>> > Peer access from GRID K1 (GPU3) -> GRID K1 (GPU0) : Yes
>>> > Peer access from GRID K1 (GPU3) -> GRID K1 (GPU1) : Yes
>>> > Peer access from GRID K1 (GPU3) -> GRID K1 (GPU2) : Yes
>>>
>>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA
>>> Runtime Version = 7.0, NumDevs = 4, Device0 = GRID K1, Device1 = GRID K1,
>>> Device2 = GRID K1, Device3 = GRID K1 Result = PASS
>>>
>>
>> I get the same error message. What should we pay attention to? My GPU
>> information is below:
>>
>>
>> [root@localhost release]# ./deviceQuery
>> ./deviceQuery Starting...
>>
>>  CUDA Device Query (Runtime API) version (CUDART static linking)
>>
>> Detected 1 CUDA Capable device(s)
>>
>> Device 0: "Tesla K20c"
>>   CUDA Driver Version / Runtime Version          7.0 / 7.0
>>   CUDA Capability Major/Minor version number:    3.5
>>   Total amount of global memory:                 4800 MBytes (5032706048
>> bytes)
>>   (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
>>   GPU Max Clock rate:                            706 MHz (0.71 GHz)
>>   Memory Clock rate:                             2600 Mhz
>>   Memory Bus Width:                              320-bit
>>   L2 Cache Size:                                 1310720 bytes
>>   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536,
>> 65536), 3D=(4096, 4096, 4096)
>>   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
>>   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048
>> layers
>>   Total amount of constant memory:               65536 bytes
>>   Total amount of shared memory per block:       49152 bytes
>>   Total number of registers available per block: 65536
>>   Warp size:                                     32
>>   Maximum number of threads per multiprocessor:  2048
>>   Maximum number of threads per block:           1024
>>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>>   Max dimension size of a grid size    (x,y,z): (2147483647, 65535,
>> 65535)
>>   Maximum memory pitch:                          2147483647 bytes
>>   Texture alignment:                             512 bytes
>>   Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
>>   Run time limit on kernels:                     No
>>   Integrated GPU sharing Host Memory:            No
>>   Support host page-locked memory mapping:       Yes
>>   Alignment requirement for Surfaces:            Yes
>>   Device has ECC support:                        Enabled
>>   Device supports Unified Addressing (UVA):      Yes
>>   Device PCI Domain ID / Bus ID / location ID:   0 / 4 / 0
>>   Compute Mode:
>>      < Default (multiple host threads can use ::cudaSetDevice() with
>> device simultaneously) >
>>
>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA
>> Runtime Version = 7.0, NumDevs = 1, Device0 = Tesla K20c
>> Result = PASS
>>
>>
> Hi Timo,
>
> I looked up status 0x0A in the header file /usr/include/nvEncodeAPI.h.
> Perhaps the memory allocation is too large?
>
>  /* 1MB is large enough to hold most output frames. NVENC increases this
> automaticaly if it's not enough. */
>         allocOut.size = 1024 * 1024;
>
>         allocOut.memoryHeap = NV_ENC_MEMORY_HEAP_SYSMEM_CACHED;
>
>
>     /**
>      * This indicates that the API call failed because it was unable to
> allocate
>      * enough memory to perform the requested operation.
>      */
>     NV_ENC_ERR_OUT_OF_MEMORY,
>
>
I made a mistake; that is not the error code.

The gdb output I got is:
Missing separate debuginfo for /lib64/libcuda.so
[nvenc @ 0x1a8f700] 1 CUDA capable devices found
[nvenc @ 0x1a8f700] [ GPU #0 - < Tesla K20c > has Compute SM 3.5, smver 53
target_smver 48 NVENC Available ]
[nvenc @ 0x1a8f700] Nvenc initialized successfully
[New Thread 0x7ffff1906700 (LWP 8614)]
[nvenc @ 0x1a8f700] in for surfaceCount = 0 ctx->max_surface_count = 48

Breakpoint 2, nvenc_encode_init (avctx=0x1a8f700) at
/home/liuqi/ffmpeg/libavcodec/nvenc.c:981
981             nv_status = p_nvenc->nvEncCreateInputBuffer(ctx->nvencoder,
&allocSurf);
(gdb) p allocSurf
$1 = {version = 1342243592, width = 512, height = 288, memoryHeap =
NV_ENC_MEMORY_HEAP_SYSMEM_CACHED, bufferFmt = NV_ENC_BUFFER_FORMAT_YV12_PL,
reserved = 0,
  inputBuffer = 0x0, pSysMemBuffer = 0x0, reserved1 = {0 <repeats 57
times>}, reserved2 = {0x0 <repeats 63 times>}}
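
As a sanity check on the dump above, the `version` word can be unpacked by hand. The sketch below assumes the packing I recall from the NVENCAPI_STRUCT_VERSION macro in the NVENC SDK 5.0 header: struct size in the low 16 bits, struct revision in bits 16-23, and the API minor/major version in bits 24-27 and 28-31. That layout is not confirmed anywhere in this thread, and the type and function names are hypothetical helpers, not part of nvEncodeAPI.h.

```c
#include <stdint.h>

/* Fields packed into an NVENC struct "version" word.
 * ASSUMED layout (NVENC SDK 5.0 style, not confirmed in this thread):
 *   bits  0-15  sizeof(struct)
 *   bits 16-23  struct revision
 *   bits 24-27  API minor version
 *   bits 28-31  API major version
 */
struct nvenc_struct_version {      /* hypothetical helper type */
    unsigned api_major;
    unsigned api_minor;
    unsigned struct_rev;
    unsigned struct_size;
};

/* Split a version word into its assumed components. */
static struct nvenc_struct_version decode_struct_version(uint32_t v)
{
    struct nvenc_struct_version out;
    out.api_major   = (v >> 28) & 0xFu;
    out.api_minor   = (v >> 24) & 0xFu;
    out.struct_rev  = (v >> 16) & 0xFFu;
    out.struct_size =  v        & 0xFFFFu;
    return out;
}
```

For the value gdb printed, 1342243592 is 0x50010308, which under that assumed layout reads as API 5.0, struct revision 1, struct size 776 bytes. If the assumption holds, the version field looks sane, so the failure is more likely tied to the other parameters (512x288, NV_ENC_BUFFER_FORMAT_YV12_PL, SYSMEM_CACHED) or to the device itself than to a header mismatch.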