[FFmpeg-trac] #7582(undetermined:new): hwaccel cuvid/nvenc performance degradation when using aq (temporal-aq or spatial-aq) with multiple concurrent encodes
FFmpeg
trac at avcodec.org
Sun Dec 2 15:41:49 EET 2018
#7582: hwaccel cuvid/nvenc performance degradation when using aq (temporal-aq or
spatial-aq) with multiple concurrent encodes
-------------------------------------+-------------------------------------
Reporter: malakudi | Owner:
Type: defect | Status: new
Priority: important | Component: undetermined
Version: git-master | Resolution:
Keywords: regresssion cuda nvenc | Blocked By:
Blocking: | Reproduced by developer: 0
Analyzed by developer: 0 |
-------------------------------------+-------------------------------------
Changes (by cehoyos):
* keywords: => regresssion cuda nvenc
* priority: normal => important
Old description:
> Running multiple hwaccel cuvid/nvenc sessions that utilise temporal-aq or
> spatial-aq AND 3 or more reference frames results in a performance
> degradation since following commits:
> https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/9b82e333b7c4235a3de7ce8d8fe115c53c11f50c
> https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/93d1756af2908150f7c8c0590b9ed246951d474a
> Those commits enabled the use of cuMemcpy2DAsync instead of cuMemcpy2D.
> With this and aq enabled and 3 or more reference frames, performance
> seems to be degraded at around 50% of the nvenc capacity. Maybe it could
> be a driver problem but still, makes ffmpeg problematic on multiple
> realtime encodes scenario. With -hwaccel nvdec this doesn't happen, but
> since -hwaccel nvdec utilises much more VRAM, I cannot run the same
> amount of concurrent sessions.
>
> To reproduce, I use as input the following file:
> https://download.blender.org/demo/movies/BBB/bbb_sunflower_1080p_30fps_normal.mp4
>
> Running with following bash script:
> {{{
> #!/bin/bash
> for i in `seq 1 16` ;
> do
> ./ffmpeg-git -nostdin -loglevel error -hwaccel cuvid -c:v h264_cuvid -re
> -i bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 -c:v
> h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 -acodec copy -f
> mpegts -y /dev/null &
> done
> wait
> echo done
>
> }}}
> Checking utilization with nvidia-smi you will see very low utilization,
> and if you run one more session interactively you will see that it cannot
> keep encoding at 30 fps, although the utilization of nvenc is very low.
> If you set temporal-aq 0 on same script, you will see much higher
> utilization.
>
> Sample output of interactive encoding session while already running 16
> sessions and nvidia-smi dmon output:
> {{{
> ./ffmpeg-git -hwaccel cuvid -c:v h264_cuvid -re -i
> bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 -c:v
> h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 -acodec copy -f
> mpegts -y /dev/null
> ffmpeg version N-92462-g529debc987 Copyright (c) 2000-2018 the FFmpeg
> developers
> built with gcc 8 (Debian 8.2.0-9)
> configuration: --enable-runtime-cpudetect --disable-decoder=amrnb
> --disable-decoder=libopenjpeg --disable-mips32r2 --disable-mips32r6
> --disable-mips64r6 --disable-mipsdsp --disable-mipsdspr2 --disable-
> mipsfpu --disable-msa --disable-libopencv --disable-podpages --disable-
> sndio --disable-debug --enable-libaom --enable-avfilter --enable-gcrypt
> --enable-gnutls --enable-gpl --enable-libass --enable-libbluray --enable-
> libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-
> libfdk-aac --enable-libfontconfig --enable-libfreetype --enable-
> libfribidi --enable-libgme --enable-libgsm --enable-libilbc --enable-
> libkvazaar --enable-libmp3lame --enable-libopencore-amrnb --enable-
> libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-
> libopenmpt --enable-libopus --enable-libpulse --enable-librubberband
> --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex
> --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-
> libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libx265
> --enable-libxvid --enable-libzvbi --enable-libnpp --enable-cuda-sdk
> --enable-nonfree --enable-opencl --enable-opengl --enable-postproc
> --enable-pthreads --enable-static --disable-shared --enable-version3
> --enable-libwebp --incdir=/usr/include/x86_64-linux-gnu
> --libdir=/usr/lib/x86_64-linux-gnu --prefix=/usr --toolchain=hardened
> --enable-frei0r --enable-chromaprint --enable-libx264 --enable-
> libiec61883 --enable-libdc1394 --enable-vaapi --enable-libmfx --disable-
> altivec --shlibdir=/usr/lib/x86_64-linux-gnu
> libavutil 56. 23.101 / 56. 23.101
> libavcodec 58. 39.100 / 58. 39.100
> libavformat 58. 22.100 / 58. 22.100
> libavdevice 58. 6.100 / 58. 6.100
> libavfilter 7. 44.100 / 7. 44.100
> libswscale 5. 4.100 / 5. 4.100
> libswresample 3. 4.100 / 3. 4.100
> libpostproc 55. 4.100 / 55. 4.100
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
> 'bbb_sunflower_1080p_30fps_normal.mp4':
> Metadata:
> major_brand : isom
> minor_version : 1
> compatible_brands: isomavc1
> creation_time : 2013-12-16T17:44:39.000000Z
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen
> 2013
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> genre : Animation
> composer : Sacha Goedegebure
> Duration: 00:10:34.53, start: 0.000000, bitrate: 3481 kb/s
> Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p,
> 1920x1080 [SAR 1:1 DAR 16:9], 2998 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc
> (default)
> Metadata:
> creation_time : 2013-12-16T17:44:39.000000Z
> handler_name : GPAC ISO Video Handler
> Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz, stereo,
> fltp, 160 kb/s (default)
> Metadata:
> creation_time : 2013-12-16T17:44:42.000000Z
> handler_name : GPAC ISO Audio Handler
> Stream #0:2(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz,
> 5.1(side), fltp, 320 kb/s (default)
> Metadata:
> creation_time : 2013-12-16T17:44:42.000000Z
> handler_name : GPAC ISO Audio Handler
> Side data:
> audio service type: main
> Stream mapping:
> Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))
> Stream #0:2 -> #0:1 (copy)
> Press [q] to stop, [?] for help
> Output #0, mpegts, to '/dev/null':
> Metadata:
> major_brand : isom
> minor_version : 1
> compatible_brands: isomavc1
> composer : Sacha Goedegebure
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen
> 2013
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> genre : Animation
> encoder : Lavf58.22.100
> Stream #0:0(und): Video: h264 (h264_nvenc) (Main), cuda, 1280x720
> [SAR 1:1 DAR 16:9], q=-1--1, 2000 kb/s, 30 fps, 90k tbn, 30 tbc (default)
> Metadata:
> creation_time : 2013-12-16T17:44:39.000000Z
> handler_name : GPAC ISO Video Handler
> encoder : Lavc58.39.100 h264_nvenc
> Side data:
> cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000
> vbv_delay: -1
> Stream #0:1(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz,
> 5.1(side), fltp, 320 kb/s (default)
> Metadata:
> creation_time : 2013-12-16T17:44:42.000000Z
> handler_name : GPAC ISO Audio Handler
> Side data:
> audio service type: main
> frame= 1372 fps= 14 q=21.0 Lsize= 14369kB time=00:00:45.73
> bitrate=2573.8kbits/s speed=0.483x
> video:11382kB audio:1781kB subtitle:0kB other streams:0kB global
> headers:0kB muxing overhead: 9.155905%
>
> nvidia-smi dmon
> # gpu pwr gtemp mtemp sm mem enc dec mclk pclk
> # Idx W C C % % % % MHz MHz
> 0 56 49 - 11 4 30 37 6800 1560
> 0 54 49 - 10 4 30 39 6800 1560
> 0 54 49 - 10 4 32 43 6800 1590
> 0 55 49 - 10 4 31 40 6800 1515
> 0 57 49 - 10 4 31 40 6800 1635
> }}}
> getting just near 15 fps instead of 30.
>
> If you check with ffmpeg-4.0.3 (that doesn't have the above mentioned
> commits) you will also see correct utilization even when using temporal-
> aq 1.
> If you use -hwaccel nvdec or don't use -hwaccel at all (software decoding
> and scaling) the problem also doesn't happen.
> Finally, if you use nvidia-cuda-mps to handle the encodes, the problem
> also doesn't show.
>
> Finally, a sample output of running with ffmpeg-4.0.3 interactively while
> already running 16 sessions AND nvidia-smi dmon output
>
> {{{
> ./ffmpeg-4.0.3 -hwaccel cuvid -c:v h264_cuvid -re -i
> bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 -c:v
> h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 -acodec copy -f
> mpegts -y /dev/null
> ffmpeg version 4.0.3 Copyright (c) 2000-2018 the FFmpeg developers
> built with gcc 8 (Debian 8.2.0-9)
> configuration: --enable-runtime-cpudetect --disable-decoder=amrnb
> --disable-decoder=libopenjpeg --disable-mips32r2 --disable-mips32r6
> --disable-mips64r6 --disable-mipsdsp --disable-mipsdspr2 --disable-
> mipsfpu --disable-msa --disable-libopencv --disable-podpages --disable-
> sndio --disable-stripping --enable-libaom --enable-avfilter --enable-
> gcrypt --enable-gnutls --enable-gpl --enable-libass --enable-libbluray
> --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2
> --enable-libfdk-aac --enable-libfontconfig --enable-libfreetype --enable-
> libfribidi --enable-libgme --enable-libgsm --enable-libilbc --enable-
> libkvazaar --enable-libmp3lame --enable-libopencore-amrnb --enable-
> libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-
> libopenmpt --enable-libopus --enable-libpulse --enable-librubberband
> --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex
> --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-
> libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libx265
> --enable-libxvid --enable-libzvbi --enable-libnpp --enable-cuda-sdk
> --enable-nonfree --enable-opencl --enable-opengl --enable-postproc
> --enable-pthreads --enable-static --disable-shared --enable-version3
> --enable-libwebp --incdir=/usr/include/x86_64-linux-gnu
> --libdir=/usr/lib/x86_64-linux-gnu --prefix=/usr --toolchain=hardened
> --enable-frei0r --enable-chromaprint --enable-libx264 --enable-
> libiec61883 --enable-libdc1394 --enable-vaapi --enable-libmfx --disable-
> altivec --shlibdir=/usr/lib/x86_64-linux-gnu
> libavutil 56. 14.100 / 56. 14.100
> libavcodec 58. 18.100 / 58. 18.100
> libavformat 58. 12.100 / 58. 12.100
> libavdevice 58. 3.100 / 58. 3.100
> libavfilter 7. 16.100 / 7. 16.100
> libswscale 5. 1.100 / 5. 1.100
> libswresample 3. 1.100 / 3. 1.100
> libpostproc 55. 1.100 / 55. 1.100
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
> 'bbb_sunflower_1080p_30fps_normal.mp4':
> Metadata:
> major_brand : isom
> minor_version : 1
> compatible_brands: isomavc1
> creation_time : 2013-12-16T17:44:39.000000Z
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen
> 2013
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> genre : Animation
> composer : Sacha Goedegebure
> Duration: 00:10:34.53, start: 0.000000, bitrate: 3481 kb/s
> Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p,
> 1920x1080 [SAR 1:1 DAR 16:9], 2998 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc
> (default)
> Metadata:
> creation_time : 2013-12-16T17:44:39.000000Z
> handler_name : GPAC ISO Video Handler
> Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz, stereo,
> fltp, 160 kb/s (default)
> Metadata:
> creation_time : 2013-12-16T17:44:42.000000Z
> handler_name : GPAC ISO Audio Handler
> Stream #0:2(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz,
> 5.1(side), fltp, 320 kb/s (default)
> Metadata:
> creation_time : 2013-12-16T17:44:42.000000Z
> handler_name : GPAC ISO Audio Handler
> Side data:
> audio service type: main
> Stream mapping:
> Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))
> Stream #0:2 -> #0:1 (copy)
> Press [q] to stop, [?] for help
> Output #0, mpegts, to '/dev/null':
> Metadata:
> major_brand : isom
> minor_version : 1
> compatible_brands: isomavc1
> composer : Sacha Goedegebure
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen
> 2013
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> genre : Animation
> encoder : Lavf58.12.100
> Stream #0:0(und): Video: h264 (h264_nvenc) (Main), cuda, 1280x720
> [SAR 1:1 DAR 16:9], q=-1--1, 2000 kb/s, 30 fps, 90k tbn, 30 tbc (default)
> Metadata:
> creation_time : 2013-12-16T17:44:39.000000Z
> handler_name : GPAC ISO Video Handler
> encoder : Lavc58.18.100 h264_nvenc
> Side data:
> cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000
> vbv_delay: -1
> Stream #0:1(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz,
> 5.1(side), fltp, 320 kb/s (default)
> Metadata:
> creation_time : 2013-12-16T17:44:42.000000Z
> handler_name : GPAC ISO Audio Handler
> Side data:
> audio service type: main
> frame= 1106 fps= 30 q=26.0 Lsize= 11893kB time=00:00:36.92
> bitrate=2638.3kbits/s speed=0.999x
> video:9454kB audio:1444kB subtitle:0kB other streams:0kB global
> headers:0kB muxing overhead: 9.133109%
>
> # gpu pwr gtemp mtemp sm mem enc dec mclk pclk
> # Idx W C C % % % % MHz MHz
> 0 90 55 - 21 9 52 73 6800 1950
> 0 90 55 - 20 8 53 73 6800 1950
> 0 91 55 - 20 9 52 78 6800 1950
> 0 90 55 - 20 8 52 75 6800 1950
> 0 88 55 - 20 8 51 75 6800 1950
> 0 87 55 - 21 8 53 71 6800 1950
> 0 88 55 - 20 8 49 75 6800 1950
> 0 85 55 - 21 8 53 74 6800 1950
> 0 87 55 - 20 8 49 74 6800 1950
> 0 85 55 - 21 9 54 74 6800 1950
> 0 87 55 - 20 8 49 75 6800 1950
> 0 84 55 - 21 9 54 70 6800 1950
> }}}
New description:
Running multiple hwaccel cuvid/nvenc sessions that use temporal-aq or
spatial-aq AND 3 or more reference frames results in a performance
degradation since the following commits:
9b82e333b7c4235a3de7ce8d8fe115c53c11f50c
93d1756af2908150f7c8c0590b9ed246951d474a
Those commits enabled the use of cuMemcpy2DAsync instead of cuMemcpy2D.
With this change, aq enabled, and 3 or more reference frames, throughput
seems to drop to around 50% of the nvenc capacity. It may be a driver
problem, but it still makes ffmpeg problematic for scenarios with multiple
realtime encodes. With -hwaccel nvdec this doesn't happen, but since
-hwaccel nvdec uses much more VRAM, I cannot run the same number of
concurrent sessions.
To reproduce, I use the following file as input:
https://download.blender.org/demo/movies/BBB/bbb_sunflower_1080p_30fps_normal.mp4
I run it with the following bash script:
{{{
#!/bin/bash
# Start 16 concurrent cuvid -> nvenc transcodes and wait for all of them.
for i in $(seq 1 16); do
    ./ffmpeg-git -nostdin -loglevel error -hwaccel cuvid -c:v h264_cuvid -re \
        -i bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 \
        -c:v h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 \
        -acodec copy -f mpegts -y /dev/null &
done
wait
echo done
}}}
Checking with nvidia-smi, you will see very low utilization, and if you
run one more session interactively you will see that it cannot sustain
30 fps even though nvenc utilization is very low. If you set
-temporal-aq 0 in the same script, you will see much higher utilization.
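The on/off comparison described above could be scripted roughly as follows. This is only a sketch: the FFMPEG, INPUT and DRY_RUN variables and the function names one_session and run_sessions are illustrative and not part of the original report; the ffmpeg options are the ones from the script above.

```shell
#!/bin/bash
# Sketch: run N concurrent transcodes with temporal AQ switched on or off.
# FFMPEG, INPUT, DRY_RUN and the function names are illustrative assumptions.
FFMPEG=${FFMPEG:-./ffmpeg-git}
INPUT=${INPUT:-bbb_sunflower_1080p_30fps_normal.mp4}

# Run one transcode with the given -temporal-aq value (0 or 1).
# With DRY_RUN=1 the command line is printed instead of executed.
one_session() {
    set -- "$FFMPEG" -nostdin -loglevel error \
        -hwaccel cuvid -c:v h264_cuvid -re -i "$INPUT" \
        -vf scale_npp=w=1280:h=720 \
        -c:v h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq "$1" \
        -acodec copy -f mpegts -y /dev/null
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start <count> sessions in the background and wait for all of them.
run_sessions() {
    count=$1 taq=$2 i=1
    while [ "$i" -le "$count" ]; do
        one_session "$taq" &
        i=$((i + 1))
    done
    wait
}

# When called with two arguments, e.g. "16 1", start the sessions right away.
if [ "$#" -ge 2 ]; then
    run_sessions "$1" "$2"
fi
```

Saved under a name of your choice, it would be called with "16 1" for the failing case and "16 0" for the comparison, watching nvidia-smi dmon in a second terminal each time. Setting DRY_RUN=1 prints the command lines without touching the GPU.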
Sample output of an interactive encoding session while 16 sessions are
already running, followed by nvidia-smi dmon output:
{{{
./ffmpeg-git -hwaccel cuvid -c:v h264_cuvid -re -i
bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 -c:v
h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 -acodec copy -f
mpegts -y /dev/null
ffmpeg version N-92462-g529debc987 Copyright (c) 2000-2018 the FFmpeg
developers
built with gcc 8 (Debian 8.2.0-9)
configuration: --enable-runtime-cpudetect --disable-decoder=amrnb
--disable-decoder=libopenjpeg --disable-mips32r2 --disable-mips32r6
--disable-mips64r6 --disable-mipsdsp --disable-mipsdspr2 --disable-mipsfpu
--disable-msa --disable-libopencv --disable-podpages --disable-sndio
--disable-debug --enable-libaom --enable-avfilter --enable-gcrypt
--enable-gnutls --enable-gpl --enable-libass --enable-libbluray --enable-
libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-
libfdk-aac --enable-libfontconfig --enable-libfreetype --enable-libfribidi
--enable-libgme --enable-libgsm --enable-libilbc --enable-libkvazaar
--enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb
--enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-
libopus --enable-libpulse --enable-librubberband --enable-libshine
--enable-libsnappy --enable-libsoxr --enable-libspeex --enable-
libtesseract --enable-libtheora --enable-libvidstab --enable-libvo-
amrwbenc --enable-libvorbis --enable-libvpx --enable-libx265 --enable-
libxvid --enable-libzvbi --enable-libnpp --enable-cuda-sdk --enable-
nonfree --enable-opencl --enable-opengl --enable-postproc --enable-
pthreads --enable-static --disable-shared --enable-version3 --enable-
libwebp --incdir=/usr/include/x86_64-linux-gnu --libdir=/usr/lib/x86_64
-linux-gnu --prefix=/usr --toolchain=hardened --enable-frei0r --enable-
chromaprint --enable-libx264 --enable-libiec61883 --enable-libdc1394
--enable-vaapi --enable-libmfx --disable-altivec
--shlibdir=/usr/lib/x86_64-linux-gnu
libavutil 56. 23.101 / 56. 23.101
libavcodec 58. 39.100 / 58. 39.100
libavformat 58. 22.100 / 58. 22.100
libavdevice 58. 6.100 / 58. 6.100
libavfilter 7. 44.100 / 7. 44.100
libswscale 5. 4.100 / 5. 4.100
libswresample 3. 4.100 / 3. 4.100
libpostproc 55. 4.100 / 55. 4.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'bbb_sunflower_1080p_30fps_normal.mp4':
Metadata:
major_brand : isom
minor_version : 1
compatible_brands: isomavc1
creation_time : 2013-12-16T17:44:39.000000Z
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
composer : Sacha Goedegebure
Duration: 00:10:34.53, start: 0.000000, bitrate: 3481 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p,
1920x1080 [SAR 1:1 DAR 16:9], 2998 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc
(default)
Metadata:
creation_time : 2013-12-16T17:44:39.000000Z
handler_name : GPAC ISO Video Handler
Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz, stereo,
fltp, 160 kb/s (default)
Metadata:
creation_time : 2013-12-16T17:44:42.000000Z
handler_name : GPAC ISO Audio Handler
Stream #0:2(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side),
fltp, 320 kb/s (default)
Metadata:
creation_time : 2013-12-16T17:44:42.000000Z
handler_name : GPAC ISO Audio Handler
Side data:
audio service type: main
Stream mapping:
Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))
Stream #0:2 -> #0:1 (copy)
Press [q] to stop, [?] for help
Output #0, mpegts, to '/dev/null':
Metadata:
major_brand : isom
minor_version : 1
compatible_brands: isomavc1
composer : Sacha Goedegebure
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
encoder : Lavf58.22.100
Stream #0:0(und): Video: h264 (h264_nvenc) (Main), cuda, 1280x720 [SAR
1:1 DAR 16:9], q=-1--1, 2000 kb/s, 30 fps, 90k tbn, 30 tbc (default)
Metadata:
creation_time : 2013-12-16T17:44:39.000000Z
handler_name : GPAC ISO Video Handler
encoder : Lavc58.39.100 h264_nvenc
Side data:
cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000
vbv_delay: -1
Stream #0:1(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side),
fltp, 320 kb/s (default)
Metadata:
creation_time : 2013-12-16T17:44:42.000000Z
handler_name : GPAC ISO Audio Handler
Side data:
audio service type: main
frame= 1372 fps= 14 q=21.0 Lsize= 14369kB time=00:00:45.73
bitrate=2573.8kbits/s speed=0.483x
video:11382kB audio:1781kB subtitle:0kB other streams:0kB global
headers:0kB muxing overhead: 9.155905%
nvidia-smi dmon
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 56 49 - 11 4 30 37 6800 1560
0 54 49 - 10 4 30 39 6800 1560
0 54 49 - 10 4 32 43 6800 1590
0 55 49 - 10 4 31 40 6800 1515
0 57 49 - 10 4 31 40 6800 1635
}}}
The interactive session reaches only about 15 fps instead of 30.
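To check this figure without reading the log by eye, the fps and speed values can be pulled out of the final ffmpeg status line. The helper names stat_field and keeps_realtime below are made up for this sketch, and the 30 fps default target is simply the frame rate of the test clip.

```shell
#!/bin/bash
# Sketch: read fps/speed from an ffmpeg status line. Helper names are illustrative.

# Print the numeric value that follows "<key>=" in a status line.
# ffmpeg pads some values with blanks ("fps= 14"), so blanks after "=" are skipped.
# The key must be preceded by whitespace, so this does not match "frame" at line start.
stat_field() {
    printf '%s\n' "$2" | sed -n "s/.*[[:space:]]$1= *\([0-9.][0-9.]*\).*/\1/p"
}

# Succeed if the measured fps reaches the target (default 30, the clip's frame rate).
keeps_realtime() {
    awk -v fps="$1" -v target="${2:-30}" 'BEGIN { exit !(fps + 0 >= target + 0) }'
}
```

For example, stat_field fps "$line" followed by keeps_realtime on the result gives a pass/fail answer per session, which is easier to collect across 16 background encodes than reading each log.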
If you check with ffmpeg-4.0.3 (which doesn't have the above-mentioned
commits), you will see correct utilization even when using -temporal-aq 1.
If you use -hwaccel nvdec or don't use -hwaccel at all (software decoding
and scaling), the problem doesn't happen either.
Also, if you use nvidia-cuda-mps to handle the encodes, the problem
doesn't show.
Finally, sample output of running ffmpeg-4.0.3 interactively while 16
sessions are already running, followed by nvidia-smi dmon output:
{{{
./ffmpeg-4.0.3 -hwaccel cuvid -c:v h264_cuvid -re -i
bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 -c:v
h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 -acodec copy -f
mpegts -y /dev/null
ffmpeg version 4.0.3 Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 8 (Debian 8.2.0-9)
configuration: --enable-runtime-cpudetect --disable-decoder=amrnb
--disable-decoder=libopenjpeg --disable-mips32r2 --disable-mips32r6
--disable-mips64r6 --disable-mipsdsp --disable-mipsdspr2 --disable-mipsfpu
--disable-msa --disable-libopencv --disable-podpages --disable-sndio
--disable-stripping --enable-libaom --enable-avfilter --enable-gcrypt
--enable-gnutls --enable-gpl --enable-libass --enable-libbluray --enable-
libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-
libfdk-aac --enable-libfontconfig --enable-libfreetype --enable-libfribidi
--enable-libgme --enable-libgsm --enable-libilbc --enable-libkvazaar
--enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb
--enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-
libopus --enable-libpulse --enable-librubberband --enable-libshine
--enable-libsnappy --enable-libsoxr --enable-libspeex --enable-
libtesseract --enable-libtheora --enable-libvidstab --enable-libvo-
amrwbenc --enable-libvorbis --enable-libvpx --enable-libx265 --enable-
libxvid --enable-libzvbi --enable-libnpp --enable-cuda-sdk --enable-
nonfree --enable-opencl --enable-opengl --enable-postproc --enable-
pthreads --enable-static --disable-shared --enable-version3 --enable-
libwebp --incdir=/usr/include/x86_64-linux-gnu --libdir=/usr/lib/x86_64
-linux-gnu --prefix=/usr --toolchain=hardened --enable-frei0r --enable-
chromaprint --enable-libx264 --enable-libiec61883 --enable-libdc1394
--enable-vaapi --enable-libmfx --disable-altivec
--shlibdir=/usr/lib/x86_64-linux-gnu
libavutil 56. 14.100 / 56. 14.100
libavcodec 58. 18.100 / 58. 18.100
libavformat 58. 12.100 / 58. 12.100
libavdevice 58. 3.100 / 58. 3.100
libavfilter 7. 16.100 / 7. 16.100
libswscale 5. 1.100 / 5. 1.100
libswresample 3. 1.100 / 3. 1.100
libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'bbb_sunflower_1080p_30fps_normal.mp4':
Metadata:
major_brand : isom
minor_version : 1
compatible_brands: isomavc1
creation_time : 2013-12-16T17:44:39.000000Z
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
composer : Sacha Goedegebure
Duration: 00:10:34.53, start: 0.000000, bitrate: 3481 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p,
1920x1080 [SAR 1:1 DAR 16:9], 2998 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc
(default)
Metadata:
creation_time : 2013-12-16T17:44:39.000000Z
handler_name : GPAC ISO Video Handler
Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz, stereo,
fltp, 160 kb/s (default)
Metadata:
creation_time : 2013-12-16T17:44:42.000000Z
handler_name : GPAC ISO Audio Handler
Stream #0:2(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side),
fltp, 320 kb/s (default)
Metadata:
creation_time : 2013-12-16T17:44:42.000000Z
handler_name : GPAC ISO Audio Handler
Side data:
audio service type: main
Stream mapping:
Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))
Stream #0:2 -> #0:1 (copy)
Press [q] to stop, [?] for help
Output #0, mpegts, to '/dev/null':
Metadata:
major_brand : isom
minor_version : 1
compatible_brands: isomavc1
composer : Sacha Goedegebure
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
encoder : Lavf58.12.100
Stream #0:0(und): Video: h264 (h264_nvenc) (Main), cuda, 1280x720 [SAR
1:1 DAR 16:9], q=-1--1, 2000 kb/s, 30 fps, 90k tbn, 30 tbc (default)
Metadata:
creation_time : 2013-12-16T17:44:39.000000Z
handler_name : GPAC ISO Video Handler
encoder : Lavc58.18.100 h264_nvenc
Side data:
cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000
vbv_delay: -1
Stream #0:1(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side),
fltp, 320 kb/s (default)
Metadata:
creation_time : 2013-12-16T17:44:42.000000Z
handler_name : GPAC ISO Audio Handler
Side data:
audio service type: main
frame= 1106 fps= 30 q=26.0 Lsize= 11893kB time=00:00:36.92
bitrate=2638.3kbits/s speed=0.999x
video:9454kB audio:1444kB subtitle:0kB other streams:0kB global
headers:0kB muxing overhead: 9.133109%
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 90 55 - 21 9 52 73 6800 1950
0 90 55 - 20 8 53 73 6800 1950
0 91 55 - 20 9 52 78 6800 1950
0 90 55 - 20 8 52 75 6800 1950
0 88 55 - 20 8 51 75 6800 1950
0 87 55 - 21 8 53 71 6800 1950
0 88 55 - 20 8 49 75 6800 1950
0 85 55 - 21 8 53 74 6800 1950
0 87 55 - 20 8 49 74 6800 1950
0 85 55 - 21 9 54 74 6800 1950
0 87 55 - 20 8 49 75 6800 1950
0 84 55 - 21 9 54 70 6800 1950
}}}
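The two nvidia-smi dmon captures can be compared by averaging their enc and dec columns. The sketch below assumes the column layout shown in the headers above (enc is field 7, dec is field 8); other dmon options or driver versions may order the columns differently. The function name dmon_avg is made up for this example.

```shell
#!/bin/bash
# Sketch: average the enc and dec columns of `nvidia-smi dmon` output on stdin.
# Field numbers 7 (enc) and 8 (dec) are taken from the header shown in the ticket.
dmon_avg() {
    awk '!/^#/ && NF >= 8 { enc += $7; dec += $8; n++ }
         END { if (n) printf "enc=%.1f dec=%.1f\n", enc / n, dec / n }'
}
```

Fed with the samples above, this puts average encoder utilization at roughly 31% for the git build against roughly 52% for ffmpeg-4.0.3.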
--
Ticket URL: <https://trac.ffmpeg.org/ticket/7582#comment:1>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker