[FFmpeg-trac] #7563(undetermined:new): hwaccel nvdec consumes much more VRAM when compared to hwaccel cuvid
FFmpeg
trac at avcodec.org
Fri Nov 23 22:19:10 EET 2018
#7563: hwaccel nvdec consumes much more VRAM when compared to hwaccel cuvid
-------------------------------------+-------------------------------------
Reporter: malakudi | Type: defect
Status: new | Priority: normal
Component: undetermined | Version: git-master
Keywords: | Blocked By:
Blocking: | Reproduced by developer: 0
Analyzed by developer: 0 |
-------------------------------------+-------------------------------------
Using hwaccel nvdec makes ffmpeg consume much more VRAM than hwaccel
cuvid, which reduces the number of transcode sessions that can run
concurrently on a given GPU. Tests were run on a Quadro P2000, which has
5120 MB of VRAM.
Here are some examples.
The following command allocates 193 MB of VRAM:
{{{
./ffmpeg-git -hwaccel nvdec -hwaccel_output_format cuda -f mpegts \
  -i input_hdready_progressive_ntsc.ts -vcodec h264_nvenc -refs 4 -bf 2 \
  -c:a copy -f mpegts -y /dev/null
}}}
while a similar command using hwaccel cuvid allocates 155 MB:
{{{
./ffmpeg-git -hwaccel cuvid -c:v h264_cuvid -f mpegts \
  -i input_hdready_progressive_ntsc.ts -vcodec h264_nvenc -refs 4 -bf 2 \
  -c:a copy -f mpegts -y /dev/null
}}}
and cuvid with the number of surfaces limited to 10 (enough for this
input) allocates 125 MB:
{{{
./ffmpeg-git -hwaccel cuvid -c:v h264_cuvid -surfaces 10 -f mpegts \
  -i input_hdready_progressive_ntsc.ts -vcodec h264_nvenc -refs 4 -bf 2 \
  -c:a copy -f mpegts -y /dev/null
}}}
VRAM allocation can be observed with nvidia-smi.
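One way to repeat these measurements is to sample nvidia-smi while
ffmpeg runs and keep the largest value reported for the ffmpeg process.
This is a minimal sketch, assuming bash on Linux with nvidia-smi on
PATH; the helper names `peak_vram_for_pid` and `measure_peak_vram` are
hypothetical, nvidia-smi reports MiB, and sampling once per second may
miss short-lived peaks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read nvidia-smi CSV lines of the form
# "<pid>, <used_memory>" from stdin and print the largest used_memory
# value seen for the given PID (0 if the PID never appears).
peak_vram_for_pid() {
  awk -F', *' -v pid="$1" \
    '$1 == pid && $2 + 0 > max { max = $2 + 0 } END { print max + 0 }'
}

# Hypothetical wrapper: run the given ffmpeg command in the background,
# log per-process GPU memory once per second, print the peak for ffmpeg.
# Usage: measure_peak_vram ./ffmpeg-git -hwaccel nvdec ... -y /dev/null
measure_peak_vram() {
  local log smipid ffpid
  log=$(mktemp)
  # -l 1 makes nvidia-smi repeat the query every second until killed
  nvidia-smi --query-compute-apps=pid,used_memory \
    --format=csv,noheader,nounits -l 1 >"$log" &
  smipid=$!
  # stdin from /dev/null so a backgrounded ffmpeg does not wait on the tty
  "$@" </dev/null >/dev/null 2>&1 &
  ffpid=$!
  wait "$ffpid"
  kill "$smipid"
  wait "$smipid" 2>/dev/null
  peak_vram_for_pid "$ffpid" <"$log"
  rm -f "$log"
}
```

Running each of the commands above through `measure_peak_vram` gives
one peak figure per command, which can then be compared directly.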
The difference grows with input resolution: for a 1920x1080i50 input,
cuvid allocates 190 MB and nvdec 278 MB, a difference of 278-190 = 88 MB
(versus 193-125 = 68 MB for the NTSC input above). And if scale_npp is
added to the command line, as in the following commands:
{{{
./ffmpeg-git -hwaccel cuvid -c:v h264_cuvid -surfaces 10 -f mpegts \
  -i input_1080i50.ts -vf scale_npp=w=iw/2 -vcodec h264_nvenc -refs 4 -bf 2 \
  -c:a copy -f mpegts -y /dev/null

./ffmpeg-git -hwaccel nvdec -hwaccel_output_format cuda -f mpegts \
  -i input_1080i50.ts -vf scale_npp=w=iw/2 -vcodec h264_nvenc -refs 4 -bf 2 \
  -c:a copy -f mpegts -y /dev/null
}}}
then the difference grows to 128 MB: 295 MB for nvdec versus 167 MB for
cuvid (295-167 = 128 MB).
This makes nvdec unusable if you want to utilise 100% of the hardware
with multiple concurrent transcodes, because fewer sessions fit into the
available VRAM.
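To express the figures above as session counts, a naive upper bound is
floor(5120 / per-session VRAM). This sketch assumes VRAM is the only
limit (it ignores driver and context overhead, other processes, and
NVDEC/NVENC throughput); `max_sessions` is a hypothetical helper and the
per-session values are simply the ones reported in this ticket.

```shell
#!/usr/bin/env bash
# Naive capacity estimate: how many identical transcodes fit in VRAM.
# Assumption: VRAM is the only limiting factor.
total_mb=5120  # Quadro P2000, as stated in this report

max_sessions() {
  # Shell integer division truncates, i.e. floor(total / per_session)
  echo $(( total_mb / $1 ))
}

# label|per-session MB, taken from the measurements in this ticket
while IFS='|' read -r label mb; do
  printf '%-32s %4s MB -> %2s sessions\n' "$label" "$mb" "$(max_sessions "$mb")"
done <<'EOF'
NTSC, nvdec|193
NTSC, cuvid|155
NTSC, cuvid -surfaces 10|125
1080i50, nvdec|278
1080i50, cuvid|190
1080i50 + scale_npp, nvdec|295
1080i50 + scale_npp, cuvid|167
EOF
```

For example, 193 MB per session gives at most 26 sessions, while 125 MB
gives at most 40.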
--
Ticket URL: <https://trac.ffmpeg.org/ticket/7563>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker