[FFmpeg-trac] #7563(undetermined:new): hwaccel nvdec consumes much more VRAM when compared to hwaccel cuvid

Fri Nov 23 22:19:10 EET 2018

#7563: hwaccel nvdec consumes much more VRAM when compared to hwaccel cuvid
-------------------------------------+-------------------------------------
             Reporter:  malakudi     |                     Type:  defect
               Status:  new          |                 Priority:  normal
            Component:  undetermined |                  Version:  git-master
             Keywords:               |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------
 Using hwaccel nvdec makes ffmpeg consume much more VRAM than hwaccel
 cuvid, which reduces the number of transcode sessions that can run
 concurrently on a given GPU. Tests were run on a Quadro P2000, which has
 5120 MB of VRAM.

 Here are some examples:

 The following command allocates 193 MB of VRAM:
 {{{
 ./ffmpeg-git -hwaccel nvdec -hwaccel_output_format cuda -f mpegts -i
 input_hdready_progressive_ntsc.ts -vcodec h264_nvenc -refs 4 -bf 2 -c:a
 copy -f mpegts -y /dev/null
 }}}

 while a similar command with hwaccel cuvid allocates 155 MB:
 {{{
 ./ffmpeg-git -hwaccel cuvid -c:v h264_cuvid -f mpegts -i
 input_hdready_progressive_ntsc.ts -vcodec h264_nvenc -refs 4 -bf 2 -c:a
 copy -f mpegts -y /dev/null
 }}}

 and cuvid with the surface count limited to 10 (which is enough for this
 input) allocates 125 MB:

 {{{
 ./ffmpeg-git -hwaccel cuvid -c:v h264_cuvid -surfaces 10 -f mpegts -i
 input_hdready_progressive_ntsc.ts -vcodec h264_nvenc -refs 4 -bf 2 -c:a
 copy -f mpegts -y /dev/null
 }}}

 VRAM allocation can be seen with nvidia-smi.
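
 To read the per-process figure without picking it out of the full
 nvidia-smi table, the CSV query mode can be used. This is a minimal
 sketch, not the exact method used for the numbers in this report: the
 helper name sum_ffmpeg_mib is hypothetical, and it assumes the process
 name contains "ffmpeg" and that the driver reports per-process memory.

```shell
# Sum the VRAM reported for every process whose row contains "ffmpeg".
# Expects "pid, process_name, used_memory" CSV rows on stdin
# (no header, no units); the last field is taken as the memory figure.
sum_ffmpeg_mib() {
    awk -F', ' '/ffmpeg/ { total += $NF } END { print total + 0 }'
}

# Usage on a machine with an NVIDIA driver, while the transcodes are running:
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory \
#              --format=csv,noheader,nounits | sum_ffmpeg_mib
```

 Summing over all matching processes gives the total for several
 concurrent sessions; with a single ffmpeg process it is just that
 process's allocation.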

 The differences grow with input resolution. For a 1920x1080i50 input,
 cuvid allocates 190 MB and nvdec 278 MB, a difference of 88 MB (278-190),
 compared with 68 MB (193-125) for the HD-ready input. If scale_npp is
 added to the command line, as in the following commands:

 {{{
 ./ffmpeg-git -hwaccel cuvid -c:v h264_cuvid -surfaces 10 -f mpegts -i
 input_1080i50.ts  -vf scale_npp=w=iw/2 -vcodec h264_nvenc -refs 4 -bf 2
 -c:a copy -f mpegts -y /dev/null

 ./ffmpeg-git -hwaccel nvdec -hwaccel_output_format cuda -f mpegts -i
 input_1080i50.ts  -vf scale_npp=w=iw/2 -vcodec h264_nvenc -refs 4 -bf 2
 -c:a copy -f mpegts -y /dev/null
 }}}

 then nvdec allocates 295 MB and cuvid 167 MB, a difference of 128 MB
 (295-167).

 This makes nvdec unusable if you want to fully utilise the hardware with
 multiple concurrent transcodes.
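
 To put the figures above in terms of concurrent sessions, here is a rough
 back-of-the-envelope sketch. It assumes every session allocates the same
 amount as the single-session measurement, that all 5120 MB are available
 to ffmpeg, and that VRAM is the only limit. None of these assumptions has
 been verified, and the helper name max_sessions is only illustrative.

```shell
# Upper bound on concurrent sessions: total VRAM / per-session VRAM,
# using integer division (a partial session does not count).
max_sessions() {
    echo $(( $1 / $2 ))
}

total=5120   # Quadro P2000 VRAM in MB, as stated in this report

# 1080i50 input with scale_npp, per-session figures from this report
echo "nvdec: $(max_sessions "$total" 295) sessions"   # prints "nvdec: 17 sessions"
echo "cuvid: $(max_sessions "$total" 167) sessions"   # prints "cuvid: 30 sessions"
```

 Under these assumptions, the 128 MB per-session difference roughly halves
 the number of sessions that fit in VRAM.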

--
Ticket URL: <https://trac.ffmpeg.org/ticket/7563>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker