[FFmpeg-trac] #9285(avcodec:new): Excessive GPU memory usage with nvdec hwaccel
FFmpeg
trac at avcodec.org
Wed Jun 9 03:42:45 EEST 2021
#9285: Excessive GPU memory usage with nvdec hwaccel
-------------------------------------+-------------------------------------
Reporter: Ridley Combs               | Type: defect
Status: new                          | Priority: normal
Component: avcodec                   | Version: unspecified
Keywords: nvdec nvidia               | Blocked By:
Blocking:                            | Reproduced by developer: 1
Analyzed by developer: 1             |
-------------------------------------+-------------------------------------
When decoding video using the CUDA hwaccel, `ff_nvdec_decode_init()` sets
both `ulNumDecodeSurfaces` and `ulNumOutputSurfaces` to
`frames_ctx->initial_pool_size`. That value starts out as `dpb_size + 2`
(set by `ff_nvdec_frame_params()`), then has 3 added by
`ff_decode_get_hw_frames_ctx()`, and `extra_hw_frames` + `thread_count`
added by `avcodec_get_hw_frames_parameters()`.
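To make the accumulation concrete, here is a minimal sketch of the arithmetic described above. The helper name is hypothetical and is not an FFmpeg function; it assumes frame threading is active (so `thread_count` is added) and that `extra_hw_frames` is non-negative.

```c
#include <assert.h>

/* Hypothetical helper, not part of FFmpeg: mirrors the additions listed
 * above to show how the shared surface count accumulates. */
static int pool_size_as_described(int dpb_size, int extra_hw_frames,
                                  int thread_count)
{
    assert(dpb_size >= 0 && extra_hw_frames >= 0 && thread_count >= 1);

    int pool = dpb_size + 2;   /* base value set by the nvdec code */
    pool += 3;                 /* added by ff_decode_get_hw_frames_ctx() */
    pool += extra_hw_frames;   /* added by avcodec_get_hw_frames_parameters() */
    pool += thread_count;      /* also added there (assumes frame threading) */
    return pool;               /* used for BOTH decode and output surfaces */
}
```

For example, with a `dpb_size` of 16, no extra frames, and 32 threads, this yields 53, and that count is applied to the decode surfaces and the output surfaces alike.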
This is excessive. Only `ulNumDecodeSurfaces` needs additional frames
based on thread count (the output surfaces are only used in
`nvdec_retrieve_data`, which runs on the consumer's single thread), while
only `ulNumOutputSurfaces` needs the 3 additional output frames from
`ff_decode_get_hw_frames_ctx()` or the ones from `extra_hw_frames` (the
decode surfaces are never exposed to the consumer).
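One possible reading of that split, as a rough sketch only: the struct and function below are hypothetical, and keeping the `dpb_size + 2` base for both counts is an assumption of this sketch, since the ticket does not say which side needs it.

```c
#include <assert.h>

/* Hypothetical container for the two counts; not an FFmpeg type. */
struct surface_counts {
    int decode;  /* would feed ulNumDecodeSurfaces */
    int output;  /* would feed ulNumOutputSurfaces */
};

/* Hypothetical helper: applies each extra only where the text above
 * argues it is needed. */
static struct surface_counts split_surface_counts(int dpb_size,
                                                  int extra_hw_frames,
                                                  int thread_count)
{
    assert(dpb_size >= 0 && extra_hw_frames >= 0 && thread_count >= 1);

    struct surface_counts c;
    int base = dpb_size + 2;              /* assumption: kept for both */

    c.decode = base + thread_count;       /* threads only affect decoding */
    c.output = base + 3 + extra_hw_frames; /* consumer-facing extras only */
    return c;
}
```

Under these assumptions, a `dpb_size` of 16 with 32 threads and no extra frames would give 50 decode surfaces and 21 output surfaces, rather than 53 of each.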
I'm not sure what the best way to handle this is. Maybe nvdec should
ignore what the generic code sets `initial_pool_size` to altogether and
instead calculate its buffer counts internally, duplicating the generic
code's behavior only where appropriate? The `initial_pool_size` value
seems to be designed for systems where the decoder's internal buffered
frames are returned directly to the user, but that's not the case here.
Additionally, it doesn't seem like multithreading in CUDA actually serves
any purpose; I see no performance gain when using multiple threads vs 1.
Is it useful with any hardware decoder? Should we be defaulting
multithreading off when using a hwaccel, or forcing it off unless the
hwaccel fails and software fallback occurs? This can result in some pretty
hefty memory usage for no reason by default on many-core machines.
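As a back-of-the-envelope illustration of the scale involved, the sketch below estimates the memory attributable to the per-thread surfaces alone. It assumes unpadded NV12 frames (1.5 bytes per pixel) and that each thread contributes one decode surface and one output surface; actual driver allocations will likely differ (alignment, pitch, internal buffers), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of one unpadded NV12 frame: a full-resolution luma plane
 * plus a half-size interleaved chroma plane (assumption: no alignment). */
static uint64_t nv12_frame_bytes(uint64_t width, uint64_t height)
{
    return width * height * 3 / 2;
}

/* Hypothetical estimate of the bytes added by thread_count when it is
 * applied to both the decode and the output surface counts. */
static uint64_t thread_surface_overhead(uint64_t width, uint64_t height,
                                        int thread_count)
{
    assert(thread_count >= 1);
    return 2 * (uint64_t)thread_count * nv12_frame_bytes(width, height);
}
```

With these assumptions, a 3840x2160 stream on a 64-thread machine would carry roughly 1.6 GB of surfaces that exist only because of the thread count.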
--
Ticket URL: <https://trac.ffmpeg.org/ticket/9285>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker