[FFmpeg-devel] [PATCH] libavcodec/cuviddec.c: increase CUVID_DEFAULT_NUM_SURFACES

Sat Feb 22 04:52:55 EET 2025

On 2/21/25 08:26, Timo Rothenpieler wrote:
> On 20.02.2025 21:37, Scott Theisen wrote:
>> The default value of CuvidContext::nb_surfaces was reduced from 25 to 
>> 5 (as
>> (CUVID_MAX_DISPLAY_DELAY + 1)) in 
>> 402d98c9d467dff6931d906ebb732b9a00334e0b.
>>
>> In cuvid_is_buffer_full() delay can be 2 * CUVID_MAX_DISPLAY_DELAY 
>> with double
>> rate deinterlacing.  ctx->nb_surfaces is CUVID_DEFAULT_NUM_SURFACES =
>> (CUVID_MAX_DISPLAY_DELAY + 1) by default, in which case 
>> cuvid_is_buffer_full()
>> will always return true and cuvid_output_frame() will never read any 
>> data since
>> it will not call ff_decode_get_packet().
>
> It's been way too long since I looked at all that code, and I didn't 
> even write most of the code involved:
>> https://github.com/FFmpeg/FFmpeg/commit/bddb2343b6e594e312dadb5d21b408702929ae04 
>>
>> https://github.com/FFmpeg/FFmpeg/commit/402d98c9d467dff6931d906ebb732b9a00334e0b 
>>
>
> But doesn't this instead mean that the logic in cuvid_is_buffer_full 
> is flawed somehow?

I think it is the number of frames ready to send to the driver + the 
number of frames in queue in the driver >= the number of decoded frame 
buffers.  However, it doesn't actually know how many frames are in queue 
in the driver and assumes the maximum.

> Just increasing the default number of surfaces does not seem like the 
> correct fix or sensible, since it will increase VRAM usage by 
> potentially quite a bit for all users.
>

The changes to cuvid_handle_video_sequence() from 
402d98c9d467dff6931d906ebb732b9a00334e0b will increase nb_surfaces once 
data has been read.

> From looking at this a bit, the issue will only happen when 
> deinterlacing, the logic in cuvid_is_buffer_full becomes stuck then, 
> and will always claim the buffer is full.
> And from my understanding, it's correct in making that claim. Due to 
> the display delay, it could in theory happen that the moment cuvid 
> starts outputting frames, there will be more output available than 
> what fits into ctx->frame_queue, since it delayed by 4 frames, which 
> results in 8 surfaces, but the queue only fits 5.
>
> So to me it looks like that the correct fix would be to double the 
> size of the frame_queue when deinterlacing, not unconditionally.

There is nothing stopping deint_mode or drop_second_field from being 
changed after cuvid_decode_init() is called, so it doesn't necessarily 
know it will deinterlace.

Regardless, 402d98c9d467dff6931d906ebb732b9a00334e0b reduced 
CUVID_DEFAULT_NUM_SURFACES from 25 to *only 5* to not break playback 
entirely.  I don't think the intention was to break playback for double 
rate deinterlacing while allowing playback for only single rate 
deinterlacing.

Also, if AV_CODEC_FLAG_LOW_DELAY is set, then only one output surface is 
needed, but there are still 5.

>
> nb_surfaces is also used to determine the size of the key_frame array, 
> which would then also be pointlessly doubled. But not like a handful 
> of extra ints would hurt that much though.
> Alternatively the size-doubling could not be reflected in nb_surfaces, 
> but that would make the logic in various other places be more 
> complicated.
>
>> ---
>>
>> I think part of the problem might be that cuvid_is_buffer_full() does 
>> not know
>> how many frames are actually in the driver's queue and assumes it is the
>> maximum, even if none have yet been added.
>>
>> This was preventing any frames from being decoded using NVDEC with 
>> MythTV for
>> some streams.  See https://github.com/MythTV/mythtv/issues/1039
>
> I'd highly recommend to not use cuviddec anymore, but instead use nvdec.
> cuviddec only still exists to sanity-check nvdec against it at times.

.*_cuvid are FFCodecs while .*_nvdec are FFHWAccels.  I don't know what 
would be required to change to .*_nvdec and the .*_cuvid FFCodecs work 
fine with this change.

>
>> ---
>>   libavcodec/cuviddec.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
>> index 67076a1752..05dcafab6e 100644
>> --- a/libavcodec/cuviddec.c
>> +++ b/libavcodec/cuviddec.c
>> @@ -120,7 +120,7 @@ typedef struct CuvidParsedFrame
>>   #define CUVID_MAX_DISPLAY_DELAY (4)
>>     // Actual pool size will be determined by parser.
>> -#define CUVID_DEFAULT_NUM_SURFACES (CUVID_MAX_DISPLAY_DELAY + 1)
>> +#define CUVID_DEFAULT_NUM_SURFACES ((2 * CUVID_MAX_DISPLAY_DELAY) + 1)
>>     static int CUDAAPI cuvid_handle_video_sequence(void *opaque, 
>> CUVIDEOFORMAT* format)
>>   {