[FFmpeg-devel] [PATCH]: Change Stack Frame Limit in Cuda Context

Mark Thompson sw at jkqxz.net
Fri Jan 26 13:32:57 EET 2018


On 26/01/18 09:06, Ben Chang wrote:
> Thanks for the review Mark.
> 
> On Thu, Jan 25, 2018 at 4:13 PM, Mark Thompson <sw at jkqxz.net> wrote:
>>
>>> diff --git a/libavcodec/nvenc.c b/libavcodec/nvenc.c
>>> index 4a91d99..2da251b 100644
>>> --- a/libavcodec/nvenc.c
>>> +++ b/libavcodec/nvenc.c
>>> @@ -420,6 +420,12 @@ static av_cold int nvenc_check_device(AVCodecContext *avctx, int idx)
>>>          goto fail;
>>>      }
>>>
>>> +    cu_res = dl_fn->cuda_dl->cuCtxSetLimit(CU_LIMIT_STACK_SIZE, 128);
>>> +    if (cu_res != CUDA_SUCCESS) {
>>> +        av_log(avctx, AV_LOG_FATAL, "Failed reducing CUDA context stack limit for NVENC: 0x%x\n", (int)cu_res);
>>> +        goto fail;
>>> +    }
>>> +
>>>      ctx->cu_context = ctx->cu_context_internal;
>>>
>>>      if ((ret = nvenc_pop_context(avctx)) < 0)
>>
>> Does this actually have any effect?  I was under the impression that the
>> CUDA context created inside the NVENC encoder wouldn't actually be used for
>> any CUDA operations at all (really just a GPU device handle).
>>
>  There are some CUDA kernels in the driver that may be invoked depending on
> the NVENC operations specified on the command line. My observation from
> looking at the nvcc statistics is that the stack frame size for most of
> these CUDA kernels is 0 (the highest observed was 120 bytes).

Right, that makes sense.  If Nvidia is happy that this will always work in drivers compatible with this API version (including any future ones) then sure.
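For context, the per-kernel stack usage quoted above from the nvcc statistics can also be inspected at run time through the driver API. This is only an illustrative sketch (the function handle and name are hypothetical; `CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES` reports per-thread local memory, which is what backs the kernel stack):

```c
#include <cuda.h>
#include <stdio.h>

/* Sketch: report how much per-thread local memory (stack backing) a
 * loaded kernel uses, to sanity-check a limit such as 128 bytes. */
static void print_stack_usage(CUfunction func, const char *name)
{
    int local_bytes = 0;
    if (cuFuncGetAttribute(&local_bytes,
                           CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES,
                           func) == CUDA_SUCCESS)
        printf("%s: %d bytes of local memory per thread\n",
               name, local_bytes);
}
```

This only covers kernels the application itself loads; the driver-internal kernels NVENC may launch are not visible this way.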

>>
>>> diff --git a/libavutil/hwcontext_cuda.c b/libavutil/hwcontext_cuda.c
>>> index 37827a7..1f022fa 100644
>>> --- a/libavutil/hwcontext_cuda.c
>>> +++ b/libavutil/hwcontext_cuda.c
>>> @@ -386,6 +386,12 @@ static int cuda_device_create(AVHWDeviceContext *ctx, const char *device,
>>>          goto error;
>>>      }
>>>
>>> +    err = cu->cuCtxSetLimit(CU_LIMIT_STACK_SIZE, 128);
>>> +    if (err != CUDA_SUCCESS) {
>>> +        av_log(ctx, AV_LOG_ERROR, "Error reducing CUDA context stack limit\n");
>>> +        goto error;
>>> +    }
>>> +
>>>      cu->cuCtxPopCurrent(&dummy);
>>>
>>>      hwctx->internal->is_allocated = 1;
>>> --
>>> 2.9.1
>>>
>>
>> This is technically a user-visible change, since it will apply to all user
>> programs run on the CUDA context created here as well as those inside
>> ffmpeg.  I'm not sure how many people actually use that, though, so maybe
>> it won't affect anyone.
>>
> In ffmpeg, I see vf_thumbnail_cuda and vf_scale_cuda available (not sure if
> there are more, but these two should not be affected by this reduction).
> Users can always raise the stack limit if their own custom kernels
> require a larger stack frame.

I don't mean filters inside ffmpeg, I mean a user program which probably uses NVDEC and/or NVENC (and possibly other things) from libavcodec but then does its own CUDA processing with the same context.  This is silently changing the setup underneath it, and 128 feels like a very small number.
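A user program in that position could raise the limit again on the shared context before launching its own kernels. A hedged sketch against the CUDA driver API (the 8192-byte value and the function name are illustrative, not from the patch):

```c
#include <cuda.h>

/* Sketch: restore a larger stack on a CUDA context that libavcodec
 * created (and lowered to 128 bytes), before the application runs
 * its own kernels on the same context. */
static int restore_stack_limit(CUcontext ctx)
{
    CUresult err;
    size_t cur = 0;
    CUcontext dummy;

    err = cuCtxPushCurrent(ctx);
    if (err != CUDA_SUCCESS)
        return -1;

    /* Inspect the value the patch set (expected: 128). */
    err = cuCtxGetLimit(&cur, CU_LIMIT_STACK_SIZE);
    if (err == CUDA_SUCCESS)
        /* Raise it for the application's own kernels;
         * 8192 is an arbitrary example value. */
        err = cuCtxSetLimit(CU_LIMIT_STACK_SIZE, 8192);

    cuCtxPopCurrent(&dummy);
    return err == CUDA_SUCCESS ? 0 : -1;
}
```

The catch, as noted above, is that the application has to know the limit was changed underneath it in the first place.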

>>
>> If the stack limit is violated, what happens?  Will that be undefined
>> behaviour with random effects (crash / incorrect results), or is it likely
>> to be caught at program compile/load-time?
>>
> The stack will likely overflow and the kernel will terminate (though I
> have yet to encounter this myself).

As long as the user gets a clear message that a stack overflow has occurred, so that they can realise they need to raise the value, then it should be fine.
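In practice the failure may not be that clear: with the driver API, a kernel killed by a stack overflow typically surfaces as a generic CUDA_ERROR_LAUNCH_FAILED on the next synchronizing call, with no overflow-specific detail. A sketch of where such an error would be observed (assuming a kernel was launched earlier on the current context):

```c
#include <cuda.h>
#include <stdio.h>

/* Sketch: after launching a kernel, a stack overflow shows up as a
 * launch failure on the next synchronization, not as a distinct
 * "stack overflow" error code. */
static void check_after_launch(void)
{
    CUresult err = cuCtxSynchronize();
    if (err != CUDA_SUCCESS) {
        const char *name = NULL;
        cuGetErrorName(err, &name);
        /* CUDA_ERROR_LAUNCH_FAILED covers stack overflow among other
         * faults, so the user only sees a generic launch failure. */
        fprintf(stderr, "kernel failed: %s\n", name ? name : "unknown");
    }
}
```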

Thanks,

- Mark

