[FFmpeg-devel] [PATCH 11/13] ffv1dec: reference the current packet into the main context
Lynne
dev at lynne.ee
Thu Mar 13 14:51:13 EET 2025
On 13/03/2025 05:57, Andreas Rheinhardt wrote:
> Lynne:
>>
>>
>> On 13/03/2025 02:24, Andreas Rheinhardt wrote:
>>> Lynne:
>>>> On 10/03/2025 18:42, Lynne wrote:
>>>>> On 10/03/2025 04:14, Andreas Rheinhardt wrote:
>>>>>> Lynne:
>>>>>>> ---
>>>>>>> libavcodec/ffv1.h | 3 +++
>>>>>>> libavcodec/ffv1dec.c | 19 +++++++++++++++++--
>>>>>>> 2 files changed, 20 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/libavcodec/ffv1.h b/libavcodec/ffv1.h
>>>>>>> index 8c0e71284d..860a5c14b1 100644
>>>>>>> --- a/libavcodec/ffv1.h
>>>>>>> +++ b/libavcodec/ffv1.h
>>>>>>> @@ -174,6 +174,9 @@ typedef struct FFV1Context {
>>>>>>> * NOT shared between frame threads.
>>>>>>> */
>>>>>>> uint8_t frame_damaged;
>>>>>>> +
>>>>>>> + /* Reference to the current packet */
>>>>>>> + AVPacket *pkt_ref;
>>>>>>> } FFV1Context;
>>>>>>> int ff_ffv1_common_init(AVCodecContext *avctx, FFV1Context *s);
>>>>>>> diff --git a/libavcodec/ffv1dec.c b/libavcodec/ffv1dec.c
>>>>>>> index eaa21eebdf..6396f22f79 100644
>>>>>>> --- a/libavcodec/ffv1dec.c
>>>>>>> +++ b/libavcodec/ffv1dec.c
>>>>>>> @@ -469,6 +469,10 @@ static av_cold int decode_init(AVCodecContext *avctx)
>>>>>>> f->pix_fmt = AV_PIX_FMT_NONE;
>>>>>>> f->configured_pix_fmt = AV_PIX_FMT_NONE;
>>>>>>> + f->pkt_ref = av_packet_alloc();
>>>>>>> + if (!f->pkt_ref)
>>>>>>> + return AVERROR(ENOMEM);
>>>>>>> +
>>>>>>> if ((ret = ff_ffv1_common_init(avctx, f)) < 0)
>>>>>>> return ret;
>>>>>>> @@ -701,6 +705,10 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *rframe,
>>>>>>> /* Start */
>>>>>>> if (hwaccel) {
>>>>>>> + ret = av_packet_ref(f->pkt_ref, avpkt);
>>>>>>> + if (ret < 0)
>>>>>>> + return ret;
>>>>>>> +
>>>>>>> ret = hwaccel->start_frame(avctx, avpkt->data, avpkt->size);
>>>>>>> if (ret < 0)
>>>>>>> return ret;
>>>>>>> @@ -720,15 +728,21 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *rframe,
>>>>>>> uint32_t len;
>>>>>>> ret = find_next_slice(avctx, avpkt->data, buf_end, i,
>>>>>>> &pos, &len);
>>>>>>> - if (ret < 0)
>>>>>>> + if (ret < 0) {
>>>>>>> + av_packet_unref(f->pkt_ref);
>>>>>>> return ret;
>>>>>>> + }
>>>>>>> buf_end -= len;
>>>>>>> ret = hwaccel->decode_slice(avctx, pos, len);
>>>>>>> - if (ret < 0)
>>>>>>> + if (ret < 0) {
>>>>>>> + av_packet_unref(f->pkt_ref);
>>>>>>> return ret;
>>>>>>> + }
>>>>>>> }
>>>>>>> +
>>>>>>> + av_packet_unref(f->pkt_ref);
>>>>>>> } else {
>>>>>>> ret = decode_slices(avctx, c, avpkt);
>>>>>>> if (ret < 0)
>>>>>>> @@ -827,6 +841,7 @@ static av_cold int ffv1_decode_close(AVCodecContext *avctx)
>>>>>>> ff_progress_frame_unref(&s->last_picture);
>>>>>>> av_refstruct_unref(&s->hwaccel_last_picture_private);
>>>>>>> + av_packet_free(&s->pkt_ref);
>>>>>>> ff_ffv1_close(s);
>>>>>>> return 0;
>>>>>>
>>>>>> Why not simply use a const AVPacket*?
>>>>>
>>>>> No reason. Fixed locally.
>>>>> Thanks.
>>>>
>>>> *reverted this change.
>>>> We need to ref the packet, since we map its memory and let the GPU use
>>>> it directly without copying the contents. 6K 16-bit content at 24 fps is
>>>> typically around 2 Gbps when compressed, so avoiding copies is important.
>>>
>>> How long does the hwaccel need this data?
>>
>> Until the frame has been asynchronously decoded. We hand out the output
>> frame together with a semaphore that receivers need to wait on to know
>> when that has happened.
>>
>> On the decoder side, the hardware has a fixed number of queues to which
>> submissions can be sent asynchronously. We treat them as a ring buffer
>> and keep a reference on our side to every resource used by each
>> submission, until we need to reuse the slot, at which point we wait for
>> the frame's decoding to complete (which it usually has) and release all
>> the resources it used.
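A rough sketch of that scheme, purely for illustration (DecodeSlot, the array
sizes and wait_for_timeline() are made up here and do not correspond to the
actual vulkan_decode.c structures):

    #include <stdint.h>
    #include "libavutil/buffer.h"

    /* Illustration only: hypothetical names, not the real vulkan_decode.c types. */
    typedef struct DecodeSlot {
        AVBufferRef *resources[8]; /* refs held for one in-flight submission */
        int          nb_resources;
        uint64_t     timeline_val; /* signalled by the GPU when decoding ends */
        int          in_use;
    } DecodeSlot;

    static void reuse_slot(DecodeSlot *slot)
    {
        /* Before reusing a slot, wait for its previous submission to finish
         * (it usually already has), then drop every reference it held. */
        if (slot->in_use) {
            wait_for_timeline(slot->timeline_val); /* hypothetical helper */
            for (int i = 0; i < slot->nb_resources; i++)
                av_buffer_unref(&slot->resources[i]);
            slot->nb_resources = 0;
            slot->in_use = 0;
        }
    }

The packet reference taken in decode_frame() above would simply be one of
those per-slot resources.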
>>
>> Output frames also carry a bit of state that has to be freed once the
>> decoder marks (unreferences) the frame as no longer being needed as a
>> reference; this is done in the FFHWAccel.free_frame_priv callback.
>> There, we have to wait for the last internal use of the frame to finish
>> (done via the vp->wait_semaphores() call in vulkan_decode.c).
>>
>> This is valid for both ASIC hardware decoders and a compute-shader-based
>> implementation, since the two share the same code except for decode
>> submissions.
>
> 1. If you need a reference to the packet's data, then reference
> AVPacket.buf, not the whole AVPacket. This avoids allocating a spare
> AVPacket as well as copying side data.
> 2. It sounds very wrong and fragile that the decoder has to keep a
> reference because the hwaccel might need it. There may be future
> hwaccels that don't need such a reference etc. It seems better to extend
> e.g. the start_frame callback and pass a reference to the input data (no
> need to change this for all other start_frame calls; they can pass NULL
> until needed).
> 3. If the user closes the decoder (which is allowed at any time, even
> without draining the decoder), ff_codec_close() uninitializes the
> hwaccel after calling the decoder's close function; the latter
> unreferences the reference to the packet. Is this really safe?
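To make point 1 concrete, a minimal sketch of the difference (av_packet_ref()
and av_buffer_ref() are the public FFmpeg APIs; error handling abbreviated):

    /* Current patch: keeps a spare AVPacket and also copies side data. */
    ret = av_packet_ref(f->pkt_ref, avpkt);
    if (ret < 0)
        return ret;

    /* Suggested alternative: reference only the payload buffer. */
    AVBufferRef *buf_ref = av_buffer_ref(avpkt->buf);
    if (!buf_ref)
        return AVERROR(ENOMEM);
    /* ... hand buf_ref to the hwaccel; unref it once the GPU is done ... */
    av_buffer_unref(&buf_ref);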
I wanted to avoid having to change the FFHWAccel API for a single
hwaccel, so I opted to simply add a field to the codec private context.
I'll add an argument to start_frame and resubmit.
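One possible shape for that (hypothetical sketch, not the committed API; the
exact signature is up to the resubmitted patch) would be to pass the packet
alongside the existing pointer/size pair:

    /* Hypothetical extension of FFHWAccel.start_frame; hwaccels that do not
     * need the packet can simply ignore the new argument (or be passed NULL). */
    int (*start_frame)(AVCodecContext *avctx, const AVPacket *avpkt,
                       const uint8_t *buf, uint32_t buf_size);

    /* The call site in ffv1dec.c would then become: */
    ret = hwaccel->start_frame(avctx, avpkt, avpkt->data, avpkt->size);
    /* ...and the Vulkan hwaccel can ref avpkt->buf itself for as long as the
     * GPU maps the memory, instead of the decoder keeping f->pkt_ref. */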
I don't see why it wouldn't be safe: upon uninit, all submissions are
waited on to complete before the resources they held are unreferenced.
The output frames themselves have no references to any resources used
during decoding, so all resources are guaranteed to be freed at uninit.