[FFmpeg-devel] [PATCH, v2 2/2] lavc/vaapi_decode: recreate hw_frames_ctx for vp9 without destroy va_context

Fri Sep 20 06:12:28 EEST 2019

> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
> Of Fu, Linjie
> Sent: Wednesday, September 11, 2019 00:02
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH, v2 2/2] lavc/vaapi_decode: recreate
> hw_frames_ctx for vp9 without destroy va_context
> 
> > -----Original Message-----
> > From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On
> Behalf
> > Of Mark Thompson
> > Sent: Tuesday, September 10, 2019 08:02
> > To: ffmpeg-devel at ffmpeg.org
> > Subject: Re: [FFmpeg-devel] [PATCH, v2 2/2] lavc/vaapi_decode: recreate
> > hw_frames_ctx for vp9 without destroy va_context
> >
> > On 09/09/2019 16:40, Fu, Linjie wrote:
> > >> -----Original Message-----
> > >> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On
> > Behalf
> > >> Of Fu, Linjie
> > >> Sent: Friday, August 30, 2019 16:05
> > >> To: FFmpeg development discussions and patches <ffmpeg-
> > >> devel at ffmpeg.org>
> > >> Subject: Re: [FFmpeg-devel] [PATCH, v2 2/2] lavc/vaapi_decode:
> recreate
> > >> hw_frames_ctx for vp9 without destroy va_context
> > >>
> > >>> -----Original Message-----
> > >>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On
> > >> Behalf
> > >>> Of Fu, Linjie
> > >>> Sent: Friday, August 9, 2019 19:47
> > >>> To: FFmpeg development discussions and patches <ffmpeg-
> > >>> devel at ffmpeg.org>
> > >>> Subject: Re: [FFmpeg-devel] [PATCH, v2 2/2] lavc/vaapi_decode:
> > recreate
> > >>> hw_frames_ctx for vp9 without destroy va_context
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On
> > >>> Behalf
> > >>>> Of Hendrik Leppkes
> > >>>> Sent: Friday, August 9, 2019 17:40
> > >>>> To: FFmpeg development discussions and patches <ffmpeg-
> > >>>> devel at ffmpeg.org>
> > >>>> Subject: Re: [FFmpeg-devel] [PATCH, v2 2/2] lavc/vaapi_decode:
> > >> recreate
> > >>>> hw_frames_ctx for vp9 without destroy va_context
> > >>>>
> > >>>> On Tue, Aug 6, 2019 at 10:55 AM Fu, Linjie <linjie.fu at intel.com>
> wrote:
> > >>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org]
> > On
> > >>>> Behalf
> > >>>>>> Of Hendrik Leppkes
> > >>>>>> Sent: Tuesday, August 6, 2019 16:27
> > >>>>>> To: FFmpeg development discussions and patches <ffmpeg-
> > >>>>>> devel at ffmpeg.org>
> > >>>>>> Subject: Re: [FFmpeg-devel] [PATCH, v2 2/2] lavc/vaapi_decode:
> > >>>> recreate
> > >>>>>> hw_frames_ctx for vp9 without destroy va_context
> > >>>>>>
> > >>>>>> On Thu, Jul 11, 2019 at 10:20 AM Linjie Fu <linjie.fu at intel.com>
> > wrote:
> > >>>>>>>
> > >>>>>>> VP9 allows resolution changes per frame. Currently in VAAPI,
> > >>> resolution
> > >>>>>>> changes leads to va context destroy and reinit. This will cause
> > >>>>>>> reference frame surface lost and produce garbage.
> > >>>>>>>
> > >>>>>>> Though refs surface id could be passed to media driver and found
> > in
> > >>>>>>> RTtbl, vp9RefList[] in hal layer has already been destroyed. Thus
> > the
> > >>>>>>> new created VaContext could only got an empty RefList.
> > >>>>>>>
> > >>>>>>> As libva allows re-create surface separately without changing the
> > >>>>>>> context, this issue could be handled by only recreating
> > >>> hw_frames_ctx.
> > >>>>>>>
> > >>>>>>> Set hwaccel_priv_data_keeping flag for vp9 to only recreating
> > >>>>>>> hw_frame_ctx when dynamic resolution changing happens.
> > >>>>>>>
> > >>>>>>> Could be verified by:
> > >>>>>>> ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -i
> > >>>>>>>   ./resolutions.ivf -pix_fmt p010le -f rawvideo -vframes 20 -y
> > >> vaapi.yuv
> > >>>>>>>
> > >>>>>>> Signed-off-by: Linjie Fu <linjie.fu at intel.com>
> > >>>>>>> ---
> > >>>>>>>  libavcodec/decode.c        | 10 +++++-----
> > >>>>>>>  libavcodec/internal.h      |  1 +
> > >>>>>>>  libavcodec/pthread_frame.c |  2 ++
> > >>>>>>>  libavcodec/vaapi_decode.c  | 40 ++++++++++++++++++++++----
> --
> > -
> > >> --
> > >>> --
> > >>>> -----
> > >>>>>> --
> > >>>>>>>  libavcodec/vaapi_vp9.c     |  4 ++++
> > >>>>>>>  5 files changed, 34 insertions(+), 23 deletions(-)
> > >>>>>>>
> > >>>>>>> diff --git a/libavcodec/decode.c b/libavcodec/decode.c
> > >>>>>>> index 0863b82..7b15fa5 100644
> > >>>>>>> --- a/libavcodec/decode.c
> > >>>>>>> +++ b/libavcodec/decode.c
> > >>>>>>> @@ -1254,7 +1254,6 @@ int
> > >>>>>> ff_decode_get_hw_frames_ctx(AVCodecContext *avctx,
> > >>>>>>>
> > >>>>>>>      frames_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx-
> > >>>> data;
> > >>>>>>>
> > >>>>>>> -
> > >>>>>>>      if (frames_ctx->initial_pool_size) {
> > >>>>>>>          // We guarantee 4 base work surfaces. The function above
> > >>>> guarantees
> > >>>>>> 1
> > >>>>>>>          // (the absolute minimum), so add the missing count.
> > >>>>>>
> > >>>>>> Unrelated whitespace change
> > >>>>>
> > >>>>> There is  a redundant whitespace here, so I removed it within this
> > patch.
> > >>>>>
> > >>>>>>> @@ -1333,7 +1332,7 @@ static int hwaccel_init(AVCodecContext
> > >>>> *avctx,
> > >>>>>>>          return AVERROR_PATCHWELCOME;
> > >>>>>>>      }
> > >>>>>>>
> > >>>>>>> -    if (hwaccel->priv_data_size) {
> > >>>>>>> +    if (hwaccel->priv_data_size && !avctx->internal-
> > >>>>> hwaccel_priv_data) {
> > >>>>>>>          avctx->internal->hwaccel_priv_data =
> > >>>>>>>              av_mallocz(hwaccel->priv_data_size);
> > >>>>>>>          if (!avctx->internal->hwaccel_priv_data)
> > >>>>>>> @@ -1396,9 +1395,10 @@ int ff_get_format(AVCodecContext
> > >> *avctx,
> > >>>>>> const enum AVPixelFormat *fmt)
> > >>>>>>>      memcpy(choices, fmt, (n + 1) * sizeof(*choices));
> > >>>>>>>
> > >>>>>>>      for (;;) {
> > >>>>>>> -        // Remove the previous hwaccel, if there was one.
> > >>>>>>> -        hwaccel_uninit(avctx);
> > >>>>>>> -
> > >>>>>>> +        // Remove the previous hwaccel, if there was one,
> > >>>>>>> +        // and no need for keeping.
> > >>>>>>> +        if (!avctx->internal->hwaccel_priv_data_keeping)
> > >>>>>>> +            hwaccel_uninit(avctx);
> > >>>>>>>          user_choice = avctx->get_format(avctx, choices);
> > >>>>>>>          if (user_choice == AV_PIX_FMT_NONE) {
> > >>>>>>>              // Explicitly chose nothing, give up.
> > >>>>>>
> > >>>>>> There could be a dozen special cases how stuff can go wrong here.
> > >>> What
> > >>>>>> if get_format actually returns a different format then the one
> > >>>>>> currently in use? Or a software format?
> > >>>>>> Just removing this alone is not safe.
> > >>>>>
> > >>>>> Didn't quite get your point.
> > >>>>> IMHO,  avctx->internal->hwaccel_priv_data_keeping won't be set in
> > >>> other
> > >>>> cases
> > >>>>> other than vaapi_vp9, so this patch won't break the default pipeline,
> > >> and
> > >>>>> hwaccel_uninit(avctx) will always be called.
> > >>>>>
> > >>>>
> > >>>> The point is that you cannot rely on get_format to return the same
> > >>>> format that it previously did. It could return a software format, or
> > >>>> in some cases possibly even a different hardware format. And you
> just
> > >>>> don't handle that.
> > >>>
> > >>> Got it. Thanks for the explanation, it should be reconsidered in case it
> > >>> happens.
> > >>>
> > >>>> The entire approach here smells a bit of hack. Lets try to think this
> > >>>> through and do it properly. One idea that comes to mind is a new
> > >>>> hwaccel callback that checks if a in-place re-init is possible without
> > >>>> destroying everything. This could be used for a multitude of different
> > >>>> situations, and not just this one special case.
> > >>>
> > >>> Sounds great, and just FYI, this similar issue is reproduced with
> > >> nvdec/dxva2
> > >>> as well. Clips and some details are provided on trac #8068 in case you
> and
> > >>> other developers may be interested in or need to verify your solution.
> > >>> http://trac.ffmpeg.org/ticket/8068
> > >>
> > >> Any step-further progress for the hwaccel callback methods or
> something
> > I
> > >> can
> > >> help to fix this gap?
> > >>
> > >
> > > Ping?
> > > A general solution works for multitude situation is great to me, and how
> > about having
> > > one solution specific for vp9 which introduces no regression as the first
> > step,
> > > since there are lots of cases(1400 +) failed/blocked and could be fixed by
> > this patch.
> > >
> > > This blocked quite a lot, please comment what I can do to get this step
> > further.
> >
> > I still don't understand how the error here can be in FFmpeg - it looks more
> > like a driver problem to me.
> >
> > The sequence with VAAPI using HW_CONFIG_METHOD_HW_DEVICE_CTX
> on
> > your lena_resolution_change_on_inter_frame.ivf should be as follows:
> >
> > 1.  The header of frame 0 is read, it's a key frame with resolution is 352x288.
> > 2.  The user-supplied get_format() is called, they pick AV_PIX_FMT_VAAPI
> > and supply a VAAPI device.
> > 3.  Output/reference surfaces are created in a new frames context at
> > 352x288.
> > 4.  A decoder context is created for VP9 at 352x288, rendering to the
> surfaces
> > created in the previous step.
> > 5.  Frame 0 is decoded and placed in all reference slots.
> > 6.  Frames 1-49 are decoded normally, they overwrite slots 0 and 1 only.
> > 7.  The header of frame 50 is read, it's an inter frame but with a new
> > resolution of 240x196.
> > 8.  The old decoder context is discarded, since it has the wrong resolution
> and
> > is bound to the wrong render targets.
> > 9.  The old frames context is unreferenced, but references remain to its
> > frames in slots 2-7 so the actual frames themselves stay around.
> > 10. The user-supplied get_format() is called, they pick AV_PIX_FMT_VAAPI
> > again.
> > 11. Output/reference surfaces are created in a new frames context at
> > 240x196.
> > 12. A new decoder context is created for VP9 at 240x196, rendering to the
> > new surfaces.
> > 13. Frame 50 is decoded with reference to frame slots 0, 1 and 2 (those are
> all
> > in the old frames context and have the old resolution); the result is placed
> in
> > slots 0 and 1.
> > 14. Frames 51-100 are all decoded with reference to slots 0, 1 and 2,
> > overwriting slots 0 and/or 1 only (in every case slot 2 still contains the
> original
> > key frame).
> >
> > (Using HW_CONFIG_METHOD_HW_FRAMES_CTX the main difference is
> that
> > steps 3 and 11 would be removed, replaced by user action in the
> get_format()
> > callback in the steps immediately preceding them.)
> >
> > The iHD driver indeed returns "internal error" immediately on step 13.
> > However, looking at traces made with libva everything about that render
> call
> > looks correct - the decoder context is the new one which matches the
> > resolution and render target, and the right surfaces are provided as
> > reference frames (in the old frames context, but definitely haven't been
> > destroyed).
> >
> > So, can you explain more about what is going wrong?
> >
> 
> Hi Mark,
> 
> Thanks for the detailed response. If I got it correctly, the concern is mainly
> about
> "since the reference surface is definitely not destroyed, destroy/recreate
> context
> shouldn't have blocked the decode for vp9"
> 
> Actually, there are some intermedia dependencies in driver beside reference
> frame itself.
> For example,
> 1. Motion vector dependency.
> 
> As chapter 5.13 (motion vector prediction )in VP9 spec has said:
> "When the spatial neighbors do not provide sufficient information, it can fall
> back to using the
> Motion vectors from the previous decoded frames".
> 
> MV buffer is the dependency across DPB, driver have to allocate the internal
> buffer to store the
> MV information associated to each frame on DPB.
> We need previous frame’s mv, and we hold mv  in decoder context.
> Destroying the context would
> cause the loss of mv information.
> 
> 2.Besides, segmentation map buffer is the another dependency.
> 
> It is allocated in driver as well, and could be updated by every frame and
> impact next frame.
> The segmentation map is coded differentially across frames in order to
> minimize the bit overheads.
> 
> As a consequence, context destroying would block the decode for clips with
> resolution changing on
> inter frames.
> 
A friendly ping in case this message is missed.

- linjie