[FFmpeg-devel] [PATCH 3/4] avfilter/vf_dnn_processing: add format GRAY8 and GRAYF32 support

Fri Dec 13 16:40:15 EET 2019

Em sex., 13 de dez. de 2019 às 08:23, Guo, Yejun <yejun.guo at intel.com> escreveu:
>
> > From: Pedro Arthur [mailto:bygrandao at gmail.com]
> > Sent: Friday, December 13, 2019 12:45 AM
> > To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> > Cc: Guo, Yejun <yejun.guo at intel.com>
> > Subject: Re: [FFmpeg-devel] [PATCH 3/4] avfilter/vf_dnn_processing: add
> > format GRAY8 and GRAYF32 support
> > Hi,
> >
> > how should I test this patch?
>
> the fourth patch of this patch set is the fate test for this feature, so I ignored comments here.
> I'll add the test descriptions back in v2.
>
> >
> > Em sex., 22 de nov. de 2019 às 04:57, Guo, Yejun <yejun.guo at intel.com>
> > escreveu:
> >
> > > Signed-off-by: Guo, Yejun <yejun.guo at intel.com>
> > > ---
> > >  doc/filters.texi                |   8 ++-
> > >  libavfilter/vf_dnn_processing.c | 147
> > > ++++++++++++++++++++++++++++++----------
> > >  2 files changed, 118 insertions(+), 37 deletions(-)
> > >
> > > diff --git a/doc/filters.texi b/doc/filters.texi
> > > index 1f86ae1..c3f7997 100644
> > > --- a/doc/filters.texi
> > > +++ b/doc/filters.texi
> > > @@ -8992,7 +8992,13 @@ Set the input name of the dnn network.
> > >  Set the output name of the dnn network.
> > >
> > >  @item fmt
> > > -Set the pixel format for the Frame. Allowed values are
> > > @code{AV_PIX_FMT_RGB24}, and @code{AV_PIX_FMT_BGR24}.
> > > +Set the pixel format for the Frame, the value is determined by the input
> > > of the dnn network model.
> > >
> > This sentence is a bit confusing, also I think this property should be
> > removed. (I will explain bellow).
>
> sure, no problem.
>
> >
> > +
> > > +If the model handles RGB (or BGR) image and the data type of model input
> > > is uint8, fmt must be @code{AV_PIX_FMT_RGB24} (or
> > @code{AV_PIX_FMT_BGR24}.
> > > +If the model handles RGB (or BGR) image and the data type of model input
> > > is float, fmt must be @code{AV_PIX_FMT_RGB24} (or
> > @code{AV_PIX_FMT_BGR24},
> > > and this filter will do data type conversion internally.
> > > +If the model handles GRAY image and the data type of model input is
> > > uint8, fmt must be @code{AV_PIX_FMT_GRAY8}.
> > > +If the model handles GRAY image and the data type of model input is
> > > float, fmt must be @code{AV_PIX_FMT_GRAYF32}.
> > > +
> > >  Default value is @code{AV_PIX_FMT_RGB24}.
> > >
> > >  @end table
> > > diff --git a/libavfilter/vf_dnn_processing.c
> > > b/libavfilter/vf_dnn_processing.c
> > > index ce976ec..963dd5e 100644
> > > --- a/libavfilter/vf_dnn_processing.c
> > > +++ b/libavfilter/vf_dnn_processing.c
> > > @@ -70,10 +70,12 @@ static av_cold int init(AVFilterContext *context)
> > >  {
> > >      DnnProcessingContext *ctx = context->priv;
> > >      int supported = 0;
> > > -    // as the first step, only rgb24 and bgr24 are supported
> > > +    // to support more formats
> > >      const enum AVPixelFormat supported_pixel_fmts[] = {
> > >          AV_PIX_FMT_RGB24,
> > >          AV_PIX_FMT_BGR24,
> > > +        AV_PIX_FMT_GRAY8,
> > > +        AV_PIX_FMT_GRAYF32,
> > >      };
> > >      for (int i = 0; i < sizeof(supported_pixel_fmts) / sizeof(enum
> > > AVPixelFormat); ++i) {
> > >          if (supported_pixel_fmts[i] == ctx->fmt) {
> > > @@ -156,14 +158,38 @@ static int config_input(AVFilterLink *inlink)
> > >          return AVERROR(EIO);
> > >      }
> > >
> > I think the filter should not check formats manually in the init function
> > (unless I'm missing something), it would be best if you query for all the
> > above supported formats in query_formats and later in config_input you make
> > sure the expected model format matches the frame format.
>
> I'm afraid it is too late if we find the format mismatch in function config_input.
>
> For example, the dnn module is designed to accept BGR24 data, and the actual
> format comes into config_input is RGB24 or YUV420P (we'll add yuv formats later in
> supported pixel fmts) or something else such as GRAY8. We have two choices:
>
> - return error, and the application ends.
>   This is not what we want.
> - return no_error, and do format conversion at the beginning of function filter_frame.
>   It makes this filter complex, and our implementation for the conversion might not be the best optimized.
>   My idea is to keep this filter simple. And the users can choose what they want to do conversion in the filters graph.
>
Indeed if the filter receives the format already converted is much
better, but if you already have to specify the format the model
expects as a parameter one could simply use the -pix_fmt to set the
format instead of having one more filter parameter.
Is there any downside if it uses "-pix_fmt" that "fmt" avoids?

> Regarding the below description in doc/filters.texi, the reason is that we do not have format such as rgbf32 and bgrf32, we have to do conversion within this filter.
> "If the model handles RGB (or BGR) image and the data type of model input is float, fmt must be @code{AV_PIX_FMT_RGB24} (or @code{AV_PIX
> _FMT_BGR24}, and this filter will do data type conversion internally."
>
>
> Consider the filter vf_dnn_analytic in my plan, I think the conversion is necessary since mostly a scale down is needed.
>
> >
> >
> > > -    if (model_input.channels != 3) {
> > > -        av_log(ctx, AV_LOG_ERROR, "the model requires input channels
> > > %d\n",
> > > -                                   model_input.channels);
> > > -        return AVERROR(EIO);
> > > -    }
> > > -    if (model_input.dt != DNN_FLOAT && model_input.dt != DNN_UINT8) {
> > > -        av_log(ctx, AV_LOG_ERROR, "only support dnn models with input
> > > data type as float32 and uint8.\n");
> > > -        return AVERROR(EIO);
> > > +    if (ctx->fmt == AV_PIX_FMT_RGB24 || ctx->fmt == AV_PIX_FMT_BGR24)
> > {
> > > +        if (model_input.channels != 3) {
> > > +            av_log(ctx, AV_LOG_ERROR, "channel number 3 is required,
> > but
> > > the actual channel number is %d\n",
> > > +                                       model_input.channels);
> > > +            return AVERROR(EIO);
> > > +        }
> > > +        if (model_input.dt != DNN_FLOAT && model_input.dt !=
> > DNN_UINT8) {
> > > +            av_log(ctx, AV_LOG_ERROR, "only support dnn models with
> > input
> > > data type as float32 and uint8.\n");
> > > +            return AVERROR(EIO);
> > > +        }
> > > +    } else if (ctx->fmt == AV_PIX_FMT_GRAY8) {
> > > +        if (model_input.channels != 1) {
> > > +            av_log(ctx, AV_LOG_ERROR, "channel number 1 is required,
> > but
> > > the actual channel number is %d\n",
> > > +                                       model_input.channels);
> > > +            return AVERROR(EIO);
> > > +        }
> > > +        if (model_input.dt != DNN_UINT8) {
> > > +            av_log(ctx, AV_LOG_ERROR, "only support dnn models with
> > input
> > > data type as uint8.\n");
> > > +            return AVERROR(EIO);
> > > +        }
> > > +    } else if (ctx->fmt == AV_PIX_FMT_GRAYF32) {
> > > +        if (model_input.channels != 1) {
> > > +            av_log(ctx, AV_LOG_ERROR, "channel number 1 is required,
> > but
> > > the actual channel number is %d\n",
> > > +                                       model_input.channels);
> > > +            return AVERROR(EIO);
> > > +        }
> > > +        if (model_input.dt != DNN_FLOAT) {
> > > +            av_log(ctx, AV_LOG_ERROR, "only support dnn models with
> > input
> > > data type as float.\n");
> > > +            return AVERROR(EIO);
> > > +        }
> > > +    } else {
> > > +        av_assert0(!"should not reach here.");
> > >      }
> > >
> > General comment on the above and following chained ifs testing pixel
> > formats, personally,  using switch(fmt) seems more readable.
>
> good point, I'll refine the code.