[FFmpeg-devel] [PATCH 3/4] avfilter/vf_dnn_processing: add format GRAY8 and GRAYF32 support

Fri Dec 13 13:23:30 EET 2019

> From: Pedro Arthur [mailto:bygrandao at gmail.com]
> Sent: Friday, December 13, 2019 12:45 AM
> To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> Cc: Guo, Yejun <yejun.guo at intel.com>
> Subject: Re: [FFmpeg-devel] [PATCH 3/4] avfilter/vf_dnn_processing: add
> format GRAY8 and GRAYF32 support
> Hi,
> 
> how should I test this patch?

the fourth patch of this patch set is the fate test for this feature, so I ignored comments here.
I'll add the test descriptions back in v2.

> 
> Em sex., 22 de nov. de 2019 às 04:57, Guo, Yejun <yejun.guo at intel.com>
> escreveu:
> 
> > Signed-off-by: Guo, Yejun <yejun.guo at intel.com>
> > ---
> >  doc/filters.texi                |   8 ++-
> >  libavfilter/vf_dnn_processing.c | 147
> > ++++++++++++++++++++++++++++++----------
> >  2 files changed, 118 insertions(+), 37 deletions(-)
> >
> > diff --git a/doc/filters.texi b/doc/filters.texi
> > index 1f86ae1..c3f7997 100644
> > --- a/doc/filters.texi
> > +++ b/doc/filters.texi
> > @@ -8992,7 +8992,13 @@ Set the input name of the dnn network.
> >  Set the output name of the dnn network.
> >
> >  @item fmt
> > -Set the pixel format for the Frame. Allowed values are
> > @code{AV_PIX_FMT_RGB24}, and @code{AV_PIX_FMT_BGR24}.
> > +Set the pixel format for the Frame, the value is determined by the input
> > of the dnn network model.
> >
> This sentence is a bit confusing, also I think this property should be
> removed. (I will explain bellow).

sure, no problem.

> 
> +
> > +If the model handles RGB (or BGR) image and the data type of model input
> > is uint8, fmt must be @code{AV_PIX_FMT_RGB24} (or
> @code{AV_PIX_FMT_BGR24}.
> > +If the model handles RGB (or BGR) image and the data type of model input
> > is float, fmt must be @code{AV_PIX_FMT_RGB24} (or
> @code{AV_PIX_FMT_BGR24},
> > and this filter will do data type conversion internally.
> > +If the model handles GRAY image and the data type of model input is
> > uint8, fmt must be @code{AV_PIX_FMT_GRAY8}.
> > +If the model handles GRAY image and the data type of model input is
> > float, fmt must be @code{AV_PIX_FMT_GRAYF32}.
> > +
> >  Default value is @code{AV_PIX_FMT_RGB24}.
> >
> >  @end table
> > diff --git a/libavfilter/vf_dnn_processing.c
> > b/libavfilter/vf_dnn_processing.c
> > index ce976ec..963dd5e 100644
> > --- a/libavfilter/vf_dnn_processing.c
> > +++ b/libavfilter/vf_dnn_processing.c
> > @@ -70,10 +70,12 @@ static av_cold int init(AVFilterContext *context)
> >  {
> >      DnnProcessingContext *ctx = context->priv;
> >      int supported = 0;
> > -    // as the first step, only rgb24 and bgr24 are supported
> > +    // to support more formats
> >      const enum AVPixelFormat supported_pixel_fmts[] = {
> >          AV_PIX_FMT_RGB24,
> >          AV_PIX_FMT_BGR24,
> > +        AV_PIX_FMT_GRAY8,
> > +        AV_PIX_FMT_GRAYF32,
> >      };
> >      for (int i = 0; i < sizeof(supported_pixel_fmts) / sizeof(enum
> > AVPixelFormat); ++i) {
> >          if (supported_pixel_fmts[i] == ctx->fmt) {
> > @@ -156,14 +158,38 @@ static int config_input(AVFilterLink *inlink)
> >          return AVERROR(EIO);
> >      }
> >
> I think the filter should not check formats manually in the init function
> (unless I'm missing something), it would be best if you query for all the
> above supported formats in query_formats and later in config_input you make
> sure the expected model format matches the frame format.

I'm afraid it is too late if we find the format mismatch in function config_input.

For example, the dnn module is designed to accept BGR24 data, and the actual
format comes into config_input is RGB24 or YUV420P (we'll add yuv formats later in
supported pixel fmts) or something else such as GRAY8. We have two choices:

- return error, and the application ends.
  This is not what we want.
- return no_error, and do format conversion at the beginning of function filter_frame.
  It makes this filter complex, and our implementation for the conversion might not be the best optimized.
  My idea is to keep this filter simple. And the users can choose what they want to do conversion in the filters graph.

Regarding the below description in doc/filters.texi, the reason is that we do not have format such as rgbf32 and bgrf32, we have to do conversion within this filter.
"If the model handles RGB (or BGR) image and the data type of model input is float, fmt must be @code{AV_PIX_FMT_RGB24} (or @code{AV_PIX
_FMT_BGR24}, and this filter will do data type conversion internally."

Consider the filter vf_dnn_analytic in my plan, I think the conversion is necessary since mostly a scale down is needed.

> 
> 
> > -    if (model_input.channels != 3) {
> > -        av_log(ctx, AV_LOG_ERROR, "the model requires input channels
> > %d\n",
> > -                                   model_input.channels);
> > -        return AVERROR(EIO);
> > -    }
> > -    if (model_input.dt != DNN_FLOAT && model_input.dt != DNN_UINT8) {
> > -        av_log(ctx, AV_LOG_ERROR, "only support dnn models with input
> > data type as float32 and uint8.\n");
> > -        return AVERROR(EIO);
> > +    if (ctx->fmt == AV_PIX_FMT_RGB24 || ctx->fmt == AV_PIX_FMT_BGR24)
> {
> > +        if (model_input.channels != 3) {
> > +            av_log(ctx, AV_LOG_ERROR, "channel number 3 is required,
> but
> > the actual channel number is %d\n",
> > +                                       model_input.channels);
> > +            return AVERROR(EIO);
> > +        }
> > +        if (model_input.dt != DNN_FLOAT && model_input.dt !=
> DNN_UINT8) {
> > +            av_log(ctx, AV_LOG_ERROR, "only support dnn models with
> input
> > data type as float32 and uint8.\n");
> > +            return AVERROR(EIO);
> > +        }
> > +    } else if (ctx->fmt == AV_PIX_FMT_GRAY8) {
> > +        if (model_input.channels != 1) {
> > +            av_log(ctx, AV_LOG_ERROR, "channel number 1 is required,
> but
> > the actual channel number is %d\n",
> > +                                       model_input.channels);
> > +            return AVERROR(EIO);
> > +        }
> > +        if (model_input.dt != DNN_UINT8) {
> > +            av_log(ctx, AV_LOG_ERROR, "only support dnn models with
> input
> > data type as uint8.\n");
> > +            return AVERROR(EIO);
> > +        }
> > +    } else if (ctx->fmt == AV_PIX_FMT_GRAYF32) {
> > +        if (model_input.channels != 1) {
> > +            av_log(ctx, AV_LOG_ERROR, "channel number 1 is required,
> but
> > the actual channel number is %d\n",
> > +                                       model_input.channels);
> > +            return AVERROR(EIO);
> > +        }
> > +        if (model_input.dt != DNN_FLOAT) {
> > +            av_log(ctx, AV_LOG_ERROR, "only support dnn models with
> input
> > data type as float.\n");
> > +            return AVERROR(EIO);
> > +        }
> > +    } else {
> > +        av_assert0(!"should not reach here.");
> >      }
> >
> General comment on the above and following chained ifs testing pixel
> formats, personally,  using switch(fmt) seems more readable.

good point, I'll refine the code.