[FFmpeg-devel] [PATCH] add support for ROI-based encoding

Mon Dec 10 05:09:56 EET 2018

> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
> Of Mark Thompson
> Sent: Monday, December 10, 2018 3:21 AM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH] add support for ROI-based encoding
> 
> On 05/12/2018 09:58, Guo, Yejun wrote:
> > this patch is not ask for merge, it is more to get a feature feedback.
> >
> > The encoders such as libx264 support different QPs offset for
> > different MBs, it makes possible for ROI-based encoding. It makes
> > sense to add support within ffmpeg to generate/accept ROI infos and pass
> into encoders.
> >
> > Typical usage: After AVFrame is decoded, a ffmpeg filter or user's
> > code generates ROI info for that frame, and the encoder finally does
> > the ROI-based encoding. And so I choose to maintain the ROI info
> > within AVFrame struct.
> >
> > TODO:
> > - remove code in vf_scale.c, it is just an example to generate ROI
> > info
> > - use AVBufferRef instead of current implementation within AVFrame
> struct.
> > - add other encoders support
> >
> > Signed-off-by: Guo, Yejun <yejun.guo at intel.com>
> > ---
> >  libavcodec/libx264.c   | 35 +++++++++++++++++++++++++++++++++++
> >  libavfilter/vf_scale.c |  8 ++++++++
> >  libavutil/frame.c      |  9 +++++++++
> >  libavutil/frame.h      | 14 ++++++++++++++
> >  4 files changed, 66 insertions(+)
> 
> This approach seems reasonable to me; some thoughts below.

yeah, :)

> 
> > ...
> > diff --git a/libavutil/frame.c b/libavutil/frame.c index
> > 9b3fb13..9c38bdd 100644
> > --- a/libavutil/frame.c
> > +++ b/libavutil/frame.c
> > @@ -425,6 +425,15 @@ FF_DISABLE_DEPRECATION_WARNINGS
> > FF_ENABLE_DEPRECATION_WARNINGS  #endif
> >
> > +    dst->nb_rois = src->nb_rois;
> > +    for (int i = 0; i < dst->nb_rois; ++i) {
> > +        dst->rois[i].top = src->rois[i].top;
> > +        dst->rois[i].bottom = src->rois[i].bottom;
> > +        dst->rois[i].left = src->rois[i].left;
> > +        dst->rois[i].right = src->rois[i].right;
> > +        dst->rois[i].qoffset = src->rois[i].qoffset;
> > +    }
> > +
> >      av_buffer_unref(&dst->opaque_ref);
> >      av_buffer_unref(&dst->private_ref);
> >      if (src->opaque_ref) {
> > diff --git a/libavutil/frame.h b/libavutil/frame.h index
> > 66f27f4..b245a90 100644
> > --- a/libavutil/frame.h
> > +++ b/libavutil/frame.h
> > @@ -193,6 +193,15 @@ typedef struct AVFrameSideData {
> >      AVBufferRef *buf;
> >  } AVFrameSideData;
> >
> > +
> > +typedef struct AVFrameROI {
> > +    size_t top;
> > +    size_t bottom;
> > +    size_t left;
> > +    size_t right;
> 
> Do you want some additional constraints on this?  For example, must be
> integers in every plane (so even values only for 4:2:0), or maybe even must
> match some codec-specific block size.

my thought is they are the coordinates at frame pixel level no matter what format is.
And the ROI info generator can focus on their logic, do not care the underlying codec.

If the ROI area does not exactly match with the codec-specific block size, my thought
is to just internally extend the ROI area to match. This will added into the comments.

> 
> > +    float qoffset;
> 
> Everything else uses integers here, so this should be an integer.  The
> meaning of it will probably be entirely codec-dependent.

I chose float because the interface provided by libx264 is float, see https://github.com/mirror/x264/blob/master/x264.h#L760.
And I now think we might use a more general concept (enum AVRoiQuality), to make the codec features transparent to users.

enum AVRoiQuality {
    AV_RQ_NONE = 0,
    AV_RQ_BETTER = 1,
    AV_RQ_BEST = 2,
};

Actually, my initial idea is to export more control here, but it might
give the ROI generator burden to decide this value for different codecs.
For example, the ROI might be generated by a neural network (NN), and the NN
developers mainly focus on network model, not on the codec detail. 

It is still open for me to choose which direction, I now prefer to 'enum AVRoiQuality'

> 
> > +} AVFrameROI;
> > +
> >  /**
> >   * This structure describes decoded (raw) audio or video data.
> >   *
> > @@ -556,6 +565,11 @@ typedef struct AVFrame {
> >      attribute_deprecated
> >      AVBufferRef *qp_table_buf;
> >  #endif
> > +
> > +    //TODO: AVBufferRef*
> > +    AVFrameROI rois[2];
> > +    size_t nb_rois;
> 
> This should be side-data (which is automatically refcounted) rather than
> included directly in the AVFrame.

I guess it is AVBufferRef*, I will switch to it in the final patch. I'm not familiar with it
and so chose the array method for a quick implementation.

> 
> > +
> >      /**
> >       * For hwaccel-format frames, this should be a reference to the
> >       * AVHWFramesContext describing the frame.
> >
> 
> Thanks,
> 
> - Mark
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel