[FFmpeg-devel] [PATCH V6] Add a filter implementing HDR image generation from a single exposure using deep CNNs

Guo, Yejun yejun.guo at intel.com
Wed Nov 28 09:35:48 EET 2018


Thanks for the reviews. Let me summarize the remaining unfixed issues and my plan.

- Support for more resolutions besides 1080p (comment from Vittorio Giovara, Liu Steven, Li Zhong).
I've filed an issue with tensorflow explaining the problem and a possible solution,
see https://github.com/tensorflow/tensorflow/issues/2118#issuecomment-441146241.
Until it is properly fixed in tensorflow, as a workaround I'll prepare model files
for the typical resolutions, one model file per resolution, and the user needs to
choose the model file matching the given resolution, as sketched below.
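
For example, once the extra model files are ready, selecting the matching model
would look like this (the model file names here are placeholders):

ffmpeg -i in_1080p.mp4 -vf sdr2hdr=model_filename=./hdrcnn_1920x1080.pb out.mkv
ffmpeg -i in_720p.mp4  -vf sdr2hdr=model_filename=./hdrcnn_1280x720.pb  out.mkv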

- Native mode support (comment from Pedro Arthur, Liu Steven).
There are 16 ops not supported yet; I plan to add them one by one. Another task
is a tool that writes the native model file directly, or converts it from a TF
model file; a rough sketch of the conversion idea follows.
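
As a rough sketch of the conversion idea (Python, TF 1.x API; the native file
layout below is a placeholder, since the native format tooling does not exist yet):

import numpy as np
import tensorflow as tf
from tensorflow.python.framework import tensor_util

# load the frozen TF graph and dump every Const (weight) tensor as raw
# float32; a real native-model writer would lay these out in its own format
with tf.gfile.GFile('graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with open('graph.native', 'wb') as out:
    for node in graph_def.node:
        if node.op == 'Const':
            w = tensor_util.MakeNdarray(node.attr['value'].tensor)
            np.asarray(w, dtype=np.float32).tofile(out)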

- Metadata for HDR video encoding (comment from Vittorio Giovara, Li Zhong).
I will figure out a method for it; one possible direction is sketched below.
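
One possible direction (only a sketch, nothing is settled yet): attach mastering
display side data to each output frame. The luminance numbers below are made up.

#include "libavutil/mastering_display_metadata.h"

static int attach_hdr_metadata(AVFrame* frame)
{
    AVMasteringDisplayMetadata* meta = av_mastering_display_metadata_create_side_data(frame);
    if (!meta)
        return AVERROR(ENOMEM);
    meta->min_luminance = av_make_q(5, 10000);  /* 0.0005 nit, made-up value */
    meta->max_luminance = av_make_q(1000, 1);   /* 1000 nit, made-up value */
    meta->has_luminance = 1;
    return 0;
}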


> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
> Of Guo, Yejun
> Sent: Friday, November 16, 2018 10:30 PM
> To: ffmpeg-devel at ffmpeg.org
> Subject: [FFmpeg-devel] [PATCH V6] Add a filter implementing HDR image
> generation from a single exposure using deep CNNs
> 
> See the algorithm's paper and code below.
> 
> The filter's parameters look like:
> sdr2hdr=model_filename=/path_to_tensorflow_graph.pb:out_fmt=gbrp10le
> 
> The input of the deep CNN model is RGB24, while the output is a float per
> color channel, so the filter's default output format is gbrpf32le. gbrp10le
> is also supported as output, so the rendering result can be viewed in a
> player as a reference; a quick example follows.
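> 
> For example, to eyeball the result (a sketch only; it assumes ffplay was
> built with libtensorflow, and the file names are placeholders):
> 
> ffplay -vf "sdr2hdr=model_filename=./graph.pb:out_fmt=gbrp10le" input.mp4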
> 
> To generate the model file, we need to modify the original script a little:
> - set name='y' for y_final within script at
> https://github.com/gabrieleilertsen/hdrcnn/blob/master/network.py
> - add the following code to the script at
> https://github.com/gabrieleilertsen/hdrcnn/blob/master/hdrcnn_predict.py
> 
> graph = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ["y"])
> tf.train.write_graph(graph, '.', 'graph.pb', as_text=False)
> 
> And I also uploaded the model file to
> https://drive.google.com/drive/folders/1URsRY5g-VdE-kHlP5vQoLoimMIZ-SX00?usp=sharing.
> 
> The filter only works when the tensorflow C API is available on the system.
> The native backend is not supported, since the deep CNN model contains some
> layer types other than CONV and DEPTH_TO_SPACE.
> 
> https://arxiv.org/pdf/1710.07480.pdf:
>   author       = "Eilertsen, Gabriel and Kronander, Joel and Denes, Gyorgy and
> Mantiuk, Rafał and Unger, Jonas",
>   title        = "HDR image reconstruction from a single exposure using deep
> CNNs",
>   journal      = "ACM Transactions on Graphics (TOG)",
>   number       = "6",
>   volume       = "36",
>   articleno    = "178",
>   year         = "2017"
> 
> https://github.com/gabrieleilertsen/hdrcnn
> 
> By the way, as a whole solution, metadata should also be generated from the
> SDR video so it can be encoded as an HDR video. That is not supported yet;
> this patch just focuses on this paper.
> 
> This filter accepts 8-bit frames (RGB24) and outputs 10-bit/float frames, and
> there is no reference image, so it is not feasible to use criteria such as
> PSNR or SSIM.
> 
> I chose the same method described in the paper to demonstrate the filter's
> effect: the frames before/after the filter are both reduced by 3 stops (a
> small sketch of the idea follows).
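> 
> Reducing by 3 stops scales linear values by 2^-3 = 1/8. A minimal C sketch of
> that idea (my own illustration, not code from this patch):
> 
> /* scale a linear float sample down by the given number of stops;
>  * requires <math.h>; 3 stops multiplies by 1/8 */
> static float reduce_stops(float linear, int stops)
> {
>     return linear * powf(0.5f, stops);
> }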
> 
> The native video (test.native.mp4) is created from the 7 png files at
> https://github.com/gabrieleilertsen/hdrcnn/tree/master/data (each image is
> enlarged to 1920*1080 with the extra area filled with white) with the
> command line:
> ffmpeg -f image2 -i ./img_%03d.png -c:v libx264 -preset veryslow -crf 1 test.native.mp4
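> 
> A hedged sketch of the enlarging step using the pad filter (white fill; the
> centering offsets are my assumption):
> 
> ffmpeg -i img_%03d.png -vf "pad=1920:1080:(ow-iw)/2:(oh-ih)/2:white" padded_%03d.png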
> 
> And two rgb24 videos are generated before/after the filter, both with -3
> stops, by modifying the code a little; see the video folder on the Google
> Drive (the same place where the model file is located).
> 
> For your convenience, I also dumped png files from the generated videos and
> combined the before/after pngs into single files; see the png folder on the
> Google Drive.
> 
> Signed-off-by: Guo, Yejun <yejun.guo at intel.com>
> ---
>  configure                |   1 +
>  doc/filters.texi         |  38 +++++++
>  libavfilter/Makefile     |   1 +
>  libavfilter/allfilters.c |   1 +
>  libavfilter/vf_sdr2hdr.c | 270 +++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 311 insertions(+)
>  create mode 100644 libavfilter/vf_sdr2hdr.c
> 
> diff --git a/configure b/configure
> index 9bc4cf3..08db4eb 100755
> --- a/configure
> +++ b/configure
> @@ -3447,6 +3447,7 @@ sab_filter_deps="gpl swscale"
>  scale2ref_filter_deps="swscale"
>  scale_filter_deps="swscale"
>  scale_qsv_filter_deps="libmfx"
> +sdr2hdr_filter_deps="libtensorflow"
>  select_filter_select="scene_sad"
>  sharpness_vaapi_filter_deps="vaapi"
>  showcqt_filter_deps="avcodec avformat swscale"
> diff --git a/doc/filters.texi b/doc/filters.texi
> index ab58e53..86432d9 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -14872,6 +14872,44 @@ Scale a subtitle stream (b) to match the main video (a) in size before overlaying
>  @end example
>  @end itemize
> 
> +@section sdr2hdr
> +
> +HDR image generation from a single exposure using deep CNNs with the
> +TensorFlow C library. The input format of the filter is RGB24; only a
> +resolution of 1920*1080 is supported for now, and no metadata is generated
> +for HDR video yet.
> +
> +@itemize
> +@item
> +paper: see @url{https://arxiv.org/pdf/1710.07480.pdf}
> +
> +@item
> +code with model and trained parameters: see @url{https://github.com/gabrieleilertsen/hdrcnn}
> +@end itemize
> +
> +The filter accepts the following options:
> +
> +@table @option
> +
> +@item model_filename
> +Set path to model file specifying network architecture and its
> +parameters, which can be downloaded from
> +@url{https://drive.google.com/drive/folders/1URsRY5g-VdE-kHlP5vQoLoimMIZ-SX00?usp=sharing}
> +
> +@item out_fmt
> +The data format of the filter's output.
> +
> +It accepts the following values:
> +@table @samp
> +@item gbrpf32le
> +force gbrpf32le output
> +
> +@item gbrp10le
> +force gbrp10le output
> +@end table
> +
> +Default value is @samp{gbrpf32le}.
> +
> +@end table
> +
>  @anchor{selectivecolor}
>  @section selectivecolor
> 
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index a7ebd02..7ad8250 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -366,6 +366,7 @@ OBJS-$(CONFIG_SOBEL_OPENCL_FILTER)           += vf_convolution_opencl.o opencl.o
>  OBJS-$(CONFIG_SPLIT_FILTER)                  += split.o
>  OBJS-$(CONFIG_SPP_FILTER)                    += vf_spp.o
>  OBJS-$(CONFIG_SR_FILTER)                     += vf_sr.o
> +OBJS-$(CONFIG_SDR2HDR_FILTER)                += vf_sdr2hdr.o
>  OBJS-$(CONFIG_SSIM_FILTER)                   += vf_ssim.o framesync.o
>  OBJS-$(CONFIG_STEREO3D_FILTER)               += vf_stereo3d.o
>  OBJS-$(CONFIG_STREAMSELECT_FILTER)           += f_streamselect.o framesync.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index 484b080..622f9f3 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -322,6 +322,7 @@ extern AVFilter ff_vf_scale_npp;
>  extern AVFilter ff_vf_scale_qsv;
>  extern AVFilter ff_vf_scale_vaapi;
>  extern AVFilter ff_vf_scale2ref;
> +extern AVFilter ff_vf_sdr2hdr;
>  extern AVFilter ff_vf_select;
>  extern AVFilter ff_vf_selectivecolor;
>  extern AVFilter ff_vf_sendcmd;
> diff --git a/libavfilter/vf_sdr2hdr.c b/libavfilter/vf_sdr2hdr.c
> new file mode 100644
> index 0000000..fcee404
> --- /dev/null
> +++ b/libavfilter/vf_sdr2hdr.c
> @@ -0,0 +1,270 @@
> +/*
> + * Copyright (c) 2018 Guo Yejun
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> + * 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * Filter implementing HDR image generation from a single exposure using deep CNNs.
> + * https://arxiv.org/pdf/1710.07480.pdf
> + */
> +
> +#include "avfilter.h"
> +#include "formats.h"
> +#include "internal.h"
> +#include "libavutil/opt.h"
> +#include "libavutil/qsort.h"
> +#include "libavformat/avio.h"
> +#include "libswscale/swscale.h"
> +#include "dnn_interface.h"
> +#include <math.h>
> +
> +typedef struct SDR2HDRContext {
> +    const AVClass *class;
> +
> +    char* model_filename;
> +    enum AVPixelFormat out_fmt;
> +    DNNModule* dnn_module;
> +    DNNModel* model;
> +    DNNData input, output;
> +} SDR2HDRContext;
> +
> +#define OFFSET(x) offsetof(SDR2HDRContext, x)
> +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM
> +
> +static const AVOption sdr2hdr_options[] = {
> +    { "model_filename", "path to model file specifying network architecture
> and its parameters", OFFSET(model_filename), AV_OPT_TYPE_STRING,
> {.str=NULL}, 0, 0, FLAGS },
> +    { "out_fmt", "the data format of the filter's output, it could be gbrpf32le
> [default] or gbrp10le", OFFSET(out_fmt), AV_OPT_TYPE_PIXEL_FMT,
> {.i64=AV_PIX_FMT_GBRPF32LE}, AV_PIX_FMT_NONE, AV_PIX_FMT_NB - 1,
> FLAGS },
> +    { NULL }
> +};
> +
> +AVFILTER_DEFINE_CLASS(sdr2hdr);
> +
> +static av_cold int init(AVFilterContext* context)
> +{
> +    SDR2HDRContext* ctx = context->priv;
> +
> +    if (ctx->out_fmt != AV_PIX_FMT_GBRPF32LE && ctx->out_fmt != AV_PIX_FMT_GBRP10LE) {
> +        av_log(context, AV_LOG_ERROR, "unsupported output format\n");
> +        return AVERROR(ENOSYS);
> +    }
> +
> +    ctx->dnn_module = ff_get_dnn_module(DNN_TF);
> +    if (!ctx->dnn_module){
> +        av_log(context, AV_LOG_ERROR, "could not create DNN module for tensorflow backend\n");
> +        return AVERROR(ENOMEM);
> +    }
> +    if (!ctx->model_filename){
> +        av_log(context, AV_LOG_ERROR, "model file for network was not specified\n");
> +        return AVERROR(EIO);
> +    }
> +    if (!ctx->dnn_module->load_model) {
> +        av_log(context, AV_LOG_ERROR, "load_model for network was not specified\n");
> +        return AVERROR(EIO);
> +    }
> +    ctx->model = (ctx->dnn_module->load_model)(ctx->model_filename);
> +    if (!ctx->model){
> +        av_log(context, AV_LOG_ERROR, "could not load DNN model\n");
> +        return AVERROR(EIO);
> +    }
> +    return 0;
> +}
> +
> +static int query_formats(AVFilterContext* context)
> +{
> +    const enum AVPixelFormat in_formats[] = {AV_PIX_FMT_RGB24,
> +                                             AV_PIX_FMT_NONE};
> +    enum AVPixelFormat out_formats[2];
> +    SDR2HDRContext* ctx = context->priv;
> +    AVFilterFormats* formats_list;
> +    int ret = 0;
> +
> +    formats_list = ff_make_format_list(in_formats);
> +    if ((ret = ff_formats_ref(formats_list, &context->inputs[0]->out_formats)) < 0)
> +        return ret;
> +
> +    out_formats[0] = ctx->out_fmt;
> +    out_formats[1] = AV_PIX_FMT_NONE;
> +    formats_list = ff_make_format_list(out_formats);
> +    if ((ret = ff_formats_ref(formats_list, &context->outputs[0]->in_formats)) < 0)
> +        return ret;
> +
> +    return 0;
> +}
> +
> +static int config_props(AVFilterLink* inlink)
> +{
> +    AVFilterContext* context = inlink->dst;
> +    SDR2HDRContext* ctx = context->priv;
> +    AVFilterLink* outlink = context->outputs[0];
> +    DNNReturnType result;
> +
> +    // the dnn model is tied to one resolution due to the deconv layer of tensorflow,
> +    // so only 1920*1080 is supported for now, hence the magic numbers in this file
> +    if (inlink->w != 1920 || inlink->h != 1080) {
> +        av_log(context, AV_LOG_ERROR, "only frame size 1920*1080 is supported\n");
> +        return AVERROR(ENOSYS);
> +    }
> +
> +    ctx->input.width = inlink->w;
> +    ctx->input.height = FFALIGN(inlink->h, 32);  // the model requires the height to be a multiple of 32
> +    ctx->input.channels = 3;
> +
> +    result = (ctx->model->set_input_output)(ctx->model->model, &ctx->input, &ctx->output);
> +    if (result != DNN_SUCCESS){
> +        av_log(context, AV_LOG_ERROR, "could not set input and output for the model\n");
> +        return AVERROR(EIO);
> +    }
> +
> +    memset(ctx->input.data, 0, ctx->input.channels * ctx->input.width * ctx->input.height * sizeof(float));
> +    outlink->h = inlink->h;
> +    outlink->w = inlink->w;
> +    return 0;
> +}
> +
> +static float qsort_comparison_function_float(const void *a, const void *b)
> +{
> +    return *(const float *)a - *(const float *)b;
> +}
> +
> +static int filter_frame(AVFilterLink* inlink, AVFrame* in)
> +{
> +    DNNReturnType dnn_result = DNN_SUCCESS;
> +    AVFilterContext* context = inlink->dst;
> +    SDR2HDRContext* ctx = context->priv;
> +    AVFilterLink* outlink = context->outputs[0];
> +    AVFrame* out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> +    int total_pixels = in->height * in->width;
> +
> +    if (!out){
> +        av_log(context, AV_LOG_ERROR, "could not allocate memory for output frame\n");
> +        av_frame_free(&in);
> +        return AVERROR(ENOMEM);
> +    }
> +
> +    av_frame_copy_props(out, in);
> +
> +    // normalize the packed 8-bit RGB input to floats in [0, 1]
> +    for (int i = 0; i < in->linesize[0] * in->height; ++i) {
> +        ctx->input.data[i] = in->data[0][i] / 255.0f;
> +    }
> +
> +    dnn_result = (ctx->dnn_module->execute_model)(ctx->model);
> +    if (dnn_result != DNN_SUCCESS){
> +        av_log(context, AV_LOG_ERROR, "failed to execute loaded model\n");
> +        av_frame_free(&in);
> +        av_frame_free(&out);
> +        return AVERROR(EIO);
> +    }
> +
> +    if (ctx->out_fmt == AV_PIX_FMT_GBRPF32LE) {
> +        float* outg = (float*)out->data[0];
> +        float* outb = (float*)out->data[1];
> +        float* outr = (float*)out->data[2];
> +        // split the packed float output of the model into the planar frame
> +        for (int i = 0; i < total_pixels; ++i) {
> +            float r = ctx->output.data[i*3];
> +            float g = ctx->output.data[i*3+1];
> +            float b = ctx->output.data[i*3+2];
> +            outr[i] = r;
> +            outg[i] = g;
> +            outb[i] = b;
> +        }
> +    } else if (ctx->out_fmt == AV_PIX_FMT_GBRP10LE) {
> +        // here, we just use a rough mapping to the 10bit contents.
> +        // meta data generation for HDR video encoding is not supported yet.
> +        float* converted_data = (float*)av_malloc(total_pixels * 3 * sizeof(float));
> +        int16_t* outg = (int16_t*)out->data[0];
> +        int16_t* outb = (int16_t*)out->data[1];
> +        int16_t* outr = (int16_t*)out->data[2];
> +        float max = 1.0f;
> +
> +        if (!converted_data) {
> +            av_frame_free(&in);
> +            av_frame_free(&out);
> +            return AVERROR(ENOMEM);
> +        }
> +
> +        // gamma-like compression with sqrt, tracking the maximum value
> +        for (int i = 0; i < total_pixels * 3; ++i) {
> +            float d = ctx->output.data[i];
> +            d = sqrtf(d);
> +            converted_data[i] = d;
> +            max = FFMAX(d, max);
> +        }
> +
> +        if (max > 1.0f) {
> +            AV_QSORT(converted_data, total_pixels * 3, float, qsort_comparison_function_float);
> +            // clip the brightest 0.5% of the values
> +            max = converted_data[(int)(total_pixels * 3 * 0.995)];
> +            max = FFMAX(max, 1.0f);
> +
> +            for (int i = 0; i < total_pixels * 3; ++i) {
> +                float d = ctx->output.data[i];
> +                d = sqrtf(d);
> +                d = FFMIN(d, max);
> +                converted_data[i] = d;
> +            }
> +        }
> +
> +        // rescale to the 10-bit range [0, 1023]
> +        for (int i = 0; i < total_pixels; ++i) {
> +            float r = converted_data[i*3];
> +            float g = converted_data[i*3+1];
> +            float b = converted_data[i*3+2];
> +            outr[i] = r / max * 1023;
> +            outg[i] = g / max * 1023;
> +            outb[i] = b / max * 1023;
> +        }
> +
> +        av_free(converted_data);
> +    } else {
> +        av_assert0(!"should not reach here");
> +    }
> +
> +    av_frame_free(&in);
> +    return ff_filter_frame(outlink, out);
> +}
> +
> +static av_cold void uninit(AVFilterContext* context)
> +{
> +    SDR2HDRContext* ctx = context->priv;
> +
> +    if (ctx->dnn_module){
> +        (ctx->dnn_module->free_model)(&ctx->model);
> +        av_freep(&ctx->dnn_module);
> +    }
> +}
> +
> +static const AVFilterPad sdr2hdr_inputs[] = {
> +    {
> +        .name         = "default",
> +        .type         = AVMEDIA_TYPE_VIDEO,
> +        .config_props = config_props,
> +        .filter_frame = filter_frame,
> +    },
> +    { NULL }
> +};
> +
> +static const AVFilterPad sdr2hdr_outputs[] = {
> +    {
> +        .name = "default",
> +        .type = AVMEDIA_TYPE_VIDEO,
> +    },
> +    { NULL }
> +};
> +
> +AVFilter ff_vf_sdr2hdr = {
> +    .name          = "sdr2hdr",
> +    .description   = NULL_IF_CONFIG_SMALL("HDR image generation from a single exposure using deep CNNs."),
> +    .priv_size     = sizeof(SDR2HDRContext),
> +    .init          = init,
> +    .uninit        = uninit,
> +    .query_formats = query_formats,
> +    .inputs        = sdr2hdr_inputs,
> +    .outputs       = sdr2hdr_outputs,
> +    .priv_class    = &sdr2hdr_class,
> +    .flags         = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC,
> +};
> --
> 2.7.4
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

