[FFmpeg-devel] [PATCH] delogo filter: new "uglarm" interpolation mode added

Paul B Mahol onemda at gmail.com
Sun Dec 30 15:53:03 EET 2018


On 12/30/18, Nicolas George <george at nsup.org> wrote:
> Uwe Freese (2018-12-29):
>> The downside is that the term "power" (or "strength") would be more
>> generic
>> and would probably be better reusable for other modes.
>>
>> What do you think? Is "exponent" now ok, or should I change it back to
>> "power", "strength" or something alike?
>
> I would say: do not bother. This can be decided when a new mode is
> added, and options can have several names.
>
> By the way, you would not be interested in developing a logo-detection
> filter, per chance? Something that detects is a certain rectangle is
> likely to contain a logo and computes its bounding box.
>
>> Again, I would be glad about comments and reviews. - Let me know what
>> should
>> be changed and I'll create a new patch in some days.
>
> Here is a more in-depth review from me:
>
>> >From 83e79abb3311eb2dd92c63e8e15e0476af2f2891 Mon Sep 17 00:00:00 2001
>> From: breaker27 <mail at uwe-freese.de>
>> Date: Wed, 26 Dec 2018 18:16:48 +0100
>> Subject: [PATCH] avfilter/vf_delogo: add uglarm interpolation mode
>>
>> ---
>>  doc/filters.texi        |  28 +++++++
>>  libavfilter/vf_delogo.c | 189
>> +++++++++++++++++++++++++++++++++++++++++++++---
>>  2 files changed, 208 insertions(+), 9 deletions(-)
>>
>> diff --git a/doc/filters.texi b/doc/filters.texi
>> index 65ce25bc18..792560ad79 100644
>> --- a/doc/filters.texi
>> +++ b/doc/filters.texi
>> @@ -7952,6 +7952,34 @@ Specify the thickness of the fuzzy edge of the
>> rectangle (added to
>>  deprecated, setting higher values should no longer be necessary and
>>  is not recommended.
>>
>> + at item mode
>> +Specify the interpolation mode used to remove the logo.
>> +It accepts the following values:
>> + at table @option
>> + at item xy
>> +Only the border pixels in straight x and y direction are considered
>> +to replace any logo pixel. This mode tends to flicker and to show
>> +horizontal and vertical lines.
>> + at item uglarm
>> +Consider the whole border for every logo pixel to replace. This mode
>> +uses the distance raised to the power of the given @var{exponent} as
>> +weight that decides to which amount every border pixel is taken into
>> +account. This results in a more blurred area, which tends to be less
>> +distracting. The uglarm mode was first implemented in VirtualDub's
>> +LogoAway filter and is also known from ffdshow and is the
>> +abbreviation for "Uwe's Great LogoAway Remove Mode".
>> + at end table
>> +The default value is @code{xy}.
>> +
>> + at item exponent
>> +Specify the exponent used to calculate the weight out of the
>> +distance in @code{uglarm} mode (weight = distance ^ @var{exponent}).
>> +The value @code{0} results in a logo area which has
>> +the same average color everywhere. The higher the value, the more
>> +relevant a near border pixel will get, meaning that the borders of
>> +the logo area are more similar to the surrounding pixels. The default
>> +value is @code{3}.
>> +
>>  @item show
>>  When set to 1, a green rectangle is drawn on the screen to simplify
>>  finding the right @var{x}, @var{y}, @var{w}, and @var{h} parameters.
>> diff --git a/libavfilter/vf_delogo.c b/libavfilter/vf_delogo.c
>> index 065d093641..dcb0366e63 100644
>> --- a/libavfilter/vf_delogo.c
>> +++ b/libavfilter/vf_delogo.c
>> @@ -2,6 +2,7 @@
>>   * Copyright (c) 2002 Jindrich Makovicka <makovick at gmail.com>
>>   * Copyright (c) 2011 Stefano Sabatini
>>   * Copyright (c) 2013, 2015 Jean Delvare <jdelvare at suse.com>
>> + * Copyright (c) 2018 Uwe Freese <mail at uwe-freese.de>
>>   *
>>   * This file is part of FFmpeg.
>>   *
>> @@ -25,12 +26,16 @@
>>   * A very simple tv station logo remover
>>   * Originally imported from MPlayer libmpcodecs/vf_delogo.c,
>>   * the algorithm was later improved.
>> + * The "UGLARM" mode was first implemented 2001 by Uwe Freese for
>> Virtual
>> + * Dub's LogoAway filter (by Krzysztof Wojdon), taken over into
>> ffdshow's
>> + * logoaway filter (by Milan Cutka), from where it was ported to ffmpeg.
>>   */
>>
>>  #include "libavutil/common.h"
>>  #include "libavutil/imgutils.h"
>>  #include "libavutil/opt.h"
>>  #include "libavutil/pixdesc.h"
>
>> +#include "libavutil/avassert.h"
>
> We try to keep the includes in alphabetical order.
>
>>  #include "avfilter.h"
>>  #include "formats.h"
>>  #include "internal.h"
>> @@ -50,6 +55,10 @@
>>   * @param logo_w width of the logo
>>   * @param logo_h height of the logo
>>   * @param band   the size of the band around the processed area
>> + * @param *uglarmtable      pointer to weight table in UGLARM
>> interpolation mode,
>> + *                          zero when x-y mode is used
>> + * @param *uglarmweightsum  pointer to weight sum table in UGLARM
>> interpolation mode,
>> + *                          zero when x-y mode is used
>>   * @param show   show a rectangle around the processed area, useful for
>>   *               parameters tweaking
>>   * @param direct if non-zero perform in-place processing
>> @@ -58,7 +67,8 @@ static void apply_delogo(uint8_t *dst, int
>> dst_linesize,
>>                           uint8_t *src, int src_linesize,
>>                           int w, int h, AVRational sar,
>>                           int logo_x, int logo_y, int logo_w, int logo_h,
>> -                         unsigned int band, int show, int direct)
>> +                         unsigned int band, double *uglarmtable,
>> +                         double *uglarmweightsum, int show, int direct)
>>  {
>>      int x, y;
>>      uint64_t interp, weightl, weightr, weightt, weightb, weight;
>> @@ -89,6 +99,7 @@ static void apply_delogo(uint8_t *dst, int
>> dst_linesize,
>>      dst += (logo_y1 + 1) * dst_linesize;
>>      src += (logo_y1 + 1) * src_linesize;
>>
>> +    if (!uglarmtable) {
>>      for (y = logo_y1+1; y < logo_y2; y++) {
>>          left_sample = topleft[src_linesize*(y-logo_y1)]   +
>>                        topleft[src_linesize*(y-logo_y1-1)] +
>> @@ -151,12 +162,125 @@ static void apply_delogo(uint8_t *dst, int
>> dst_linesize,
>>          dst += dst_linesize;
>>          src += src_linesize;
>>      }
>> +    } else {
>> +        int bx, by;
>> +        double interpd;
>> +
>
>> +        for (y = logo_y1 + 1; y < logo_y2; y++) {
>> +            for (x = logo_x1 + 1,
>> +                xdst = dst + logo_x1 + 1,
>> +                xsrc = src + logo_x1 + 1; x < logo_x2; x++, xdst++,
>> xsrc++) {
>
> I think a line break after the semicolons would make this easier to
> understand.
>
> Also, since x and y always used subtracted to logo_x1 or logo_y1, I
> think changing the bounds of the loop would make the rest of the code
> more readable:
>
> 	for (x = 1; x < logo_w_minus_one; x++)
>
>> +
>> +                if (show && (y == logo_y1 + 1 || y == logo_y2 - 1 ||
>> +                            x == logo_x1 + 1 || x == logo_x2 - 1)) {
>
> Nit: the alignment is off.
>
>> +                    *xdst = 0;
>> +                    continue;
>> +                }
>> +
>> +                interpd = 0;
>> +
>
>> +                for (bx = 0; bx < logo_w; bx++) {
>> +                    interpd += topleft[bx] *
>> +                        uglarmtable[abs(bx - (x - logo_x1)) + (y -
>> logo_y1) * (logo_w - 1)];
>> +                    interpd += botleft[bx] *
>> +                        uglarmtable[abs(bx - (x - logo_x1)) + (logo_h -
>> (y - logo_y1) - 1) * (logo_w - 1)];
>> +                }
>
> This can be optimized, and since it is the inner loop of the filter, I
> think it is worth it. You can declare a pointer that will stay the same
> for the whole y loop:
>
> 	double *xweight = uglarmtable + (y - logo_y1) * (logo_w - 1);
>
> and then use it in that loop:
>
> 	interpd += ... * xweight[abs(bx - (x - logo_x1))];
>
> It avoids a lot of multiplications.
>
>> +
>> +                for (by = 1; by < logo_h - 1; by++) {
>> +                    interpd += topleft[by * src_linesize] *
>> +                        uglarmtable[(x - logo_x1) + abs(by - (y -
>> logo_y1)) * (logo_w - 1)];
>> +                    interpd += topleft[by * src_linesize + (logo_w - 1)]
>> *
>> +                        uglarmtable[logo_w - (x - logo_x1) - 1 + abs(by -
>> (y - logo_y1)) * (logo_w - 1)];
>> +                }
>
> This one is more tricky to optimize, because of the abs() within the
> multiplication, but it can be done by splitting the loop in two:
>
> 	for (by = 1; by < y; by++) {
> 	    interpd += ... * yweight[x - logo_x1];
> 	    yweight -= logo_w_minus_one;
> 	}
> 	for (; by < logo_h_minus_one; by++) {
> 	    interpd += ... * yweight[x - logo_x1];
> 	    yweight += logo_w_minus_one;
> 	}
> 	av_assert2(yweight == the_correct_value);
>
> In fact, I think even the x loop would be a little more readable if it
> was split in two like that.
>
>> +
>> +                interp = (uint64_t)(interpd /
>> +                    uglarmweightsum[(x - logo_x1) - 1 + (y - logo_y1 - 1)
>> * (logo_w - 2)]);
>
> The cast to uint64_t seems suspicious. Can you explain?
>
>> +                *xdst = interp;
>> +            }
>> +
>> +            dst += dst_linesize;
>> +            src += src_linesize;
>> +        }
>> +    }
>> +}
>> +
>> +/**
>> + * Calculate the lookup tables to be used in UGLARM interpolation mode.
>> + *
>> + * @param *uglarmtable      Pointer to table containing weights for each
>> possible
>> + *                          diagonal distance between a border pixel and
>> an inner
>> + *                          logo pixel.
>> + * @param *uglarmweightsum  Pointer to a table containing the weight sum
>> to divide
>> + *                          by for each pixel within the logo area.
>> + * @param sar               The sar to take into account when calculating
>> lookup
>> + *                          tables.
>> + * @param logo_w            width of the logo
>> + * @param logo_h            height of the logo
>> + * @param exponent          exponent used in uglarm interpolation
>> + */
>> +static void calc_uglarm_tables(double *uglarmtable, double
>> *uglarmweightsum,
>> +                               AVRational sar, int logo_w, int logo_h,
>> +                               float exponent)
>> +{
>
>> +    double aspect = (double)sar.num / sar.den;
>
> Tiny optimization:
>
> 	double aspect2 = aspect * aspect;
>
> for use later.
>
>> +    int x, y;
>
>
>> +
>> +    /* uglarmtable will contain a weight for each possible diagonal
>> distance
>> +     * between a border pixel and an inner logo pixel. The maximum
>> distance in
>> +     * each direction between border and an inner pixel can be logo_w -
>> 1. The
>> +     * weight of a border pixel which is x,y pixels away is stored at
>> position
>> +     * x + y * (logo_w - 1). */
>> +    for (y = 0; y < logo_h - 1; y++)
>> +        for (x = 0; x < logo_w - 1; x++) {
>> +            if (x + y != 0) {
>
>> +                double d = pow(sqrt(x * x * aspect * aspect + y * y),
>> exponent);
>
> 	pow(sqrt(x * x * aspect2 + y * y), exponent / 2);
>
>> +                uglarmtable[x + y * (logo_w - 1)] = 1.0 / d;
>> +            } else {
>> +                uglarmtable[x + y * (logo_w - 1)] = 1.0;
>> +            }
>> +        }
>> +
>> +    /* uglarmweightsum will contain the sum of all weights which is used
>> when
>> +     * an inner pixel of the logo at position x,y is calculated out of
>> the
>> +     * border pixels. The aggregated value has to be divided by that. The
>> value
>> +     * to use for the inner 1-based logo position x,y is stored at
>> +     * (x - 1) + (y - 1) * (logo_w - 2). */
>> +    for (y = 1; y < logo_h - 1; y++)
>> +        for (x = 1; x < logo_w - 1; x++) {
>> +            double weightsum = 0;
>> +
>
>> +            for (int bx = 0; bx < logo_w; bx++) {
>
> Our coding style forbids declaring variables on the fly like that. Same
> below.

This is simply not more true.
Look at developer guidelines for this project.

>
>> +                /* top border */
>> +                weightsum += uglarmtable[abs(bx - x) + y * (logo_w -
>> 1)];
>> +                /* bottom border */
>> +                weightsum += uglarmtable[abs(bx - x) + (logo_h - y - 1) *
>> (logo_w - 1)];
>> +            }
>> +
>> +            for (int by = 1; by < logo_h - 1; by++) {
>> +                /* left border */
>> +                weightsum += uglarmtable[x + abs(by - y) * (logo_w -
>> 1)];
>> +                /* right border */
>> +                weightsum += uglarmtable[(logo_w - x - 1) + abs(by - y) *
>> (logo_w - 1)];
>> +            }
>> +
>> +            uglarmweightsum[(x - 1) + (y - 1) * (logo_w - 2)] =
>> weightsum;
>> +        }
>>  }
>>
>> +enum mode {
>> +    MODE_XY,
>> +    MODE_UGLARM,
>> +    MODE_NB
>> +};
>> +
>
>> +#define MAX_PLANES 10
>
> You could use FF_ARRAY_ELEMS(AVPixFmtDescriptor.comp), and the
> consistency would be guaranteed. Well, that syntax is not valid, but it
> can be expressed:
>
> #define MAX_PLANES FF_ARRAY_ELEMS(((AVPixFmtDescriptor *)0)->comp)
>
> But I have a better suggestion:
>
> #define MAX_SUB 2
>
> 	double uglarmtable[MAX_SUB + 1][MAX_SUB + 1]
>
> and use uglarmtable[hsub][vsub]. That way, for YUVA420P for example, the
> table will be computed only once for Y and A and once for U and V.
>
> The assert is still needed with that solution, though.
>
>> +
>>  typedef struct DelogoContext {
>>      const AVClass *class;
>> -    int x, y, w, h, band, show;
>> -}  DelogoContext;
>> +    int x, y, w, h, band, mode, show;
>> +    float exponent;
>> +    double *uglarmtable[MAX_PLANES], *uglarmweightsum[MAX_PLANES];
>> +} DelogoContext;
>>
>>  #define OFFSET(x) offsetof(DelogoContext, x)
>>  #define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
>> @@ -171,6 +295,10 @@ static const AVOption delogo_options[]= {
>>      { "band", "set delogo area band size", OFFSET(band), AV_OPT_TYPE_INT,
>> { .i64 =  0 },  0, INT_MAX, FLAGS },
>>      { "t",    "set delogo area band size", OFFSET(band), AV_OPT_TYPE_INT,
>> { .i64 =  0 },  0, INT_MAX, FLAGS },
>>  #endif
>> +    { "mode", "set the interpolation mode",OFFSET(mode), AV_OPT_TYPE_INT,
>> { .i64 = MODE_XY }, MODE_XY, MODE_NB-1, FLAGS, "mode" },
>> +        { "xy",     "use pixels in straight x and y direction",
>> OFFSET(mode), AV_OPT_TYPE_CONST, { .i64 = MODE_XY },     0, 0, FLAGS,
>> "mode" },
>> +        { "uglarm", "UGLARM mode, use full border",
>> OFFSET(mode), AV_OPT_TYPE_CONST, { .i64 = MODE_UGLARM }, 0, 0, FLAGS,
>> "mode" },
>> +    { "exponent","exponent used for UGLARM interpolation",
>> OFFSET(exponent), AV_OPT_TYPE_FLOAT, { .dbl = 3.0 }, 0, 6, FLAGS },
>>      { "show", "show delogo area",          OFFSET(show),
>> AV_OPT_TYPE_BOOL,{ .i64 =  0 },  0, 1,       FLAGS },
>>      { NULL }
>>  };
>> @@ -215,8 +343,8 @@ static av_cold int init(AVFilterContext *ctx)
>>  #else
>>      s->band = 1;
>>  #endif
>> -    av_log(ctx, AV_LOG_VERBOSE, "x:%d y:%d, w:%d h:%d band:%d
>> show:%d\n",
>> -           s->x, s->y, s->w, s->h, s->band, s->show);
>> +    av_log(ctx, AV_LOG_VERBOSE, "x:%d y:%d, w:%d h:%d band:%d mode:%d
>> exponent:%f show:%d\n",
>> +           s->x, s->y, s->w, s->h, s->band, s->mode, s->exponent,
>> s->show);
>>
>>      s->w += s->band*2;
>>      s->h += s->band*2;
>> @@ -226,6 +354,18 @@ static av_cold int init(AVFilterContext *ctx)
>>      return 0;
>>  }
>>
>> +static av_cold void uninit(AVFilterContext *ctx)
>> +{
>> +    DelogoContext *s = ctx->priv;
>> +
>
>> +    if (s->mode == MODE_UGLARM) {
>
> No need to test, we can free anyway.
>
>> +        for (int plane = 0; plane < MAX_PLANES; plane++) {
>
> Forbidden variable declaration.
>
>> +            av_freep(&s->uglarmtable[plane]);
>> +            av_freep(&s->uglarmweightsum[plane]);
>> +        }
>> +    }
>> +}
>> +
>>  static int config_input(AVFilterLink *inlink)
>>  {
>>      DelogoContext *s = inlink->dst->priv;
>> @@ -270,20 +410,50 @@ static int filter_frame(AVFilterLink *inlink,
>> AVFrame *in)
>>      if (!sar.num)
>>          sar.num = sar.den = 1;
>>
>> +    if (s->mode == MODE_UGLARM)
>> +        av_assert0(desc->nb_components <= MAX_PLANES);
>> +
>>      for (plane = 0; plane < desc->nb_components; plane++) {
>>          int hsub = plane == 1 || plane == 2 ? hsub0 : 0;
>>          int vsub = plane == 1 || plane == 2 ? vsub0 : 0;
>>
>
>> +        /* Up and left borders were rounded down, inject lost bits
>> +        * into width and height to avoid error accumulation */
>
> Not: the alignment is off.
>
>> +        int logo_w = AV_CEIL_RSHIFT(s->w + (s->x & ((1<<hsub)-1)),
>> hsub);
>> +        int logo_h = AV_CEIL_RSHIFT(s->h + (s->y & ((1<<vsub)-1)),
>> vsub);
>> +
>> +        /* Init lookup tables once */
>> +        if (s->mode == MODE_UGLARM) {
>> +            if (!s->uglarmtable[plane]) {
>> +                s->uglarmtable[plane] =
>
>> +                    (double*)av_malloc((logo_w - 1) * (logo_h - 1) *
>> sizeof(double));
>
> No casting of malloc returns in C, this is a c++ thing. Also, use
> av_malloc_array().
>
>> +
>> +                if (!s->uglarmtable[plane]) {
>> +                    return AVERROR(ENOMEM);
>> +                }
>
> The braces could be dropped.
>
>> +
>> +                s->uglarmweightsum[plane] =
>> +                    (double*)av_malloc((logo_w - 2) * (logo_h - 2) *
>> sizeof(double));
>> +
>> +                if (!s->uglarmweightsum[plane]) {
>> +                    return AVERROR(ENOMEM);
>> +                }
>
> Same as above, of course.
>
>> +
>> +                calc_uglarm_tables(s->uglarmtable[plane],
>> +                                s->uglarmweightsum[plane],
>> +                                sar, logo_w, logo_h, s->exponent);
>> +            }
>> +        }
>> +
>
> I think this whole block could and should be moved into config_input().
>
>>          apply_delogo(out->data[plane], out->linesize[plane],
>>                       in ->data[plane], in ->linesize[plane],
>>                       AV_CEIL_RSHIFT(inlink->w, hsub),
>>                       AV_CEIL_RSHIFT(inlink->h, vsub),
>>                       sar, s->x>>hsub, s->y>>vsub,
>> -                     /* Up and left borders were rounded down, inject
>> lost bits
>> -                      * into width and height to avoid error accumulation
>> */
>> -                     AV_CEIL_RSHIFT(s->w + (s->x & ((1<<hsub)-1)),
>> hsub),
>> -                     AV_CEIL_RSHIFT(s->h + (s->y & ((1<<vsub)-1)),
>> vsub),
>> +                     logo_w, logo_h,
>>                       s->band>>FFMIN(hsub, vsub),
>> +                     s->uglarmtable[plane],
>> +                     s->uglarmweightsum[plane],
>>                       s->show, direct);
>>      }
>>
>> @@ -317,6 +487,7 @@ AVFilter ff_vf_delogo = {
>>      .priv_size     = sizeof(DelogoContext),
>>      .priv_class    = &delogo_class,
>>      .init          = init,
>> +    .uninit        = uninit,
>>      .query_formats = query_formats,
>>      .inputs        = avfilter_vf_delogo_inputs,
>>      .outputs       = avfilter_vf_delogo_outputs,
>
> Regards,
>
> --
>   Nicolas George
>


More information about the ffmpeg-devel mailing list