[FFmpeg-devel] [PATCH] lavfi: edgedetect filter

Wed Aug 8 15:28:58 CEST 2012

On date Wednesday 2012-08-08 14:12:02 +0200, Clément Bœsch encoded:
> On Wed, Aug 08, 2012 at 12:55:15PM +0200, Stefano Sabatini wrote:
> [...]
> > > +static void gaussian_blur(AVFilterContext *ctx, int w, int h,
> > > +                                uint8_t *dst, int dst_linesize,
> > > +                          const uint8_t *src, int src_linesize)
> > > +{
> > > +    int i, j;
> > > +
> > > +    memcpy(dst, src, w); dst += dst_linesize; src += src_linesize;
> > > +    memcpy(dst, src, w); dst += dst_linesize; src += src_linesize;
> > > +    for (j = 2; j < h - 2; j++) {
> > > +        dst[0] = src[0];
> > > +        dst[1] = src[1];
> > > +        for (i = 2; i < w - 2; i++) {
> > > +            dst[i] = ((src[-2*src_linesize + i-2] + src[2*src_linesize + i-2]) * 2
> > > +                    + (src[-2*src_linesize + i-1] + src[2*src_linesize + i-1]) * 4
> > > +                    + (src[-2*src_linesize + i  ] + src[2*src_linesize + i  ]) * 5
> > > +                    + (src[-2*src_linesize + i+1] + src[2*src_linesize + i+1]) * 4
> > > +                    + (src[-2*src_linesize + i+2] + src[2*src_linesize + i+2]) * 2
> > > +
> > > +                    + (src[  -src_linesize + i-2] + src[  src_linesize + i-2]) *  4
> > > +                    + (src[  -src_linesize + i-1] + src[  src_linesize + i-1]) *  9
> > > +                    + (src[  -src_linesize + i  ] + src[  src_linesize + i  ]) * 12
> > > +                    + (src[  -src_linesize + i+1] + src[  src_linesize + i+1]) *  9
> > > +                    + (src[  -src_linesize + i+2] + src[  src_linesize + i+2]) *  4
> > > +
> > > +                    + src[i-2] *  5
> > > +                    + src[i-1] * 12
> > > +                    + src[i  ] * 15
> > > +                    + src[i+1] * 12
> > > +                    + src[i+2] *  5) / 159;
> > > +        }
> > 
> > My feeling is that we should avoid to hardcode convolution operations,
> > and write generic code for it. Also we may want to make the size of
> > the gaussian mask parametric, as well as the sigma parameter used to
> > compute the mask.
> > 
> > Convolution operations are useful per-se, and could be used to
> > implement ad-hoc filters.
> > 
> 
> I've just used the standard matrix for that algorithm; from a performance
> point of view it also has some benefits. Note that it is relatively
> trivial to write so I'm not sure such generic code would be required.
> Maybe we could macro-generate various version of that but I'm not sure
> that's really important in that case.
> This step really is just to quickly get rid of some noise.

It's not that easy in case the size of the convolution matrix is
variable. Note that the parameters are important to address specific
solutions (you may need to tweak the parameters on a case by case
basis).

Also I'm not sure it is correct to just discard pixels at the image
border (we may consider to change the output size accordingly).

> Do we have already use these in some other filters?

Uhm, not that I know.

> > > +        dst[i    ] = src[i    ];
> > > +        dst[i + 1] = src[i + 1];
> > > +
> > > +        dst += dst_linesize;
> > > +        src += src_linesize;
> > > +    }
> > > +    memcpy(dst, src, w); dst += dst_linesize; src += src_linesize;
> > > +    memcpy(dst, src, w); dst += dst_linesize; src += src_linesize;
> > > +}
> > > +
> > > +enum {
> > > +    DIRECTION_45UP,
> > > +    DIRECTION_45DOWN,
> > > +    DIRECTION_HORIZONTAL,
> > > +    DIRECTION_VERTICAL,
> > > +};
> > > +
> > > +static int get_rounded_direction(int gx, int gy)
> > > +{
> > > +    /* reference angles:
> > > +     *   tan( pi/8) = sqrt(2)-1
> > > +     *   tan(3pi/8) = sqrt(2)+1
> > > +     * Gy/Gx is the tangent of the angle (theta), so Gy/Gx is compared against
> > > +     * <ref-angle>, or more simply Gy against <ref-angle>*Gx
> > > +     *
> > > +     * Gx and Gy bounds = [1020;1020], using 16-bit arith:
> > > +     *   round((sqrt(2)-1) * (1<<16)) =  27146
> > > +     *   round((sqrt(2)+1) * (1<<16)) = 158218
> > > +     */
> > > +    if (gx) {
> > > +        int tanpi8gx, tan3pi8gx;
> > > +
> > > +        if (gx < 0)
> > > +            gx = -gx, gy = -gy;
> > > +        gy <<= 16;
> > > +        tanpi8gx  =  27145 * gx;
> 
> (Note: 27145 changed locally to 27146, test unchanged)
> 
> > > +        tan3pi8gx = 158218 * gx;
> > > +        if (gy > -tan3pi8gx && gy < -tanpi8gx)  return DIRECTION_45UP;
> > > +        if (gy > -tanpi8gx  && gy <  tanpi8gx)  return DIRECTION_HORIZONTAL;
> > > +        if (gy >  tanpi8gx  && gy <  tan3pi8gx) return DIRECTION_45DOWN;
> > > +    }
> > > +    return DIRECTION_VERTICAL;
> > > +}
> > > 
> > 
> > > +static void sobel(AVFilterContext *ctx, int w, int h,
> > > +                        uint16_t *dst, int dst_linesize,
> > > +                  const uint8_t  *src, int src_linesize)
> > > +{
> > > +    int i, j;
> > > +    EdgeDetectContext *edgedetect = ctx->priv;
> > > +
> > > +    for (j = 1; j < h - 1; j++) {
> > > +        dst += dst_linesize;
> > > +        src += src_linesize;
> > > +        for (i = 1; i < w - 1; i++) {
> > > +            const int gx =
> > > +                -1*src[-src_linesize + i-1] + 1*src[-src_linesize + i+1]
> > > +                -2*src[                i-1] + 2*src[                i+1]
> > > +                -1*src[ src_linesize + i-1] + 1*src[ src_linesize + i+1];
> > > +            const int gy =
> > > +                -1*src[-src_linesize + i-1] + 1*src[ src_linesize + i-1]
> > > +                -2*src[-src_linesize + i  ] + 2*src[ src_linesize + i  ]
> > > +                -1*src[-src_linesize + i+1] + 1*src[ src_linesize + i+1];
> > > +
> > > +            dst[i] = FFABS(gx) + FFABS(gy);
> > > +            edgedetect->directions[j*w + i] = get_rounded_direction(gx, gy);
> > > +        }
> > > +    }
> > > +}
> > 
> > Same consideration as above.
> > 
> 
> I'm not sure what you could really generalize here since I'm also
> computing the directions based on this.

Well this is technically equivalent to two convolutions (one for each
Sobel masks)

> 
> > > +
> > > +static void non_maximum_suppression(AVFilterContext *ctx, int w, int h,
> > > +                                          uint8_t  *dst, int dst_linesize,
> > > +                                    const uint16_t *src, int src_linesize)
> > > +{
> > > +    int i, j;
> > > +    EdgeDetectContext *edgedetect = ctx->priv;
> > > +
> > > +#define COPY_MAXIMA(ay, ax, by, bx) do {                \
> > > +    if (src[i] > src[(ay)*src_linesize + i+(ax)] &&     \
> > > +        src[i] > src[(by)*src_linesize + i+(bx)])       \
> > > +        dst[i] = av_clip_uint8(src[i]);                 \
> > > +} while (0)
> > > +
> > > +    for (j = 1; j < h - 1; j++) {
> > > +        dst += dst_linesize;
> > > +        src += src_linesize;
> > > +        for (i = 1; i < w - 1; i++) {
> > > +            switch (edgedetect->directions[j*w + i]) {
> > > +            case DIRECTION_45UP:        COPY_MAXIMA( 1, -1, -1,  1); break;
> > > +            case DIRECTION_45DOWN:      COPY_MAXIMA(-1, -1,  1,  1); break;
> > > +            case DIRECTION_HORIZONTAL:  COPY_MAXIMA( 0, -1,  0,  1); break;
> > > +            case DIRECTION_VERTICAL:    COPY_MAXIMA(-1,  0,  1,  0); break;
> > > +            }
> > > +        }
> > > +    }
> > > +}
> > > +
> > > +static void double_threshold(AVFilterContext *ctx, int w, int h,
> > > +                                   uint8_t *dst, int dst_linesize,
> > > +                             const uint8_t *src, int src_linesize)
> > > +{
> > > +    int i, j;
> > > +
> > > +#define THRES_HIGH 80
> > > +#define THRES_LOW  20
> > 
> > This values should be made parametric (and expressed as float values
> > in the [0, 1] range).
> > 
> 
> Yes, but right now the filter is called "edgedetect" and only implements
> the canny algo. If we were to add more algo, we would end up with various
> mode such as vf edgedetect=canny, or edgedetect=marrhildreth or such. So I
> wasn't confident about making it customizable yet.
> 
> This parametrization can be done later without breaking the usage, that
> really was the main point.

OK but this is trivial to implement, so I'd rather have this in the
first version (without the need to add named options and stuff in a
second commit).

[...]
> > Another consideration: we could optionally support faster less
> > accurate algorithms (e.g. Marr-Hildreth).
> 
> Feel free to implement it. I was just having fun with lavfi.

Resume: threshold parametrization is trivial and should be added
when pushing the first version.

Gaussian mask sigma and size customization can be added later, same
for generic convolution computation and for other edge detection
algorithms with different features.
-- 
FFmpeg = Faithless and Fanciful Mournful Puritan Ermetic Gem