[FFmpeg-devel] [PATCHv2] add signature filter for MPEG7 video signature

Mon Apr 11 14:54:57 CEST 2016

On Mon, Apr 11, 2016 at 02:30:37PM +0200, Gerion Entrup wrote:
> On Monday, 11 April 2016 12:57:17 CEST Michael Niedermayer wrote:
> > On Mon, Apr 11, 2016 at 04:25:28AM +0200, Gerion Entrup wrote:
> > > On Thursday, 7 April 2016 00:35:25 CEST Michael Niedermayer wrote:
> > > > On Wed, Mar 30, 2016 at 11:02:36PM +0200, Gerion Entrup wrote:
> > > > > On Wednesday, 30 March 2016 22:57:47 CEST Gerion Entrup wrote:
> > > > > > Add improved patch.
> > > > > 
> > > > > Rebased to master.
> > > > > 
> > > > 
> > > > >  Changelog                      |    1 
> > > > >  configure                      |    1 
> > > > >  doc/filters.texi               |   70 +++
> > > > >  libavfilter/Makefile           |    1 
> > > > >  libavfilter/allfilters.c       |    1 
> > > > >  libavfilter/signature.h        |  554 ++++++++++++++++++++++++++++++
> > > > >  libavfilter/signature_lookup.c |  550 ++++++++++++++++++++++++++++++
> > > > >  libavfilter/version.h          |    4 
> > > > >  libavfilter/vf_signature.c     |  741 +++++++++++++++++++++++++++++++++++++++++
> > > > >  9 files changed, 1921 insertions(+), 2 deletions(-)
> > > > > 9192f27ded45c607996b4e266b6746f807c9a7fd  0001-add-signature-filter-for-MPEG7-video-signature.patch
> > > > > From 9646ed6f0cf78356cf2914a60705c98d8f21fe8a Mon Sep 17 00:00:00 2001
> > > > > From: Gerion Entrup <gerion.entrup at flump.de>
> > > > > Date: Sun, 20 Mar 2016 11:10:31 +0100
> > > > > Subject: [PATCH] add signature filter for MPEG7 video signature
> > > > > 
> > > > > This filter does not implement all features of MPEG7. Missing features:
> > > > > - compression of signature files
> > > > > - work only on (cropped) parts of the video
> > > > > ---
> > > > >  Changelog                      |   1 +
> > > > >  configure                      |   1 +
> > > > >  doc/filters.texi               |  70 ++++
> > > > >  libavfilter/Makefile           |   1 +
> > > > >  libavfilter/allfilters.c       |   1 +
> > > > >  libavfilter/signature.h        | 554 ++++++++++++++++++++++++++++++
> > > > >  libavfilter/signature_lookup.c | 550 ++++++++++++++++++++++++++++++
> > > > >  libavfilter/version.h          |   4 +-
> > > > >  libavfilter/vf_signature.c     | 741 +++++++++++++++++++++++++++++++++++++++++
> > > > >  9 files changed, 1921 insertions(+), 2 deletions(-)
> > > > >  create mode 100644 libavfilter/signature.h
> > > > >  create mode 100644 libavfilter/signature_lookup.c
> > > > >  create mode 100644 libavfilter/vf_signature.c
> > > > > 
> > > > > diff --git a/Changelog b/Changelog
> > > > > index 7b0187d..8a2b7fd 100644
> > > > > --- a/Changelog
> > > > > +++ b/Changelog
> > > > > @@ -18,6 +18,7 @@ version <next>:
> > > > >  - coreimage filter (GPU based image filtering on OSX)
> > > > >  - libdcadec removed
> > > > >  - bitstream filter for extracting DTS core
> > > > > +- MPEG-7 Video Signature filter
> > > > >  
> > > > >  version 3.0:
> > > > >  - Common Encryption (CENC) MP4 encoding and decoding support
> > > > > diff --git a/configure b/configure
> > > > > index e550547..fe29827 100755
> > > > > --- a/configure
> > > > > +++ b/configure
> > > > > @@ -2979,6 +2979,7 @@ showspectrum_filter_deps="avcodec"
> > > > >  showspectrum_filter_select="fft"
> > > > >  showspectrumpic_filter_deps="avcodec"
> > > > >  showspectrumpic_filter_select="fft"
> > > > > +signature_filter_deps="gpl avcodec avformat"
> > > > >  smartblur_filter_deps="gpl swscale"
> > > > >  sofalizer_filter_deps="netcdf avcodec"
> > > > >  sofalizer_filter_select="fft"
> > > > > diff --git a/doc/filters.texi b/doc/filters.texi
> > > > > index 5d6cf52..a95f5a7 100644
> > > > > --- a/doc/filters.texi
> > > > > +++ b/doc/filters.texi
> > > > > @@ -11559,6 +11559,76 @@ saturation maximum: %@{metadata:lavfi.signalstats.SATMAX@}
> > > > >  @end example
> > > > >  @end itemize
> > > > >  
> > > > > +@anchor{signature}
> > > > > +@section signature
> > > > > +
> > > > > +Calculates the MPEG-7 Video Signature. The filter can handle more than one
> > > > > +input. In this case the matching between the inputs can be calculated. The
> > > > > +filter passes the first input through. The output is written in XML.
> > > > > +
> > > > > +It accepts the following options:
> > > > > +
> > > > > +@table @option
> > > > > +@item mode
> > > > 
> > > > > +Enable the calculation of the matching. The option value must be 0 (to disable)
> > > > > +or 1 (to enable). Optionally you can set the mode to 2. Then the detection ends
> > > > > +if the first matching sequence is reached. This should be slightly faster.
> > > > > +By default the detection is disabled.
> > > > 
> > > > these should probably support named identifiers not (only) 0/1/2
> > > done
> > 
> > it should use AV_OPT_TYPE_INT and AV_OPT_TYPE_CONST not a string
> > 
> > 
> > > 
> > > > 
> > > > 
> > > > > +
> > > > > +@item nb_inputs
> > > > > +Set the number of inputs. The option value must be a non-negative integer.
> > > > > +Default value is 1.
> > > > > +
> > > > > +@item filename
> > > > > +Set the path to which the output is written. If there is more than one input,
> > > > > +the path must be a prototype, i.e. must contain %d or %0nd (where n is a positive
> > > > > +integer), that will be replaced with the input number. If no filename is
> > > > > +specified, no output will be written. This is the default.
> > > > > +
> > > > 
> > > > > +@item xml
> > > > > +Choose the output format. If set to 1 the filter will write XML, if set to 0
> > > > > +the filter will write binary output. The default is 0.
> > > > 
> > > > format=xml/bin/whatever
> > > > seems better as its more extensible
> > > done
> > > 
> > > > 
> > > > 
> > > > > +
> > > > > +@item th_d
> > > > > +Set threshold to detect one word as similar. The option value must be an integer
> > > > > +greater than zero. The default value is 9000.
> > > > > +
> > > > > +@item th_dc
> > > > > +Set threshold to detect all words as similar. The option value must be an integer
> > > > > +greater than zero. The default value is 60000.
> > > > > +
> > > > > +@item th_xh
> > > > > +Set threshold to detect frames as similar. The option value must be an integer
> > > > > +greater than zero. The default value is 116.
> > > > > +
> > > > > +@item th_di
> > > > > +Set the minimum length of a sequence in frames to recognize it as matching
> > > > > +sequence. The option value must be a non-negative integer value.
> > > > > +The default value is 0.
> > > > > +
> > > > > +@item th_it
> > > > > +Set the minimum ratio that matching frames must have relative to all frames.
> > > > > +The option value must be a double value between 0 and 1. The default value is 0.5.
> > > > > +@end table
> > > > > +
> > > > > +@subsection Examples
> > > > > +
> > > > > +@itemize
> > > > > +@item
> > > > > +To calculate the signature of an input video and store it in signature.xml:
> > > > > +@example
> > > > > +ffmpeg -i input.mkv -vf signature=filename=signature.xml -map 0:v -c rawvideo -f null -
> > > > > +@end example
> > > > 
> > > > the output seems to differ between 32 and 64bit x86
> > > > this would make any regression testing rather difficult
> > > > why is there a difference ? can this be avoided or would that result in
> > > > some disadvantage ?
> > > This is due to this line:
> > > sum -= ((double) blocksum)/(blocksize * denum);
> > > 
> > > sum was a double. It seems the difference leads to different results on 32 and 64 bit
> > > (in the fifth decimal place). I have reworked the filter part so it does not use double at all.
> > > This also leads to somewhat fewer divisions, but the numbers get really big. The relevant
> > > parts use int64_t.
> > > 
> > > If the videos get really big, the numbers could overflow. Can I restrict this in some way?
> > > 
> > > An upper bound can be found with:
> > > 255 * BLOCK_LCM * (width/32+1)^2 * (height/32+1)^2 < 2^63
> > > I tested it with 4K (UHD) input. This does not give any problems, but it is near the limit.
> > > (As a note: 4K in particular is some way below the limit, because the width 3840 is
> > > divisible by 32, so the square in the above formula could be dropped)
> > > 
> > > The filter should now generate the same signatures on 32 and 64 bit as it did on 64 bit before.
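
As an illustration of the point above: one way to get identical results on 32 and 64 bit builds is to compare averages by cross-multiplication in 64-bit integers instead of dividing in floating point. This is only a sketch and not the code from the patch; `cmp_block_avg` and its parameters are hypothetical names, and it assumes positive element counts and sums small enough that the products fit in int64_t.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the patch.
 * Compares sum_a/n_a with sum_b/n_b without division or floating point.
 * Assumes n_a > 0, n_b > 0 and that both products fit in int64_t.
 * Returns -1, 0 or 1 like a comparison function. */
static int cmp_block_avg(int64_t sum_a, int64_t n_a,
                         int64_t sum_b, int64_t n_b)
{
    /* sum_a/n_a < sum_b/n_b  <=>  sum_a*n_b < sum_b*n_a  (n_a, n_b > 0) */
    int64_t lhs = sum_a * n_b;
    int64_t rhs = sum_b * n_a;

    return (lhs > rhs) - (lhs < rhs);
}
```

Because only integer multiplication and comparison are involved, the result does not depend on how the platform evaluates floating point expressions.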
> > 
> > if you really need more than 64bit ints you can take a look at
> > libavutil/integer.h
> > it would be better if the operations can be reshuffled to keep using
> > intXY_t
> It depends. IMHO 4K UHD is enough for now, and given that you can simply rescale a higher
> resolution to somewhat below that without changing the function of the signature, I would simply add
> a check in config_input or so that throws an error if the resolution is too high. Would this be ok?

not optimal but ok i guess
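
For reference, the check discussed above could look roughly like the sketch below. It tests the bound 255 * BLOCK_LCM * (width/32+1)^2 * (height/32+1)^2 < 2^63 quoted earlier without overflowing while doing so. The function names are hypothetical, and the value of BLOCK_LCM is passed in as a parameter because it is not given in this thread; in a real filter the caller would return an error from config_input when the check fails.

```c
#include <stdint.h>

/* Multiply *acc by f; return 0 (and leave *acc alone) if the result
 * would exceed limit, 1 otherwise. */
static int mul_within(uint64_t *acc, uint64_t f, uint64_t limit)
{
    if (f != 0 && *acc > limit / f)
        return 0;
    *acc *= f;
    return 1;
}

/* Hypothetical check, not part of the patch: returns 1 if
 * 255 * block_lcm * (w/32+1)^2 * (h/32+1)^2 < 2^63, else 0. */
static int signature_size_ok(int w, int h, uint64_t block_lcm)
{
    const uint64_t limit = INT64_MAX;   /* 2^63 - 1 */
    uint64_t acc = 255;
    uint64_t bw, bh;

    if (w <= 0 || h <= 0)
        return 0;
    bw = (uint64_t)(w / 32 + 1);
    bh = (uint64_t)(h / 32 + 1);

    return mul_within(&acc, block_lcm, limit) &&
           mul_within(&acc, bw, limit) && mul_within(&acc, bw, limit) &&
           mul_within(&acc, bh, limit) && mul_within(&acc, bh, limit);
}
```

Each factor is checked by division before it is multiplied in, so the test itself stays within uint64_t even for resolutions that fail it.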

[...]
[...]
>
> > 
> > 
> > > 
> > > I also added a few TODOs in the code for parts I am unsure about. It would be nice
> > > if you could comment there, too.
> > > 
> > 
> > > I attached the new (complete) patch, the diff to the last time and the updated check script.
> > 
> > looks like the old patch + diff to the new
> Yes. I thought you could see the differences to the already reviewed patch much faster.

the new patch + diff is better than the old + diff
for testing that is


> 
> > 
> > [...]
> > > +static int filter_frame(AVFilterLink *inlink, AVFrame *picref)
> > > +{
> > > +    AVFilterContext *ctx = inlink->dst;
> > > +    SignatureContext *sic = ctx->priv;
> > > +    StreamContext *sc = &(sic->streamcontexts[FF_INLINK_IDX(inlink)]);
> > > +    FineSignature* fs;
> > > +
> > > +    static const uint8_t pot3[5] = { 3*3*3*3, 3*3*3, 3*3, 3, 1 };
> > > +    /* indexes of words : 210,217,219,274,334  44,175,233,270,273  57,70,103,237,269  100,285,295,337,354  101,102,111,275,296
> > > +    s2usw = sorted to unsorted wordvec: 44 is at index 5, 57 at index 10...
> > > +    */
> > > +    static const unsigned int wordvec[25] = {44,57,70,100,101,102,103,111,175,210,217,219,233,237,269,270,273,274,275,285,295,296,334,337,354};
> > > +    static const uint8_t s2usw[25]   = { 5,10,11, 15, 20, 21, 12, 22,  6,  0,  1,  2,  7, 13, 14,  8,  9,  3, 23, 16, 17, 24,  4, 18, 19};
> > > +
> > > +    uint8_t wordt2b[5] = { 0, 0, 0, 0, 0 }; /* word ternary to binary */
> > > +    uint64_t intpic[32][32];
> > > +    uint64_t rowcount;
> > > +    uint8_t *p = picref->data[0];
> > > +    int inti, intj;
> > > +    int *intjlut;
> > > +
> > > +    double conflist[DIFFELEM_SIZE];
> > > +    int f = 0, g = 0, w = 0;
> > > +    int dh1 = 1, dh2 = 1, dw1 = 1, dw2 = 1, denum, a, b;
> > > +    int i,j,k,ternary;
> > > +    uint64_t blocksum;
> > > +    int blocksize;
> > > +    double th; /* threshold */
> > > +    double sum;
> > > +
> > > +    /* initialize fs */
> > > +    if(sc->curfinesig){
> > > +        fs = av_mallocz(sizeof(FineSignature));
> > > +        if (!fs)
> > > +            return AVERROR(ENOMEM);
> > > +        sc->curfinesig->next = fs;
> > > +        fs->prev = sc->curfinesig;
> > > +        sc->curfinesig = fs;
> > > +    }else{
> > > +        fs = sc->curfinesig = sc->finesiglist;
> > > +        sc->curcoursesig1->first = fs;
> > > +    }
> > > +
> > > +    fs->pts = picref->pts;
> > > +    fs->index = sc->lastindex++;
> > > +
> > > +    memset(intpic, 0, sizeof(uint64_t)*32*32);
> > > +    intjlut = av_malloc(inlink->w * sizeof(int));
> > > +    if (!intjlut)
> > > +        return AVERROR(ENOMEM);
> > > +    for (i=0; i < inlink->w; i++){
> > > +        intjlut[i] = (i<<5)/inlink->w;
> > > +    }
> > > +
> > > +    for (i=0; i < inlink->h; i++){
> > > +        inti = (i<<5)/inlink->h;
> > > +        for (j=0; j< inlink->w; j++){
> > > +            intj = intjlut[j];
> > > +            intpic[inti][intj] += p[j];
> > > +        }
> > > +        p += picref->linesize[0];
> > > +    }
> > > +    av_free(intjlut);
> > > +
> > > +    /* The following calculates a summed area table (intpic) and brings the numbers
> > > +     * in intpic to the same denominator.
> > > +     * So you only have to handle the numerator in the following sections.
> > > +     */
> > > +    dh1 = inlink->h/32;
> > > +    if (inlink->h%32)
> > > +        dh2 = dh1 + 1;
> > > +    dw1 = inlink->w/32;
> > > +    if (inlink->w%32)
> > > +        dw2 = dw1 + 1;
> > 
> > > +    denum = dh1 * dh2 * dw1 * dw2;
> > 
> > this will overflow if w and h are not multiples of 32 and large
> > the multiplication is done in 32bit not 64
> I don't get it. All of these are 32 bit integers. Given the input is
> 3842x2160 (nearly 4K), this would lead to a denum of:
> 120 * 121 * 67 * 68 = 66153120
> 
> This is far below the 32 bit maximum.

it will overflow with higher resolution
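
To make the point concrete, here is a minimal sketch of the same denum computation with the product widened to 64 bit. `signature_denum` is a hypothetical helper name, not something from the patch; the logic mirrors the quoted hunk, where dh2 and dw2 stay 1 when the dimension is a multiple of 32.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the quoted hunk, with a 64-bit product. */
static int64_t signature_denum(int w, int h)
{
    int dh1 = h / 32, dh2 = 1;
    int dw1 = w / 32, dw2 = 1;

    if (h % 32)
        dh2 = dh1 + 1;
    if (w % 32)
        dw2 = dw1 + 1;

    /* The cast on the first operand makes every multiplication
     * in this expression happen in int64_t instead of int. */
    return (int64_t)dh1 * dh2 * dw1 * dw2;
}
```

For 3842x2160 this gives 66153120 as in the mail above, but for an input such as 16001x16001 the product is 500 * 501 * 500 * 501 = 62750250000, which no longer fits in a 32 bit int.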

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I do not agree with what you have to say, but I'll defend to the death your
right to say it. -- Voltaire