[FFmpeg-devel] [PATCH] lavfi: add volumedetect filter.

Stefano Sabatini stefasab at gmail.com
Sat Aug 18 17:29:57 CEST 2012


On date Saturday 2012-08-18 13:50:42 +0200, Nicolas George encoded:
> 
> Signed-off-by: Nicolas George <nicolas.george at normalesup.org>
> ---
>  Changelog                     |    1 +
>  doc/filters.texi              |   33 +++++++++
>  libavfilter/Makefile          |    1 +
>  libavfilter/af_volumedetect.c |  159 +++++++++++++++++++++++++++++++++++++++++
>  libavfilter/allfilters.c      |    1 +
>  5 files changed, 195 insertions(+)
>  create mode 100644 libavfilter/af_volumedetect.c
> 
> diff --git a/Changelog b/Changelog
> index 1f7ca21..9b9ac06 100644
> --- a/Changelog
> +++ b/Changelog
> @@ -48,6 +48,7 @@ version next:
>  - ICO muxer
>  - SubRip encoder and decoder without embedded timing
>  - edge detection filter
> +- volume measurement filter
>  
>  
>  version 0.11:
> diff --git a/doc/filters.texi b/doc/filters.texi
> index e6279bd..37bd1c2 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -690,6 +690,39 @@ volume=-12dB
>  @end example
>  @end itemize
>  
> +@section volumedetect
> +
> +Detect the volume of the input video.
> +
> +The filter has no parameters. The input is not modified. The measured volume
> +is printed at the end in the log.
> +
> +Here is an excerpt of the output:
> +@example
> +[Parsed_volumedetect_0 @ 0xa23120] mean_volume: -27 dB
> +[Parsed_volumedetect_0 @ 0xa23120] max_volume: -4 dB

min_volume may also be useful (and having more precision may help).

> +[Parsed_volumedetect_0 @ 0xa23120] histogram_4db: 6
> +[Parsed_volumedetect_0 @ 0xa23120] histogram_5db: 62
> +[Parsed_volumedetect_0 @ 0xa23120] histogram_6db: 286
> +[Parsed_volumedetect_0 @ 0xa23120] histogram_7db: 1042
> +[Parsed_volumedetect_0 @ 0xa23120] histogram_8db: 2551
> +[Parsed_volumedetect_0 @ 0xa23120] histogram_9db: 4609
> +[Parsed_volumedetect_0 @ 0xa23120] histogram_10db: 8409
> +@end example
> +
> +It means that:
> +@itemize
> +@item
> +The mean square energy is approximately -27 dB, or 10^-2.7.
> +@item
> +The largest sample is at -4 dB, or more precisely between -4 dB and -5 dB.
> +@item
> +There are 6 samples at -4 dB, 62 at -5 dB, 286 at -6 dB, etc.
> +@end itemize
> +
> +In other words, raising the volume by +4 dB does not cause any clipping,
> +raising it by +5 dB causes clipping for 6 samples, etc.

I dislike documentation by examples, since it omits many potentially
important details. What I mean is that we should try to document the
behavior in a detailed way, *and* then provide examples, rather than
asking the user to guess the detailed behavior from the examples.

I suggest something along these lines:

This filter will print some volume statistics to the log when the
end of the input stream is reached.

In particular it will show the mean volume, the max volume, and a
histogram of the registered volume values, ranging from the maximum volume
interval to ... etc.
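
To illustrate the kind of detail the text could spell out, here is a rough
sketch of how the per-dB histogram relates to clipping. It is not part of
the patch: clipped_samples() and its parameters are hypothetical names, and
it assumes that bin k counts the samples reported as "histogram_<k>db",
that is, samples whose level lies between -k dB and -(k+1) dB.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the patch.
 * histdb[k] is assumed to hold the number of samples reported as
 * "histogram_<k>db". Returns how many samples would exceed full scale
 * if the volume were raised by gain_db decibels. */
static uint64_t clipped_samples(const uint64_t *histdb, int nb_bins,
                                int gain_db)
{
    uint64_t clipped = 0;
    int k;

    /* A sample in bin k only clips once the gain exceeds k dB. */
    for (k = 0; k < gain_db && k < nb_bins; k++)
        clipped += histdb[k];
    return clipped;
}
```

With the numbers from the excerpt (6 samples in bin 4, 62 in bin 5), a gain
of 4 dB gives 0 clipped samples and a gain of 5 dB gives 6, which matches
the statement in the proposed documentation.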

> +
>  @section asyncts
>  Synchronize audio data with timestamps by squeezing/stretching it and/or
>  dropping samples/adding silence when needed.
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index 5b2ccdb..aa78be9 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -67,6 +67,7 @@ OBJS-$(CONFIG_PAN_FILTER)                    += af_pan.o
>  OBJS-$(CONFIG_RESAMPLE_FILTER)               += af_resample.o
>  OBJS-$(CONFIG_SILENCEDETECT_FILTER)          += af_silencedetect.o
>  OBJS-$(CONFIG_VOLUME_FILTER)                 += af_volume.o
> +OBJS-$(CONFIG_VOLUMEDETECT_FILTER)           += af_volumedetect.o
>  
>  OBJS-$(CONFIG_AEVALSRC_FILTER)               += asrc_aevalsrc.o
>  OBJS-$(CONFIG_ANULLSRC_FILTER)               += asrc_anullsrc.o
> diff --git a/libavfilter/af_volumedetect.c b/libavfilter/af_volumedetect.c
> new file mode 100644
> index 0000000..3a67412
> --- /dev/null
> +++ b/libavfilter/af_volumedetect.c
> @@ -0,0 +1,159 @@
> +/*
> + * Copyright (c) 2012 Nicolas George
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public License
> + * as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public License
> + * along with FFmpeg; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * filter for showing textual audio frame information
> + */
> +
> +#include "libavutil/audioconvert.h"
> +#include "libavutil/avassert.h"
> +#include "audio.h"
> +#include "avfilter.h"
> +#include "internal.h"
> +
> +typedef struct {
> +    uint64_t histogram[0x10001];

Please add a doxygen comment for this field.

> +} VolDetectContext;
> +
> +static int query_formats(AVFilterContext *ctx)
> +{
> +    enum AVSampleFormat sample_fmts[] = {
> +        AV_SAMPLE_FMT_S16,
> +        AV_SAMPLE_FMT_S16P,
> +        AV_SAMPLE_FMT_NONE
> +    };
> +    AVFilterFormats *formats;
> +
> +    if (!(formats = ff_make_format_list(sample_fmts)))
> +        return AVERROR(ENOMEM);
> +    ff_set_common_formats(ctx, formats);
> +
> +    return 0;
> +}
> +
> +static int filter_samples(AVFilterLink *inlink, AVFilterBufferRef *samples)
> +{
> +    AVFilterContext *ctx = inlink->dst;
> +    VolDetectContext *vd = ctx->priv;
> +    int64_t layout  = samples->audio->channel_layout;
> +    int nb_samples  = samples->audio->nb_samples;
> +    int nb_channels = av_get_channel_layout_nb_channels(layout);
> +    int nb_planes   = nb_channels;
> +    int plane, i;
> +    int16_t *pcm;
> +    
> +    if (!av_sample_fmt_is_planar(samples->format)) {
> +        nb_samples *= nb_channels;
> +        nb_planes = 1;
> +    }

> +    for (plane = 0; plane < nb_planes; plane++) {
> +        pcm = (int16_t *)samples->extended_data[plane];
> +        for (i = 0; i < nb_samples; i++)
> +            vd->histogram[pcm[i] + 0x8000]++;
> +    }

A comment mentioning that the values are normalized into unsigned
values may help first-time readers.
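
Something like the following sketch is what I have in mind. The helper
names sample_to_bin() and accumulate() are made up for illustration and do
not exist in the patch; only the "+ 0x8000" offset comes from it.

```c
#include <stdint.h>

/* Map a signed 16-bit sample (-32768..32767) to an unsigned histogram
 * index (0..65535) by adding an offset of 0x8000. */
static inline unsigned sample_to_bin(int16_t s)
{
    return (unsigned)(s + 0x8000);
}

/* Count each sample of a buffer in the bin for its value.
 * histogram must have at least 0x10000 entries. */
static void accumulate(uint64_t *histogram, const int16_t *pcm,
                       int nb_samples)
{
    int i;

    for (i = 0; i < nb_samples; i++)
        histogram[sample_to_bin(pcm[i])]++;
}
```

Silence (sample value 0) therefore lands in bin 0x8000, and the most
negative sample in bin 0.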

> +
> +    return ff_filter_samples(inlink->dst->outputs[0], samples);
> +}
> +
> +#define MAX_DB 91
> +

> +static inline int logdb(uint64_t v)
> +{
> +    double d = v / (double)(0x8000 * 0x8000);

This could take the sample value and compute its square itself, rather
than leaving that to the caller.

> +    if (!v)
> +        return MAX_DB;
> +    return log(d) * -4.3429448190325182765112891891660508229;

I'm lost here, what's this constant?

> +}
> +
> +static void print_stats(AVFilterContext *ctx)
> +{
> +    VolDetectContext *vd = ctx->priv;
> +    int i, max_volume, shift;
> +    uint64_t nb_samples = 0, power = 0, nb_samples_shift = 0, sum = 0;
> +    uint64_t histdb[MAX_DB + 1] = { 0 };
> +

> +    if(0)
> +    for (i = 0; i < 0x10000; i++)
> +        vd->histogram[i] *= 100000;

Is this a leftover from debugging?

> +
> +    for (i = 0; i < 0x10000; i++)
> +        nb_samples += vd->histogram[i];
> +    av_log(ctx, AV_LOG_INFO, "n_samples: %"PRId64"\n", nb_samples);
> +    if (!nb_samples)
> +        return;
> +

> +    shift = av_log2(nb_samples >> 33);
> +    for (i = 0; i < 0x10000; i++) {
> +        nb_samples_shift += vd->histogram[i] >> shift;
> +        power += (i - 0x8000) * (i - 0x8000) * (vd->histogram[i] >> shift);
> +    }
> +    if (!nb_samples_shift)
> +        return;

Please comment this block.
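
My guess at the intent, written out as a commented sketch: each squared
sample is at most 2^30, so a uint64_t sum can only hold about 2^34 of them,
and the shift scales the counts down when there are more samples than that.
This is an assumption on my side, not something the patch states. ilog2()
is a stand-in for av_log2() and mean_power() is a hypothetical name.

```c
#include <stdint.h>

/* Stand-in for av_log2(): floor(log2(v)), with 0 mapped to 0. */
static int ilog2(uint64_t v)
{
    int n = 0;

    while (v >>= 1)
        n++;
    return n;
}

/* Mean of the squared sample values counted in a 0x10000-bin histogram
 * indexed by (sample + 0x8000). */
static uint64_t mean_power(const uint64_t *histogram)
{
    uint64_t nb_samples = 0, nb_scaled = 0, power = 0;
    int i, shift;

    for (i = 0; i < 0x10000; i++)
        nb_samples += histogram[i];

    /* Below 2^34 samples the shift is 0 and the counts are used as is.
     * Above that, every count is divided by a power of two so that the
     * scaled total stays under 2^34 and the sum cannot overflow. */
    shift = ilog2(nb_samples >> 33);
    for (i = 0; i < 0x10000; i++) {
        int64_t  s     = i - 0x8000;            /* back to signed value */
        uint64_t count = histogram[i] >> shift;

        nb_scaled += count;
        power     += (uint64_t)(s * s) * count;
    }
    /* Bins holding fewer than 2^shift samples are dropped by the shift,
     * so the scaled total can be zero even when nb_samples is not. */
    return nb_scaled ? power / nb_scaled : 0;
}
```

If this is indeed what the block does, a comment of two or three lines
stating the overflow bound would be enough.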

[...]
-- 
FFmpeg = Formidable & Frenzy Mind-dumbing Portentous Extreme Gargoyle

