[FFmpeg-devel] One pass volume normalization (ebur128)

Jan Ehrhardt phpdev at ehrhardt.nl
Wed Jul 17 20:36:44 CEST 2013

Nicolas George in gmane.comp.video.ffmpeg.devel (Wed, 17 Jul 2013 17:11:15 +0200):
>I consider this very bad design. I already explained why I think this is bad
>design from the result point of view, but you are of course free to keep it
>in your software.
>From ffmpeg point of view, I consider this bad design for the following reasons:
>* Two filters with almost identical features and no good reason to separate
>  them (there is a good reason to have ebur128 and volumedetect: correctness
>  vs. speed).

The reason for the duplicate is clear: I was expecting a reaction like
this from you.

>* Exposing intermediate values that have no relevancy whatsoever. The final
>  results of volumedetect are already quite dubious as volume measurements,
>  but at least they have a clear mathematical meaning.

Strange. First you steered me in the direction of volumedetect, and now
it seems to be the other way around.

>* Adding a feature to suit a very personal and specific need.

Here we differ. I do not think that having the equivalent of MEncoder's
'-af volnorm' is a very personal and specific need. For live broadcasts
you need a way to normalize the loudness, and the only way to do that is
by looking back in time at the previous frames.

>IMHO, the correct design for solving this issue would require some or all
>the following points:
>* Dynamic expression evaluation for the volume filter. IIRC, Stefano had a
>  patch that was pretty good; in fact, I thought it was already applied
>  since a long time ago. The expression should be able to reference
>  metadata (at least one item).

As far as I know, this was never implemented. Neither were any of
Clément's proposals. That is exactly why I brought up the subject once
again.

>* A filter to smooth a metadata value over time, so that r128.M can be
>  turned into something suitable for volume normalization.
>* A switch to volumedetect to inject as metadata the momentary RMS of the
>  signal over a configurable frame, to use in place of r128.M and trade
>  correctness for speed.

I would welcome those features, but keep in mind that neither r128.I nor
r128.M can be used at all at the moment.

>* A switch to volumedetect to inject as metadata the final results (on a
>  dummy final frame maybe?).

As a switch it is OK, of course. But it will not help for live streams.

>Some of these are fairly easy, others are quite hard, and some pose problems
>of design decisions rather than implementation. Anyone should feel free to
>submit patches implementing any of these points.

My suggestion was to start by implementing a basic filter like the one I
proposed when I started this discussion. It was met not only with the
expected technical comments, but also with arguments against the very
idea of one-pass or on-the-fly normalization.

See my current working version below. I have addressed some of the
technical issues. Do with it whatever you like. My idea is that it is
a good starting point for future patches like the ones you suggested.


diff --git a/libavfilter/af_volume.c b/libavfilter/af_volume.c
index a2ac1e2..87491ea 100644
--- a/libavfilter/af_volume.c
+++ b/libavfilter/af_volume.c
@@ -51,18 +51,26 @@ static const AVOption volume_options[] = {
         { "fixed",  "select 8-bit fixed-point",     0, AV_OPT_TYPE_CONST, { .i64 = PRECISION_FIXED  }, INT_MIN, INT_MAX, A|F, "precision" },
         { "float",  "select 32-bit floating-point", 0, AV_OPT_TYPE_CONST, { .i64 = PRECISION_FLOAT  }, INT_MIN, INT_MAX, A|F, "precision" },
         { "double", "select 64-bit floating-point", 0, AV_OPT_TYPE_CONST, { .i64 = PRECISION_DOUBLE }, INT_MIN, INT_MAX, A|F, "precision" },
+    { "metadata", "set the metadata key for loudness normalization", OFFSET(metadata), AV_OPT_TYPE_STRING, { .str = NULL }, .flags = A|F },
+    { "normvol", "set volume normalization level",
+            OFFSET(normvol), AV_OPT_TYPE_DOUBLE, { .dbl = -23.0 }, INT_MIN, INT_MAX, A|F },
     { NULL },
 };
 
+static void set_fixed_volume(VolumeContext *vol, double volume)
+{
+    vol->volume_i = (int)(volume * 256 + 0.5);
+    vol->volume   = vol->volume_i / 256.0;
+}
+
 static av_cold int init(AVFilterContext *ctx)
 {
     VolumeContext *vol = ctx->priv;
 
     if (vol->precision == PRECISION_FIXED) {
-        vol->volume_i = (int)(vol->volume * 256 + 0.5);
-        vol->volume   = vol->volume_i / 256.0;
+        set_fixed_volume(vol, vol->volume);
         av_log(ctx, AV_LOG_VERBOSE, "volume:(%d/256)(%f)(%1.2fdB) precision:fixed\n",
                vol->volume_i, vol->volume, 20.0*log(vol->volume)/M_LN10);
     } else {
@@ -216,11 +224,31 @@ static int config_output(AVFilterLink *outlink)
 static int filter_frame(AVFilterLink *inlink, AVFrame *buf)
 {
-    VolumeContext *vol    = inlink->dst->priv;
-    AVFilterLink *outlink = inlink->dst->outputs[0];
+    AVFilterContext *ctx  = inlink->dst;
+    VolumeContext *vol    = ctx->priv;
+    AVFilterLink *outlink = ctx->outputs[0];
     int nb_samples        = buf->nb_samples;
     AVFrame *out_buf;
 
+    if (vol->metadata) {
+        double loudness, new_volume, pow_volume, timestamp, mx;
+        AVDictionaryEntry *e;
+        mx = 20;
+        timestamp = buf->pts / (double)outlink->sample_rate;
+        mx = fmin(mx, timestamp);
+        e = av_dict_get(buf->metadata, vol->metadata, NULL, 0);
+        if (e) {
+            loudness = av_strtod(e->value, NULL);
+            if (loudness > -69) {
+                new_volume = fmax(-mx, fmin(mx, (vol->normvol - loudness)));
+                pow_volume = pow(10, new_volume / 20);
+                av_log(ctx, AV_LOG_VERBOSE, "loudness=%f => %f => volume=%f\n",
+                    loudness, new_volume, pow_volume);
+                set_fixed_volume(vol, pow_volume);
+            }
+        }
+    }
     if (vol->volume == 1.0 || vol->volume_i == 256)
         return ff_filter_frame(outlink, buf);
diff --git a/libavfilter/af_volume.h b/libavfilter/af_volume.h
index bd7932e..d79d040 100644
--- a/libavfilter/af_volume.h
+++ b/libavfilter/af_volume.h
@@ -48,6 +48,8 @@ typedef struct VolumeContext {
     void (*scale_samples)(uint8_t *dst, const uint8_t *src, int nb_samples,
                           int volume);
     int samples_align;
+    char *metadata;
+    double normvol;
 } VolumeContext;
 void ff_volume_init_x86(VolumeContext *vol);
