[FFmpeg-devel] [PATCH] avfilter: add Dynamic Audio Normalizer filter

Fri Jul 17 12:53:10 CEST 2015

On 2015-07-09 18:55, Paul B Mahol wrote:
> diff --git a/doc/filters.texi b/doc/filters.texi
> index 3fce874..74c408a 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -1520,6 +1520,164 @@ Optional. It should have a value much less than 1 (e.g. 0.05 or 0.02) and is
>  used to prevent clipping.
>  @end table
>  
> +@section dynaudnorm
> +Dynamic Audio Normalizer.
> +
> +This filter applies a certain amount of gain to the input audion in order
                                                             ^^^^^^
"audio"

> +to bring its peak magnitude to a target level (e.g. 0 dBFS). However, in
> +contrast to more "simple" normalization algorithms, the Dynamic Audio
> +Normalizer *dynamically* re-adjusts the gain factor to the input audio.
> +This allows for applying extra gain to the "quiet" sections of the audio
> +while avoiding distortions or clipping the "loud" sections. In other words:
> +The Dynamic Audio Normalizer will "even out" the volume of quiet and loud
> +sections, in the sense that the volume of each section is brought to the
> +same target level. Note, however, that the Dynamic Audio Normalizer achieves
> +this goal *without* applying "dynamic range compressing". It will retain 100%
> +of the dynamic range *within* each section of the audio file.
> +
> +@table @option
> +@item f
> +Set the frame length in milliseconds. In range from 10 to 8000 milliseconds.
> +Default is 500 milliseconds.
> +The Dynamic Audio Normalizer processes the input audio in small chunks,
> +referred to as frames. This is required, because a peak magnitude has no
> +meaning for just a single sample value. Instead, we need to determine the
> +peak magnitude for a contiguous sequence of sample values. While a "standard"
> +normalizer would simply use the peak magnitude of the complete file, the
> +Dynamic Audio Normalizer determines the peak magnitude individually for each
> +frame. The length of a frame is specified in milliseconds. By default, the
> +Dynamic Audio Normalizer uses a frame length of 500 milliseconds, which has
> +been found to give good results with most files.
> +Note that the exact frame length, in number of samples, will be determined
> +automatically, based on the sampling rate of the individual input audio file.
> +
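
A small illustration of the milliseconds-to-samples conversion described
above, in case it helps readers of the docs. This is only a sketch: the
function name is made up, and rounding to the nearest sample is an
assumption, since the quoted text does not say how the filter rounds.

```python
def frame_length_in_samples(frame_ms: int, sample_rate: int) -> int:
    """Convert a frame length in milliseconds to a whole number of samples."""
    # Range taken from the documented limits of the "f" option.
    if not 10 <= frame_ms <= 8000:
        raise ValueError("frame length must be within 10..8000 ms")
    # Assumption: round to the nearest sample; the real filter may differ.
    return round(sample_rate * frame_ms / 1000)
```

With the default of 500 ms, a 44.1 kHz input would give 22050 samples per
frame and a 48 kHz input 24000, which is why the sample count depends on
the individual file.
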
> +@item g
> +Set the Gaussian filter window size. In range from 3 to 301, must be odd
> +number. Default is 31.
> +Probably the most important parameter of the Dynamic Audio Normalizer is the
> +@code{window size} of the Gaussian smoothing filter. The filter's window size
> +is specified in frames, centered around the current frame. For the sake of
> +simplicity, this must be an odd number. Consequently, the default value of 31
> +takes into account the current frame, as well as the 15 preceding frames and
> +the 15 subsequent frames. Using a larger window results in a stronger
> +smoothing effect and thus in less gain variation, i.e. slower gain
> +adaptation. Conversely, using a smaller window results in a weaker smoothing
> +effect and thus in more gain variation, i.e. faster gain adaptation.
> +In other words, the more you increase this value, the more the Dynamic Audio
> +Normalizer will behave like a "traditional" normalization filter. On the
> +contrary, the more you decrease this value, the more the Dynamic Audio
> +Normalizer will behave like a dynamic range compressor.
> +
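
To make the window description concrete, here is a rough sketch of a
centered Gaussian window applied to per-frame gain factors. The names
gaussian_weights and smooth_gain are hypothetical, and both the choice
of sigma (window_size / 6) and the clamping at the edges are assumptions
of this sketch, not something the quoted text specifies.

```python
import math
from typing import List, Optional, Sequence


def gaussian_weights(window_size: int = 31,
                     sigma: Optional[float] = None) -> List[float]:
    """Return normalized Gaussian weights centered on the current frame."""
    if not 3 <= window_size <= 301 or window_size % 2 == 0:
        raise ValueError("window size must be an odd number within 3..301")
    half = window_size // 2  # 15 frames on each side for the default of 31
    if sigma is None:
        sigma = window_size / 6.0  # assumption, not taken from the patch
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma))
           for i in range(-half, half + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth_gain(gains: Sequence[float], index: int,
                weights: Sequence[float]) -> float:
    """Weighted average of the gains around `index`."""
    half = len(weights) // 2
    acc = 0.0
    for offset, weight in zip(range(-half, half + 1), weights):
        # Assumption: repeat the first/last gain beyond the ends of the input.
        j = min(max(index + offset, 0), len(gains) - 1)
        acc += weight * gains[j]
    return acc
```

A larger window spreads the weight over more neighbouring frames, so a
single loud or quiet frame moves the smoothed gain less, which matches the
"slower gain adaptation" wording above.
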
> +@item p
> +Set the target peak value. This specifies the highest permissible magnitude
> +level for the normalized audio input. This filter will try to approach the
> +target peak magnitude as closely as possible, but at the same time it also
> +makes sure that the normalized signal will never exceed the peak magnitude.
> +A frame's maximum local gain factor is imposed directly by the target peak
> +magnitude. The default value is 0.95 and thus leaves a headroom of 5%*.
> +It is not recommended to go above this value.
> +
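
The relation between the target peak and a frame's local gain could be
sketched as below. The function name is hypothetical, and returning a gain
of 1.0 for an all-zero frame is an assumption made only to avoid a division
by zero.

```python
from typing import Sequence


def local_gain(frame: Sequence[float], target_peak: float = 0.95) -> float:
    """Largest gain that keeps the frame's peak at or below target_peak."""
    peak = max(abs(sample) for sample in frame)
    if peak == 0.0:
        return 1.0  # assumption: leave digital silence untouched
    return target_peak / peak
```

For a frame whose loudest sample has magnitude 0.5, this gives a gain of
1.9, so the scaled peak lands exactly on 0.95.
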
> +@item m
> +Set the maximum gain factor. In range from 1.0 to 100.0. Default is 10.0.
> +The Dynamic Audio Normalizer determines the maximum possible (local) gain
> +factor for each input frame, i.e. the maximum gain factor that does not
> +result in clipping or distortion. The maximum gain factor is determined by
> +the frame's highest magnitude sample. However, the Dynamic Audio Normalizer
> +additionally bounds the frame's maximum gain factor by a predetermined
> +(global) maximum gain factor. This is done in order to avoid excessive gain
> +factors in "silent" or almost silent frames. By default, the maximum gain
> +factor is 10.0, For most inputs the default value should be sufficient and
> +it usually is not recommended to increase this value. Though, for input
> +with an extremely low overall volume level, it may be necessary to allow even
> +higher gain factors. Note, however, that the Dynamic Audio Normalizer does
> +not simply apply a "hard" threshold (i.e. cut off values above the threshold).
> +Instead, a "sigmoid" threshold function will be applied. This way, the
> +gainfactors will smoothly approach the threshold value, but never exceed that
   ^^^^^^^^^^^
"gain factors", maybe.

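As an aside, the difference between a "hard" and a "sigmoid" threshold
in the paragraph above might be easier to follow with a tiny example. The
sketch below uses the error function as the smooth curve; that choice and
the function names are assumptions on my part, since the quoted text only
says "sigmoid".

```python
import math


def hard_bound(gain: float, max_gain: float = 10.0) -> float:
    """Hard threshold: cut off anything above max_gain."""
    return min(gain, max_gain)


def soft_bound(gain: float, max_gain: float = 10.0) -> float:
    """Smooth threshold: approaches max_gain without exceeding it."""
    # Scaling by sqrt(pi)/2 gives the curve a slope of 1 near zero,
    # so small gain factors pass through almost unchanged.
    scale = math.sqrt(math.pi) / 2.0
    return max_gain * math.erf(scale * (gain / max_gain))
```

Small gain factors are barely changed, while very large ones level off
just below the maximum instead of being clipped abruptly.
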
P.S.  Sorry about the two messages.
