[FFmpeg-devel] [PATCH] avfilter: add Dynamic Audio Normalizer filter

Fri Jul 17 13:00:10 CEST 2015

On 7/17/15, James Darnley <james.darnley at gmail.com> wrote:
> On 2015-07-09 18:55, Paul B Mahol wrote:
>> diff --git a/doc/filters.texi b/doc/filters.texi
>> index 3fce874..74c408a 100644
>> --- a/doc/filters.texi
>> +++ b/doc/filters.texi
>> @@ -1520,6 +1520,164 @@ Optional. It should have a value much less than 1
>> (e.g. 0.05 or 0.02) and is
>>  used to prevent clipping.
>>  @end table
>>
>> +@section dynaudnorm
>> +Dynamic Audio Normalizer.
>> +
>> +This filter applies a certain amount of gain to the input audion in
>> order
>                                                              ^^^^^^
> "audio"
>
>> +to bring its peak magnitude to a target level (e.g. 0 dBFS). However, in
>> +contrast to more "simple" normalization algorithms, the Dynamic Audio
>> +Normalizer *dynamically* re-adjusts the gain factor to the input audio.
>> +This allows for applying extra gain to the "quiet" sections of the audio
>> +while avoiding distortions or clipping the "loud" sections. In other
>> words:
>> +The Dynamic Audio Normalizer will "even out" the volume of quiet and
>> loud
>> +sections, in the sense that the volume of each section is brought to the
>> +same target level. Note, however, that the Dynamic Audio Normalizer
>> achieves
>> +this goal *without* applying "dynamic range compressing". It will retain
>> 100%
>> +of the dynamic range *within* each section of the audio file.
>> +
>> +@table @option
>> +@item f
>> +Set the frame length in milliseconds. In range from 10 to 8000
>> milliseconds.
>> +Default is 500 milliseconds.
>> +The Dynamic Audio Normalizer processes the input audio in small chunks,
>> +referred to as frames. This is required, because a peak magnitude has no
>> +meaning for just a single sample value. Instead, we need to determine
>> the
>> +peak magnitude for a contiguous sequence of sample values. While a
>> "standard"
>> +normalizer would simply use the peak magnitude of the complete file, the
>> +Dynamic Audio Normalizer determines the peak magnitude individually for
>> each
>> +frame. The length of a frame is specified in milliseconds. By default,
>> the
>> +Dynamic Audio Normalizer uses a frame length of 500 milliseconds, which
>> has
>> +been found to give good results with most files.
>> +Note that the exact frame length, in number of samples, will be
>> determined
>> +automatically, based on the sampling rate of the individual input audio
>> file.
>> +
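
To illustrate the last note above, here is a rough sketch of how a frame
length in milliseconds could map to a sample count. The function name and
the plain rounding are assumptions made for illustration only; the filter
itself may round or align the value differently.

```python
def frame_length_in_samples(sample_rate_hz: int, frame_len_ms: int = 500) -> int:
    """Convert a frame length in milliseconds to a number of samples.

    Hypothetical helper: the 10..8000 ms range and the 500 ms default come
    from the option description; the rounding rule is an assumption.
    """
    if not 10 <= frame_len_ms <= 8000:
        raise ValueError("frame length must be between 10 and 8000 ms")
    return round(sample_rate_hz * frame_len_ms / 1000)


if __name__ == "__main__":
    for rate in (44100, 48000):
        print(rate, frame_length_in_samples(rate))
```

With the default of 500 milliseconds, the same option value gives a
different number of samples per frame depending on the input sampling rate.
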
>> +@item g
>> +Set the Gaussian filter window size. In range from 3 to 301, must be odd
>> +number. Default is 31.
>> +Probably the most important parameter of the Dynamic Audio Normalizer is
>> the
>> +@code{window size} of the Gaussian smoothing filter. The filter's window
>> size
>> +is specified in frames, centered around the current frame. For the sake
>> of
>> +simplicity, this must be an odd number. Consequently, the default value
>> of 31
>> +takes into account the current frame, as well as the 15 preceding frames
>> and
>> +the 15 subsequent frames. Using a larger window results in a stronger
>> +smoothing effect and thus in less gain variation, i.e. slower gain
>> +adaptation. Conversely, using a smaller window results in a weaker
>> smoothing
>> +effect and thus in more gain variation, i.e. faster gain adaptation.
>> +In other words, the more you increase this value, the more the Dynamic
>> Audio
>> +Normalizer will behave like a "traditional" normalization filter. On the
>> +contrary, the more you decrease this value, the more the Dynamic Audio
>> +Normalizer will behave like a dynamic range compressor.
>> +
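
The smoothing described above can be sketched roughly as follows. This is
only an illustration: the helper names, the choice of sigma, and the
clamping at the start and end of the gain list are assumptions, not taken
from the patch.

```python
import math
from typing import List, Optional


def gaussian_weights(size: int = 31, sigma: Optional[float] = None) -> List[float]:
    """Return normalized Gaussian weights for an odd window of `size` frames."""
    if size < 3 or size > 301 or size % 2 == 0:
        raise ValueError("window size must be an odd number in 3..301")
    if sigma is None:
        sigma = size / 6.0  # assumption: window spans roughly +/- 3 sigma
    half = size // 2  # 15 preceding and 15 subsequent frames when size is 31
    raw = [math.exp(-((i - half) ** 2) / (2.0 * sigma ** 2)) for i in range(size)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth_gains(gains: List[float], size: int = 31) -> List[float]:
    """Smooth per-frame gain factors with a centered Gaussian window."""
    weights = gaussian_weights(size)
    half = size // 2
    last = len(gains) - 1
    smoothed = []
    for n in range(len(gains)):
        acc = 0.0
        for k, w in enumerate(weights):
            # Assumption: repeat the first/last gain beyond the edges.
            idx = min(max(n + k - half, 0), last)
            acc += w * gains[idx]
        smoothed.append(acc)
    return smoothed
```

A larger window averages over more neighbouring frames, so a sudden jump in
the per-frame gain is spread out over a longer time, which is the "slower
gain adaptation" mentioned in the text.
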
>> +@item p
>> +Set the target peak value. This specifies the highest permissible
>> magnitude
>> +level for the normalized audio input. This filter will try to approach
>> the
>> +target peak magnitude as closely as possible, but at the same time it
>> also
>> +makes sure that the normalized signal will never exceed the peak
>> magnitude.
>> +A frame's maximum local gain factor is imposed directly by the target
>> peak
>> +magnitude. The default value is 0.95 and thus leaves a headroom of 5%*.
>> +It is not recommended to go above this value.
>> +
>> +@item m
>> +Set the maximum gain factor. In range from 1.0 to 100.0. Default is
>> 10.0.
>> +The Dynamic Audio Normalizer determines the maximum possible (local)
>> gain
>> +factor for each input frame, i.e. the maximum gain factor that does not
>> +result in clipping or distortion. The maximum gain factor is determined
>> by
>> +the frame's highest magnitude sample. However, the Dynamic Audio
>> Normalizer
>> +additionally bounds the frame's maximum gain factor by a predetermined
>> +(global) maximum gain factor. This is done in order to avoid excessive
>> gain
>> +factors in "silent" or almost silent frames. By default, the maximum
>> gain
>> +factor is 10.0. For most inputs the default value should be sufficient
>> and
>> +it usually is not recommended to increase this value. Though, for input
>> +with an extremely low overall volume level, it may be necessary to allow
>> even
>> +higher gain factors. Note, however, that the Dynamic Audio Normalizer
>> does
>> +not simply apply a "hard" threshold (i.e. cut off values above the
>> threshold).
>> +Instead, a "sigmoid" threshold function will be applied. This way, the
>> +gainfactors will smoothly approach the threshold value, but never exceed
>> that
>    ^^^^^^^^^^^
> "gain factors", maybe.

Those two typos also fixed.
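
For readers following the p and m descriptions quoted above, a minimal
sketch of how a per-frame gain could be bounded. The erf-based curve is just
one possible "sigmoid" threshold function, chosen here as an assumption; the
function names are hypothetical and this is not presented as the filter's
actual code.

```python
import math


def bound(threshold: float, value: float) -> float:
    """Soft limit: close to `value` when small, approaches `threshold` smoothly.

    Assumption: an erf-shaped curve scaled to have slope 1 at zero.
    """
    c = math.sqrt(math.pi) / 2.0
    return math.erf(c * (value / threshold)) * threshold


def local_gain(frame, target_peak: float = 0.95, max_gain: float = 10.0) -> float:
    """Gain for one frame, limited by the target peak and the maximum gain."""
    peak = max((abs(s) for s in frame), default=0.0)
    # The highest-magnitude sample determines the largest gain without clipping.
    raw = target_peak / peak if peak > 0.0 else math.inf
    # Silent or nearly silent frames would otherwise get an excessive gain.
    return bound(max_gain, raw)


if __name__ == "__main__":
    print(local_gain([0.01, -0.02, 0.015]))  # quiet frame, gain stays below 10
    print(local_gain([0.5, -0.95, 0.3]))     # loud frame, gain close to 1
```

Because the curve never rises above its input, the bounded gain never pushes
a frame past the target peak, and it only approaches the maximum gain factor
instead of being cut off at it.
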

>
> P.S.  Sorry about the two messages.