[FFmpeg-user] Audio normalization using "volume" and "compand" filters

Wed Nov 27 21:49:51 CET 2013

On 11/27/13, Thierry Lelegard <thierry at lelegard.fr> wrote:
> Hello,
>
> I have a couple of questions regarding the normalization of audio levels in
> a file. Note that I am not an audio expert.
>
> The idea is to bring the RMS level to a given value "out_rms", say -20
> dBFS,
> with a given maximum peak level "out_peak_max", say -1 dBFS. In a first
> pass,
> I measure "in_rms" and "in_peak" using the audio filter "volumedetect".
>
> My first thought is to use the audio filter "volume" if the input dynamics
> is less than the output one (i.e. in_peak - in_rms < out_peak_max -
> out_rms).
> I expect to shift the whole signal to out_rms without distortion and keep
> out_peak < out_peak_max.
>
> My second thought is to use the audio filter "compand" to adjust the volume
> and compress the dynamics if the input dynamics is too large. I expect to
> obtain out_rms for the mean level and out_peak = out_peak_max.
>
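
The choice between the two filters described above comes down to a headroom
comparison. A minimal Python sketch, assuming all levels are in dBFS as
reported by "volumedetect"; the function name plan_normalization is made up
for illustration and is not part of FFmpeg:

```python
def plan_normalization(in_rms, in_peak, out_rms, out_peak_max):
    """Pick a filter from measured and target levels (all in dBFS).

    Hypothetical helper, not an FFmpeg API. Returns (filter_name, gain_db).
    """
    gain_db = round(out_rms - in_rms, 1)     # shift that moves in_rms onto out_rms
    in_dynamics = in_peak - in_rms           # measured peak-to-RMS distance
    out_dynamics = out_peak_max - out_rms    # distance the targets allow
    if in_dynamics < out_dynamics:
        # A plain gain keeps the shifted peak below out_peak_max.
        return ("volume", gain_db)
    # Otherwise the shifted peak would reach or exceed out_peak_max,
    # so the dynamics have to be compressed as well.
    return ("compand", gain_db)


if __name__ == "__main__":
    # Levels measured further down in this message, with the -20 / -1 dBFS targets.
    print(plan_normalization(-34.6, -14.5, -20.0, -1.0))
```

With the levels measured further down (in_rms = -34.6, in_peak = -14.5) and the
targets -20 / -1 dBFS, the input dynamics is 20.1 dB against 19 dB allowed, so
the sketch selects "compand" with a nominal shift of +14.6 dB.
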
> First problem: The filter "volume" works but the actual adjustment is
> always
> shifted by -0.5dB from the requested value. For -af volume=+5dB, I get an
> actual +4.5 dB. For -af volume=-10dB, I get -10.5 dB. Etc. See the details
> below.
>
> Second problem: The usage of the filter "compand" is extremely obscure. Its
> documentation (http://ffmpeg.org/ffmpeg-filters.html#compand) can hardly be
> understood if you do not already know the meaning of each parameter. See
> below some tests I made without deeply understanding what they mean.
>
> ------------------------
>
> More details on the -0.5dB offset with the filter "volume":
> Let's take as input an MP3 file with low-level audio volume.
>
>> ffmpeg -i test-audio-1.mp3 -af volumedetect -f null -y nul
> ffmpeg version 2.1.1 Copyright (c) 2000-2013 the FFmpeg developers
>   built on Nov 20 2013 21:13:48 with gcc 4.8.2 (GCC)
>   configuration: --enable-gpl --enable-version3 --disable-w32threads
> --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r
> --enable-gnutls --enable-iconv --enable-libass --enable-libbluray
> --enable-libcaca --enable-libfreetype --enable-libgsm --enable-libilbc
> --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb
> --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus
> --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex
> --enable-libtheora --enable-libtwolame --enable-libvidstab
> --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis
> --enable-libvpx --enable-libwavpack --enable-libx264 --enable-libxavs
> --enable-libxvid --enable-zlib
>   libavutil      52. 48.101 / 52. 48.101
>   libavcodec     55. 39.101 / 55. 39.101
>   libavformat    55. 19.104 / 55. 19.104
>   libavdevice    55.  5.100 / 55.  5.100
>   libavfilter     3. 90.100 /  3. 90.100
>   libswscale      2.  5.101 /  2.  5.101
>   libswresample   0. 17.104 /  0. 17.104
>   libpostproc    52.  3.100 / 52.  3.100
> Input #0, mp3, from 'test-audio-1.mp3':
>   Metadata:
>     encoder         : Lavf55.19.104
>   Duration: 00:01:00.00, start: 0.000000, bitrate: 96 kb/s
>     Stream #0:0: Audio: mp3, 48000 Hz, stereo, s16p, 96 kb/s
> Output #0, null, to 'nul':
>   Metadata:
>     encoder         : Lavf55.19.104
>     Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
> Stream mapping:
>   Stream #0:0 -> #0:0 (mp3 -> pcm_s16le)
> Press [q] to stop, [?] for help
> size=N/A time=00:01:00.00 bitrate=N/A
> video:0kB audio:11248kB subtitle:0 global headers:0kB muxing overhead
> -100.000191%
> [Parsed_volumedetect_0 @ 0000000002888ba0] n_samples: 5758942
> [Parsed_volumedetect_0 @ 0000000002888ba0] mean_volume: -34.6 dB
> [Parsed_volumedetect_0 @ 0000000002888ba0] max_volume: -14.5 dB
> [Parsed_volumedetect_0 @ 0000000002888ba0] histogram_14db: 23
> [Parsed_volumedetect_0 @ 0000000002888ba0] histogram_15db: 92
> [Parsed_volumedetect_0 @ 0000000002888ba0] histogram_16db: 219
> [Parsed_volumedetect_0 @ 0000000002888ba0] histogram_17db: 778
> [Parsed_volumedetect_0 @ 0000000002888ba0] histogram_18db: 2862
> [Parsed_volumedetect_0 @ 0000000002888ba0] histogram_19db: 7000
>
> The RMS level is -34.6 dBFS and the peak level is -14.5 dBFS.
>
> I apply the audio filter "volume" with the following command and check the
> audio levels of the output file using "-af volumedetect" again.
>
>   ffmpeg -i test-audio-1.mp3 -af "volume=...dB" -y test-audio-2.mp3
>   ffmpeg -i test-audio-2.mp3 -af volumedetect -f null -y nul
>
> Here are the results with various volume options:
>
> Input file:
>   mean_volume: -34.6 dB
>   max_volume: -14.5 dB
>
> Using -af "volume=+5dB":
>   mean_volume: -30.1 dB => +4.5
>   max_volume: -9.9 dB   => +4.6
>
> Using -af "volume=+13.5dB":
>   mean_volume: -21.6 dB => +13
>   max_volume: -1.4 dB   => +13.1
>
> Using -af "volume=-10dB":
>   mean_volume: -45.1 dB => -10.5
>   max_volume: -24.9 dB  => -10.4
>
> So it seems that the actual adjustment is the requested value - 0.5 for the
> RMS level and - 0.4 for the peak.
>
> Is there any rational explanation for this?
>
> How can we compute the right volume adjustment for bringing the input RMS
> level to a precise value (apart from blindly applying a hardcoded +0.5 to
> compensate the observed behavior)?
>
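
One way to avoid a hardcoded correction is to measure the change that was
actually applied and derive any further adjustment from that measurement.
The sketch below only parses the "volumedetect" summary lines shown above
(ffmpeg writes them to stderr) and computes the measured change; it does not
explain where the offset comes from. The helper names are hypothetical:

```python
import re

# Matches the two summary lines printed by volumedetect, for example
# "[Parsed_volumedetect_0 @ ...] mean_volume: -34.6 dB".
_LEVEL_RE = re.compile(r"\b(mean_volume|max_volume):\s*(-?\d+(?:\.\d+)?)\s*dB")


def parse_volumedetect(log_text):
    """Return the mean_volume / max_volume values found in log_text, in dB."""
    return {name: float(value) for name, value in _LEVEL_RE.findall(log_text)}


def measured_gain(before, after):
    """Actual level change in dB between two parse_volumedetect() results."""
    return {key: round(after[key] - before[key], 1) for key in before}


if __name__ == "__main__":
    log = (
        "[Parsed_volumedetect_0 @ 0000000002888ba0] mean_volume: -34.6 dB\n"
        "[Parsed_volumedetect_0 @ 0000000002888ba0] max_volume: -14.5 dB\n"
    )
    before = parse_volumedetect(log)
    after = {"mean_volume": -30.1, "max_volume": -9.9}  # the volume=+5dB run
    print(before, measured_gain(before, after))
```

For the volume=+5dB run this reproduces the +4.5 / +4.6 dB figures listed
above. The difference between the target and the measured level could then be
applied as a second-pass correction, which is an assumption about a workable
procedure rather than a documented FFmpeg feature.
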
> ------------------------
>
> Some tests using the filter "compand".
>
> Let's use the same input file with in_rms = -34.6 dBFS and in_peak = -14.5
> dBFS. Let's try to compress the dynamics and adjust the level so that
> out_rms = -10 dBFS and out_peak = -1 dBFS. This is too compressed, I know,
> this is just to test the filter.
>
> My understanding is that, to get a dynamics of peak - rms = 9 dB, I should
> compress the signal centered on in_rms = -34.6 dBFS to a peak of -25.6 dBFS
> and then apply a gain of +24.6 dB. Am I right or completely off?
>
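
The arithmetic in the paragraph above can be written out as a short Python
sketch. It only reproduces the numbers of the question and builds the
corresponding "points" string; the helper name compand_plan is hypothetical,
and the floor points -90/-900 and -70/-70 are simply copied from the filter
strings tried in this message:

```python
def compand_plan(in_rms, in_peak, out_rms, out_peak):
    """Reproduce the arithmetic of the question (all levels in dBFS).

    Hypothetical helper, not an FFmpeg API. Returns (points, gain_db) for
    the compand options "points" and "gain".
    """
    target_dynamics = out_peak - out_rms        # 9 dB for -1 and -10 dBFS
    compressed_peak = in_rms + target_dynamics  # peak level before the gain
    gain_db = round(out_rms - in_rms, 1)        # shift in_rms onto out_rms
    pairs = [
        (-90.0, -900.0),             # floor points taken from the message
        (-70.0, -70.0),
        (in_rms, in_rms),            # leave the RMS level unchanged
        (in_peak, compressed_peak),  # pull the measured peak down
        (0.0, compressed_peak),      # hold everything above it at that level
    ]
    points = " ".join(f"{round(x, 1):g}/{round(y, 1):g}" for x, y in pairs)
    return points, gain_db


if __name__ == "__main__":
    print(compand_plan(-34.6, -14.5, -10.0, -1.0))
```

This yields the same points and the same +24.6 dB gain as in the question, so
the arithmetic itself is consistent. It does not guarantee the measured
result: as far as I understand the SoX-derived design, the transfer curve is
applied to a level averaged over the attack and decay times rather than to
each sample, so short peaks may pass with less reduction than the curve
suggests. Treat that as an assumption to verify, not as a diagnosis.
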
> Using a mixture of recommended values in the documentation and what I think
> I understand of the "compand" parameters, I tried the following filters
> with
> the following results.
>
> Input file
>   mean_volume: -34.6 dB
>   max_volume: -14.5 dB
>
> -af "compand=attacks=0.3 0.3:decays=0.8 0.8:points=-90/-900 -70/-70
> -34.6/-34.6 -14.5/-25.6
> 0/-25.6:soft-knee=0.01:gain=24.6:volume=0:delay=0.8"
>   mean_volume: -12.6 dB
>   max_volume: 0.0 dB
>
> I thought that the points "-90/-900 -70/-70 -34.6/-34.6 -14.5/-25.6 0/-25.6"
> would keep -34.6 dBFS at its level and compress -14.5 dBFS and higher to
> -25.6 and then the signal would be shifted up to out_rms = -10 dBFS and
> out_peak = -1 dBFS. RMS is not that far from -10 dBFS but there is clearly
> some clipping.
>
> Same test without the additional gain:
>
> -af "compand=attacks=0.3 0.3:decays=0.8 0.8:points=-90/-900 -70/-70
> -34.6/-34.6 -14.5/-25.6 0/-25.6:soft-knee=0.01:gain=0:volume=0:delay=0.8"
>   mean_volume: -36.7 dB
>   max_volume: -17.1 dB
>
> The RMS has been lowered and the dynamics is 19.6 dB instead of 9.
>
> Could someone please explain how to use "compand" and its parameters?
>
> More precisely, how can we compress an input audio with the characteristics
> in_rms and in_peak into a given target out_rms and out_peak?

The compand filter is a port of the SoX effect of the same name.
I really doubt that its documentation is obscure.

>
> Thanks for your help.
> -Thierry