[FFmpeg-user] Advice on using silence removal

Alex R ralienpp at gmail.com
Sat Sep 18 11:40:25 EEST 2021

Hi everyone,

Thank you for providing valuable feedback about silence removal last month.
For the benefit of future archaeologists, I summarize the steps I've taken
and the key elements of the solution. Note that while this worked for me, I
do not claim that this is the optimal approach.

- As Carl pointed out, don't normalize before silence removal. This is
obvious in retrospect, but I didn't think of it myself.
- The "compand" filter makes a substantial contribution to the quality of
the output.
- This article provides a clear, step-by-step explanation of how to use
this feature of ffmpeg; there are also illustrations that show how the
waveform changes after each step.
- Use the mean volume as a threshold for the silence detector (in the past
I used the maximum value)

In case the site above is not available, here is a relevant excerpt:

ffmpeg -i in.mp3 -filter_complex "compand=attacks=0:points=-30/-900|-20/-20" out.wav

- attacks=0 means that I wanted to measure the absolute volume, without
averaging the sound over a short (or long) period of time
- followed by points, which is a series of "from->to" mappings that are to
be interpreted as:
  - -30/-900, which means that volume below -30 dB in the original input
track gets converted to -900 dB (completely silent)
  - -20/-20 means that at -20 dB the volume remains unchanged
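
To try other gate levels, the filter string from the excerpt can be generated from two numbers. This is only a sketch: `gate_filter` is a hypothetical helper name, not something ffmpeg provides, and it merely reproduces the pattern of the excerpt (first level mapped to -900 dB, second level passed through unchanged).

```sh
# Build a compand noise-gate filter string in the same shape as the excerpt.
# gate_filter is a hypothetical helper, not an ffmpeg feature.
#   $1 = input level (dB) that is mapped to -900 dB, i.e. treated as silence
#   $2 = input level (dB) that passes through unchanged
gate_filter() {
    printf 'compand=attacks=0:points=%s/-900|%s/%s' "$1" "$2" "$2"
}

# Usage (assumes ffmpeg is on PATH; file names are placeholders):
#   ffmpeg -i in.mp3 -filter_complex "$(gate_filter -30 -20)" out.wav
```

The two levels probably need tuning per recording; -30 and -20 are just the values from the excerpt.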

In practical terms, here are the steps I currently use in my noise gate:
1. cut the leading and trailing 200ms of the file (this is where I usually
had the sound of a click/tap when users begin/stop the recording)
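
No command is given for this step, so the following is one possible way to do it, not necessarily what was used here. It assumes the `atrim` and `asetpts` filters and reads the duration with `ffprobe`; `trim_filter` is a hypothetical helper name and `in.wav` is a placeholder for the raw recording.

```sh
# Build an atrim filter that drops a margin (in seconds) from both ends.
# trim_filter is a hypothetical helper, not an ffmpeg feature.
#   $1 = total duration in seconds
#   $2 = margin to cut from each end, in seconds
trim_filter() {
    LC_ALL=C awk -v d="$1" -v m="$2" \
        'BEGIN { printf "atrim=start=%.3f:end=%.3f,asetpts=PTS-STARTPTS", m, d - m }'
}

# Usage (assumes ffmpeg and ffprobe are on PATH; in.wav is a placeholder):
#   dur=$(ffprobe -v error -show_entries format=duration \
#         -of default=noprint_wrappers=1:nokey=1 in.wav)
#   ffmpeg -i in.wav -af "$(trim_filter "$dur" 0.2)" out-02-trim-ex.wav
```

The `asetpts=PTS-STARTPTS` part resets the timestamps so the trimmed audio starts at zero.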

2. use a combination of a high-pass and low-pass filter for the range
200 to 4000 Hz, which should cover a typical human voice
ffmpeg -i out-02-trim-ex.wav -af "highpass=f=200, lowpass=f=4000" out-03-range-filter.wav

3. apply the compand filter
ffmpeg -i out-03-range-filter.wav -filter_complex "compand=attacks=0:points=-30/-900|-20/-20" out-04-compand.wav

4. apply the silence removal filter
ffmpeg -i out-04-compand.wav -af


     d=0.3,areverse,afade=t=in:st=0:d=0.3 out-05-silence-fade.wav

- the threshold of -6dB in the command line above is not hardcoded, but it
is the mean value as detected by `volumedetect`
- we remove silence from the beginning, then turn the signal around and
repeat the process, then turn it around again - such that both ends are
without silence
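
Part of the command line for this step did not survive in the archive, so here is a rough sketch of how the two points above could be put together. The `silenceremove` options are a guess and not the original command; `volumedetect_stat` and `edge_silence_filter` are hypothetical helper names.

```sh
# Pull one statistic (mean_volume or max_volume) out of volumedetect's
# log text, read from stdin. Prints the number without the "dB" unit.
volumedetect_stat() {
    sed -n "s/.*$1: \(-\{0,1\}[0-9.]*\) dB.*/\1/p" | tail -n 1
}

# Build a filter chain that strips silence from both ends and fades them.
# ASSUMPTION: the silenceremove options are a guess, not the original command.
#   $1 = silence threshold in dB, e.g. the mean volume
edge_silence_filter() {
    cut="silenceremove=start_periods=1:start_threshold=${1}dB"
    fade="afade=t=in:st=0:d=0.3"
    # trim the start, reverse, trim and fade the (original) end,
    # reverse back, then fade in the start
    printf '%s,areverse,%s,%s,areverse,%s' "$cut" "$cut" "$fade" "$fade"
}

# Usage (assumes ffmpeg is on PATH):
#   mean=$(ffmpeg -hide_banner -i out-04-compand.wav -af volumedetect \
#          -f null - 2>&1 | volumedetect_stat mean_volume)
#   ffmpeg -i out-04-compand.wav -af "$(edge_silence_filter "$mean")" \
#          out-05-silence-fade.wav
```

Keep in mind that `areverse` buffers the whole clip in memory, so this suits short recordings better than long ones.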

5. normalize it to the max value returned by `volumedetect`
ffmpeg -i out-05-silence-fade.wav -af "volume=18.2 dB" out-06-normalized.wav
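
The gain does not have to be typed in by hand. A minimal sketch, assuming the 18.2 dB above corresponds to a `max_volume` of -18.2 dB (so that the peak ends up at 0 dB); `normalize_filter` is a hypothetical helper name.

```sh
# Turn volumedetect's max_volume (in dB, normally negative) into a volume
# filter whose gain is the negated value, which raises the peak to 0 dB.
# normalize_filter is a hypothetical helper, not an ffmpeg feature.
normalize_filter() {
    LC_ALL=C awk -v m="$1" 'BEGIN { printf "volume=%.1fdB", -m }'
}

# Usage (assumes ffmpeg is on PATH):
#   max=$(ffmpeg -hide_banner -i out-05-silence-fade.wav -af volumedetect \
#         -f null - 2>&1 | sed -n 's/.*max_volume: \(-\{0,1\}[0-9.]*\) dB.*/\1/p')
#   ffmpeg -i out-05-silence-fade.wav -af "$(normalize_filter "$max")" \
#          out-06-normalized.wav
```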

Thanks again for your assistance, I greatly appreciate it. If anyone comes
up with refinements of the described approach, please share your methodology.

