[FFmpeg-devel] One pass volume normalization (ebur128)

Jan Ehrhardt phpdev at ehrhardt.nl
Mon Jul 15 06:47:48 CEST 2013


Nicolas George in gmane.comp.video.ffmpeg.devel (Sun, 14 Jul 2013
23:14:59 +0200):
>On sextidi, 26 Messidor, year CCXXI, Jan Ehrhardt wrote:
>> Simple, speed.
>
>> I am in no way bound to EBU recommendations
>
>Then use two passes and volumedetect.
>
>If I take the decoding time for audio (Vorbis -q 4) as the measurement
>unit, -af ebur128 costs about 4 while -af volumedetect costs about 0.4
>(and -af volume 0.04). Therefore, two decodings for two passes plus
>volumedetect cost ~2.5, while one pass with ebur128 costs ~5.

OK, you've got some points for using volumedetect. The question is
whether you still get those differences once you take into account that
disk speed might be a limiting factor. Our recordings are on an SD card
and should stay there. Any one-pass scheme has the definite advantage
that the disk has to be accessed only once to read the input file (and,
no, SD cards do not normally have a disk cache).
Besides that, using two passes is not easy, as I will explain below.

>If you have enough RAM, you can save the PCM while running volumedetect and
>avoid the second decoding too.

I do not control the user environment at all. The only thing I know is
that the host PC runs at least Windows XP. For my users it is a
one-click experience: they connect an SD card with a camcorder recording
through USB to their computer, start our software (which lives on the
same SD card), check that the right video is there and click 'Yes,
upload this for me'. Then, behind the scenes, MEncoder or FFmpeg starts
compressing the recording, saves the resulting video on the SD card (I
am not allowed to use the HD of the host PC), and LFTP starts uploading
the compressed file over an SFTP connection to our server. Sometimes the
video is online only hours after that one click.

As there is no user interaction at all, I will have to find a way to
pass the outcome of volumedetect to a second FFmpeg command line for the
second pass. As far as I know, this can only be done by parsing the
screen output of the first pass, extracting the volume level and using
it to compose the second command line.
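
Just to show what that would amount to, here is a minimal sketch of such
a wrapper; the file names are placeholders, and it assumes popen is
usable on the host (MinGW has it; MSVC spells it _popen):

/* Two-pass sketch: run volumedetect, scrape max_volume from the log,
 * then run the real encode with a matching volume filter. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Pass 1: decode only, 2>&1 so the stats land on the pipe we read. */
    const char *pass1 =
        "ffmpeg -i input.mts -vn -af volumedetect -f null - 2>&1";
    char line[1024], cmd[1024];
    double max_volume = 0.0;
    FILE *p = popen(pass1, "r");

    if (!p)
        return 1;
    while (fgets(line, sizeof line, p)) {
        /* volumedetect prints e.g. "max_volume: -7.3 dB" at the end. */
        const char *s = strstr(line, "max_volume:");
        if (s)
            sscanf(s, "max_volume: %lf", &max_volume);
    }
    pclose(p);

    /* Pass 2: lift the measured peak to 0 dBFS. */
    snprintf(cmd, sizeof cmd,
             "ffmpeg -i input.mts -af volume=%.1fdB -c:v copy output.mp4",
             -max_volume);
    return system(cmd);
}

Even then I would be betting on the exact wording of volumedetect's log
line staying stable across FFmpeg versions.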

I would rather not do that, even if it delivers another 5% or so of
speed increase. Everything that breaks the process into more steps is
bound to introduce errors.

Would it be possible to insert the momentary value from volumedetect
into the frame metadata and use that as input in af_volume.c? One-pass
normalization based on volumedetect should be faster than what we have
now.

How would we achieve that?
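
To make the idea concrete, here is a standalone sketch of the mechanics
I am thinking of (not actual af_volume.c code; the metadata key name is
invented by me, since volumedetect exports no per-frame metadata today).
It only shows how a dB value carried in an AVDictionary, as frame
metadata is, could become the linear factor af_volume multiplies with:

/* Illustration only; builds against libavutil:  gcc demo.c -lavutil -lm */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <libavutil/dict.h>

int main(void)
{
    AVDictionary *frame_metadata = NULL;
    AVDictionaryEntry *e;

    /* Pretend volumedetect attached its momentary maximum to the frame;
     * the key "lavfi.volumedetect.max_volume" is hypothetical. */
    av_dict_set(&frame_metadata, "lavfi.volumedetect.max_volume", "-7.3", 0);

    e = av_dict_get(frame_metadata, "lavfi.volumedetect.max_volume", NULL, 0);
    if (e) {
        double max_volume_db = atof(e->value);
        /* Gain needed to bring that peak to 0 dBFS, as a linear factor:
         * 10^(-max_volume/20). af_volume would scale each sample by it. */
        double gain = pow(10.0, -max_volume_db / 20.0);
        printf("max_volume %.1f dB -> linear gain %.4f\n",
               max_volume_db, gain);
    }

    av_dict_free(&frame_metadata);
    return 0;
}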

Jan


