[FFmpeg-user] Preserving perceived loudness when downmixing audio from 5.1 AC3 to stereo AAC

Wed Aug 7 19:15:38 CEST 2013

Francois Visagie wrote:

> Would it therefore be correct to assume that -request_channels leads
> to only that number of channels being extracted, hence no down-mix?

I am no expert, just a little learning, which of course can be a
dangerous thing :-)

Only for TrueHD so far. DTS-HD MA will likely be the same, but it's not
currently supported.

ac3 will still downmix in the decoder, but with -request_channels it can
take into account metadata in the stream, so the mix should be more
like the studio intended. In practice it's not that much different from
-ac 2 level-wise, because it's still normalised to avoid clipping.
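
To illustrate why a normalised downmix ends up quieter, here is a rough
sketch in Python. It is not ffmpeg's code: the -3 dB centre/surround
coefficients, the dropped LFE and the "divide by the sum of the
coefficients" rule are all assumptions chosen for illustration, and
downmix_frame / normalisation_loss_db are made-up names.

```python
import math

# Assumed -3 dB mix levels for centre and surrounds. A real AC-3 stream
# carries its own mix levels in its metadata, which may differ.
CENTER_MIX = math.sqrt(0.5)
SURROUND_MIX = math.sqrt(0.5)


def downmix_frame(fl, fr, fc, lfe, sl, sr, normalise=True):
    """Downmix one 5.1 sample frame (floats in [-1, 1]) to (left, right).

    The LFE channel is simply dropped here (an assumption, not a rule).
    """
    left = fl + CENTER_MIX * fc + SURROUND_MIX * sl
    right = fr + CENTER_MIX * fc + SURROUND_MIX * sr
    if normalise:
        # Divide by the sum of the coefficients so that the worst case
        # (all channels at full scale, in phase) still fits in [-1, 1].
        scale = 1.0 + CENTER_MIX + SURROUND_MIX
        left /= scale
        right /= scale
    return left, right


def normalisation_loss_db():
    """Level drop, in dB, that the normalisation applies to every sample."""
    return 20.0 * math.log10(1.0 + CENTER_MIX + SURROUND_MIX)
```

With these assumed coefficients normalisation_loss_db() comes to roughly
7.7 dB, which is the price paid for guaranteeing that even the worst-case
frame cannot clip. Real soundtracks rarely hit that worst case, hence the
unused headroom.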

> I'm now thoroughly confused by the various "down-mixing"
> possibilities and their potentially differing behaviour, but let me
> try to consolidate:
>
> * you suggest processing individually which of course is the best
> approach in principle

If you want max volume but not to clip, I can't think of any other way.
The whole soundtrack needs to be analysed to assess headroom, then the
volume can be boosted by whatever amount there is. Even then I guess a
real sound engineer could work out if some of the peaks are very
rare/excessive and decide to clip those in order to get more volume.

I don't know if ffmpeg can do this; with sox it's as easy as:

sox in.wav out.wav gain -n
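
For anyone curious what "analyse the whole track, then boost by the
available headroom" amounts to, here is a minimal Python sketch of the
arithmetic. It is only my understanding of a peak normalise, not sox's
actual implementation; the function names are made up, and samples are
assumed to be floats with full scale at +/-1.0.

```python
import math


def peak_normalise_gain_db(samples, target_dbfs=0.0):
    """Gain in dB that moves the largest absolute sample to target_dbfs.

    samples: iterable of floats with full scale at +/-1.0 (an assumption;
    integer PCM would need dividing by its full-scale value first).
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return 0.0  # silence: nothing to scale
    peak_dbfs = 20.0 * math.log10(peak)
    return target_dbfs - peak_dbfs


def apply_gain(samples, gain_db):
    """Scale every sample by the linear factor equivalent to gain_db."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]
```

The point is that the gain depends on the single loudest sample in the
whole file, which is why it cannot be known until the entire soundtrack
has been read once.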

Perceived volume of course gets even more complicated and depends on the
dynamic range of the source. Dynamic range can be controlled for studio
ac3 but nothing else so far AFAIK: even though AAC and DTS use dynamic
range control too, it's not supported by their ffmpeg decoders. FAAD2
supports it for AAC, but it's buggy, so avoid it.

> * once intended down-mixing and perhaps level adjustment have been
> decided upon, which ffmpeg mechanism: * produces technically correct
> down-mixing?

Apart from getting the codec to do it where possible (excluding dts
until fixed) I think that -ac 2 should be considered correct.

> * works for most common audio input formats (e.g. according to Carl
> Eugen aac does not support -request_channels?); * or, can these two
> only be satisfied by down-mixing externally?

As I wrote elsewhere in this thread, having something that hides the
complexity of what the decoders can/can't do would be user friendly, but
I don't know the code/complexity of actually having -aformat do this.