[Ffmpeg-devel] channel ordering and downmixing

Michael Niedermayer michaelni
Sat Apr 7 02:48:58 CEST 2007


On Fri, Apr 06, 2007 at 08:17:29PM -0400, Justin Ruggles wrote:
> > 
> > 
> >> - have the decoder override the channel layout if it wants to
> >> - user-level API: av_channel_mix_init(), av_channel_mix(),
> >>                   av_channel_mix_close()
> >> - the encoder can set the channel layout in encode_init or just set the
> >>   number of channels and set the mask to CHANNEL_MASK_NONE to let the
> >>   muxer decide
> >> - if avctx->channel_layout.mask is CHANNEL_MASK_NONE, the muxer should
> >>   set the channel layout
> > 
> > 
> > i think the AVCodec encoder should have a list of supported layouts and
> > the user app should choose one
> Ok.  Then you have muxer support to consider as well.  e.g. pcm codecs
> can support any layout, but certain containers only support particular
> layouts.  One solution to this might be for the muxer to also have a
> list of supported layouts.  Then we could either have the muxing fail if
> the codec's layout is non-compatible with the muxer's list or else just
> let it be on the user's head if they decide to mix incompatible layouts.

adding such a list to the muxer is trivial if that is needed
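
a minimal sketch of such a check, assuming the encoder and the muxer each
expose an array of layout masks terminated by CHANNEL_MASK_NONE. The helper
names and the value 0 for CHANNEL_MASK_NONE are made up for illustration,
nothing like this exists in lavc/lavf yet:

```c
#include <stddef.h>

/* Assumption: CHANNEL_MASK_NONE comes from the proposal above; its value
 * here (0) is a placeholder chosen only for this sketch. */
#define CHANNEL_MASK_NONE 0

/* Return 1 if 'mask' appears in the CHANNEL_MASK_NONE-terminated list.
 * A NULL list is taken to mean "no restriction" (e.g. a PCM codec). */
static int layout_is_supported(const int *supported, int mask)
{
    if (!supported)
        return 1;
    for (; *supported != CHANNEL_MASK_NONE; supported++)
        if (*supported == mask)
            return 1;
    return 0;
}

/* Return 1 only if both the encoder and the muxer accept 'mask'. */
static int layouts_compatible(const int *enc_list, const int *mux_list,
                              int mask)
{
    return layout_is_supported(enc_list, mask) &&
           layout_is_supported(mux_list, mask);
}
```

whether the muxer then fails or merely warns on a mismatch is a policy
decision left to the caller.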

> > 
> >>Any suggestions/critiques would be great. :)
> > 
> > 
> > i'd say first get rid of the floats
> Downmixing doesn't need very high accuracy, so how does 8-bit
> fixed-point sound?  The AC-3 spec gives a suggestion of 6-bit
> coeffs...how odd.  Maybe that's in order to fit values >1.0 into an
> 8-bit integer?

i hear the audiophile lynch mob outside already ...
so i suggest 16.16 fixed point
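
a rough sketch of what mixing with 16.16 coefficients could look like,
assuming 16-bit signed samples and a 64-bit accumulator. All names here
(FIX_SHIFT, gain_to_fix, mix3, ...) are hypothetical, not existing API:

```c
#include <stdint.h>

#define FIX_SHIFT 16
#define FIX_ONE   (1 << FIX_SHIFT)   /* 1.0 in 16.16 fixed point */

/* Convert a non-negative gain (e.g. 0.707) to 16.16, rounding to nearest. */
static int32_t gain_to_fix(double g)
{
    return (int32_t)(g * FIX_ONE + 0.5);
}

/* Saturate a wide intermediate value to the int16_t sample range. */
static int16_t clip_int16(int64_t v)
{
    if (v >  32767) return  32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Mix three source samples into one output sample using 16.16 gains. */
static int16_t mix3(int16_t a, int16_t b, int16_t c,
                    int32_t ca, int32_t cb, int32_t cc)
{
    int64_t acc = (int64_t)a * ca + (int64_t)b * cb + (int64_t)c * cc;

    /* Round to nearest, then drop the 16 fractional bits. This assumes
     * that >> on a negative value is an arithmetic shift, which holds on
     * common compilers but is implementation-defined in C. */
    acc = (acc + (FIX_ONE >> 1)) >> FIX_SHIFT;
    return clip_int16(acc);
}
```

for example, a left output could be computed as
mix3(L, C, Ls, FIX_ONE, gain_to_fix(0.707), gain_to_fix(0.707)); the 0.707
gains are just an illustrative -3 dB choice, not values taken from any spec.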

> > then there are no ff_/av_ prefixes on non static things
> True.  I considered that, but I was mimicking the naming scheme for
> codecs, parsers, and bitstream filters.  Is it different in this case
> because the channel layouts are const or because of history?

a lot of stuff in lav* is historically misnamed

> > now to the actual design
> > i think downmix coeffs are not a part of the channel layout
> > the channel layout is rather location and type of speakers, that could be
> > simply right, left, front, ... or x,y / x,y,z coordinates or direction in
> > radians or something
> > 
> > from that you can then somehow ;) find the default mixing coeffs to convert
> > from layout X to Y, hardcoding them all is not a good idea, as there are too
> > many as soon as you consider more than mono / stereo as the target
> One issue here is that the decoder should be able to specify downmixing
> coeffs to the user based on codec-specific defaults and/or values in the
> bitstream.  The only thread-safe way I can think of to do this is to put
> them in the AVCodecContext.

you misunderstood, i didn't object to putting them into AVCC, i just thought
that the layout should be separate from the coeffs, that is

    struct AVFoobarChannelLayout
    struct array whatever downmixcoeffs;

> Another tricky thing is that the values of the coeffs depend on the
> channel layout being downmixed to.  I don't see how the decoder can know
> this without putting both src_channel_layout and dst_channel_layout into
> AVCodecContext or having the decoder provide all sets of coeffs for only
> certain target channel layouts.  Either way could get very messy.  That

IMHO the decoder should always provide all the coeffs it knows, this is
important for converting to another codec (some downmix coeffs would get
lost otherwise)
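
one way this could look, purely as a sketch: the layout says only which
speakers exist, and the context carries, separately, every coefficient set
the decoder knows, each tagged with its target layout. Every type and field
name below is hypothetical, and MAX_CHANNELS is an arbitrary limit picked for
the example:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CHANNELS 8   /* illustrative limit, not from any spec */

/* The layout only describes which speakers are present. */
typedef struct ChannelLayoutSketch {
    int mask;
    int nb_channels;
} ChannelLayoutSketch;

/* One set of downmix coefficients (16.16 fixed point) for one target. */
typedef struct DownmixCoeffSet {
    int     dst_mask;                           /* target layout      */
    int     dst_channels;
    int32_t coeff[MAX_CHANNELS][MAX_CHANNELS];  /* indexed [dst][src] */
} DownmixCoeffSet;

/* Stand-in for the relevant part of AVCodecContext. */
typedef struct CodecContextSketch {
    ChannelLayoutSketch    layout;      /* source layout                 */
    const DownmixCoeffSet *downmix;     /* all sets the decoder knows    */
    int                    nb_downmix;
} CodecContextSketch;

/* Return the coefficient set for 'dst_mask', or NULL if the decoder did
 * not provide one (the caller would then fall back to defaults). */
static const DownmixCoeffSet *find_downmix(const CodecContextSketch *c,
                                           int dst_mask)
{
    for (int i = 0; i < c->nb_downmix; i++)
        if (c->downmix[i].dst_mask == dst_mask)
            return &c->downmix[i];
    return NULL;
}
```

a transcoder could copy the whole array to the output codec, so no
coefficients are lost even if the user never downmixes.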

i even suggest completely ignoring downmix coeffs at first and designing just
the channel layout stuff; that way we have a smaller set of things to consider,
and after the channel layout is done and in svn, work on the downmix coeffs
(just a random suggestion)

> typedef struct AVChannelDescription {
>     /**
>      * predefined channel label, from enum ChannelLabel
>      */
>     int label;
>     /**
>      * speaker position, [x][y][z] in ?? units
>      */
>     int position[3];
> } AVChannelDescription;
> typedef struct AVChannelLayout {
>     int mask;
>     AVChannelDescription *description;
> } AVChannelLayout;

i've no objections to adding anything that simplifies code ...

> I have no idea how to do downmixing based on speaker coordinates.  From
> what little I've read it involves using physics formulas, different
> kinds of filters, and other math which I don't have the desire to delve
> into right now.  But maybe there are simpler solutions I don't know of
> that would be good enough for our purposes...

i am sure there are simple solutions ...


Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Those who are too smart to engage in politics are punished by being
governed by those who are dumber. -- Plato 