[FFmpeg-devel] [PATCH 001/244] Add a new channel layout API

Anton Khirnov anton at khirnov.net
Sun Dec 29 18:39:01 EET 2019


Quoting Nicolas George (2019-12-07 21:25:53)
> Anton Khirnov (12019-12-06):
> > The new API is more extensible and allows for custom layouts.
> > More accurate information is exported, eg for decoders that do not
> > set a channel layout, lavc will not make one up for them.
> > 
> > Deprecate the old API working with just uint64_t bitmasks.
> > 
> > Expanded and completed by Vittorio Giovara <vittorio.giovara at gmail.com>.
> > Signed-off-by: Vittorio Giovara <vittorio.giovara at gmail.com>
> > ---
> >  libavutil/channel_layout.c | 393 +++++++++++++++++++++++++++++------
> >  libavutil/channel_layout.h | 415 ++++++++++++++++++++++++++++++++++---
> >  libavutil/version.h        |   3 +
> >  3 files changed, 719 insertions(+), 92 deletions(-)
> 
> Thanks for sending this here. I will make a few preliminary remarks
> below.
> 
> Before that, I want to mention I think the real trial for this API will
> be integration with libavfilter, especially the format negotiation and
> the user interface of the filters that allow fine control of channels.
> The negotiation, in particular, promises to be an interesting
> algorithmic problem. By "real trial", I mean that it is very possible
> that trying to implement that will make us realize some details need to
> be changed. I have these questions in mind while writing my comments.
> 
> Two important comments to consider, because they affect the structure of
> the API itself:
> 
> - about av_channel_layout_describe(), a general reflection on functions
>   that return strings;
> 
> - about getting a channel by name.
> 

> 
> > +int av_channel_layout_from_mask(AVChannelLayout *channel_layout,
> > +                                uint64_t mask)
> 
> I find that "channel_layout" all over the place makes the code bulky,
> harder to read. And tiring to write. We could decide on a standard
> shortening for it everywhere. For example chlayout.

In the API namespace (function names) or the parameter names? For the
latter, it can be changed at any time without problem and I don't really
care much. For the former, the header is called channel_layout and I'd
lean towards keeping that aligned with the namespace. Not a very strong
opinion though.

> > +enum AVChannel {
> > +    AV_CHAN_FRONT_LEFT,
> > +    AV_CHAN_FRONT_RIGHT,
> > +    AV_CHAN_FRONT_CENTER,
> > +    AV_CHAN_LOW_FREQUENCY,
> > +    AV_CHAN_BACK_LEFT,
> > +    AV_CHAN_BACK_RIGHT,
> > +    AV_CHAN_FRONT_LEFT_OF_CENTER,
> > +    AV_CHAN_FRONT_RIGHT_OF_CENTER,
> > +    AV_CHAN_BACK_CENTER,
> > +    AV_CHAN_SIDE_LEFT,
> > +    AV_CHAN_SIDE_RIGHT,
> > +    AV_CHAN_TOP_CENTER,
> > +    AV_CHAN_TOP_FRONT_LEFT,
> > +    AV_CHAN_TOP_FRONT_CENTER,
> > +    AV_CHAN_TOP_FRONT_RIGHT,
> > +    AV_CHAN_TOP_BACK_LEFT,
> > +    AV_CHAN_TOP_BACK_CENTER,
> > +    AV_CHAN_TOP_BACK_RIGHT,
> > +    /** Stereo downmix. */
> > +    AV_CHAN_STEREO_LEFT = 29,
> > +    /** See above. */
> > +    AV_CHAN_STEREO_RIGHT,
> > +    AV_CHAN_WIDE_LEFT,
> > +    AV_CHAN_WIDE_RIGHT,
> > +    AV_CHAN_SURROUND_DIRECT_LEFT,
> > +    AV_CHAN_SURROUND_DIRECT_RIGHT,
> > +    AV_CHAN_LOW_FREQUENCY_2,
> > +
> > +    /** Channel is empty can be safely skipped. */
> > +    AV_CHAN_SILENCE = 64,
> > +};
> > +
> > +enum AVChannelOrder {
> > +    /**
> > +     * The native channel order, i.e. the channels are in the same order in
> > +     * which they are defined in the AVChannel enum. This supports up to 63
> > +     * different channels.
> > +     */
> > +    AV_CHANNEL_ORDER_NATIVE,
> > +    /**
> > +     * The channel order does not correspond to any other predefined order and
> > +     * is stored as an explicit map. For example, this could be used to support
> 
> > +     * layouts with 64 or more channels, or with channels that could be skipped.
> 
> [ Already said on 2019-10-30 ]
> "or with channels that could be skipped"? What does that mean. If a
> channel is skipped, it is not present, that can be expressed in the
> mask. Is this a mistake for "duplicated"?

Hmm, this was apparently added by Vittorio so I'm not sure, but would
assume it refers to AV_CHAN_SILENCE.

> 
> > +     */
> > +    AV_CHANNEL_ORDER_CUSTOM,
> > +    /**
> > +     * Only the channel count is specified, without any further information
> > +     * about the channel order.
> > +     */
> > +    AV_CHANNEL_ORDER_UNSPEC,
> > +};
> > +
> > +
> >  /**
> >   * @defgroup channel_masks Audio channel masks
> >   *
> > @@ -46,36 +103,41 @@
> >   *
> >   * @{
> >   */
> > -#define AV_CH_FRONT_LEFT             0x00000001
> > -#define AV_CH_FRONT_RIGHT            0x00000002
> > -#define AV_CH_FRONT_CENTER           0x00000004
> > -#define AV_CH_LOW_FREQUENCY          0x00000008
> > -#define AV_CH_BACK_LEFT              0x00000010
> > -#define AV_CH_BACK_RIGHT             0x00000020
> > -#define AV_CH_FRONT_LEFT_OF_CENTER   0x00000040
> > -#define AV_CH_FRONT_RIGHT_OF_CENTER  0x00000080
> > -#define AV_CH_BACK_CENTER            0x00000100
> > -#define AV_CH_SIDE_LEFT              0x00000200
> > -#define AV_CH_SIDE_RIGHT             0x00000400
> > -#define AV_CH_TOP_CENTER             0x00000800
> > -#define AV_CH_TOP_FRONT_LEFT         0x00001000
> > -#define AV_CH_TOP_FRONT_CENTER       0x00002000
> > -#define AV_CH_TOP_FRONT_RIGHT        0x00004000
> > -#define AV_CH_TOP_BACK_LEFT          0x00008000
> > -#define AV_CH_TOP_BACK_CENTER        0x00010000
> > -#define AV_CH_TOP_BACK_RIGHT         0x00020000
> > -#define AV_CH_STEREO_LEFT            0x20000000  ///< Stereo downmix.
> > -#define AV_CH_STEREO_RIGHT           0x40000000  ///< See AV_CH_STEREO_LEFT.
> > -#define AV_CH_WIDE_LEFT              0x0000000080000000ULL
> > -#define AV_CH_WIDE_RIGHT             0x0000000100000000ULL
> > -#define AV_CH_SURROUND_DIRECT_LEFT   0x0000000200000000ULL
> > -#define AV_CH_SURROUND_DIRECT_RIGHT  0x0000000400000000ULL
> > -#define AV_CH_LOW_FREQUENCY_2        0x0000000800000000ULL
> > +#define AV_CH_FRONT_LEFT             (1ULL << AV_CHAN_FRONT_LEFT           )
> > +#define AV_CH_FRONT_RIGHT            (1ULL << AV_CHAN_FRONT_RIGHT          )
> > +#define AV_CH_FRONT_CENTER           (1ULL << AV_CHAN_FRONT_CENTER         )
> > +#define AV_CH_LOW_FREQUENCY          (1ULL << AV_CHAN_LOW_FREQUENCY        )
> > +#define AV_CH_BACK_LEFT              (1ULL << AV_CHAN_BACK_LEFT            )
> > +#define AV_CH_BACK_RIGHT             (1ULL << AV_CHAN_BACK_RIGHT           )
> > +#define AV_CH_FRONT_LEFT_OF_CENTER   (1ULL << AV_CHAN_FRONT_LEFT_OF_CENTER )
> > +#define AV_CH_FRONT_RIGHT_OF_CENTER  (1ULL << AV_CHAN_FRONT_RIGHT_OF_CENTER)
> > +#define AV_CH_BACK_CENTER            (1ULL << AV_CHAN_BACK_CENTER          )
> > +#define AV_CH_SIDE_LEFT              (1ULL << AV_CHAN_SIDE_LEFT            )
> > +#define AV_CH_SIDE_RIGHT             (1ULL << AV_CHAN_SIDE_RIGHT           )
> > +#define AV_CH_TOP_CENTER             (1ULL << AV_CHAN_TOP_CENTER           )
> > +#define AV_CH_TOP_FRONT_LEFT         (1ULL << AV_CHAN_TOP_FRONT_LEFT       )
> > +#define AV_CH_TOP_FRONT_CENTER       (1ULL << AV_CHAN_TOP_FRONT_CENTER     )
> > +#define AV_CH_TOP_FRONT_RIGHT        (1ULL << AV_CHAN_TOP_FRONT_RIGHT      )
> > +#define AV_CH_TOP_BACK_LEFT          (1ULL << AV_CHAN_TOP_BACK_LEFT        )
> > +#define AV_CH_TOP_BACK_CENTER        (1ULL << AV_CHAN_TOP_BACK_CENTER      )
> > +#define AV_CH_TOP_BACK_RIGHT         (1ULL << AV_CHAN_TOP_BACK_RIGHT       )
> > +#define AV_CH_STEREO_LEFT            (1ULL << AV_CHAN_STEREO_LEFT          )
> > +#define AV_CH_STEREO_RIGHT           (1ULL << AV_CHAN_STEREO_RIGHT         )
> > +#define AV_CH_WIDE_LEFT              (1ULL << AV_CHAN_WIDE_LEFT            )
> > +#define AV_CH_WIDE_RIGHT             (1ULL << AV_CHAN_WIDE_RIGHT           )
> > +#define AV_CH_SURROUND_DIRECT_LEFT   (1ULL << AV_CHAN_SURROUND_DIRECT_LEFT )
> > +#define AV_CH_SURROUND_DIRECT_RIGHT  (1ULL << AV_CHAN_SURROUND_DIRECT_RIGHT)
> > +#define AV_CH_LOW_FREQUENCY_2        (1ULL << AV_CHAN_LOW_FREQUENCY_2      )
> >  
> > +#if FF_API_OLD_CHANNEL_LAYOUT
> >  /** Channel mask value used for AVCodecContext.request_channel_layout
> >      to indicate that the user requests the channel order of the decoder output
> > -    to be the native codec channel order. */
> > +    to be the native codec channel order.
> > +    @deprecated channel order is now indicated in a special field in
> > +                AVChannelLayout
> > +    */
> >  #define AV_CH_LAYOUT_NATIVE          0x8000000000000000ULL
> > +#endif
> >  
> >  /**
> >   * @}
> > @@ -122,6 +184,139 @@ enum AVMatrixEncoding {
> >      AV_MATRIX_ENCODING_NB
> >  };
> >  
> > +/**
> > + * @}
> > + */
> > +
> > +/**
> > + * An AVChannelLayout holds information about the channel layout of audio data.
> > + *
> > + * A channel layout here is defined as a set of channels ordered in a specific
> > + * way (unless the channel order is AV_CHANNEL_ORDER_UNSPEC, in which case an
> > + * AVChannelLayout carries only the channel count).
> > + *
> 
> > + * Unlike most structures in Libav, sizeof(AVChannelLayout) is a part of the
> > + * public ABI and may be used by the caller. E.g. it may be allocated on stack.
> > + * In particular, this structure can be initialized as follows:
> > + * - default initialization with {0} or by setting all used fields correctly
> > + * - with predefined layout as initializer (AV_CHANNEL_LAYOUT_STEREO, etc.)
> > + * - with a constructor function such as av_channel_layout_default()
> > + * On that note, this also applies:
> > + * - copy via assigning is forbidden, av_channel_layout_copy() must be used
> > + *   instead (and its return value should be checked)
> > + * - if order is AV_CHANNEL_ORDER_CUSTOM, then it must be uninitialized with
> > + *   av_channel_layout_uninit().
> > + *
> > + * No new fields may be added to it without a major version bump, except for
> > + * new elements of the union fitting in sizeof(uint64_t).
> 
> [ Already said on 2019-10-30 ]
> This is a significant difference from the other APIs, and we need to be
> really sure we can live with it.
> 
> I am rather for this version, but I am afraid we will find cases that
> need extending the structure.
> 
> I suggest we explicitly declare this API unstable for a few months,
> during which we consider acceptable to add fields.

I am aware that it is different, but making the struct dynamic would
make using it significantly more cumbersome. Given how long we have
lived with the bitmask, I do not expect there to be a significant need
to add more fields to it.

Plus, it can still be extended by adding a new AV_CHANNEL_ORDER types
and a corresponding new union member.

> 
> > +
> > +    /**
> > +     * Details about which channels are present in this layout.
> > +     * For AV_CHANNEL_ORDER_UNSPEC, this field is undefined and must not be
> > +     * used.
> > +     */
> > +    union {
> > +        /**
> > +         * This member must be used for AV_CHANNEL_ORDER_NATIVE.
> > +         * It is a bitmask, where the position of each set bit means that the
> > +         * AVChannel with the corresponding value is present.
> > +         *
> > +         * I.e. when (mask & (1 << AV_CHAN_FOO)) is non-zero, then AV_CHAN_FOO
> > +         * is present in the layout. Otherwise it is not present.
> > +         *
> > +         * @note when a channel layout using a bitmask is constructed or
> > +         * modified manually (i.e.  not using any of the av_channel_layout_*
> > +         * functions), the code doing it must ensure that the number of set bits
> > +         * is equal to nb_channels.
> > +         */
> > +        uint64_t mask;
> > +        /**
> > +         * This member must be used when the channel order is
> > +         * AV_CHANNEL_ORDER_CUSTOM. It is a nb_channels-sized array, with each
> > +         * element signalling the presend of the AVChannel with the
> > +         * corresponding value.
> > +         *
> > +         * I.e. when map[i] is equal to AV_CHAN_FOO, then AV_CH_FOO is the i-th
> > +         * channel in the audio data.
> > +         */
> 
> > +        enum AVChannel *map;
> 
> I suggest:
> 
>     typedef struct AVChannelMapped {
>         enum AVChannel id;
>         const char *descr;
>     } AVChannelMapped;
> 
>     AVChannelMapped *map;
> 
> That way, the application can provide a description for each channel. It
> is useful if the same AVChannel value is present several times.

Given that a copy of this struct is embedded in every single frame, I'd
rather not add extensive dynamically allocated metadata to it. Those
should rather appear in some higher level API.

> 
> > +    } u;
> > +} AVChannelLayout;
> > +
> 
> > +#define AV_CHANNEL_LAYOUT_MONO \
> > +    { .order = AV_CHANNEL_ORDER_NATIVE, .nb_channels = 1,  .u = { .mask = AV_CH_LAYOUT_MONO }}
> 
> [ Already said on 2019-10-30 ]
> These macros can only be used to init structures. This is unusual for
> macros in ALL_CAPS, and therefore it needs to be documented.

They also can (and are, in the patchset) used as
(AVChannelLayout)AV_CHANNEL_LAYOUT_FOO

We might want to add a macro for that.

> > +
> > +/**
> > + * Free any allocated data in the channel layout and reset the channel
> > + * count to 0.
> > + *
> > + * @note this only used for structure initialization and for freeing the
> > + *       allocated memory for AV_CHANNEL_ORDER_CUSTOM order.
> > + *
> > + * @param channel_layout the layout structure to be uninitialized
> > + */
> > +void av_channel_layout_uninit(AVChannelLayout *channel_layout);
> > +
> > +/**
> > + * Make a copy of a channel layout. This differs from just assigning src to dst
> > + * in that it allocates and copies the map for AV_CHANNEL_ORDER_CUSTOM.
> > + *
> > + * @note the destination channel_layout will be always uninitialized before copy.
> > + *
> > + * @param dst destination channel layout
> > + * @param src source channel layout
> > + * @return 0 on success, a negative AVERROR on error.
> > + */
> > +int av_channel_layout_copy(AVChannelLayout *dst, const AVChannelLayout *src);
> > +
> 
> > +/**
> > + * Get a human-readable string describing the channel layout properties.
> > + *
> > + * @note The returned string is allocated with av_malloc(),
> > + *       and must be freed by the caller with av_free().
> > + *
> > + * @param channel_layout channel layout to be described
> > + * @return a string describing the structure or NULL on failure in the same
> > + *         format that is accepted by @ref av_channel_layout_from_string().
> > + */
> 
> > +char *av_channel_layout_describe(const AVChannelLayout *channel_layout);
> 
> The usual function that returns a string. I have always found these
> functions very clumsy. They either take a buffer and size, which is
> simple but very inconvenient when the resulting string can be long, or
> they do memory allocations and require annoying error checks. Both are
> inefficient because the returned string will probably be immediately
> copied into another buffer or written somewhere.
> 
> I have been thinking about a better interface for a long time, and I
> recently found the right way to do it. I implemented a first draft, it
> works. And it also makes the implementation of this function simpler.
> 
> I propose you this:
> 
> void av_channel_layout_write(AVWriter wr, const AVChannelLayout *channel_layout);
> 
> It can be used several ways:
> 
>     char buf[arbitrary];
>     av_channel_layout_write(av_buf_writer_array(buf), layout);
> 
> Or:
> 
>     AVWriter wr = av_dynbuf_writer();
>     av_channel_layout_write(wr, layout);
>     ret = av_dynbuf_writer_finalize(wr, &data, &size);
> 
> Or several other possibilities to write directly to av_log() or to
> user-supplied callbacks.

That still requires error checks and adds considerable complexity.

There is no hiding from the fact that the string size is not bounded, so
either you make up an arbitrary limit and hope it's enough (and have it
fail for special cases), or you do dynamic allocation with error checks.

> 
> > +
> > +/**
> > + * Get the channel with the given index in a channel layout.
> > + *
> > + * @param channel_layout input channel layout
> > + * @return channel with the index idx in channel_layout on success or a negative
> > + *                 AVERROR on failure (if idx is not valid or the channel order
> > + *                 is unspecified)
> > + */
> 
> > +int av_channel_layout_get_channel(const AVChannelLayout *channel_layout, int idx);
> 
> [ Already said on 2019-10-30 ]
> unsigned idx

ok

> 
> But do we really need this as part of the API?

Yes. The point of the API is to provide a logical mapping between
audio channels and their indices in a stream. That way, the callers that
only read the channel layout do not (typically) need to know how exactly
it is stored.

This function is the canonical way of answering the question 'what
semantics does n-th channel in this stream have?', so the callers don't
need to handle ORDER_NATIVE vs ORDER_CUSTOM, or do any messing with
bitmasks.

> 
> [ New ]
> 
> On the other hand, I am pretty sure we will need a function to let the
> user specify one channel within the layout. This is important with this
> new API, since the channel name is not enough: since channels can be
> duplicated, the name of a channel is not enough to identify it. We need
> something to express "the third front-left channel" for example.
> 
> I suggest
> 
> int av_channel_layout_find_channel(const AVChannelLayout *channel_layout, 
>                                    const char *name);
> 
> where name could be "FL" if there is only one, "c7" for the channel
> number 7 whatever the layout, "FL3" for the third FL, and possibly a
> name that appears in AVChannelMapped.descr.

I do not agree. Duplicated channels in a layout are expected to be a
fringe thing and how you handle them highly depends on the specific use
case. I expect a typical caller will want to disregard that possibility
and just take the first channel of each semantics.
So I do not believe a dedicated function for this makes sense. We could
always add something later though, if it turns out to be necessary.

> 
> > +
> > +/**
> > + * Get the index of a given channel in a channel layout. In case multiple
> > + * channels are found, only the first match will be returned.
> > + *
> > + * @param channel_layout input channel layout
> > + * @return index of channel in channel_layout on success or a negative number if
> > + *         channel is not present in channel_layout.
> > + */
> 
> > +int av_channel_layout_channel_index(const AVChannelLayout *channel_layout,
> > +                                    enum AVChannel channel);
> 
> [ Already said on 2019-10-30 ]
> Same: do we really need this?

Again, yes. E.g. you are decoding an audio stream with multiple channels
and want to know which is the 'front left' one. This function gives you
the canonical answer to that.

> 
> > +
> > +/**
> > + * Find out what channels from a given set are present in a channel layout,
> > + * without regard for their positions.
> > + *
> > + * @param channel_layout input channel layout
> > + * @param mask a combination of AV_CH_* representing a set of channels
> > + * @return a bitfield representing all the channels from mask that are present
> > + *         in channel_layout
> > + */
> 
> > +uint64_t av_channel_layout_subset(const AVChannelLayout *channel_layout,
> > +                                  uint64_t mask);
> 
> [ Already said on 2019-10-30 ]
> Same.

It turns out to be useful in flacenc, matroskaenc and mlpdec
conversions. And of course can be potentially useful to the callers.
Since we are not deprecating the AV_CH_FOO macros, I do not see a
problem with having it public.

-- 
Anton Khirnov


More information about the ffmpeg-devel mailing list