[FFmpeg-devel] [PATCH 4/6] avformat/mov: parse ISO-14496-12 ChannelLayout
Zhao Zhili
quinkblack at foxmail.com
Tue Oct 31 05:15:36 EET 2023
> On Feb 24, 2023, at 21:49, Jan Ekström <jeebjp at gmail.com> wrote:
>
> On Fri, Feb 24, 2023 at 6:25 AM Zhao Zhili <quinkblack at foxmail.com> wrote:
>>
>> From: Zhao Zhili <zhilizhao at tencent.com>
>>
>> Signed-off-by: Zhao Zhili <zhilizhao at tencent.com>
>
> Hah, I actually happened to recently start coding uncompressed audio
> support in mp4 myself, but what this commit is handling is what
> basically killed my version off since the channel layout box is
> required.
>
> If you're interested you can check my take over at
> https://github.com/jeeb/ffmpeg/commits/pcmc_parsing_improvements .
>
> Will comment on some things.
I only have an old copy of the spec, so I may have missed some comments
and made some mistakes. Please let me know on the mailing list or by
personal email (this address) if I did something wrong.
I have network issues with IRC and can only read the archives when I get the time.
I don't work on open source as my day job.
>
>> ---
>> libavformat/mov.c | 79 +++++++++++-
>> libavformat/mov_chan.c | 265 +++++++++++++++++++++++++++++++++++++++++
>> libavformat/mov_chan.h | 26 ++++
>> 3 files changed, 369 insertions(+), 1 deletion(-)
>>
>> diff --git a/libavformat/mov.c b/libavformat/mov.c
>> index b125343f84..1db869aa2e 100644
>> --- a/libavformat/mov.c
>> +++ b/libavformat/mov.c
>> @@ -940,6 +940,82 @@ static int mov_read_chan(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>> return 0;
>> }
>>
>> +static int mov_read_chnl(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>> +{
>> +    int64_t end = av_sat_add64(avio_tell(pb), atom.size);
>> +    int stream_structure;
>> +    int ret = 0;
>> +    AVStream *st;
>> +
>> +    if (c->fc->nb_streams < 1)
>> +        return 0;
>> +    st = c->fc->streams[c->fc->nb_streams-1];
>> +
>> +    /* skip version and flags */
>> +    avio_skip(pb, 4);
>
> We should really not do this any more. Various FullBoxes have multiple
> versions or depend on the flags. See how I have added FullBox things
> recently, although I would prefer us to have a generic macro/function
> setup for this where you then get the version and flags as arguments
> or whatever in the future.
>
> For this specific box, there are now versions 0 and 1 defined since
> circa 2018-2019 or so (visible at least in 14496-12 2022)
>
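As a rough illustration of the version/flags point above, here is a standalone sketch (not FFmpeg code). FullBoxHeader, read_fullbox_header and chnl_version_is_known are hypothetical names, and the sketch reads from a plain byte buffer rather than an AVIOContext. It only shows splitting the 4-byte FullBox header into an 8-bit version and 24-bit flags so the caller can branch on them instead of skipping them:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not an existing FFmpeg structure. */
typedef struct FullBoxHeader {
    uint8_t  version; /* first byte of the FullBox header */
    uint32_t flags;   /* following 24 bits, big-endian */
} FullBoxHeader;

/* Parse the 4-byte FullBox header at the start of a box payload.
 * Returns 0 on success, -1 if fewer than 4 bytes are available. */
static int read_fullbox_header(const uint8_t *buf, size_t size,
                               FullBoxHeader *hdr)
{
    assert(buf && hdr);
    if (size < 4)
        return -1;
    hdr->version = buf[0];
    hdr->flags   = ((uint32_t)buf[1] << 16) |
                   ((uint32_t)buf[2] <<  8) |
                    (uint32_t)buf[3];
    return 0;
}

/* Only versions 0 and 1 of 'chnl' are mentioned in this thread;
 * anything else would need to be rejected or skipped by the caller. */
static int chnl_version_is_known(const FullBoxHeader *hdr)
{
    return hdr->version <= 1;
}
```

A reader written this way could log and skip the box for an unknown version rather than misparse it as version 0.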
> Since ISO/IEC has changed the rules for free specifications (against
> the wishes of various spec authors) and all that jazz, this is how
> it's defined in what I have on hand:
>
> 12.2.4 Channel layout
>
> 12.2.4.1 Definition
>
> Box Types: 'chnl'
> Container: Audio sample entry
> Mandatory: No
> Quantity: Zero or one
>
> This box may appear in an audio sample entry to document the
> assignment of channels in the audio
> stream. It is recommended to use this box to convey the base channel
> count for the DownMixInstructions
> box and other DRC-related boxes specified in ISO/IEC 23003-4.
> The channel layout can be all or part of a standard layout (from an
> enumerated list), or a custom layout
> (which also allows a track to contribute part of an overall layout).
> A stream may contain channels, objects, neither, or both. A stream
> that is neither channel nor object
> structured can implicitly be rendered in a variety of ways.
>
> 12.2.4.2 Syntax
>
> aligned(8) class ChannelLayout extends FullBox('chnl', version, flags=0) {
>     if (version==0) {
>         unsigned int(8) stream_structure;
>         if (stream_structure & channelStructured) {
>             unsigned int(8) definedLayout;
>             if (definedLayout==0) {
>                 for (i = 1 ; i <= layout_channel_count ; i++) {
>                     // layout_channel_count comes from the sample entry
>                     unsigned int(8) speaker_position;
>                     if (speaker_position == 126) { // explicit position
>                         signed int (16) azimuth;
>                         signed int (8) elevation;
>                     }
>                 }
>             } else {
>                 unsigned int(64) omittedChannelsMap;
>                 // a ‘1’ bit indicates ‘not in this track’
>             }
>         }
>         if (stream_structure & objectStructured) {
>             unsigned int(8) object_count;
>         }
>     } else {
>         unsigned int(4) stream_structure;
>         unsigned int(4) format_ordering;
>         unsigned int(8) baseChannelCount;
>         if (stream_structure & channelStructured) {
>             unsigned int(8) definedLayout;
>             if (definedLayout==0) {
>                 unsigned int(8) layout_channel_count;
>                 for (i = 1 ; i <= layout_channel_count ; i++) {
>                     unsigned int(8) speaker_position;
>                     if (speaker_position == 126) { // explicit position
>                         signed int (16) azimuth;
>                         signed int (8) elevation;
>                     }
>                 }
>             } else {
>                 int(4) reserved = 0;
>                 unsigned int(3) channel_order_definition;
>                 unsigned int(1) omitted_channels_present;
>                 if (omitted_channels_present == 1) {
>                     unsigned int(64) omittedChannelsMap;
>                     // a ‘1’ bit indicates ‘not in this track’
>                 }
>             }
>         }
>         if (stream_structure & objectStructured) {
>             // object_count is derived from baseChannelCount
>         }
>     }
> }
>
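To make the version 0 branch of the quoted syntax concrete, here is a standalone sketch that parses it from a byte buffer. This is an illustration under assumptions, not the patch and not FFmpeg API: ChnlV0, parse_chnl_v0 and the CHNL_* constants are hypothetical names, the FullBox header is assumed to be already consumed, and the 64-channel cap is an arbitrary limit chosen for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHNL_CHANNEL_STRUCTURED 1
#define CHNL_OBJECT_STRUCTURED  2
#define CHNL_MAX_CHANNELS       64 /* arbitrary cap for this sketch */

/* Hypothetical container for the parsed version 0 fields. */
typedef struct ChnlV0 {
    uint8_t  stream_structure;
    uint8_t  defined_layout;
    uint8_t  speaker_position[CHNL_MAX_CHANNELS];
    int16_t  azimuth[CHNL_MAX_CHANNELS];
    int8_t   elevation[CHNL_MAX_CHANNELS];
    uint64_t omitted_channels_map;
    uint8_t  object_count;
} ChnlV0;

/* Parse a version 0 'chnl' payload that follows the FullBox header.
 * channel_count is the channel count from the sample entry.
 * Returns the number of bytes consumed, or -1 on truncated input. */
static int parse_chnl_v0(const uint8_t *p, size_t size, int channel_count,
                         ChnlV0 *out)
{
    size_t pos = 0;
    ChnlV0 c = {0};

    assert(p && out);
    if (channel_count < 0 || channel_count > CHNL_MAX_CHANNELS || size < 1)
        return -1;
    c.stream_structure = p[pos++];

    if (c.stream_structure & CHNL_CHANNEL_STRUCTURED) {
        if (size - pos < 1)
            return -1;
        c.defined_layout = p[pos++];
        if (c.defined_layout == 0) {
            /* custom layout: one speaker position per channel */
            for (int i = 0; i < channel_count; i++) {
                if (size - pos < 1)
                    return -1;
                c.speaker_position[i] = p[pos++];
                if (c.speaker_position[i] == 126) { /* explicit position */
                    if (size - pos < 3)
                        return -1;
                    /* big-endian signed 16-bit, then signed 8-bit */
                    c.azimuth[i]   = (int16_t)((p[pos] << 8) | p[pos + 1]);
                    c.elevation[i] = (int8_t)p[pos + 2];
                    pos += 3;
                }
            }
        } else {
            /* standard layout: 64-bit big-endian omitted-channels bitmap */
            if (size - pos < 8)
                return -1;
            for (int i = 0; i < 8; i++)
                c.omitted_channels_map =
                    (c.omitted_channels_map << 8) | p[pos++];
        }
    }
    if (c.stream_structure & CHNL_OBJECT_STRUCTURED) {
        if (size - pos < 1)
            return -1;
        c.object_count = p[pos++];
    }
    *out = c;
    return (int)pos;
}
```

Version 1 would need a separate branch, since stream_structure shrinks to 4 bits and layout_channel_count is carried in the box itself.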
> 12.2.4.3 Semantics
>
> version is an integer that specifies the version of this box (0 or 1).
>     When authoring, version 1 should be preferred over version 0.
>     Version 1 conveys the channel ordering, which is not always the
>     case for version 0. Version 1 should be used to convey the base
>     channel count for DRC.
>
> stream_structure is a field of flags that define whether the stream
>     has channel or object structure (or both, or neither); the
>     following flags are defined, all other values are reserved:
>         1 the stream carries channels
>         2 the stream carries objects
>
> format_ordering indicates the order of formats in the stream starting
>     from the lowest channel index (see Table). Each format shall only
>     use contiguous channel indices.
>
>         format_ordering   Order
>         0                 unknown
>         1                 Channels, possibly followed by Objects
>         2                 Objects, possibly followed by Channels
>         Remaining values are reserved
>
> definedLayout is a ChannelConfiguration from ISO/IEC 23091-3.
>
> speaker_position is an OutputChannelPosition from ISO/IEC 23091-3. If
>     an explicit position is used, then the azimuth and elevation are
>     as defined as for speakers in ISO/IEC 23091-3. The channel order
>     corresponds to the order of speaker positions.
>
> azimuth is a signed value in degrees, as defined for
>     LoudspeakerAzimuth in ISO/IEC 23091-3.
>
> elevation is a signed value, in degrees, as defined for
>     LoudspeakerElevation in ISO/IEC 23091-3.
>
> channel_order_definition indicates where the ordering of the audio
>     channels for the definedLayout are specified (see Table).
>
>         channel_order_definition   Channel order specification
>         0   as listed for the ChannelConfigurations in ISO/IEC 23091-3
>         1   Default order of audio codec specification
>         2   Channel ordering #2 of audio codec specification
>         3   Channel ordering #3 of audio codec specification
>         4   Channel ordering #4 of audio codec specification
>         Remaining values are reserved
>
> omitted_channels_present is a flag that indicates if it is set to 1
>     that the omittedChannelsMap is present.
>
> omittedChannelsMap is a bit-map of omitted channels; the bits in the
>     channel map are numbered from least-significant to
>     most-significant, and correspond in that ordering with the order
>     of the channels for the configuration as documented in
>     ISO/IEC 23091-3 ChannelConfiguration. 1-bits in the channel map
>     mean that a channel is absent. A zero value of the map therefore
>     always means that the given standard layout is fully present. The
>     default value is 0.
>
> layout_channel_count is the count of channels for the channel layout.
>     The default value is 0 if stream_structure indicates that no
>     channel structure is present. Otherwise, the value is the number
>     of channels of the defined layout, if present, otherwise it is
>     the value from the sample entry.
>
> object_count is the count of channels that contain audio objects. The
>     default value is 0. For version 1 and if the objectStructured
>     flag is set, the value is computed as baseChannelCount minus the
>     channel count of the channel structure.
>
> baseChannelCount represents the combined channel count of the channel
>     layout and the object count. The value must match the base
>     channel count for DRC (see ISO/IEC 23003-4).
>
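Two of the semantics quoted above reduce to small arithmetic, sketched here in standalone form. Both function names are hypothetical and exist only for illustration. The first follows the least-significant-bit-first numbering of omittedChannelsMap, the second the version 1 rule that object_count is baseChannelCount minus the channel count of the channel structure.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the channel at `index` (0-based, in ChannelConfiguration
 * order) is present in the track, 0 if it is omitted.
 * Bit 0 of the map is the least-significant bit; a 1-bit means absent. */
static int chnl_channel_is_present(uint64_t omitted_channels_map,
                                   unsigned index)
{
    assert(index < 64); /* the map only has 64 bits */
    return !((omitted_channels_map >> index) & 1);
}

/* Version 1 with objectStructured set: derive object_count.
 * Returns -1 if the two counts are inconsistent. */
static int chnl_v1_object_count(int base_channel_count,
                                int layout_channel_count)
{
    if (layout_channel_count < 0 ||
        layout_channel_count > base_channel_count)
        return -1;
    return base_channel_count - layout_channel_count;
}
```

With a map of 0 every channel of the defined layout is reported present, which matches the "fully present" wording in the quoted text.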
>
>> +
>> +    stream_structure = avio_r8(pb);
>> +
>> +    // stream carries channels
>> +    if (stream_structure & 1) {
>> +        int layout = avio_r8(pb);
>> +
>> +        av_log(c->fc, AV_LOG_TRACE, "'chnl' layout %d\n", layout);
>> +        if (!layout) {
>> +            uint8_t positions[64] = {};
>> +            int enable = 1;
>> +
>> +            for (int i = 0; i < st->codecpar->ch_layout.nb_channels; i++) {
>> +                int speaker_pos = avio_r8(pb);
>> +
>> +                av_log(c->fc, AV_LOG_TRACE, "speaker_position %d\n", speaker_pos);
>> +                if (speaker_pos == 126) { // explicit position
>> +                    int16_t azimuth = avio_rb16(pb);
>> +                    int8_t elevation = avio_r8(pb);
>> +
>> +                    av_log(c->fc, AV_LOG_TRACE, "azimuth %d, elevation %d\n",
>> +                           azimuth, elevation);
>> +                    // Don't support explicit position
>> +                    enable = 0;
>> +                } else if (i < FF_ARRAY_ELEMS(positions)) {
>> +                    positions[i] = speaker_pos;
>> +                } else {
>> +                    // number of channel out of our supported range
>> +                    enable = 0;
>> +                }
>> +            }
>> +
>> +            if (enable) {
>> +                ret = ff_mov_get_layout_from_channel_positions(positions,
>> +                        st->codecpar->ch_layout.nb_channels,
>> +                        &st->codecpar->ch_layout);
>> +                if (ret) {
>> +                    av_log(c->fc, AV_LOG_WARNING, "unsupported speaker positions\n");
>> +                    ret = 0;
>> +                }
>> +            }
>> +        } else {
>> +            uint64_t omitted_channel_map = avio_rb64(pb);
>> +
>> +            if (omitted_channel_map) {
>> +                avpriv_request_sample(c->fc, "omitted_channel_map 0x%" PRIx64 " != 0",
>> +                                      omitted_channel_map);
>> +                return AVERROR_PATCHWELCOME;
>> +            }
>> +            ff_mov_get_channel_layout_from_config(layout, &st->codecpar->ch_layout);
>> +        }
>> +    }
>> +
>> +    // stream carries objects
>> +    if (stream_structure & 2) {
>> +        int obj_count = avio_r8(pb);
>> +        av_log(c->fc, AV_LOG_TRACE, "'chnl' with object_count %d\n", obj_count);
>> +    }
>> +
>> +    avio_seek(pb, end, SEEK_SET);
>> +    return ret;
>> +}
>> +
>> static int mov_read_wfex(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>> {
>> AVStream *st;
>> @@ -7784,7 +7860,8 @@ static const MOVParseTableEntry mov_default_parse_table[] = {
>> { MKTAG('w','i','d','e'), mov_read_wide }, /* place holder */
>> { MKTAG('w','f','e','x'), mov_read_wfex },
>> { MKTAG('c','m','o','v'), mov_read_cmov },
>> -{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout */
>> +{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout from quicktime */
>> +{ MKTAG('c','h','n','l'), mov_read_chnl }, /* channel layout from ISO-14496-12 */
>> { MKTAG('d','v','c','1'), mov_read_dvc1 },
>> { MKTAG('s','g','p','d'), mov_read_sgpd },
>> { MKTAG('s','b','g','p'), mov_read_sbgp },
>> diff --git a/libavformat/mov_chan.c b/libavformat/mov_chan.c
>> index f66bf0df7f..10ebcdc08f 100644
>> --- a/libavformat/mov_chan.c
>> +++ b/libavformat/mov_chan.c
>> @@ -551,3 +551,268 @@ int ff_mov_read_chan(AVFormatContext *s, AVIOContext *pb, AVStream *st,
>>
>> return 0;
>> }
>> +
>> +/* ISO/IEC 23001-8, 8.2 */
>> +static const AVChannelLayout iso_channel_configuration[] = {
>> +    // 0: any setup
>> +    {},
>> +
>
> I think the better naming for this would be CICP channel configuration
> since the specification is called "common independent coding points"
> (for video this is shared with ITU-T H.273 which is free).
>
> Also do note that a whole bunch of these are not in the channel order
> that FFmpeg wants after stereo :<
>
> Thankfully with manual mapping FFmpeg native channel layouts' channel
> order should be writable and readable.
>
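The manual mapping mentioned above could look roughly like the following standalone sketch. It assumes each channel is identified by a small integer ID and that native order means ascending ID, which is how I understand mask-based native layouts to work. build_native_order_map is a hypothetical name, and the ID values in the usage note are illustrative stand-ins rather than values taken from any header.

```c
#include <assert.h>
#include <stdint.h>

/* ids[i] is the channel ID of the i-th channel in coded (stream) order;
 * all IDs are assumed to be distinct.
 * On return, map[k] is the coded-order index of the channel that comes
 * k-th in native order (ascending ID). */
static void build_native_order_map(const uint8_t *ids, int nb, uint8_t *map)
{
    assert(ids && map);
    for (int i = 0; i < nb; i++)
        map[i] = (uint8_t)i;

    /* insertion sort of the indices by channel ID; nb is small */
    for (int i = 1; i < nb; i++) {
        uint8_t m = map[i];
        int j = i - 1;
        while (j >= 0 && ids[map[j]] > ids[m]) {
            map[j + 1] = map[j];
            j--;
        }
        map[j + 1] = m;
    }
}
```

For a 6.1 stream coded as L R C LFE Ls Rs Cs, with assumed IDs 0, 1, 2, 3, 9, 10, 8 (treating Ls/Rs as side channels and Cs as back center), the map comes out as 0, 1, 2, 3, 6, 4, 5, i.e. the back center moves in front of the side pair.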
> The channel orders for various CICP layouts can be found both in the
> referenced specifications, as well as in the comments from Apple's
> headers for example
>
> // ISO/IEC 23091-3, channels w/orderings
> kAudioChannelLayoutTag_CICP_1 = kAudioChannelLayoutTag_MPEG_1_0, ///< C
> kAudioChannelLayoutTag_CICP_2 = kAudioChannelLayoutTag_MPEG_2_0, ///< L R
> kAudioChannelLayoutTag_CICP_3 = kAudioChannelLayoutTag_MPEG_3_0_A, ///< L R C
> kAudioChannelLayoutTag_CICP_4 = kAudioChannelLayoutTag_MPEG_4_0_A, ///< L R C Cs
> kAudioChannelLayoutTag_CICP_5 = kAudioChannelLayoutTag_MPEG_5_0_A, ///< L R C Ls Rs
> kAudioChannelLayoutTag_CICP_6 = kAudioChannelLayoutTag_MPEG_5_1_A, ///< L R C LFE Ls Rs
> kAudioChannelLayoutTag_CICP_7 = kAudioChannelLayoutTag_MPEG_7_1_B, ///< L R C LFE Ls Rs Lc Rc
>
> kAudioChannelLayoutTag_CICP_9 = kAudioChannelLayoutTag_ITU_2_1, ///< L R Cs
> kAudioChannelLayoutTag_CICP_10 = kAudioChannelLayoutTag_ITU_2_2, ///< L R Ls Rs
> kAudioChannelLayoutTag_CICP_11 = kAudioChannelLayoutTag_MPEG_6_1_A, ///< L R C LFE Ls Rs Cs
> kAudioChannelLayoutTag_CICP_12 = kAudioChannelLayoutTag_MPEG_7_1_C, ///< L R C LFE Ls Rs Rls Rrs
> kAudioChannelLayoutTag_CICP_13 = (204U<<16) | 24, ///< Lc Rc C LFE2 Rls Rrs L R Cs LFE3 Lss Rss Vhl Vhr Vhc Ts Ltr Rtr Ltm Rtm Ctr Cb Lb Rb
>
> kAudioChannelLayoutTag_CICP_14 = (205U<<16) | 8, ///< L R C LFE Ls Rs Vhl Vhr
> kAudioChannelLayoutTag_CICP_15 = (206U<<16) | 12, ///< L R C LFE2 Rls Rrs LFE3 Lss Rss Vhl Vhr Ctr
>
> kAudioChannelLayoutTag_CICP_16 = (207U<<16) | 10, ///< L R C LFE Ls Rs Vhl Vhr Lts Rts
> kAudioChannelLayoutTag_CICP_17 = (208U<<16) | 12, ///< L R C LFE Ls Rs Vhl Vhr Vhc Lts Rts Ts
> kAudioChannelLayoutTag_CICP_18 = (209U<<16) | 14, ///< L R C LFE Ls Rs Lbs Rbs Vhl Vhr Vhc Lts Rts Ts
>
> kAudioChannelLayoutTag_CICP_19 = (210U<<16) | 12, ///< L R C LFE Rls Rrs Lss Rss Vhl Vhr Ltr Rtr
> kAudioChannelLayoutTag_CICP_20 = (211U<<16) | 14, ///< L R C LFE Rls Rrs Lss Rss Vhl Vhr Ltr Rtr Leos Reos
>
> Best regards,
> Jan