[FFmpeg-devel] [PATCH 16/21] libavcodec/avcodec, libavformat/movenc: support embedding channel layout to stream side data
Erkki Seppälä
erkki.seppala.ext at nokia.com
Wed Aug 24 17:39:53 EEST 2016
On 08/23/2016 08:49 PM, Nicolas George wrote:
>> + int speaker_position; /** an OutputChannelPosition from ISO/IEC 23001-8 */
>
> I think most our API users and even developers have better use for 158 Swiss
> Francs than feeding the ISO. Please make the API self-sufficient; I suggest
> an enum.
I'll introduce an enumeration AVSpeakerPosition. However, the spec is
available publicly at
http://standards.iso.org/ittf/PubliclyAvailableStandards/c062088_ISO_IEC_23001-8_2013.zip
. (And the table of output channel positions starts at page 24.)
> Le septidi 7 fructidor, an CCXXIV, erkki.seppala.ext at nokia.com a écrit :
>> From: Erkki Seppälä <erkki.seppala.ext at nokia.com>
>>
>> Added support for passing complex channel layout configuration as side
>> packet data (AV_PKT_DATA_AUDIO_CHANNEL_LAYOUT,
>> AV_PKT_DATA_AUDIO_CHANNEL_PREDEFINED_LAYOUT,
>> AV_PKT_DATA_AUDIO_CHANNEL_LAYOUT_OBJECT_STRUCTURED) to ISO media files
>> as specified by ISO/IEC 14496-12.
>>
>> This information isn't integrated into the existing channel layout
>> system though,
>
> I think it needs to be.
Unfortunately properly integrating this information into FFmpeg sounds
larger than the scope of my project. I imagine this still would be very
helpful base work should such information be actually supported by
various parts of FFmpeg.
A special bit in the AV_CH_LAYOUT_* could be used for indicating that a
special layout exists, but that information by itself is nowhere enough
to work with this data with ie. in the functions available in
channel_layout.c. And if codec or client code is able to deal with the
advanced channel layout configurations, they also have access to this
side data.
I also believe is is easier to study more deeper integration once a
non-intrusive possibility to work with advanced ISOBMFF channel layouts
exists.
Here is how the mapping on speaker position level might be done:
/** Speaker positions according to ISO/IEC 23001-8 */
enum AVSpeakerPosition {
! AV_SPEAKER_POSITION_L = 0, /// Left front
! AV_SPEAKER_POSITION_R = 1, /// Right front
AV_SPEAKER_POSITION_C = 2, /// Centre front
AV_SPEAKER_POSITION_LFE = 3, /// Low frequency enhancement
AV_SPEAKER_POSITION_LS = 4, /// Left surround
AV_SPEAKER_POSITION_RS = 5, /// Right surround
! AV_SPEAKER_POSITION_LC = 6, /// Left front centre
! AV_SPEAKER_POSITION_RC = 7, /// Right front centre
AV_SPEAKER_POSITION_LSR = 8, /// Rear surround left
AV_SPEAKER_POSITION_RSR = 9, /// Rear surround right
AV_SPEAKER_POSITION_CS = 10, /// Rear centre
AV_SPEAKER_POSITION_LSD = 11, /// Left surround direct
AV_SPEAKER_POSITION_RSD = 12, /// Right surround direct
AV_SPEAKER_POSITION_LSS = 13, /// Left side surround
AV_SPEAKER_POSITION_RSS = 14, /// Right side surround
! AV_SPEAKER_POSITION_LW = 15, /// Left wide front
! AV_SPEAKER_POSITION_RW = 16, /// Right wide front
AV_SPEAKER_POSITION_LV = 17, /// Left front vertical height
AV_SPEAKER_POSITION_RV = 18, /// Right front vertical height
AV_SPEAKER_POSITION_CV = 19, /// Centre front vertical height
AV_SPEAKER_POSITION_LVR = 20, /// Left surround vertical height
rear
AV_SPEAKER_POSITION_RVR = 21, /// Right surround vertical
height rear
AV_SPEAKER_POSITION_CVR = 22, /// Centre vertical height rear
AV_SPEAKER_POSITION_LVSS = 23, /// Left vertical height side
surround
AV_SPEAKER_POSITION_RVSS = 24, /// Right vertical height side
surround
AV_SPEAKER_POSITION_TS = 25, /// Top centre surround
AV_SPEAKER_POSITION_LFE2 = 26, /// E2 Low frequency enhancement 2
AV_SPEAKER_POSITION_LB = 27, /// Left front vertical bottom
AV_SPEAKER_POSITION_RB = 28, /// Right front vertical bottom
AV_SPEAKER_POSITION_CB = 29, /// Centre front vertical bottom
AV_SPEAKER_POSITION_LVS = 30, /// Left vertical height surround
AV_SPEAKER_POSITION_RVS = 31, /// Right vertical height surround
/// 32-45 Reserved
AV_SPEAKER_POSITION_LFE3 = 36, /// E3 Low frequency enhancement 3
AV_SPEAKER_POSITION_LEOS = 37, /// Left edge of screen
AV_SPEAKER_POSITION_REOS = 38, /// Right edge of screen
AV_SPEAKER_POSITION_HWBCAL = 39, /// half-way between centre of
screen and left edge of screen
AV_SPEAKER_POSITION_HWBCAR = 40, /// half-way between centre of
screen and right edge of screen
AV_SPEAKER_POSITION_LBS = 41, /// Left back surround
AV_SPEAKER_POSITION_RBS = 42, /// Right back surround
/// 43–125 Reserved
AV_SPEAKER_POSITION_EXPL = 126, /// Explicit position (see text)
/// 127 Unknown / undefined
};
I suppose if this is preferred, then one for 23001-8 channel layouts
would be preferred as well..
It does seem confusing though that similar #defines are available in
channel_layout.h and I am a bit uncertain of a 1:1 mapping between the
two. I've marked in above with ! the ones that I found a match from the
defines:
AV_CH_FRONT_LEFT AV_SPEAKER_POSITION_L
AV_CH_FRONT_RIGHT AV_SPEAKER_POSITION_R
AV_CH_FRONT_CENTER AV_SPEAKER_POSITION_LC
AV_CH_LOW_FREQUENCY AV_SPEAKER_POSITION_LFE
AV_CH_BACK_LEFT AV_SPEAKER_POSITION_LBS
AV_CH_BACK_RIGHT AV_SPEAKER_POSITION_RBS
AV_CH_FRONT_LEFT_OF_CENTER AV_SPEAKER_POSITION_LC
AV_CH_FRONT_RIGHT_OF_CENTER AV_SPEAKER_POSITION_RC
AV_CH_BACK_CENTER
AV_CH_SIDE_LEFT AV_SPEAKER_POSITION_LSS
AV_CH_SIDE_RIGHT AV_SPEAKER_POSITION_RSS
AV_CH_TOP_CENTER AV_SPEAKER_POSITION_TS
AV_CH_TOP_FRONT_LEFT
AV_CH_TOP_FRONT_CENTER
AV_CH_TOP_FRONT_RIGHT
AV_CH_TOP_BACK_LEFT
AV_CH_TOP_BACK_CENTER
AV_CH_TOP_BACK_RIGHT
AV_CH_STEREO_LEFT
AV_CH_STEREO_RIGHT
AV_CH_WIDE_LEFT AV_SPEAKER_POSITION_LW
AV_CH_WIDE_RIGHT AV_SPEAKER_POSITION_RW
AV_CH_SURROUND_DIRECT_LEFT AV_SPEAKER_POSITION_LSD
AV_CH_SURROUND_DIRECT_RIGHT AV_SPEAKER_POSITION_RSD
AV_CH_LOW_FREQUENCY_2 AV_SPEAKER_POSITION_LFE2
If this approach were taken, then the missing speakers positions
(channels?) would need to be added, conversion maps for 23001-8 would
need to be implemented (and probably refusing if there are non-mappable
positions). Then layout bitmaps would be mapped into 23001-8 ones or if
the layout cannot be found, a custom one generated.. Personally I don't
mind how separate&simple this system still is :).
>> which is much more restricted compared to what the
>> standard permits.
>
> Certainly, but the bitfield channel layout system is enough for most cases
> and well tested. Any new system must work with it, not around it.
But as you may agree, such a modification would not be small. Certainly
not one I would try as my first feature to FFmpeg.
> Please follow the surrounding style for Doxygen comments: /** and */ on a
> separate line, * at the beginning of each line.
Oops. I'll fix those.
>> + /** The following are used if speaker_position == 126 */
>> + int azimuth;
>> + int elevation;
>
> Please document the units and references.
Done.
> Also, since it is repeated and packed, using a small type would make sense
> here if possible.
I'll use the same types as in the headers: int16_t and int8_t.
>> +} AVAudioTrackChannelPosition;
>> +
>> +/** Describes the channel layout (ie. speaker position) of a single audio track */
>> +typedef struct AVAudioTrackChannelLayout {
>> + int nb_positions;
>
>> + /** followed by: AVAudioTrackChannelPosition positions[nb_positions]; */
>
> AVAudioTrackChannelPosition positions[0];
>
> That way, computing the address is taken care of by the compiler without
> tricky pointer arithmetic, including alignment requirements.
Actually this is the way I had it in the first place, but compiler
warnings convinced me not to do it. If I'll use positions[1] the
allocations will end up slightly too large unless without slightly
convoluted code.
Is there a preferred way? It doesn't seem like foo[1] is used anywhere.
> (We already use 0-sized final arrays once in vaapi_encode.h; since VA API is
> for Linux and BSD, it does not probe the crappy proprietary compilers
> support it. In that case, using 1 instead of 0 is fine.)
In particular I don't want to do it conditionally :).
>
>> +} AVAudioTrackChannelLayout;
>> +
>
>> +/** Describes the channel layout based on predefined layout of a single track
>> + by providing the layout and the list of channels are are omitted */
>
> Could you make that clearer?
>
>> +typedef struct AVAudioTrackChannelPredefinedLayout {
>> + int layout; /** ChannelConfiguration from ISO/IEC 23001-8 */
>
>> + int nb_omitted_channels;
>> + /** followed by: char omitted_channels[nb_omitted_channels]; - non-zero indicates the channel is omitted */
>
> Is there a reasonable upper bound to nb_omitted_channels?
Yes, ISOBMFF supports only 64 channels for this mask. But perhaps even a
higher number is usable for some special case scenarios, not involving
ISOBMFF?
> Also, I think the naming is inconsistent: if nb_omitted_channels = 3 and
> omitted_channels[] = { 1, 0, 1 }, there are only two omitted channels, not
> three.
I agree that it seems a bit inconsistent.. Would an integer array
enumerating the omitted channels be better? Of course this whole problem
goes away if the structure is fixed-size (ie. uint8_t
omitted_channels_bitmask[64 / 8] or the uint64_t you suggested).
>> enum AVPacketSideDataType {
>> AV_PKT_DATA_PALETTE,
>>
>> @@ -1556,7 +1585,27 @@ enum AVPacketSideDataType {
>> * Configured the timed metadata parameters, such as the uri and
>> * meta data configuration. The key is of type AVTimedMetadata.
>> */
>> - AV_PKT_DATA_TIMED_METADATA_INFO
>> + AV_PKT_DATA_TIMED_METADATA_INFO,
>> +
>> + /**
>> + * Channel layout, describing the position of spakers for the
>> + * channels of a track, following the structure
>
>> + * AVAudioTrackChannelLayout.
>> + */
>> + AV_PKT_DATA_AUDIO_CHANNEL_LAYOUT,
>
> I think it would be a good idea if the name of the side data matches the
> name of the structure as much as possible: Track -> _TRACK_. Ditto for the
> others.
>
> Would it make sense to have a single side data type and an integer field at
> the beginning announcing the type of the rest:
>
> typedef struct AVComplexChannelLayout {
> enum AVComplexChannelLayoutType type;
> union {
> uint64_t bitmask;
> AVAudioTrackChannelLayout complete;
> AVAudioTrackChannelPredefinedLayout predefined;
> AVAudioTrackChannelLayoutObjectStructured object;
> } layout;
> };
It does seem nice. The actual structs/enums might look like this:
typedef struct AVAudioTrackChannelLayout {
int nb_positions;
/** Contains nb_positions entries*/
AVAudioTrackChannelPosition positions[1];
} AVAudioTrackChannelLayout;
typedef struct AVAudioTrackChannelPredefinedLayout {
int layout; /** ChannelConfiguration from ISO/IEC 23001-8 */
uint64_t omitted_channels;
} AVAudioTrackChannelPredefinedLayout;
typedef enum AVComplexAudioTrackChannelLayoutType {
AV_COMPLEX_CHANNEL_LAYOUT_PREDEFINED,
AV_COMPLEX_CHANNEL_LAYOUT_CUSTOM
} AVComplexAudioTrackChannelLayoutType;
typedef struct AVComplexAudioChannelLayout {
enum AVComplexAudioTrackChannelLayoutType type;
union {
AVAudioTrackChannelPredefinedLayout predefined;
AVAudioTrackChannelLayout complete;
};
} AVComplexAudioChannelLayout;
enum AVPacketSideDataType {
...
/**
* Channel layout, describing the position of speakers for the
* channels of a track, following the structure
* AVComplexAudioChannelLayout.
*/
AV_PKT_DATA_COMPLEX_AUDIO_CHANNEL_LAYOUT,
/**
* The channel layout is object structured with the number of
objects in an int (may accompany AV_PKT_DATA_COMPLEX_AUDIO_CHANNEL_LAYOUT)
*/
AV_PKT_DATA_COMPLEX_AUDIO_TRACK_CHANNEL_LAYOUT_OBJECT_STRUCTURED,
} AVPacketSideDataType;
I'm not super happy about the long names (though tempted to
s/AVAudio/AVComplexAudio/), but what is the alternative :).
Thanks for reviewing and feedback!
More information about the ffmpeg-devel
mailing list