[FFmpeg-devel] [PATCH] Metadata

Baptiste Coudurier baptiste.coudurier
Tue Jan 6 00:40:06 CET 2009

Michael Niedermayer wrote:
> On Mon, Jan 05, 2009 at 11:40:12AM -0800, Baptiste Coudurier wrote:
>> Hi Michael,
>> Michael Niedermayer wrote:
>>> On Sat, Jan 03, 2009 at 03:26:05PM -0800, Baptiste Coudurier wrote:
>>>> Hi Michael,
>>>> Michael Niedermayer wrote:
>>>>> [...]
>>>>>
>>>>> + * 3. A tag whose value is translated has the ISO 639 3-letter language code
>>>>> + *    appended, with a '-' in between. So for example Author-ger=Michael, Author-eng=Mike.
>>>>> + *    The original/default language is in the unqualified "Author".
>>>>> + *    A demuxer should set a default if it sets any translated tag.
>>>>>
>>>>> [...]
>>>>>
>>>>> +typedef struct {
>>>>> +    char *key;
>>>>> +    char *value;
>>>>> +}AVMetaDataTag;
>>>> Maybe it would be simpler and more extensible to have a
>>>> "const char **attributes" field in which to store the language, or
>>>> anything else related to the AVMetaDataTag entry. This would avoid
>>>> parsing the '-'.
>>>> What do people think?
>>> I am against it; let me explain why.
>>> First, metadata support in svn is currently "too little", that is, nothing
>>> is really supported: no preserving of arbitrary tags, no way for users to
>>> add anything but 5 standard tags ...
>> I definitely agree.
>>> Aurel's variant, which had a language field and used a tree-based metadata
>>> system allowing metadata about metadata, is IMHO "too much". It's not something
>>> anyone should need, nor is it really needed for language & metadata about
>>> metadata, and still it wouldn't be able to handle all metadata about other
>>> metadata, like "the email address of the child of the author and producer".
>>> My suggestion of a simple key-value based system
>>> can be stored in any container that supports key-value string based
>>> metadata, and can still represent language and metadata about other metadata.
>>> It can also very easily be implemented efficiently. Currently all operations
>>> are O(n), thus it would become slow if there are many tags. But if we
>>> used tree.c/h it would all be just O(log n), and it's very easy to use tree.c/h
>>> with it ...
>>> Now if we do add attributes:
>>> * The API to search for tags becomes more complex
>>> * It is more difficult to use tree.c/h (like qsort, it needs a sanely
>>>   behaving comparison function, which is trivial for char*, less
>>>   so with an additional attribute list, and a lot less so if we want
>>>   to actually search for specific attributes)
>>> * No container I know of supports arbitrary attributes, thus muxers would
>>>   either have to convert the attribute list into a string or extract the
>>>   2 or 3 they support.
>> Well, these are good points.
>> To be clear, I'm not suggesting a tree metadata scheme, but a way to
>> easily specify the details of this key/value metadata,
>> like language or type (comes from .mov so expect '\r' as line separator,
>> encoding is UTF-8, etc...).
>> Parsing for '-' is not convenient, 
> Either there's a single string, in which case some muxers have to parse for '-',
> or
> there are many fields, in which case some other muxers have to combine them
> into a single string.

Which muxers?
How does .mkv store language metadata, if it does so at all?

All I see is that for .mov you would have to concatenate the key name and
lang, and the muxer would have to split the lang back out of the key.
Combining is easier than splitting when dealing with strings, in my
opinion.

I don't know of any container that uses a "key"-"lang" metadata scheme
(nut maybe?).
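
To make the combining versus splitting point concrete, here is a minimal sketch in C. Both helper names (combine_key_lang, split_key_lang) are hypothetical and not part of the patch; the only thing taken from the proposal is the "key-lang" form with a 3-letter language code. Combining is one snprintf, while splitting has to guess whether a trailing "-xxx" is really a language.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "key-lang" in dst.
 * Returns 0 on success, -1 if dst is too small. */
static int combine_key_lang(char *dst, size_t dst_size,
                            const char *key, const char *lang)
{
    int n = snprintf(dst, dst_size, "%s-%s", key, lang);
    return (n < 0 || (size_t)n >= dst_size) ? -1 : 0;
}

/* Hypothetical helper: split "key-lang" at the last '-' when exactly
 * three characters follow it. Returns 1 if a language was split off,
 * 0 if the whole string was taken as the key, -1 if key is too small. */
static int split_key_lang(const char *tagged, char *key, size_t key_size,
                          char lang[4])
{
    const char *dash = strrchr(tagged, '-');
    size_t key_len = strlen(tagged);
    int has_lang = 0;

    lang[0] = '\0';
    if (dash && dash != tagged && strlen(dash + 1) == 3) {
        key_len = (size_t)(dash - tagged);
        memcpy(lang, dash + 1, 4); /* 3 characters plus the terminator */
        has_lang = 1;
    }
    if (key_len >= key_size)
        return -1;
    memcpy(key, tagged, key_len);
    key[key_len] = '\0';
    return has_lang;
}
```

With this sketch, "Author-ger" splits into "Author" and "ger" as intended, but an unqualified key such as "Re-mix" is also split, into "Re" and "mix", which is the ambiguity a printable delimiter brings.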

> The conversion doesn't disappear, and because of this IMHO I would prefer the
> simpler internal representation.

How would you tell the user that the data stored in value is raw data
(like a JPEG cover), encoded in UTF-8/16, or special, like '\r' line-ended?

I believe we need a way to specify this to facilitate usage, and this
could fall into attributes.

>> and especially wrong if 'key'
>> contains '-' in its name.
> '-' was just an idea, we could easily switch to another char based on
> * it should be printable (so a user can enter & see it easily)
> * it should be an ASCII char (0..127)
> * it should be very unlikely to occur in a valid&sane key
> * it should interfere little with /bin/sh command line handling

I do agree with this.

>> iTunes and .mp4 contain a language for each metadata item.
>>> * The user interface from the command line would be more complex, or it
>>>   would maybe be just a string, but then why not keep the string internally?
>> Well -metadata "author"="baptiste":"lang"="eng" with escaping is not
>> that complex IMHO.
> that's true
> but similarly, it's not that complex either to do it in the demuxer
> and have just 2 strings.
> key='author,lang=eng'
> value='baptiste'
> internally

Well, yes, here we clearly separate "author" from "lang" and this is
good IMHO; we also come nearer to the concept of an attribute of "author".

> I mean the parsing stays the same, it's just done somewhere else,
> like calling an av_metadata_split_key(key_string, &key, &lang)

I've always found string manipulation tedious and boring, but that's
only my opinion. A strcmp on the attribute's key is simple and does not
need to clutter the public API.
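
As a rough sketch of what the attribute lookup could look like: the struct and function names below are hypothetical, and the layout of "attributes" as a NULL-terminated list of alternating name/value strings is only an assumption, since the proposal does not go further than a "const char **attributes" field.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tag layout, not the AVMetaDataTag from the patch.
 * attributes is NULL or a NULL-terminated list of name/value pairs,
 * e.g. { "lang", "eng", "encoding", "UTF-8", NULL }. */
typedef struct {
    const char *key;
    const char *value;
    const char **attributes;
} TagWithAttributes;

/* Return the value of the named attribute, or NULL if it is not set. */
static const char *tag_get_attribute(const TagWithAttributes *tag,
                                     const char *name)
{
    const char **a;

    if (!tag->attributes)
        return NULL;
    for (a = tag->attributes; a[0] && a[1]; a += 2)
        if (!strcmp(a[0], name))
            return a[1];
    return NULL;
}

/* Linear search for key in the given language; lang == NULL selects the
 * entry that carries no "lang" attribute (the default). */
static const TagWithAttributes *find_tag(const TagWithAttributes *tags,
                                         size_t nb_tags,
                                         const char *key, const char *lang)
{
    size_t i;

    for (i = 0; i < nb_tags; i++) {
        const char *tag_lang;

        if (strcmp(tags[i].key, key))
            continue;
        tag_lang = tag_get_attribute(&tags[i], "lang");
        if (lang ? (tag_lang && !strcmp(tag_lang, lang)) : !tag_lang)
            return &tags[i];
    }
    return NULL;
}
```

The key itself is never parsed here, so a '-' inside it is harmless. The search is still O(n), and ordering such entries for tree.c/h would need a comparison function that also looks at the attributes, which is the cost raised earlier in the thread.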

> Also, I believe escaping, no matter if from the command line or internally,
> is overkill.

IMHO we will have a hard time avoiding it if we choose a printable char
for the delimiter. I find it sad to forbid a char because of design limitations.
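
For illustration, a minimal sketch of what such escaping could involve if a printable delimiter is kept. The backslash as escape character and both function names are assumptions made for this example only, and the delimiter is assumed not to be the backslash itself.

```c
#include <stddef.h>

/* Hypothetical helper: copy src to dst, putting a backslash in front of
 * every delimiter and every backslash.
 * Returns 0 on success, -1 if dst is too small. */
static int escape_key(char *dst, size_t dst_size, const char *src, char delim)
{
    size_t o = 0;

    for (; *src; src++) {
        if (*src == delim || *src == '\\') {
            if (o + 1 >= dst_size)
                return -1;
            dst[o++] = '\\';
        }
        if (o + 1 >= dst_size)
            return -1;
        dst[o++] = *src;
    }
    if (o >= dst_size)
        return -1;
    dst[o] = '\0';
    return 0;
}

/* Hypothetical helper: return a pointer to the first delimiter that is
 * not escaped by a preceding backslash, or NULL if there is none. */
static const char *find_unescaped(const char *s, char delim)
{
    for (; *s; s++) {
        if (*s == '\\' && s[1]) {
            s++; /* skip the escaped character */
            continue;
        }
        if (*s == delim)
            return s;
    }
    return NULL;
}
```

A key "Re-mix" would be stored as "Re\-mix", and in "Re\-mix-eng" only the second '-' is seen as the separator. Every producer and consumer of keys has to apply the same rule, which is the extra work being weighed against forbidding the character.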

Baptiste COUDURIER                              GnuPG Key Id: 0x5C1ABAAA
Key fingerprint                 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
checking for life_signs in -lkenny... no
