[FFmpeg-devel] Internal handling of subtitles in ffmpeg

Reimar Döffinger Reimar.Doeffinger
Fri Jan 2 09:18:10 CET 2009

On Fri, Jan 02, 2009 at 02:20:43AM +0100, Michael Niedermayer wrote:
> On Fri, Jan 02, 2009 at 12:08:00AM +0100, Reimar Döffinger wrote:
> > On Thu, Jan 01, 2009 at 10:36:36PM +0100, Michael Niedermayer wrote:

Most relevant things first:

> If you want to convince me to drop ASS in AVSubtitleRect and rather use
> UTF8, for now, i dont think you would have much difficulty convincing me.
> but that would mean no formating at all for now ...

I think that's close to what I was thinking about, but note that I am
considering AVSubtitle as the "public" face of FFmpeg subtitles
and the big cases I can see are:
1) they have an ASS renderer. They want just one ASS string to pass to it.
2) they have something that can do text. They want text. Possibly in
logical order, possibly with coordinates (movie-absolute will do, even
if badly).
3) they can display or blend bitmaps. They want bitmaps as currently

I'd prefer those to be available directly without having 50 more fields
in the struct I have to read up on first to know if they are relevant
for me and 10 conversion functions I have to call at the right place.
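
A minimal sketch of how the three cases above could each be served directly.
Everything here is an assumption made for illustration: SketchSubtitle,
SketchBitmap and sketch_pick_view() are hypothetical names and do not
correspond to the real AVSubtitle layout or to any FFmpeg function.

```c
#include <stddef.h>

/* Hypothetical bitmap view (case 3); not the real AVSubtitleRect. */
typedef struct SketchBitmap {
    int x, y, w, h;             /* movie-absolute position and size */
    const unsigned char *data;  /* pixel data, may be NULL in this sketch */
} SketchBitmap;

/* Hypothetical "public face": each consumer reads only the view it needs. */
typedef struct SketchSubtitle {
    const char *ass;             /* case 1: one ASS string, or NULL */
    const char *text;            /* case 2: plain text in logical order, or NULL */
    int text_x, text_y;          /* case 2: rough movie-absolute coordinates */
    const SketchBitmap *bitmaps; /* case 3: bitmaps to display or blend */
    int nb_bitmaps;
} SketchSubtitle;

typedef enum { VIEW_NONE, VIEW_ASS, VIEW_TEXT, VIEW_BITMAP } SketchView;

/* Return the first view that the consumer supports and the subtitle provides. */
static SketchView sketch_pick_view(const SketchSubtitle *s,
                                   int can_ass, int can_text, int can_bitmap)
{
    if (can_ass && s->ass)
        return VIEW_ASS;
    if (can_text && s->text)
        return VIEW_TEXT;
    if (can_bitmap && s->nb_bitmaps > 0)
        return VIEW_BITMAP;
    return VIEW_NONE;
}
```

A consumer that only handles text would call sketch_pick_view(&sub, 0, 1, 0)
and never has to look at the ASS or bitmap fields.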

> > Well, you know, I am trying to convince you to say: hell, let's do the
> > simple stuff simple and proper and leave the rest to a complicated
> > extension.
> Iam still waiting for you to explain your simple & proper solution. It seems
> what you suggest has changed somewhat so iam not entirely sure if you still
> argue in favor of replacing decoder->encoder by bitstream filters or what
> the intermediate format is supposed to be, originally you suggested ASS but it
> seems you dont suggest this anymore?

I am arguing that "the one subtitle representation" should not exist.
Unfortunately that is not possible if conversion between all imaginable
formats should work.
That is why I came up with the "one ASS string blob (possibly in AVSubtitle)",
because that at least hides it well. Putting it in AVSubtitleRect does
not hide it well, because that changes the meaning of x,y (though merely
allowing text in there already does that).
As to why I am arguing against it? I think that "the one subtitle
representation" is most likely to lead to an API where writing your own
decoder is less hassle than learning to use the API correctly.
I ended the discussion because that is of course an impossible argument
to make against a non-existent API: no matter how convoluted an example
one makes, someone else can say "well, but there is special function XY
just for that!".
Also I did not really feel that anyone gained any real insight by
the "convoluted examples" I made (if you find them useful, I have two
more at the end).
I guess it comes down to design philosophy:
> simple & proper solution
I'm advocating a simple solution, and someone else may add as many hacks
as they like outside my view to handle the "proper" (supporting everything)
stuff.
I'd say you want a solution designed to be both simple and proper, though to
me it seems unlikely that anyone involved in the long discussion about it
will still be able to judge if it is simple.

> > > The advantage is the same that there is for using AVCodecContext instead of
> > > using a char* of an mpeg4 header to represent the related info.
> > > it would very well be possible to make our mpeg2 decoder convert width/height
> > > and so on into a mpeg4 bitstream and export that ...
> > > Its just that working with int, float, ... is easier than parsing bitstreams
> > > or strings
> > 
> > But that is exactly the point! Width and height for video are always
> > simple ints, but once they could be arbitrary formulas wouldn't all you
> > do just be inventing yet another encoding for the formulas?
> i dont understand what you try to say.
> I was arguing to export values through a struct instead of a char* using a
> complex encoding.

And I say: If the problem is difficult enough, your struct becomes just
yet another complex encoding, i.e. you win a minor simplification by doubling
the number of representations.

> > > Besides if some information from mpeg2 has no place in mpeg4, its a lot easier
> > > to add the extra field or value to a struct than to find some way to squeeze
> > > it in a string or bitstream.
> > 
> > What if MPEGn used XML structs with user defined elements that only very
> > few people need? Would it still be the best way to export it that way
> > when it muddles the API instead of just letting the people who want the
> > really difficult things bear a bit more pain?
> if there where xml in mpeg, i would see no problem exporting this in a
> new and seperate field. Users wanting it could get it from there, others
> could ignore it.

That is what I want your "general purpose subtitle representation" to
be: something that was primarily designed so it can be ignored, or
alternatively you fail hard - but not something that most likely will be
used in a way that is half working and half broken.

Reimar Döffinger

The two more "convoluted examples":
1) If there is some text and the same text again representing its
shadow, should they be in the same AVSubtitleRect? If they are not, that
makes it harder to remove the "shadow" if you output it non-graphically.
You might also render the glyphs more often than necessary if the shadow
has the same size and shape.
If they are, they may be far apart and the coordinates x/y are wrong for
at least one.
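
To make the coordinate problem concrete, here is a small sketch under the
assumption that a shared rect would carry the bounding box of both elements.
SketchBox and sketch_union() are hypothetical helpers, not FFmpeg API.

```c
/* Hypothetical axis-aligned box in movie-absolute pixels. */
typedef struct SketchBox { int x, y, w, h; } SketchBox;

static int sketch_min(int a, int b) { return a < b ? a : b; }
static int sketch_max(int a, int b) { return a > b ? a : b; }

/* Smallest box covering both a and b. */
static SketchBox sketch_union(SketchBox a, SketchBox b)
{
    SketchBox u;
    int right  = sketch_max(a.x + a.w, b.x + b.w);
    int bottom = sketch_max(a.y + a.h, b.y + b.h);

    u.x = sketch_min(a.x, b.x);
    u.y = sketch_min(a.y, b.y);
    u.w = right - u.x;
    u.h = bottom - u.y;
    return u;
}
```

With the text at (100,400) and its far-away "shadow" at (300,50), both
200x40, the union starts at (100,50): that is neither the position of the
text nor the position of the shadow.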

2) Next, should the AVSubtitleRects be in logical order or rendering order?
Your examples for the simplest text rendering assumed logical order.
But you might have something like
> small multiline subtitle text
to be read in that order, but rendered overlapping, with the large text
of course below (it would not be readable the other way round).
So if you want to actually render them on screen that must be done in
the opposite order from how you would read them.
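
One way to picture the conflict is to keep the rects in logical order and
derive the rendering order separately. This is only a sketch: SketchRect, its
layer field and sketch_render_order() are assumptions for illustration and
nothing like them is claimed to exist in AVSubtitleRect.

```c
#include <stdlib.h>

/* Hypothetical rect; the array it lives in is kept in logical (reading) order. */
typedef struct SketchRect {
    const char *text;
    int layer;    /* assumed field: higher layers are drawn later, i.e. on top */
    int logical;  /* index in reading order, filled in by sketch_render_order() */
} SketchRect;

/* Order by layer first; fall back to reading order so ties stay deterministic. */
static int sketch_cmp_render(const void *a, const void *b)
{
    const SketchRect *ra = *(const SketchRect *const *)a;
    const SketchRect *rb = *(const SketchRect *const *)b;

    if (ra->layer != rb->layer)
        return (ra->layer > rb->layer) - (ra->layer < rb->layer);
    return (ra->logical > rb->logical) - (ra->logical < rb->logical);
}

/* Fill out[] with pointers in rendering order; rects[] stays in logical order. */
static void sketch_render_order(SketchRect *rects, int n, SketchRect **out)
{
    for (int i = 0; i < n; i++) {
        rects[i].logical = i;
        out[i] = &rects[i];
    }
    qsort(out, (size_t)n, sizeof(*out), sketch_cmp_render);
}
```

For the case above, the small text comes first in rects[] but has the higher
layer, so it ends up last in out[] and is drawn over the large text.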
