[FFmpeg-devel] Internal handling of subtitles in ffmpeg

Uoti Urpala
Fri Jan 2 18:42:09 CET 2009

On Fri, 2009-01-02 at 17:47 +0100, Michael Niedermayer wrote:
> Dialogue event lines in ASS have 10 fields

Not necessarily. They have whatever fields the Format: line in the
[Events] section specifies:

> if we drop the actual text and effects 8
> They are
> Layer, Start, End, Style, Name, MarginL, MarginR, MarginV
> What is your opinion about adding the ones that are integer to the struct?

Why only integer ones? And I doubt the margins would be any more
relevant than the position information inside text content. The start
and end times are useful information in code that doesn't care about the
exact rendering but I doubt much of the rest is worth extracting.
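To make the Format: dependence concrete, here is a minimal sketch (helper names are invented here, not FFmpeg API) of how a parser has to locate a field's position from the Format: line before it can pull, say, the Start time out of a Dialogue line:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the 0-based position of a field name in an
 * "[Events]" Format: line, e.g.
 * "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text"
 * Returns -1 if the field is absent. */
static int field_index(const char *format, const char *name)
{
    const char *p = strchr(format, ':');
    char buf[64];
    int idx = 0;

    if (!p)
        return -1;
    p++;
    while (*p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        while (len && *p == ' ')                      /* trim leading blanks  */
            p++, len--;
        while (len && (p[len - 1] == ' ' ||
                       p[len - 1] == '\r' ||
                       p[len - 1] == '\n'))           /* trim trailing blanks */
            len--;
        if (len < sizeof(buf)) {
            memcpy(buf, p, len);
            buf[len] = 0;
            if (!strcmp(buf, name))
                return idx;
        }
        if (!end)
            break;
        p = end + 1;
        idx++;
    }
    return -1;
}

/* Parse an ASS timestamp "H:MM:SS.CC" into centiseconds, or -1 on error. */
static long parse_ass_time(const char *s)
{
    int h, m, sec, cs;
    if (sscanf(s, "%d:%d:%d.%d", &h, &m, &sec, &cs) != 4)
        return -1;
    return ((h * 60L + m) * 60 + sec) * 100 + cs;
}
```

The point of the sketch is that nothing about "Start is field 1" can be hardcoded; the index is only known once the Format: line has been seen.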

> Also style is a string refering to a style which has
> fontname, fontsize, PrimaryColour, SecondaryColour, OutlineColor, Bold
> Italic, Underline, Strikeout, ScaleX, ScaleY, Spacing, Angle, BorderStyle,
> Outline, Shadow, Alignment, MarginL, MarginR, MarginV

These can change depending on a Format: line too.

> Similarly to above the integer ones (which are most) could be added to the
> struct, making access easier and faster.

Again, why only integer ones? If you implement enough parsing to extract
the fields then you could just as well process all of them.

Do you mean copying the style information to each packet which contains
a subtitle using that style? Since each subtitle can override the style
defaults, that doesn't seem too useful - anything which needs reliable
values needs better parsing anyway, and could handle the styles itself.
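To see why cached Style defaults are unreliable, even a trivial check for inline override blocks (a sketch, not FFmpeg code) shows that any event can diverge from its style:

```c
#include <string.h>

/* Sketch: does an event's text contain ASS override blocks ("{\...}")?
 * Any such block can change Bold, colours, position, etc. for this one
 * event, so per-packet copies of the Style defaults would be wrong. */
static int has_override_tags(const char *text)
{
    const char *p = text;
    while ((p = strchr(p, '{'))) {
        if (p[1] == '\\')
            return 1;
        p++;
    }
    return 0;
}
```

A consumer that cares about the effective values has to parse the overrides anyway, at which point resolving the named style too is little extra work.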

> a union would directly conflict with the idea of caching, that is the
> requirements you set that there be a plain text and a plain ASS char* field
> always and directly could then only be met if these were outside the union.
> Also as you said some things like width are likely only calculable by
> rendering the whole. Now if we end up needing width for something the
> rendering would only be needed once if it can be cached ...

What exactly would you want to cache? Is there a particular reason to
expect that width will be useful, and lots of other possible things to
calculate will not be? In general width is not a property of a single
subtitle either; at most it's a property of a particular subtitle at a
particular point in time. And if you calculate width by rendering then
will you cache _only_ the width, not the resulting bitmap? If you do
start caching bitmaps then how well are you going to implement that?
There are details such as correspondence between subtitles and bitmaps
not being one-to-one - it's wasteful to render a big glyph multiple
times so the bitmap should be shared between subtitles, and on the other
hand a single rotating subtitle can correspond to lots of bitmaps at
different points in time.
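The sharing problem can be made concrete: a cache keyed by glyph identity rather than by subtitle lets one bitmap serve many events, while one animated event may pull many distinct bitmaps. A rough sketch (hypothetical names, toy fixed-size table for brevity):

```c
#include <stddef.h>

/* Hypothetical glyph cache: bitmaps are keyed by what was rendered
 * (face, size, codepoint), not by which subtitle asked for them, so the
 * subtitle<->bitmap relation is many-to-many by construction. */
struct glyph_key { unsigned face_id, size, codepoint; };

struct glyph_entry {
    struct glyph_key key;
    void *bitmap;   /* rendered coverage data, owned by the cache */
    int used;
};

#define CACHE_SIZE 256
static struct glyph_entry cache[CACHE_SIZE];

/* Return the cache slot for a key, inserting it on a miss.
 * Returns NULL when the (toy) table is full. */
static struct glyph_entry *cache_lookup(struct glyph_key k)
{
    for (size_t i = 0; i < CACHE_SIZE; i++) {
        if (!cache[i].used) {
            cache[i].key = k;
            cache[i].used = 1;
            return &cache[i];
        }
        if (cache[i].key.face_id   == k.face_id &&
            cache[i].key.size      == k.size &&
            cache[i].key.codepoint == k.codepoint)
            return &cache[i];
    }
    return NULL;
}
```

A real cache would also need eviction and refcounting; the sketch only shows why the natural cache key is not "one subtitle".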

If you do have a real renderer for the subtitles then I think a better
architecture is to have a single "renderer data" pointer and let the
renderer decide how it wants to store any cached information for the
subtitle. That is, in case such functionality is needed at all - is there
any current use case for knowing width outside actual rendering?