[FFmpeg-devel] Internal handling of subtitles in ffmpeg

Reimar Döffinger Reimar.Doeffinger
Thu Jan 1 20:38:57 CET 2009

On Thu, Jan 01, 2009 at 06:56:45PM +0100, Michael Niedermayer wrote:
> > > Also in the light of "horribly complex", does it not feel horribly complex
> > > to require every ASS->X bitstream filter to be able to extract things like
> > > position, I mean in my suggestion these would be stored in an easily accessible
> > > struct doing the extraction just at one spot.
> > 
> > And they would be wrong for any "non-trivial" text subtitle.
> I think you misunderstand what I am suggesting
> I do not suggest to convert "left margin 5, top middle" to (512,50)-(600,100)
> but rather store exactly a semantically equivalent for
> "left margin 5, top middle" in AVSubtitleRect

Well, then instead of every encoder implementing an ASS decoder they all
implement an AVSubtitleRect decoder?
As I see it, either your AVSubtitleRect can represent only a small
fraction (well, probably quite a large fraction of what is actually
used) or it is no longer any simpler than an ASS blob.
The question is, what is AVSubtitleRect or whatever you want to call it
supposed to represent? What is the advantage it is supposed to add?
What meaning does it have if two text parts (e.g. words) are in a different
AVSubtitleRect? What if they are in the same one? That is unclear to me.
And will you require a width/height for AVSubtitleRect or not?
Generating those might be a lot of wasted effort for formats that are
similar (the same actually applies to X/Y if they are some sin(time) +
... I don't know if any subtitle formats actually do this, but they
might specify the position in a way that allows interpolation for frames
generated during deinterlacing, would you want AVSubtitleRect to be able
to handle that as well?).
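
To make the width/height question concrete, here is a minimal sketch of
what "store the semantics, resolve later" could look like. Everything in
it (SemanticPlacement, PixelBox, resolve_placement, the margin fields) is
hypothetical and made up for illustration; it is not the real
AVSubtitleRect and not a proposal for its layout. The point it shows is
that pixel coordinates can only be computed once the size of the rendered
text is known, so a consumer that wants x/y/w/h has to do layout work
first.

```c
#include <assert.h>

/* Hypothetical alignment along one axis. */
typedef enum { ALIGN_START, ALIGN_CENTER, ALIGN_END } Align;

/* Hypothetical semantic placement, e.g. "left margin 5, top middle".
 * Not an existing FFmpeg structure. */
typedef struct SemanticPlacement {
    Align halign, valign;   /* horizontal / vertical alignment */
    int margin_l, margin_r; /* horizontal margins in pixels */
    int margin_v;           /* vertical margin in pixels */
} SemanticPlacement;

/* Fully resolved pixel rectangle. */
typedef struct PixelBox { int x, y, w, h; } PixelBox;

/* Resolve a semantic placement to pixels. text_w/text_h are the size of
 * the already laid-out text, which is only known after rendering or
 * measuring it. */
static PixelBox resolve_placement(const SemanticPlacement *p,
                                  int screen_w, int screen_h,
                                  int text_w, int text_h)
{
    PixelBox b = { 0, 0, text_w, text_h };
    int usable_w = screen_w - p->margin_l - p->margin_r;

    assert(p && screen_w > 0 && screen_h > 0);

    switch (p->halign) {
    case ALIGN_START:  b.x = p->margin_l;                           break;
    case ALIGN_CENTER: b.x = p->margin_l + (usable_w - text_w) / 2; break;
    case ALIGN_END:    b.x = screen_w - p->margin_r - text_w;       break;
    }
    switch (p->valign) {
    case ALIGN_START:  b.y = p->margin_v;                     break;
    case ALIGN_CENTER: b.y = (screen_h - text_h) / 2;         break;
    case ALIGN_END:    b.y = screen_h - p->margin_v - text_h; break;
    }
    return b;
}
```

A text-to-text converter would never need to call resolve_placement() at
all, which is the case where requiring width/height up front would be
wasted effort.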

> > > and general case here means
> > > text -> text while not losing effects when the destination supports the
> > >     effects
> > > text -> bitmaps (not a single 95% transparent screen sized bitmap)
> > > bitmaps -> display (with bitmaps not being colorspace converted twice)
> > > text+bitmaps -> text+bitmaps
> > 
> > Well, I just think you'd have to extend this to have at least those
> > "basic" subtitle types:
> > "DATA blob" (ASS with bitmap support extensions?, not possible to correctly
> > represent as AVSubtitleRects, thus not using them - alternatively
> > giving up on a common representation format for anything so advanced)
> > "trivial" bitmap only (using AVSubtitleRects)
> > "trivial" text only (using AVSubtitleRects)
> > "trivial" bitmap+text (using AVSubtitleRects)
> Please elaborate on what you consider trivial and non trivial, I have
> difficulty understanding this.

"trivial": fixed position, no effects/transformations or anything.
Should be possible to render onto screen with no more than maybe 100
lines of code.
That is the meaning AVSubtitleRect has for me currently, due to the way
it is designed: something really easy to put over a video, and
IMHO it is unacceptable to lose this (but as said it could be a special
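
As a rough illustration of what "trivial" means here, the following is a
sketch of overlaying one fixed-position paletted bitmap onto a packed
RGB24 frame. TrivialSubRect, blend8 and overlay_rect are hypothetical
names invented for this example, and the struct is deliberately not the
real AVSubtitleRect layout; the palette format (0xAARRGGBB,
non-premultiplied) is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "trivial" subtitle rectangle: fixed position, paletted
 * bitmap, no effects. This is NOT the real AVSubtitleRect layout. */
typedef struct TrivialSubRect {
    int x, y;               /* top-left corner in frame pixels */
    int w, h;               /* bitmap size in pixels */
    const uint8_t *indices; /* palette indices, row by row */
    int linesize;           /* bytes per bitmap row */
    uint32_t palette[256];  /* assumed 0xAARRGGBB, non-premultiplied */
} TrivialSubRect;

/* Blend one 8-bit channel: alpha 0 keeps dst, alpha 255 gives src. */
static uint8_t blend8(uint8_t dst, uint8_t src, uint8_t alpha)
{
    return (uint8_t)((src * alpha + dst * (255 - alpha) + 127) / 255);
}

/* Overlay the rectangle onto a packed RGB24 frame, clipping at the
 * frame edges. */
static void overlay_rect(uint8_t *frame, int frame_w, int frame_h,
                         int frame_linesize, const TrivialSubRect *r)
{
    assert(frame && r && r->indices);
    for (int sy = 0; sy < r->h; sy++) {
        int dy = r->y + sy;
        if (dy < 0 || dy >= frame_h)
            continue;
        for (int sx = 0; sx < r->w; sx++) {
            int dx = r->x + sx;
            if (dx < 0 || dx >= frame_w)
                continue;
            uint32_t c = r->palette[r->indices[sy * r->linesize + sx]];
            uint8_t a = (uint8_t)(c >> 24);
            uint8_t *p = frame + (size_t)dy * frame_linesize + 3 * dx;
            p[0] = blend8(p[0], (uint8_t)(c >> 16), a);
            p[1] = blend8(p[1], (uint8_t)(c >> 8), a);
            p[2] = blend8(p[2], (uint8_t)c, a);
        }
    }
}
```

This stays well under the 100-line budget only because position is fixed
and there are no transformations; rotation, shear or animation would not
fit this shape.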

> To me, any way to specify a position in an unambiguous way is equivalent
> i mean no matter if text is specified with pixel based margins rectangle
> left/right justified flags, screen or display relative coordinates with some
> rotation/shear/... (aka affine transformation) or other.

Ok, I'll try to say it in a different way: I currently feel that by
extending AVSubtitleRect that way you will lose simplicity without
gaining anything, and that worries me (you know, that "make simple things
really simple and hard things possible" thing).
I think a good criterion for a good API here would be that you'd need
maybe 20 lines of code to make ffplay just display all text subtitles on the
console during playback, and maybe 50 more to display them at somewhat
accurate positions (including setup work for ncurses or some such, and
those numbers actually feel a bit high to me).
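
To give an idea of the size being talked about, here is a sketch of the
console case, assuming the API hands the player plain text with a display
interval and a nominal pixel position. TextSub, print_active_subs and
pixel_to_cell are hypothetical names for this example only; nothing here
is existing ffplay or libavcodec code, and the ncurses setup mentioned
above is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical plain-text subtitle event; not an existing FFmpeg struct. */
typedef struct TextSub {
    double start, end; /* display interval in seconds, [start, end) */
    int x, y;          /* nominal position in video pixels */
    const char *text;  /* NUL-terminated, assumed free of markup */
} TextSub;

/* Print every subtitle active at time `now`; returns how many were
 * printed. */
static int print_active_subs(FILE *out, const TextSub *subs, size_t n,
                             double now)
{
    int shown = 0;
    for (size_t i = 0; i < n; i++) {
        if (now >= subs[i].start && now < subs[i].end) {
            fprintf(out, "[%7.2f] %s\n", now, subs[i].text);
            shown++;
        }
    }
    return shown;
}

/* Map a pixel coordinate to a terminal cell for "somewhat accurate"
 * placement; out-of-range pixels are clamped to the video area. */
static int pixel_to_cell(int pixel, int video_size, int term_cells)
{
    assert(video_size > 0 && term_cells > 0);
    if (pixel < 0)
        pixel = 0;
    if (pixel >= video_size)
        pixel = video_size - 1;
    return (int)((long long)pixel * term_cells / video_size);
}
```

The first function is the "just display them" part; the second is the
only extra arithmetic needed before handing a row and column to a
terminal library.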

Reimar Döffinger
