[FFmpeg-devel] [PATCH] matroska subtitle tracks support

Aurelien Jacobs aurel
Tue Jul 10 09:58:12 CEST 2007


On Tue, 10 Jul 2007 08:26:42 +0200
Reimar Döffinger <Reimar.Doeffinger at stud.uni-karlsruhe.de> wrote:

> Hello,
> On Mon, Jul 09, 2007 at 11:53:36PM +0200, Aurelien Jacobs wrote:
> > On Mon, 09 Jul 2007 22:16:31 +0100
> > Måns Rullgård <mans at mansr.com> wrote:
> > > Aurelien Jacobs <aurel at gnuage.org> writes:
> > > > On Mon, 9 Jul 2007 18:22:18 +0200
> > > > Michael Niedermayer <michaelni at gmx.at> wrote:
> > > >> On Mon, Jul 09, 2007 at 03:47:21PM +0200, Aurelien Jacobs wrote:
> > > >> > I want to apply the attached patch which provides support for Matroska
> > > >> > subtitle tracks.
> > > >> > Before applying it I wanted to be sure if it is OK to add
> > > >> > CODEC_ID_TEXT_SUBTITLE ?
> > > >> 
> > > >> what is TEXT ?
> > > >> is it raw  ASCII? raw UTF8? raw plain text (with some encoding specified
> > > >> somehow)?
> > > >
> > > > I intended it to be raw UTF8, but that indeed needs to be clarified.
> > > > CODEC_ID_TEXT_UTF8_SUBTITLE seems clear enough to me. OK ?
> 
> Isn't there also the issue that Matroska can contain ASS and other
> subtitles?

Indeed, I think some CODEC_ID_ASS_SUBTITLE will need to be added at
some point. But unfortunately, Matroska doesn't store ASS subtitles
as is. They are stored butchered, in a way that makes them useless
for an ASS parser. I guess they do this to save a few bytes.
It should be possible to reconstruct them, but at first I will use
the simplest solution, which is to extract only the raw UTF8 text
out of the ASS data.
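
As a rough sketch of that simplest solution (this is not the code in the
patch): the helper below assumes each Matroska ASS block has the layout
"ReadOrder,Layer,Style,Name,MarginL,MarginR,MarginV,Effect,Text", which is
my reading of the Matroska codec mapping for S_TEXT/ASS. The function name
ass_block_to_text() and its signature are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the plain text of one Matroska ASS block
 * into out. Assumed block layout (an assumption, see above):
 *   ReadOrder,Layer,Style,Name,MarginL,MarginR,MarginV,Effect,Text
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the block has fewer than 9 fields or out_size is 0.
 */
static int ass_block_to_text(const char *block, char *out, size_t out_size)
{
    const char *p = block;
    size_t n = 0;
    int i;

    if (out_size == 0)
        return -1;

    /* Skip the 8 comma-separated fields that precede Text. */
    for (i = 0; i < 8; i++) {
        p = strchr(p, ',');
        if (!p)
            return -1;
        p++;
    }

    while (*p && n + 1 < out_size) {
        if (*p == '{') {
            /* Drop an override block such as {\i1}. */
            const char *end = strchr(p, '}');
            if (!end)
                break;          /* unterminated block: drop the rest */
            p = end + 1;
        } else if (p[0] == '\\' && (p[1] == 'N' || p[1] == 'n')) {
            /* ASS line break; the soft break \n is treated the same
             * way here, which is a simplification. */
            out[n++] = '\n';
            p += 2;
        } else {
            out[n++] = *p++;    /* UTF8 bytes are copied unchanged */
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

Other escapes (for example \h) and drawing commands are left untouched, so
this only covers the common case of styled dialogue text.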

> > > I can't say I like it.  What we need is something analogous to
> > > PixelFormat for text.
> > 
> > In some way, I think the idea is quite good. But on the other hand,
> > I wonder if it's a good idea to support anything other than UTF8 encoding?
> > I think it would be better if all the text that goes out of libav* would
> > be UTF8.
> 
> Do any of those that do not use UTF8 even specify which format it is?

I don't know about other containers, but in Matroska, subtitles are
either UTF8 or using the local system encoding (IOW undefined).

> I think not, so then e.g. UTF8 and LEGACY would be enough; the using
> application is then left to do the guessing what LEGACY is supposed to
> be. That scheme seems to work "well enough" for MPlayer at least.

This would indeed be enough for Matroska. I have no idea about other
containers.
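
To illustrate the UTF8/LEGACY split, here is a minimal sketch of how an
application might tag a subtitle buffer when the container does not say
which of the two it is. The enum and the function guess_text_encoding()
are hypothetical names, not existing libav* API, and the check is only
structural: it does not reject every overlong form or surrogate.

```c
#include <stddef.h>

/* Hypothetical tags, named after the UTF8/LEGACY split discussed above. */
enum TextEncoding {
    TEXT_ENCODING_UTF8,
    TEXT_ENCODING_LEGACY   /* unspecified local encoding; caller must guess */
};

/* Return TEXT_ENCODING_UTF8 if buf is structurally valid UTF8,
 * TEXT_ENCODING_LEGACY otherwise. */
static enum TextEncoding guess_text_encoding(const unsigned char *buf,
                                             size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra, k;

        /* Number of continuation bytes implied by the lead byte. */
        if (c < 0x80)
            extra = 0;
        else if (c >= 0xC2 && c <= 0xDF)
            extra = 1;
        else if (c >= 0xE0 && c <= 0xEF)
            extra = 2;
        else if (c >= 0xF0 && c <= 0xF4)
            extra = 3;
        else
            return TEXT_ENCODING_LEGACY;  /* stray continuation or bad lead */

        if (extra > len - i - 1)
            return TEXT_ENCODING_LEGACY;  /* truncated sequence */

        for (k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return TEXT_ENCODING_LEGACY;

        i += extra + 1;
    }
    return TEXT_ENCODING_UTF8;
}
```

Pure ASCII passes as UTF8 here, which is harmless since it is valid in
both cases; what LEGACY actually means is still left to the application.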

Aurel



