[FFmpeg-devel] [PATCH 2/3] textdec: Rename all generic parts from srt to text.

Thu Aug 2 17:23:15 CEST 2012

On Thu, Aug 02, 2012 at 05:00:05PM +0200, Nicolas George wrote:
> On sextidi 16 Thermidor, year CCXX, Clément Bœsch wrote:
> > A lot of formats don't define a clear standard either. If the current
> > SubRip markup gets renamed into TagSoup, it will likely evolve to support
> > more tags initially not supposed to be in SubRip. I'm not sure that's a
> > good idea. Example: "MySubFormat supports tags like <i>, <b> just like the
> > generic TagSoup but I also have the tag <img>, let's add the support in
> > TagSoup", and now you can decode MKV+TEXT with muxed <img>, and you can
> > put <img> into SRT files as well.
> 
> The format will not evolve without our control. Or more precisely: the
> features of the lavc decoder will not evolve without our control.
> 
> But if some idiot fansub team starts to create Matroska files with S_TEXT
> tracks and <img> tags, and users are nagging us to support them, we will
> have to consider it.
> 
> On the other hand, if someone invents a format with pseudo-HTML tags,
> including IMG, and wants ffmpeg to support it, a new, dedicated CODEC_ID can
> be added.
> 
> > Where do you want to handle this?
> > 
> > The presence of ASS tags is independent of the format. You can have
> > extra ASS tags in SRT, as well as SAMI and maybe more. That needs to be
> > handled at a higher level, for all subtitles I guess.
> 
> Did you actually meet some of these files, or are you guessing?
> 

In SRT yes. For SAMI, as I said, it's based on the subtitles reader in
MPlayer. And it wouldn't be a surprise to find more of them. I can try
various subtitle editors and convert some subtitles with them from ASS
to another format; I'm pretty sure I'll be able to generate mixed markup,
which will btw be rendered "well" in some players...

> My rationale to want TAGSOUP instead of SUBRIP is to make it clear when a
> format is really specified or when it is parsed defensively to handle
> anything in the wild.
> 

Well, this tagging system is better known as "SubRip" so I think it makes
more sense to use that name...

> If you want to define CODEC_ID_SUBRIP for text with a finite set of known
> tags, and not recognizing wild ASS tags, that is fine with me, but people
> will probably complain.

My point is that it should be in another pass.

I see two possibilities, depending on the decision we make about how
decoded subtitle events are stored.

1) If we keep the current state (the decoders directly convert markup to
   ASS tags), each decoder will need to access and honor a
   -subtitles_escape_ass general option. While converting from markup to
   ASS, { and } will be escaped or not depending on that option. This is
   also where we would need to honor a potential -subtitles_codepage
   option or similar.

2) If we decide to store the events in an AST, it simplifies that: each
   decoder just outputs lists of struct event_chunk, such as:

   struct event_chunk {
       AVDictionary *style;
       const char *content;
       struct event_chunk *next_chunk;
   };

   content will have the unescaped ASS tags.

   Then in avcodec_decode_subtitle (or any other subtitle post-processing
   API you want) we decide, according to the general ASS escaping option,
   whether or not to escape at that level, possibly honor
   codepage/encoding, and then convert it to the RAW format for subtitles
   (ASS, right?).
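
   To make the second option more concrete, here is a rough sketch of what
   such a post-processing pass could look like. It is only an illustration
   under a few assumptions: the style field is a plain string instead of an
   AVDictionary so it builds without libavutil, render_event and append_text
   are hypothetical names, and escaping is done by prefixing { and } with a
   backslash, which is an assumed convention and not something settled in
   this thread. Style handling and codepage conversion are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk type, mirroring the struct above. */
struct event_chunk {
    const char *style;              /* stand-in for AVDictionary *style */
    const char *content;            /* text with unescaped ASS tags */
    struct event_chunk *next_chunk;
};

/* Append src to the NUL-terminated string in dst (capacity dst_size).
 * When escape is non-zero, '{' and '}' get a backslash prefix.
 * Returns 0 on success, -1 if the output does not fit. */
static int append_text(char *dst, size_t dst_size, const char *src, int escape)
{
    size_t len;

    assert(dst && src);
    len = strlen(dst);
    for (; *src; src++) {
        size_t need = (escape && (*src == '{' || *src == '}')) ? 2 : 1;
        if (len + need + 1 > dst_size) {
            dst[len] = '\0';        /* keep dst a valid string on failure */
            return -1;
        }
        if (need == 2)
            dst[len++] = '\\';
        dst[len++] = *src;
    }
    dst[len] = '\0';
    return 0;
}

/* Walk the chunk list once and build the output text. The decoders never
 * see the escape option; it is applied here, in a single place. */
static int render_event(const struct event_chunk *c, int escape_ass,
                        char *out, size_t out_size)
{
    assert(out);
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (; c; c = c->next_chunk)
        if (append_text(out, out_size, c->content, escape_ass) < 0)
            return -1;
    return 0;
}
```

   With a chunk containing "{\an8}world", calling render_event with
   escape_ass set to 0 passes the tag through untouched, while setting it
   to 1 turns the braces into literal characters. The point is that the
   choice is made once, after decoding, for every subtitle format.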

[...]

-- 
Clément B.