[FFmpeg-devel] Format of decoded subtitles (was: matroska: Identify S_TEXT/UTF-8 tracks as SRT and not TEXT.)

Clément Bœsch ubitux at gmail.com
Thu Jun 7 09:39:46 CEST 2012

On Thu, May 31, 2012 at 05:07:33PM +0200, Nicolas George wrote:
> On sextidi 6 Prairial, year CCXX, Clément Bœsch wrote:
> > So if I understand correctly, you would propose a model with libsubconvert
> > doing any kind of markup conversion instead of the current model where the
> > decoder is "encoding" the event in ASS, bitmap or text?
> Well, it does not need to be a separate library per se, but I really think
> we need some kind of:
> 	ctx = avsub_markup_convert_init(ASS, HTML);
> 	avsub_markup_convert(ctx, sub_ass, sub_html);
> or something.

Maybe. I'm still not sure how this would help, but feel free to propose
something.

> > It should, for text-based subtitles. At least for the "useful" markup. But
> > I admit ASS has some annoying limitations, especially with some particular
> > subtitles features:
> > 
> >  - the first one I have in mind is that there is no text representation
> >    for the "lasts until the next subtitle" feature. Example: MicroDVD (and
> >    SAMI which I'm working on ATM) have features like this:
> > 
> >    {500}{600}this is printed starting at frame 500 and lasts until frame 600
> >    {1234}{}this starts being displayed at frame 1234...
> >    {1400}{}...and will be "replaced" by this text until the end.
> > 
> >    We can express this in the AVPacket (pkt.duration = -1 for example),
> >    but to encode the ASS event, it's not possible to have 00:01:02:03
> >    -1:-1:-1:-1 for instance. So we need to work around this.
> I am not sure I follow you: this is not markup, this is timing, and IMHO,
> timing belongs in the demuxer and should be decoded by it. For the example,
> the demuxer should output packets like that:
>   { .pts =  500, .duration = 100, .data = "this is printed starting..." },
>   { .pts = 1234, .duration = 166, .data = "this starts being displayed..." },
>   { .pts = 1400, .duration = PTS_MAX - 1400, .data = "...and will..." },

Yes, this was a timing issue, which I indeed solved in the demuxer (see
2d52ee8a1a4f9438df90f3c95a6fbfc8f6e812f3). But this kind of "workaround"
could have been handled at another level. For instance, SAMI has a similar
issue: each subtitle replaces the previous one (there is no duration field
or equivalent), so the demuxer always needs to read two packets at a time
and buffer one of them, whereas we could simply have set duration = -1 in
the packet.
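
To make the look-ahead concrete, here is a minimal sketch of resolving
open-ended durations by peeking at the next event. SubEvent and
resolve_open_durations() are hypothetical names for illustration only, not
FFmpeg structures or functions, and -1 as the "until the next event" marker
is just the convention discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event record for illustration; not an FFmpeg structure. */
typedef struct SubEvent {
    int64_t pts;       /* start time, in frames or any fixed time base */
    int64_t duration;  /* -1 means "displayed until the next event" */
    const char *text;
} SubEvent;

/*
 * Resolve open-ended durations by looking one event ahead.
 * Events must be sorted by pts. The last event keeps -1 if it was
 * open-ended, since nothing follows it.
 * Returns the number of durations that were filled in.
 */
static int resolve_open_durations(SubEvent *ev, size_t n)
{
    int resolved = 0;

    assert(ev || n == 0);
    for (size_t i = 0; i + 1 < n; i++) {
        if (ev[i].duration < 0) {
            ev[i].duration = ev[i + 1].pts - ev[i].pts;
            resolved++;
        }
    }
    return resolved;
}
```

With the three MicroDVD lines quoted above, the second event gets a
duration of 166 frames and the last one stays at -1. A real demuxer would
do the same thing in streaming fashion by holding back one packet.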

> >  - One random limitation against SAMI: this insane HTML-based format
> >    (actually not HTML at all, but fully CSS2-compliant...), has two
> >    subtitle placeholders. Basically it's two subtitles in one (one to
> >    print the talker name, and one for what's being said), relying on
> >    various presentation markup expectations which ASS can't honor (I don't
> >    want to try converting <table> into ASS markup for example).
> > 
> >  - Other crazy features of limited usefulness: the <img> tag in SAMI
> >    (yes...) or even in JACOSub.
> Even if they are crazy and we will never support them for rendering, we need
> to support them for encoding and decoding and stream copy. Therefore, I do
> not believe we can use ASS as a universal markup.
> The pseudo-HTML of SRT, OTOH, can pretty well be converted into ASS and
> back.
> But considering ASS, I am quite unsure about what part of the line should go
> into the decoded text. IMHO, "Start" and "End" should not (they are timing,
> not markup), but the other fields affect the markup.

See my other comment on the other thread.
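
As an illustration of the SRT <-> ASS point quoted above, here is a rough
sketch of one direction of such a conversion. srt_to_ass() is a
hypothetical helper, not an existing FFmpeg function. It only handles
lowercase <i>, <b> and <u> and their closing tags, mapping them to the ASS
override tags {\i1}/{\i0}, {\b1}/{\b0} and {\u1}/{\u0}; everything else is
copied through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert SRT pseudo-HTML (<i>, <b>, <u> and closing tags) to ASS
 * override tags. Hypothetical helper for illustration only.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int srt_to_ass(const char *in, char *out, size_t out_size)
{
    size_t o = 0;

    assert(in && out);
    if (out_size == 0)
        return -1;

    while (*in) {
        if (in[0] == '<') {
            const char *p = in + 1;
            int closing = 0;

            if (*p == '/') {
                closing = 1;
                p++;
            }
            /* Recognized tag: a single letter among i, b, u then '>' */
            if (*p && strchr("ibu", *p) && p[1] == '>') {
                if (o + 5 >= out_size)
                    return -1;
                out[o++] = '{';
                out[o++] = '\\';
                out[o++] = *p;
                out[o++] = closing ? '0' : '1';
                out[o++] = '}';
                in = p + 2;
                continue;
            }
        }
        /* Anything else is copied verbatim */
        if (o + 1 >= out_size)
            return -1;
        out[o++] = *in++;
    }
    out[o] = '\0';
    return 0;
}
```

The reverse direction is equally mechanical for these three tags, which is
why this subset round-trips. Constructs such as <table> or <img> have no
such one-to-one mapping.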

> >  - Last one is the precision limitation we already talked about (tb 1/100
> >    for ASS, and 1/1000 for ones like SRT).
> Again, timing, not markup.

Yup, sorry, I got sidetracked.
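
For the record, the precision loss is easy to show with a small sketch.
rescale_tb() below is a hypothetical stand-in (libavutil has av_rescale_q()
for this purpose; the helper is only here to keep the example
self-contained). It converts a timestamp between two time bases of the form
1/den, rounding to nearest.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rescale a non-negative timestamp from time base 1/src_den to
 * time base 1/dst_den, rounding to nearest.
 * Hypothetical helper for illustration only.
 */
static int64_t rescale_tb(int64_t ts, int64_t src_den, int64_t dst_den)
{
    assert(ts >= 0 && src_den > 0 && dst_den > 0);
    return (ts * dst_den + src_den / 2) / src_den;
}
```

An SRT timestamp of 1234 ms becomes 123 in the 1/100 time base of ASS, and
converting back gives 1230 ms, so up to a few milliseconds are lost on each
SRT -> ASS -> SRT round trip.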


Clément B.