[FFmpeg-devel] [RFC] Format of packets for text subtitles

Clément Bœsch ubitux at gmail.com
Fri Jun 8 00:59:43 CEST 2012

On Thu, Jun 07, 2012 at 10:48:14PM +0200, Nicolas George wrote:
> On decadi 20 Prairial, year CCXX, Clément Bœsch wrote:
> > You might want to look at
> > http://gpac.wp.mines-telecom.fr/mp4box/ttxt-format-documentation/ though
> > (MP4 related).
> Unless I am mistaken, it does not explain the format of the packets inside
> the ISOM structure. That would be the interesting point.

Yep, those specs are not free AFAIK.

> > About timing, the DTS is also generally set; maybe it can be useful for
> > reordering subtitles?
> Actually, I do not understand the point of DTS at all, but I would be
> interested by explanations.

Maybe not setting them is disrupting lavf?

> > Note: what to do about unrelated information such as comments? JACOSub for
> > instance can be filled with random comments all over the file. ATM I'm
> > dropping them (because it's a pain to handle). Should we "standardize"
> > the way of handling this?
> That could go in side_data, maybe?

Ah, indeed, that might be a good idea.

> > > All subtitle packets are flagged as keyframes.
> > Demuxers must do it? In what case wouldn't you put them?
> I guess: a demuxer for a generic format takes the flag from the container,
> and we hope the file has the flag set; a demuxer for a specialized format
> sets it.
> > Should we make an "input subtitle encoding" option available for all
> > subtitle formats at the lavf top level? (available with CONFIG_ICONV)
> I am not sure whether that belongs at the top level or on a per-demuxer
> basis. What format do you know, apart from MKV, MP4 and OGM, that can hold
> text subtitles?

Maybe FLV (though I don't think there is something really official, you
might want to talk to Luca). MXF might have something too (IIRC it's XML

> > Dropping the timing information from the packet means we might not be able
> > to reconstruct it exactly based on the pts+duration, but I'm not sure
> > that's really a problem.
> I do not understand what you mean here.

ATM with something like ffmpeg -i in.srt -c copy out.srt, the srt muxer (a
raw one, lavf/rawenc.c) will just copy the timing information, so you can
be sure the timing info is the same in {in,out}.srt.

If, even in the case of remuxing, you reconstruct the timing information
from the packet pts+duration (because it is no longer part of the data),
you might alter the original timing.
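
To illustrate the kind of alteration I mean, here is a minimal sketch in
Python (not lavf code). It assumes the stream time base is coarser than the
millisecond precision of SRT; the 1/100 time base and the helper names
(srt_to_ms, rescale, remux_timestamp) are made up for the example.

```python
import math
from fractions import Fraction


def srt_to_ms(ts: str) -> int:
    """Parse an SRT timestamp 'HH:MM:SS,mmm' into milliseconds."""
    hms, ms = ts.split(",")
    h, m, s = (int(x) for x in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(ms)


def ms_to_srt(ms: int) -> str:
    """Format milliseconds back into an SRT timestamp."""
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def rescale(value: int, src: Fraction, dst: Fraction) -> int:
    """Convert a non-negative tick count from time base src to dst,
    rounding to the nearest tick (halves round up)."""
    return math.floor(Fraction(value) * src / dst + Fraction(1, 2))


def remux_timestamp(ts: str, time_base: Fraction) -> str:
    """Simulate a remux: text -> pts in time_base -> text again."""
    ms_base = Fraction(1, 1000)
    pts = rescale(srt_to_ms(ts), ms_base, time_base)  # what the packet carries
    return ms_to_srt(rescale(pts, time_base, ms_base))  # what the muxer writes


# A millisecond time base survives the round trip; a centisecond one does not.
print(remux_timestamp("00:01:02,345", Fraction(1, 1000)))  # 00:01:02,345
print(remux_timestamp("00:01:02,345", Fraction(1, 100)))   # 00:01:02,350
```

So the loss only shows up when the pts cannot represent the original
precision, which is part of why I'm not sure it matters in practice.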

"I'm not sure that's really a problem" because it's hard to find a use
case: you would want a -c copy only for muxed subtitles, and we have
yet to find a format muxing SRT with the timing information. OTOH, I
wouldn't be surprised to find a format storing the whole srt file in a

> > Also, I would actually have a CODEC_ID_SRT and a CODEC_ID_SUBRIP:
> > CODEC_ID_SRT would contain the timing information + markup data, and
> > CODEC_ID_SUBRIP only the markup.
> Having two codec IDs to distinguish the format is an interesting idea, but
> how do we tell the demuxer or whatever that we are interested in sane
> CODEC_ID_SUBRIP or in compatibility CODEC_ID_SRT?

The lavf/srtdec.c demuxer would create a CODEC_ID_SRT stream (already the
case) that can be remuxed "verbatim" (including timing).

The lavf/matroskadec.c demuxer would create a CODEC_ID_SUBRIP stream without
the timing info in the data.

And then you would just add a ff_subrip_decoder in lavc/srtdec.c to handle
CODEC_ID_SUBRIP, just changing the decode callback.

Again, just a suggestion.
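
To make the difference between the two payloads concrete, here is a rough
sketch in Python (not lavf code) of how one SRT event could be split into
timing and markup. The function name split_event and the returned tuple
layout are hypothetical; the first value pair stands for what would go in
the packet pts/duration and the markup for what a CODEC_ID_SUBRIP packet
would carry.

```python
import re

# Timing line: "HH:MM:SS,mmm --> HH:MM:SS,mmm" plus optional trailing text.
TIMING_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})(.*)"
)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the captured timestamp fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def split_event(event: str):
    """Split one SRT event into (pts_ms, duration_ms, markup, extra).

    'extra' is whatever follows the end timestamp on the timing line.
    """
    lines = event.strip().splitlines()
    # Skip the optional numeric counter line that precedes the timing line.
    if lines and lines[0].strip().isdigit():
        lines = lines[1:]
    if not lines:
        raise ValueError("empty SRT event")
    match = TIMING_RE.match(lines[0].strip())
    if match is None:
        raise ValueError("not an SRT timing line: %r" % lines[0])
    fields = match.groups()
    start = _to_ms(*fields[0:4])
    end = _to_ms(*fields[4:8])
    markup = "\n".join(lines[1:])
    return start, end - start, markup, fields[8].strip()


event = "1\n00:00:01,000 --> 00:00:03,500\n<i>Hello</i>\nworld\n"
pts, duration, markup, extra = split_event(event)
print(pts, duration, repr(markup))  # 1000 2500 '<i>Hello</i>\nworld'
```

A CODEC_ID_SRT packet would simply keep the whole event string, timing
line included.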

> > Note that for instance the "SRT" format
> > can have some extra coordinates data mixed with the timing of each
> > event...
> Could go in the side_data too.

Indeed, that's a solution.
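
For reference, the coordinates I have in mind look like
"X1:100 X2:600 Y1:050 Y2:100" after the end timestamp. A small Python
sketch of pulling them out so they can be stored separately from the markup
(parse_coordinates is a hypothetical helper, and how the result would be
serialized into side_data is left open):

```python
import re

# Matches the four coordinate keys seen after the SRT end timestamp.
COORD_RE = re.compile(r"\b(X1|X2|Y1|Y2):(\d+)")


def parse_coordinates(extra: str) -> dict:
    """Return the coordinates found in the trailing part of a timing line.

    Keys that are absent are simply missing from the result.
    """
    return {key: int(value) for key, value in COORD_RE.findall(extra)}


print(parse_coordinates("X1:100 X2:600 Y1:050 Y2:100"))
# {'X1': 100, 'X2': 600, 'Y1': 50, 'Y2': 100}
```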

> > Yes this was a timing issue, which I indeed solved in the demuxer context
> > (see 2d52ee8a1a4f9438df90f3c95a6fbfc8f6e812f3). But this kind of
> > "workaround" could have been put at another level; for instance there is a
> > similar issue with SAMI: the next subtitle replaces the previous one
> > (there is no duration field or something), and thus you always need to
> > demux two packets at a time, buffer one, etc (while we could just have put
> > a duration = -1 in the packet).
> A SAMI file is a dedicated format, so we can have the demuxer do more or
> less what we need it to do.

Sure, but we already know two subtitle formats with the "event lasts
until the next one" feature, and it is pretty common elsewhere; see for
instance this (almost) randomly selected one:

It would be way easier for demuxers to parse events one by one and just set
duration=-1.
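
Roughly what I have in mind, as a Python sketch rather than lavf code: the
demuxer emits each event with an unknown duration, and a later stage fills
it in from the next event's pts. The -1 sentinel is the one suggested above;
resolve_durations and the (pts, duration, text) tuples are made up for the
example.

```python
UNKNOWN_DURATION = -1  # sentinel: the event lasts until the next one


def resolve_durations(events):
    """Fill in unknown durations from the pts of the following event.

    'events' is a list of (pts, duration, text) tuples sorted by pts.
    The last event keeps UNKNOWN_DURATION if nothing follows it.
    """
    resolved = []
    for i, (pts, duration, text) in enumerate(events):
        if duration == UNKNOWN_DURATION and i + 1 < len(events):
            duration = events[i + 1][0] - pts
        resolved.append((pts, duration, text))
    return resolved


events = [(0, -1, "first"), (1500, -1, "second"), (4000, 500, "third")]
print(resolve_durations(events))
# [(0, 1500, 'first'), (1500, 2500, 'second'), (4000, 500, 'third')]
```

The point is that the demuxer itself no longer has to buffer a packet; only
whoever actually needs the duration has to look one event ahead.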

> It would be much more problematic if you knew a format that can contain text
> subtitles interleaved with video and that relies on that trick to mark the
> end of a subtitle, because we can not afford to buffer several seconds worth
> of stream just to find the duration information.

I don't know any, but my knowledge of formats is very limited.

Clément B.