[FFmpeg-devel] Matroska subtitle decoding: CODEC_ID_TEXT vs. CODEC_ID_SRT

Martin Lambers marlam at marlam.de
Sun Apr 24 17:48:28 CEST 2011


Hello Reimar,

thanks for having a look at this.

On 24/04/11 17:16, Reimar Döffinger wrote:
> On Sun, Apr 24, 2011 at 02:54:09PM +0200, Martin Lambers wrote:
>> According to this site
>> <http://matroska.org/technical/specs/codecid/index.html>, "S_TEXT/UTF8"
>> are SRT subtitles.
> 
> What they are writing is complete nonsense.
> Quote:
> "When placing SRT in Matroska, part 3 is converted to UTF-8 and placed in the data portion of the Block. Part 2 is used to set the timecode of the Block, and BlockDuration element. Nothing else is used."
> Or in other words they don't write any SRT data into the file at all,
> but plain text. From what it looks like whoever wrote it was unable
> of distinguishing between the mkvmerge tool and the MKV format.
> I see really no relation at all between SRT and what MKV stores
> there, except that the conversion is really easy.

I agree that the wording of the text is confused. My interpretation is
that when they say "plain text" or "UTF-8 text" they really mean SRT
text with the UTF-8 character encoding (this would apply to part 3).

I think the intention here is that SRT subtitles can be stored in the
Matroska container, namely as "S_TEXT/UTF8".
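
To make the quoted mapping concrete, here is a minimal sketch of how I read it: part 2 of an SRT cue (the timing line) becomes the Block timecode and BlockDuration, and part 3 (the text) becomes the Block data. The struct and function names are hypothetical and not taken from FFmpeg or mkvmerge, and I assume plain milliseconds rather than Matroska's scaled timecodes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for what the spec text says ends up in the file. */
struct sketch_block {
    long long timecode_ms;   /* from part 2: cue start (assumed unit: ms) */
    long long duration_ms;   /* from part 2: cue end minus cue start */
    const char *data;        /* part 3, assumed to be UTF-8 already */
};

/* Parses an SRT timing line such as "00:00:01,000 --> 00:00:04,500".
 * Returns 0 on success, -1 if the line does not match. */
static int sketch_fill_block(const char *timing, const char *text,
                             struct sketch_block *blk)
{
    int h1, m1, s1, ms1, h2, m2, s2, ms2;
    long long start, end;

    assert(timing && text && blk);
    if (sscanf(timing, "%d:%d:%d,%d --> %d:%d:%d,%d",
               &h1, &m1, &s1, &ms1, &h2, &m2, &s2, &ms2) != 8)
        return -1;
    start = ((h1 * 60LL + m1) * 60 + s1) * 1000 + ms1;
    end   = ((h2 * 60LL + m2) * 60 + s2) * 1000 + ms2;
    if (end < start)
        return -1;
    blk->timecode_ms = start;
    blk->duration_ms = end - start;
    blk->data        = text;   /* tags like <i> are carried along untouched */
    return 0;
}
```

Note that nothing in this mapping strips or interprets tags in part 3, so whatever markup the SRT text contained is still present in the Block data.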

>> Yet in libavformat/matroska.c, there are two entries
>> for "S_TEXT/UTF8":
>>
>> ...
>> {"S_TEXT/UTF8"      , CODEC_ID_TEXT},
>> {"S_TEXT/UTF8"      , CODEC_ID_SRT},
>> ...
>>
>> This has the result that SRT subtitles are handled with CODEC_ID_TEXT
>> during decoding since this is the first entry found. Therefore, they do
>> not get translated to ASS as would happen with SRT subtitles from other
>> sources.
> 
> CODEC_ID_TEXT is correct by what I can tell, it might help if
> you actually told us what specifically the issue is you have.

The issue is that if SRT subtitles are part of a .mkv file, they are
stored as "S_TEXT/UTF8", and FFmpeg will decode them as plain text
subtitles, and thus a video player based on FFmpeg will not properly
interpret SRT tags such as <i>, <b> etc. This was reported by a user of
the Bino video player.
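
To illustrate why the second entry is never reached, here is a simplified sketch of a first-match table lookup. The enum, struct and function names are hypothetical stand-ins, not the actual FFmpeg definitions; I only assume that the real lookup walks the table in order and stops at the first matching string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the real codec IDs; not FFmpeg's definitions. */
enum sketch_codec_id { SKETCH_ID_NONE, SKETCH_ID_TEXT, SKETCH_ID_SRT };

struct sketch_tag {
    const char *str;
    enum sketch_codec_id id;
};

/* Same ordering as the two entries quoted above. */
static const struct sketch_tag sketch_tags[] = {
    { "S_TEXT/UTF8", SKETCH_ID_TEXT },
    { "S_TEXT/UTF8", SKETCH_ID_SRT  },
    { NULL,          SKETCH_ID_NONE },
};

/* Linear scan that returns the ID of the first entry whose string matches. */
static enum sketch_codec_id sketch_lookup(const struct sketch_tag *tags,
                                          const char *codec_str)
{
    assert(tags && codec_str);
    for (; tags->str; tags++)
        if (!strcmp(tags->str, codec_str))
            return tags->id;
    return SKETCH_ID_NONE;
}
```

With this ordering, sketch_lookup(sketch_tags, "S_TEXT/UTF8") yields the TEXT ID; the SRT entry can only be reached in the opposite direction (mapping an ID to a string when muxing).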

>> Is there a reason to keep this behaviour? Can the first entry be removed?
> 
> Removing it would probably break muxing in text subtitles.
> You could just change their order around, but IMO that's still wrong.

From my understanding of the text, I think it would be correct to change
the order. Otherwise, there would be no way to decode SRT subtitles
stored in a Matroska container.

And this change should do no harm, either, since interpreting plain text
as SRT has no effect unless the plain text contains SRT tags.
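
As a rough illustration of that claim, here is a minimal sketch of an SRT-to-ASS style conversion that only rewrites the four tags <i>, </i>, <b> and </b>. This is a hypothetical helper, not FFmpeg's SRT decoder, which handles more than this; the point is only that input without such tags comes out unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not FFmpeg's SRT decoder: rewrites <i>, </i>, <b>,
 * </b> as ASS override codes and copies every other byte as-is. */
static void sketch_srt_to_ass(const char *in, char *out, size_t out_size)
{
    static const struct { const char *srt, *ass; } map[] = {
        { "<i>",  "{\\i1}" }, { "</i>", "{\\i0}" },
        { "<b>",  "{\\b1}" }, { "</b>", "{\\b0}" },
    };
    const size_t n = sizeof(map) / sizeof(map[0]);
    size_t o = 0;

    assert(in && out && out_size > 0);
    while (*in && o + 1 < out_size) {
        size_t k;
        for (k = 0; k < n; k++) {
            size_t sl = strlen(map[k].srt), al = strlen(map[k].ass);
            /* Replace the tag only if the ASS code still fits in out. */
            if (!strncmp(in, map[k].srt, sl) && o + al < out_size) {
                memcpy(out + o, map[k].ass, al);
                o  += al;
                in += sl;
                break;
            }
        }
        if (k == n)              /* no tag at this position: copy one byte */
            out[o++] = *in++;
    }
    out[o] = '\0';
}
```

Plain text that happens to contain a literal "<i>" would of course be affected, which is the one case where the reordering changes behaviour.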

Martin

