[FFmpeg-devel] Matroska subtitle decoding: CODEC_ID_TEXT vs. CODEC_ID_SRT

Sun Apr 24 17:16:02 CEST 2011

On Sun, Apr 24, 2011 at 02:54:09PM +0200, Martin Lambers wrote:
> Hi all,
> 
> According to this site
> <http://matroska.org/technical/specs/codecid/index.html>, "S_TEXT/UTF8"
> are SRT subtitles.

What they are writing is complete nonsense.
Quote:
"When placing SRT in Matroska, part 3 is converted to UTF-8 and placed in the data portion of the Block. Part 2 is used to set the timecode of the Block, and BlockDuration element. Nothing else is used."
In other words, they don't write any SRT data into the file at all,
but plain text. By the looks of it, whoever wrote that was unable
to distinguish between the mkvmerge tool and the MKV format.
I really see no relation at all between SRT and what MKV stores
there, except that converting one into the other is trivial.

> Yet in libavformat/matroska.c, there are two entries
> for "S_TEXT/UTF8":
> 
> ...
> {"S_TEXT/UTF8"      , CODEC_ID_TEXT},
> {"S_TEXT/UTF8"      , CODEC_ID_SRT},
> ...
> 
> This has the result that SRT subtitles are handled with CODEC_ID_TEXT
> during decoding since this is the first entry found. Therefore, they do
> not get translated to ASS as would happen with SRT subtitles from other
> sources.

As far as I can tell, CODEC_ID_TEXT is correct. It might help if
you told us specifically what issue you are having.

> Is there a reason to keep this behaviour? Can the first entry be removed?

Removing it would probably break muxing of text subtitles.
You could just swap the order of the two entries, but IMO that's still wrong.