[FFmpeg-devel] [PATCH] movtextdec: fix handling of UTF-8 subtitles

Jan Ekström jeebjp at gmail.com
Sat Mar 24 16:54:37 EET 2018


On Sat, Mar 24, 2018 at 4:48 PM, wm4 <nfxjfg at googlemail.com> wrote:
> Subtitles which contained styled UTF-8 text (i.e. not just 7-bit
> ASCII characters) were not handled correctly. The spec mandates that
> styling start/end ranges are in "characters". It's not quite clear what
> a "character" is supposed to be, but maybe they mean Unicode codepoints.
>
> FFmpeg's decoder treated the style ranges as byte indexes, which could
> lead to UTF-8 sequences being broken, and the common code dropping the
> whole subtitle line.
>
> Change this and count codepoints instead. This also means that even
> if this is somehow wrong, the decoder won't break UTF-8 sequences
> anymore. The sample which led me to investigate this now appears to work
> correctly.
> ---
> https://github.com/mpv-player/mpv/issues/5675

For reference, the relevant specification for MOV/3GPP Timed Text
seems to be ETSI TS 126 245, which is currently at version 14
(2017-04), available at
http://www.etsi.org/deliver/etsi_ts/126200_126299/126245/14.00.00_60/ts_126245v140000p.pdf
.

Section 5.2 is indeed rather ambiguous about what a "character" is
in the context of UTF-8 or UTF-16.
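
To illustrate the difference between byte indexes and codepoint indexes
discussed above, here is a minimal sketch in C. It is not the code from
the patch; the helper name utf8_codepoint_to_byte_offset is hypothetical,
and the sketch assumes that a "character" means a Unicode codepoint and
that the input is well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): map a codepoint index to a
 * byte offset in a UTF-8 buffer. Assumes well-formed UTF-8 input.
 * Returns len if the buffer holds cp_index codepoints or fewer.
 */
static size_t utf8_codepoint_to_byte_offset(const unsigned char *text,
                                            size_t len, size_t cp_index)
{
    size_t pos;
    size_t seen = 0; /* number of codepoints that start before pos */

    assert(text != NULL);

    for (pos = 0; pos < len; pos++) {
        /* Continuation bytes look like 10xxxxxx; any other byte
         * starts a new codepoint. */
        if ((text[pos] & 0xC0) != 0x80) {
            if (seen == cp_index)
                return pos;
            seen++;
        }
    }
    return len;
}
```

A style range given in codepoints would be converted with this kind of
helper before the text is split, so that a split can never land in the
middle of a multi-byte sequence.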

Best regards,
Jan

