[FFmpeg-devel] Captions SCC

Sun Feb 9 20:45:28 EET 2025

> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> Devlist Archive
> Sent: Sunday, February 9, 2025 6:42 PM
> To: FFmpeg development discussions and patches <ffmpeg-
> devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] Captions SCC
> 
> >
> > Not to start an argument, but WebVTT is kind of a terrible format.
> > It's a lowest common denominator and loses most formatting
> information
> > available even in 608 (which is now more than 40 years old).  Stuff
> > like rollup captions for live programming, color (to distinguish
> > speakers) and caption positioning are pretty important to the
> hearing
> > impaired.
> 
> 
> From the reading I have done, the WebVTT does support some placement,
> italics, and appearance information, but not all players or ripping
> programs support those functions.  

Yes, that's right. It also supports colors. The unfortunate part is that colors and styles need to be predefined as (CSS) classes. It's not possible to use inline styles, which essentially forces two passes when precise colors are needed. With the 8 colors of 608 it's easy, though.
During the subtitle filtering work I had actually started adding the missing features to the webvtt encoder, but the requirement for predefined styles eventually put me off, because it makes accurate conversions hardly possible. One idea was to create a predefined set of a few thousand styles and then always pick the closest matching one, but I didn't pursue that.
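
To illustrate the "predefined styles, closest match" idea, here is a minimal sketch. It is not code from the webvtt encoder. The palette, the class names and the helper functions are all made up for the example; only the WebVTT STYLE block and the <c.classname> span syntax are taken from the format itself.

```python
# Hypothetical palette: eight basic colors of the kind used with 608 captions.
# The names double as WebVTT cue class names (an assumption for this sketch).
PALETTE = {
    "white":   (255, 255, 255),
    "green":   (0, 255, 0),
    "blue":    (0, 0, 255),
    "cyan":    (0, 255, 255),
    "red":     (255, 0, 0),
    "yellow":  (255, 255, 0),
    "magenta": (255, 0, 255),
    "black":   (0, 0, 0),
}


def nearest_class(rgb):
    """Return the palette class closest to rgb (squared Euclidean distance)."""
    return min(
        PALETTE,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(PALETTE[name], rgb)),
    )


def style_block():
    """Build a WebVTT STYLE block that predefines one cue class per color."""
    lines = ["STYLE"]
    for name, (r, g, b) in PALETTE.items():
        lines.append(f"::cue(.{name}) {{ color: rgb({r}, {g}, {b}); }}")
    return "\n".join(lines)


def styled_text(text, rgb):
    """Wrap text in a class span that refers to the closest predefined class."""
    return f"<c.{nearest_class(rgb)}>{text}</c>"
```

Because the palette is fixed, the STYLE block can be written before any cue has been seen, so a single pass is enough. The price is that any color outside the palette gets snapped to its nearest entry.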

> On Sun, Feb 9, 2025 at 6:03 AM Devin Heitmueller <
> devin.heitmueller at ltnglobal.com> wrote:
> 
> >
> > To my point: no, I don't think normalizing everything down to
> WebVTT
> > is a good idea.

Yes, WebVTT is not capable enough. ffmpeg internally uses the SSA/ASS format (for all text subs), which is undoubtedly the most capable format that exists.
Any subtitle conversion in ffmpeg goes through this format, so when you convert subtitle format A to B, it's always

A => ASS => B

So, when it comes to the question about a normalization, ASS is the way to go and ffmpeg made a good choice to do so.
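
As a rough sketch of what "A => ASS => B" means in practice (this is not ffmpeg code; the function names and the two tables are hypothetical, and only a few simple tags are handled): every decoder produces ASS text, every encoder consumes ASS text, so adding a format means writing one decoder and one encoder rather than a converter for every pair.

```python
import re


def srt_to_ass(text):
    """Decoder side: map SRT-style <i>/<b> markup to ASS override tags."""
    text = text.replace("<i>", r"{\i1}").replace("</i>", r"{\i0}")
    text = text.replace("<b>", r"{\b1}").replace("</b>", r"{\b0}")
    return text.replace("\n", r"\N")  # ASS uses \N for a hard line break


def ass_to_webvtt(text):
    """Encoder side: keep what WebVTT can express inline, drop the rest."""
    text = text.replace(r"{\i1}", "<i>").replace(r"{\i0}", "</i>")
    text = text.replace(r"{\b1}", "<b>").replace(r"{\b0}", "</b>")
    text = re.sub(r"\{[^}]*\}", "", text)  # remaining override blocks are lost
    return text.replace(r"\N", "\n")


def ass_to_plain(text):
    """Encoder side: strip all override blocks for a plain-text target."""
    return re.sub(r"\{[^}]*\}", "", text).replace(r"\N", " ")


# Hypothetical registries: one entry per format, not one per format pair.
DECODERS = {"srt": srt_to_ass, "ass": lambda t: t}
ENCODERS = {"webvtt": ass_to_webvtt, "text": ass_to_plain, "ass": lambda t: t}


def convert(text, src, dst):
    """Convert by always pivoting through ASS text."""
    return ENCODERS[dst](DECODERS[src](text))
```

For example, convert("<i>Hello</i>\nworld", "srt", "webvtt") goes through the ASS form {\i1}Hello{\i0}\Nworld on the way. Information is only lost in the encoder step, and only when the target format cannot express it.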
For those who haven't seen it yet:

https://github.com/softworkz/SubtitleFilteringDemos/tree/master/Demo1

In this demo, the input is DVB bitmap subtitles and the output is DVB bitmap subtitles as well.
But in-between, the OCR filter takes the bitmap and outputs ASS subs. The next filter manipulates the text and afterwards the ASS subs are rendered as bitmaps and encoded as DVB subs again.

When you see this, you might think that it's kind of like taking the source bitmaps and writing new text on them, but that's not the case. Right in the middle, there's just the ASS format - which allows you to replicate any text subtitle feature that other formats have.

> > Much of the goal, at least in the work that I do, is to conform to
> the
> > FCC requirements, which generally require that the original 608/708
> > from the content provider be preserved.

All of the above leads to this answer: With ASS as the storage/intermediate format it is possible to preserve the original content very precisely - without having to deal with a bitstream that cannot be safely applied to videos with different parameters than the original source.
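
To make "preserve very precisely" a bit more concrete, here is a minimal sketch of how the attributes of a 608 caption (grid position, color, italics) could be carried in an ASS event line. It is only an illustration and not what the ffmpeg decoder emits: the script resolution, the 0-based row/column convention, the linear grid mapping and the function names are assumptions. The ASS override tags themselves (\an, \pos, \c with its BGR byte order, \i1) are standard.

```python
PLAY_RES_X, PLAY_RES_Y = 384, 288  # assumed ASS script resolution
ROWS, COLS = 15, 32                # 608 caption grid: 15 rows of 32 columns


def ass_time(seconds):
    """Format seconds as an ASS timestamp H:MM:SS.cc (centiseconds)."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def ass_color(rgb):
    """Primary color override tag; ASS stores the bytes in BGR order."""
    r, g, b = rgb
    return f"\\c&H{b:02X}{g:02X}{r:02X}&"


def dialogue(start, end, row, col, rgb, italic, text):
    """Build one ASS Dialogue line for a caption at a 0-based grid cell."""
    # Assumed mapping: spread the grid linearly over the script resolution.
    x = col * PLAY_RES_X // COLS
    y = row * PLAY_RES_Y // ROWS
    # \an7 anchors the text at its top-left corner, so \pos is that corner.
    tags = "\\an7" + f"\\pos({x},{y})" + ass_color(rgb)
    if italic:
        tags += "\\i1"
    return (
        f"Dialogue: 0,{ass_time(start)},{ass_time(end)},"
        f"Default,,0,0,0,,{{{tags}}}{text}"
    )
```

A yellow italic caption on the bottom row then becomes a single text line, which carries no dependency on the frame rate or resolution of the video it came from.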

It "just" requires an encoder for 608/708. Hopefully it's clearer now why I emphasized that earlier.

PS: Please note that this is not a proposal towards using ASS. The point is that ASS already _IS_ the intermediate format in ffmpeg and this won't and can't change (without re-implementing all text-subtitle encoders and decoders). 

sw