[FFmpeg-devel] [PATCH] lavc: Replace 181 magic number with ITU_T_T35_COUNTRY_CODE_US

Mon Mar 10 15:57:26 EET 2025

On Mon, Mar 10, 2025 at 5:57 AM Tomas Härdin <git at haerdin.se> wrote:
> Just an aside: I've had potential clients ask about decoding teletext
> from the video essence itself, for stuffing into MXF's VBI thing. Here
> in Sweden subtitles are still done with teletext. I wouldn't be
> surprised if some broadcasters use both teletext and 708

So it's actually pretty easy to go from teletext in a TS stream to
teletext in VBI.  I've done it in the SDI domain in both directions
(for example, I have code in my ffmpeg decklink output avdevice to put
it out over VBI for SD as well as in SDI VANC for HD), and you don't
need to fully decode the stream.  I leveraged a couple of functions in
libzvbi to make my life easier, but in any case it's not much code.  I
think in my tree it was something like 100 lines of code.

> > Muxing together captions from different sources is pretty painful,
> > since you have to parse/decompose the 708 stream and recombine streams
> > from different sources (and then update the PMT).  I have code which
> > does it but haven't made any effort to open source it, and I'm not
> > confident it could easily be done within ffmpeg due to limitations in
> > the ffmpeg framework.
>
> It is indeed painful. With the client I'm doing this with we use
> libcaption to do this, outside of FFmpeg, precisely because FFmpeg's
> "model" is wholly unsuited for stuffing subs into encoded video essence
> streams. But even libcaption is rather lackluster, and does not support
> setting the channel index of each 608 stream. Not hard to modify
> libcaption to gain this feature, but still

Yeah, I didn't have a particularly positive experience with
libcaption, despite the fact that I know it's what a number of
projects use (including OBS).  Every time I look at that code I spot a
whole pile of things they are doing wrong.  Probably the most obvious
is that they don't properly do rate control for the 608/708 tuples to
embed in the stream, so the resulting stream won't work properly with
many decoders/transcoders.  This was actually one reason I wrote the
vf_ccrepack filter in ffmpeg, to deal with cases where somebody used
libcaption to embed CC into a TS.  The ccrepack filter puts the
caption tuples into a queue and then re-embeds them at the appropriate
rate given the target framerate.
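
As a rough illustration of the queue-and-re-embed idea (a minimal
sketch, not the actual vf_ccrepack code): the 9600 bit/s caption budget,
the 16 payload bits per tuple, and the 0xFA 0x00 0x00 padding tuple are
assumptions here and should be checked against CTA-708.  The names
Repacker, push and pop_frame are hypothetical.  The sketch also treats
all tuples alike, whereas a real implementation would likely budget 608
and 708 tuples separately.

```python
from collections import deque
from fractions import Fraction

CAPTION_BITRATE = 9600          # bits per second (assumed, see lead-in)
BITS_PER_TUPLE = 16             # two payload bytes per caption tuple (assumed)
PADDING = (0xFA, 0x00, 0x00)    # assumed "no data" tuple used as filler


def tuples_per_frame(fps):
    """Number of caption tuples each frame must carry at the given rate."""
    per_second = Fraction(CAPTION_BITRATE, BITS_PER_TUPLE)  # 600 tuples/s
    return round(per_second / Fraction(fps))


class Repacker:
    """Hypothetical helper: queue incoming tuples, emit a fixed count per frame."""

    def __init__(self, fps):
        self.budget = tuples_per_frame(fps)
        self.queue = deque()

    def push(self, tuples):
        # Tuples arrive at whatever (possibly wrong) rate the source used.
        self.queue.extend(tuples)

    def pop_frame(self):
        # Drain at most `budget` tuples, then pad up to exactly `budget`.
        out = []
        while self.queue and len(out) < self.budget:
            out.append(self.queue.popleft())
        out.extend([PADDING] * (self.budget - len(out)))
        return out
```

With a 30000/1001 target rate the budget works out to 20 tuples per
frame, so a burst of 25 tuples attached to one input frame is spread
over two output frames, the second one mostly padding.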

> > It's also worth noting that the Caption descriptor as defined in the
> > standard does not let you specify the language of individual CTA-608
> > channels within a 708 stream (which is what most people care about).
> > The only way to specify the language for the 608 channels (e.g.
> > CC1-CC4) is via XDS bytes within the 608 stream, which almost nobody
> > does nowadays.  I ran a scan across my network of thousands of
> > channels from different commercial hardware encoders, and couldn't
> > find a single one that specified the 608 language in XDS (if I found
> > cases where it was, I was prepared to submit patches to VLC to show
> > the language in the subtitle dropdown menu).
>
> So everyone just uses a single CC channel (or pair) within each 608
> stream? I'd probably do that too tbh, if I were already doing 708. From
> what I remember of 608 it's meant to be bilingual at most, carrying
> English and Spanish subs for the North-, Middle- and South American
> market.

So CEA-608 provides for up to four channels of captions (using a pair
of bytes for each field of video, or two pairs for progressive
frames).  Broadcasters often embed more than one language (most
commonly English and Spanish) using commercial solutions, but there aren't
really any open source solutions I can think of which combine multiple
caption languages into a single CEA-608 stream.
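
To make the four-channel layout concrete, here is a minimal sketch of
how a byte pair could be attributed to CC1-CC4.  It assumes the usual
608 convention that field 1 carries CC1/CC2 and field 2 carries CC3/CC4,
and that a control code whose first byte (parity stripped) falls in
0x10-0x17 selects the first channel of the field while 0x18-0x1F selects
the second.  The Field608 class is hypothetical, and XDS packets in
field 2 are ignored here.

```python
class Field608:
    """Hypothetical tracker for one 608 field (1 or 2)."""

    def __init__(self, field):
        self.field = field      # 1 -> CC1/CC2, 2 -> CC3/CC4 (assumed layout)
        self.channel = None     # unknown until a control code is seen

    def channel_for(self, b1, b2):
        """Return 'CC1'..'CC4' for a byte pair, or None if not yet known."""
        c = b1 & 0x7F           # strip the parity bit
        if 0x10 <= c <= 0x1F:
            # Control code: bit 3 of the first byte picks the data channel.
            self.channel = 2 if c & 0x08 else 1
        if self.channel is None:
            return None
        # Text bytes inherit the channel of the most recent control code.
        return "CC%d" % ((self.field - 1) * 2 + self.channel)
```

For example, on field 1 the pair 0x94 0x20 selects CC1, and the text
pairs that follow are attributed to CC1 until a control code with a
first byte in the 0x18-0x1F range switches to CC2.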

The character set support is definitely limited (you can't do Unicode
as you can with CEA-708).  What is supported is mostly oriented around
what you would typically find in ISO 8859-1 (i.e. North and South
America, and Western Europe).  Also worth noting that many decoders
don't support the full character set: they work fine with the
characters typically found in English and Spanish, but overlook
characters from Western Europe.  Same goes for transcoders; even some
expensive commercial transcoders I can think of don't work with all
character sets.

Devin

-- 
Devin Heitmueller, Senior Software Engineer
LTN Global Communications
o: +1 (301) 363-1001
w: https://ltnglobal.com  e: devin.heitmueller at ltnglobal.com