[Libav-user] MPEG1-2 CEA-608 CC (subtitle) decoding fails for interlaced content

Wed Dec 5 10:08:35 EET 2018

> Interesting, how did you test?
> I ask because I don't get anything useful if I don't discard the first
> field's data.

The closed caption format is described in CFR-2010-title47-vol1-sec15-119.pdf
(and ANSI-CEA-608-E-R-2014.pdf).
When you strip the parity bit from the raw caption data (character &= 0x7F)
you can sort of read the captions. The 'rubbish' between the captions is
formatting information such as color and position.
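
To illustrate the parity stripping, here is a minimal sketch. The helper names
(cc608_parity_ok, cc608_strip_parity) are made up for this mail and are not
part of libav. It assumes the CEA-608 convention that bit 7 of every byte is
an odd-parity bit over the whole byte.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the byte has odd parity over all 8 bits,
 * which is what CEA-608 expects for every transmitted byte. */
static bool cc608_parity_ok(uint8_t b)
{
    int ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (b >> i) & 1;
    return (ones & 1) == 1;
}

/* Hypothetical helper: drop the parity bit, keeping the 7-bit character
 * or control code (same as "character &= 0x7F"). */
static uint8_t cc608_strip_parity(uint8_t b)
{
    return b & 0x7F;
}
```

For example, 'A' (0x41) has two bits set, so it is transmitted as 0xC1, and
the padding byte 0x80 strips to 0x00.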

A frame of interlaced video contains an odd field (field 1) and an even field
(field 2), and each field carries its own closed caption channels.
There are 4 closed caption channels, 4 text channels and an XDS data channel.
Field 1 contains CC1 (primary), CC2 (special non-synchronous use captions),
T1 and T2 (text service). Field 2 contains CC3 (secondary), CC4 (special
non-synchronous use captions), T3 and T4 (text service) and XDS data.
Note that each field carries only 2 bytes of CC data per frame, so the data
rate is only about 30 * 2 = 60 characters per second per field.

The above is valid for analog NTSC television broadcast. The document
ANSI-CTA-708-E-R-2018-Final.pdf describes the successor of CEA-608. It applies
to digital channels only, where there is much more capacity. This format is
used in our MPEG datastream, with the CEA-608 datastream embedded in it.
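
To show how the embedded CEA-608 data can be told apart per field, here is a
rough sketch. It assumes the buffer is a run of 3-byte cc_data triplets as
defined by ATSC A/53 (which, as far as I know, is what AV_FRAME_DATA_A53_CC
carries): the first byte holds 5 marker bits, 1 cc_valid bit and a 2-bit
cc_type, where cc_type 0 is field 1, 1 is field 2, and 2/3 are CEA-708 data.
The function name is made up for this mail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the valid CEA-608 byte pairs per field in a
 * buffer of 3-byte cc_data triplets. CEA-708 (DTVCC) triplets and triplets
 * with cc_valid == 0 are skipped. */
static void cc_count_608_pairs(const uint8_t *buf, size_t size,
                               int *field1, int *field2)
{
    *field1 = 0;
    *field2 = 0;
    for (size_t i = 0; i + 3 <= size; i += 3) {
        int cc_valid = (buf[i] >> 2) & 1;  /* bit 2 of the first byte */
        int cc_type  = buf[i] & 3;         /* bits 1..0 of the first byte */

        if (!cc_valid)
            continue;
        if (cc_type == 0)
            (*field1)++;
        else if (cc_type == 1)
            (*field2)++;
    }
}
```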

The data in field 2 (the first field in our test stream) is probably XDS data;
I can see the bytes "7024 SEG 1 8h] &HD01 } ]DPS17024 S". Other MPEG
datastreams might contain an extra closed caption channel there, which should
also be decoded as a separate closed caption channel.

For best backward compatibility we should make sure the AV_FRAME_DATA_A53_CC
data for field 1 comes first in AVFrame, followed by the AV_FRAME_DATA_A53_CC
data for field 2. (Can we detect the field order in the MPEG stream with
AVFrame::top_field_first?)
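
One possible way to get that ordering, as a sketch only: a stable reorder of
the 3-byte triplets so that valid field 1 triplets come first and everything
else keeps its original relative order. It assumes the cc_type bits in the
stream label the fields correctly, which is exactly what would need checking
for our test stream. The function name and the caller-supplied scratch buffer
are my own invention, not libav API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: move valid field 1 triplets (cc_valid == 1,
 * cc_type == 0) to the front of buf, keeping relative order otherwise.
 * tmp is scratch space of at least size bytes. Trailing bytes that do not
 * form a full triplet are left untouched.
 * Returns 0 on success, -1 if tmp is too small. */
static int cc_field1_first(uint8_t *buf, size_t size,
                           uint8_t *tmp, size_t tmp_size)
{
    size_t n = size - size % 3;  /* whole triplets only */
    size_t out = 0;

    if (tmp_size < n)
        return -1;

    /* pass 0 copies field 1 triplets, pass 1 copies the rest */
    for (int pass = 0; pass < 2; pass++) {
        for (size_t i = 0; i < n; i += 3) {
            int is_field1 = (buf[i] & 0x04) && ((buf[i] & 0x03) == 0);
            if (is_field1 == (pass == 0)) {
                memcpy(tmp + out, buf + i, 3);
                out += 3;
            }
        }
    }
    memcpy(buf, tmp, n);
    return 0;
}
```

Whether such a reorder should depend on top_field_first is still the open
question above; the sketch only looks at the triplet headers.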

Eric de Jong