[FFmpeg-devel] [RFC] AVSubtitles rework

Clément Bœsch ubitux at gmail.com
Mon Sep 3 21:40:25 CEST 2012

On Mon, Sep 03, 2012 at 08:01:04PM +0200, Nicolas George wrote:
> L'octidi 18 fructidor, an CCXX, Clément Bœsch a écrit :
> > I'm not very fond of introducing a new structure for a few reasons:
> >  - having a AVSubtitle2 will require to maintain both paths for longer,
> >    and the problem is already hard to deal with even if starting from
> >    scratch
> >  - if we do that, it will require duplicating the current public API for a
> >    while, which sounds kind of a pain
> All that is true, but that is the burden of compatibility. If we do not
> introduce a new structure, all programs that currently allocate AVSubtitle
> themselves will break if dynamically linked with a more recent lavc.
> >  - I don't think the current AVSubtitle API is really used apart from
> >    MPlayer, but I may be wrong
> A Google search for avcodec_decode_subtitle2 shows VLC, XBMC, and a few
> small projects.

TL;DR: follow up and extend brainstorming after VDD/subtitles talks

Mmh OK. Well then should we introduce an experimental AVSubtitle2 directly
into libavutil to ease the integration with libavfilter later on?

If we are to start a new structure, we should design it properly from the
start: a subtitle structure able to store the two types of subtitles we
already discussed:

 == bitmap subtitles ==

For the bitmap stuff I don't have a strong opinion on how it should be
done. IIRC, we agreed that the current AVSubtitle structure was mostly fine
(since AVSubtitle was designed for that kind of subtitles in the first
place) except that it is missing the pixel format information, and we were
wondering where to put that info (in each AVSubtitle2->rects or at the
root of the AVSubtitle2 structure).

 == styled events for text based subtitles ==

For the styled text events, each AVSubtitle2 would have, instead of an
AVSubtitle->rects[N]->ass string, N exploitable AVSubtitleEvent (or maybe
only one?). This is what the subtitle decoders would output (in a decode2
callback for example, depending on how we keep compat with AVSubtitle) and
what the users would exploit (by reading that AST to use it in their
rendering engine/converter/etc, or by simply passing it along to our
encoders and muxers). Additionally, we may want to provide a "TEXT"
encoder that outputs a raw text version (stripping all markup) for simple
rendering.
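
To make the "TEXT" encoder idea a bit more concrete, here is a minimal
sketch of the markup stripping such an encoder could do when the source
markup is ASS. The function name sketch_strip_ass() is hypothetical and is
not part of lavc; it only handles the basic ASS syntax ({...} override
blocks, \N and \n line breaks, \h hard space), and a real encoder would
work from the events AST rather than from a string:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an existing lavc function.
 * Copies `in` to `out` while dropping ASS override blocks ("{...}") and
 * translating \N / \n to a newline and \h to a space.
 * Returns the number of bytes written, excluding the final '\0'.
 * Simplification: an unclosed '{' swallows the rest of the line. */
static size_t sketch_strip_ass(const char *in, char *out, size_t out_size)
{
    size_t n = 0;
    int in_block = 0;

    assert(in && out);
    if (!out_size)
        return 0;

    for (; *in && n + 1 < out_size; in++) {
        if (in_block) {
            if (*in == '}')
                in_block = 0;           /* end of override block */
        } else if (*in == '{') {
            in_block = 1;               /* start of override block */
        } else if (*in == '\\' && (in[1] == 'N' || in[1] == 'n')) {
            out[n++] = '\n';            /* line break */
            in++;
        } else if (*in == '\\' && in[1] == 'h') {
            out[n++] = ' ';             /* hard space */
            in++;
        } else {
            out[n++] = *in;             /* plain text */
        }
    }
    out[n] = '\0';
    return n;
}
```

With the input "{\i1}Hello{\i0}\Nworld", this writes "Hello", a newline,
then "world" into the output buffer.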

So, here is a suggestion of the classic workflow:

                                                     /* common transmuxing/coding path */
DEMUXER -> [AVPacket] -> DECODER -> [AVSubtitle2] -> ENCODER -> [AVPacket] -> MUXER
                        /* lavfi/hardsub or video player path */
                                         / \
                                        /   \
       custom rendering                /     \
       engine using the  <--------- text?  bitmap?
      AVSubtitle2->events            /         \
           structure                /           \
                            libass to render?   bitmap overlay
                                 /     \
                           yes  /       \ no
                               /         \
                     ENCODER:assenc   ENCODER:textenc          (<== both lavc encoders)
                             /             \
   AVPacket->data is an ASS /               \
   payload (no timing)     /                 \ AVPacket->data is raw text
 (need to mux for timings)/                   \
                         /                     \
                 libass:parse&render    freetype/mplayer-osd/etc

At least, that's how I would see the usage from a user perspective.
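
The decision tree in the lower half of the diagram can be written down as
a small function. This is only a sketch of the branching: all the names
(sketch_pick_path() and the enums) are made up for illustration, and
giving priority to a custom engine over libass is my assumption, since
the diagram does not order those two branches:

```c
#include <assert.h>

/* Hypothetical names, only meant to mirror the diagram. */
enum sketch_sub_kind { SKETCH_SUB_TEXT, SKETCH_SUB_BITMAP };

enum sketch_render_path {
    SKETCH_PATH_BITMAP_OVERLAY, /* bitmap subtitles: overlay directly      */
    SKETCH_PATH_CUSTOM_ENGINE,  /* user walks AVSubtitle2->events itself   */
    SKETCH_PATH_ASSENC_LIBASS,  /* assenc -> ASS payload -> libass         */
    SKETCH_PATH_TEXTENC_RAW     /* textenc -> raw text -> freetype/OSD/... */
};

static enum sketch_render_path
sketch_pick_path(enum sketch_sub_kind kind, int has_custom_engine,
                 int has_libass)
{
    assert(kind == SKETCH_SUB_TEXT || kind == SKETCH_SUB_BITMAP);

    if (kind == SKETCH_SUB_BITMAP)
        return SKETCH_PATH_BITMAP_OVERLAY;
    if (has_custom_engine)          /* assumption: custom engine wins */
        return SKETCH_PATH_CUSTOM_ENGINE;
    if (has_libass)
        return SKETCH_PATH_ASSENC_LIBASS;
    return SKETCH_PATH_TEXTENC_RAW;
}
```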

Now, if we agree on such a model, we need to focus on how to store the
events & styles. Basically, each AVSubtitle2 must expose the following as
an AST:

 - an accessible header with all the global styles (such as an external
   .css for WebVTT, the event styles in the ASS header, palettes with some
   formats, etc.); maybe that one would belong in the AVCodecContext
 - one (or more?) events with links to style structures: either in the
   global header, or associated with that specific event. BTW, this
   "styles" info must be able to hold various kinds of information such as
   karaoke or ruby stuff (WebVTT supports that,
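
To illustrate the header/event relationship from the list above, here is
one possible layout. Nothing here is an agreed API: SketchStyle,
SketchHeader, SketchEvent and sketch_event_style() are placeholder names,
the bold/italic fields just stand in for real style data, and the lookup
order (event-specific style first, then the global header) is an
assumption:

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder types for illustration only. */
typedef struct SketchStyle {
    const char *name;
    int bold;
    int italic;
} SketchStyle;

/* Global header; could live in the AVCodecContext as suggested above. */
typedef struct SketchHeader {
    const SketchStyle *styles;
    int nb_styles;
} SketchHeader;

typedef struct SketchEvent {
    const char *text;
    int style_index;                /* index into header styles, or -1 */
    const SketchStyle *local_style; /* event-specific style, or NULL   */
} SketchEvent;

/* Resolve the style of an event: the event-specific style if there is
 * one, otherwise the referenced global style, otherwise NULL. */
static const SketchStyle *
sketch_event_style(const SketchHeader *hdr, const SketchEvent *ev)
{
    assert(ev);
    if (ev->local_style)
        return ev->local_style;
    if (hdr && ev->style_index >= 0 && ev->style_index < hdr->nb_styles)
        return &hdr->styles[ev->style_index];
    return NULL;
}
```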

We still need to agree on how to store that (and Nicolas already proposed
something related), but I'd like to first check whether everyone agrees
with such a model. And then we might engage in the API for text


Clément B.