[FFmpeg-devel] Subtitles for GSoC

Clément Bœsch u at pkh.me
Tue Mar 8 20:42:39 CET 2016


On Tue, Mar 08, 2016 at 06:21:12PM +0100, Gerion Entrup wrote:
> Hello,
> 

Hi,

> my own ideas seem not to be suitable for GSoC, so I looked at the ideas page again,
> because I have a strong interest in doing something for FFmpeg this summer.
> 
> The project that I find most interesting is unfortunately an unmentored one: subtitle
> support. Is someone willing to mentor this?
> 

I added this task for a previous OPW (and maybe GSoC, I can't remember). I'm
unfortunately not available for mentoring (it takes too much time, energy and
responsibility), but I can provide the usual help as a developer.

The main issue with this task is that it involves API redesign, which is
often not a good idea for a GSoC task.

That said, a bunch of core limitations have been solved in the past, so it's
becoming comfortable to work on top of the current stack.

I'm summarizing the current state at the end of this mail; it may be useful
to any potential mentor and prospective student.

> On the ideas page, the subtitle format mentioned for the qualification task is Spruce. It
> seems it is already supported, so I would try to implement the core part of USF instead. I
> know it is not widely used, but it is very powerful, with features similar to SSA, if I get it
> right. Do you think that is suitable?
> 

Spruce was indeed added during the last OPW as a qualification task. USF is
more painful, but basic support for it could indeed make a suitable
qualification task. You might be able to figure something out for the
demuxing part by playing with the ff_smil_* functions.

So basically you would have to:

- write a USF demuxer which extracts the timing and the text (with its
  markup) of every event and puts them into AVPackets (roughly sketched
  after this list)

- introduce a USF codec and write a decoder that transforms the XML-like
  markup into ASS markup (see below)
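
For illustration only, here is roughly what the event-extraction loop of
such a demuxer could look like, modeled on libavformat/samidec.c. Note
that USFContext and parse_usf_timestamp() are invented for the example,
and the exact ff_smil_*() signatures should be checked against
libavformat/subtitles.h:

    #include "avformat.h"
    #include "subtitles.h"
    #include "libavutil/avstring.h"
    #include "libavutil/bprint.h"

    typedef struct {
        FFDemuxSubtitlesQueue q; /* helper queue from subtitles.h */
    } USFContext;

    static int usf_read_header(AVFormatContext *s)
    {
        USFContext *usf = s->priv_data;
        AVBPrint buf;
        char c = 0;
        int res = 0;

        /* ... stream creation and codec id setup omitted ... */

        av_bprint_init(&buf, 0, AV_BPRINT_SIZE_UNLIMITED);
        while (!avio_feof(s->pb)) {
            ff_smil_extract_next_chunk(s->pb, &buf, &c);
            if (!buf.len)
                break;
            if (!av_strncasecmp(buf.str, "<subtitle", 9)) {
                /* hypothetical helper: "hh:mm:ss.mmm" -> ms */
                int64_t start = parse_usf_timestamp(ff_smil_get_attr(buf.str, "start"));
                int64_t stop  = parse_usf_timestamp(ff_smil_get_attr(buf.str, "stop"));
                AVPacket *sub;

                av_bprint_clear(&buf);
                ff_smil_extract_next_chunk(s->pb, &buf, &c); /* event text */
                sub = ff_subtitles_queue_insert(&usf->q, buf.str, buf.len, 0);
                if (!sub) {
                    res = AVERROR(ENOMEM);
                    break;
                }
                sub->pts      = start;
                sub->duration = stop - start;
            }
            av_bprint_clear(&buf);
        }
        ff_subtitles_queue_finalize(&usf->q); /* sort events by timestamp */
        av_bprint_finalize(&buf, NULL);
        return res;
    }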

Again, I'm not a mentor, so you need confirmation from someone else.

> And then another question. You mentioned libavfilter integration as the ultimate goal.
> If I get it right, at the moment no rendering is implemented, and libavfilter would allow
> (automatic) rendering from SSA to e.g. dvdsub. Would the rendering itself be part of the
> project (because it is very extensive, I think)?
> 

So, yeah, currently the subtitles are decoded into an AVSubtitle structure,
which holds one or several AVSubtitleRects (AVSubtitle.rects[N]).

For graphic subtitles, each rectangle contains a paletted buffer along with
its position, size, etc.

For text subtitles, the ass field contains the text in ASS markup: indeed,
we consider ASS markup to be the best (or least bad) superset of almost
every style the other subtitle formats offer, so it is used as the
"decoded" form for all text subtitles. For example, the SubRip decoder (the
"codec", or markup, you find in SRT files) will transform "<i>foo</i>" into
"{\i1}foo{\i0}".

So far so good. Unfortunately, this is not sufficient, because the
AVSubtitle* structs are old and inconvenient for several reasons:

- they are allocated on the stack by the users, so we can't extend them
  (add fields) without breaking the ABI (= angry users).

- they are defined in libavcodec, and we do not want libavfilter to
  depend on libavcodec for a core feature (we have a few filters
  depending on it, but that's optional). libavutil, which already
  contains AVFrame, is a much better place for this.

- the graphic subtitles are kind of limited: palette only, so they can't
  hold YUV or RGB32 pixel formats, for instance

- the handling of the timing is inconsistent: pts is in AV_TIME_BASE
  units while the start/end display times are relative to it and in ms
  (see the small example after this list).
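
A small example of the resulting juggling, using the real AVSubtitle
fields:

    /* pts is in AV_TIME_BASE units (microseconds) while the display
     * times are millisecond offsets, so computing absolute event
     * bounds mixes two time bases. */
    #include <libavcodec/avcodec.h>

    static void sub_bounds(const AVSubtitle *sub, int64_t *start, int64_t *end)
    {
        *start = sub->pts + sub->start_display_time * (int64_t)(AV_TIME_BASE / 1000);
        *end   = sub->pts + sub->end_display_time   * (int64_t)(AV_TIME_BASE / 1000);
    }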

When these issues are sorted out, we can finally work on the integration
within libavfilter, which is yet another topic that other developers might
want to comment on. For instance, I'm not sure what the state of dealing
with the sparseness of subtitle streams is. Nicolas may know :)

Anyway, there are multiple ways of dealing with the previously mentioned
issues.

The first one is to create an AVSubtitle2 or something in libavutil,
copying most of the current AVSubtitle layout but making sure the user
allocates it with av_subtitle_alloc() or whatever, so we can add fields
and extend it (mostly) at will.
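
A purely hypothetical sketch of what that could look like, mirroring the
av_frame_alloc()/av_frame_free() pattern (none of these names exist
today):

    /* The struct is only ever allocated by the library, so fields can
     * be appended later without breaking the ABI. */
    typedef struct AVSubtitle2 AVSubtitle2;

    AVSubtitle2 *av_subtitle_alloc(void); /* callers never use sizeof() */
    void         av_subtitle_free(AVSubtitle2 **sub);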

The second one, which I've been wondering about these days, is to try to
hold the subtitle data in the existing AVFrame structure. We would, for
example, have frame->extended_data[N] (currently used by audio frames to
hold the channels) point to instances of a newly defined rectangle
structure. Having the subtitles in AVFrame might greatly simplify the
future integration within libavfilter, since AVFrame is already how audio
and video travel through it. This needs careful thinking, but it might be
doable.
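
Again purely hypothetical, just to give an idea of the shape (the name
AVSubtitleArea and its fields are invented here):

    #include <stdint.h>

    /* A rectangle type whose instances an AVFrame would reference
     * through extended_data[], the way audio frames reference their
     * channel planes. */
    typedef struct AVSubtitleArea {
        int x, y, w, h;
        char *ass;         /* text events: ASS markup */
        uint8_t *data[4];  /* bitmap events: any pixel format, not pal8 only */
        int linesize[4];
    } AVSubtitleArea;

A filter could then receive subtitle AVFrames through the same buffer
machinery as audio and video ones.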

But again, these are ideas which need to be discussed and experimented
with. I don't know if it's a good idea for a GSoC, and I don't know who
would be up for mentoring.

It's nice to finally see some interest in this topic, though.

> regards,

Regards,

> Gerion

-- 
Clément B.