[FFmpeg-devel] libavfilter API design in a realtime environment

wm4 nfxjfg at googlemail.com
Wed Mar 16 12:26:23 CET 2016


On Wed, 16 Mar 2016 00:42:24 +0000
Kieran Kunhya <kierank at obe.tv> wrote:

> Hello,
> 
> I want to try and use the libavfilter API to overlay bitmap subtitles on
> video from a realtime source. This seems difficult/impossible to do with
> the current API hence asking on the main devel list.
> 
> Some questions:
> 
> 1: How do I know the end to end latency of the pipeline? Is it fixed, does
> it vary? This matters because my wallclock PTS needs addition of this
> latency.
> 2: Do I need to interleave video and subtitles (e.g. VSVSVSVS) in
> monotonically increasing order? What happens if the subtitles stop for a
> bit (magic queues are bad in a realtime environment)? My timestamps are
> guaranteed to be the same though.
> 3: My world is CFR but libavfilter is VFR - how does the API know when to
> start releasing frames? Does this add one frame of video latency, since it
> waits for the next video frame to arrive?
> 4: What are the differences between the FFmpeg and libav implementations?
> FFmpeg uses a framesync and libav doesn't?
> 5: I know exactly which frames have associated subtitle bitmaps or not, is
> there a way I can overlay without an extra frame delay?

Most of these questions seem related to using vf_overlay and having
multiple inputs. There is a fundamental problem with the API here: it
can't know when the "next" frame will come.

All inputs ("buffer" filters and buffersrc.h API) are equal. There is
no order in which you have to feed frames to them. It also doesn't
matter how many frames you queue to them (if there are too many they get
put into a FIFO in the buffersrc or buffersink). But you still have to
feed each buffersrc "enough" to actually get output. How many frames
you have to feed is not defined, and is determined at runtime by the
internal frame scheduling logic. av_buffersrc_get_nb_failed_requests()
can apparently be used to check which buffersrc needs a frame next
(otherwise you'd end up queuing tons of frames to an input that doesn't
need any more).
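
Roughly like this (an untested sketch; video_src/sub_src/sink are filter
contexts created elsewhere, and get_video_frame()/get_sub_frame() are
placeholder names for whatever realtime sources the caller has):

#include <libavfilter/buffersrc.h>
#include <libavfilter/buffersink.h>
#include <libavutil/frame.h>

/* Placeholders for the caller's sources (made-up names; returning NULL
 * would signal EOF to the corresponding buffersrc). */
AVFrame *get_video_frame(void);
AVFrame *get_sub_frame(void);

/* Pull one filtered frame, feeding whichever input the internal
 * scheduler has starved the most. Returns 0 on success, <0 on error. */
static int pull_one_frame(AVFilterContext *video_src,
                          AVFilterContext *sub_src,
                          AVFilterContext *sink, AVFrame *out)
{
    for (;;) {
        AVFrame *in;
        int ret = av_buffersink_get_frame(sink, out);
        if (ret != AVERROR(EAGAIN))
            return ret; /* got a frame, or a real error/EOF */

        /* EAGAIN: the graph wants more input first. */
        if (av_buffersrc_get_nb_failed_requests(sub_src) >
            av_buffersrc_get_nb_failed_requests(video_src)) {
            in  = get_sub_frame();
            ret = av_buffersrc_add_frame(sub_src, in);
        } else {
            in  = get_video_frame();
            ret = av_buffersrc_add_frame(video_src, in);
        }
        av_frame_free(&in); /* add_frame took the data references */
        if (ret < 0)
            return ret;
    }
}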

Now assume vf_overlay synchronizes to the first input, and for the
second input needs to know the previous and next frame PTS to decide
which frame to pick (that is how the "old" logic in Libav appears to
work). The API requires the filter to buffer frames until it has received
a frame on the second input with a PTS >= that of the main input.
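
In other words, something like the following rule (illustrative only,
not the actual vf_overlay code):

#include <libavutil/frame.h>

/* Pick which secondary frame covers main_pts (same time base assumed).
 * The filter can't commit to prev until it has seen next, which is
 * exactly what forces the buffering just described. */
static const AVFrame *pick_secondary(const AVFrame *prev,
                                     const AVFrame *next,
                                     int64_t main_pts)
{
    if (next && next->pts <= main_pts)
        return next; /* next has started; prev is obsolete */
    return prev;     /* main_pts still falls in prev's interval */
}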

In a setting where the second input is a subtitle stream, this can lead
to excessive buffering: if a video frame has no associated subtitle,
vf_overlay will wait until the next subtitle is received. There is no
way around this; it's just how the API works (at least in my
understanding).

VFR vs. CFR actually doesn't make a difference here. It's a matter of a
sparse stream of frames vs. a non-sparse one.

There are 3 possible workarounds:

1. Send an empty (fully transparent) frame to the subtitle input if
there is no subtitle. This wastes CPU time, but it's the simplest
solution (see the sketch after this list).
2. Send EOF to the second (subtitle) input. Unfortunately, this means
the graph has to be rebuilt to reset the EOF state, so this is probably
not very feasible.
3. Completely bypass libavfilter if you don't have a subtitle frame for
a given video frame.
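
For workaround 1, assuming the subtitle input ends up negotiated as
RGBA, something along these lines would do (a sketch; call it once per
video frame that has no subtitle, with that frame's PTS):

#include <string.h>
#include <libavfilter/buffersrc.h>
#include <libavutil/frame.h>

/* Push a fully transparent RGBA frame into the subtitle buffersrc so
 * vf_overlay has something to consume for this PTS. */
static int send_empty_overlay(AVFilterContext *sub_src,
                              int w, int h, int64_t pts)
{
    int ret;
    AVFrame *f = av_frame_alloc();
    if (!f)
        return AVERROR(ENOMEM);
    f->format = AV_PIX_FMT_RGBA;
    f->width  = w;
    f->height = h;
    ret = av_frame_get_buffer(f, 0);
    if (ret < 0)
        goto done;
    /* All-zero RGBA means alpha 0 everywhere: the overlay is a no-op. */
    memset(f->data[0], 0, (size_t)f->linesize[0] * h);
    f->pts = pts;
    /* av_buffersrc_add_frame() takes the frame's data references. */
    ret = av_buffersrc_add_frame(sub_src, f);
done:
    av_frame_free(&f);
    return ret;
}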

Possible solutions:

1. Add a way to signal to a buffersrc that definitely no new frame will
come within a given PTS range. This would tell vf_overlay that there is
no data on the second input (a hypothetical sketch of such a call
follows below).
2. Add a way to force output of a frame, which wouldn't wait for new
input. Not sure if this is feasible; I can see it becoming a big mess.

In both cases, vf_overlay would have to understand not to use the
previous secondary input frame. (Not sure what it does right now.)
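
For solution 1, the addition could look something like this (purely
hypothetical signature, no such function exists today):

/* HYPOTHETICAL - not an existing API. Promise that this buffersrc will
 * deliver no frame with PTS < pts, so that filters waiting on this
 * input (like vf_overlay) can release output instead of buffering. */
int av_buffersrc_declare_no_frames_before(AVFilterContext *buffer_src,
                                          int64_t pts);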

