[FFmpeg-devel] Blocking vs non-blocking modes in protocols

Nicolas George george at nsup.org
Mon Sep 2 21:53:29 EEST 2019


So, let us survey the state of blocking and non-blocking mode at this
date in FFmpeg.

First, why and how.

This is a question of how the program is designed. There are two big
classes of design: output-driven and input-driven. An output-driven
design decides it wants to produce some output and reads as much as
input to do it, waiting for it if necessary. An input-driven design
takes input as it comes, updates its internal state or buffers it, and
produces output as it can. There are obviously intermediate designs.

Output-driven design is easier because state can be kept in local
variables, including the instruction pointer. In input-driven design, if
the amount of input available does not allow performing a full step of
processing, the step must be fragmented or delayed somehow.

On the other hand, as soon as a program has several live inputs, that
is, inputs that can receive data at any time and require immediate
attention, input-driven is the only solution. If you are old enough, you
may
remember the time when web browsers would completely freeze when the DNS
was lagging, because the DNS was handled in an output-driven way,
without letting the browser react even to its GUI input.

Historically, Unix favors an output-driven design. FFmpeg closely
imitates Unix's design. When it became obvious that was not sustainable,
Unix first cut the problem like the Gordian knot, with signals that
bypass the normal code paths and cause immediate action: ^C is how you
provide user "input" to a program that is blocked reading from its data
input.

Since that is not sufficient for more modern applications, Unix
introduced non-blocking mode. In this mode, inputs do not wait for data;
they immediately return a special code that means "no data available
immediately". Programs need to be ready for that code, try other inputs
immediately, and come back to this one later.

This is not enough: trying the input later is bad, because it requires
guessing how long "later" should be. If later is long, the program is
unresponsive. If later is short, the program will wake often for
nothing. This last point is especially crucial for embedded
applications,
where unnecessary wake-ups can prevent the hardware from going into deep
sleep, causing a huge drain on the battery. (FFmpeg should be usable for
embedded applications.) To solve this issue, Unix introduced the
select() and then poll() functions, that block until one of the
specified inputs is ready.

With non-blocking mode and poll(), the normal structure of a program
with several inputs becomes a single loop, blocking waiting for any
input in a single poll() call, and then processing the ones that are
ready before starting again. This is called an event loop.

A few extra notes before starting with FFmpeg itself. I have only spoken
about inputs so far. Outputs are easier: just send the data. And take care
of flow control, which is my next point.

Flow control means that the program can stop an input from receiving
data. It may or may not be possible, depending on the input: TCP has
flow control, broadcast television does not. Plain files need flow
control, or they will flood your input.

There is another solution to handle several inputs in a program:
threads. They promise to make everything about handling several inputs
simpler: just have one thread per input. But the reality is not so
bright. If you need to stop a read operation from the network because
the user canceled it from the GUI, you need a way of controlling a
thread that is blocked in a system call. For this very simple and common
case, you can use pthread_cancel(), but we have seen how even a simple
function like that can cause no end of portability problems. For more
unusual cases, you will need to implement some kind of message passing
with the thread, and since message passing is not integrated with
blocking system calls, you will also need to implement an event loop in
the thread. An event loop in each thread.

Yet, threads can be a life-saving solution for a particular issue: they
make it possible to take an output-driven piece of code and run it as
input-driven:
just leave the thread running when the code is blocked on an input, and
wake it when the input is received. Note that the ability of threads to
run concurrently is not necessary for this to work, only the ability to
switch context and back. But nowadays, threads are probably more tested
and portable than contexts.


Now, the state of non-blocking in current FFmpeg code. Most input APIs
of FFmpeg are output-driven: av_read_frame() will wait until input is
received. (For comparison, filters, codecs and bitstream filters are now
all input-driven.) Blocking mode is the default, and it works. On the
other hand, non-blocking often does not work.

There are a lot of cases:

- Simple protocols, i.e. protocols that interact directly with a network
  API or a library, mostly work in both blocking and non-blocking modes,
  and can be integrated in a poll()/select() call using
  ffurl_get_file_handle(), which is available internally but not
  externally.

- Complex protocols, i.e. protocols that rely on one or several other
  protocols (like HTTP relies on TCP, or even on TLS that itself relies
  on TCP) may or may not work in non-blocking mode. Mostly not.

- Simple demuxers, i.e. demuxers that use a single AVIO stream for their
  input, mostly do not work in non-blocking mode.

- Complex demuxers, i.e. demuxers that use other demuxers or several
  AVIO streams, probably do not work.

- Some (most?) device demuxers, i.e. demuxers that directly read from a
  device, work.

- Simple/complex/device muxers are mostly in the same state as the
  corresponding demuxers, but since it is about output, it is less of a
  problem.


And finally, how do we make it better?

What we need urgently is an event loop. Or even better, a scheduler,
i.e. a multi-thread event loop. This is not a very complex task. The
hard part is to carefully plan ahead to make sure it can handle all the
cases we need, including flow control.

Demuxers and protocols will then need to be adapted to integrate in the
scheduler. But it will not happen at once, so we will need wrappers to
run the current demuxers and protocols in threads connected to the
scheduler.

Then we have four cases, depending on whether the application uses the
API with the scheduler or the current output-driven API, and whether it
uses a current output-driven module or an updated scheduled module.

To call a current output-driven module from the current output-driven
API, nothing changes.

To call an updated scheduled module with the scheduler API, that is
straightforward.

To call an updated scheduled module from the current output-driven API,
we create a new scheduler instance just for it, and we run it until we
have our output. The scheduler needs to be light-weight to make this
workable.

To call a current output-driven module with the scheduler API, we call
its blocking function from a separate thread. We do not need to create a
thread per module, but only a thread per level of recursive blocking
calls; for example concat→Matroska→HTTP→TCP requires four threads, even
if there are thousands of Matroska files in the concat.


What would it change in practice?

I am quite sure it can be made reasonably simple. First, allocate a
scheduler for your whole program: av_scheduler_create(&scheduler, ...);.
Then attach the AVFormatContext and AVIO instances to the scheduler
before using them: avformat_schedule(avf, scheduler);. Then, optionally,
attach callbacks for messages from your demuxers: your function will be
called each time the demuxer has produced output, the equivalent of
av_read_frame() returning. Or, if you do not like callbacks, the default
can be to stop the scheduler and return the frame, with all the metadata
telling you from which demuxer it comes: process it and re-run the
scheduler.
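Putting the pieces above together, application code might look roughly
like this. To be clear, this is purely illustrative: none of these
functions exist yet, and beyond the av_scheduler_create() and
avformat_schedule() calls suggested above, the run function, its
arguments and the helper names are invented for the sketch:

```
/* Hypothetical sketch of the proposed API -- nothing here exists yet. */
AVScheduler *scheduler;
av_scheduler_create(&scheduler, 0 /* options */);

/* Attach an already-opened demuxer to the scheduler before use. */
avformat_schedule(avf, scheduler);

/* Without callbacks: run the scheduler until some demuxer has produced
 * a frame; the metadata tells us which demuxer it came from.
 * Process the frame, then re-run the scheduler. */
while (av_scheduler_run(scheduler, &pkt, &source) >= 0)
    handle_packet(source, &pkt); /* application code */
```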


I have been thinking about this for a long time, and I think it is
mostly ready in my head. I can produce the outline of an actual API with
its documentation to really start the conversation.

Regards,

-- 
  Nicolas George