[Libav-user] Check my understanding of threaded/overlapped/asynchronous/nonblocking libavcodec operation?

Sat Feb 12 20:43:49 EET 2022

Hi and apologies for the noob Q's. I've had trouble finding authoritative
documentation -- pointers welcome. Meanwhile here's what I THINK the
situation is and I would appreciate corrections and/or confirmations!!

SHORT VERSION

- The main libavcodec APIs (avcodec_{send/receive}_{frame/packet}) are
*synchronous* (blocking); if you don't want to block, use a thread.
- Almost any non-ancient build of libav* will be "thread friendly": don't
use any given "object" from multiple threads at once, but otherwise threads
are fine.
- Various codecs will use threads and even off-core resources (GPUs, vector
units etc). This can be tweaked with things like the thread_count setting,
but it depends on the specific codec and YMMV.

LONG VERSION

There are two main reasons to care about threaded/asynchronous use of
libavcodec (and libav* in general): not blocking other work (e.g. UI
thread), and making the most efficient use of resources.

*** Not blocking other work: "Just make your own thread"? ***

For not blocking other work (e.g. keeping a responsive UI while
encoding/decoding), it's important to know that calls to
avcodec_{send/receive}_{frame/packet} may block for a significant amount of
time. This can be confusing because EAGAIN is typically associated with
nonblocking operation, but that's not the case with this API. Here EAGAIN
just means "go to the other side of the buffer": send returns it when you
need to receive first, and receive returns it when you need to send more
input. Either one side or the other will block until work is done. So, if
that's a problem, make a thread that manages the encoding/decoding, keep it
separate from your UI thread (or whatever), and use appropriate thread-safe
communication between them.
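
To make the EAGAIN handshake concrete, here is a toy model in plain C. It is
NOT libavcodec: ToyCodec, toy_send(), toy_receive() and toy_run() are made-up
names, and the one-slot buffer and the "multiply by two" work are assumptions
for illustration only. The real calls return AVERROR(EAGAIN) rather than
-EAGAIN, but the calling loop has roughly the same shape: send until refused,
then receive until refused.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-slot "codec": not libavcodec, just the same handshake. */
typedef struct ToyCodec {
    bool has_output;   /* true while an unread result sits in the slot */
    int  output;       /* the pending result */
} ToyCodec;

/* Accept one input. In this model the (blocking) work happens here.
 * Returns 0 on success, -EAGAIN if the caller must drain output first. */
static int toy_send(ToyCodec *c, int input)
{
    if (c->has_output)
        return -EAGAIN;
    c->output = input * 2;   /* stand-in for the expensive decode step */
    c->has_output = true;
    return 0;
}

/* Fetch one output. Returns 0 on success, -EAGAIN if more input is needed. */
static int toy_receive(ToyCodec *c, int *out)
{
    if (!c->has_output)
        return -EAGAIN;
    *out = c->output;
    c->has_output = false;
    return 0;
}

/* Push n inputs through the codec, storing results in out[].
 * Returns the number of outputs produced, or a negative error code. */
static int toy_run(ToyCodec *c, const int *in, size_t n, int *out)
{
    size_t sent = 0, got = 0;
    assert(c != NULL);
    while (got < n) {
        /* Feed input until the codec refuses with -EAGAIN. */
        while (sent < n) {
            int ret = toy_send(c, in[sent]);
            if (ret == -EAGAIN)
                break;
            if (ret < 0)
                return ret;
            sent++;
        }
        /* Drain output until the codec asks for more input. */
        for (;;) {
            int frame;
            int ret = toy_receive(c, &frame);
            if (ret == -EAGAIN)
                break;
            if (ret < 0)
                return ret;
            out[got++] = frame;
        }
    }
    return (int)got;
}
```

Note that -EAGAIN here never means "try again later without doing anything";
it always means "call the other function first". In a real app the whole loop
would live in the codec thread, since one side or the other is where the
blocking happens.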

There WAS a patch submitted to add an explicitly async API to libavcodec (
https://ffmpeg.org/pipermail/ffmpeg-devel/2016-March/191922.html) but as
far as I can tell it was rejected, perhaps because an asynchronous API that
only some codecs support and which adds complexity to libavcodec internals
isn't that useful when apps can just use a thread.

Incidentally there's no quick way to interrupt encoding/decoding in progress,
but it happens a frame at a time and usually any given frame doesn't take
too long. Still, be aware that gracefully shutting down an encoding/decoding
thread can take a bit of time while it finishes the frame in progress.
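
A rough sketch of that shutdown pattern, assuming POSIX threads and C11
atomics. Worker, process_one_frame(), worker_start() and worker_stop() are
hypothetical names, and process_one_frame() is only a placeholder for one
blocking send/receive round trip on a real codec. The point is that the stop
flag is checked only between frames, so stopping waits for the current frame.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical worker state; the names are illustrative, not libavcodec API. */
typedef struct Worker {
    pthread_t    thread;
    atomic_bool  stop;         /* set by the controlling thread to request exit */
    atomic_ulong frames_done;  /* number of fully processed frames */
} Worker;

/* Placeholder for one blocking decode/encode step; it cannot be interrupted. */
static void process_one_frame(Worker *w)
{
    atomic_fetch_add(&w->frames_done, 1);
}

static void *worker_main(void *arg)
{
    Worker *w = arg;
    assert(w != NULL);
    /* The flag is checked only between frames, so a frame in progress
     * always runs to completion before the thread exits. */
    while (!atomic_load(&w->stop))
        process_one_frame(w);
    return NULL;
}

/* Start the worker thread. Returns 0 on success, like pthread_create(). */
static int worker_start(Worker *w)
{
    atomic_init(&w->stop, false);
    atomic_init(&w->frames_done, 0);
    return pthread_create(&w->thread, NULL, worker_main, w);
}

/* Request shutdown and wait for the frame in progress to finish.
 * Returns 0 on success, like pthread_join(). */
static int worker_stop(Worker *w)
{
    atomic_store(&w->stop, true);
    return pthread_join(w->thread, NULL);
}
```

The UI thread would call worker_stop() (or just set the flag and join later)
and should expect the join to take up to one frame's worth of work.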

*** Being efficient: The story is more complicated here? ***

Decoding video involves several stages of work:
- I/O to read the compressed data (libavformat)
- CPU to parse the container (libavformat) & embedded bitstream (libavcodec)
- CPU or GPU work to do actual decompression
- Possible post-decompression processing (pixel format conversion etc)
- Display on screen
(Encoding has analogous stages, I'll focus on decoding for simplicity)

It makes sense to overlap this work for efficiency. Ideally the disk is
fetching one frame, the CPU is parsing the previous frame, the GPU is
uncompressing the one before that, and the one before that is shown on the
screen. Also within a stage (especially decompression) there may be
opportunities for parallelism and using multiple cores.

I THINK "macro parallelism" across phases is up to the app author. You can
use libavformat in one thread and libavcodec in another and push data
between them with a queue of some kind. Notably ffmpeg as an app will do
this I think?? Also "filter graph" postprocessing in libavfilter can do
some things in threads??
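
For what it's worth, the "two threads and a queue" idea could look roughly
like this, assuming POSIX threads. Everything here is hypothetical:
PacketQueue, the queue_*() helpers, demux_main(), decode_main() and
run_pipeline() are made-up names, and the queue carries plain ints where a
real app would pass packets read by libavformat to a thread that owns the
libavcodec context.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_CAP 4

/* Hypothetical bounded queue; ints stand in for demuxed packets. */
typedef struct PacketQueue {
    int items[QUEUE_CAP];
    int head, count;
    bool closed;                          /* producer has finished */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} PacketQueue;

static void queue_init(PacketQueue *q)
{
    q->head = q->count = 0;
    q->closed = false;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Blocks while the queue is full, which throttles the producer. */
static void queue_push(PacketQueue *q, int item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAP] = item;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Marks end of stream and wakes any waiting consumer. */
static void queue_close(PacketQueue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = true;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Returns true with an item, or false once the queue is closed and empty. */
static bool queue_pop(PacketQueue *q, int *item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    bool ok = q->count > 0;
    if (ok) {
        *item = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

typedef struct Pipeline {
    PacketQueue q;
    int  n_packets;
    long decoded_sum;   /* written only by the decode thread */
} Pipeline;

/* "Demux" thread: produces packet ids 1..n_packets, then closes the queue. */
static void *demux_main(void *arg)
{
    Pipeline *p = arg;
    for (int i = 1; i <= p->n_packets; i++)
        queue_push(&p->q, i);
    queue_close(&p->q);
    return NULL;
}

/* "Decode" thread: consumes packets until end of stream. */
static void *decode_main(void *arg)
{
    Pipeline *p = arg;
    int pkt;
    while (queue_pop(&p->q, &pkt))
        p->decoded_sum += pkt;   /* stand-in for the actual decode work */
    return NULL;
}

/* Runs both stages to completion and returns the sum of consumed ids. */
static long run_pipeline(int n_packets)
{
    Pipeline p = { .n_packets = n_packets, .decoded_sum = 0 };
    pthread_t demux, decode;
    queue_init(&p.q);
    int rc = pthread_create(&demux, NULL, demux_main, &p);
    assert(rc == 0);
    rc = pthread_create(&decode, NULL, decode_main, &p);
    assert(rc == 0);
    pthread_join(demux, NULL);
    pthread_join(decode, NULL);
    return p.decoded_sum;
}
```

The small fixed capacity is a design choice: it keeps the demux side from
running arbitrarily far ahead of the decoder and eating memory.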

NOTE that there is advice not to overthink or prematurely optimize this.
For example most OSes are pretty good at prefetching when you're loading from
a stream, so it may not be that beneficial to run your own I/O thread. YMMV.

Some codecs are "behind the scenes asynchronous" -- with these codecs if
you use avcodec_send_packet() to send in some data they will get started in
the background and then you can come back later and call
avcodec_receive_frame() and get data right away. Other codecs however will
block and do the work synchronously on either the send_packet() end or the
receive_frame() end. It's best not to make assumptions, BUT, if you don't
have a dedicated codec thread, it can be beneficial to fill the
app-to-codec pipeline before switching away to something else in case the
codec is able to work in the background??

In any case "micro parallelism" is well supported within individual
codecs. You can set thread_count and thread_type in AVCodecContext and a lot
of codecs will internally spin up threads to crunch data faster on
multicore machines (and almost all machines are multicore now). This is
transparent to the caller for the most part; you just tune those numbers
for the best performance.
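
As a configuration fragment, setting those fields might look like the
following. thread_count, thread_type, FF_THREAD_FRAME and FF_THREAD_SLICE
come from libavcodec's avcodec.h; request_codec_threads() is a hypothetical
helper name. This assumes the fields are set after avcodec_alloc_context3()
and before avcodec_open2(), and that a thread_count of 0 lets libavcodec pick
a value itself, which is my understanding but worth verifying against your
build.

```c
#include <libavcodec/avcodec.h>

/* Hypothetical helper: call before avcodec_open2() on a freshly
 * allocated context. Codecs that lack threading support are expected
 * to simply ignore the request. */
static void request_codec_threads(AVCodecContext *ctx, int threads)
{
    /* Assumption: 0 asks libavcodec to choose a thread count. */
    ctx->thread_count = threads;

    /* Allow both strategies; the codec uses whichever it supports. */
    ctx->thread_type = FF_THREAD_FRAME | FF_THREAD_SLICE;
}
```

After avcodec_open2(), ctx->active_thread_type should report which strategy
the codec actually ended up using. Frame threading generally trades some
extra decoding delay for throughput, so it may matter for low-latency use.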

ANYWAY, that's my understanding based on groveling through Q&A posts and
forum threads and doing some experiments. Did I get it approximately right??

-- egnor