[FFmpeg-user] VAAPI decoding/encoding of several video streams in parallel

Tue Dec 20 22:51:24 EET 2016

On 20/12/16 18:21, Anton Sviridenko wrote:
> I want to use hardware acceleration for processing multiple realtime
> videostreams (coming over RTSP).
> I've read this thread -
> http://ffmpeg.org/pipermail/ffmpeg-user/2016-December/034530.html
> 
> and have some questions related to scaling this everything to several
> video streams:
> 
> 1) Is it possible at all to use VAAPI acceleration for several
> independent video streams simultaneously?

Yes, the kernel drivers deal with all of the detail here - it's exactly the same as using your GPU to run multiple OpenGL programs at the same time.

> 2) How should I initialize VAAPI related stuff? Do I have to create
> separate hwframe context for each stream?

Not necessarily, it depends on what you want to do.

Some things to consider:
* A hwframe pool needs to be fixed-size to use as output from a decoder or filter (note the render_targets argument to vaCreateContext(), to which you supply the surfaces member of AVVAAPIFramesContext), so it can be exhausted.  Decoders and encoders may both hold on to frames for some length of time (to use as reference frames, or to wait for the stream delay), so a pool shared by several of them needs to be large enough not to run out even when they sit on some of the surfaces for a while.
* All surfaces in a single hwframe context are the same size and format.  While it's perfectly valid to decode a frame onto a surface which is larger than the frame, it does waste memory so you may want to make the surfaces of the correct size when that is known.
* A filter or encoder should only be given input which matches the hwframe context you declared as its input when you created it.  This is primarily an API restriction and some other cases do work some of the time, but keeping to it will avoid any surprises.
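
As a rough illustration of the pool-size and surface-size points above, here is a minimal sketch in plain C of how one might budget a fixed pool shared by several components.  Nothing in it is FFmpeg or libva API: struct pool_user, shared_pool_size() and nv12_surface_bytes() are hypothetical names, the per-component frame counts are something you would have to work out for your own streams, and the NV12 layout (with any driver padding ignored) is an assumption used only to put a number on the wasted memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one component drawing on a shared pool. */
struct pool_user {
    const char *name;
    int held_frames;   /* worst-case surfaces held at once (references, delay) */
};

/* Fixed pool size: the sum of every user's worst-case holding, plus some
 * spare surfaces for frames currently in flight between components. */
static int shared_pool_size(const struct pool_user *users, size_t count,
                            int in_flight)
{
    int total = in_flight;

    assert(in_flight >= 0);
    for (size_t i = 0; i < count; i++) {
        assert(users[i].held_frames >= 0);
        total += users[i].held_frames;
    }
    return total;
}

/* Approximate size of one NV12 surface (assumed format): a full-resolution
 * luma plane plus one half-resolution plane of interleaved chroma.
 * Driver-specific padding and alignment are ignored. */
static size_t nv12_surface_bytes(size_t width, size_t height)
{
    size_t chroma_w = (width + 1) / 2;
    size_t chroma_h = (height + 1) / 2;

    return width * height + 2 * chroma_w * chroma_h;
}
```

With made-up figures of a decoder that may hold 17 surfaces and an encoder that may hold 4, plus 2 spare, shared_pool_size() gives 23; a count like that is what you would then put in initial_pool_size of the AVHWFramesContext.  Comparing nv12_surface_bytes(1920, 1088) with nv12_surface_bytes(1280, 720) gives an idea of how much each oversized surface costs when a 720p stream is decoded into a pool sized for 1080p.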

The easiest way to do it is probably to follow what ffmpeg itself does: make a single hwframe context for the output of each decoder or other operation, and then give that to whichever component consumes those frames next.  This won't necessarily be sufficient in all cases - if you have something more complex, with output from multiple decoders being combined somehow, then you'll need to think about it more carefully, keeping the restrictions above in mind.

> 3) Can I use single hwdevice and vaapi_context instances for all
> streams or there should be own instance for each decoded/encoded
> stream?

Typically you will want to make one device and then use it everywhere.  Multiple devices should also work, but note that different devices can't interoperate at all (so, for example, a decoder, scaler and encoder working with the same surfaces and hwframe context all need to use the same device).

You need to make exactly one struct vaapi_context for each decoder (with configuration appropriate to that decoder).

- Mark