[FFmpeg-devel] Evolution of lavfi's design and API

Stefano Sabatini stefasab at gmail.com
Thu Oct 30 11:50:46 CET 2014


Sorry for the slow reply.

On date Wednesday 2014-10-22 23:45:42 +0200, Nicolas George encoded: 
> [ CCing Anton, as most of what is written here also applies to libav, and
> this would be a good occasion to try cross-fork cooperation; if that is
> not wanted, please let us know so we can drop the cc. ]
> 
> 1. Problems with the current design
> 
>   1.1. Mixed input-/output-driven model
> 
>     Currently, lavfi is designed to work in a mixed input-driven and
>     output-driven model. That means the application sometimes needs to add
>     input to buffersources and sometimes to request output from
>     buffersinks. This is a bit of a nuisance, because it requires the
>     application to do it properly: feeding the wrong input or requesting a
>     frame on the wrong output will cause extra memory consumption or
>     latency.
> 
>     With the libav API, it cannot work at all, since there is no mechanism
>     to determine which input needs a frame in order to proceed.
> 
>     The libav API is clearly designed for a more output-driven
>     implementation, with FIFOs everywhere to prevent input-driven frames
>     from reaching unready filters. Unfortunately, since it is impossible
>     from the outside to guess which output will get a frame next, frames
>     can accumulate anywhere in the filter graph, eating a lot of memory
>     unnecessarily.
> 
>     FFmpeg's API has eliminated FIFOs in favour of queues in the filters
>     that need them, but these queues cannot be controlled for unusual
>     filter graphs with extreme needs. Also, there still is an implicit
>     FIFO inside buffersink.
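
For reference, a minimal sketch of how an application typically drives the
current mixed model (assuming a single buffersrc/buffersink pair, error
handling trimmed); it shows why the application has to guess which end to
service next:

    #include <libavutil/error.h>
    #include <libavutil/frame.h>
    #include <libavfilter/buffersrc.h>
    #include <libavfilter/buffersink.h>

    static int push_and_drain(AVFilterContext *src, AVFilterContext *sink,
                              AVFrame *in, AVFrame *out)
    {
        /* input-driven side: feed one frame to the buffer source */
        int ret = av_buffersrc_add_frame(src, in);
        if (ret < 0)
            return ret;
        /* output-driven side: drain whatever the sink has accumulated */
        while ((ret = av_buffersink_get_frame(sink, out)) >= 0)
            av_frame_unref(out);
        /* EAGAIN means "feed more input first", not an error here */
        return ret == AVERROR(EAGAIN) ? 0 : ret;
    }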
> 
>   1.2. Recursive implementation
> 
>     All work in a filter graph is triggered by recursive invocations of the
>     filters' methods. This makes debugging harder, can lead to large stack
>     usage, makes frame- and filter-level multithreading harder to
>     implement, and prevents some diagnostics from working reliably.
> 
>   1.3. EOF handling
> 
>     Currently, EOF is propagated only through the return value of the
>     request_frame() method. That means it only works in an output-driven
>     scheme. It also means that it has no timestamp attached to it; this is
>     an issue for filters where the duration of the last frame is relevant,
>     like vf_fps.
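
To make the current situation concrete: inside libavfilter, a filter only
learns about EOF through the error code of a recursive request, roughly
like this (simplified sketch; ff_request_frame() is the existing internal
helper):

    #include "libavutil/error.h"
    #include "internal.h"   /* for ff_request_frame() */

    static int request_frame(AVFilterLink *outlink)
    {
        AVFilterContext *ctx = outlink->src;
        int ret = ff_request_frame(ctx->inputs[0]);
        if (ret == AVERROR_EOF) {
            /* flush internal state here; note that no timestamp tells us
             * when the last frame ends, which is the problem for vf_fps */
        }
        return ret;
    }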
> 
>   1.4. Latency
> 
>     Some filters need to know the timestamp of the next frame in order to
>     know when the current frame ends before they can process it: overlay
>     and fps are two examples. These filters will introduce a latency of
>     one input frame that could otherwise be avoided.
> 
>   1.5. Timestamps
> 
>     Some filters do not care about timestamps at all. Some check for and
>     properly handle NOPTS values. Some filters just assume the frames will
>     have timestamps, and possibly make extra assumptions about them:
>     monotonicity, consistency, etc. That is an inconsistent mess.
> 
>   1.6. Sparse streams
> 
>     There is a more severe instance of the latency issue when the input
>     comes from an interleaved sparse stream: in that case, waiting for the
>     next frame in order to find the end of the current one may require
>     demuxing a large chunk of input, in turn provoking a lot of activity on
>     other inputs of the graph.

Other issues.

S1. The filtergraph can't properly re-adapt to mid-stream changes involving
assumed invariants (aspect ratio, size, timebase, pixel format,
sample_rate). Indeed, the framework was designed as though some of these
properties (the ones negotiated through query_formats) were not allowed to
change.

S2. Another problem is that we initialize the filters before the
filtergraph, so a single filter can't adapt to the filtergraph topology.
For example, it would be useful for the split filter to change its number
of outputs depending on the number of outputs specified, but this can't
easily be achieved. (That's in my opinion a minor problem though.)

S3. It is not possible to direct commands towards a specific filter
instance. For this we could add an ID to each filter instance, with a
syntax like:

color:left_color=c=red   [left]
color:right_color=c=blue [right]

then you can send commands (e.g. with zmq) with:
echo left_color c yellow | tools/zmqsend

S4. We should support an output (encoding) counterpart of the movie source
filter. We got stuck designing the interface for that.

...

About fifos and queues, we could add options to the fifo filters to limit
their size.

For example we could specify the maximum number of allowed queued frames,
or the total allowed size, and the dropping policy (drop last, drop first,
drop a random frame in the middle).
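
For instance (option names purely hypothetical, nothing like this exists
yet):

fifo=max_frames=100:max_bytes=64M:drop_policy=first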

> 2. Proposed API changes
> 
>   To fix/enhance all these issues, I believe a complete rethink of the
>   scheduling design of the library is necessary. I propose the following
>   changes.
> 
>   Note: some of these changes are not 100% related to the issues I raised,
>   but they looked like a good idea while thinking about an API rework.
> 
>   2.1. AVFrame.duration
> 
>     Add a duration field to AVFrame; if set, it indicates the duration of
>     the frame. Thus, it becomes unnecessary to wait for the next frame to
>     know when the current frame stops, reducing the latency.
> 
>     Another solution would be to add a dedicated function on buffersrc to
>     inject a timestamp for end or activity on a link. That would avoid the
>     need to add a field to AVFrame.

Currently we have pkt_duration in AVFrame. The main problem is that we
would need to rescale the duration accordingly in all filters whenever the
timebase changes.

Alternatively we could add a timebase field to AVFrame. Currently that
information is stored externally, and this was never easy to handle: for
example, a timestamp in a filter is currently interpreted according to the
input link time base. Unfortunately this would introduce redundancy.
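
Just to illustrate the rescaling issue: every place that changes a link
time base would need to do something like the following with the existing
pkt_duration field (sketch, error handling omitted):

    #include <libavutil/frame.h>
    #include <libavutil/mathematics.h>

    /* Rescale the frame duration when the frame crosses from a link with
     * time base in_tb to a link with time base out_tb. */
    static void rescale_duration(AVFrame *frame,
                                 AVRational in_tb, AVRational out_tb)
    {
        if (frame->pkt_duration > 0)
            frame->pkt_duration = av_rescale_q(frame->pkt_duration,
                                               in_tb, out_tb);
    }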

> 
>   2.2. Add some fields to AVFilterLink
> 
>     AVFilterLink.pts: current timestamp of the link, i.e. end timestamp of
>     the last forwarded frame, assuming the duration was correct. This is
>     somewhat redundant with the fields in AVFrame, but it can carry the
>     information even when there is no actual frame.
> 
>     AVFilterLink.status: if not 0, gives the return status of trying to pass
>     a frame on this link. The typical use would be EOF.
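
A rough sketch of what these additions could look like (purely
illustrative, this is not existing API):

    #include <stdint.h>

    /* Hypothetical additions to AVFilterLink: */
    typedef struct {
        int64_t pts;    /* end timestamp of the last forwarded frame, in the
                         * link time base; AV_NOPTS_VALUE when unknown */
        int     status; /* 0, or the status of the link, e.g. AVERROR_EOF
                         * once EOF has been reached */
    } LinkState;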
> 
>   2.3. AVFilterLink.need_ts
> 
>     Add a field to AVFilterLink to specify that the destination filter
>     requires reliable timestamps on its input. More precisely, specify how
>     reliable the timestamps need to be: is the duration necessary? do the
>     timestamps need to be monotonic? continuous?
> 

>     For audio streams, consistency between timestamps and the number of
>     samples may also be tested. For video streams, constant frame rate may
>     be enforced, but I am not sure about this one.

Yes, although these checks should be made optional.
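
To make this concrete, the requirement could be expressed as a bitmask
along these lines (names purely hypothetical):

    /* Hypothetical timestamp-reliability requirements a filter could set
     * on its input link: */
    enum {
        NEED_TS_PRESENT    = 1 << 0, /* no AV_NOPTS_VALUE timestamps        */
        NEED_TS_DURATION   = 1 << 1, /* frame duration must be set          */
        NEED_TS_MONOTONIC  = 1 << 2, /* timestamps must be non-decreasing   */
        NEED_TS_CONTINUOUS = 1 << 3, /* next pts == previous pts + duration */
    };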

>     A "fixpts" filter should be provided to allow the user to tweak how the
>     timestamps are fixed (change the timestamps to match the duration or
>     change the duration to match the timestamps?).

Yes, this could properly be done with a complex expression in setpts, but
having an ad-hoc filter would make it easier for the user.

>     When no explicit filter is inserted, the framework should do the work
>     of fixing them automatically. I am not sure whether that should be done
>     directly or by automatically inserting the fixpts filter. The latter
>     solution is more elegant, but it requires more changes to the framework
>     and the filters (because the correctness of the timestamps would need
>     to be merged just like formats), so I am rather for the former.
> 
>     Note that for a lot of filters, the actual duration or end timestamp is
>     not required, only a lower bound for it. For sparse interleaved streams,
>     that is very relevant as we may not know the exact time for the next
>     frame until we reach it, but we can know it is later than the other
>     streams' timestamps minus the interleaving delta.
> 
>   2.4. Build FIFOs directly in AVFilterLink
> 
>     Instead of automatically inserting an additional filter like libav
>     does, handle the FIFO operation directly in the framework using fields
>     in AVFilterLink.
> 
>     The main benefit is that the framework can examine the inside of the
>     FIFOs to make scheduling decisions. It can also do so to provide the
>     user with more accurate diagnostics.
> 
>     An extra benefit: the memory pool for the FIFOed frames can more easily
>     be shared, across the whole filter graph or the whole application.
>     Memory management becomes easier: just pick a good heuristic (half the
>     RAM?); no need to guess which FIFOs will actually need a lot of memory
>     and which FIFOs are mostly useless.
> 
>     Last but not least, FIFOs now become potential thread communication /
>     synchronization points, making filter-level multithreading easier.
> 
>     For audio streams, framing (i.e. ensuring all frames have an exact /
>     minimum / maximum number of samples) can be merged with the FIFOs.
> 
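
A minimal sketch of what a link-embedded FIFO could look like (purely
illustrative, field names invented; the existing AVFifoBuffer is used here
only for convenience):

    #include <stddef.h>
    #include <libavutil/fifo.h>
    #include <libavutil/frame.h>

    /* Hypothetical per-link queue that the framework itself could inspect
     * when making scheduling decisions: */
    typedef struct LinkQueue {
        AVFifoBuffer *fifo;       /* stores AVFrame pointers             */
        size_t        nb_frames;  /* current depth, visible to the graph */
        size_t        max_frames; /* limit set globally or per link      */
    } LinkQueue;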

>   2.5. Allow status change pseudo-frames inside FIFOs
> 
>     To propagate EOF and possibly other status changes (errors) in an
>     input-driven model, allow FIFOs to contain not only frames but also a
>     kind of pseudo-frame with a timestamp and metadata attached.
> 
>     Depending on the filters, these pseudo-frames may be directly passed to
>     the filter's methods, or they may be interpreted by the framework to
>     just change fields on the AVFilterLink structure.
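
For illustration, a FIFO entry could then be a small tagged structure
(hypothetical):

    #include <stdint.h>
    #include <libavutil/frame.h>

    /* Hypothetical FIFO entry: either a real frame or a status change. */
    typedef struct QueueEntry {
        AVFrame *frame;  /* NULL for a status-change pseudo-frame       */
        int      status; /* e.g. AVERROR_EOF when frame is NULL         */
        int64_t  pts;    /* timestamp the status change takes effect at */
    } QueueEntry;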
> 
>   2.6. Change the scheduling logic for filters
> 
>     From the outside of a filter with several outputs, it is usually not
>     possible to guess which output will get a frame next. Requesting a
>     frame on output #0 may cause activity on the filter graph that produces
>     a frame on output #1 instead, or possibly on a completely different
>     filter.
> 
>     Therefore, having a request_frame() method on all outputs seems
>     pointless.
> 
>     Instead, use a global AVFilter.activate() method that causes the filter
>     to do one step of work if it can. This method is called each time
>     something changes for the filter: new frame on input, output ready,
>     status change. It returns as soon as it has done something, either
>     producing output and/or consuming input, or immediately if nothing can
>     be done.
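
A sketch of what a filter's activate() callback could look like under this
proposal (the method and the helpers used here are hypothetical, they do
not exist in the current API):

    /* Hypothetical: do at most one step of work; return 1 if progress was
     * made, 0 if nothing could be done, <0 on error. */
    static int myfilter_activate(AVFilterContext *ctx)
    {
        AVFilterLink *inlink  = ctx->inputs[0];
        AVFilterLink *outlink = ctx->outputs[0];
        AVFrame *frame;

        if (!output_has_room(outlink))          /* hypothetical helper */
            return 0;
        frame = take_queued_frame(inlink);      /* hypothetical helper */
        if (!frame)
            return 0;
        /* ... filter-specific processing of frame ... */
        return send_frame(outlink, frame);      /* hypothetical helper */
    }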
> 
>   2.7. Add fields to AVFilterLink for flow control.
> 
>     Add to AVFilterLink a few fields to help filters decide whether they
>     need to process something, and if relevant in what order. The most
>     obvious idea would be AVFilterLink.frames_needed, counting how many
>     frames are probably needed on a link before anything can be done. For
>     example, with concat, after an input has been consumed, the
>     frames_needed field on the current input is set according to the
>     corresponding output.
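
For instance (the field is hypothetical), concat could propagate demand
from the corresponding output to its currently active input roughly like
this:

    #include <libavutil/common.h>      /* FFMAX */
    #include <libavfilter/avfilter.h>

    /* Hypothetical: ask for at least as many frames on the active input as
     * the matching output needs (frames_needed does not exist today). */
    static void propagate_demand(AVFilterLink *in, AVFilterLink *out)
    {
        in->frames_needed = FFMAX(1, out->frames_needed);
    }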
> 
>   2.8. Activate the filters iteratively
> 
>     Keep a global (per graph) priority queue of filters that are supposed to
>     be ready and call the activate() method on them.
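
The graph-level loop could then look roughly like this (all names besides
AVFilterGraph/AVFilterContext are hypothetical):

    /* Hypothetical: run one scheduling step on the graph. */
    static int graph_run_once(AVFilterGraph *graph)
    {
        /* pop the highest-priority filter believed to be ready */
        AVFilterContext *filter = pop_ready_filter(graph);  /* hypothetical */
        int ret;

        if (!filter)
            return 0;                      /* nothing is ready right now */
        ret = filter_activate(filter);     /* hypothetical activate() call */
        if (ret > 0)
            reschedule_neighbours(graph, filter); /* hypothetical: its peers
                                                   * may have become ready */
        return ret;
    }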
> 
>   2.9. AVFrame.stream_id
> 
>     Add an integer (or pointer: intptr_t maybe?) field to AVFrame to allow
>     passing frames related to distinct streams on the same link. That would
>     allow multiplexing all outputs of a graph into a single output, making
>     the application simpler.
> 
>     Not sure this is really useful or necessary: for the graph outputs, a
>     convenience function iterating on all of them and returning the frame
>     and the output index separately would do the trick too.
> 

>   2.10. buffersrc.callback and buffersink.callback
> 
>     Add a callback on both buffersource and buffersink, called respectively
>     when a frame is necessary on input and when a frame has arrived on
>     output. This allows pure input-driven and pure output-driven designs to
>     work.
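
For illustration, the callbacks could be simple function pointers along
these lines (types and names hypothetical):

    /* Hypothetical: called when buffersrc needs a frame to make progress. */
    typedef void (*buffersrc_need_frame_cb)(AVFilterContext *src,
                                            void *opaque);

    /* Hypothetical: called when a frame becomes available on buffersink. */
    typedef void (*buffersink_frame_ready_cb)(AVFilterContext *sink,
                                              const AVFrame *frame,
                                              void *opaque);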
>
>   2.11. Links groups
> 
>     Links that carry frames from related interleaved streams should be
>     explicitly connected together so that the framework can use the
>     information.
> 
>     The typical use would be to group all the links from buffersrc that come
>     from the same interleaved input file.
> 
>     When a frame is passed on a link, all links in the same group(s) that
>     are too late (according to an interleaving tolerance that can be set)
>     are activated using a dummy frame.
> 

>   2.12. FIFOs with compression and external storage
> 
>     All FIFOs should be able to off-load some of their memory requirements
>     by compressing the frames (using a lossless or optionally lossy codec)
>     and/or by storing them on mass storage.
> 
>     The options for that should be changeable globally or on a per-link
>     basis.

This is an interesting idea.

>   2.13. AVFrame.owner
> 
>     Add an owner field (probably with type "AVObject", i.e. "void *"
>     pointing to an AVClass *) to AVFrame, and update it whenever the frame
>     is passed from one filter to another. That way, inconsistent ref/unref
>     operations can be detected.
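
A sketch of the kind of check this would enable (the owner field itself is
hypothetical):

    #include <libavutil/error.h>

    /* Hypothetical: verify that the filter releasing a frame is the one
     * that currently owns it, to catch mismatched ref/unref operations. */
    static int check_frame_owner(void *frame_owner, void *releasing_filter)
    {
        return frame_owner == releasing_filter ? 0 : AVERROR(EINVAL);
    }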

BTW, we should probably drop the rotting tracing code (I haven't used it in
ages, don't know about others).
-- 
FFmpeg = Free and Formidable Multipurpose Pitiless Enlightened Genius

