[FFmpeg-devel] Evolution of lavfi's design and API

Thu Oct 23 15:45:16 CEST 2014

On Wed, Oct 22, 2014 at 11:45:42PM +0200, Nicolas George wrote:
> 
> [ CCing Anton, as most of what is written here also applies to libav, and
> this would be a good occasion to try a cross-fork cooperation; if that is
> not wanted, please let us know so we can drop the Cc. ]
> 
> 1. Problems with the current design
> 
>   1.1. Mixed input-/output-driven model
> 
>     Currently, lavfi is designed to work in a mixed input-driven and
>     output-driven model. That means the application sometimes needs to add
>     input to buffersources and sometimes to request output from buffersinks.
>     This is a bit of a nuisance, because it requires the application to do
>     it properly: adding input on the wrong input or requesting a frame on
>     the wrong output will cause extra memory consumption or latency.
> 
>     With the libav API, it cannot work at all since there is no mechanism
>     to determine which input needs a frame in order to proceed.
> 
>     The libav API is clearly designed for a more output-driven
>     implementation, with FIFOs anywhere to prevent input-driven frames from
>     reaching unready filters. Unfortunately, since it is impossible from the
>     outside to guess which output will get a frame next, that can cause
>     frames to accumulate anywhere in the filter graph, eating a lot of
>     memory unnecessarily.
> 
>     FFmpeg's API has eliminated FIFOs in favour of queues in the filters
>     that need them, but these queues cannot be controlled for unusual
>     filter graphs with extreme needs. Also, there is still an implicit FIFO
>     inside buffersink.
> 

>   1.2. Recursive implementation
> 
>     All work in a filter graph is triggered by recursive invocations of the
>     filters' methods. This makes debugging harder. It can also lead to
>     large stack usage and makes frame- and filter-level multithreading
>     harder to implement. It also prevents some diagnostics from working
>     reliably.
> 

This is definitely a huge hindrance, and it is related to 1.1.

I'd be curious to hear how VapourSynth & friends handle that problem,
because AFAIK it only works one way. They likely don't have to deal with
the same problems we have, though (the usage is more limited, typically no
audio), since they seem to be file based rather than stream based (so it is
easy to index, seek exactly, etc.).

>   1.3. EOF handling
> 
>     Currently, EOF is propagated only through the return value of the
>     request_frame() method. That means it only works in an output-driven
>     scheme. It also means that it has no timestamp attached to it; this is
>     an issue for filters where the duration of the last frame is relevant,
>     like vf_fps.
> 
>   1.4. Latency
> 
>     Some filters need to know the timestamp of the next frame in order to
>     know when the current frame will stop and be able to process it:
>     overlay, fps are two examples. These filters will introduce a latency of
>     one input frame that could otherwise be avoided.
> 
>   1.5. Timestamps
> 
>     Some filters do not care about timestamps at all. Some check for NOPTS
>     values and handle them properly. Some filters just assume the frames
>     will have timestamps, and possibly make extra assumptions about them:
>     monotonicity, consistency, etc. That is an inconsistent mess.
> 
>   1.6. Sparse streams
> 
>     There is a more severe instance of the latency issue when the input
>     comes from an interleaved sparse stream: in that case, waiting for the
>     next frame in order to find the end of the current one may require
>     demuxing a large chunk of input, in turn provoking a lot of activity on
>     other inputs of the graph.
> 

More still-standing problems, while we are at it:

   1.7. Metadata

     Metadata are not available at "graph" level, or even at filter
     level, only at frame level. We also need to define how they can be
     injected and fetched by users (think "rotate" metadata).

   1.8. Seeking

     Way more troublesome: being able to request an exact frame in the past.
     The lack of this currently limits the scope of the filters a lot.

     The thumbnail filter is a good example of this problem: the filter
     doesn't need to keep all the frames it analyzes in memory, it just
     needs statistics about them, and then fetches the best in the batch.
     Currently, it needs to keep them all because we are in a forward,
     stream-based logic. This model is fairly common and quite a pain to
     implement currently.

     I don't think the compression you propose at the end would really
     solve that.
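
     To make the thumbnail case concrete, here is a minimal sketch of the
     "keep statistics, then fetch the best frame" model. It is not the
     actual filter code: pick_best_frame() is a hypothetical helper, and it
     assumes each frame has been reduced to a single scalar statistic. The
     returned index is what the filter would then have to request back,
     which is exactly what the current API cannot do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of lavfi.
 * stats[i] is a per-frame scalar kept in memory instead of frame i itself.
 * Returns the index of the frame whose statistic is closest to the batch
 * mean; on a tie, the earliest frame wins. */
static size_t pick_best_frame(const double *stats, size_t n)
{
    assert(stats != NULL && n > 0);

    double mean = 0.0;
    for (size_t i = 0; i < n; i++)
        mean += stats[i];
    mean /= (double)n;

    size_t best = 0;
    double best_err = -1.0;   /* negative means "no candidate yet" */
    for (size_t i = 0; i < n; i++) {
        double err = (stats[i] - mean) * (stats[i] - mean);
        if (best_err < 0.0 || err < best_err) {
            best_err = err;
            best = i;
        }
    }
    return best;
}
```

     With exact seeking, memory use would be one statistic per frame
     instead of one full frame per frame of the batch.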

   1.9. Automatic I/O count

     "... [a] split [b][c] ..." should guess there are 2 outputs.
     "... [a][b][c] concat [d] ..." should likewise guess 3 inputs.
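
     A rough sketch of the guessing, assuming the description of a single
     filter has already been isolated (no ',' or ';') and that the filter
     arguments do not themselves end with a bracket. Both functions are
     hypothetical helpers, not existing lavfi functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the "[label]" groups at the start of a single
 * filter description, i.e. the number of labelled inputs. */
static int count_leading_labels(const char *desc)
{
    int n = 0;

    assert(desc != NULL);
    for (;;) {
        while (isspace((unsigned char)*desc))
            desc++;
        if (*desc != '[')
            return n;
        const char *end = strchr(desc, ']');
        if (!end)
            return n;           /* unterminated label: stop counting */
        n++;
        desc = end + 1;
    }
}

/* Hypothetical helper: count the "[label]" groups at the end of a single
 * filter description, i.e. the number of labelled outputs. */
static int count_trailing_labels(const char *desc)
{
    int n = 0;

    assert(desc != NULL);
    size_t len = strlen(desc);
    for (;;) {
        while (len > 0 && isspace((unsigned char)desc[len - 1]))
            len--;
        if (len == 0 || desc[len - 1] != ']')
            return n;
        size_t i = len - 1;     /* walk back to the matching '[' */
        while (i > 0 && desc[i] != '[')
            i--;
        if (desc[i] != '[')
            return n;           /* stray ']': stop counting */
        n++;
        len = i;
    }
}
```

     The graph parser could then use these counts as the default number of
     outputs for split or inputs for concat when the user gives no explicit
     option.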

> 2. Proposed API changes
> 
>   To fix/enhance all these issues, I believe a complete rethink of the
>   scheduling design of the library is necessary. I propose the following
>   changes.
> 

Have you already started some development? Do you need help?

I'm asking because it looks like it could be split into small, relatively
easy tasks on the Trac, which would help introduce newcomers (and also
track the progress if some people assign themselves to these tickets).

>   Note: some of these changes are not 100% related to the issues I raised,
>   but looked like a good idea while thinking about an API rework.
> 
>   2.1. AVFrame.duration
> 
>     Add a duration field to AVFrame; if set, it indicates the duration of
>     the frame. Thus, it becomes unnecessary to wait for the next frame to
>     know when the current frame stops, reducing the latency.
> 
>     Another solution would be to add a dedicated function on buffersrc to
>     inject a timestamp for end or activity on a link. That would avoid the
>     need of adding a field to AVFrame.
> 
>   2.2. Add some fields to AVFilterLink
> 

>     AVFilterLink.pts: current timestamp of the link, i.e. end timestamp of
>     the last forwarded frame, assuming the duration was correct. This is
>     somewhat redundant with the fields in AVFrame, but can carry the
>     information even when there is no actual frame.

The timeline system seems to be able to work around this. How is this going
to help?

> 
>     AVFilterLink.status: if not 0, gives the return status of trying to pass
>     a frame on this link. The typical use would be EOF.
> 
>   2.3. AVFilterLink.need_ts
> 
>     Add a field to AVFilterLink to specify that the output filter requires
>     reliable timestamps on its input. More precisely, specify how reliable
>     the timestamps need to be: is the duration necessary? do the timestamps
>     need to be monotonic? continuous?
> 
>     For audio streams, consistency between timestamps and the number of
>     samples may also be tested. For video streams, constant frame rate may
>     be enforced, but I am not sure about this one.
> 
>     A "fixpts" filter should be provided to allow the user to tweak how the
>     timestamps are fixed (change the timestamps to match the duration or
>     change the duration to match the timestamps?).
> 

>     When no explicit filter is inserted, the framework should do the work of
>     fixing them automatically. I am not sure whether that should be done
>     directly or by automatically inserting the fixpts filter. The latter
>     solution is more elegant, but it requires more changes to the framework
>     and the filters (because the correctness of the timestamps would need to
>     be merged just like formats), so I am rather for the former.
> 

Well, I believe it should be handled by the framework transparently
somehow. Users can already fix the timestamps themselves with [a]setpts
filters, but it's often not exactly obvious why they need to.

It doesn't need to be a filter and can be part of the framework itself.
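
As an illustration of what the framework could do internally, here is a
minimal sketch of the two fixing policies mentioned above (change the
timestamps to match the durations, or the durations to match the
timestamps). SketchFrame, FixMode and fix_timestamps() are made-up names for
this example, not proposed API, and the sketch works on an array of frames
for simplicity where real code would do it as frames pass on a link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified frame: only the timing fields matter here. */
typedef struct SketchFrame {
    int64_t pts;        /* start timestamp, in time base units */
    int64_t duration;   /* duration, in time base units */
} SketchFrame;

enum FixMode {
    FIX_PTS_FROM_DURATION,  /* trust the durations, rewrite the timestamps */
    FIX_DURATION_FROM_PTS   /* trust the timestamps, rewrite the durations */
};

/* Make consecutive frames contiguous: after the call, each frame starts
 * exactly where the previous one ends. In FIX_DURATION_FROM_PTS mode the
 * input timestamps are assumed to be increasing, and the duration of the
 * last frame is left untouched since nothing follows it. */
static void fix_timestamps(SketchFrame *f, size_t n, enum FixMode mode)
{
    assert(f != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (mode == FIX_PTS_FROM_DURATION)
            f[i].pts = f[i - 1].pts + f[i - 1].duration;
        else
            f[i - 1].duration = f[i].pts - f[i - 1].pts;
    }
}
```

Which mode is the right default probably depends on the stream: for audio
the sample count gives a trustworthy duration, while for video the
timestamps are usually the more reliable of the two.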

>     Note that for a lot of filters, the actual duration or end timestamp is
>     not required, only a lower bound for it. For sparse interleaved streams,
>     that is very relevant as we may not know the exact time for the next
>     frame until we reach it, but we can know it is later than the other
>     streams' timestamps minus the interleaving delta.
> 
[...]

-- 
Clément B.