[FFmpeg-devel] Evolution of lavfi's design and API

Nicolas George george at nsup.org
Fri Oct 24 16:22:23 CEST 2014

On duodi 2 Brumaire, year CCXXIII, Clement Boesch wrote:
> More still-open problems while we are at it:
>    1.7. Metadata
>      Metadata are not available at "graph" level, or at least filter
>      level, only at frame level. We also need to define how they can be
>      injected and fetched from the users (think "rotate" metadata).

That is an interesting issue. At graph level, that is easy, but that would
be mostly useless (the rotate filter changes the rotate metadata).

At filter level, that is harder because it requires all filters to forward
the metadata at init time, so extra code in a lot of filters. Furthermore,
since our graphs are not constructed in order, and can even theoretically
contain cycles, it requires another walk-over to ensure stabilization. The
whole query_formats() / config_props() is already too complex IMHO.

Actually, I believe I can propose a simple solution: inject the stream
metadata as frame metadata on dummy frames. Filters that need them are
changed to examine the dummy frames; filters that do not need them just
ignore them and let the framework forward them.
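
The dummy-frame idea above can be sketched roughly as follows. This is only
an illustration with simplified stand-in types, not the real AVFrame or
AVDictionary: SketchMeta, SketchMetaFrame, the is_dummy flag and both
functions are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for illustration; not the real lavfi types. */
typedef struct SketchMeta {
    const char *key;
    const char *value;
} SketchMeta;

typedef struct SketchMetaFrame {
    int is_dummy;            /* 1: carries only stream metadata, no picture */
    const SketchMeta *meta;  /* metadata entries attached to the frame */
    size_t nb_meta;
} SketchMetaFrame;

/* Look up one metadata entry; returns NULL when the key is absent. */
static const char *sketch_meta_get(const SketchMetaFrame *f, const char *key)
{
    assert(f && key);
    for (size_t i = 0; i < f->nb_meta; i++)
        if (!strcmp(f->meta[i].key, key))
            return f->meta[i].value;
    return NULL;
}

/* A filter that cares about "rotate" examines dummy frames.
 * Returns 1 if a value was picked up, 0 if the frame holds nothing of
 * interest and can simply be forwarded by the framework. */
static int sketch_rotate_examine(const SketchMetaFrame *f, const char **rotate)
{
    const char *v;

    if (!f->is_dummy)
        return 0;
    v = sketch_meta_get(f, "rotate");
    if (!v)
        return 0;
    *rotate = v;
    return 1;
}
```

A filter that does not care about metadata would contain no such code at
all; in this sketch the forwarding is assumed to be done by the framework.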

(Of course, the whole metadata system can never work perfectly: the scale
filter does not update any "dpi" metadata; the crop filter would need to
update the "aperture" metadata for photos, and if the crop is not centered I
am not even sure this makes sense, etc. If someone adds "xmin", "xmax",
"xscl" (no, not this one, bad work habit), "ymin", "ymax" to the frames
produced by vsrc_mandelbrot, or the geographic equivalent to satellite
images, how is the rotate filter supposed to handle that? The best answer
would probably be "we do not care much".)

>    1.8. Seeking
>      Way more troublesome: being able to request an exact frame in the past.
>      This currently limits the scope of the filters a lot.
>      thumbnail filter is a good example of this problem: the filter
>      doesn't need to keep all the frames it analyzes in memory, it just
>      needs statistics about them, and then fetches the best in the batch.
>      Currently, it needs to keep them all because we are in a forward
>      stream based logic. This model is kind of common and quite a pain to
>      implement currently.
>      I don't think the compression you propose at the end would really
>      solve that.

You raise an interesting point. Unlimited FIFOs (with or without external
storage or compression: they are just means of handling larger FIFOs with
smaller hardware) can be of some help in that case, but not much.

In the particular example you indicate, I can imagine a solution with two
filters: thumbnail-detect outputs just pseudo-frame metadata with the
timestamps of the selected thumbnail images, and thumbnail-select uses that
metadata from one input, reading the actual frames from its second input
connected to a large FIFO. But that is outright ugly.

For actual seeking, I suppose we would need a mechanism to send messages
backward on the graph.

As for the actual implementation, I suppose that a filter that supports
seeking would be required to advertise so on its output: "I can seek back to
pts=42", and a filter that requires seeking from its input would give
forewarning: "I may need to seek back to pts=12", so that the framework can
buffer all frames from 12 to 42.
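
As a rough illustration of the negotiation described above, the framework
side could boil down to comparing the two announcements. Nothing like this
exists in lavfi; the function name and its parameters are made up for the
example, and timestamps are plain integers in some common time base.

```c
#include <assert.h>
#include <stdint.h>

/* upstream_can_seek_to:   "I can seek back to pts=42" (advertised on output)
 * downstream_may_seek_to: "I may need to seek back to pts=12" (forewarning)
 *
 * Returns 1 and fills [*from, *to] when the framework has to buffer frames
 * itself, 0 when the upstream filter already covers the request. */
static int sketch_buffer_range(int64_t upstream_can_seek_to,
                               int64_t downstream_may_seek_to,
                               int64_t *from, int64_t *to)
{
    assert(from && to);
    if (downstream_may_seek_to >= upstream_can_seek_to)
        return 0;
    *from = downstream_may_seek_to;
    *to   = upstream_can_seek_to;
    return 1;
}
```

With the numbers from the example, sketch_buffer_range(42, 12, &from, &to)
returns 1 with from = 12 and to = 42.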

That requires thinking.

>    1.9. Automatic I/O count
>      "... [a] split [b][c] ..." should guess there are 2 outputs.
>      "... [a][b][c] concat [d] ..." as well

I believe this one to be pretty easy, design-wise, in fact: just decide on a
standard name for the options that give the number of inputs and outputs,
maybe just nb_inputs and nb_outputs, and then it is only a matter of
tweaking the graph parser to set them if possible and necessary.
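
A minimal sketch of the counting part, assuming a description that holds a
single filter and whose arguments contain no brackets. This is not the actual
graph parser; sketch_count_labels() and sketch_guess_io() are hypothetical
helpers, and the result would then be used to set nb_inputs / nb_outputs
when the user did not give them.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Count consecutive "[label]" groups starting at *p, skipping whitespace
 * between them, and advance *p past the last one. */
static int sketch_count_labels(const char **p)
{
    int n = 0;

    for (;;) {
        const char *s = *p;
        const char *end;

        while (isspace((unsigned char)*s))
            s++;
        if (*s != '[' || !(end = strchr(s, ']'))) {
            *p = s;
            return n;
        }
        n++;
        *p = end + 1;
    }
}

/* Guess the I/O count of one filter from the labels around it. */
static void sketch_guess_io(const char *desc, int *nb_inputs, int *nb_outputs)
{
    const char *p = desc;

    assert(desc && nb_inputs && nb_outputs);
    *nb_inputs = sketch_count_labels(&p);
    /* Skip the filter name and its arguments up to the first output label. */
    p += strcspn(p, "[");
    *nb_outputs = sketch_count_labels(&p);
}
```

For "[a] split [b][c]" this yields 1 input and 2 outputs; for
"[a][b][c] concat [d]" it yields 3 inputs and 1 output.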

> Have you already started some development? Do you need help?
> I'm asking because it looks like it could be split into small, relatively
> easy tasks on the Trac and help introduce newcomers (and also track
> the progress if some people assign themselves to these tickets).

I have not started writing code: for a large redesign, I would rather not risk
someone telling me "this is stupid, you can do the same thing ten times
simpler like that".

You are right, some of the points I raise are mostly stand-alone tasks.

> >     AVFilterLink.pts: current timestamp of the link, i.e. end timestamp of
> >     the last forwarded frame, assuming the duration was correct. This is
> >     somewhat redundant with the fields in AVFrame, but can carry the
> >     information even when there is no actual frame.
> The timeline system seems to be able to work around this. How is this going
> to help?

I do not see how this is related. When the timeline system is invoked, there
is a frame, with a timestamp. The timestamp may be NOPTS, but that is just a
matter for the enable expression to handle correctly.

The issue I am trying to address is the one raised in this example: suppose
overlay detects EOF on its secondary input; the last secondary frames were
at PTS 40, 41, 42, and now here comes a main frame at PTS 42.04: should
overlay slap the last secondary frame on it or not?

In this particular case, it is pretty obvious that EOF happens at PTS 43,
but teaching a program to see the obvious is not easy, and that may actually
be wrong. The previous filters, or the application (through the demuxer) may
have more accurate information.

What I propose for that issue is to have something like
AVFilterLink.head_pts that records the PTS of the last activity on a link.
When a frame is passed on the link, it is updated to frame.pts +
frame.duration, but it may be updated by other circumstances too.

The core idea is to have as much information as possible directly available
to filters without requiring them to work for it. A filter could always
update head_pts in its own private context, but then, if a new way of
updating it is added, all filters may need to be updated.
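
To make the head_pts proposal more concrete, here is a small sketch. The
SketchLink type and the three functions are hypothetical (the real
AVFilterLink has no such field at this point), timestamps are integers in
the link time base, and the decision rule in sketch_secondary_covers() is
only one possible policy for the overlay case.

```c
#include <assert.h>
#include <stdint.h>

typedef struct SketchLink {
    int64_t head_pts;   /* PTS of the last activity on the link */
    int     eof;        /* set once EOF was signalled on the link */
} SketchLink;

/* Framework side: a frame is passed on the link. */
static void sketch_link_frame(SketchLink *l, int64_t pts, int64_t duration)
{
    assert(duration >= 0);
    l->head_pts = pts + duration;
}

/* Framework side: another way of updating head_pts, here EOF with an end
 * timestamp that may come from a previous filter or from the demuxer. */
static void sketch_link_eof(SketchLink *l, int64_t end_pts)
{
    if (end_pts > l->head_pts)
        l->head_pts = end_pts;
    l->eof = 1;
}

/* overlay side, after EOF on the secondary input: is a main frame at
 * main_pts still covered by the last secondary frame? */
static int sketch_secondary_covers(const SketchLink *secondary,
                                   int64_t main_pts)
{
    return main_pts < secondary->head_pts;
}
```

With a time base of 1/100, secondary frames at 4000, 4100 and 4200 with a
duration of 100 leave head_pts at 4300, so a main frame at 4204 is still
covered while one at 4300 is not. The filter only reads head_pts; every new
way of updating it stays in the framework.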

> Well, I believe it should be handled by the framework transparently
> somehow.

Yes, exactly.

> Users can already fix the timestamps themselves with [a]setpts
> filters, but it's often not exactly obvious why they need to.

Hum, I do not think that setpts is suitable in this case: you cannot use it
to set frame.duration = next_frame.pts - frame.pts because next_frame is not
available. Even for cases it can handle, it requires complex formulas (with
escaping; I still have to look at Michael's patch for balanced escaping); we
do not want users to copy-paste half-broken expressions found in obsolete
examples on the web. Plus it uses floats.

A dedicated filter seems more correct: fixpts=delta2duration=1 for example.
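
What delta2duration would have to do can be sketched as a one-frame delay
with integer arithmetic only. This is a guess at the behaviour, not an
existing filter: SketchTsFrame, SketchFixPts and sketch_fixpts_push() are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct SketchTsFrame {
    int64_t pts;
    int64_t duration;
} SketchTsFrame;

typedef struct SketchFixPts {
    int           has_pending;  /* a frame is waiting for its successor */
    SketchTsFrame pending;
} SketchFixPts;

/* Feed one input frame. Returns 1 and fills *out with the previous frame,
 * its duration set to next.pts - pts, or 0 when this was the first frame
 * and nothing can be output yet. */
static int sketch_fixpts_push(SketchFixPts *s, SketchTsFrame in,
                              SketchTsFrame *out)
{
    int ret = 0;

    assert(s && out);
    if (s->has_pending) {
        *out = s->pending;
        out->duration = in.pts - s->pending.pts;
        ret = 1;
    }
    s->pending     = in;
    s->has_pending = 1;
    return ret;
}
```

The delay is exactly what setpts cannot express: the previous frame is held
back until next_frame is known. At EOF the pending frame would have to be
flushed with whatever duration it already carries.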

> It doesn't need to be a filter and can be part of the framework itself.

I believe the framework should do the work, but also expose it as an
internal API to be used by the fixpts filter when explicit handling and
user-settable options are necessary.


  Nicolas George