[Ffmpeg-devel] New/extended filter API discussion

Sat Jan 6 02:23:19 CET 2007

Hi,

Alexander Chemeris wrote:
> Hello,
>
> On 1/5/07, Michael Niedermayer <michaelni at gmx.at> wrote:
>> To repeat again some goals of a video filter API:
>> * well documented (see MPlayer for how not to do it)
>> * writing a filter should not require complete knowledge of all
>>   (undocumented) internals of the video filter system
>> * direct rendering (using a buffer provided by the next filter)
>> * in-place rendering (for example, adding some subtitles shouldn't need the
>>   whole frame to be read and written)
>> * slice-based rendering (improves cache locality, but there are issues
>>   with out-of-order decoding ...)
>> * multiple inputs
>> * multiple outputs (could always trivially be handled by several filters
>>   with just a single output each)
>> * timestamps per frame
>> * also, the number of frames consumed by a filter does not have to match
>>   the number output ((inverse) telecine, ...)
>>
>> Also, I suggest that whoever designs the filter system looks at MPlayer's
>> video filters, as they support a large number of the things above.
> Take a look at the Avisynth (and maybe VirtualDub) filter API. It runs
> under Windows only, but has a very interesting filter API with automatic
> buffer management, based on a pull model as opposed to a push model. It
> uses C++ heavily, but it may still inspire you with some design ideas.
>
> I have some thoughts about a video filter API, because I'm thinking about
> a very similar subsystem for the sipX media processing library. When I
> thought about the benefits and drawbacks of the "push", "pull" and
> "process" models of operation, I came to the view that the "process"
> approach is the simplest of them while being almost as powerful as "push"
> and "pull". By the "process" model I mean the following: each filter has a
> 'process()' function which simply takes input frames, processes them and
> pushes them into an output buffer. Most of the work in this approach
> should be done by the flowgraph class itself; it knows how the filters are
> connected and can take output frames from the preceding filter and pass
> them to the subsequent filter.
>
> One drawback of the "push" and "pull" models is that they cannot handle
> multiple inputs (for "push") or multiple outputs (for "pull"). Let's
> consider the "push" model. If the filter graph has two inputs and one of
> them pushes a frame, the behaviour of the second input is ill-defined: it
> could be ignored, or pulled, or something else, but I do not see a good
> approach here. The same applies to "pull" with multiple outputs.
>
> And finally, I had an idea for a universal "push-pull-process" model. Let
> all processing be done in the 'process()' function, but instead of simple
> input and output buffers, it has two functions: 'get_input()' and
> 'set_output()'. Then, if 'get_input()' can either take a frame from the
> input buffer or ask the preceding filter for a frame, and 'set_output()'
> behaves similarly, we could switch between the "push", "pull" and
> "process" models even at runtime.
>
> Why is this runtime switching needed? Because each model suits different
> cases. "Push" is natural when processing frames as they come, e.g. when
> converting from one format to another at maximum speed. "Pull" is well
> suited when frames are requested by the rendering side (as in Avisynth).
> And "process" is useful when multiple inputs and outputs are needed.
>
> However, I decided that implementing such a complex multi-model would take
> too much time, and stuck with the "process" model as the simplest one that
> still allows many fun things. But in this project I think this complexity
> may not be so expensive, while giving a very flexible solution.
>
>
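
To make the "process" model and the proposed 'get_input()'/'set_output()' indirection concrete, here is a minimal C sketch. Every name in it (Frame, FrameQueue, Filter, get_input, set_output, run_graph_once) is hypothetical, not an existing FFmpeg or sipX API, and a frame is reduced to a timestamp plus one integer. The graph drives filters in order ("process" mode), while a filter with pull_mode set asks its upstream filter for a frame when its input queue is empty ("pull" mode); push mode is only indicated in a comment.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 16

/* Hypothetical frame: a timestamp plus one value standing in for picture data. */
typedef struct {
    long pts;
    int value;
} Frame;

/* Fixed-size FIFO connecting two filters. */
typedef struct {
    Frame buf[QUEUE_CAP];
    size_t head, count;
} FrameQueue;

static int queue_push(FrameQueue *q, Frame f)
{
    if (q->count == QUEUE_CAP)
        return -1;                                   /* queue full */
    q->buf[(q->head + q->count) % QUEUE_CAP] = f;
    q->count++;
    return 0;
}

static int queue_pop(FrameQueue *q, Frame *out)
{
    if (q->count == 0)
        return -1;                                   /* queue empty */
    *out = q->buf[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}

typedef struct Filter Filter;
struct Filter {
    FrameQueue *in;                 /* NULL for a source filter */
    FrameQueue *out;
    Filter *upstream;               /* consulted only in pull mode */
    int pull_mode;                  /* 0: graph drives us, 1: pull upstream on demand */
    int (*process)(Filter *self);   /* returns 1 if a frame was produced */
    int state, limit;               /* private data for the example source */
};

/* get_input(): pop a queued frame; in pull mode, run the upstream filter
 * first if nothing is queued. */
static int get_input(Filter *f, Frame *out)
{
    assert(f->in != NULL);
    if (f->in->count == 0 && f->pull_mode && f->upstream)
        f->upstream->process(f->upstream);
    return queue_pop(f->in, out);
}

/* set_output(): queue a frame for the next filter. A push-mode variant could
 * call the downstream filter's process() from here instead. */
static int set_output(Filter *f, Frame fr)
{
    return queue_push(f->out, fr);
}

/* Example source: emits frames with pts 0, 1, ..., limit - 1. */
static int source_process(Filter *self)
{
    if (self->state >= self->limit)
        return 0;
    Frame fr = { self->state, self->state * 10 };
    if (set_output(self, fr) != 0)
        return 0;
    self->state++;
    return 1;
}

/* Example one-in/one-out filter: doubles the value, keeps the timestamp. */
static int double_process(Filter *self)
{
    Frame fr;
    if (get_input(self, &fr) != 0)
        return 0;
    fr.value *= 2;
    return set_output(self, fr) == 0;
}

/* "Process" model: the flowgraph runs every filter once, in topological order. */
static void run_graph_once(Filter **filters, size_t n)
{
    for (size_t i = 0; i < n; i++)
        filters[i]->process(filters[i]);
}
```

In this sketch, flipping the downstream filter's pull_mode at runtime changes who triggers the upstream work, while double_process() itself stays unchanged, which is the kind of runtime switch described above.
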
I like your proposals. I have a couple of additional suggestions, though
it might be too early for them. It would be good if there were a way to
tell the filters to 'hurry up' or cut back on filtering, since the source
could be a live capture.
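
One way a 'hurry up' request might look is a quality hint that the graph passes to each filter, which the filter uses to choose a cheaper code path. The sketch below is hypothetical throughout: the HurryLevel enum, the lag-based policy in hurry_from_lag() and the horizontal smoothing filter are illustrative assumptions, not part of any existing API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical quality hint passed to a filter; HURRY_NONE means full quality. */
typedef enum { HURRY_NONE = 0, HURRY_SOME = 1, HURRY_MAX = 2 } HurryLevel;

/* Hypothetical policy: pick a hurry level from how far (in ms) the graph has
 * fallen behind a live source, relative to a per-frame time budget. */
static HurryLevel hurry_from_lag(unsigned lag_ms, unsigned budget_ms)
{
    if (lag_ms <= budget_ms)
        return HURRY_NONE;
    if (lag_ms <= 2 * budget_ms)
        return HURRY_SOME;
    return HURRY_MAX;
}

/* Horizontal smoothing of one row of 8-bit samples whose cost drops as the
 * hurry level rises. Edge samples are replicated. */
static void smooth_row(const unsigned char *src, unsigned char *dst,
                       size_t n, HurryLevel hurry)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t l = (i > 0) ? i - 1 : i;
        size_t r = (i + 1 < n) ? i + 1 : i;
        if (hurry == HURRY_MAX)          /* skip the filter entirely */
            dst[i] = src[i];
        else if (hurry == HURRY_SOME)    /* cheaper 2-tap average */
            dst[i] = (unsigned char)((src[i] + src[r] + 1) / 2);
        else                             /* full 3-tap average */
            dst[i] = (unsigned char)((src[l] + src[i] + src[r] + 1) / 3);
    }
}
```

Keeping the decision (hurry_from_lag) separate from the filter means each filter only has to know what "cheaper" means for itself, not how the live source is being monitored.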

Another thought I had was that the filters, both audio and video, will
likely need access not only to the current frame, but also to the
previous and next ones.

Also, in the case of audio, different filters might want different
window sizes. For example, for normalization I found that the recommended
RMS window size is 50 ms regardless of the sample rate.
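
Because such a window is specified in time rather than in samples, each audio filter would have to convert it per stream; for instance, 50 ms is 2205 samples at 44100 Hz and 2400 samples at 48000 Hz. A minimal C sketch, assuming 16-bit signed samples; window_samples() and window_mean_square() are hypothetical helpers, and the final square root of the RMS is left out so the sketch needs no math library.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a window length in milliseconds into a sample count for a given
 * sample rate, rounded to the nearest sample. */
static size_t window_samples(unsigned ms, unsigned sample_rate)
{
    assert(sample_rate > 0);
    return ((size_t)ms * sample_rate + 500) / 1000;
}

/* Mean square of one window of 16-bit samples; the RMS level is its square
 * root (omitted here so the sketch needs no libm). */
static double window_mean_square(const short *s, size_t n)
{
    double acc = 0.0;
    assert(s != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        acc += (double)s[i] * s[i];
    return acc / (double)n;
}
```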

One final thought is that filters may also want to have the ability to
output data to another stream (e.g. a file) for logging or multi-pass
processing. For example, my current normalization code is quite crude,
but it needs to feed information back to the next pass for the actual
volume adjustment.
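
A multi-pass filter of this kind could write its statistics to a side stream in the first pass and read them back in the second. The C sketch below is an illustration under stated assumptions: the one-line "peak=N" text format and the function names are hypothetical, and for simplicity it measures the peak absolute sample rather than an RMS level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Pass 1: scan the samples and record the loudest absolute value in the
 * side stream, using a hypothetical "peak=N" line format. */
static int pass1_write_stats(FILE *log, const short *s, size_t n)
{
    int peak = 0;
    assert(log != NULL && s != NULL);
    for (size_t i = 0; i < n; i++) {
        int a = abs(s[i]);
        if (a > peak)
            peak = a;
    }
    return fprintf(log, "peak=%d\n", peak) > 0 ? 0 : -1;
}

/* Pass 2: read the statistics back and compute the gain that brings the
 * recorded peak to the target level. */
static int pass2_read_gain(FILE *log, int target, double *gain)
{
    int peak;
    assert(log != NULL && gain != NULL);
    if (fscanf(log, "peak=%d", &peak) != 1 || peak <= 0)
        return -1;
    *gain = (double)target / peak;
    return 0;
}
```

Using a plain FILE * keeps the side channel independent of the filter graph itself; a real design might instead route it through the graph's own output mechanism.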

-Aaron