[FFmpeg-devel] [RFC] Filtering of data packets
nicolas.george at normalesup.org
Thu Jul 18 15:54:19 CEST 2013
Here are some thoughts on filtering data packets with lavfi.
Support for data packets is less hyped than subtitles, but I believe it is
much easier (the hard part for subtitles is designing the data structure to
represent styled text subtitles). Also, data packets will be useful for
subtitles too (fonts), and some of the issues to solve are linked (sparse
streams).
A data packet is basically an AVPacket. Raw, undelimited byte streams can
also fit that description, by splitting them into chunks of arbitrary size
(just like audio PCM data is split into arbitrary frames).
The nice thing with data packets is that most components of ffmpeg (codecs,
muxers and demuxers, bitstream filters) can become filters. For example, the
following graph would make sense:
(Note: I would have split avi_demux into file_in + demux, but filters need
to be input-driven, and demuxers are output-driven and do not work in
non-blocking mode. That is an unrelated issue.)
That would allow various new features, such as having the setpts filter
working in copy mode.
I already checked: AVFrame has almost all the fields present in AVPacket.
The only exception is convergence_duration, which nobody uses, and it would
be trivial to add. The packet data and size can go in AVFrame.data and
AVFrame.linesize.
Format negotiation and automatic conversion
Data packets are too abstract; I believe format negotiation and automatic
conversion are not possible or relevant for them. The pixel_/sample_format
field can be used to store a CodecID, to avoid feeding MP3 to an AC3
decoder, but that is all.
We already have setpts and asetpts; adding dsetpts, and later ssetpts for
subtitles, will get old pretty fast. Therefore, we need generic pads, so
that setpts can work with any type of packet.
Set AVFilterPad.type to AVMEDIA_TYPE_UNKNOWN to allow lavfi to connect
this pad to anything without checking the type, and let
AVFilter.query_formats check that the formats are consistent. If several
generic filters are connected together, use the complex format negotiation
to stabilize the type.
The demuxer source should be able to expose any requested stream as an
output pad, but I believe it is more practical if all attachments are
optionally combined into a single output, with one packet per attachment.
Typical use: connect the attachments output of the demuxer to the ASS
subtitles renderer for embedded fonts; no need to know how many fonts
there are.
This is the second difficulty for subtitles filtering, and I believe it
may also apply to data filtering although I do not have a specific
scenario in mind for that case. But I have a solution.
Consider the following scenario:
[0:v] [1:v] overlay [out]
A frame arrives on 0:v, so overlay will request one on 1:v. That works
well because frames can be read from 0:v and 1:v independently.
[0:a:0] [0:a:1] amerge [out]
The same will happen, but it is not possible to read frames independently
from any stream: frames will come however they appear in the input file.
Requesting a frame on 0:a:1 may cause a frame to arrive on 0:a:0 instead,
and it will need to be queued.
It will still work because audio streams are continuous and properly
interleaved: an audio frame covering the 42.1 - 42.3 time interval in
stream #0 will be almost immediately followed or preceded by a frame
covering a similar interval in stream #1. The filter needs some buffering,
but not a lot.
[0:v] [0:s] hardsub [out]
The same issue happens, but in this case the subtitles stream is not
continuous: you can have an action scene with only sfx for five minutes.
During that time, hardsub will request a frame on 0:s; that will trigger
reading from the file, which will find and decode a video frame. Video
frames will accumulate in the 0:v input until either the buffer queue
overflows or OOM happens.
The solution I suggest is to have regular dummy frames on sparse streams.
A dummy frame is marked by its data being NULL, and counts only for its
timestamp.
The demuxer source must generate the dummy frames: when a frame with
timestamp T is demuxed from the file, generate a dummy frame at T-delta on
all sparse streams that are older than that. That is the same thing as
sub2video_heartbeat() in ffmpeg.c.
For input coming from the outside, lavfi can not know whether inputs are
interleaved with each other or not: reading the next subtitle packet from
a Matroska file is a problem, reading from a separate ASS file is not.
The application needs to give that information to lavfi by grouping the
streams together. That can be done by the framework (add a field to build
a linked list of AVFilterLinks grouped together) or by an additional
filter (with n inputs and n outputs that act as individual pass-throughs,
except that they generate dummy frames if necessary).
I believe I covered most of the issues, please comment.