[FFmpeg-devel] [RFC] Filtering of data packets

Nicolas George nicolas.george at normalesup.org
Thu Jul 18 15:54:19 CEST 2013


Here are some thoughts on filtering data packets with lavfi.

Support for data packets is less hyped than subtitles, but I believe it is
much easier (the hard part for subtitles is designing the data structure to
represent styled text subtitles). Also, data packets will be useful for
subtitles too (fonts), and some of the issues to solve are linked (sparse
streams).

A data packet is basically an AVPacket. Raw, undelimited byte streams can
also fit that description, by splitting them into chunks of arbitrary size
(just like audio PCM data is split into arbitrary frames).
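
As a rough illustration of the chunking idea, here is a minimal sketch in
plain C. DataChunk and split_stream() are hypothetical names made up for
this example, not lavf/lavfi API; each chunk is only a view on a slice of
the stream, as a packet would be.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a data packet: a view on a slice of the stream. */
typedef struct DataChunk {
    const uint8_t *data;  /* first byte of the slice */
    size_t         size;  /* number of bytes in the slice */
    int64_t        pos;   /* byte offset of the slice in the stream */
} DataChunk;

/* Cut buf[0..len) into slices of at most chunk_size bytes.
 * Writes at most max_chunks entries to out[] and returns how many
 * were written. A chunk_size of 0 produces no chunks. */
static size_t split_stream(const uint8_t *buf, size_t len, size_t chunk_size,
                           DataChunk *out, size_t max_chunks)
{
    size_t n = 0, off = 0;

    if (chunk_size == 0)
        return 0;
    while (off < len && n < max_chunks) {
        size_t left = len - off;

        out[n].data = buf + off;
        out[n].size = left < chunk_size ? left : chunk_size;
        out[n].pos  = (int64_t)off;
        off += out[n].size;
        n++;
    }
    return n;
}
```

The chunk size carries no meaning here: any value gives a valid sequence of
packets, only the last one may be shorter.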

The nice thing with data packets is that most components of ffmpeg (codecs,
muxers and demuxers, bitstream filters) can become filters. For example, the
following graph would make sense:

                ↙         ↘
        mpeg4_dec          mp3_dec
            ↓                 ↓
          scale             volume
            ↓                 ↓
         x264_enc         vorbis_enc
                 ↘       ↙

(Note: I would have split avi_demux into file_in + demux, but filters need
to be input-driven, and demuxers are output-driven and do not work in
non-blocking mode. That is an unrelated issue.)

That would allow various new features, such as having the setpts filter
working in copy mode.

Data structure

  I already checked: AVFrame has almost all the fields present in AVPacket.
  The only exception is convergence_duration, which nobody uses, and it would
  be trivial to add. The packet data and size can go in AVFrame.data[0] and

Format negotiation and automatic conversion

  Data packets are too abstract: I believe format negotiation and automatic
  conversion are neither possible nor relevant. The pixel_/sample_format
  field can be used to store a CodecID, to avoid feeding MP3 to an AC3
  decoder, but that is all.
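
  A minimal sketch of that check, assuming the format field of a data link
  simply holds a codec ID. DemoCodecID, DemoDataLink and data_link_accepts()
  are hypothetical names for this example; real code would presumably use
  enum AVCodecID and the existing link structure.

```c
#include <stdbool.h>

/* Hypothetical codec identifiers, standing in for enum AVCodecID. */
enum DemoCodecID { DEMO_CODEC_NONE = 0, DEMO_CODEC_MP3, DEMO_CODEC_AC3 };

/* Hypothetical link: for data links, `format` stores a codec ID
 * instead of a pixel or sample format. */
typedef struct DemoDataLink {
    int format;
} DemoDataLink;

/* A decoder filter accepts a link only if the codec matches exactly.
 * On mismatch the connection is refused; no conversion is attempted. */
static bool data_link_accepts(const DemoDataLink *link, enum DemoCodecID wanted)
{
    return link->format == (int)wanted;
}
```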

Generic pads

  We already have setpts and asetpts; adding dsetpts, and later ssetpts for
  subtitles, will get old pretty fast. Therefore, we need generic pads, so
  that setpts can work with any type of packet.

  Set AVFilterPad.type to AVMEDIA_TYPE_UNKNOWN to allow lavfi to connect
  this pad to anything without checking the type, and let
  AVFilter.query_formats check that the formats are consistent. If several
  generic filters are connected together, use the complex format negotiation
  to stabilize the type.
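
  To make the "stabilize the type" step concrete, here is a minimal sketch
  for the simplest case: a linear chain of generic filters that each pass
  the media type through unchanged. All names are hypothetical, and the
  real negotiation in lavfi works on whole graphs and format lists, so this
  only shows the propagation idea.

```c
/* Hypothetical media types; UNKNOWN marks a generic, not yet resolved link. */
enum DemoMediaType {
    DEMO_TYPE_UNKNOWN = -1,
    DEMO_TYPE_VIDEO,
    DEMO_TYPE_AUDIO,
    DEMO_TYPE_DATA,
    DEMO_TYPE_SUBTITLE
};

/* A generic pad can be connected to anything; typed pads must match. */
static int pads_compatible(enum DemoMediaType a, enum DemoMediaType b)
{
    return a == DEMO_TYPE_UNKNOWN || b == DEMO_TYPE_UNKNOWN || a == b;
}

/* types[i] is the type of the i-th link along the chain.
 * Known types are copied to unknown neighbours until nothing changes.
 * Returns 0 on success (links may stay UNKNOWN if nothing constrains
 * them), -1 if two different known types meet. */
static int resolve_chain_types(enum DemoMediaType *types, int n)
{
    int changed = 1;

    while (changed) {
        changed = 0;
        for (int i = 0; i + 1 < n; i++) {
            enum DemoMediaType a = types[i], b = types[i + 1];

            if (a == b)
                continue;
            if (!pads_compatible(a, b))
                return -1;
            /* Exactly one side is unknown here: copy the known type. */
            if (a == DEMO_TYPE_UNKNOWN)
                types[i] = b;
            else
                types[i + 1] = a;
            changed = 1;
        }
    }
    return 0;
}
```

  Each pass resolves at least one unknown link or stops, so the loop
  terminates after at most n passes.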


  The demuxer source should be able to expose any requested stream as an
  output pad, but I believe it is more practical if all attachments are
  optionally combined into a single output, with one packet per attachment.
  Typical use: connect the attachments output of the demuxer to the ASS
  subtitles renderer for embedded fonts; no need to know how many fonts
  there are.

Sparse streams

  This is the second difficulty for subtitles filtering, and I believe it
  may also apply to data filtering although I do not have a specific
  scenario in mind for that case. But I have a solution.

  Consider the following scenario:

    [0:v] [1:v] overlay [out]

  A frame arrives on 0:v, so overlay will request one on 1:v. That works
  well because frames can be read from 0:v and 1:v independently.

    [0:a:0] [0:a:1] amerge [out]

  The same will happen, but it is not possible to read frames independently
  from any stream: frames will come however they appear in the input file.
  Requesting a frame on 0:a:1 may cause a frame to arrive on 0:a:0 instead,
  and it will need to be queued.

  It will still work because audio streams are continuous and properly
  interleaved: an audio frame covering the 42.1 - 42.3 time interval in
  stream #0 will be almost immediately followed or preceded by a frame
  covering a similar interval in stream #1. The filter needs some buffering,
  but not a lot.

    [0:v] [0:s] hardsub [out]

  The same issue happens, but in this case the subtitles stream is not
  continuous: you can have an action scene with only sfx for five minutes.
  hardsub will request a frame on 0:s; that will trigger reading from the
  file, which will find and decode a video frame instead. Video frames will
  accumulate in the 0:v input until either the buffer queue overflows or
  OOM happens.

  The solution I suggest is to have regular dummy frames on sparse streams.
  A dummy frame is marked by data[0] being NULL, and counts only for its

  The demuxer source must generate the dummy frames: when a frame with
  timestamp T is demuxed from the file, generate a dummy frame at T-delta on
  all sparse streams that are older than that. That is the same thing as
  sub2video_heartbeat() in ffmpeg.c.
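
  A minimal sketch of that rule, assuming all timestamps are already in a
  common time base. DemoStream, DemoFrame and heartbeat() are hypothetical
  names for this example and are not taken from ffmpeg.c; the dummy frame
  is marked by a NULL data pointer as described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-stream state kept by the demuxer source. */
typedef struct DemoStream {
    int     sparse;    /* nonzero for sparse streams such as subtitles */
    int64_t last_pts;  /* timestamp of the last frame sent, real or dummy */
} DemoStream;

/* Hypothetical frame; data0 == NULL marks a dummy frame. */
typedef struct DemoFrame {
    int            stream_index;
    int64_t        pts;
    const uint8_t *data0;
} DemoFrame;

/* Call when a frame with timestamp pts was demuxed on stream src.
 * For every other sparse stream whose last frame is older than
 * pts - delta, write a dummy frame at pts - delta into out[].
 * Returns the number of dummy frames written (at most max_out). */
static size_t heartbeat(DemoStream *streams, size_t nb_streams, size_t src,
                        int64_t pts, int64_t delta,
                        DemoFrame *out, size_t max_out)
{
    int64_t dummy_pts = pts - delta;
    size_t n = 0;

    if (pts > streams[src].last_pts)
        streams[src].last_pts = pts;

    for (size_t i = 0; i < nb_streams && n < max_out; i++) {
        if (i == src || !streams[i].sparse)
            continue;
        if (streams[i].last_pts >= dummy_pts)
            continue;               /* stream is already up to date */
        out[n].stream_index = (int)i;
        out[n].pts          = dummy_pts;
        out[n].data0        = NULL; /* dummy: carries no payload */
        streams[i].last_pts = dummy_pts;
        n++;
    }
    return n;
}
```

  With this, a filter waiting on a sparse input receives a frame shortly
  after each frame on the other inputs, so it can release what it has
  queued instead of buffering without bound.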

  For input coming from the outside, lavfi cannot know whether inputs are
  connected and interleaved: reading the next subtitle packet from a
  Matroska file is a problem, reading from a separate ASS file is not. The
  application needs to give that information to lavfi by grouping the
  streams together. That can be done by the framework (add a field to build
  a linked list of AVFilterLink grouped together) or by an additional filter
  (with n inputs and n outputs that act as individual pass-throughs, except
  that they generate dummy frames if necessary).

I believe I covered most of the issues, please comment.


  Nicolas George