[FFmpeg-devel] About guess_correct_pts / AVFrame.best_effort_timestamp

Wed Feb 16 21:51:38 CET 2011

On Octidi 28 Pluviôse, year CCXIX, Ronald S. Bultje wrote:
> I have no idea what guess_correct_pts or best_effort_timestamp is.
> I've brought up in one of these troll-threads that I want the whole
> PTS/DTS thing explained to me before any more change goes in. You went
> as far as to admit that "if A and B and C and D and E and F, then
> maybe sometimes perhaps PTS is this and otherwise it's very complex".
> Reimar simply admitted it's incomprehensible.
> 
> I want this whole mess re-designed. best_effort_timestamp may or may
> not be part of the solution.

Ok. I'll do my best, at first, to make things clear.

> First of all, where. This should be in lavf, not lavc, otherwise
> transcoding of formats where timestamps are missing to those where
> they are required doesn't work. lavc should do nothing more but
> reorder timestamps for these formats where display order != coding
> order.

I'm not sure this is possible, precisely due to reordering problems, but we
will see in the course of the discussion.

> If you accept that premise,

I accept that premise under the condition that it can be achieved.

>			      I'm willing to help looking into a
> solution that removes any and all non-reordering code from lavcodec,
> obliviates any and all timestamp screwups in ffmpeg.c, cmdutils.c,
> ffplay.c etc, and cleanly implements format-specific hacks where they
> belong: in lavformat.

Let us try. Please anyone correct me if I write anything wrong.

(You can skip to the end in a first reading.)

* A video frame is an image intended to be sent to a video display device or
  encoded into a file. It is made of various information: size, bitmap data,
  etc.

* One piece of information in a video frame is its presentation timestamp
  (PTS). This is the timestamp of the point in time where the image should
  start being displayed on the output device. Thus, a trivial video player
  could be written like this:

	decode_frame(packet, &frame);	/* one packet in, one frame out */
	sleep(frame.pts - now);		/* wait until the frame is due */
	display_frame(&frame);		/* assumed to return immediately */
	now = frame.pts;

* Frames are decoded from packets obtained from a container format or a
  protocol. Packets are made of a payload (a sequence of bytes) and headers.
  The container can also provide global information that is relevant to
  all frames in a stream.

* In the simple case, the payload from a packet encodes exactly one video
  frame. (For now, let us focus on this simple case.) A trivial decoder will
  just eat the packet payload and output the decoded frame, altering its
  internal state at the same time.

* With most codecs, most frames are not encoded as a standalone blob of
  information, but based on the state of the decoder left by previously
  decoded frames. Thus, frames must be fed to a decoder in a proper order,
  which I will call the dependency order of the encoded frames.

* With modern codecs (B-frames), the frame dependency order is not the same
  as the chronological order of decoded frames.

* Most formats in the world carry encoded frames in dependency order. I do
  not know if there even exist formats that do otherwise.

* A non-stupid decoder will decode frames and keep them to reorder them into
  chronological order. Most non-stupid decoders adhere to the stupid API: one
  packet in, one frame out (or not). The difference with a stupid
  (non-reordering) decoder is that the returned frame can be the result of
  the decoding of a previous packet; the frame decoded from the current
  packet is then queued to be returned later. These reordering decoders need
  to be flushed somehow at the end of the stream, usually with dummy empty
  packets. Lavc implements a reordering decoder.

* Determining the chronological order when only the dependency order is
  known requires at least a partial decoding of the payload, which depends on
  the codec. Thus, it can not be done in a completely codec-unaware
  container-decoding library. This kind of partial decoding of the payload
  seems to be precisely the point of parsers, but this makes the decoding
  more expensive.

* Some codecs produce payloads that include timestamps by themselves. But
  this timestamp can not be considered reliable as there are tools that cut
  and paste containers and protocols without altering the payload. Thus, the
  PTS for a frame should only be computed using information in the container
  and packet headers.

* Different formats and protocols define a wide variety of packet headers,
  but there are two that are widely present:

  - PTS: this is the presentation timestamp for the frame encoded in the
    payload of the packet. When it is present and valid, the problem of
    finding a proper timestamp for a decoded frame is trivial.

  - DTS: the exact definition of this is somewhat unclear (as it assumes a
    decoder that decodes one frame instantaneously but not several). One
    possible meaning is this: with a reordering decoder that delays frames
    "the usual way" (which depends on the codec), the DTS of a packet is the
    PTS of the frame it will cause to be flushed. But it seems to be only
    valid under certain assumptions.

* Almost any absurd combination of problems that can be imagined actually
  exists. The most obnoxious one is probably AVI: the AVI format has no
  timestamps whatsoever. The best timing information it provides is the
  assumption that each packet codes for exactly one frame, and each frame has
  exactly the same duration. Thus, the packet number can more or less be
  used as a DTS, but that's all.

* lavf and lavc can be used together, of course, but they also can be used
  separately: lavc can be used to decode packets coming from a non-lavf
  demuxer, and lavf can be used to feed non-lavc decoders.

I hope I have summarized most of the aspects of the problem.

Unfortunately, I think that the last point makes your (Ronald) premise
impossible. Imagine a codec with B-frames in AVI: the format has only (more
or less) DTS. Using a parser, lavf can derive PTS. But we also want lavc to
work with a dumb demuxer that does not have a parser, and thus provides only
DTS.

Regards,

-- 
  Nicolas George