[FFmpeg-devel] About guess_correct_pts / AVFrame.best_effort_timestamp
Wed Feb 16 21:51:38 CET 2011
On octidi, 28 Pluviôse, year CCXIX, Ronald S. Bultje wrote:
> I have no idea what guess_correct_pts or best_effort_timestamp is.
> I've brought up in one of these troll-threads that I want the whole
> PTS/DTS thing explained to me before any more change goes in. You went
> as far as to admit that "if A and B and C and D and E and F, then
> maybe sometimes perhaps PTS is this and otherwise it's very complex".
> Reimar simply admitted it's incomprehensible.
> I want this whole mess re-designed. best_effort_timestamp may or may
> not be part of the solution.
Ok. I'll do my best, at first, to make things clear.
> First of all, where. This should be in lavf, not lavc, otherwise
> transcoding of formats where timestamps are missing to those where
> they are required doesn't work. lavc should do nothing more but
> reorder timestamps for these formats where display order != coding order.
I'm not sure this is possible, precisely due to reordering problems, but we
will see in the course of the discussion.
> If you accept that premise,
I accept that premise under the condition that it can be achieved.
> I'm willing to help looking into a
> solution that removes any and all non-reordering code from lavcodec,
> obviates any and all timestamp screwups in ffmpeg.c, cmdutils.c,
> ffplay.c etc, and cleanly implements format-specific hacks where they
> belong: in lavformat.
Let us try. Please anyone correct me if I write anything wrong.
(You can skip to the end in a first reading.)
* A video frame is an image intended to be sent to a video display device or
encoded into a file. It carries various pieces of information: size, bitmap
data, etc.
* One piece of information attached to a video frame is its presentation timestamp
(PTS). This is the timestamp of the point in time where the image should
start being displayed on the output device. Thus, a trivial video player
could be written like this:
sleep(frame.pts - now);
display_frame(&frame); /* assumed to be immediate */
now = frame.pts;
* Frames are decoded from packets obtained from a container format or a
protocol. Packets are made of a payload (a sequence of bytes) and headers.
The container can also provide global information that is relevant to
all frames in a stream.
* In the simple case, the payload from a packet encodes exactly one video
frame. (For now, let us focus on this simple case.) A trivial decoder will
just eat the packet payload and output the decoded frame, altering its
internal state at the same time.
* With most codecs, most frames are not encoded as a standalone blob of
information, but based on the state of the decoder left by previously
decoded frames. Thus, frames must be fed to a decoder in a specific order,
which I will call the encoded frames dependency order.
* With modern codecs (B-frames), the frames dependency order is not the same
as the chronological order of decoded frames.
* Most formats in the world carry encoded frames in dependency order. I do
not know if there even exist formats that do otherwise.
* A non-stupid decoder will decode frames and keep them to reorder them into
chronological order. Most non-stupid decoders adhere to the stupid API: one
packet in, one frame out (or not). The difference with a stupid
(non-reordering) decoder is that the returned frame can be the result of
the decoding of a previous packet; the frame decoded from the current
packet is then queued to be returned later. These reordering decoders need
to be flushed somehow at the end of the stream, usually with dummy empty
packets. Lavc implements a reordering decoder.
* Determining the chronological order when only the dependency order is
known requires at least a partial decoding of the payload, which depends on
the codec. Thus, it can not be done in a completely codec-unaware
container-decoding library. This kind of partial decoding of the payload
seems to be precisely the point of parsers, but it reintroduces
codec-specific knowledge into the container layer.
* Some codecs produce payloads that include timestamps by themselves. But
this timestamp can not be considered reliable as there are tools that cut
and paste containers and protocols without altering the payload. Thus, the
PTS for a frame should only be computed using information in the container
and packet headers.
* Different formats and protocols define a wide variety of packet headers,
but there are two that are widely present:
- PTS: this is the presentation timestamp for the frame encoded in the
payload of the packet. When it is present and valid, the problem of
finding a proper timestamp for a decoded frame is trivial.
- DTS: the exact definition of this is somewhat unclear (as it assumes a
decoder that decodes one frame instantaneously but not several). One
possible meaning is this: with a reordering decoder that delays frames
"the usual way" (which depends on the codec), the DTS of a packet is the
PTS of the frame it will cause to be flushed. But it seems to be only
valid under certain assumptions.
* Almost any absurd combination of problems that can be imagined actually
exists. The most obnoxious one is probably AVI: the AVI format has no
timestamps whatsoever. The best timing information it provides is the
assumption that each packet codes for exactly one frame, and each frame has
exactly the same duration. Thus, the packet number can more or less be
used as a DTS, but that's all.
* lavf and lavc can be used together, of course, but they also can be used
separately: lavc can be used to decode packets coming from a non-lavf demuxer, and
lavf can be used to feed non-lavc decoders.
I hope I have summarized most of the aspects of the problem.
Unfortunately, I think that the last point makes your (Ronald) premise
impossible. Imagine a codec with B-frames in AVI: the format has only (more
or less) DTS. Using a parser, lavf can derive PTS. But we also want lavc to
work with a dumb application that does not use a parser, and thus provides
only DTS.