[FFmpeg-devel] [PATCH] rtpdec: Emit timestamps for packets before the first RTCP packet, too
Tue Dec 28 23:22:56 CET 2010
On Mon, 27 Dec 2010, Josh Allmann wrote:
> On 27 December 2010 00:48, Martin Storsjo <martin at martin.st> wrote:
> > Timestamps in each stream start from 0, for the first received
> > RTP packet. Once an RTCP packet is received, that one is used for
> > sync, emitting timestamps that fit seamlessly into the earlier ones.
> Just a small comment: RTP timestamps do not necessarily start from
> zero (RFC 3550, section 5.1: "The initial value of the timestamp
> SHOULD be random") and the RTCP wallclock is used for syncing between
> streams that otherwise have different starting timestamps.
Yes, yes of course :-)
> I am not familiar enough with that particular piece of code to make
> judgements about the patch, though.
Verbally, here's what the code does, after my patch:
Prior to the first RTCP packet, the timestamp returned is [RTP timestamp]
- [RTP timestamp of first packet], so regardless of at what value they
start, the ones we emit start at 0. The RTP timestamp of the first packet
is named base_timestamp in the code.
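As a rough illustration (a Python sketch, not the actual rtpdec code), the pre-RTCP calculation could look like the following. The function name and the clock_rate parameter are made up for the example. The 32-bit wraparound handling is an added assumption based on RTP timestamps being 32-bit values; the description above does not mention it.

```python
from fractions import Fraction

def pre_rtcp_timestamp(rtp_timestamp, base_timestamp, clock_rate):
    """Seconds elapsed since the first packet, used before any RTCP arrives.

    Hypothetical helper: base_timestamp is the RTP timestamp of the first
    packet received in the stream, clock_rate is RTP ticks per second.
    """
    # Assumption: treat the difference as a signed 32-bit value so that a
    # wrap of the 32-bit RTP timestamp does not produce a huge jump.
    delta = (rtp_timestamp - base_timestamp) & 0xFFFFFFFF
    if delta >= 1 << 31:
        delta -= 1 << 32
    return Fraction(delta, clock_rate)

# Whatever value the sender starts at, the first emitted timestamp is 0.
first = pre_rtcp_timestamp(1000, 1000, 1000)
second = pre_rtcp_timestamp(1100, 1000, 1000)
```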
Once the other patchset for modifying the header parsing is applied, we
could parse the RTP-Info header, too, and use the timestamp specified
there instead of the RTP timestamp of the first packet.
When we get the first RTCP packet, we calculate the offset from the RTP
timestamp in that RTCP packet to base_timestamp, and store this in
rtcp_ts_offset.
At this point, timestamps emitted are: [RTP timestamp] - [RTP timestamp of
last RTCP] + [diff between latest RTCP packet and first RTCP packet] +
[rtcp_ts_offset]. Proper rescaling between values expressed in different
units is done, of course.
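The formula above can be sketched in Python as below. This is only an illustration under simplifying assumptions: NTP times are plain seconds (real RTCP sender reports carry 64-bit fixed-point NTP timestamps), rtcp_ts_offset is already expressed in seconds, and the function name and clock_rate parameter are hypothetical.

```python
from fractions import Fraction

def post_rtcp_timestamp(rtp_timestamp, last_rtcp_timestamp,
                        last_rtcp_ntp_time, first_rtcp_ntp_time,
                        rtcp_ts_offset, clock_rate):
    """Seconds since the stream start, once an RTCP packet has been seen."""
    # [RTP timestamp] - [RTP timestamp of last RTCP], rescaled to seconds
    delta_timestamp = Fraction(rtp_timestamp - last_rtcp_timestamp, clock_rate)
    # [diff between latest RTCP packet and first RTCP packet], in seconds
    addend = Fraction(last_rtcp_ntp_time - first_rtcp_ntp_time)
    return delta_timestamp + addend + rtcp_ts_offset

# A packet 100 ticks after the first (and so far only) RTCP packet,
# where that RTCP packet was 1 second after the first RTP packet.
ts = post_rtcp_timestamp(2100, 2000, 501, 501, Fraction(1), 1000)
```

Using exact fractions rather than floats keeps the rescaling between units free of rounding noise in the sketch.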
Thus, all streams are synced together via the NTP timestamps once an RTCP
packet has been received in that stream; before that, the timestamps are
simple diffs against the first packet.
Actually, on top of all this, we add a variable named range_start_offset.
This is used for emitting sensible timestamps after seeking. If we seek to
e.g. 42.0, and the response to the PLAY request had a Range: 42.0- header,
we add this on top of all timestamps, so that the emitted timestamps start
at 42.0.
A full example might be useful:
We start playing with a seek to 42.0. We don't get any RTCP packets
initially. We have both an audio and a video stream, both with a 1000 Hz
RTP clock (timebase 1/1000) for simplicity.
We receive video packets with timestamps 1000, 1100, 1200. The first
packet gives base_timestamp 1000. The diff to the initial timestamp thus
is 0.0, 0.1, 0.2, and we add range_start_offset 42.0 so we return 42.0,
42.1, 42.2. Similarly, for the audio stream, we get packets with the
timestamps 5000, 5100, 5200. Thus, base_timestamp for this stream is 5000,
and we return the summed timestamps 42.0, 42.1, 42.2.
After one second, we receive an RTCP packet in the video stream, but none
in the audio stream. This RTCP packet has the NTP timestamp 501 seconds
and RTP timestamp 2000. The diff in RTP timestamp units to base_timestamp
is 1000, 1.0 seconds, stored in rtcp_ts_offset. We set first_rtcp_ntp_time
and last_rtcp_ntp_time to 501, last_rtcp_timestamp to 2000. Following RTP
packets with RTP timestamps 2100, 2200 and 2300 get the timestamps 43.1,
43.2 and 43.3 like this: range_start_offset (42.0) + rtcp_ts_offset (1.0)
+ addend (0, diff between last_rtcp_ntp_time and first_rtcp_ntp_time) +
delta_timestamp (0.1, 0.2, 0.3, the diff between last_rtcp_timestamp and
the RTP timestamps).
A while later, we get another RTCP packet, with the NTP timestamp 502 and
RTP timestamp 3000. Following packets with RTP timestamps 3100 etc get
their timestamps like this: range_start_offset (42.0) + rtcp_ts_offset
(1.0) + addend (1.0, last_rtcp_ntp_time - first_rtcp_ntp_time) +
delta_timestamp (0.1 etc, the diff between last_rtcp_timestamp and the
RTP timestamps), giving 44.1 etc.
When we got the first RTCP packet, the values for that stream, namely
first_rtcp_ntp_time and rtcp_ts_offset, were propagated to all other
streams. Since we haven't gotten any RTCP packets in the audio stream
(last_rtcp_ntp_time isn't set), the RTCP-less calculation is still used
for that stream.
A bit later, we get the first RTCP packet for the audio stream, with NTP
time 504 seconds, RTP timestamp 9000. An audio packet with RTP timestamp
9100 gets its final timestamp calculated like this: 46.1 =
range_start_offset (42.0) + rtcp_ts_offset (1.0, propagated from the
stream with the first RTCP packet) + addend (3.0, 504 - 501, where 501 was
propagated from the stream with the first RTCP packet) + delta (0.1, 9100
- 9000).
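For anyone wanting to check the arithmetic, the whole example can be replayed with a small Python model. This is a sketch of the behaviour described in this mail, not the rtpdec implementation: the Stream class and its on_rtp/on_rtcp methods are hypothetical, NTP times are plain seconds, and only the field names are taken from the description.

```python
from fractions import Fraction

CLOCK_RATE = 1000        # both streams tick 1000 times per second
RANGE_START_OFFSET = 42  # seconds, from the "Range: 42.0-" reply

class Stream:
    def __init__(self):
        self.base_timestamp = None       # RTP timestamp of the first packet
        self.first_rtcp_ntp_time = None  # may be propagated from another stream
        self.rtcp_ts_offset = None       # may be propagated from another stream
        self.last_rtcp_ntp_time = None   # set only by this stream's own RTCP
        self.last_rtcp_timestamp = None

    def on_rtcp(self, ntp_time, rtp_timestamp, all_streams):
        self.last_rtcp_ntp_time = ntp_time
        self.last_rtcp_timestamp = rtp_timestamp
        if self.first_rtcp_ntp_time is None:
            # First RTCP packet of the session: compute the offset to this
            # stream's base_timestamp and propagate it to every stream.
            offset = Fraction(rtp_timestamp - self.base_timestamp, CLOCK_RATE)
            for s in all_streams:
                s.first_rtcp_ntp_time = ntp_time
                s.rtcp_ts_offset = offset

    def on_rtp(self, rtp_timestamp):
        if self.base_timestamp is None:
            self.base_timestamp = rtp_timestamp
        if self.last_rtcp_ntp_time is None:
            # RTCP-less calculation: simple diff against the first packet.
            elapsed = Fraction(rtp_timestamp - self.base_timestamp, CLOCK_RATE)
        else:
            addend = self.last_rtcp_ntp_time - self.first_rtcp_ntp_time
            delta = Fraction(rtp_timestamp - self.last_rtcp_timestamp, CLOCK_RATE)
            elapsed = self.rtcp_ts_offset + addend + delta
        return RANGE_START_OFFSET + elapsed

video, audio = Stream(), Stream()
streams = [video, audio]
video_pre = [video.on_rtp(t) for t in (1000, 1100, 1200)]   # 42.0, 42.1, 42.2
audio_pre = [audio.on_rtp(t) for t in (5000, 5100, 5200)]   # 42.0, 42.1, 42.2
video.on_rtcp(501, 2000, streams)
video_mid = [video.on_rtp(t) for t in (2100, 2200, 2300)]   # 43.1, 43.2, 43.3
video.on_rtcp(502, 3000, streams)
video_late = video.on_rtp(3100)                             # 44.1
audio.on_rtcp(504, 9000, streams)
audio_late = audio.on_rtp(9100)
print(float(audio_late))  # 46.1
```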