[FFmpeg-devel] [PATCH] rtpdec: Emit timestamps for packets before the first RTCP packet, too

Martin Storsjö martin
Tue Dec 28 23:22:56 CET 2010


Hi Josh,

On Mon, 27 Dec 2010, Josh Allmann wrote:

> On 27 December 2010 00:48, Martin Storsjo <martin at martin.st> wrote:
> > Timestamps in each stream start from 0, for the first received
> > RTP packet. Once an RTCP packet is received, that one is used for
> > sync, emitting timestamps that fit seamlessly into the earlier ones.
> 
> Just a small comment: RTP timestamps do not necessarily start from
> zero (RFC 3550, section 5.1: "The initial value of the timestamp
> SHOULD be random") and the RTCP wallclock is used for syncing between
> streams that otherwise have different starting timestamps.

Yes, yes of course :-)

> I am not familiar enough with that particular piece of code to make
> judgements about the patch, though.

Verbally, here's what the code does, after my patch:

Prior to the first RTCP packet, the timestamp returned is [RTP timestamp] 
- [RTP timestamp of first packet], so regardless of the value at which 
they start, the ones we emit start at 0. The RTP timestamp of the first 
packet is named base_timestamp in the code.
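
As a minimal sketch of that subtraction (not the actual rtpdec code): the 
helper name pre_rtcp_seconds and the use of a double for seconds are 
assumptions made for illustration; the real code works on integers and 
rescales between units. The unsigned subtraction keeps the diff correct 
even if the 32-bit RTP timestamp wraps around.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: seconds elapsed since the first RTP packet of a
 * stream, before any RTCP packet has been seen. */
static double pre_rtcp_seconds(uint32_t rtp_ts, uint32_t base_timestamp,
                               int clock_rate)
{
    assert(clock_rate > 0);
    /* The unsigned subtraction wraps modulo 2^32; reinterpreting the
     * result as a signed 32-bit value gives a small signed diff on the
     * usual two's complement platforms. */
    int32_t delta = (int32_t)(rtp_ts - base_timestamp);
    return (double)delta / clock_rate;
}
```

With a clock rate of 1000, a first packet at 1000 and a later one at 1100 
give 0.0 and 0.1 seconds respectively.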

Once the other patchset for modifying the header parsing is applied, we 
could parse the RTP-Info header, too, and use the timestamp specified 
there instead of the RTP timestamp of the first packet.

When we get the first RTCP packet, we calculate the offset from the first 
RTCP packet to the base RTP timestamp, and store this in rtcp_ts_offset. 
At this point, timestamps emitted are: [RTP timestamp] - [RTP timestamp of 
last RTCP] + [diff between latest RTCP packet and first RTCP packet] + 
[rtcp_ts_offset]. Proper rescaling between values expressed in different 
units is done, of course.
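
The same formula as a rough sketch, with everything already converted to 
seconds as doubles. That conversion is a simplification assumed here for 
readability; struct rtcp_sync and post_rtcp_seconds are made-up names, 
only the field names follow the ones used in this mail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the RTCP-related state of one stream. */
struct rtcp_sync {
    double   first_rtcp_ntp_time;  /* NTP time of the first RTCP packet, seconds */
    double   last_rtcp_ntp_time;   /* NTP time of the latest RTCP packet, seconds */
    uint32_t last_rtcp_timestamp;  /* RTP timestamp in the latest RTCP packet */
    double   rtcp_ts_offset;       /* first RTCP RTP timestamp - base_timestamp, seconds */
};

/* Timestamp (in seconds, without range_start_offset) for an RTP packet
 * received after at least one RTCP packet. */
static double post_rtcp_seconds(const struct rtcp_sync *s, uint32_t rtp_ts,
                                int clock_rate)
{
    assert(clock_rate > 0);
    /* Diff between the latest and the first RTCP packet, in NTP time. */
    double addend = s->last_rtcp_ntp_time - s->first_rtcp_ntp_time;
    /* Diff between this packet and the latest RTCP packet, in RTP time. */
    double delta = (double)(int32_t)(rtp_ts - s->last_rtcp_timestamp) / clock_rate;
    return delta + addend + s->rtcp_ts_offset;
}
```

Right after the first RTCP packet, addend is 0, so the result is simply 
rtcp_ts_offset plus the distance to that RTCP packet, which is what makes 
the new timestamps continue seamlessly from the earlier ones.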

Thus, each stream is synced to the others via the NTP timestamps once an 
RTCP packet has been received in that stream; before that, the timestamps 
are simple diffs against the first packet.

Actually, on top of all this, we add a variable named range_start_offset. 
This is used for emitting sensible timestamps after seeking. If we seek to 
e.g. 42.0, and the response to the PLAY request had a Range: 42.0- header, 
we add this on top of all timestamps, so that the emitted timestamps start 
at 42.


A full example might be useful:

We start playing with a seek to 42.0. We don't get any RTCP packets 
initially. We have both an audio and a video stream, both having the 
timebase 1/1000 (1000 RTP ticks per second) for simplicity.

We receive video packets with timestamps 1000, 1100, 1200. The first 
packet gives base_timestamp 1000. The diff to the initial timestamp thus 
is 0.0, 0.1, 0.2, and we add range_start_offset 42.0 so we return 42.0, 
42.1, 42.2. Similarly, for the audio stream, we get packets with the 
timestamps 5000, 5100, 5200. Thus, base_timestamp for this stream is 5000, 
and we return the summed timestamps 42.0, 42.1, 42.2.

After one second, we receive an RTCP packet in the video stream, but none 
in the audio stream. This RTCP packet has the NTP timestamp 501 seconds 
and RTP timestamp 2000. The diff in RTP timestamp units to base_timestamp 
is 1000, 1.0 seconds, stored in rtcp_ts_offset. We set first_rtcp_ntp_time 
and last_rtcp_ntp_time to 501, last_rtcp_timestamp to 2000. Following RTP 
packets with RTP timestamps 2100, 2200 and 2300 get the timestamps 43.1, 
43.2 and 43.3 like this: range_start_offset (42.0) + rtcp_ts_offset (1.0)
+ addend (0, diff between last_rtcp_ntp_time and first_rtcp_ntp_time) + 
delta_timestamp (0.1, 0.2, 0.3, the diff between last_rtcp_timestamp and 
the RTP timestamps).

A while later, we get another RTCP packet, with the NTP timestamp 502 and 
RTP timestamp 3000. Following packets with RTP timestamps 3100 etc get 
their timestamps like this: range_start_offset (42.0) + rtcp_ts_offset 
(1.0) + addend (1.0, last_rtcp_ntp_time - first_rtcp_ntp_time) + 
delta_timestamp (0.1).

When we get the first RTCP packet, the values for that stream, namely 
first_rtcp_ntp_time and rtcp_ts_offset, are propagated to all other 
streams. Since we haven't gotten any RTCP packets in the audio stream yet 
(last_rtcp_ntp_time isn't set), the RTCP-less calculation is still used 
for it.

A bit later, we get the first RTCP packet for the audio stream, with NTP 
time 504 seconds, RTP timestamp 9000. An audio packet with RTP timestamp 
9100 gets its final timestamp calculated like this: 46.1 = 
range_start_offset (42.0) + rtcp_ts_offset (1.0, propagated from the 
stream with the first RTCP packet) + addend (3.0, 504 - 501, where 501 was 
propagated from the stream with the first RTCP packet) + delta (0.1, 9100 
- 9000).
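
The whole walkthrough can be replayed with a small model like the one 
below. It is only a sketch under simplifying assumptions: times are kept 
as doubles in seconds, and struct stream, on_rtcp, emit_seconds and the 
have_* flags are hypothetical names, not what rtpdec actually contains. 
It does show the two code paths and the propagation of 
first_rtcp_ntp_time and rtcp_ts_offset to the other streams.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified per-stream state; times are in seconds. */
struct stream {
    int      clock_rate;           /* RTP ticks per second */
    int      have_base;            /* first RTP packet seen in this stream */
    uint32_t base_timestamp;       /* RTP timestamp of the first packet */
    int      have_first;           /* some stream has received an RTCP packet */
    double   first_rtcp_ntp_time;  /* propagated from the first RTCP packet */
    double   rtcp_ts_offset;       /* propagated from the first RTCP packet */
    int      have_rtcp;            /* this stream has received an RTCP packet */
    double   last_rtcp_ntp_time;
    uint32_t last_rtcp_timestamp;
};

/* (a - b) in seconds, tolerant of 32-bit RTP timestamp wraparound. */
static double ticks_to_seconds(uint32_t a, uint32_t b, int clock_rate)
{
    assert(clock_rate > 0);
    return (double)(int32_t)(a - b) / clock_rate;
}

/* An RTCP packet with the given NTP time and RTP timestamp arrives on
 * streams[idx]. */
static void on_rtcp(struct stream *streams, int n, int idx,
                    double ntp_time, uint32_t rtp_ts)
{
    struct stream *s = &streams[idx];
    assert(s->have_base);
    if (!s->have_first) {
        /* Very first RTCP packet overall: propagate its values. */
        double offset = ticks_to_seconds(rtp_ts, s->base_timestamp,
                                         s->clock_rate);
        for (int i = 0; i < n; i++) {
            streams[i].first_rtcp_ntp_time = ntp_time;
            streams[i].rtcp_ts_offset      = offset;
            streams[i].have_first          = 1;
        }
    }
    s->last_rtcp_ntp_time  = ntp_time;
    s->last_rtcp_timestamp = rtp_ts;
    s->have_rtcp           = 1;
}

/* Timestamp (seconds) emitted for an RTP packet on stream s. */
static double emit_seconds(struct stream *s, uint32_t rtp_ts,
                           double range_start_offset)
{
    if (!s->have_base) {
        s->base_timestamp = rtp_ts;
        s->have_base = 1;
    }
    if (!s->have_rtcp)  /* RTCP-less calculation */
        return range_start_offset +
               ticks_to_seconds(rtp_ts, s->base_timestamp, s->clock_rate);
    return range_start_offset + s->rtcp_ts_offset +
           (s->last_rtcp_ntp_time - s->first_rtcp_ntp_time) +
           ticks_to_seconds(rtp_ts, s->last_rtcp_timestamp, s->clock_rate);
}
```

Feeding in the packets from the example (video as stream 0, audio as 
stream 1, range_start_offset 42.0) gives the same timestamps as worked out 
by hand above.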

// Martin


