[FFmpeg-user] Use source wallclock (NTP) in RTSP/RTCP streams to setpts for inter-media synchronization

David Gessel gessel at blackrosetech.com
Sun Jan 17 00:49:03 EET 2021

I'm working on a project to multiplex several RTSP streams, from cameras, into a simple "grid" output. The issue is that the cameras are intended to form an ultra-wide (panoramic) view, and while total latency is never as low as one would like, relative latency is a real problem.

I'm not expecting frame-accurate synchronization as one might get from genlocked analog or SDI sources, rather superficially synchronized - to within a few hundred msec, as would be sufficient for the visual perception of a live panorama.  As source start times and network connection times are not deterministic, manually tuning with incremental absolute offsets wouldn't work (though could help).  Currently the best I've been able to get is about 3 seconds differential latency, which is a fun way to do an automated one-person "wave" but not really the desired effect.  All devices are synced to NTP, which is more than sufficiently in sync for the application if I could figure out how to use it.

Below is the command I'm using to connect, composite, and display that yields the fastest frame rates and fewest dropped frames. Curiously, changing setpts from N/(8.33*TB) to the expression recommended for live streams, setpts='(RTCTIME - RTCSTART) / (TB * 1000000)', for each stream results in really choppy playback (but no improvement in inter-stream sync).

ffmpeg  -max_delay 500000 -reorder_queue_size 10000 \
          -fflags nobuffer -re -rtsp_transport udp -an -flags low_delay -strict experimental \
          -fflags nobuffer -re -thread_queue_size 1024 -i rtsp://\
          -fflags nobuffer -re -thread_queue_size 1024 -i rtsp://\
          -fflags nobuffer -re -thread_queue_size 1024 -i rtsp://\
          -fflags nobuffer -re -thread_queue_size 1024 -i rtsp://\
          -fflags nobuffer -re -thread_queue_size 1024 -i rtsp://\
          -fflags nobuffer -re -thread_queue_size 1024 -i rtsp://\
         -filter_complex "
         nullsrc=size=3554x480 [base];
         [0:v] setpts=N/(8.33*TB) [CAM1];
         [1:v] setpts=N/(8.33*TB) [CAM2];
         [2:v] setpts=N/(8.33*TB) [CAM3];
         [3:v] setpts=N/(8.33*TB) [CAM4];
         [4:v] setpts=N/(8.33*TB) [CAM5];
         [5:v] setpts=N/(8.33*TB) [CAM6];
         [base][CAM1] overlay=x=0 [tmp1];
         [tmp1][CAM2] overlay=x=583 [tmp2];
         [tmp2][CAM3] overlay=x=1165 [tmp3];
         [tmp3][CAM4] overlay=x=1748 [tmp4];
         [tmp4][CAM5] overlay=x=2331 [tmp5];
         [tmp5][CAM6] overlay=x=2914 " \
         -c:v libx264 -tune zerolatency -an -preset ultrafast -crf 22 -f matroska - |\
     ffplay -framedrop -sync ext -probesize 32 -
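For clarity on what the two setpts expressions above actually do: PTS values are in units of TB, and presentation time in seconds is PTS*TB, so the TB factors cancel out. N/(8.33*TB) pins frame N to N/8.33 seconds (a fixed ~8.33 fps cadence regardless of arrival time), while (RTCTIME - RTCSTART)/(TB*1000000) stamps each frame with the receiver's elapsed wallclock in microseconds converted to seconds, so bursty network arrival produces irregular timestamps, which would explain the choppy playback. A sketch of the arithmetic (the frame numbers and arrival times are hypothetical):

```python
# How the two setpts expressions assign presentation times (in seconds).
# presentation_time = PTS * TB, so the TB factors cancel out of both.

def fixed_rate_pts_seconds(n, fps=8.33):
    # setpts=N/(8.33*TB): frame N is shown at N/8.33 s, whenever it arrived
    return n / fps

def wallclock_pts_seconds(rtctime_us, rtcstart_us):
    # setpts='(RTCTIME - RTCSTART) / (TB * 1000000)': each frame is stamped
    # with its receiver arrival time, so bursty arrival -> irregular PTS
    return (rtctime_us - rtcstart_us) / 1_000_000

print(fixed_rate_pts_seconds(25))            # 25th frame shown at ~3.0 s
print(wallclock_pts_seconds(3_500_000, 0))   # a frame that arrived at 3.5 s
```

Neither expression consults the sender's clock, which is why swapping between them cannot improve inter-stream sync.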

What I was looking for was some way to set offsets to either:

* Least Latency - treat each inbound source as "live" and "best effort" rather than trying to sync to the start of each stream on the receiver (which is fairly arbitrary) or the start of the stream on the source (which is completely arbitrary).  I suspect this is doable, but I haven't found an obvious way to do it yet - any advice much appreciated.

* Absolute sync to NTP packets. This seems like it should be the "right" solution.  I've found some links that suggest I'm not the only one looking for a multi-source NTP sync solution, but I haven't found anyone with a clear solution.
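For either approach, the offset arithmetic itself is simple once each stream's source wallclock for its currently displayed frame is known: treat the stream showing the oldest frame as the reference and delay the others to match. A minimal sketch, assuming such per-stream wallclock estimates were available (all values here are hypothetical):

```python
# Given each stream's estimated source wallclock (Unix seconds) for the
# frame currently being displayed, compute how much extra delay each
# faster stream needs so all streams show the same source instant.
def sync_offsets(wallclocks):
    # The stream showing the OLDEST frame (smallest wallclock) is the
    # laggard; every other stream is delayed by its lead over it.
    reference = min(wallclocks.values())
    return {name: wc - reference for name, wc in wallclocks.items()}

offsets = sync_offsets({
    "CAM1": 1610634283.10,   # hypothetical values
    "CAM2": 1610634283.85,   # 750 ms ahead of CAM1
    "CAM3": 1610634283.40,
})
# CAM1 needs no delay; CAM2 needs ~0.75 s of extra delay
```

The hard part, of course, is obtaining those source-wallclock estimates, which is exactly what the RTCP Sender Reports below should make possible.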

I've tried a few setpts constructions, with no obvious improvement in absolute synchronization, including:

setpts='(RTCTIME - RTCSTART) / (TB * 1000000)'

The RTCP Sender Reports from these cameras do contain accurate wallclock time and are sent every 4-5 seconds:

Real-time Transport Control Protocol (Sender Report)
     [Stream setup by RTSP (frame 12)]
     10.. .... = Version: RFC 1889 Version (2)
     ..0. .... = Padding: False
     ...0 0000 = Reception report count: 0
     Packet type: Sender Report (200)
     Length: 6 (28 bytes)
     Sender SSRC: 0xcd323da7 (3442621863)
     Timestamp, MSW: 3819623083 (0xe3aad2ab)
     Timestamp, LSW: 3277326335 (0xc35807ff)
     [MSW and LSW as NTP timestamp: Jan 14, 2021 14:24:43.763062000 UTC]
     RTP timestamp: 2329505188
     Sender's packet count: 479641
     Sender's octet count: 674869163
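The MSW/LSW pair above is a standard 64-bit NTP timestamp: seconds since 1900-01-01 in the high word and a 32-bit binary fraction of a second in the low word. The conversion Wireshark shows can be reproduced directly (the NTP-to-Unix epoch offset is 2208988800 seconds):

```python
from datetime import datetime, timezone

NTP_UNIX_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01

def ntp_to_unix(msw, lsw):
    """Convert a 64-bit NTP timestamp (MSW = whole seconds since 1900,
    LSW = fraction in units of 1/2^32 s) to Unix seconds as a float."""
    return (msw - NTP_UNIX_OFFSET) + lsw / 2**32

t = ntp_to_unix(3819623083, 3277326335)  # values from the SR above
print(datetime.fromtimestamp(t, tz=timezone.utc))
# → 2021-01-14 14:24:43.763062+00:00, matching Wireshark's decode
```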

My understanding is that the NTP timestamp is included in RTCP specifically for inter-media global synchronization, but I don't know how to enable this.
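For reference, the mechanism the Sender Report enables (RFC 3550 §6.4.1) is a per-stream mapping from RTP timestamps to source wallclock: each SR pairs an NTP time with the RTP timestamp of the same instant, so any later packet's RTP timestamp can be projected onto the sender's clock given the media clock rate (90 kHz for video). A sketch with hypothetical numbers, built on the SR values above:

```python
def rtp_to_wallclock(rtp_ts, sr_ntp_unix, sr_rtp_ts, clock_rate=90000):
    """Map an RTP timestamp to sender wallclock (Unix seconds) using the
    (NTP, RTP) timestamp pair from the most recent RTCP Sender Report."""
    # Elapsed media-clock ticks since the SR, converted to seconds.
    # (A full implementation must handle 32-bit RTP timestamp wraparound.)
    return sr_ntp_unix + (rtp_ts - sr_rtp_ts) / clock_rate

# Hypothetical: a packet whose RTP timestamp is 45000 ticks (0.5 s at
# 90 kHz) after the SR's RTP timestamp of 2329505188
wc = rtp_to_wallclock(rtp_ts=2329550188,
                      sr_ntp_unix=1610634283.763062,  # SR NTP time as Unix
                      sr_rtp_ts=2329505188)
# wc lands 0.5 s after the SR's wallclock instant
```

Doing this per stream would give exactly the source-wallclock timestamps needed for cross-stream alignment; what I can't find is any ffmpeg option that exposes this mapping to setpts.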

This Stack Overflow question seems to be on the right path, but implies that some modifications needed to be made, at least 7 years ago:

https://stackoverflow.com/questions/20265546/reading-rtcp-packets-from-an-ip-camera-using-ffmpeg

Trac ticket #4586 inspired me to try -use_wallclock_as_timestamps, but that seems to use the receiver's wallclock, not the source's. The only other option in options_table.h that seems relevant is "start_time_realtime" - but that applies to encoding.

It looks like there was a patch proposed in 2016 as a step toward multi-source synchronization:


libavformat/rtpdec.c seems to parse the RTCP NTP timestamps, but I don't see any method of using them to setpts (or otherwise sync sources), though it looks like this was a topic of discussion back in 2008:


Which is to say: I've looked through all the hints I could find, and while there are many tantalizing clues, nothing points to an obvious solution. Is there perhaps some obvious parameter I'm just not understanding, or another solution escaping my search engine skills?
