[FFmpeg-user] IP camera recording via RTSP: audio/video desync (dropped frames?)

Vladimir Mishonov me at player701.ru
Thu Jul 14 09:00:14 EEST 2022

Here's some additional information that may or may not be useful in 
solving this:

1. I removed "+genpts+igndts+ignidx" from "fflags". As far as I could 
tell, these flags weren't necessary, and removing them made things 
neither better nor worse.

2. I tried to find out why the video lag is introduced only when the 
audio stream is present, so I ran two live recording sessions in 
parallel, one with audio and one without, and found something very 
interesting. Remember I said that the lag happens when the video does 
not have enough frames? It seems that's not quite true: frames do get 
dropped sometimes, but that is not the actual cause of the problem. 
Having analyzed the recorded files, I believe the actual issue is 
incorrect presentation timestamps (PTS), either as transmitted by the 
camera or as calculated by FFmpeg. The frames themselves arrive on 
time, but when FFmpeg records the video together with the audio, it 
erroneously puts some frames into the next segment, probably in an 
attempt to synchronize the two streams, which actually makes things 
worse!
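To repeat this kind of per-segment PTS analysis, the timestamps can be dumped with ffprobe and summarized. The sketch below is one possible way to do it, assuming ffprobe is on PATH; the helper names (`video_pts`, `parse_pts`, `summarize`) are hypothetical. It reads the PTS of every packet of the first video stream, sorts them (packets come in decode order), and reports the frame count, the PTS range and the largest gap between consecutive timestamps.

```python
from __future__ import annotations

import subprocess


def video_pts(path: str) -> list[float]:
    """Return the video packet PTS values (in seconds) of a recorded file.

    Assumes ffprobe is installed and on PATH.
    """
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "packet=pts_time", "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_pts(out)


def parse_pts(text: str) -> list[float]:
    """Parse one pts_time value per line, skipping blanks and 'N/A'."""
    values = []
    for line in text.splitlines():
        field = line.strip().rstrip(",")
        if field and field != "N/A":
            values.append(float(field))
    # Packets are listed in decode order, so sort into presentation order.
    return sorted(values)


def summarize(pts: list[float]) -> dict:
    """Frame count, PTS range and the largest gap between consecutive PTS."""
    if not pts:
        raise ValueError("no timestamps found")
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    return {
        "frames": len(pts),
        "first": pts[0],
        "last": pts[-1],
        "max_gap": max(gaps, default=0.0),
    }
```

For example, `summarize(video_pts("segment_with_audio.ts"))` (hypothetical file name) would give the frame count and PTS range of one segment, which can then be compared against the clock shown in the video.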

Here's a comparison of two segments recorded during the experiment, 
starting from the time when the lag began:

The segment recorded WITHOUT audio has 15000 frames and PTS ranging from 
0 to 612 seconds. This is obviously incorrect, because the clock visible 
in the video itself only advances by 600 seconds (from 00:10:00 to 
00:20:00). Because of this discrepancy, the reported video length is 
also wrong (10 minutes and 12 seconds). However, the next segment shows 
no lag (its clock starts at 00:20:00 as expected) and does not seem to 
have any PTS issues whatsoever.
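The numbers for the silent segment can be turned into a rough drift estimate. This is a sketch of the arithmetic only: the 600-second clock span and 612-second PTS span come from the segment described above, and the one-frame difference between "PTS span" and "duration" is ignored. The function name is hypothetical.

```python
def timing_report(frames: int, pts_span_s: float, clock_span_s: float) -> dict:
    """Compare what the timestamps claim with what the on-screen clock shows."""
    drift_s = pts_span_s - clock_span_s
    return {
        # Frames the camera actually delivered per real second.
        "actual_fps": frames / clock_span_s,
        # Frame rate implied by the recorded PTS span.
        "implied_fps": frames / pts_span_s,
        # How far the PTS ran ahead of real time over the whole segment.
        "drift_s": drift_s,
        # Average excess per frame, in milliseconds.
        "drift_per_frame_ms": drift_s / frames * 1000,
    }


report = timing_report(frames=15000, pts_span_s=612.0, clock_span_s=600.0)
for key, value in report.items():
    print(f"{key}: {value:.3f}")
```

With these inputs the camera delivers 25 frames per real second, while the timestamps imply roughly 24.5 fps; in other words, the PTS advance about 0.8 ms per frame too far on average, adding up to the 12 extra seconds.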

Now, the segment recorded WITH audio is another story entirely. It has 
only 14750 frames, and its PTS range from 0 to about 602 seconds. The 
latter matches its reported length of 10 minutes and 2 seconds, but the 
clock stops at 00:19:50: a full 10 seconds are missing! They aren't 
lost, though; they end up in the next segment, which reports 15000 
frames just like the silent one, but these frames are not the same: the 
clock runs from 00:19:50 to 00:29:50 (it should be 00:20:00 to 
00:30:00), and the video PTS starts at 2 instead of 0, while the audio 
begins playing right from the start. As a result, a 12-second lag 
between the audio and the video has been introduced (the 10 seconds of 
carried-over frames plus the 2-second PTS offset).
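One way to read the arithmetic behind the 12-second lag, as a sketch: the frame rate of 25 fps is inferred from 15000 frames per 600 seconds rather than stated for the camera, and the function name is hypothetical. The frames missing from the short segment shift the next segment's video content, and the non-zero starting video PTS delays it further relative to the audio, which starts at 0.

```python
def av_lag_seconds(expected_frames: int, recorded_frames: int,
                   fps: float, next_video_start_pts: float) -> float:
    """Estimate the audio/video lag in the segment that follows a short one."""
    # Frames that spilled over into the next segment, expressed in seconds.
    carried_over_s = (expected_frames - recorded_frames) / fps
    # The next segment's video also starts late relative to its audio.
    return carried_over_s + next_video_start_pts


print(av_lag_seconds(15000, 14750, fps=25.0, next_video_start_pts=2.0))
```

Here 250 missing frames at 25 fps account for 10 seconds, and the 2-second PTS offset brings the total to 12 seconds.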

Now, the question is how to fix this. As I said before, 
"-use_wallclock_as_timestamps 1" works, but causes constant stuttering 
in the video. However, it shouldn't actually be necessary to 
aggressively overwrite the PTS at all times. The fact that a 10-minute 
segment reports a length of 10 minutes and 12 seconds implies that some 
of the reported (or calculated) timestamps are ahead of real time, 
which is obviously wrong for a live stream. Perhaps a less aggressive 
correction could be implemented with some option or bitstream filter 
(like "setts") that detects PTS jumping ahead of the actual time?
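To make the idea of a "less aggressive correction" concrete, here is a sketch of one possible heuristic. It is NOT an existing FFmpeg option or filter: `correct_pts`, the per-packet wall-clock arrival times and the 0.5-second tolerance are all hypothetical. The camera's PTS are kept as-is unless they run ahead of the packet's arrival time by more than the tolerance; in that case the timestamp is pulled back to the arrival time, and the same correction is carried forward to later packets.

```python
def correct_pts(packets, tolerance_s=0.5, min_step_s=0.001):
    """Keep camera PTS unless they run ahead of wall-clock arrival time.

    packets: iterable of (pts_s, arrival_s) pairs, both measured in seconds
    from the start of the stream (hypothetical input format).
    Returns the list of corrected PTS values.
    """
    corrected = []
    offset = 0.0  # total correction subtracted from the camera's PTS so far
    prev = None
    for pts, arrival in packets:
        out = pts - offset
        ahead = out - arrival
        if ahead > tolerance_s:
            # A live frame cannot be timestamped later than it arrived:
            # pull it back to its arrival time and carry the correction
            # forward so later packets stay aligned with it.
            offset += ahead
            out = arrival
        if prev is not None and out <= prev:
            out = prev + min_step_s  # keep timestamps strictly increasing
        corrected.append(out)
        prev = out
    return corrected
```

Unlike replacing every timestamp with the wall clock, small jitter within the tolerance passes through untouched, so normal frame pacing is preserved; only jumps ahead of real time are corrected. A real implementation would also need to handle audio, and would not undo the correction if the camera's timestamps later fell behind.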

Thank you very much.

Kind regards,
