[Libav-user] Video and audio timing / syncing

Brad O'Hearne brado at bighillsoftware.com
Fri Mar 29 21:28:05 CET 2013

On Mar 28, 2013, at 11:53 PM, Kalileo <kalileo at universalx.net> wrote:
> Hi Brad,
> when you start writing the packets (muxing them), you give each audio and video packet a DTS (and PTS) value. You can start at zero. 
> At the start you give the first audio and the first video packet the same value. For every new packet you have to increase the DTS value accordingly, depending on the length of the audio or video packet before it. Audio and video packets have different lengths, so you increase them using different step values.
> For example, you can increase the DTS value for every video packet by 4000, and for every audio packet by 2000 (you must correct these values depending on your codecs).
> If you use the correct step values, then at the end of your video, both audio and video DTS values should be roughly the same again. If they are not, your step value is wrong.
> That's all there is to it. Works perfectly for me.
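
To make the step values above concrete, here is a minimal sketch of how they could be derived rather than guessed. It uses plain integer arithmetic and no libav calls; the function names (ticks_for, video_step, audio_step) are hypothetical, and the 1/90000 time base, 25 fps video, and 1024-sample audio packets at 48 kHz mentioned in the comments are assumptions for illustration, not values from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a libav function): convert a duration of
 * `units` counted at `units_per_sec` into ticks of the stream time base
 * tb_num/tb_den, rounding to the nearest tick. */
int64_t ticks_for(int64_t units, int64_t units_per_sec,
                  int64_t tb_num, int64_t tb_den)
{
    assert(units_per_sec > 0 && tb_num > 0 && tb_den > 0);
    /* seconds = units / units_per_sec; ticks = seconds * tb_den / tb_num */
    int64_t num = units * tb_den;
    int64_t den = units_per_sec * tb_num;
    return (num + den / 2) / den;
}

/* DTS step for one video frame at fps_num/fps_den frames per second.
 * One frame lasts fps_den/fps_num seconds. */
int64_t video_step(int64_t fps_num, int64_t fps_den,
                   int64_t tb_num, int64_t tb_den)
{
    return ticks_for(fps_den, fps_num, tb_num, tb_den);
}

/* DTS step for one audio packet holding `samples_per_packet` samples
 * per channel at `sample_rate` Hz. */
int64_t audio_step(int64_t samples_per_packet, int64_t sample_rate,
                   int64_t tb_num, int64_t tb_den)
{
    return ticks_for(samples_per_packet, sample_rate, tb_num, tb_den);
}
```

Under those assumptions, video_step(25, 1, 1, 90000) gives 3600 and audio_step(1024, 48000, 1, 90000) gives 1920. After ten seconds, 250 video frames reach 900000 ticks and 469 audio packets reach 900480 ticks, which is the "roughly the same at the end" check described above.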

Kalileo -- hey, thanks for taking the time to respond; it is good to hear from you again. I think you are probably right on target, but I have a few wrinkles to add which have caused me to scratch my head a bit. Check these tidbits out:

- Another poster mentioned earlier in this thread (if I understood his point accurately) that the timing of the audio and video streams is handled completely independently. While we view these streams as a single rendered product, internally they are completely separate entities. There's an issue of semantics here, and I'm not sure whether that agrees with or contradicts what you are saying above about the relationship between audio and video pts / dts. As best I've been able to determine from mailing list responses, the docs, and my testing, the timing values set for audio have no material effect on those for video and vice versa, although the output would obviously show sync problems if the timings weren't right. This seems supported by the next two points.

- Here's an interesting note: it doesn't appear that pts and dts are even relevant for audio. I don't know whether that is the case across the board or only in some specific circumstances, but I don't have to set either value, and the audio is perfect whether I am also writing video frames or have completely turned off writing of all video frames. I've printed the audio pts value when not setting it and it is complete junk, yet the audio is perfect.

- If I completely turn off the writing of all audio frames, there is absolutely no change in video rendering -- it still renders video frames at twice the speed. This would seem to support the idea that a) pts might only be significant for video packets and not for audio, and b) there's no direct relationship between video and audio packet pts.

So my next questions become the following: 

1. Is setting the audio pts and dts even relevant? I've seen no functional indication that it is. 

2. Do the playback codecs do anything directly (other than just rendering at the proper time) to relate audio timing to video timing? There's no comparison or sequencing being done between the values, is there?

3. The setting of pts and dts is relative to the time_base configured on the codec context. According to the documentation, time_base.num should be 1 and time_base.den should be equal to the expected frames per second. I have both of these set accordingly. However, I got to thinking: what if you expect 30 fps but at runtime receive only 15 fps? (I'm using round multiples for discussion here; I'm actually setting time_base.den to 24.) Will this internally have any material impact on rendering? I think this is where some of the FFmpeg code examples may be bypassing an issue common to many actual use-cases. They can virtually guarantee the frame rate and proper pts values by simply generating X frames and assigning them proper pts. But what happens when frames are received from an external source and aren't delivered at the expected frame rate? Is there some compensation that has to be done in code, or is the codec smart enough to render frames at the timings you stamp on them, regardless of whether the frame rate matches your time_base.den setting?
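
To put numbers on question 3, here is a minimal sketch of the arithmetic, assuming pts is counted in units of a 1/30 time base. If pts is simply incremented by one per frame while frames actually arrive at 15 fps, one second of capture covers only 15 ticks, which plays back in half a second. One possible compensation is to stamp each frame from its elapsed capture time instead of a frame counter. The helper names below (pts_from_elapsed_us, pts_to_ms) are hypothetical and use no libav calls; libav's own av_rescale_q() performs a similar conversion between two time bases.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map the elapsed capture time of a frame, in
 * microseconds since the first frame, to a pts in time base
 * tb_num/tb_den, rounding to the nearest tick. */
int64_t pts_from_elapsed_us(int64_t elapsed_us, int64_t tb_num, int64_t tb_den)
{
    assert(elapsed_us >= 0 && tb_num > 0 && tb_den > 0);
    int64_t den = INT64_C(1000000) * tb_num;
    return (elapsed_us * tb_den + den / 2) / den;
}

/* Hypothetical helper: presentation time in milliseconds that a pts
 * represents in time base tb_num/tb_den (truncating). */
int64_t pts_to_ms(int64_t pts, int64_t tb_num, int64_t tb_den)
{
    assert(tb_num > 0 && tb_den > 0);
    return pts * 1000 * tb_num / tb_den;
}
```

With a 1/30 time base, the 15th frame of a 15 fps source arrives one second after the first. A frame counter gives it pts 15, and pts_to_ms(15, 1, 30) is 500, so it is shown at half a second. pts_from_elapsed_us(1000000, 1, 30) gives 30, which maps back to 1000 ms.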


