[Libav-user] Video and audio timing / syncing

Fri Mar 29 07:53:03 CET 2013

On Mar 29, 2013, at 03:03, Brad O'Hearne wrote:

> On Mar 27, 2013, at 11:48 PM, Kalileo <kalileo at universalx.net> wrote:
> 
>> When you encode audio and video, you'll give each packet its dts and pts values. The video encoding function and the audio encoding function do not know about each other; they do not communicate. You have to set these values yourself and pass them in when you write the already encoded packet. As far as I remember, the write function does not set these values, but only checks that what you pass in is plausible.
>> If you mix already encoded audio and video, you have to remux it, which is basically the same packet writing as after encoding, and in that process you set the correct dts and pts values.
>> 
>> Once you understand that you are responsible for setting these values, and that there is no magic communication between audio and video involved, it is quite simple.
>> 
>> If you base your dts and pts values on the time when you received the data, after it went through various buffers, you have to take the delay caused by these buffers into account.
> 
> Kalileo -- thanks for the reply. I can give a little more detail about the nature of the problem I'm experiencing. What I first thought was the video hanging halfway through wasn't that at all -- it was the video actually ending. The audio plays perfectly and sounds exactly as expected. But the video plays at a much faster rate and finishes in about half the time the audio takes, so it stops on the last frame. So the video is the problem -- audio is now perfect.
> 
> I have been setting the pts and dts values on each video AVPacket, and on this point, the muxing.c example file which another poster mentioned doesn't really clarify the issue for me. That example isn't receiving video frames arbitrarily from an external source; rather, it generates these frames and their pts values sequentially in an organized loop. The example is also essentially orchestrating the interleaving itself because it can -- that is, because it generates its own data, it can alternate calls to write_frame for audio and video.
> 
> It's weird, it's as if the video is playing at twice the speed of the audio.
> 

Hi Brad,

when you start writing the packets (muxing them), you give each audio and video packet a DTS (and PTS) value. You can start at zero. 

At the start you give the first audio and the first video packet the same value. For every new packet you have to increase the DTS value according to the length of the previous audio or video packet. Audio and video packets have different lengths, so each stream's DTS advances by a different step value.

For example, you can increase the DTS value by 4000 for every video packet and by 2000 for every audio packet (you must adjust these values depending on your codecs and time base).
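
As a rough illustration of where step values like these come from, here is a minimal C sketch. It assumes a 90 kHz time base (1/90000), a constant frame rate, and fixed-size audio packets (e.g. 1024 samples per AAC-LC frame); the names Rational, video_step and audio_step are hypothetical. In real libav code, av_rescale_q() does this kind of time-base conversion with proper rounding.

```c
#include <stdint.h>

/* Hypothetical stand-in for a rational time base
 * (same num/den idea as libav's AVRational). */
typedef struct {
    int64_t num;
    int64_t den;
} Rational;

/* Duration of one video frame, in ticks of time base tb.
 * frame_rate is frames per second as a fraction, e.g. {25, 1} or {30000, 1001}.
 * seconds per frame = fr.den / fr.num; ticks = seconds * tb.den / tb.num. */
static int64_t video_step(Rational frame_rate, Rational tb)
{
    return (frame_rate.den * tb.den) / (frame_rate.num * tb.num);
}

/* Duration of one audio packet holding samples_per_packet samples,
 * in ticks of time base tb. Integer division truncates if not exact. */
static int64_t audio_step(int64_t samples_per_packet, int64_t sample_rate,
                          Rational tb)
{
    return (samples_per_packet * tb.den) / (sample_rate * tb.num);
}
```

With a 1/90000 time base, 25 fps video gives a step of 3600 per frame, and 1024-sample audio packets at 48000 Hz give a step of 1920 per packet. The two steps differ because the packets cover different amounts of time.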

If you use the correct step values, then at the end of your video, both audio and video DTS values should be roughly the same again. If they are not, your step value is wrong.
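
The end-of-file check can be sketched in C roughly like this, assuming each stream may use its own time base so both last DTS values are converted to seconds before comparing. The function names and the tolerance parameter are hypothetical.

```c
#include <stdint.h>

/* Hypothetical rational time base (num/den, like libav's AVRational). */
typedef struct {
    int64_t num;
    int64_t den;
} Rational;

/* Convert a timestamp in ticks of tb to seconds. */
static double ts_to_seconds(int64_t ts, Rational tb)
{
    return (double)ts * (double)tb.num / (double)tb.den;
}

/* Returns 1 if the last video and last audio DTS values lie within
 * tolerance seconds of each other, 0 otherwise. */
static int streams_end_together(int64_t last_video_dts, Rational video_tb,
                                int64_t last_audio_dts, Rational audio_tb,
                                double tolerance)
{
    double v = ts_to_seconds(last_video_dts, video_tb);
    double a = ts_to_seconds(last_audio_dts, audio_tb);
    double diff = v > a ? v - a : a - v;
    return diff <= tolerance;
}
```

For about 10 seconds of material, 250 video frames with a step of 3600 (1/90000) end at 9.96 s and 468 audio packets with a step of 1024 (1/48000) end at about 9.963 s, so the check passes. If the video step were mistakenly 1800, the video would end at 4.98 s, which is the "video runs at twice the speed" symptom.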

That's all there is to it. Works perfectly for me.

> 
> It's weird, it's as if the video is playing at twice the speed of the audio.
> 

Looks like you do not take into consideration that an audio packet and a video packet do not have the same length! DTS is not a counter, but a time value. You do not increase it by 1, but by a value that reflects the length (duration) of the packet.

There are standard formulas for calculating typical DTS values. The important part is that the ratio between the lengths of the different packets is reflected in these values.
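
One common way to write such a formula, sketched here in C as an assumption rather than the only approach, is to compute each timestamp from the packet index instead of adding a rounded step every time. This avoids accumulated rounding drift when the step is not an integer. The names audio_pts and video_pts are hypothetical, and the sketch assumes constant frame rate and fixed-size audio packets.

```c
#include <stdint.h>

/* Hypothetical rational time base (num/den, like libav's AVRational). */
typedef struct {
    int64_t num;
    int64_t den;
} Rational;

/* PTS of the n-th audio packet (0-based) in time base tb, computed from
 * the total number of samples before it, so rounding error does not build up. */
static int64_t audio_pts(int64_t n, int64_t samples_per_packet,
                         int64_t sample_rate, Rational tb)
{
    return (n * samples_per_packet * tb.den) / (sample_rate * tb.num);
}

/* PTS of the n-th video frame (0-based) in time base tb,
 * for a constant frame rate given as a fraction (e.g. {30000, 1001}). */
static int64_t video_pts(int64_t n, Rational frame_rate, Rational tb)
{
    return (n * frame_rate.den * tb.den) / (frame_rate.num * tb.num);
}
```

For example, 1024-sample packets at 44100 Hz in a 1/90000 time base have an exact step of about 2089.8 ticks. Adding a truncated 2089 per packet would put packet 431 at 900359, while the index-based formula gives 900702. That is a gap of 343 ticks (about 3.8 ms) that keeps growing over time.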

HtH, Regards,
Kalileo