[Libav-user] Video and audio timing / syncing

Brad O'Hearne brado at bighillsoftware.com
Sat Mar 30 08:05:27 CET 2013


On Mar 29, 2013, at 11:23 PM, Alex Cohn <alexcohn at netvision.net.il> wrote:

> On 30 Mar 2013 07:56, "Kalileo" <kalileo at universalx.net> wrote:
> > Another thing you can do is to take a stream which plays correctly, and analyze the dts values of audio and video used there. This might show you right away where you're different.
> 
> This is rarely practical: there are multiple ways to construct an FLV movie that will play correctly in VLC, and there is no easy way to find which subtle difference causes the digression.

Well, here's the rub -- thanks to QTKit, and the QTSampleBuffer it delivers for both video and audio, I don't have to calculate pts, dts, or duration -- those time values are already delivered with the data buffer, along with its associated time scale, so converting to time_base units is merely a simple math problem. However, using these units (and I've verified in the console log that the values are all sequential and ascending as expected), it still isn't right. It is closer, and the audio still seems perfect, but the audio still seems to play just a bit too fast, cutting something like 2 seconds off a 12 second video.
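
For reference, the "simple math problem" amounts to something like the sketch below. The Rational struct and rescale_ts() are names made up for this example (libavutil's av_rescale_q() does the same job with overflow handling), and the 1/600 time scale and 1/1000 time base used in the note afterwards are assumed values, not necessarily what QTKit or the muxer report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for libavutil's AVRational (num/den). */
typedef struct Rational { int num; int den; } Rational;

/*
 * Convert `ts`, counted in units of `src` seconds, into units of `dst`
 * seconds, rounding to the nearest tick (halves away from zero).
 * No overflow protection: fine for short captures, not for arbitrary input.
 */
static int64_t rescale_ts(int64_t ts, Rational src, Rational dst)
{
    assert(src.num > 0 && src.den > 0 && dst.num > 0 && dst.den > 0);

    int64_t num  = (int64_t)src.num * dst.den;  /* numerator of src/dst   */
    int64_t den  = (int64_t)src.den * dst.num;  /* denominator of src/dst */
    int64_t prod = ts * num;

    if (prod >= 0)
        return (prod + den / 2) / den;
    return -((-prod + den / 2) / den);
}
```

With a time scale of 600, rescale_ts(7200, (Rational){1, 600}, (Rational){1, 1000}) maps 12 seconds of capture to 12000 ticks of a millisecond time base.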

Questions: 

1. I'm still not completely clear on the needed time_base.den value for the audio codec context -- should that be the same as the time_base.den value for the video codec context (which is essentially the video frame rate) or something else? Like I said, the muxing.c example doesn't appear to set this at all, so the pts values have to conform to some scale.

2. One thing I find really interesting is that all of these time_base units, pts, dts, and duration are integral. If video / audio timing needs exactness, why aren't these things floats for the purposes of finer-grained precision? 

3. Assuming for a moment that the pts and dts settings are right, is there any other possible factor that can throw off timing even when the pts / dts values are taken directly from capture? I encountered one code sample this past week which was different from all others I had looked at -- after the av_write_frame call, it followed with a loop to flush "delayed frames", feeding the encoder a NULL data buffer (no pts set on the packet though). Is it possible that there are frames somehow not making it out of the encoder?
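
To make question 2 concrete, here is a small sketch of the two approaches as I understand them: a clock kept as a running floating-point sum of frame durations, which can pick up rounding error on every addition, versus an integer frame count in a 1/fps time base, which stays exact until it is converted. Both function names are invented for this example and the frame rate is an assumed parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Accumulate n frame durations of 1/fps seconds in single-precision float. */
static float float_clock_seconds(int n, int fps)
{
    assert(fps > 0);

    float t = 0.0f;
    for (int i = 0; i < n; i++)
        t += 1.0f / (float)fps;  /* each addition may round */
    return t;
}

/* Integer pts in a 1/fps time base, converted to whole milliseconds (truncating). */
static int64_t pts_to_ms(int64_t pts, int fps)
{
    assert(fps > 0);
    return pts * 1000 / fps;
}
```

At 30 fps, pts_to_ms(360, 30) gives exactly 12000 ms for a 12 second clip, while the float version is only as accurate as its accumulated rounding allows.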

Brad

