[Libav-user] Video and audio timing / syncing

Mon Apr 1 06:20:48 CEST 2013

On Apr 1, 2013, at 09:50, Brad O'Hearne wrote:

> On Mar 31, 2013, at 6:32 PM, Kalileo <kalileo at universalx.net> wrote:
> 
> Kalileo -- thanks for the reply. I'm not sure if you've read this thread and everything I've written, but based on the questions it appears you may have missed a post or two, so please forgive me if there's a rehash here.

I haven't read your _other_ threads.

> 
>> There's a lot of half-theory in your questions, and I get confused about your intentions. Do you want to solve a problem (of video/audio not being in sync) or do you want to redesign the dts/pts concept of video/audio syncing?
> 
>> Didn't you say that it's _not_ in sync now? So obviously you have to correct one side, not do the same modification on both sides.
>> 
>> I do not understand why you need to make this so complicated. It is so easy, same PTS = to be played at the same time.
> 
> I'll do my best to distill this all down as simply as possible. 
> 
> THE GOAL
> Capture video and audio from QTKit, pass the decompressed video and audio sample buffers to FFmpeg for encoding to FLV, and output the encoded frames (ultimately to a network stream, but in my present prototype app, a file). This is a live capture-encode-stream use case where the video is then being broadcast and played by multiple users in as near real-time as possible. Latency and delay need to be minimized and eliminated to the degree it is possible.
> 
> THE PROBLEM
> I have finally determined through many hours of testing that the problem here is NOT the pts and dts values I am assigning. The values I am assigning to pts and dts are 100% accurate -- every video and audio sample buffer received from QuickTime (QTSampleBuffer) delivers its exact presentation time, decode time, and duration. When I plug these values into the AVPacket pts and dts values, video and audio are perfectly synced provided that -- and here's the crux of the issue -- the time_base.den value matches EXACTLY the *actual* frame rate of captured frames being returned.

In other words, it works if you give the encoder correct values.

> If the actual frame rate is different from the frame rate indicated in time_base.den, then the video does not play properly.

In other words, it does not work if you give the encoder incorrect values.

> In my specific use case, I had configured a minimum frame rate of 24 fps on my QTDecompressedVideoCaptureOutput, and so expecting that frame rate, I configured my codec context time_base.den to be 24 as well. What happened, however, is that despite being configured to output 24 fps, it actually output fewer fps, and when that happened, even though the pts and dts values were the exact ones delivered on the sample buffers,

Did you ever check the resulting video pts and dts values? When you say "the pts and dts values were the exact ones delivered on the sample buffers" you are talking about the input, before encoding, and not the output, after encoding, right?

> the video played much faster than it should, while the audio was still perfect. So I manually went through my console log, counted how many frames per second were actually being received from capture (15), and hard-coded 15 as the time_base.den value. I reran my code with no other changes, and the video and audio are synced perfectly. The problem is the nature of the time_base, and how it is being used internally in encoding.

Some assumptions:

Your encoder (!) gives the 16th frame the PTS that the 16th frame at 25 fps would have received, and not the PTS that the 26th frame (at 25 fps) would have received. Hence, when played in a player, audio and video drift 2/5 of a second apart, every second.
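
To make the arithmetic behind that assumption concrete, here is a minimal sketch. It assumes the PTS is effectively a frame counter in a 1/25 time base while frames really arrive at 15 fps. The function names are made up for illustration and are not part of any libav API.

```python
from fractions import Fraction


def counter_pts_seconds(frame_index, assumed_fps):
    """Presentation time a frame ends up with when its PTS is just a
    frame counter in a 1/assumed_fps time base (assumption of this sketch)."""
    return Fraction(frame_index, assumed_fps)


def capture_time_seconds(frame_index, actual_fps):
    """Time at which the frame was really captured, for a steady actual_fps."""
    return Fraction(frame_index, actual_fps)


def drift_seconds(frame_index, assumed_fps, actual_fps):
    """How far the video timestamp lags behind the real capture time."""
    return (capture_time_seconds(frame_index, actual_fps)
            - counter_pts_seconds(frame_index, assumed_fps))


if __name__ == "__main__":
    # Frame index 15 is the 16th frame, i.e. the first frame of the 2nd second.
    for second in (1, 2, 3):
        index = 15 * second
        print(second, "s captured ->", drift_seconds(index, 25, 15), "s drift")
```

The audio keeps its correct timestamps, so the drift computed here is the amount by which video runs ahead of audio in the player.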

How can the encoder possibly know that it gets 15 fps instead of the promised 25? It needs the correct info to calculate the next DTS/PTS. 

You would need to measure at the capture source (before you feed the encoder) what frame rate it is actually sending out, and set that (as you apparently somehow did eventually).
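
A rough sketch of such a measurement, assuming you collect the capture presentation times (in seconds) of a run of frames before feeding the encoder. `measured_fps` is a hypothetical helper, not a QTKit or libav function.

```python
def measured_fps(capture_times):
    """Average frame rate over a run of capture timestamps given in seconds.

    N timestamps span N - 1 frame intervals, so the rate is
    (N - 1) / (last - first).
    """
    if len(capture_times) < 2:
        raise ValueError("need at least two timestamps")
    span = capture_times[-1] - capture_times[0]
    if span <= 0:
        raise ValueError("timestamps must be increasing")
    return (len(capture_times) - 1) / span


if __name__ == "__main__":
    # Simulated capture that was configured for 24 fps but delivers 15 fps.
    times = [i / 15 for i in range(31)]
    print(round(measured_fps(times), 3))
```

The result is the value to put into the time base instead of the configured rate.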

Alternatively you could also measure the average framerate dynamically and, if that changes, manipulate the DTS/PTS _after_ encoding but _before_ writing. I found players to usually use the container's DTS/PTS values, so that's working, and I have done it in some projects. You simply get an average over the video frames per time unit, and use that value to adjust video DTS/PTS accordingly _after_ encoding but _before_ writing. 

And voila, variable fps implemented, should work in every player which goes by PTS and even better if it uses audio as master for syncing.
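
The post-encoding adjustment described above could be sketched roughly like this. It is only an illustration under stated assumptions: packets are modelled as plain dicts rather than AVPacket, the encoder's pts/dts are taken to be frame counts, and the output time base is assumed to be 1/1000 (milliseconds, as FLV timestamps use). `restamp` and `restamp_packet` are hypothetical names.

```python
from fractions import Fraction


def restamp(ticks, fps, out_time_base=Fraction(1, 1000)):
    """Convert a frame-count timestamp into output time base units,
    using the measured average frame rate instead of the configured one."""
    seconds = Fraction(ticks) / Fraction(fps)
    return round(seconds / out_time_base)


def restamp_packet(packet, fps, out_time_base=Fraction(1, 1000)):
    """Return a copy of an encoded video packet with pts and dts rescaled.

    Both values are scaled by the same factor, so their order is preserved.
    """
    adjusted = dict(packet)
    adjusted["pts"] = restamp(packet["pts"], fps, out_time_base)
    adjusted["dts"] = restamp(packet["dts"], fps, out_time_base)
    return adjusted


if __name__ == "__main__":
    # Encoder counted frames; capture really ran at a measured 15 fps.
    encoded = [{"pts": n, "dts": n} for n in (0, 15, 30)]
    for pkt in encoded:
        print(restamp_packet(pkt, 15))
```

In a real program the frame rate passed in would be a running average that is updated as capture proceeds, and the adjusted values would be written to the packet just before it is handed to the muxer.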