[Libav-user] Video and audio timing / syncing

Brad O'Hearne brado at bighillsoftware.com
Fri Mar 29 22:49:25 CET 2013


On Mar 29, 2013, at 2:12 PM, Kalileo <kalileo at universalx.net> wrote:

All of the below really helps my understanding. I think there are a few more things I need to know to fill in the gaps:
>> To the best of what I've been able to determine from mailing list responses, doc, and my testing, it would appear that these settings for audio don't have any material effect on settings for video and vice versa,
> 
> Correct, except that they are used for syncing.

Ok, is this a logical syncing (how synced the video and audio appear to the user when played, i.e. an independent audio player just follows the audio pts while an independent video player follows the video pts) or a literal syncing, i.e. the player does a direct comparison of the video pts and audio pts values to determine sequencing?

> Depends on your Player. In the case you describe the audio is the "master",

> Not correct. You can take the video timing as the master

What determines whether audio or video is the "master"? Is this something I need to specify in the output format context, codec, or stream configuration?
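
To make sure we're talking about the same thing, here's a tiny sketch of what I mean by "literal" syncing with audio as the master clock. The names here are invented for illustration (only av_q2d() is real API), and I have no idea whether real players do exactly this:

    /* Invented illustration of audio-master "literal" syncing:
     * convert the next video frame's pts to seconds using the video
     * stream's time_base and compare it against the audio clock
     * (how many seconds of audio have actually been played). */
    #include <stdio.h>
    #include <stdint.h>
    #include <libavutil/rational.h>   /* AVRational, av_q2d() */

    int main(void)
    {
        AVRational video_tb    = { 1, 1000 }; /* FLV streams use 1/1000  */
        int64_t    video_pts   = 1500;        /* next frame at 1.5 s     */
        double     audio_clock = 1.45;        /* 1.45 s of audio played  */

        double video_time = video_pts * av_q2d(video_tb);
        double delay      = video_time - audio_clock;

        if (delay > 0)
            printf("wait %.3f s before displaying the frame\n", delay);
        else
            printf("frame is %.3f s late; display or drop it\n", -delay);
        return 0;
    }

Is that roughly what players actually do, or is the comparison more indirect than that?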

>> 
>> - If I completely turn off the writing of all audio frames, there is absolutely no change in video rendering -- it still renders video frames at twice the speed.
> 
> What Player are you using, what player shows that behavior?

My output format is an FLV file. Once I run my app and output an FLV file to my desktop, I've tried playing it in both VLC and Wondershare Video Converter Ultimate -- same result.

> Not correct. The relationship is the timing, the length. Same PTS means this video and this audio should be played at the same time. 

Again, it might be semantics, but you may have hit on the core of my problem. The video codec context and the audio codec context each have their own configured time base. The documentation for time_base in AVCodecContext reads as follows:

     * This is the fundamental unit of time (in seconds) in terms
     * of which frame timestamps are represented. For fixed-fps content,
     * timebase should be 1/framerate and timestamp increments should be
     * identically 1.

That obviously makes sense for the video codec context. But it doesn't say anything about how to set this value for the audio codec context. I was initially tempted to set the audio codec context's time_base.den to the video frame rate (even though it is audio), but I ruled that out for two reasons:

1. You can encode audio when there is no video stream and therefore no frame rate, so it would seem that if pts were important for audio, it would need some logical time_base even when there was no frame rate.

2. Once again, this is a hole in the example given in muxing.c: it doesn't set the time_base on the audio codec context at all. That is one of the threads of info that led me to question whether pts was even relevant for audio, along with the fact that audio encoding and video encoding are completely separate operations and entities (which I read as meaning that their respective codec contexts were also separate entities and therefore did not share a time_base value).

The logical alternative was to assign the audio codec context a time_base.den of the audio sample rate (44100). It sounds extremely plausible that a relative mismatch between the two time bases would be the culprit when audio and video are both in play; as I stated earlier, I suspected such a mismatch, so I disabled audio entirely and was surprised to discover that doing so had no material impact on the video at all.
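
For the record, here is where my understanding has landed, as a sketch rather than verified code (the function and variable names are mine, not from muxing.c): each codec context gets its own time_base, pts is counted in those units, and the packet timestamps are rescaled into the stream's time_base before writing. Please correct me if this is wrong:

    #include <libavutil/mathematics.h>  /* av_rescale_q() */
    #include <libavformat/avformat.h>

    /* Separate time bases per codec context, per this thread. */
    void setup_time_bases(AVCodecContext *c_video, AVCodecContext *c_audio,
                          int fps, int sample_rate)
    {
        c_video->time_base = (AVRational){ 1, fps };         /* e.g. 1/30    */
        c_audio->time_base = (AVRational){ 1, sample_rate }; /* e.g. 1/44100 */
    }

    /* pts bookkeeping in codec time_base units:
     *   video: += 1 per frame         (one tick = one frame)
     *   audio: += frame->nb_samples   (one tick = one sample)
     * Before av_interleaved_write_frame(), rescale from the codec
     * time_base into the stream time_base (FLV streams use 1/1000): */
    void rescale_packet(AVPacket *pkt, AVCodecContext *c, AVStream *st)
    {
        pkt->pts = av_rescale_q(pkt->pts, c->time_base, st->time_base);
        pkt->dts = av_rescale_q(pkt->dts, c->time_base, st->time_base);
    }

If that rescale step is missing, or rescales from the wrong source time_base, the muxed timestamps land on the wrong scale, which seems like a plausible explanation for the doubled playback speed I'm seeing.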

I think I'm close... Thanks again for the discussion, it's really helping to shore up my understanding of how these things work.

Brad

