[Libav-user] strange H264 audio sync behavior

Marco Sieber l-spy at web.de
Fri Jun 27 11:52:13 CEST 2014


> Marco — I empathize with your difficulties, I had similar. All I can pass on is my experience. My use case was real-time webcam capture, encoding, and streaming frames to a server — I had sync and playback problems too. I inquired on this mailing list, and was told setting pts / dts would correct the problem. Weeks of testing didn’t prove that to be true in my case. The only way to make syncing and playback work was that you absolutely had to provide whatever # of frames was set in the codec’s time_base to the encoder *whether you received them or not*. In other words, if the time base indicated 30fps, you couldn’t provide 15fps with proper pts and dts (double duration). Playback (on any number of video players) would try to process half the frames at 30fps, and thus the video played at double the speed (half the proper time).
>
> I had this exact scenario — I was receiving exact pts / dts from the capture source, but feeding that to the encoder didn’t work. The only way I could get this to work was to force-feed 30 fps to the encoder. In other words, if receiving 15 fps from the source, I had to feed each frame to the encoder twice with modified pts / dts so that the encoder got its 30fps. Again, I was working real-time, so if you are not, then perhaps your situation is different.
>
> That’s the only light I can shed on my experience with FFmpeg and pts / dts. I read all the recommended docs on it, read through source code, and though I tried every way I knew how, fixing only those two things doesn’t solve all sync and playback issues. I know (well) that others will dispute this. But at the end of the day, I produced code which proved otherwise, and posted it to this list, and that was never demonstrated to be wrong. And for what it’s worth, I had several people contact me off-list after I posted this to confirm they were seeing the same behavior in their code. So take it for what it is: if there is another explanation as to why this is, I am unaware, but if one does surface, suffice to say I’ll be as interested as you.

If this code is sufficiently minimal and self-contained, and can
reproduce this problem, can you post it again?

>
> I don’t know if that helps, but that’s the best I can do to help. At the very least, I hope it leads to a good direction to getting things solved.
>
> Good Luck.
>
> Brad
>

Wow, thanks a lot for such a long mail. It motivated me :D It's just so frustrating that you can only use trial and error to pinpoint the problem, which you may then have to fix with trial and error again... and on top of that I'm a video and audio newbie, so I have to research a lot.


I'm not sure I can provide the code, because it is a class in my environment, which would need some rewriting or the Qt C++ framework.
This is how my pts/dts for audio gets calculated; the audio stream runs with a time base of 1/48000.
...
audioFrame->pts = samples_count;   // pts in samples (time base 1/48000)
samples_count += dst_nb_samples;   // 1024, 944, ... (see the ffprobe output below)
ret = avcodec_encode_audio2(audioStream->codec, &encoderPacket, audioFrame, &got_packet);
...
// rescale the packet timestamps from the codec time base to the stream time base
encoderPacket.pts = av_rescale_q_rnd(encoderPacket.pts, audioCodecCtx->time_base, audioStream->time_base, (AVRounding)(AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX));
encoderPacket.dts = av_rescale_q_rnd(encoderPacket.dts, audioCodecCtx->time_base, audioStream->time_base, (AVRounding)(AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX));
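
To make the counter logic easier to reason about, here is a minimal sketch in plain C with no libav dependency. The names AudioClock, audio_clock_stamp and timestamp_shortfall are hypothetical and not part of FFmpeg or of my class. It only models the bookkeeping: each frame gets the current counter as its pts, and the counter then advances by some number of samples. In the ffprobe output further down, a frame with nb_samples=1024 is followed by a pts step of only 944, so the sketch also reports that mismatch. Whether this mismatch is what causes the drift is an assumption, not something established here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the samples_count logic quoted above.
 * All values are in samples, i.e. ticks of a 1/48000 time base. */
typedef struct {
    int64_t samples_count;  /* pts that the next frame will receive */
} AudioClock;

/* Stamp one frame with the current counter, then advance the counter
 * by `advance` samples. Returns the pts given to this frame. */
int64_t audio_clock_stamp(AudioClock *clk, int advance)
{
    assert(advance >= 0);
    int64_t pts = clk->samples_count;
    clk->samples_count += advance;
    return pts;
}

/* Samples carried by a frame that its timestamp step does not cover.
 * Zero when the counter advances by exactly the frame's sample count. */
int timestamp_shortfall(int frame_nb_samples, int advance)
{
    return frame_nb_samples - advance;
}
```

With the values from the ffprobe output (pts 43008, step 944, nb_samples=1024), the next pts comes out as 43952 and the shortfall for that frame is 80 samples.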


I'm now trying out ffprobe, so that I can print the frames and their pts/dts afterwards.

The audio frames have different durations but all contain the same number of samples, while the video frames always have the same duration.
I understand that with a constant frame rate (25 fps), the video timestamps should increase by 0.04 s per frame.

But when I see these values, I wonder how much of a difference between video and audio is acceptable. (And why the QT player only goes out of sync after 4 minutes and not right from the start...)

From the look of it, there are roughly 2 audio frames for each video frame.

audio : 0:00:00.021333 * 2 = 0:00:00.042667
video : 0:00:00.040000
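
The arithmetic behind these two numbers can be sketched in plain C. The helper names are hypothetical. The audio time base 1/48000 is the one stated above; the video time base 1/12800 is an assumption derived from the ffprobe output below, where pkt_duration=512 corresponds to 0.04 s.

```c
#include <assert.h>
#include <stdint.h>

/* Convert `ticks` in a time base of tb_num/tb_den to microseconds.
 * The result is truncated towards zero, not rounded. */
int64_t ticks_to_us(int64_t ticks, int64_t tb_num, int64_t tb_den)
{
    assert(tb_den > 0);
    return ticks * tb_num * 1000000 / tb_den;
}

/* Number of audio samples that span one video frame at a constant
 * frame rate (integer division; exact for 48000 Hz and 25 fps). */
int64_t samples_per_video_frame(int64_t sample_rate, int64_t fps)
{
    assert(fps > 0);
    return sample_rate / fps;
}
```

One 1024-sample frame at 48000 Hz lasts 21333 us and two of them last 42666 us (both truncated), while one video frame of 512 ticks at 1/12800 lasts exactly 40000 us. A video frame spans 1920 samples, so there are 1920 / 1024 = 1.875 audio frames per video frame rather than exactly 2.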

I vaguely remember that audio frames aren't always the same length, and that they float more, unlike an image frame in video, which is static.
But the audio still has an overhead. The difference between the last frames is not too high:

audio:
pkt_pts_time=0:20:23.816667
pkt_dts_time=0:20:23.816667

video:
pkt_pts_time=0:20:23.760000
pkt_dts_time=0:20:23.760000
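
As a rough check of that difference, here is a small sketch in plain C that compares two timestamps given in different time bases. The function name stream_offset_us is hypothetical and not an FFmpeg call. The raw pts values used in the note after it are the ones ffprobe prints for the last audio and video frames, and the video time base 1/12800 is again an assumption derived from pkt_duration=512 being 0.04 s.

```c
#include <assert.h>
#include <stdint.h>

/* Offset (audio minus video) in microseconds between two timestamps:
 * a_pts in units of 1/a_den seconds, v_pts in units of 1/v_den seconds.
 * The subtraction is done over the common denominator a_den * v_den,
 * so the only rounding is the final truncating division. */
int64_t stream_offset_us(int64_t a_pts, int64_t a_den,
                         int64_t v_pts, int64_t v_den)
{
    assert(a_den > 0 && v_den > 0);
    int64_t diff = a_pts * v_den - v_pts * a_den;
    return diff * 1000000 / (a_den * v_den);
}
```

For the last frames, stream_offset_us(58743200, 48000, 15664128, 12800) gives 56666 us, so the last audio timestamp is about 57 ms later than the last video timestamp.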


ffprobe output:

....
....
[FRAME]
media_type=video
key_frame=0
pkt_pts=10240
pkt_pts_time=0:00:00.800000
pkt_dts=10240
pkt_dts_time=0:00:00.800000
pkt_duration=512
pkt_duration_time=0:00:00.040000
pkt_pos=100085
pkt_size=3265
width=1024
height=576
pix_fmt=yuv420p
sample_aspect_ratio=N/A
pict_type=P
coded_picture_number=20
display_picture_number=0
interlaced_frame=0
top_field_first=0
repeat_pict=0
reference=0
[/FRAME]
[FRAME]
media_type=audio
key_frame=1
pkt_pts=43008
pkt_pts_time=0:00:00.896000
pkt_dts=43008
pkt_dts_time=0:00:00.896000
pkt_duration=944
pkt_duration_time=0:00:00.019667
pkt_pos=111691
pkt_size=251
sample_fmt=fltp
nb_samples=1024
channels=2
channel_layout=stereo
[/FRAME]
[FRAME]
media_type=audio
key_frame=1
pkt_pts=43952
pkt_pts_time=0:00:00.915667
pkt_dts=43952
pkt_dts_time=0:00:00.915667
pkt_duration=1024
pkt_duration_time=0:00:00.021333
pkt_pos=111942
pkt_size=252
sample_fmt=fltp
nb_samples=1024
channels=2
channel_layout=stereo
[/FRAME]
[FRAME]
media_type=video
key_frame=0
pkt_pts=10752
pkt_pts_time=0:00:00.840000
pkt_dts=10752
pkt_dts_time=0:00:00.840000
pkt_duration=512
pkt_duration_time=0:00:00.040000
pkt_pos=103845
pkt_size=3528
width=1024
height=576
pix_fmt=yuv420p
sample_aspect_ratio=N/A
pict_type=P
coded_picture_number=21
display_picture_number=0
interlaced_frame=0
top_field_first=0
repeat_pict=0
reference=0
[/FRAME]
....
....
....
[FRAME]
media_type=audio
key_frame=1
pkt_pts=58743200
pkt_pts_time=0:20:23.816667
pkt_dts=58743200
pkt_dts_time=0:20:23.816667
pkt_duration=2016
pkt_duration_time=0:00:00.042000
pkt_pos=169394958
pkt_size=266
sample_fmt=fltp
nb_samples=1024
channels=2
channel_layout=stereo
[/FRAME]
[FRAME]
media_type=video
key_frame=0
pkt_pts=15664128
pkt_pts_time=0:20:23.760000
pkt_dts=15664128
pkt_dts_time=0:20:23.760000
pkt_duration=512
pkt_duration_time=0:00:00.040000
pkt_pos=169392212
pkt_size=2188
width=1024
height=576
pix_fmt=yuv420p
sample_aspect_ratio=N/A
pict_type=P
coded_picture_number=30594
display_picture_number=0
interlaced_frame=0
top_field_first=0
repeat_pict=0
reference=0
[/FRAME]





