[Libav-user] Some encoding A/V synchronization questions

Sat Nov 16 11:11:55 CET 2013

As a side note, '-tune animation -crf 18' gives excellent video
quality for retro video games. If I add YUV444P it's hard to tell the
difference from the emulator itself. :)

/Ulf

On Sat, Nov 16, 2013 at 11:02 AM, Ulf Magnusson <ulfalizer at gmail.com> wrote:
> On Thu, Nov 14, 2013 at 5:54 AM, Andy Shaules <bowljoman at gmail.com> wrote:
>> On 11/13/2013 7:40 PM, Ulf Magnusson wrote:
>>>
>>> Hi,
>>>
>>> I'm adding a movie recording feature to an emulator using libav. I've
>>> got video (x264), audio (Android VisualOn AAC), and muxing (mp4)
>>> working, but since I make no special efforts to keep audio and video
>>> synchronized, they desynchronize after a few minutes. My questions are
>>> as follows:
>>>
>>> 1. I believe I will need to derive PTS values for the audio and video
>>> streams, diff them, and compensate by e.g. changing the audio
>>> resampling rate when they drift apart. For audio, the PTS values from
>>> the AVStream seem reliable, but not for video (they seem to lag
>>> behind, perhaps due to buffering (?)). Is it safe to simply use the
>>> PTS values I write into the AVFrame for video frames? Is there some
>>> other field I could use?
>>>
>>> 2. Why does ffmpeg.c assign DTS values to PTS values in some locations,
>>> e.g.
>>> ist->next_pts = ist->pts = av_rescale_q(pkt->dts, ist->st->time_base,
>>> AV_TIME_BASE_Q); ?
>>>
>>> 3. Is there some simpler solution specific to x264/AAC/mp4?
>>>
>>> Thanks,
>>> Ulf
>>> _______________________________________________
>>> Libav-user mailing list
>>> Libav-user at ffmpeg.org
>>> http://ffmpeg.org/mailman/listinfo/libav-user
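
For question 1, one way to diff the two streams is to rescale both PTS
values into a common time base first, much as the ffmpeg.c line quoted in
question 2 rescales into AV_TIME_BASE_Q (microseconds). The following is a
minimal sketch in plain C that does not link against libav. Rational,
rescale_ts() and drift_us() are hypothetical names used for illustration,
and rescale_ts() is only a simplified stand-in for av_rescale_q(): it
assumes positive denominators and does nothing to guard against overflow.

```c
#include <stdint.h>

/* Hypothetical stand-in for libav's AVRational (numerator/denominator). */
typedef struct { int num, den; } Rational;

/* Rescale ts from time base `from` to time base `to`, rounding to the
 * nearest integer (halves away from zero). Simplified: assumes positive
 * denominators and that ts * from.num * to.den fits in 64 bits. */
static int64_t rescale_ts(int64_t ts, Rational from, Rational to)
{
    int64_t num  = (int64_t)from.num * to.den;
    int64_t den  = (int64_t)from.den * to.num;
    int64_t prod = ts * num;
    int64_t half = den / 2;
    return prod >= 0 ? (prod + half) / den : -((-prod + half) / den);
}

/* Audio-minus-video offset in microseconds. A positive result means the
 * audio timeline is ahead of the video timeline. */
static int64_t drift_us(int64_t audio_pts, Rational audio_tb,
                        int64_t video_pts, Rational video_tb)
{
    const Rational usec = {1, 1000000};
    return rescale_ts(audio_pts, audio_tb, usec)
         - rescale_ts(video_pts, video_tb, usec);
}
```

With a 1/44100 audio time base and a 1/60 video time base, for example,
drift_us(44541, audio_tb, 60, video_tb) reports the audio as 10 ms ahead.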
>>
>>
>> Hello,
>> In general, there is no guarantee as to how the hardware will deliver
>> samples up front. The variables are:
>>
>> A: One stream will have samples available before the other.
>> B: One stream will have earlier timestamps than the other, and it is not
>> necessarily the one that arrived first in (A).
>>
>> I buffer both channels until I have overlap.
>>
>> The general rule is: 'write audio first'.
>>
>> The routine I run is this:
>>
>> After compressing the raw samples, I have two FIFO queues, one for audio
>> and one for video. This is what I typically run for static video frame rates.
>>
>> ..
>>
>> 1) Is there an audio packet? If yes, go to 2.
>>
>> 2) Is there a video packet? If yes, go to 3.
>>
>> 3) Is this the first time you have both channels? If yes, go to 4; if no,
>> go to 5.
>>
>> 4) Consider dropping samples. Do you have lots of audio without video, or
>> lots of video before audio?
>>
>> 5) Write all audio packets up to the earliest video frame time, and then
>> write that one video frame to the container.
>>
>> 6) Repeat: go to 1.
>>
>>
>> Short answer...
>>
>> In general, you need to buffer the packets and mux them in ascending
>> timestamp order across both media types, writing them to your container as
>> soon as they line up within the range of your camera interval. Drop early
>> media until they do line up. Don't mess with the timestamps.
>>
>> have fun!
>>
>
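
The core of the routine above (step 5: write all audio up to the earliest
video frame time, then that one video frame) can be sketched in plain C
roughly as follows. This is only an illustration under stated assumptions:
Packet and interleave() are hypothetical names, both queues are assumed to
be sorted and to carry timestamps already in one shared time base, and the
"write" is just a copy into an output array rather than a call into a muxer.
The initial dropping of early media (step 4) is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compressed packet: only what the ordering logic needs. */
typedef struct {
    int64_t pts;      /* timestamp in a time base shared by both queues */
    int     is_video; /* 1 = video packet, 0 = audio packet */
} Packet;

/* While both queues hold packets: copy all audio with pts <= the earliest
 * queued video frame, then copy that one video frame. Returns the number
 * of packets written to `out`; *a_used and *v_used report how many packets
 * were consumed from each queue. Leftovers stay queued for the next pass. */
static size_t interleave(const Packet *audio, size_t n_audio,
                         const Packet *video, size_t n_video,
                         Packet *out, size_t *a_used, size_t *v_used)
{
    size_t a = 0, v = 0, n = 0;
    while (a < n_audio && v < n_video) {
        /* Audio first: everything up to the earliest video frame time. */
        while (a < n_audio && audio[a].pts <= video[v].pts)
            out[n++] = audio[a++];
        /* Then exactly one video frame. */
        out[n++] = video[v++];
    }
    *a_used = a;
    *v_used = v;
    return n;
}
```

With audio at 0, 23, 46, 69 and video at 33, 66, the output order is
A0, A23, V33, A46, V66, and the audio packet at 69 stays queued until
another video frame arrives. `out` must have room for n_audio + n_video
packets.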
> After some more digging through ffmpeg.c I ended up modifying the
> resampling rate to keep audio and video in sync (similar to what the
> -async option does). The code is at
> https://github.com/ulfalizer/nesalizer/blob/master/movie.cpp in case
> anyone is interested. It's written to work out of the box on Ubuntu
> 13.10, and so uses a few deprecated APIs.
>
> /Ulf
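
For readers who want the general idea without reading movie.cpp, here is a
minimal sketch of drift compensation by nudging the resampling rate. It is
not the code from that file and not what -async does internally; it is a
plain proportional correction with a clamp, and adjusted_resample_rate(),
gain and max_adjust are hypothetical names chosen for illustration.

```c
/* Return the rate the resampler should produce so that audio drifts back
 * toward video.
 *   nominal_rate: the encoder's sample rate in Hz (e.g. 44100)
 *   drift_s:      audio time minus video time, in seconds; positive means
 *                 audio is ahead, so fewer samples should be produced
 *   gain:         fraction of rate change per second of drift (assumed
 *                 tuning knob)
 *   max_adjust:   largest allowed relative rate change, e.g. 0.005 for
 *                 +/-0.5%, to keep the pitch shift small */
static double adjusted_resample_rate(double nominal_rate, double drift_s,
                                     double gain, double max_adjust)
{
    double adjust = -gain * drift_s;
    if (adjust > max_adjust)
        adjust = max_adjust;
    if (adjust < -max_adjust)
        adjust = -max_adjust;
    return nominal_rate * (1.0 + adjust);
}
```

The drift would be measured periodically from the audio and video
timestamps, and the returned rate applied to the resampler for the next
chunk of audio. The clamp is a design choice: it trades slower recovery
from large offsets for a pitch change that stays small.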