[FFmpeg-devel] [PATCH] avformat/dv: fix timestamps of audio packets in case of dropped corrupt audio frames

Sun Nov 15 02:14:55 EET 2020

On Fri, 6 Nov 2020, Michael Niedermayer wrote:

> On Wed, Nov 04, 2020 at 10:44:56PM +0100, Marton Balint wrote:
>>
>>
>> On Wed, 4 Nov 2020, Michael Niedermayer wrote:
>>
>>> we have "millisecond" based formats, rounded timestamps
>>> we have "exact" cases, maybe the timebase being 1 packet/frame per tick
>>> we have "high precission" where the timebase is so precisse it doesnt matter
>>>
>>> This here though is a bit an oddball, the size if 1 PCM frame is 1 sample
>>> The timebase is not a millisecond based one, its not 1 frame either nor is
>>> it exact nor high precission.
>>> Its 1 video frame, and whatever amount of audio there is in the container
>>>
>>> which IIUC can differ from 1 video frame even rounded.
>>> maybe this just doesnt occur and each frame has a count of samples always
>>> rounded to the closes integer count for the video frame.
>>
>> The difference between the audio timestamp and the video timestamp for
>> packets from the same DV frame is at most 0.3929636797*frame_duration as the
>> specification says, as Dave quoted, so I don't see how the error can be
>> bigger than this.
>>
>> It looks to me you are mixing timestamps coming from a demuxer, and
>> timestamps you calculate by counting the number of demuxed/decoded audio
>> samples or video frames. Synchronization is done using the former.
>>
>
>>>
>>> But if for example some hardware was using internally a 16 sample buffer
>>> and only put multiplies of 16 samples in frames this would introduce a
>>> considerable amount of jitter in the timestamps in relation to the actual
>>> duration. And using async to fix this without introducing more problems
>>> might require some care.
>>
>> I still don't see why timestamp or duration jitter is a problem
>
>> as long as
>> the error is below frame_duration/2. You can safely use async with
>> min_hard_comp set to frame_duration/2.
>
> Thats exactly what i meant. an async like filter which behaves differently
> or async with a different value there can mess this up.
> IMHO such mess up is ok when the input is corrupted or invalid. OTOH
> here it is valid and correct data.
>
>
>>
>> In general, don't you find it problematic that the dv demuxer can return
>> different timestamps if you read packets sequentially and if you seek to the
>> end of a file? It looks like a huge bug
>
> yes, this is not great
> but even with your patch you still have this effect
> when seeking to some point in time a player has to output video and
> audio to the user at an exact time and that will differ even with async
> from linear playbacks presentation
>
>
>> which is not fixable if you insist
>> on sample counting...
>
> I think you misunderstood me, or maybe i didnt state my opinion well,
> iam not saying that i consider what dv in git does good. Rather that there
> is a problem beyond what these patches fix.
> Some concept of timestamp accuracy independant of the distance of representable
> values would be usefull.
> if you take teh 1/25 or whatever they are based on dv timestamps and convert that
> to teh mpeg 90khz based ones thats not making it that accurate.
> OTOH if you take 1/25 based audio where each packet is 1/25sec worth of samples
> that very well might be sample accurate or even beyond.
> knowing this accuracy is usefull for configuring a async like filter or also in
> knowing how to deal with inconsistencies, is that timestamp jtter ? or the sample
> rate jittering / some droped samples ?
> Its important to know as in one instance its the timestamps that need adjustment
> while in the other the samples need adjustment
> ATM its down to the user to figure out on a file by file base how to deal or
> ignore this. Instead it should be possible for an automated system to
> compensate such issues ...

OK, but the automated solution is far from trivial, e.g. it should start 
with a analysis of the file to check if the sample rate is accurate or 
not... And if it is not, is the difference constant througout the file? 
Then there are several methods to fix it and the user might have a 
preference. E.g consider audio clock "master" and duplicate/drop video 
frames. Or keep all video frames, but stretch audio (with or without pitch 
correction - and which filter you want for pitch correction? atempo? 
rubberband?). So making it automated is not trivial at all.

Anyhow, is it OK to apply this patch then?

Thanks,
Marton