[FFmpeg-devel] [PATCH] mov.c: read fragment start dts from fragmented mp4

Mika Raento mikie at iki.fi
Fri Oct 10 22:08:29 CEST 2014


Firstly, thank you for the detailed explanation.

Secondly, how should we proceed?

I am not confident I'm able to implement that correctly, especially
with no test coverage.

My current implementation improves discontinuous fragmented mp4s
significantly (from unusable to close-to-perfect) while slightly
worsening the timestamps for non-discontinuous fragmented mp4s. I
definitely need it for our streams, and I think it would help other
people in the same situation. I am quite willing to spend time on
this, but I fear that I just don't have enough known inputs and
outputs to verify my implementation.

Normally fragments are supposed to start on key frames, which should
have pts close to dts, but there are no guarantees.

Some alternatives:

1. I can leave my implementation behind a flag. That's not very
friendly to others, but breaks no existing usage.

2. We can merge my code as-is, and hope somebody more knowledgeable
can fix it up later.

3. I can try to implement the algorithm described.

4. Somebody helps me, either with the implementation or by providing test cases.

Opinions?

    Mika


On 10 October 2014 20:11, Yusuke Nakamura <muken.the.vfrmaniac at gmail.com> wrote:
> 2014-10-10 13:38 GMT+09:00 Mika Raento <mikie at iki.fi>:
>
>> On 9 October 2014 23:37, Yusuke Nakamura <muken.the.vfrmaniac at gmail.com>
>> wrote:
>> > 2014-10-10 4:49 GMT+09:00 Michael Niedermayer <michaelni at gmx.at>:
>> >
>> >> On Thu, Oct 09, 2014 at 09:44:43PM +0200, Michael Niedermayer wrote:
>> >> > On Thu, Oct 09, 2014 at 06:57:59PM +0300, Mika Raento wrote:
>> >> > > If present, an MFRA box and its TFRAs are read for fragment start
>> >> > > times.
>> >> > >
>> >> > > Without this change, timestamps for discontinuous fragmented mp4 are
>> >> > > wrong, and cause audio/video desync and are not usable for generating
>> >> > > HLS.
>> >> > > ---
>> >> > >  libavformat/isom.h |  15 ++++++
>> >> > >  libavformat/mov.c  | 140 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> > >  2 files changed, 155 insertions(+)
>> >> >
>> >> > this seems to break some files
>> >> >
>> >> > for example a file generated with the following 2 commands:
>> >> > ffmpeg -i matrixbench_mpeg2.mpg -t 10 in.mp4
>> >> > l-smash/cli/remuxer -i in.mp4 --fragment 1 -o test.mp4
>> >> >
>> >> > I've not investigated why this doesn't work
>> >>
>> >> Maybe the above was unclear, so to clarify before someone is confused:
>> >> test.mp4 from above plays with ffplay before the patch but not really
>> >> afterwards. The 2 commands are just to create such a file.
>> >>
>> >> [...]
>> >>
>> >> --
>> >> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>> >>
>> >> Good people do not need laws to tell them to act responsibly, while bad
>> >> people will find a way around the laws. -- Plato
>> >>
>> >>
>> > The 'time' field in the tfra box is defined on the presentation timeline,
>> > not on the composition or decode timeline.
>> > Therefore, generally, the value of 'time' can't be used as a DTS directly
>> > as long as the file follows 14496-12.
>> > Some derivatives of the ISO Base Media file format may define it
>> > differently, but the ISO Base Media file format spec itself defines 'time'
>> > as the presentation time of the sync sample.
>> > Presentation times are composition times after the application of any edit
>> > list for the track.
>> >
>> > I also have some samples which use 'time' as the DTS of the sync sample.
>> > Historically, the term 'presentation time' was not clearly defined before
>> > 14496-12:2012, which may have brought about this inconsistency.
>>
>> Hm. So my changes aren't correct if there is an edit list? Because
>> AFAICT without edit lists mov.c sets pkt->pts = pkt->dts.
>>
>
> Wrong. PTS == DTS has nothing to do with the edit list. Generally, CTS != DTS
> occurs only when frame reordering exists.
> Even if there is no edit list for a track, there is an implicit edit for that
> track, and in this case PTS == (CTS + alpha)*mvhd.timescale/mdhd.timescale,
> where the constant alpha depends on the implementation.
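>
> As a quick numeric illustration (hypothetical values, assuming alpha = 0):
> with mvhd.timescale=600 and mdhd.timescale=24000, a sample with CTS=24000
> gets PTS = (24000 + 0)*600/24000 = 600, i.e. one second in both timescales.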
>
>
>>
>> Would you mind explaining how edit lists and fragment times are
>> supposed to work together?
>>
>>
> The tfra box is designed so that the player seeks and finds the sync sample
> on the presentation timeline, i.e. by PTS in units of mdhd.timescale.
> PTS comes from CTS via the edit list, and CTS comes from DTS. So, basically,
> you can't get DTS directly from the 'time' field in the tfra box.
>
> Let's say mvhd.timescale=600, mdhd.timescale=24000 and the edit list
> contains two edits (edit[0] and edit[1]),
>   edit[0] = {segment_duration=600, media_time=-1, media_rate=1}; // empty edit
>   edit[1] = {segment_duration=1200, media_time=2002, media_rate=1};
> and the track fragment run, in the track fragment you reach from an entry in
> the tfra box whose 'time' is equal to 48000, is as follows.
>   trun.sample[0].sample_is_non_sync_sample = 1
>   trun.sample[0].sample_duration=1001
>   trun.sample[0].sample_composition_time_offset=1001
>   trun.sample[1].sample_is_non_sync_sample = 0
>   trun.sample[1].sample_duration=1001
>   trun.sample[1].sample_composition_time_offset=1001
> Then, time/mdhd.timescale*mvhd.timescale = 1200, that is, the PTS of the sync
> sample is equal to 1200 in mvhd.timescale.
> And, since the first edit is an empty edit, the presentation of actual media
> starts at 600 in mvhd.timescale, so the sync sample lies 1200 - 600 = 600 in
> mvhd.timescale into the second edit, i.e. 600*24000/600 = 24000 in
> mdhd.timescale.
> The CTS of the second sample in the trun (trun.sample[1]) is equal to
> 1001 + X, where X is the sum of the durations of all preceding samples.
> The presentation of the second edit starts at CTS = 2002 because of its
> media_time, so the PTS of the sync sample within that edit corresponds to
> (1001 + X) - 2002 = X - 1001.
> From this, X - 1001 = 24000, so X is equal to 25001, and the DTS of
> trun.sample[0] is equal to X - trun.sample[0].sample_duration = 25001 - 1001
> = 24000.
>
> |<--edit[0]-->|<---------edit[1]--------->|
> |-------------|-------------|-------------|---->presentation timeline
> 0             D             T'
>               |-------------|------------------>composition timeline
>           media_time        T
>         |-----|-------|-----|------------------>decode timeline
>         0 media_time  X     T
>                       |<--->|
>                      ct_offset
>
> D = edit[0].segment_duration = 600
> T' = time/mdhd.timescale*mvhd.timescale = 1200
> media_time = edit[1].media_time = 2002
> ct_offset = trun.sample[1].sample_composition_time_offset = 1001
> T=ct_offset + X
> T-media_time = (T'-D)*mdhd.timescale/mvhd.timescale
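>
> To make the arithmetic above concrete, here is a minimal sketch in C (the
> helper and its parameter names are illustrative, not actual mov.c code),
> assuming rate-1 edits and at most a single empty edit followed by one
> normal edit:
>
>   #include <stdint.h>
>
>   /* Derive the DTS of the first sample of a fragment's trun from the tfra
>    * 'time' of its sync sample, following the steps above. */
>   static int64_t tfra_time_to_first_dts(int64_t tfra_time,       /* mdhd units */
>                                         int64_t empty_duration,  /* edit[0].segment_duration in mvhd units, 0 if none */
>                                         int64_t media_time,      /* edit[1].media_time in mdhd units */
>                                         int64_t mvhd_timescale,
>                                         int64_t mdhd_timescale,
>                                         int64_t sync_ct_offset,  /* sample_composition_time_offset of the sync sample */
>                                         int64_t dur_before_sync) /* sum of durations of the trun samples before it */
>   {
>       /* the empty edit expressed in media (mdhd) units */
>       int64_t empty_in_mdhd = empty_duration * mdhd_timescale / mvhd_timescale;
>       /* CTS of the sync sample: T = media_time + (T' - D) scaled to mdhd units */
>       int64_t sync_cts = media_time + (tfra_time - empty_in_mdhd);
>       /* DTS of the sync sample: X = T - ct_offset */
>       int64_t sync_dts = sync_cts - sync_ct_offset;
>       /* DTS of trun.sample[0] */
>       return sync_dts - dur_before_sync;
>   }
>
> With the numbers from the example, tfra_time_to_first_dts(48000, 600, 2002,
> 600, 24000, 1001, 1001) = 2002 + (48000 - 24000) - 1001 - 1001 = 24000.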
>
>
>>     Mika
>>


More information about the ffmpeg-devel mailing list