[FFmpeg-devel] [PATCH 4/4] lavf/mov: Add support for edit list parsing.

Fri Sep 2 23:00:59 EEST 2016

On Aug 31, 2016 5:23 AM, "Michael Niedermayer" <michael at niedermayer.cc>
wrote:
>
> On Tue, Aug 30, 2016 at 06:37:22PM -0700, Sasi Inguva wrote:
> > On Sun, Aug 28, 2016 at 3:10 AM, Michael Niedermayer
<michael at niedermayer.cc
> > > wrote:
> >
> > > On Sat, Aug 27, 2016 at 03:30:24PM -0700, Sasi Inguva wrote:
> > > > On Fri, Aug 26, 2016 at 5:55 PM, Michael Niedermayer
> > > <michael at niedermayer.cc
> > > > > wrote:
> > > >
> > > > > On Fri, Aug 26, 2016 at 12:49:19PM -0700, Sasi Inguva wrote:
> > > > > > I think there is some bug in mp3 decoder which is making skip
> > > > > > samples -1431655766 for ~/tickets/5528/fire.mp3 . For now I have
> > > removed
> > > > > > the assert from the 3rd commit.
> > > > > > For the file one.mov , I think the audio has edit list with
start
> > > time
> > > > > > correspending to the second sample - (which should be media time
> > > 1024 if
> > > > > I
> > > > > > guess correctly). This indicates that the first sample be used
for
> > > > > encoder
> > > > > > delay.
> > > > > >  Previously without edit  list parsing , we used to offset the
> > > start_dts
> > > > > by
> > > > > > -1024 to make the second sample timestamp start from zero.
> > > > > >              sc->time_offset = start_time - empty_duration;
> > > > > > -            current_dts = -sc->time_offset;
> > > > > >              if (sc->ctts_count>0 && sc->stts_count>0 &&
> > > > > >
> > > > > > But now edit list parsing handles the rebasing of timestamps to
zero,
> > > > > > because it will  assign increasing timestamps  starting from
zero, to
> > > > > > samples present in the edit list.
> > > > >
> > > > > > Because the first sample is not in the
> > > > > > edit list, we mark it as DISCARD, which flag av_decode_audio4
will
> > > look
> > > > > at
> > > > > > and decode-and-discard that frame. So it wouldn't matter what
the
> > > first
> > > > > > sample timestamp should be assigned because it is anyway
discarded
> > > after
> > > > > > decode.
> > > > >
> > > > > current applications using libavformat have no knowledge of the
> > > > > discard flag you can add the DISCARD flag, you can set it on
packets
> > > but
> > > > > applications written or built for libavformat 57 dont have to know
> > > > > this flag and can treat the packets like any other packet.
> > > > >
> > > >
> > > > Yes. they can treat the packet like any other packet. They don't
have to
> > > > know about the discard flag.
> > > >
> > > > Adding this feature without a major version bump requires things to
> > > > > still work reasonable with the old semantics that apps are build
> > > > > around. That should be possible unless iam missing something
> > > >
> > > >
> > > > As I have pointed out earlier this code will change the timestamps
of the
> > > > packets. In the case of multiple edit lists, the code will also
repeat
> > > some
> > > > packets, and might make the timestamps non-monotonous. I don't know
if
> > > the
> > > > last behavior is not  an acceptable expectation from av_read_frame.
> > >
> > > if timestamps repeat then it will not be possible to seek in the file
> > > by timestamp (to all positions) and i suspect also not by byte
position.
> > > How would one seek then ?
> > > or do i misunderstand ?
> > >
> > >
> > In case of MOV  container, the seek is done using
av_index_search_timestamp
> > function
> > http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/
mov.c;h=f4999068519f1f06f6b5d84ca007148e74e5a82e;hb=HEAD#l5419
> > .
> >
> > i) In case of single edit list , the timestamps will only be repeated
but
> > not non-monotonous. In that case av_index_search_timestamp will still
work
> > correctly, only that it will seek to any one of the packets with the
same
> > timestamp. However when we decode the file, then all of the discarded
> > packets with similar timestamps should be discarded and only the
> > non-discarded packet will bet output. So in short,
> > ./ffprobe -read_intervals 0.0  -show_frames -print_format compact
> > one-editlist-audio.mov
> > <https://drive.google.com/file/d/0Bz6XfEJZ-9N3ejNkMW9yU0k4ZF
E/view?usp=sharing>
> > should
> > give exactly the same behavior as before while -show_packets will show
more
> > discarded packets at the start.  I had to change the patch (4) a bit to
> > make the audio-seek on MOV to work  correctly, so please reapply the
> > attached patch to test.
> >
> > ii) In case of multiple edit lists , timestamps might be non
monotonous  so
> > existing av_index_search_timestamp implementation won't be correct,
since
> > it assumes sorted timestamps. However making it work for discarded
packets
> > is not that hard. I have attached a 5th patch that changes
av_index_search
> > function. This fixes the issue in (i) also
> >
> >
> > > > However as I've pointed out, we are already showing the wrong
packets for
> > > > videos with multiple edit lists by not parsing the edit lists
currently,
> > > > which will introduce AVSync issues. So this patch wont make things
any
> > > > worse for those cases in my view.
> > >
> > > Is it difficult to adjust the timestamps of discarded packets to avoid
> > > timestamp discontinuities ? (for the cases where this is possible of
> > > course only)
> > > the timestamps for discarded packets i would assume are meaningless in
> > > the new semenatics but they matter for the old semantics
> > > again, please correct me if iam wrong
> > >
> > > The way fix_index is implemented it is difficult to change the
timestamps
> > to avoid discotinuities and still have the timeline the same as MOV edit
> > lists would intend.
>
> My first question is, entirely independant of the implementation from
> the patches. What is the correct output ? (my primary focus is on
> the timestamps)
>
>
> Also if there are discontinuities, AVFMT_TS_DISCONT is normally set
> (and this never happens for files with indexes but only for files
>  that dont have indexes like mpeg*)
> players will generally seek by file position if AVFMT_TS_DISCONT is
> set (because timestamps are ambigous in that case)
>
> iam not sure how to seek reliably by file position in a mov
> with edit lists and stll have the file positions actually be file
> positions of the frames, so this direction gets tricky too ...
>
I'll try do describe the behavior of the timestamps and seek after all
these patches are applied.
Reading:
i) Decoding (using av_codec_video* API) - Non-repeated monotonically
increasing timestamps.
ii) Just demuxing - Repeated timestamps in case of Single edit list .
Non-monotonic timestamps in case of multiple edit lists.

Seeking:
With the 5th patch seeking by "timestamp" using mov_read_seek function
works correctly. So if we are using ffmpeg API to seek it should work.
i) If AV_SEEK_FLAG_ANY  is set -
   before: seeks to incorrect packet because edit lists were not parsed
   now: seeks to the correct packet accounting for the edit lists.

ii) if AV_SEEK_FLAG_ANY is not set i.e. seek to KEYFRAME
    before:  finds the incorrect packet by timestamp , and then seeks to
the closest keyframe to that packet. If decoding/demuxing after seek,
decoder/demuxer outputs frames starting from that keyframe.
   now : finds the correct non-discarded packet by timestamp, and then
seeks to the closest keyframe to that packet, irrespective of, if the
closest keyframe is discarded or not-discarded .
             If decoding after seek and if that closest keyframe is marked
discard, then decoder will output frames starting from the next
non-discarded frame after the key frame.
             If demuxing after seek, we still output packets starting from
the discarded keyframe.

Now the only worry I think is, if some application is doing its own
analysis, based on the timestamps of the packets it "demuxes"  , without
accounting for the DISCARD flag. For example an application can compute
duration of the video as
max(timestamp) - min(timestamp) over all packets. If it doesn't account for
the DISCARD flag, it will be computing it wrong. However without edit list
parsing, such duration is already computed wrong because all the packets
are output, and not trimmed according to edit lists.

Another example can be an application implements its own transmuxing by
doing av_read_frame and passing the pts, dts without santizing, directly to
av_write_frame. In this case, if there are multiple edit lists in the input
MOV file, this application will fail because we have non-monotonic
timestamp being given to av_write_frame. (However previously it succeeded
but produced wrong video anyway) . I guess videos with multiple edit lists
are relatively few, and we already have other possible cases where ffmpeg
returns dts=N/A for some frames and the av_write_frame will fail on that
too. So this will add another one of those corner cases where this kind of
application will fail.

>
>
> > The timestamps for discarded packets are meaningless to av_decode_*
> > functions because they parse the DISCARD flag and ignore the packets. I
am
> > not sure, what you mean by semantics though, because I don't think I am
> > changing any user expectations defined by the mov_read_frame
mov_seek_frame
> > functions .
>
> > If by semantics, you mean that user expects to see
> > monotonically increasing timestamps for the "demuxed " packets then yes
> > that expectation changes to " monotonically increasing timestamps for
the
> > "demuxed and non-discarded" packets" and user has to parse the discard
> > flag.
>
> Adding a flag that "must be parsed" would be a incompatible API change.
> It would require a major version bump
>
> also this patchset changes streamcopy
> try:
> ./ffmpeg-ref -i matrixbench_mpeg2.mpg  -acodec mp3 -vn -t 1  test-ref.mov
> ./ffmpeg     -i matrixbench_mpeg2.mpg  -acodec mp3 -vn -t 1  test.mov
> ./ffmpeg-ref -i test-ref.mov -acodec copy test2-ref.mov
> ./ffmpeg     -i test.mov -acodec copy test2.mov
>
> old code:
> ./ffmpeg-ref -i test-ref.mov -aframes 3 -f framecrc -
> 0,          0,          0,     1152,     4608, 0xa2a00df2
> 0,       1152,       1152,     1152,     4608, 0xa573dfd4
> 0,       2304,       2304,     1152,     4608, 0x8994a906
> and
> ./ffmpeg-ref -i test2-ref.mov -aframes 3 -f framecrc -
> 0,          0,          0,     1152,     4608, 0xa2a00df2
> 0,       1152,       1152,     1152,     4608, 0xa573dfd4
> 0,       2304,       2304,     1152,     4608, 0x8994a906
>
> new code:
> ./ffmpeg -i test.mov -aframes 3 -f framecrc
> 0,          0,          0,     1152,     4608, 0xa573dfd4
> 0,       1152,       1152,     1152,     4608, 0x8994a906
> 0,       2304,       2304,     1152,     4608, 0x824d1a30
> and
> ./ffmpeg -i test2.mov -aframes 3 -f framecrc -
> 0,          0,          0,     1152,     4608, 0xa2a00df2
> 0,          1,          1,     1152,     4608, 0xa573dfd4
> 0,       1152,       1152,     1152,     4608, 0x8994a906
>

First phase of transmuxing :

test.mov and test-ref.mov both have an edit list with media time 1152 which
indicates that the first audio frame be not output (and perhaps it is a
silent packet used as encoder delay) . While ./ffmpeg correctly doesn't
output first frame ./ffmpeg-ref does.

Second phase of transmuxing:

test2.mov has an stts as such
EntryCount 4
Entry[0].SampleCount 1
Entry[0].SampleDelta 1
Entry[1].SampleCount 1
Entry[1].SampleDelta 1151
Entry[2].SampleCount 40
Entry[2].SampleDelta 1152
Entry[3].SampleCount 1
Entry[3].SampleDelta 768

 which makes the ffmpeg demuxer output the first two packet timestamps as 0
and 1 . Something must be going wrong while remuxing to MOV from test.mov
to test2.mov, but as long as demuxing and timestamps are concerned, they
seem correct to me.

>
> [...]
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Does the universe only have a finite lifespan? No, its going to go on
> forever, its just that you wont like living in it. -- Hiranya Peiri
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>