[FFmpeg-devel] [PATCH] lavf/mov: Add support for edit list parsing.

Fri Oct 21 20:47:59 EEST 2016

On Fri, Oct 21, 2016 at 8:32 AM, Derek Buitenhuis <
derek.buitenhuis at gmail.com> wrote:

> > In YouTube we have long been receiving MOV files from users, which have
> non-trivial edit lists (Those edit lists which are not just used to offset
> video start from audio start) and multiple edit lists. Recently the uploads
> of such files has increased with the introduction of apps that allow video
> editing and upload like Vine etc. mov.c in ffmpeg does not have edit list
> parsing support i.e. the support for deciding what video/audio packets
> should be output with what timestamps, based on edit lists. For this
> reason, we had built a basic support for edit list parsing in our version
> of ffmpeg, which fixes the AVIndexEntries built by mov_build_index by
> looking at edit lists, and introduces new DISCARD flags in AVPacket and
> AVFrame to signal to the decoder to discard certain packets.
> >
> > For a while our edit list support was broken, as in it didn't properly
> work for multiple edit lists and it had problems with edit-lists ending on
> B-frames. But we've fixed most of these issues in recent times, and we
> think that now it is in a good enough condition so that it can be submitted
> to HEAD. This will not only help the vast userbase of ffmepg, but will also
> help us with staying up-to-date with ffmpeg and also by adding the power of
> ffmpeg developer community to our MOV support. So here's a go at it.
> > What is supported:
> >  - multiple edit lists
> >  - edit lists ending on B-frames
> >  - zero segment duration edit lists
> >
> > What is not supported:
> >  - Changing the rate of playing with edit lists. We basically ignore
> MediaRate field of edit.
>
> Sorry for being late to the party, but this recently broke some code I
> work on.
> I actually re-subscribed to the mailing list just so I could reply to this.
>
> I see two problems with this implementation, one of which breaks things,
> and one of which is a code-breaking bug, and one of which is a design
> problem.
>
> The design one fist: This approach is fundamentally wrong. Edit lists are
> meant to be applied at presentation level, not packet level. The current
> implementation will cause problems in a number of cases:
>
>     * Audio packets. Especially audio packets with a large number of
> samples.
>       It's extremely likely that edits will not fall on packet boundaries,
> and
>       depending on the number of samples per packet, audio sync issues can
> and
>       will occur, even with smaller packets of e.g. 1024 samples, if there
> are
>       a large number of entries in the edit list. Gradual loss of sync will
>       occur.
>
We handle the case of the edit list start boundary not being exactly
aligned with a packet boundary, by setting skip_samples fileld properly.
But we don't handle the case yet, where edit list boundary might be not
aligned with the packet boundary, in which case yes, one might get a
gradual loss of sync. (Which is still better than the huge loss of sync one
might get if edit lists are not parsed at all! )

>     * Edit list entries that are out of order, or repeat. This one is
> obvious;
>       simply dropping packets is not sufficient to create a virtual
> timeline
>       like edit lists can be used for by various NLEs. I need to look up
>       if these are actually allowed in the ISOBMFF of QT specs, but I know
>       Matroska allows it in its virtual timeline feature.
>
The currrent edit list code supports edits specifying repeated segments,
overlapping segments etc.  We are not just dropping packets, but we are
rebuilding the whole index. If there are two edits repeating a segmeng (a
section of AVPackets lets call it ) , then those section of AVPackets get
added twice in the AVIndex, and the decoder would decode the section twice
(though I haven't tested such video )  .

In fact this is the case where doing it after decode might not work,
because something needs to go back and feed those packets again to the
decoder.

>     * Returning timestamps that differ from the codec timestamps. I very
>       much disagree with this approach. It's the responsibility of the
>       presentation/render/transcode/app/whatever layer to adjust
> timestamps
>       based on edits. I absolutely disagree with libavformat changing
>       timestamps from what is coded in the actual file. avformat provides
>       demuxers.
>
We already used to apply a global offset to the timestamps of the video,
set skip_samples field by parsing the first edit list entry etc. before the
edit list patch
http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/mov.c;h=6e80b93271a4f998af6ba1af795d7d7c5d67f5a1;hb=7b3bc365f9923e30a925f8dece4fddd127a54c5d#l2792
 . Not defending the approach, but just pointing  that what I did is just
extending that approach to increased functionality.

> Now onto the actual code breaking bug: You are losing packets entirely.
>
> For example, here is part of a diff of pre-edit list ffprobe and post-edit
> list ffprobe:
>
>     --- old.json    2016-10-21 15:35:00.449619260 +0100
>     +++ new.json    2016-10-21 15:35:09.392976730 +0100
>     @@ -3,28 +3,15 @@
>              {
>                  "codec_type": "audio",
>                  "stream_index": 1,
>     -            "pts": -2048,
>     -            "pts_time": "-0.046440",
>     -            "dts": -2048,
>     -            "dts_time": "-0.046440",
>     -            "duration": 1024,
>     -            "duration_time": "0.023220",
>     -            "size": "480",
>     -            "pos": "958620",
>     -            "flags": "K"
>     -        },
>     -        {
>     -            "codec_type": "audio",
>     -            "stream_index": 1,
>                  "pts": -1024,
>                  "pts_time": "-0.023220",
>                  "dts": -1024,
>                  "dts_time": "-0.023220",
>                  "duration": 1024,
>                  "duration_time": "0.023220",
>     -            "size": "457",
>     +            "size": "480",
>                  "pos": "959077",
>     -            "flags": "K"
>     +            "flags": "KD"
>              },
>
> As you can see, the packet, sized 457, is entirely missing. This was
> generated via
> 'ffprobe -show_packets -of json -select_streams 1 sample.mp4'. I can
> provide the
> sample privately to you, if you wish. I did also confirm that the packet
> as not
> added elswwhere:
>
>     $ jq '.packets | length' new.json
>     57050
>     $ jq '.packets | length' old.json
>     57051
>
> The hashes I used were today's git HEAD (03ec6b780cfae85b8bf0f32b2eda20
> 1063ad061b)
> for new.json and 590f025b3dc078a6be58b36c67d87499f62b521c for old.json.
>

Can you send that file to me please. Thanks.

>
> P.S. I do under stand the desire to not refactor YouTube's internal
> codebase to
> properly do this at the presentation level, and that YouTube has had this
> patchset
> in it's internal fork of FFmpeg for many years, and also that there is a
> push to
> update FFmpeg Google-wide / use more vanilla stuff, but I don't think
> that's a good
> enough reason to have pushed something very specific like this to FFmpeg.
> For example,
> if it wad done at presentation level, that code could also have been the
> foundation
> of virtual timeline support for Matroska. I realize I'm too late to the
> part, and
> it's already in, though, and I'm not going to advocate for removal. It's
> just my 2
> cents.
>
> P.P.S. I do apologize if this comes off as hostile, I don't mean it as
> such. I
> appreciate that YouTube has moved towards more community involvement, a
> lot.
>
> Cheers,
> - Derek
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>