[FFmpeg-trac] #4536(ffmpeg:new): mkv audio reencoding leads to nonuniform video timecodes

Wed May 2 14:03:02 EEST 2018

#4536: mkv audio reencoding leads to nonuniform video timecodes
------------------------------------+----------------------------------
             Reporter:  sneaker     |                    Owner:
                 Type:  defect      |                   Status:  new
             Priority:  normal      |                Component:  ffmpeg
              Version:  git-master  |               Resolution:
             Keywords:              |               Blocked By:
             Blocking:              |  Reproduced by developer:  0
Analyzed by developer:  0           |
------------------------------------+----------------------------------

Comment (by mkver):

 1. I think I know why this is happening. The code for offsetting the
 timestamps is in the write_packet function in libavformat/mux.c. The video
 in the sample uses two reorder frames and therefore the first two video
 packets have a dts that is smaller than the dts of the first audio packet;
 they are therefore interleaved first and arrive first at write_packet,
 where no offset has been set yet (neither video packet has a negative pts,
 so no shifting is required at that point). But when the first audio packet
 (with negative pts due to encoder delay) arrives, a shift is needed, and it
 affects only that packet and all following packets in coding order. This
 also explains why MKVToolNix's timecode/timestamp files, which ignore
 coding order, are unsuited for this. Here is mkvinfo's output for
 ffmpeg_opus.mkv confirming what I just said:
 {{{
 Track 1: video, codec ID: V_MPEG4/ISO/AVC (h.264 profile: High @L3.1),
 mkvmerge/mkvextract track ID: 0, language: und, default duration: 41.708ms
 (23.976 frames/fields per second for a video track), pixel width: 1280,
 pixel height: 720, display width: 1280, display height: 720
 Track 2: audio, codec ID: A_OPUS, mkvmerge/mkvextract track ID: 1,
 language: und, channels: 1, sampling freq: 48000, bits per sample: 16
 I frame, track 1, timestamp 00:00:00.000000000, size 943, adler 0xd5581006
 P frame, track 1, timestamp 00:00:00.167000000, size 40, adler 0x98d0052f
 I frame, track 2, timestamp 00:00:00.000000000, size 3, adler 0x05e702f6
 P frame, track 1, timestamp 00:00:00.090000000, size 37, adler 0x63b003e4
 I frame, track 2, timestamp 00:00:00.021000000, size 3, adler 0x05e702f6
 I frame, track 2, timestamp 00:00:00.041000000, size 3, adler 0x05e702f6
 P frame, track 1, timestamp 00:00:00.049000000, size 37, adler 0x50a503b5
 I frame, track 2, timestamp 00:00:00.061000000, size 3, adler 0x05e702f6
 I frame, track 2, timestamp 00:00:00.081000000, size 3, adler 0x05e702f6
 P frame, track 1, timestamp 00:00:00.132000000, size 37, adler 0x4fb803ae
 ...
 }}}
 If one encodes with the libfdk_aac encoder (which has an encoder delay of
 2048 samples, longer than one frame at 24/1.001 fps), all video frames
 except the very first one are offset.
 2. This happens with encoder delay in general; it is not Opus-specific.
 That said, Opus should be treated specially in this regard: the CodecDelay
 header field already indicates the delay, so using this header field and
 also baking the delay into the timestamps is wrong, but that is another
 issue.
 3. Judging from this, I think that the decisions regarding delay should be
 made before any packet is written so that packets from all tracks (or all
 tracks for which packets are available at the beginning) can be
 considered.
 4. For a container like Matroska for which the offset decision is based
 upon pts (by default) there is also another issue that could happen and
 could be fixed by not making the decision about the offset in
 write_packet: Just because the first packet of a track has a lower dts
 than the first packet of another track does not mean that the first track
 needs a bigger offset. That's because the difference of dts and pts can be
 different for the tracks. An example: Imagine a video track with (say)
 24/1.001 fps and two reorder frames whose first packet has a pts of -1 ms
 (easily creatable with -itsoffset) and therefore a dts of about -84 ms.
 If one has e.g. an audio track whose first packet has a pts=dts in between
 -1 ms and -84 ms (given the encoder delay, this can easily happen), then
 the video packet arrives first and fixes the offset at 1 ms, so the audio
 packet will still have a negative pts after shifting. For Matroska this
 means that the file violates the
 [https://matroska.org/technical/specs/notes.html specifications].
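
 To make points 1, 3 and 4 concrete, here is a small Python model of the
 behaviour described above. It is only a sketch and not the code from
 libavformat/mux.c: the function names are made up, packets are plain
 (track, dts, pts) tuples in milliseconds, and the input timestamps are
 assumed values (rounded to ms) chosen so that the lazy variant gives the
 timestamps in the mkvinfo listing. shift_upfront is just one possible
 reading of point 3, not a proposed patch.

```python
# Simplified model only; NOT the actual libavformat/mux.c logic.
# All names and packet values below are illustrative assumptions.

def interleave(packets):
    """Order packets by dts, which is how they reach the muxer in this model."""
    return sorted(packets, key=lambda p: p[1])

def shift_lazily(packets):
    """Latch the offset at the first packet with a negative pts and apply it
    only to that packet and the ones written after it."""
    offset = None
    out = []
    for track, dts, pts in interleave(packets):
        if offset is None and pts < 0:
            offset = -pts
        shift = offset if offset is not None else 0
        out.append((track, dts + shift, pts + shift))
    return out

def shift_upfront(packets):
    """Pick one offset from the first packet of every track before writing
    anything, then apply it to all packets."""
    ordered = interleave(packets)
    first_pts = {}
    for track, _dts, pts in ordered:
        first_pts.setdefault(track, pts)
    offset = max(0, -min(first_pts.values()))
    return [(t, d + offset, p + offset) for t, d, p in ordered]

def video_gaps(muxed):
    """Gaps between consecutive video pts in presentation order."""
    pts = sorted(p for t, _d, p in muxed if t == "video")
    return [b - a for a, b in zip(pts, pts[1:])]

# Point 1: video with two reorder frames, audio with 7 ms encoder delay.
SAMPLE = [
    ("video", -83, 0), ("video", -42, 167), ("video", 0, 83),
    ("video", 42, 42), ("video", 83, 125),
    ("audio", -7, -7), ("audio", 14, 14), ("audio", 34, 34),
    ("audio", 54, 54), ("audio", 74, 74),
]

# Point 4: video pts -1 ms / dts -84 ms, audio pts = dts = -40 ms.
ITEM4 = [("video", -84, -1), ("audio", -40, -40)]

lazy = shift_lazily(SAMPLE)
print([p for _t, _d, p in lazy])   # [0, 167, 0, 90, 21, 41, 49, 61, 81, 132]
print(video_gaps(lazy))            # [49, 41, 42, 35]  (nonuniform)
print(video_gaps(shift_upfront(SAMPLE)))  # [42, 41, 42, 42]
print(shift_lazily(ITEM4))         # audio pts stays negative (-39)
print(shift_upfront(ITEM4))        # no negative pts left
```

 In the lazy variant the first two video packets are written before the
 offset exists, so only the later video packets move by 7 ms; deciding the
 offset before writing any packet shifts every packet by the same amount.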

--
Ticket URL: <https://trac.ffmpeg.org/ticket/4536#comment:4>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker