[FFmpeg-trac] #8578(avformat:new): FFmpeg 4.2 breaks Matroska streaming

Mon Mar 23 00:00:08 EET 2020

#8578: FFmpeg 4.2 breaks Matroska streaming
-------------------------------------+-------------------------------------
             Reporter:  Sesse        |                    Owner:
                 Type:  defect       |                   Status:  new
             Priority:  normal       |                Component:  avformat
              Version:  git-master   |               Resolution:
             Keywords:  mkv          |               Blocked By:
  regression                         |
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------

Comment (by mkver):

 1. The "variable" returned by avformat_write_header() that you store and
 send to the client is the header written by avformat_write_header() and
 not the return value of avformat_write_header(), isn't it? (Yes, a stupid
 question, but I just want to be sure.)
 2. The AVIOContext is flagged as non-seekable (i.e. pb->seekable &
 AVIO_SEEKABLE_NORMAL is zero), I presume?
 3. Before the patchset that the cited commit belongs to was applied there
 were two interwoven codepaths for writing clusters (and other level 1
 elements). Both assembled the clusters in memory via a dynamic buffer (an
 AVIOContext that you can open with avio_open_dyn_buf()) (let's call it
 dyn) before outputting via the real AVIOContext (called pb in the
 following).
 a) The codepath for seekable output: Write the Cluster element ID via pb;
 also write Matroska's "unknown-length" length via pb (using the maximal
 length of 8 bytes for the length field). If a CRC-32 element is going to
 be written, reserve six bytes for it by writing an EBML-Void element in
 dyn. Then write content of the cluster as usual into dyn. When the cluster
 is to be finished, output the CRC-32 (if it is to be written) via pb and
 then write the whole content of dyn (containing the Cluster's content)
 with the exception of the bytes reserved for the CRC-32 (if any) via pb.
 Then seek back (in pb) to the length field to update it (i.e. overwrite it
 with the real length that is now known) and seek to the end of the Cluster
 again to write the next Cluster (or the Cues if this was the last one and
 Cues should be written at the end).
 b) The codepath for unseekable output: Write the Cluster element ID into
 dyn; also write an "unknown-length" length field into dyn. Then write the
 Cluster into dyn (without writing CRC-32 elements, because that was just
 not supported by the implementation). When closing the Cluster, the length
 field is updated (this requires seeking, but that is possible because we
 are seeking in dyn, not pb) and the whole Cluster is output (i.e. sent to
 pb) in one avio_write().
 4. Given that both codepaths relied on dynamic buffers they could be
 merged into one: Write Cluster element ID to pb; open dyn and reserve
 space for the CRC-32 in dyn (if it is to be written). Write Cluster
 content into dyn. When the Cluster is to be finished, write the correct
 length field for the Cluster to pb (its size is now known!).* Then write the
 CRC-32 to pb (if ...) and then write the actual Cluster content (excluding
 the stuff at the beginning reserved for the CRC-32). This is what the
 patchset to which the cited patch belongs does. The cited patch is the very
 one that stopped the Cluster from being output in one go for nonseekable
 output; so your fuzzy result makes sense.
 5. As it happens, I already have a patch that should fix this. It modifies
 the process as follows: Open dyn and reserve space for CRC-32 (if ...).
 Write Cluster content into dyn. When the Cluster is to be finished, write
 the Cluster element ID, the length field, the CRC-32 (if ...) and the
 Cluster content (excluding the part reserved for the CRC-32 element (if
 ...)). You can find it
 [https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200101005837.11356-16-andreas.rheinhardt@gmail.com/
 here]; yet it is part of a longer patchset and you will also have to apply
 [https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200313220349.12974-1-andreas.rheinhardt@gmail.com/
 this patch],
 [https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200101005837.11356-10-andreas.rheinhardt@gmail.com/
 this patch],
 [https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200101005837.11356-11-andreas.rheinhardt@gmail.com/
 this patch],
 [https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200101005837.11356-12-andreas.rheinhardt@gmail.com/
 this patch],
 [https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200101005837.11356-13-andreas.rheinhardt@gmail.com/
 this patch],
 [https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200101005837.11356-14-andreas.rheinhardt@gmail.com/
 this patch],
 [https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200313231536.16949-1-andreas.rheinhardt@gmail.com/
 this patch] and finally
 [https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200101005837.11356-16-andreas.rheinhardt@gmail.com/
 this patch]. Can you test whether this indeed fixes your problem?
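
 The write order described in item 5 can be sketched without libavformat.
 This is only an illustration under stated assumptions, not the code from
 the patch: finish_cluster(), put_length() and crc32_ieee() are hypothetical
 names, plain memory buffers stand in for dyn and pb, and the CRC-32 is
 assumed to be the IEEE CRC-32 stored little-endian, as the EBML
 specification describes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CRC_ELEM_SIZE 6 /* 1 byte ID + 1 byte length + 4 bytes CRC value */

/* Bitwise IEEE CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Smallest EBML length field for a known length; returns bytes written. */
static size_t put_length(uint8_t *out, uint64_t length)
{
    int bytes = 1;
    assert(length < (UINT64_C(1) << 56) - 1);
    while (bytes < 8 && (length + 1) >> (bytes * 7))
        bytes++;
    uint64_t coded = length | (UINT64_C(1) << (bytes * 7));
    for (int i = 0; i < bytes; i++)
        out[i] = (uint8_t)(coded >> (8 * (bytes - 1 - i)));
    return (size_t)bytes;
}

/*
 * Assemble a finished Cluster in out (hypothetical stand-in for pb).
 * dyn holds the buffered Cluster content; if write_crc is set, its first
 * CRC_ELEM_SIZE bytes are the placeholder reserved for the CRC-32.
 * Returns the number of bytes produced.
 */
static size_t finish_cluster(uint8_t *out, size_t cap,
                             const uint8_t *dyn, size_t dyn_len,
                             int write_crc)
{
    static const uint8_t cluster_id[4] = { 0x1F, 0x43, 0xB6, 0x75 };
    size_t skip = write_crc ? CRC_ELEM_SIZE : 0;
    size_t pos = 0;

    assert(dyn_len >= skip);
    assert(cap >= sizeof(cluster_id) + 8 + dyn_len);

    /* 1. Cluster element ID, written only now that the Cluster is done. */
    memcpy(out + pos, cluster_id, sizeof(cluster_id));
    pos += sizeof(cluster_id);
    /* 2. Length field. The CRC-32 element replaces the placeholder byte
     *    for byte, so the payload length equals dyn_len in both cases. */
    pos += put_length(out + pos, dyn_len);
    /* 3. CRC-32 element over the content that follows it. */
    if (write_crc) {
        uint32_t crc = crc32_ieee(dyn + skip, dyn_len - skip);
        out[pos++] = 0xBF;          /* CRC-32 element ID */
        out[pos++] = 0x84;          /* length field: 4 */
        for (int i = 0; i < 4; i++) /* value stored little-endian */
            out[pos++] = (uint8_t)(crc >> (8 * i));
    }
    /* 4. Cluster content, excluding the reserved placeholder. */
    memcpy(out + pos, dyn + skip, dyn_len - skip);
    pos += dyn_len - skip;
    return pos;
}
```

 Because nothing belonging to a Cluster is emitted before the Cluster is
 finished, the bytes returned by finish_cluster() form one complete Cluster
 and could be handed to a non-seekable output in a single write.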

 *: We save a few bytes here: Matroska (or actually EBML, the thing
 Matroska is built on) uses variable-length length fields where smaller
 lengths take fewer bytes to encode, and given that the length to write is
 known at this point, the smallest possible length field is chosen.
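
 To make the footnote concrete, here is a minimal sketch of how an EBML
 length field can be sized and encoded. The function names are chosen for
 illustration only and are not claimed to match the muxer's; the rule it
 assumes is that each byte of the field carries 7 bits of payload and that
 the all-ones pattern of every width is reserved for "unknown length".

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for the smallest length field that can hold length. */
static int ebml_length_size(uint64_t length)
{
    int bytes = 1;
    /* length + 1: the all-ones value of each width means "unknown". */
    while (bytes < 8 && (length + 1) >> (bytes * 7))
        bytes++;
    return bytes;
}

/* Encode a known length with the smallest field; returns bytes written. */
static int ebml_put_length(uint8_t *out, uint64_t length)
{
    assert(length < (UINT64_C(1) << 56) - 1); /* largest codable length */
    int bytes = ebml_length_size(length);
    /* A marker bit in front of the value tells the reader the width. */
    uint64_t coded = length | (UINT64_C(1) << (bytes * 7));
    for (int i = 0; i < bytes; i++)
        out[i] = (uint8_t)(coded >> (8 * (bytes - 1 - i)));
    return bytes;
}

/* Encode "unknown length" using a field of the given width. */
static int ebml_put_unknown_length(uint8_t *out, int bytes)
{
    assert(bytes >= 1 && bytes <= 8);
    out[0] = (uint8_t)(0x1FF >> bytes); /* marker bit, then all ones */
    for (int i = 1; i < bytes; i++)
        out[i] = 0xFF;
    return bytes;
}
```

 With these rules a Cluster of 100000 bytes gets a 3-byte length field,
 whereas the 8-byte "unknown-length" field used as a placeholder in
 codepath a) always occupies 8 bytes.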

--
Ticket URL: <https://trac.ffmpeg.org/ticket/8578#comment:4>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker