[FFmpeg-devel] [PATCH 05/21] avformat/matroskadec: Set offset of first cluster

Andreas Rheinhardt andreas.rheinhardt at googlemail.com
Sun Apr 7 11:31:00 EEST 2019

Steve Lhomme:
> On 3/27/2019 12:18 PM, Andreas Rheinhardt via ffmpeg-devel wrote:
>> By default, the data_offset member of the AVFormatInternal of the
>> AVFormatContext associated with the MatroskaDemuxContext has not been
>> initialized explicitly by any Matroska-specific function, so that it
>> was
>> initialized by default to the offset at the end of
>> matroska_read_header,
>> i.e. usually to the offset of the length field of the first encountered
>> cluster. This meant that in case that the Matroska-specific seek-code
>> fails because there are no index entries for the target track a seek to
>> data_offset would be performed and ordinary parsing would start from
>> there which is nonsense: The length field would be treated as EBML
>> ID and
>> (if the length field is not longer than four bytes (EBML numbers that
>> long are rejected as invalid EBML IDs)) and whatever comes next
>> would be
>> treated as its EBML size although it simply isn't.
>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt at googlemail.com>
>> ---
>>   libavformat/matroskadec.c | 3 +++
>>   1 file changed, 3 insertions(+)
>> diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
>> index 49f8ff4082..f9811b54a1 100644
>> --- a/libavformat/matroskadec.c
>> +++ b/libavformat/matroskadec.c
>> @@ -2651,6 +2651,9 @@ static int
>> matroska_read_header(AVFormatContext *s)
>>           pos = avio_tell(matroska->ctx->pb);
>>           res = ebml_parse(matroska, matroska_segment, matroska);
>>       }
>> +    /* Set data_offset as it might be needed later by
>> seek_frame_generic. */
>> +    if (matroska->current_id)
> I'm surprised this doesn't error out if a (level 1) ID is not found here.
>There are two ways how ebml_parse can return 1 in the earlier call: It
can find (and consume) a cluster ID; or it can hit EOF if it is in the
mode for parsing live header files (these don't contain clusters at
all). In the latter case, matroska->current_id is unset.

And if we are in the normal (non-live-header) mode and no cluster has
been found, then we will end up at fail anyway (when matroska_resync
eventually hits the end of input).

Thinking about this, there is something wrong with this approach: A
Matroska file needn't contain a cluster at all (e.g. a chapter- or
tags- or attachment-only file is not against the spec if I am not
mistaken). But this is orthogonal to the patch at hand.
>> +        s->internal->data_offset = avio_tell(matroska->ctx->pb) - 4;
> The "- 4" is OK as long as level 1 elements are always 4 bytes (which
> is the case). But if matroska_resync() ever exits if it finds an EBML
> Void or CRC-32 then this will break.
> The code is safe for now but may not be future proof.
As has already been said: If matroska->current_id is set at this
point, then it contains a cluster ID, not an arbitrary ID, because the
IDs that matroska_resync finds are fed to ebml_parse before this check
and a level 1 ID different than a cluster is parsed.

Nevertheless, the check should probably be changed to explicitly check
for a cluster ID, just to make this more readable.
>>       matroska_execute_seekhead(matroska);
>>         if (!matroska->time_scale)
>> -- 
>> 2.19.2

More information about the ffmpeg-devel mailing list