[FFmpeg-devel] [PATCH 14/21] avformat/matroskadec: Use proper levels after discontínuity

Andreas Rheinhardt andreas.rheinhardt at googlemail.com
Sun Apr 7 15:59:00 EEST 2019


Steve Lhomme:
> On 3/27/2019 12:18 PM, Andreas Rheinhardt via ffmpeg-devel wrote:
>> The earlier code set the level to zero upon seeking and after a
>> discontinuity although in both cases parsing (re)starts at a level 1
>> element.
>>
>> Also set the segment's length to unkown if an error occured in order
>> not
>> to drop any valid data that happens to be beyond the designated end of
>> the segment.
>>
>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt at googlemail.com>
>> ---
>>   libavformat/matroskadec.c | 59
>> +++++++++++++++++++++++----------------
>>   1 file changed, 35 insertions(+), 24 deletions(-)
>>
>> diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
>> index 0179e5426e..42f1c21042 100644
>> --- a/libavformat/matroskadec.c
>> +++ b/libavformat/matroskadec.c
>> @@ -737,13 +737,24 @@ static const char *const matroska_doctypes[] =
>> { "matroska", "webm" };
>>     static int matroska_read_close(AVFormatContext *s);
>>   +static int matroska_reset_status(MatroskaDemuxContext *matroska,
>> +                                 uint32_t id, int64_t position)
>> +{
>> +    matroska->current_id = id;
>> +    matroska->num_levels = 1;
>> +    matroska->current_cluster.pos = 0;
>> +
>> +    if (position >= 0)
>> +        return avio_seek(matroska->ctx->pb, position, SEEK_SET);
>> +
>> +    return 0;
>> +}
>> +
> 
> I think you should have done this commit in 2 parts.
> - adding matroska_reset_status() and do exactly what was done before
> - add changes related to the level and unknown length/discontinuity.
> 
Ok, that's easily possible.

>>   static int matroska_resync(MatroskaDemuxContext *matroska, int64_t
>> last_pos)
>>   {
>>       AVIOContext *pb = matroska->ctx->pb;
>>       int64_t ret;
>>       uint32_t id;
>> -    matroska->current_id = 0;
>> -    matroska->num_levels = 0;
>>         /* seek to next position to resync from */
>>       if ((ret = avio_seek(pb, last_pos + 1, SEEK_SET)) < 0) {
>> @@ -759,7 +770,14 @@ static int matroska_resync(MatroskaDemuxContext
>> *matroska, int64_t last_pos)
>>               id == MATROSKA_ID_CUES     || id ==
>> MATROSKA_ID_TAGS        ||
>>               id == MATROSKA_ID_SEEKHEAD || id ==
>> MATROSKA_ID_ATTACHMENTS ||
>>               id == MATROSKA_ID_CLUSTER  || id ==
>> MATROSKA_ID_CHAPTERS) {
>> -            matroska->current_id = id;
>> +            /* Prepare the context for further parsing of a level 1
>> element. */
>> +            matroska_reset_status(matroska, id, -1);
> 
> You set num_levels to 1 now, leaving this function used to have
> num_levels set to 0. I'm not sure it's correct or not.
> 
We have found a level 1 element, so the level should be set to 1. If
we enter and parse the level 1 element found, we will be at level two;
currently we are only at level 1 (so that the level 1 element appears
to be the highest level in the hierarchy, when in fact it is not). So
this change is intentional.
>> +
>> +            /* Given that we are here means that an error has occured,
> 
> I thought this function was meant to find a level1 ID after getting
> into a Segment. This does not seem like an error at all.
> 
First the EBML header is parsed as a typical EBML_NEST element; then
the segment is parsed as nest, too. The syntax according to which its
children are parsed contains all level 1 elements; except for a
cluster (which is an EBML_STOP element) all elements are EBML_LEVEL1
and are therefore parsed according to the respective syntax elements
for their descendants. For normal files, there is no need to resync at
all: The next element begins directly after the previous one has ended.

matroska_resync is only used in two situations:
If parsing the segment doesn't find a cluster and we are not in the
mode for parsing live header files. (This implies that an error
(possibly EOF) has occurred.)
And if an error happened during reading a packet (i.e. in
matroska_parse_cluster()).
There is actually no reason to resync for valid files at all. (When
reading till the end, current FFmpeg would use matroska_resync() at
the end even if everything actually looks fine. Patch 20 is designed
to fix this.)
>> +             * so treat the segment as unknown length in order not to
>> +             * discard valid data that happens to be beyond the
>> designated
>> +             * end of the segment. */
>> +            matroska->levels[0].length = EBML_UNKNOWN_LENGTH;
>>               return 0;
>>           }
>>           id = (id << 8) | avio_r8(pb);
>> @@ -1610,18 +1628,12 @@ static int
>> matroska_parse_seekhead_entry(MatroskaDemuxContext *matroska,
>>               matroska->current_id                   = 0;
>>                 ret = ebml_parse(matroska, matroska_segment, matroska);
>> -
>> -            /* remove dummy level */
>> -            while (matroska->num_levels) {
>> -                uint64_t length =
>> matroska->levels[--matroska->num_levels].length;
>> -                if (length == EBML_UNKNOWN_LENGTH)
>> -                    break;
>> -            }
> 
> I think this code indicates unknown length was handled in a seekhead
> entry, probably because such files exist. Making the assumption in
> 13/21 about unknown length only on Segment+Cluster incorrect.
> 
1. The current way of handling unknown-sized elements is this:

a) If the current level is unknown-sized and we encounter a cluster ID
when clusters are not part of the syntax list used currently for
parsing, parsing is stopped and matroska->current_id (containing a
cluster ID) is kept. The code here contains a comment that we reached
the end of an unknown-size cluster, although it needn't be true that
the current master element is indeed a cluster.
(Michael has already merged my patch regarding length checks (that you
partially objected to) so that currently only level 0 elements and
clusters can be unknown-sized, but before that it was not assured that
we really reached the end of an unknown-size cluster.)
b) ebml_level_end() ends the current level if there is one and if
matroska->current_id is set. (It of course also ends the current level
based upon a length criterion, but that is only useful for known-sized
master elements.)

2. As you can see (both from the code and the comment), the code was
based around the assumption that only clusters (and segments) could be
unkown-sized.

3. The code snippet you cite as evidence that unknown length was
somehow supported for seekheads actually doesn't work if the
referenced element is an unknown-length element: In this case, it's
not the dummy level that is removed, but rather only the uppermost
unknown-size level. (And of course, parsing the actual element the
seekhead points to still likely fails, because the code doesn't know
when the levels end.)
>>           }
>>       }
>> -    /* seek back */
>> -    avio_seek(matroska->ctx->pb, before_pos, SEEK_SET);
>> -    matroska->current_id = saved_id;
>> +
>> +    /* Seek back - notice that in all instances where this is used
>> it is safe
>> +     * to set the level to 1 and unset the position of the current
>> cluster. */
>> +    matroska_reset_status(matroska, saved_id, before_pos);
>>         return ret;
>>   }
>> @@ -3535,9 +3547,7 @@ static int matroska_read_seek(AVFormatContext
>> *s, int stream_index,
>>       timestamp = FFMAX(timestamp, st->index_entries[0].timestamp);
>>         if ((index = av_index_search_timestamp(st, timestamp,
>> flags)) < 0 || index == st->nb_index_entries - 1) {
>> -        avio_seek(s->pb, st->index_entries[st->nb_index_entries -
>> 1].pos,
>> -                  SEEK_SET);
>> -        matroska->current_id = 0;
>> +        matroska_reset_status(matroska, 0,
>> st->index_entries[st->nb_index_entries - 1].pos);
>>           while ((index = av_index_search_timestamp(st, timestamp,
>> flags)) < 0 || index == st->nb_index_entries - 1) {
>>               matroska_clear_queue(matroska);
>>               if (matroska_parse_cluster(matroska) < 0)
>> @@ -3557,8 +3567,8 @@ static int matroska_read_seek(AVFormatContext
>> *s, int stream_index,
>>           tracks[i].end_timecode         = 0;
>>       }
>>   -    avio_seek(s->pb, st->index_entries[index].pos, SEEK_SET);
>> -    matroska->current_id       = 0;
>> +    /* We seek to a level 1 element, so set the appropriate status. */
>> +    matroska_reset_status(matroska, 0, st->index_entries[index].pos);
>>       if (flags & AVSEEK_FLAG_ANY) {
>>           st->skip_to_keyframe = 0;
>>           matroska->skip_to_timecode = timestamp;
>> @@ -3568,18 +3578,16 @@ static int
>> matroska_read_seek(AVFormatContext *s, int stream_index,
>>       }
>>       matroska->skip_to_keyframe = 1;
>>       matroska->done             = 0;
>> -    matroska->num_levels       = 0;
>>       ff_update_cur_dts(s, st, st->index_entries[index].timestamp);
>>       return 0;
>>   err:
>>       // slightly hackish but allows proper fallback to
>>       // the generic seeking code.
>> +    matroska_reset_status(matroska, 0, -1);
>>       matroska_clear_queue(matroska);
>> -    matroska->current_id = 0;
>>       st->skip_to_keyframe =
>>       matroska->skip_to_keyframe = 0;
>>       matroska->done = 0;
>> -    matroska->num_levels = 0;
>>       return -1;
>>   }
>>   @@ -3662,8 +3670,8 @@ static int
>> webm_clusters_start_with_keyframe(AVFormatContext *s)
>>           read = ebml_read_length(matroska, matroska->ctx->pb,
>> &cluster_length);
>>           if (read < 0)
>>               break;
>> -        avio_seek(s->pb, cluster_pos, SEEK_SET);
>> -        matroska->current_id = 0;
>> +
>> +        matroska_reset_status(matroska, 0, cluster_pos);
>>           matroska_clear_queue(matroska);
>>           if (matroska_parse_cluster(matroska) < 0 ||
>>               !matroska->queue) {
>> @@ -3677,7 +3685,10 @@ static int
>> webm_clusters_start_with_keyframe(AVFormatContext *s)
>>               break;
>>           }
>>       }
>> -    avio_seek(s->pb, before_pos, SEEK_SET);
>> +
>> +    /* Restore the status after matroska_read_header: */
>> +    matroska_reset_status(matroska, MATROSKA_ID_CLUSTER, before_pos);
>> +
>>       return rv;
>>   }
>>   -- 
>> 2.19.2


More information about the ffmpeg-devel mailing list