[FFmpeg-devel] [PATCH 08/21] avformat/matroskadec: Improve error messages
Steve Lhomme
robux4 at ycbcr.xyz
Mon Apr 8 09:44:47 EEST 2019
> On April 7, 2019 at 11:24 AM Andreas Rheinhardt via ffmpeg-devel <ffmpeg-devel at ffmpeg.org> wrote:
>
>
> Steve Lhomme:
> > On 3/27/2019 12:18 PM, Andreas Rheinhardt via ffmpeg-devel wrote:
> >> ebml_read_num had a number of flaws:
> >>
> >> 1. The check for read errors/EOF was totally wrong. E.g. an EBML number
> >> beginning with the invalid 0x00 would be considered a read error,
> >> although it is just invalid data.
> >> 2. The check for read errors/EOF was done just once, after reading the
> >> first byte of the EBML number. But errors/EOF can happen inbetween, of
> >> course, and this wasn't checked.
> >> 3. There was no way to distinguish when EOF should be an error (because
> >> the data has to be there) for which an error message should be emitted
> >> and when it is not necessarily an error (namely during parsing of EBML
> >> IDs). Such a possibility has been added and used.
> >
> > Maybe the title of the patch should rather mention that it's fixing
> > the EOF handling when reading EBML ID/Length. The changed error
> > messages is a less important consequence.
> >
> How about "avformat/matroskadec: Improve (in particular) EOF checks"?
> >
> >>
> >> Some useless initializations were also fixed.
> >>
> >> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt at googlemail.com>
> >> ---
> >> libavformat/matroskadec.c | 61
> >> ++++++++++++++++++++++-----------------
> >> 1 file changed, 34 insertions(+), 27 deletions(-)
> >>
> >> diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
> >> index a6617a607b..aa2266384a 100644
> >> --- a/libavformat/matroskadec.c
> >> +++ b/libavformat/matroskadec.c
> >> @@ -820,29 +820,19 @@ static int ebml_level_end(MatroskaDemuxContext
> >> *matroska)
> >> * Returns: number of bytes read, < 0 on error
> >> */
> >> static int ebml_read_num(MatroskaDemuxContext *matroska,
> >> AVIOContext *pb,
> >> - int max_size, uint64_t *number)
> >> + int max_size, uint64_t *number, int
> >> eof_forbidden)
> >> {
> >> - int read = 1, n = 1;
> >> - uint64_t total = 0;
> >> + int read, n = 1;
> >> + uint64_t total;
> >> + int64_t pos;
> >> - /* The first byte tells us the length in bytes - avio_r8()
> >> can normally
> >> - * return 0, but since that's not a valid first ebmlID byte, we
> >> can
> >> - * use it safely here to catch EOS. */
> >> - if (!(total = avio_r8(pb))) {
> >> - /* we might encounter EOS here */
> >> - if (!avio_feof(pb)) {
> >> - int64_t pos = avio_tell(pb);
> >> - av_log(matroska->ctx, AV_LOG_ERROR,
> >> - "Read error at pos. %"PRIu64" (0x%"PRIx64")\n",
> >> - pos, pos);
> >> - return pb->error ? pb->error : AVERROR(EIO);
> >> - }
> >> - return AVERROR_EOF;
> >> - }
> >> + /* The first byte tells us the length in bytes - except when it
> >> is zero. */
> >> + total = avio_r8(pb);
> >> + if (avio_feof(pb))
> >> + goto err;
> >> /* get the length of the EBML number */
> >> - read = 8 - ff_log2_tab[total];
> >> - if (read > max_size) {
> >> + if (!total || (read = 8 - ff_log2_tab[total]) > max_size) {
> >> int64_t pos = avio_tell(pb) - 1;
> >> av_log(matroska->ctx, AV_LOG_ERROR,
> >> "Invalid EBML number size tag 0x%02x at pos
> >> %"PRIu64" (0x%"PRIx64")\n",
> >> @@ -855,9 +845,27 @@ static int ebml_read_num(MatroskaDemuxContext
> >> *matroska, AVIOContext *pb,
> >> while (n++ < read)
> >> total = (total << 8) | avio_r8(pb);
> >> + if (avio_feof(pb)) {
> >> + eof_forbidden = 1;
> >> + goto err;
> >> + }
> >
> > You're forcing an error if the data ends after reading a number ?
> > Ending a Matroska file with a number should be fine. It could also be
> > an element with a size of 0. It doesn't contain any data but it's
> > still valid (depending on the semantic of the element).
> >
> > So this forced error seem wrong. Let the next read catch the EOF if it
> > finds one.
> >
> avio_feof doesn't indicate an error if we reach EOF, only if we
> attempt to read past EOF (or if another error occurred). If the EBML
> number we intend to parse is completely read, then no error is
> indicated here at all. If the input ends immediately after this EBML
> number, then the next read will trigger EOF and will have to catch it.
OK, it makes sense.
> If avio_feof is set at this point, it means that the first byte of the
> EBML number says that the EBML number is total bytes long, but an
> error or EOF occured during reading of these bytes. I think this
> warrants an error. eof_forbidden is set at this point so that an
> incomplete EBML number will always trigger an error.
OK
> >> +
> >> *number = total;
> >> return read;
> >> +
> >> +err:
> >> + pos = avio_tell(pb);
> >> + if (pb->error) {
> >> + av_log(matroska->ctx, AV_LOG_ERROR,
> >> + "Read error at pos. %"PRIu64" (0x%"PRIx64")\n",
> >> + pos, pos);
> >> + return pb->error;
> >> + }
> >> + if (eof_forbidden)
> >> + av_log(matroska->ctx, AV_LOG_ERROR, "File ended prematurely "
> >> + "at pos. %"PRIu64" (0x%"PRIx64")\n", pos, pos);
> >
> > If eof_forbidden is false (EOF allowed) you return an EOF error anyway ?
> >
> Yes, so that the caller knows that the file has ended and can act
> accordingly. Currently only the parsing of EBML IDs in ebml_parse
> allow EOF and if we hit EOF while parsing an EBML ID, ebml_parse does
> a more thorough check for whether this is a real error or something
> legal (of course, based upon whether we the current level is
> unknown-sized or not).
Then I don't understand what eof_forbidden is for. It seems that EOF should be treated no matter what.
> >> + return AVERROR_EOF;
> >> }
> >> /**
> >> @@ -868,7 +876,7 @@ static int ebml_read_num(MatroskaDemuxContext
> >> *matroska, AVIOContext *pb,
> >> static int ebml_read_length(MatroskaDemuxContext *matroska,
> >> AVIOContext *pb,
> >> uint64_t *number)
> >> {
> >> - int res = ebml_read_num(matroska, pb, 8, number);
> >> + int res = ebml_read_num(matroska, pb, 8, number, 1);
> >> if (res > 0 && *number + 1 == 1ULL << (7 * res))
> >> *number = EBML_UNKNOWN_LENGTH;
> >> return res;
> >> @@ -1010,7 +1018,7 @@ static int
> >> matroska_ebmlnum_uint(MatroskaDemuxContext *matroska,
> >> {
> >> AVIOContext pb;
> >> ffio_init_context(&pb, data, size, 0, NULL, NULL, NULL, NULL);
> >> - return ebml_read_num(matroska, &pb, FFMIN(size, 8), num);
> >> + return ebml_read_num(matroska, &pb, FFMIN(size, 8), num, 1);
> >> }
> >> /*
> >> @@ -1057,7 +1065,7 @@ static int ebml_parse(MatroskaDemuxContext
> >> *matroska, EbmlSyntax *syntax,
> >> {
> >> if (!matroska->current_id) {
> >> uint64_t id;
> >> - int res = ebml_read_num(matroska, matroska->ctx->pb, 4, &id);
> >> + int res = ebml_read_num(matroska, matroska->ctx->pb, 4,
> >> &id, 0);
> >> if (res < 0) {
> >> // in live mode, finish parsing if EOF is reached.
> >> return (matroska->is_live &&
> >> matroska->ctx->pb->eof_reached &&
> >> @@ -3335,10 +3343,9 @@ static int
> >> matroska_parse_block(MatroskaDemuxContext *matroska, AVBufferRef *buf
> >> uint64_t num;
> >> int trust_default_duration = 1;
> >> - if ((n = matroska_ebmlnum_uint(matroska, data, size, &num)) <
> >> 0) {
> >> - av_log(matroska->ctx, AV_LOG_ERROR, "EBML block data
> >> error\n");
> >> + if ((n = matroska_ebmlnum_uint(matroska, data, size, &num)) < 0)
> >> return n;
> >> - }
> >> +
> >> data += n;
> >> size -= n;
> >> @@ -3699,7 +3706,7 @@ static int
> >> webm_clusters_start_with_keyframe(AVFormatContext *s)
> >> AVPacket *pkt;
> >> avio_seek(s->pb, cluster_pos, SEEK_SET);
> >> // read cluster id and length
> >> - read = ebml_read_num(matroska, matroska->ctx->pb, 4,
> >> &cluster_id);
> >> + read = ebml_read_num(matroska, matroska->ctx->pb, 4,
> >> &cluster_id, 1);
> >> if (read < 0 || cluster_id != 0xF43B675) // done with all
> >> clusters
> >> break;
> >> read = ebml_read_length(matroska, matroska->ctx->pb,
> >> &cluster_length);
> >> @@ -3916,7 +3923,7 @@ static int
> >> webm_dash_manifest_cues(AVFormatContext *s, int64_t init_range)
> >> // Cues element ID + EBML length of the Cues element.
> >> cues_end is
> >> // inclusive and the above sum is reduced by 1.
> >> uint64_t cues_length = 0, cues_id = 0, bytes_read = 0;
> >> - bytes_read += ebml_read_num(matroska, matroska->ctx->pb, 4,
> >> &cues_id);
> >> + bytes_read += ebml_read_num(matroska, matroska->ctx->pb, 4,
> >> &cues_id, 1);
> >
> > += on a call that may return an error doesn't seem correct. The error
> > should be checked before using the value.
> >
> Ok, I'll fix this (pre-existing) error.
> >> bytes_read += ebml_read_length(matroska,
> >> matroska->ctx->pb, &cues_length);
> >> cues_end = cues_start + cues_length + bytes_read - 1;
> >> }
> >> --
> >> 2.19.2
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
More information about the ffmpeg-devel
mailing list