Ticket #1163 (new defect)

Opened 15 months ago

Last modified 7 weeks ago

ffprobe can produce invalid XML

Reported by: Ian Owned by: stefano
Priority: normal Component: FFprobe
Version: 0.10.2 Keywords:
Cc: n@… Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

ffprobe can output invalid XML as xml_escape_str only handles < > ' " and &. For example most escape characters below 32 are invalid UTF-8.

This replacement version of the function replaces any invalid UTF-8 characters by the inverted question mark.

static const char *xml_escape_str(char **dst, size_t *dst_size, const char *src,
                                  void *log_ctx)
{
    // the unknown character (inverted question mark)
    const unsigned char BAD_CHARACTER_1 = 194, BAD_CHARACTER_2 = 191;

    const char *p;
    char *q;
    int copyAll = 1;
    size_t size = 1;

    /* precompute size */
    for (p = src; *p;) {
        int badChar = 0;
        unsigned char byte;

        ESCAPE_CHECK_SIZE(src, size, SIZE_MAX-10);

        byte = (unsigned char)*p;
        if (byte < 32 && byte != 9 && byte != 10 && byte != 13) {
            badChar = 1;
            ++p;
        } else if (byte < 128) {
            switch (byte) {
                case '&' : size += 5; /* &amp; */  copyAll = 0; break;
                case '<' : size += 4; /* &lt; */   copyAll = 0; break;
                case '>' : size += 4; /* &gt; */   copyAll = 0; break;
                case '\"': size += 6; /* &quot; */ copyAll = 0; break;
                case '\'': size += 6; /* &apos; */ copyAll = 0; break;
                default: size++;
                }
            ++p;
            ++size;
            }
        else if (byte < 0xC0)
            {
            badChar = 1;
            ++p;
            }
        else
            {
            int extra;

            copyAll = 0;
            if (byte < 0xe0)
                extra = 1;
            else if (byte < 0xf0)
                extra = 2;
            else if (byte < 0xf8)
                extra = 3;
            else
                badChar = 1;

            if (badChar)
                ++p;
            else
                {
                ++p;
                for (int i = 0; i < extra && *p; ++i, ++p)
                    {
                    byte = (unsigned char)*p;
                    if ((byte & 0xc0) != 0x80)
                        badChar = 1;
                    }
                if (!badChar)
                    size += extra;
                }
            }
        if (badChar) {
            size += 2;
            copyAll = 0;
            }
        }

    ESCAPE_REALLOC_BUF(dst_size, dst, src, size);

#define COPY_STR(str) {      \
        const char *s = str; \
        while (*s)           \
            *q++ = *s++;     \
    }

    p = src;
    q = *dst;
    if (copyAll)
        COPY_STR(p)
    else {
        while (*p) {
            int badChar = 0;
            unsigned char byte;
    
            byte = (unsigned char)*p;
            if (byte < 32 && byte != 9 && byte != 10 && byte != 13) {
                badChar = 1;
                ++p;
            } else if (byte < 128) {
                switch (byte) {
                    case '&' : COPY_STR("&amp;");  break;
                    case '<' : COPY_STR("&lt;");   break;
                    case '>' : COPY_STR("&gt;");   break;
                    case '\"': COPY_STR("&quot;"); break;
                    case '\'': COPY_STR("&apos;"); break;
                    default: *q++ = *p;
                    }
                ++p;
                ++size;
                }
            else if (byte < 0xC0)
                {
                badChar = 1;
                ++p;
                }
            else
                {
                int extra;
    
                copyAll = 0;
                if (byte < 0xe0)
                    extra = 1;
                else if (byte < 0xf0)
                    extra = 2;
                else if (byte < 0xf8)
                    extra = 3;
                else
                    badChar = 1;
    
                if (badChar)
                    ++p;
                else
                    {
                    const char *startChar = p;
                    int i;
                    ++p;
                    for (i = 0; i < extra && *p; ++i, ++p)
                        {
                        byte = (unsigned char)*p;
                        if ((byte & 0xc0) != 0x80)
                            badChar = 1;
                        }
                    if (!badChar) {
                        for (i = 0; i < extra;)
                            *q++ = *startChar++;
                    }
                }
            }
            if (badChar) {
                *q++ = BAD_CHARACTER_1;
                *q++ = BAD_CHARACTER_2;
            }
        }
    }
    *q = 0;

    return *dst;
}

Change History

comment:1 Changed 15 months ago by cehoyos

Is the problem also reproducible with current git head?

Please send unified patches to ffmpeg-devel and please provide an example (including console) on how to produce the invalid xml.

comment:2 Changed 10 months ago by Đonny

Hi, I can reproduce it with ffprobe version N-42704-g85761ef built on Jul 20 2012 20:39:19

Version 0, edited 10 months ago by Đonny (next)

comment:3 follow-ups: ↓ 4 ↓ 5 Changed 7 weeks ago by nathanaeljones

  • Cc n@… added

I would be willing to fund development of a patch if it can be completed/integrated soon. Anyone interested?

This is currently blocking our integration of ffmpeg thumbnailing within  http://imageresizing.net. A large percentage (~40%) of videos cause ffprobe to produce invalid xml.

comment:4 in reply to: ↑ 3 Changed 7 weeks ago by cehoyos

Replying to nathanaeljones:

I would be willing to fund development of a patch if it can be completed/integrated soon. Anyone interested?

First step would be to make this a reproducible ticket by providing a failing command line together with complete, uncut console output (current git head).

comment:5 in reply to: ↑ 3 Changed 7 weeks ago by saste

Replying to nathanaeljones:

I would be willing to fund development of a patch if it can be completed/integrated soon. Anyone interested?

This is currently blocking our integration of ffmpeg thumbnailing within  http://imageresizing.net. A large percentage (~40%) of videos cause ffprobe to produce invalid xml.

I'm interested in this, but I need more samples. Also the problem seems more related to invalid input data and/or encoding, so it looks more like a libavformat problem. Data should be probably validated and sanitized at the demuxing level, before rendering is performed at the application level.

Note: See TracTickets for help on using tickets.