Ticket #1163 (new defect)
ffprobe can produce invalid XML
| Reported by: | Ian | Owned by: | stefano |
|---|---|---|---|
| Priority: | normal | Component: | FFprobe |
| Version: | 0.10.2 | Keywords: | |
| Cc: | n@… | Blocked By: | |
| Blocking: | Reproduced by developer: | no | |
| Analyzed by developer: | no |
Description
ffprobe can output invalid XML as xml_escape_str only handles < > ' " and &. For example most escape characters below 32 are invalid UTF-8.
This replacement version of the function replaces any invalid UTF-8 characters by the inverted question mark.
static const char *xml_escape_str(char **dst, size_t *dst_size, const char *src,
void *log_ctx)
{
// the unknown character (inverted question mark)
const unsigned char BAD_CHARACTER_1 = 194, BAD_CHARACTER_2 = 191;
const char *p;
char *q;
int copyAll = 1;
size_t size = 1;
/* precompute size */
for (p = src; *p;) {
int badChar = 0;
unsigned char byte;
ESCAPE_CHECK_SIZE(src, size, SIZE_MAX-10);
byte = (unsigned char)*p;
if (byte < 32 && byte != 9 && byte != 10 && byte != 13) {
badChar = 1;
++p;
} else if (byte < 128) {
switch (byte) {
case '&' : size += 5; /* & */ copyAll = 0; break;
case '<' : size += 4; /* < */ copyAll = 0; break;
case '>' : size += 4; /* > */ copyAll = 0; break;
case '\"': size += 6; /* " */ copyAll = 0; break;
case '\'': size += 6; /* ' */ copyAll = 0; break;
default: size++;
}
++p;
++size;
}
else if (byte < 0xC0)
{
badChar = 1;
++p;
}
else
{
int extra;
copyAll = 0;
if (byte < 0xe0)
extra = 1;
else if (byte < 0xf0)
extra = 2;
else if (byte < 0xf8)
extra = 3;
else
badChar = 1;
if (badChar)
++p;
else
{
++p;
for (int i = 0; i < extra && *p; ++i, ++p)
{
byte = (unsigned char)*p;
if ((byte & 0xc0) != 0x80)
badChar = 1;
}
if (!badChar)
size += extra;
}
}
if (badChar) {
size += 2;
copyAll = 0;
}
}
ESCAPE_REALLOC_BUF(dst_size, dst, src, size);
#define COPY_STR(str) { \
const char *s = str; \
while (*s) \
*q++ = *s++; \
}
p = src;
q = *dst;
if (copyAll)
COPY_STR(p)
else {
while (*p) {
int badChar = 0;
unsigned char byte;
byte = (unsigned char)*p;
if (byte < 32 && byte != 9 && byte != 10 && byte != 13) {
badChar = 1;
++p;
} else if (byte < 128) {
switch (byte) {
case '&' : COPY_STR("&"); break;
case '<' : COPY_STR("<"); break;
case '>' : COPY_STR(">"); break;
case '\"': COPY_STR("""); break;
case '\'': COPY_STR("'"); break;
default: *q++ = *p;
}
++p;
++size;
}
else if (byte < 0xC0)
{
badChar = 1;
++p;
}
else
{
int extra;
copyAll = 0;
if (byte < 0xe0)
extra = 1;
else if (byte < 0xf0)
extra = 2;
else if (byte < 0xf8)
extra = 3;
else
badChar = 1;
if (badChar)
++p;
else
{
const char *startChar = p;
int i;
++p;
for (i = 0; i < extra && *p; ++i, ++p)
{
byte = (unsigned char)*p;
if ((byte & 0xc0) != 0x80)
badChar = 1;
}
if (!badChar) {
for (i = 0; i < extra;)
*q++ = *startChar++;
}
}
}
if (badChar) {
*q++ = BAD_CHARACTER_1;
*q++ = BAD_CHARACTER_2;
}
}
}
*q = 0;
return *dst;
}
Change History
comment:2 Changed 10 months ago by Đonny
Hi, I can reproduce it with ffprobe version N-42704-g85761ef built on Jul 20 2012 20:39:19
Test file can be downloaded here: http://pangea.rferl.org/videoroot/Pangeavideo/2012/08/2/28/28efb444-f6bc-430c-ae8c-50d613f7fde9.mp4
comment:3 follow-ups: ↓ 4 ↓ 5 Changed 4 weeks ago by nathanaeljones
- Cc n@… added
I would be willing to fund development of a patch if it can be completed/integrated soon. Anyone interested?
This is currently blocking our integration of ffmpeg thumbnailing within http://imageresizing.net. A large percentage (~40%) of videos cause ffprobe to produce invalid xml.
comment:4 in reply to: ↑ 3 Changed 4 weeks ago by cehoyos
Replying to nathanaeljones:
I would be willing to fund development of a patch if it can be completed/integrated soon. Anyone interested?
First step would be to make this a reproducible ticket by providing a failing command line together with complete, uncut console output (current git head).
comment:5 in reply to: ↑ 3 Changed 4 weeks ago by saste
Replying to nathanaeljones:
I would be willing to fund development of a patch if it can be completed/integrated soon. Anyone interested?
This is currently blocking our integration of ffmpeg thumbnailing within http://imageresizing.net. A large percentage (~40%) of videos cause ffprobe to produce invalid xml.
I'm interested in this, but I need more samples. Also the problem seems more related to invalid input data and/or encoding, so it looks more like a libavformat problem. Data should be probably validated and sanitized at the demuxing level, before rendering is performed at the application level.



Is the problem also reproducible with current git head?
Please send unified patches to ffmpeg-devel and please provide an example (including console) on how to produce the invalid xml.