[FFmpeg-devel] [RFC] function to check for valid UTF-8 string

Reimar Döffinger Reimar.Doeffinger
Sun Dec 9 15:41:35 CET 2007


Hello,
On Sun, Dec 09, 2007 at 03:33:12PM +0100, Michael Niedermayer wrote:
> On Sun, Dec 09, 2007 at 03:19:26PM +0100, Reimar D?ffinger wrote:
> > On Sun, Dec 09, 2007 at 02:40:51PM +0100, Michael Niedermayer wrote:
> > > On Sun, Dec 09, 2007 at 11:18:32AM +0100, Reimar D?ffinger wrote:
> > > > since Rich seems to have given up on it, here is a proposed patch
> > > > that adds a av_check_utf8 function that could be used to validate
> > > > input strings.
> > > > Since it hacked it up very quickly please forgive any bugs or other
> > > > stupidity.
> > > 
> > > maybe the function should return a index to the last valid or first
> > > invalid byte or something like that?
> > 
> > Don't know. But I can easily change the "return 0;" to "return last;" and
> > "return 1;" to "return NULL;", so it would point to the start of the
> > first invalid unicode character, which would allow for easy truncating
> > of invalid strings, though I don't consider that too useful.
> 
> well i think it is usefull, feel free to commit with that 

I'll apply as attached tomorrow then.

Greetings,
Reimar D?ffinger
-------------- next part --------------
Index: libavutil/string.c
===================================================================
--- libavutil/string.c	(revision 11199)
+++ libavutil/string.c	(working copy)
@@ -23,8 +23,23 @@
 #include <stdio.h>
 #include <string.h>
 #include <ctype.h>
+#include <inttypes.h>
+#include "common.h"
 #include "avstring.h"
 
+static const uint32_t utf8_minvals[5] = {0, 1 << 7, 1 << 13, 1 << 20, 1 << 27};
+
+const char *av_check_utf8(const char *str) {
+    while (*str) {
+        const char *last = str;
+        uint32_t v;
+        GET_UTF8(v, *str++, return last;)
+        if (str - last > 4) return last;
+        if (v < utf8_minvals[str - last]) return last;
+    }
+    return NULL;
+}
+
 int av_strstart(const char *str, const char *pfx, const char **ptr)
 {
     while (*pfx && *pfx == *str) {
Index: libavutil/avstring.h
===================================================================
--- libavutil/avstring.h	(revision 11199)
+++ libavutil/avstring.h	(working copy)
@@ -24,6 +24,15 @@
 #include <stddef.h>
 
 /**
+ * Return pointer to the start of the first invalid UTF-8 character in str
+ * or NULL if str is a valid UTF-8 string.
+ *
+ * @param str input string
+ * @return start of first invalid character or NULL
+ */
+const char *av_check_utf8(const char *str);
+
+/**
  * Return non-zero if pfx is a prefix of str. If it is, *ptr is set to
  * the address of the first character in str after the prefix.
  *
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20071209/08df6611/attachment.pgp>



More information about the ffmpeg-devel mailing list