[FFmpeg-devel] [RFC] function to check for valid UTF-8 string
Reimar Döffinger
Reimar.Doeffinger
Sun Dec 9 15:41:35 CET 2007
Hello,
On Sun, Dec 09, 2007 at 03:33:12PM +0100, Michael Niedermayer wrote:
> On Sun, Dec 09, 2007 at 03:19:26PM +0100, Reimar Döffinger wrote:
> > On Sun, Dec 09, 2007 at 02:40:51PM +0100, Michael Niedermayer wrote:
> > > On Sun, Dec 09, 2007 at 11:18:32AM +0100, Reimar Döffinger wrote:
> > > > Since Rich seems to have given up on it, here is a proposed patch
> > > > that adds an av_check_utf8 function that could be used to validate
> > > > input strings.
> > > > Since I hacked it up very quickly, please forgive any bugs or other
> > > > stupidity.
> > >
> > > Maybe the function should return an index to the last valid or first
> > > invalid byte or something like that?
> >
> > Don't know. But I can easily change the "return 0;" to "return last;" and
> > "return 1;" to "return NULL;", so it would point to the start of the
> > first invalid Unicode character, which would allow for easy truncating
> > of invalid strings, though I don't consider that too useful.
>
> Well, I think it is useful; feel free to commit with that.
I'll apply as attached tomorrow then.
Greetings,
Reimar Döffinger
-------------- next part --------------
Index: libavutil/string.c
===================================================================
--- libavutil/string.c (revision 11199)
+++ libavutil/string.c (working copy)
@@ -23,8 +23,23 @@
#include <stdio.h>
#include <string.h>
#include <ctype.h>
+#include <inttypes.h>
+#include "common.h"
#include "avstring.h"
+static const uint32_t utf8_minvals[5] = {0, 0, 1 << 7, 1 << 11, 1 << 16}; /* indexed by sequence length */
+
+const char *av_check_utf8(const char *str) {
+ while (*str) {
+ const char *last = str;
+ uint32_t v;
+ GET_UTF8(v, *str++, return last;)
+ if (str - last > 4) return last;
+ if (v < utf8_minvals[str - last]) return last;
+ }
+ return NULL;
+}
+
int av_strstart(const char *str, const char *pfx, const char **ptr)
{
while (*pfx && *pfx == *str) {
Index: libavutil/avstring.h
===================================================================
--- libavutil/avstring.h (revision 11199)
+++ libavutil/avstring.h (working copy)
@@ -24,6 +24,15 @@
#include <stddef.h>
/**
+ * Return pointer to the start of the first invalid UTF-8 character in str
+ * or NULL if str is a valid UTF-8 string.
+ *
+ * @param str input string
+ * @return start of first invalid character or NULL
+ */
+const char *av_check_utf8(const char *str);
+
+/**
* Return non-zero if pfx is a prefix of str. If it is, *ptr is set to
* the address of the first character in str after the prefix.
*