[FFmpeg-devel] [RFC] function to check for valid UTF-8 string

Reimar Döffinger Reimar.Doeffinger
Mon Dec 10 11:47:47 CET 2007


Hello,
On Sun, Dec 09, 2007 at 05:09:11PM -0500, Rich Felker wrote:
> On Sun, Dec 09, 2007 at 11:18:32AM +0100, Reimar Döffinger wrote:
> > since Rich seems to have given up on it, here is a proposed patch
> > that adds an av_check_utf8 function that could be used to validate
> > input strings.
> > Since I hacked it up very quickly, please forgive any bugs or other
> > stupidity.
> 
> Read RFC 3629. There's a very simple way to validate byte sequences
> (using the given ABNF) without any decoding required, and it's less
> likely to be buggy. Your patch relies on GET_UTF8 not being buggy,
> which is quite doubtful IMO..

Reading RFC 3629 turned out to be a useless exercise. Its ABNF certainly
isn't my idea of simple (actually "mess" fits it better), and it is
mostly what I had already considered as an alternative.
Maybe it actually is less likely to be buggy, but that is not worth much
here because:
1) If GET_UTF8 is broken, our UTF-8 handling is most likely broken
anyway, and I don't think it will help much if av_check_utf8 is not
broken.
2) There is at least some chance that somebody will notice if GET_UTF8
ever breaks, whereas it is almost certain that av_check_utf8 being
replaced by a plain "return NULL" would go unnoticed for ages (even
adding a regression check is problematic). So I actually do believe
that using GET_UTF8 plus a little bit of custom code will be more robust
than completely custom code.
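
For comparison, here is a rough sketch of what the decode-free approach
Rich suggests might look like if one follows the RFC 3629 ABNF byte
ranges directly. It is not the proposed patch. The name
utf8_first_invalid and its return convention (NULL if the string is
valid, otherwise a pointer to the start of the first bad sequence) are
assumptions made up for illustration only.

```c
#include <stddef.h>

/* Hypothetical helper, NOT the proposed av_check_utf8 patch.
 * Walks a NUL-terminated string using the byte ranges from the
 * RFC 3629 ABNF, without decoding any code points.
 * Returns NULL if the whole string is well-formed UTF-8, otherwise a
 * pointer to the first byte of the first malformed sequence. */
static const char *utf8_first_invalid(const char *str)
{
    const unsigned char *p = (const unsigned char *)str;

    while (*p) {
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */
        int tail;                           /* continuation byte count  */
        int i;

        if (*p < 0x80) {                    /* UTF8-1: plain ASCII */
            p++;
            continue;
        } else if (*p >= 0xC2 && *p <= 0xDF) {
            tail = 1;                       /* UTF8-2 */
        } else if (*p == 0xE0) {
            tail = 2; lo = 0xA0;            /* no overlong 3-byte forms */
        } else if (*p == 0xED) {
            tail = 2; hi = 0x9F;            /* no UTF-16 surrogates */
        } else if (*p >= 0xE1 && *p <= 0xEF) {
            tail = 2;                       /* remaining UTF8-3 */
        } else if (*p == 0xF0) {
            tail = 3; lo = 0x90;            /* no overlong 4-byte forms */
        } else if (*p == 0xF4) {
            tail = 3; hi = 0x8F;            /* nothing above U+10FFFF */
        } else if (*p >= 0xF1 && *p <= 0xF3) {
            tail = 3;                       /* remaining UTF8-4 */
        } else {
            return (const char *)p;         /* 0x80-0xC1 or 0xF5-0xFF */
        }

        /* A NUL terminator fails these range checks, so we never read
         * past the end of the string. */
        if (p[1] < lo || p[1] > hi)
            return (const char *)p;
        for (i = 2; i <= tail; i++)
            if (p[i] < 0x80 || p[i] > 0xBF)
                return (const char *)p;

        p += tail + 1;
    }
    return NULL;
}
```

A caller that only wants to warn the user could test the result against
NULL and print a message. Note that a version that always returns NULL
would compile and "work" just as well on valid input, which is the
failure mode described in 2) above.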

By the way, if that is the reason for concern, I will happily note
clearly that av_check_utf8 should _never_ be used for security-critical
checks. It is only meant to warn the user if e.g. a command-line string
that should be in UTF-8 is not.

Greetings,
Reimar Döffinger



