[FFmpeg-devel] [PATCH 3/3] lavc: check decoded subtitles encoding.

Nicolas George nicolas.george at normalesup.org
Sun Apr 7 11:19:42 CEST 2013

On Octidi 18 Germinal, year CCXXI, Reimar Döffinger wrote:
> I am against a av_is_valid_utf8 if this is going to be the only purpose.
> Detecting non-UTF-8 is quite reliable by just checking the "syntax",
> without trying to validate the code points.
> And exposing a function called is_valid_utf8 isn't something we should
> do unless we are 100% sure the validation is fully waterproof, which is
> a lot of extra review effort.
> It doesn't seem worth it to me for this single usage.

I see your point. There are, IMHO, two sides to your mail: validating
carefully or not, and making the function public or not.

On the second question, I have no strong opinion. I needed the validation
function, and it seemed useful enough to belong in lavu, but I have no
objection to making it static.

That was also my reason for accepting a BOM: the function is meant to be
generic, not specialized for decoded subtitle lines.

I do not think it is a real problem if the validation is not 100%
waterproof: there is no formal definition of valid UTF-8 (like there is for
XML), only guidelines to detect common bugs and limitations that depend on
the use.

On the question of validating carefully, it is fairly trivial. Testing the
codepoints is actually simpler than extracting them in the first place.
There is already a GET_UTF8 macro in lavu, but it is way too lax to be used
for this.
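
To illustrate how little code the careful checks need, here is a minimal
sketch. It is not the code from the patch: the name utf8_is_valid_sketch is
hypothetical, and the set of rejected cases (overlong forms, surrogates,
values above U+10FFFF) is an assumption about what "careful" would mean here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the function from the patch.
 * Returns 1 if buf[0..len) passes the checks below, 0 otherwise. */
static int utf8_is_valid_sketch(const uint8_t *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        uint8_t c = buf[i++];
        uint32_t cp, min;
        int extra;

        if (c < 0x80) {
            continue;                       /* plain ASCII */
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }

        /* "Syntax" check: each continuation byte must be 10xxxxxx. */
        while (extra--) {
            if (i >= len || (buf[i] & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (buf[i++] & 0x3F);
        }

        /* Codepoint checks: these three tests are the whole extra cost. */
        if (cp < min)
            return 0;                       /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return 0;                       /* beyond the Unicode range */
    }
    return 1;
}
```

In this sketch a leading BOM (EF BB BF) is just an ordinary codepoint and is
accepted; a caller that cares about it would have to test for it separately.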


  Nicolas George