[FFmpeg-devel] file protocol with Unicode support

Nicolas George nicolas.george at normalesup.org
Wed Apr 13 09:28:14 CEST 2011


On quartidi 24 Germinal, year CCXIX, Tomas Härdin wrote:
> The file protocol works fine for UTF-8 paths on Linux based systems
> last time I checked.

Linux/Unix file names are byte-based: a filename is any sequence of bytes
except 0 and 0x2F ('/'). Interpreting this sequence of bytes as a sequence
of characters is left to the discretion of each tool that displays file
names or gets them from the user. This is usually done according to the
locale settings, using an ASCII-compatible encoding, and these days UTF-8 is
the most common choice.
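
To make the byte-based rule concrete, here is a minimal sketch in C. The
helper name is hypothetical (it is not an FFmpeg or libc function), and it
ignores length limits such as NAME_MAX and the special names "." and "..".
It only checks the two forbidden bytes and looks at no encoding at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the len bytes at name could be a single
 * Linux file name component, i.e. non-empty and containing neither a NUL
 * byte nor '/' (0x2F). No encoding check is done on the bytes. */
int is_valid_name_component(const unsigned char *name, size_t len)
{
    assert(name != NULL);
    if (len == 0)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (name[i] == 0x00 || name[i] == 0x2F)
            return 0;
    return 1;
}
```

Under this check the two bytes 0xFF 0xFE are accepted although they are not
valid UTF-8, which is the point: the interpretation as characters happens
elsewhere, if at all.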

As ffmpeg does not itself directly display or read file names (it acts
through a tty), all these subtleties are irrelevant for it.

I do not know much about Windows, but as far as I know, file names in Windows
are an ugly and incomprehensible mix: they are supposed to be made of
Unicode codepoints, they sometimes look like ASCII or extended ASCII and
sometimes look like multibyte strings in UTF-16 (which is one of the
stupidest things ever invented).

So as far as I understand, Kirill's request is legitimate: because of
Windows's idiotic way of implementing file name i18n, there is specific
work to do in each application to handle non-ASCII file names.

But I think his patch is way too complex for that.

As UTF-8 covers the whole of Unicode, better than UTF-16, if there is some
way to force Windows to parse the string as UTF-8, it would solve the
problem (except if there are files with broken UTF-16 surrogates; I do not
know if fsck tools consider this an error). It would be much simpler.
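
As a rough illustration of what "parse the string as UTF-8" could mean, the
sketch below converts a UTF-8 string to UTF-16 code units in portable C. The
function name and its error convention are assumptions made for this example;
it is not taken from Kirill's patch or from FFmpeg. On Windows the same
conversion is available as MultiByteToWideChar() with CP_UTF8, and the result
could presumably be handed to a wide-character call such as _wopen().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the NUL-terminated UTF-8 string src into
 * UTF-16 code units in dst, which has room for dst_cap units including the
 * terminating 0. Returns the number of units written (terminator excluded),
 * or -1 on malformed input or if dst is too small. */
int utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_cap)
{
    /* Smallest codepoint allowed for each sequence length (rejects overlongs). */
    static const uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*s) {
        uint32_t cp;
        int extra;

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return -1;               /* stray continuation or invalid lead byte */
        s++;

        for (int i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)
                return -1;            /* truncated or broken sequence */
            cp = (cp << 6) | (*s & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;                /* overlong, out of range, or surrogate */

        if (cp < 0x10000) {
            if (n + 1 >= dst_cap)
                return -1;
            dst[n++] = (uint16_t)cp;
        } else {
            /* Codepoints above the BMP become a surrogate pair. */
            if (n + 2 >= dst_cap)
                return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= dst_cap)
        return -1;
    dst[n] = 0;
    return (int)n;
}
```

Note that this direction can never produce an unpaired surrogate, so a file
whose existing name contains one could not be addressed this way; that is the
exception mentioned above.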

Regards,

-- 
  Nicolas George