[FFmpeg-devel] [PATCH] Support for UTF8 filenames on Windows

Ramiro Polla ramiro.polla
Fri Jun 26 17:16:37 CEST 2009


On Fri, Jun 26, 2009 at 11:07 AM, Karl Blomster<thefluff at uppcon.com> wrote:
> Måns Rullgård wrote:
>> Karl Blomster <thefluff at uppcon.com> writes:
>>> Ramiro Polla wrote:
>>>> On Thu, Jun 25, 2009 at 8:59 AM, Michael
>>>> Niedermayer<michaelni at gmx.at> wrote:
>>>>> On Sat, Jun 20, 2009 at 11:56:37PM +0200, Kalle Blomster wrote:
>>>>>> Currently, ffmpeg on Windows does not support opening files whose
>>>>>> names
>>>>>> contain characters that cannot be expressed in the current locale,
>>>>>> because
>>>>>> on Windows you can't pass UTF8 in a char* to _open() and have it work.
>>>>>> You
>>>>>> have to convert the filename to UTF16 and use _wopen(), which takes a
>>>>>> wchar_t instead.
>>>>>>
>>>>>> I have attached a patch that attempts to solve the problem with a
>>>>>> rather
>>>>>> ugly hack. It Works For Me(tm) under mingw at least. Comments are
>>>>>> appreciated.
>>>>>>
>>>>>> Regards,
>>>>>> Karl Blomster
>>>>>>  os_support.c |   17 +++++++++++++++++
>>>>>>  os_support.h |    5 +++++
>>>>>>  2 files changed, 22 insertions(+)
>>>>>> 9afa6887f1f6998c37d75efaae5d589918dc752b  ffmpeg_win_utf8_paths.patch
>>>>>> Index: libavformat/os_support.c
>>>>>> ===================================================================
>>>>>> --- libavformat/os_support.c  (revision 19242)
>>>>>> +++ libavformat/os_support.c  (working copy)
>>>>>> @@ -30,6 +30,23 @@
>>>>>>  #include <sys/time.h>
>>>>>>  #include "os_support.h"
>>>>>>
>>>>>> +#ifdef HAVE_WIN_UTF8_PATHS
>>>>>> +#define WIN32_LEAN_AND_MEAN
>>>>>> +#include <windows.h>
>>>>>> +#endif
>>>>>> +
>>>>>> +#ifdef HAVE_WIN_UTF8_PATHS
>>>>
>>>> Where is HAVE_WIN_UTF8_PATHS defined?
>>>
>>> Nowhere, right now. My thought is to let configure set it with some
>>> --enable parameter, or you just pass -DHAVE_WIN_UTF8_PATHS in your
>>> CFLAGS. The point was that I thought it might be a good idea to let
>>> the user compile with it disabled, if he wanted to, like if someone
>>> wanted to build on Win9x (heh) or something where unicode support
>>> might not be available.
>>
>> Can we simply test for the existence of _wopen()?  Is there any reason
>> to disable this if the function exists?
>
> That may be dangerous. It will always exist in the MinGW includes/libraries,
> but that doesn't mean it's implemented and works in the runtime libraries
> you end up using. See also below.

Is this something from msvcrt or from the MinGW runtime libraries?
FFmpeg already expects minimum mingw-rt and w32api versions.

If it's because of Win9x users, we already have a couple of places
that need higher versions of Windows (like a call in getutime in
ffmpeg.c and inside vfwcap IIRC). I haven't heard of anyone seriously
using FFmpeg in Win9x and before that happens I don't think we should
worry about them =)

>>>>>> +int winutf8_open(const char *filename, int oflag, int pmode)
>>>>>> +{
>>>>>> +     wchar_t wfilename[MAX_PATH * 2];
>>>>>> +
>>>>>> +     if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, filename, -1, wfilename, MAX_PATH) > 0)
>>>>>> +             return _wopen(wfilename, oflag, pmode);
>>>>>> +     else
>>>>>> +             return open(filename, oflag, pmode);
>>>>>> +}
>>>>>> +#endif
>>
>> What might cause MultiByteToWideChar() to fail?  What will plain
>> open() do with such input?  Also, what is the value of MAX_PATH?
>> It is probably a bad idea to silently truncate the filename at
>> MAX_PATH characters.  This could turn an invalid name into the name of
>> an existing file.
>
> MultiByteToWideChar() will fail in this case if the input string has
> characters that cannot be translated as valid UTF8 (since
> MB_ERR_INVALID_CHARS is specified). This might happen if you have a
> multi-byte string that isn't UTF8, like for example in the system's local
> code page (if it's multi-byte). It can also fail if the buffer length is
> insufficient, or if you lack CP_UTF8, but neither should be a concern here.
>
> open() should, as far as I am aware, deal gracefully with multi-byte strings
> in the system locale, but since it is conceivable that there might be
> multi-byte characters in the local code page that can be interpreted as
> valid UTF-8 even though they are not, and considering the fact that the
> MSVCRT behaves really weirdly with character translations sometimes, the
> only truly safe option here is to pass only UTF-8 or latin-1; other
> character sets are not guaranteed to work. Hence my preference for leaving
> it optional, so people who want UTF-8 filenames on Windows can get them and
> everyone else can go about their business as usual.

If it's optional it should be documented and the consequences made clear.

> MAX_PATH is defined to 260 in WinDef.h, and that is actually the maximum
> allowed path length in the Win32 API unless you want to jump through some
> hoops. Paths of up to 32,767 characters (approximately) are allowed, but
> only if they are absolute and start with the magical \\?\ prefix. I guess I
> could do some detection of relative paths and add said magical prefix
> manually if so desired, but the static allocation seems safe enough, and the
> 260 character limit is indeed what a vast majority of Windows programs use.

Indeed, FFmpeg fails with long names. But if you truncate the long
name, it might turn into a valid name (like Mans said).
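A minimal sketch of the "fail instead of truncate" behaviour, in portable C so it can be tried outside Windows. The helper name copy_name_checked is hypothetical and not part of the patch; it only illustrates rejecting a name that does not fit the buffer, rather than passing on a shortened name that might match some other existing file:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper (not from the patch): copy src into dst only if the
 * whole name, including the terminating NUL, fits in dst_size bytes.
 * Returns 0 on success. On overflow it returns -1, sets errno to
 * ENAMETOOLONG and leaves dst untouched, so a shortened name can never
 * reach open(). */
static int copy_name_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);

    len = strlen(src);
    if (len >= dst_size) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(dst, src, len + 1);  /* + 1 copies the terminating NUL too */
    return 0;
}
```

A caller would return the error to the user when copy_name_checked() fails, instead of trying to open whatever prefix happened to fit.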

> Updated patch with fewer tabs (and a rather embarrassing typo fix) attached.
>
> Regards,
> Karl Blomster
>
> Index: libavformat/os_support.c
> ===================================================================
> --- libavformat/os_support.c    (revision 19266)
> +++ libavformat/os_support.c    (working copy)
> @@ -30,6 +30,23 @@
>  #include <sys/time.h>
>  #include "os_support.h"
>
> +#ifdef HAVE_WIN_UTF8_PATHS
> +#define WIN32_LEAN_AND_MEAN
> +#include <windows.h>
> +#endif
> +
> +#ifdef HAVE_WIN_UTF8_PATHS
> +int winutf8_open(const char *filename, int oflag, int pmode)
> +{
> +    wchar_t wfilename[MAX_PATH * 2];

Isn't sizeof(wchar_t) == 2?

I think you could also use wchar_t wfilename[strlen(filename) + 1]
instead of malloc if we are going to try and pass paths larger than
MAX_PATH.
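To illustrate why strlen(filename) + 1 elements would be enough: a UTF-8 sequence of 1, 2 or 3 bytes becomes one UTF-16 code unit, and a 4-byte sequence becomes two (a surrogate pair), so the UTF-16 form never has more code units than the UTF-8 form has bytes. A rough portable sketch follows; the function name utf16_units_needed is hypothetical, and it assumes well-formed UTF-8 without validating it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the patch): count the UTF-16 code units
 * needed to represent a NUL-terminated UTF-8 string, excluding the
 * terminator. Assumes the input is well-formed UTF-8; no validation. */
static size_t utf16_units_needed(const char *utf8)
{
    const unsigned char *p;
    size_t units = 0;

    assert(utf8 != NULL);

    for (p = (const unsigned char *)utf8; *p; p++) {
        if ((*p & 0xC0) == 0x80)
            continue;                   /* continuation byte, already counted */
        units += (*p >= 0xF0) ? 2 : 1;  /* 4-byte lead needs a surrogate pair */
    }
    return units;
}
```

On Windows the real conversion would still go through MultiByteToWideChar(), which also reports the required size when called with a zero-length output buffer. Note that a variable-length array sized from a caller-supplied string lives on the stack, so very long names may still be better served by malloc.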

> +    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, filename, -1, wfilename, MAX_PATH) > 0)
> +        return _wopen(wfilename, oflag, pmode);
> +    else
> +        return _open(filename, oflag, pmode);
> +}
> +#endif
> +
>  #if CONFIG_NETWORK
>  #if !HAVE_POLL_H
>  #if HAVE_WINSOCK2_H
> Index: libavformat/os_support.h
> ===================================================================
> --- libavformat/os_support.h    (revision 19266)
> +++ libavformat/os_support.h    (working copy)
> @@ -34,6 +34,11 @@
>  #  define lseek(f,p,w) _lseeki64((f), (p), (w))
>  #endif
>
> +#ifdef HAVE_WIN_UTF8_PATHS
> +#define open(fn,of,pm) winutf8_open((fn), (of), (pm))
> +int winutf8_open(const char *filename, int oflag, int pmode);
> +#endif
> +
>  static inline int is_dos_path(const char *path)
>  {
>  #if HAVE_DOS_PATHS


