Win32: support full unicode in filenames (use Wide-system calls) #9578

Open
p5pRT opened this issue Nov 27, 2008 · 5 comments

Comments

@p5pRT

p5pRT commented Nov 27, 2008

Migrated from rt.perl.org#60888 (status was 'open')

Searchable as RT60888$

@p5pRT
Author

p5pRT commented Nov 27, 2008

From abraendle@gmx.de

OS: Windows XP (German); cp1252

There are 2 problems with the encoding of filenames on Windows:
1) cp1252 != latin1, but Perl treats them as the same:
  for example, filenames returned by readdir (cp1252) are silently interpreted as latin1,
  but the Euro sign (0x80 in cp1252) does not exist in latin1, so the result is a wrong/unusable filename in this case.

Note: the error may be invisible if the function that uses the filename again
silently applies the inverse conversion. However, if I use the filename somewhere else (print to a UTF-8 text file,
call the Win32 API directly, ...), it is wrong.

2) Unicode chars are not possible

Since Perl supports UTF-8 strings internally, filenames should be correct UTF-8 strings
(for opendir, open, stat, readdir, -d, -e, etc.). Currently this is not the case:
WinAnsi cp1252 byte strings are interpreted as latin1 (and the other way around),
with the problem described above.
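
A minimal sketch of the cp1252/latin1 mismatch from problem 1, written in Python only because its codecs make the byte values easy to inspect. The filename "5€.txt" is a made-up example, not taken from a real directory listing, and this does not reproduce Perl's internals:

```python
# Hypothetical filename; these are the bytes a cp1252 (WinAnsi) call would return.
raw = "5€.txt".encode("cp1252")            # Euro sign is the single byte 0x80

as_cp1252 = raw.decode("cp1252")           # correct interpretation: U+20AC
as_latin1 = raw.decode("latin-1")          # misinterpretation: U+0080 (a control char)

# Applying the inverse conversion restores the original bytes,
# which is why the error can stay invisible.
round_trip_ok = as_latin1.encode("latin-1") == raw

# Written to a UTF-8 file, the two interpretations produce different bytes.
utf8_correct = as_cp1252.encode("utf-8")   # contains E2 82 AC
utf8_wrong = as_latin1.encode("utf-8")     # contains C2 80

print(round_trip_ok, utf8_correct, utf8_wrong)
```

The round trip succeeds because latin1 maps every byte to the code point of the same value, so the damage only shows once the string leaves that path.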

NTFS supports Unicode filenames, and the Win32 API has "wide" system calls (suffix W,
e.g. CreateFileW, FindFirstFileW).

So Perl should switch to these wide system calls (only a UTF-16 <=> UTF-8 conversion remains to be done),
and both problems above would be solved.
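
The conversion step could look roughly like the following. This is a sketch in Python, not Perl core code: the helper names `utf8_to_wide` and `wide_to_utf8` are hypothetical, and it assumes the wide calls take and return little-endian UTF-16 code units without a byte-order mark:

```python
def utf8_to_wide(name_utf8: bytes) -> bytes:
    """Convert a UTF-8 filename to UTF-16LE bytes, as a W-suffixed call would expect."""
    return name_utf8.decode("utf-8").encode("utf-16-le")


def wide_to_utf8(name_wide: bytes) -> bytes:
    """Convert UTF-16LE bytes, as returned by a W-suffixed call, back to UTF-8."""
    return name_wide.decode("utf-16-le").encode("utf-8")


# Hypothetical filename mixing the Euro sign with a non-Latin letter.
original = "€uro-Ω.txt".encode("utf-8")
wide = utf8_to_wide(original)
restored = wide_to_utf8(wide)
print(restored == original)
```

Unlike the cp1252/latin1 path, this round trip is lossless for any valid Unicode filename, because both sides are full Unicode encodings.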

[ActivePerl 5.8.8, 5.10.0]



Flags:
  category=core
  severity=medium

@p5pRT
Author

p5pRT commented Nov 30, 2008

abraendle@gmx.de - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Nov 30, 2008

abraendle@gmx.de - Status changed from 'open' to 'new'

@p5pRT
Author

p5pRT commented Dec 19, 2008

From abraendle@gmx.de

 
* cp1252, aka windows-1252, is the default (8-bit) charset in Windows XP here
  (Wikipedia: also in "English and some other Western languages").
  It is similar to latin1/latin9 but not exactly the same.

* Another advantage of full Unicode support for filenames is a maximum path length
  of about 32,000 chars (instead of MAX_PATH, which is 260, for the WinAnsi system calls);
  see the Windows API documentation
  (e.g. FindFirstFile: http://msdn.microsoft.com/en-us/library/aa364418.aspx)

@p5pRT
Author

p5pRT commented Dec 19, 2008

abraendle@gmx.de - Status changed from 'new' to 'open'
