Report information
Id: 60888
Status: open
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: xur_ux <abraendle [at] gmx.de>
Cc: hippietrail <hippietrail [at] yahoo.com>
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: medium
Type:
  • core
  • PerlIO
  • portability
  • Unicode
Perl Version: (no value)
Fixed In: (no value)



Subject: Win32: support full unicode in filenames (use Wide-system calls)
Date: Thu, 27 Nov 2008 23:34:45 +0100
To: <perlbug [...] perl.org>
From: "ab" <abraendle [...] gmx.de>
OS: Windows XP (German); cp1252

There are 2 problems with the encoding of filenames on Windows:

1) cp1252 != latin1, but perl treats them as the same. For example, filenames returned by readdir (cp1252) are silently interpreted as latin1. Some characters differ between the two encodings, however: the Euro sign, for example, is 0x80 in cp1252 but does not exist in latin1. In such cases the result is a wrong, unusable filename.
Note: the error may be invisible if the function that uses the filename again silently applies the inverse conversion. However, if I use the filename somewhere else (print it to a UTF-8 text file, pass it to a direct Win32 API call, ...), it is wrong.

2) Unicode characters are not possible.
Since perl supports UTF-8 strings internally, filenames should be correct UTF-8 strings (for opendir, open, stat, readdir, -d, -e, etc.). Currently this is not the case: WinAnsi cp1252 byte strings are interpreted as latin1 (and the other way around), with the problem described above.

NTFS supports Unicode filenames, and the Win32 API has "wide" system calls (suffix W, e.g. CreateFileW, FindFirstFileW). So perl should switch to these wide system calls. Only a UTF-16 (UCS-2) <=> UTF-8 conversion would remain to be done, and both problems above would be solved.

[ActivePerl 5.8.8, 5.10.0]

-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
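The Euro-sign case from point 1 and the conversion needed for the W calls can be reproduced without perl. The Python sketch below is an illustration only, not perl's code; the helper names are hypothetical, and it assumes the ANSI code page is cp1252, as on the reporter's system.

```python
# Illustration only (not perl's code): why reading cp1252 filename bytes as
# Latin-1 corrupts them, and the UTF-16LE conversion that the Windows "W"
# calls (CreateFileW, FindFirstFileW, ...) would require.

def misread_as_latin1(raw: bytes) -> str:
    """Hypothetical helper: decode ANSI filename bytes as Latin-1 (the bug)."""
    return raw.decode("latin-1")

def read_as_cp1252(raw: bytes) -> str:
    """Hypothetical helper: decode ANSI filename bytes with the real code page."""
    return raw.decode("cp1252")

def ansi_representable(name: str) -> bool:
    """True if the name can pass through the 8-bit cp1252 ("A") interface at all."""
    try:
        name.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False

def to_wide(name: str) -> bytes:
    """Encode a filename as UTF-16LE, the string format the W calls take."""
    return name.encode("utf-16-le")

def from_wide(buf: bytes) -> str:
    """Decode a UTF-16LE buffer returned by a W call back to text."""
    return buf.decode("utf-16-le")

if __name__ == "__main__":
    raw = b"\x80uro.txt"  # cp1252 bytes of a name starting with the Euro sign
    wrong = misread_as_latin1(raw)
    right = read_as_cp1252(raw)
    print(ascii(wrong))                # '\x80uro.txt'  (U+0080, a control character)
    print(ascii(right))                # '\u20acuro.txt'  (U+20AC EURO SIGN)
    # Written to a UTF-8 text file, the two strings produce different bytes:
    print(wrong.encode("utf-8")[:2])   # b'\xc2\x80'
    print(right.encode("utf-8")[:3])   # b'\xe2\x82\xac'
    # Problem 2: a Greek name cannot pass through cp1252, but round-trips via UTF-16.
    greek = "\u03a9mega.txt"
    print(ansi_representable(greek))           # False
    print(from_wide(to_wide(greek)) == greek)  # True
```

On a real system the UTF-16LE buffer would be handed to a function such as CreateFileW; the sketch stops at the conversion step, which is the part the report says would remain to be done.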
* cp1252 (aka windows-1252) is the default 8-bit charset in Windows XP here (Wikipedia: also in "English and some other Western languages"). It is similar to latin1/latin9 (ISO-8859-1/ISO-8859-15) but not exactly the same.
* Another advantage of full Unicode support for filenames is the maximum path length of about 32,767 characters with the W calls (using the "\\?\" prefix), instead of "#define MAX_PATH 260" for the WinAnsi system calls; see the Windows API documentation (e.g. FindFirstFile: http://msdn.microsoft.com/en-us/library/aa364418.aspx).
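The path-length point can be illustrated with a small Python sketch. It assumes the values from the Windows SDK and API documentation (MAX_PATH is 260 including the terminating NUL; paths with the `\\?\` prefix may reach about 32,767 UTF-16 units in the W functions). The helper names are hypothetical, and UNC paths (`\\?\UNC\...`) are deliberately not handled.

```python
# Sketch only: the two path-length regimes on Win32, under the assumptions above.
MAX_PATH = 260               # Windows SDK value; includes the terminating NUL
EXTENDED_PREFIX = "\\\\?\\"  # the literal \\?\ prefix understood by the W functions
EXTENDED_LIMIT = 32767       # approximate documented maximum for \\?\ paths (UTF-16 units)

def utf16_units(path: str) -> int:
    """Length as the W calls count it: UTF-16 code units, not code points."""
    return len(path.encode("utf-16-le")) // 2

def fits_max_path(path: str) -> bool:
    """Hypothetical check: does the path plus its NUL fit in a MAX_PATH buffer?
    For cp1252 text the unit count equals the character count."""
    return utf16_units(path) < MAX_PATH

def to_extended_path(path: str) -> str:
    """Hypothetical helper: add the extended-length prefix to an absolute drive path."""
    if path.startswith(EXTENDED_PREFIX):
        return path
    # Extended paths are passed through without normalization, so the sketch
    # only accepts absolute drive-letter paths that already use backslashes.
    if len(path) < 3 or not path[0].isalpha() or path[1:3] != ":\\":
        raise ValueError("sketch only handles absolute drive-letter paths")
    return EXTENDED_PREFIX + path

if __name__ == "__main__":
    long_path = "C:\\" + "\\".join(["d" * 100] * 3) + "\\file.txt"  # 314 characters
    print(fits_max_path(long_path))                    # False
    print(utf16_units(long_path) <= EXTENDED_LIMIT)    # True
    print(to_extended_path(long_path)[:7])             # \\?\C:\
```

Counting UTF-16 units rather than code points matters once names contain characters outside the Basic Multilingual Plane, which occupy two units each in the W interface.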

