Skip Menu |
Report information
Id: 113536
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: kartlee05 <karthik.rajagopalan [at] schrodinger.com>
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: (no value)



Subject: GetEnvironmentStrings() mess up the values for non-ascii strings
Date: Wed, 6 Jun 2012 17:59:30 -0400
To: perlbug [...] perl.org
From: Karthik Rajagopalan <karthik.rajagopalan [...] schrodinger.com>
Download (untitled) / with headers
text/plain 5.9k
Hi, Before launching a sub-process using win32_spawnvp of win32.c, we try to get the environment block of the parent process by PerlEnv_get_children. This is defined to call CreateChildEnv(.)-> CreateLocalEnvironmentStrings(..), which reside in perlhost.h file of win32 directory. This in turn call kernel32.dll's GetEnvironmentStrings(..). The ansi version of the function return the strings in OEM code page. So this mess up the string while passing to the subprocess if it has non-ascii characters. The patch attached ( wrt main branch 5.17 ) address the issue by calling GetEnvironmentStringsW(..) and convert the strings to ANSI encoding in a single shot. It is then used to construct the new environment block for the sub-process. You will observe this issue in all versions of perl including 5.17 of your main branch. To reproduce the issue, following the below recipe - 0) Create a user account in windows with non-ascii character, say bærbar 1) Open a cmd shell with this special usename account. 2) Download test1.c, test.bat.txt, test.pl to a directory, say X 3) cd to X. 4) rename test.bat.txt test.bat 5) Run vcvcars32.bat or vcvars64.bat to get Visual Studio environment ( I used VS 2010 to compile it ) 6) Run test.bat. This will give test1.exe binary. 7) Set your perl environment in PATH and run perl test.pl The program should successfully launch test1.exe and get the CSIDL_LOCAL_APPDATA of the user. Since the USERNAME, USERPROFILE env get messed up at perl side, you won't be able to retrieve the CSIDL_LOCAL_APPDATA of the user. With the patch, the program should successfully retrieve the value. The ANSI code page of my system while doing this test is 'windows-1252' and OEM code page is 'IBM437'. My System Details ---------------------------- --- Flags: category=core severity=critical --- Site configuration information for perl 5.14.2: Configured by rajagopa at Wed May 23 12:48:29 2012. Summary of my perl5 (revision 5 version 14 subversion 2) configuration: Platform: osname=MSWin32, osvers=6.1, archname=MSWin32-x64-multi-thread uname='' config_args='undef' hint=recommended, useposix=true, d_sigaction=undef useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=define, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -Ox -GL -fp:precise -DWIN32 -D_CONSOLE -DNO_STRICT -DWIN64 -DCONSERVATIVE -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO', optimize='-MD -Zi -DNDEBUG -Ox -GL -fp:precise', cppflags='-DWIN32' ccversion='16.00.30319.01', gccversion='', gccosandvers='' intsize=4, longsize=4, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8 ivtype='__int64', ivsize=8, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -ltcg -libpath:"c:\perl\lib\CORE" -machine:AMD64 "/manifestdependency:type='Win32' name='Microsoft.Windows.Common-Controls' version='6.0.0.0' processorArchitecture='*' publicKeyToken='6595b64144ccf1df' language='*'"' libpth=\lib libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl514.lib gnulibc_version='' Dynamic Linking: dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' ' cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -ltcg -libpath:"c:\perl\lib\CORE" -machine:AMD64 "/manifestdependency:type='Win32' name='Microsoft.Windows.Common-Controls' version='6.0.0.0' processorArchitecture='*' publicKeyToken='6595b64144ccf1df' language='*'"' Locally applied patches: --- @INC for perl 5.14.2: c:/msys/1.0/software/lib/Windows-x64/VS2010/perl-5.14.2/site/lib c:/msys/1.0/software/lib/Windows-x64/VS2010/perl-5.14.2/lib . --- Environment for perl 5.14.2: HOME (unset) LANG (unset) LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=c:\msys\1.0\software\lib\Windows-x64\VS2010\perl-5.14.2\bin;C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64;C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\Windows\Microsoft.NET\Framework64\v3.5;C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\VCPackages;C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE;C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\Tools;C:\Program Files (x86)\HTML Help Workshop;C:\Program Files (x86)\Microsoft Visual Studio 10.0\Team Tools\Performance Tools\x64;C:\Program Files (x86)\Microsoft Visual Studio 10.0\Team Tools\Performance Tools;C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\bin\NETFX 4.0 Tools\x64;C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\bin\x64;C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\bin;C:\Program Files (x86)\Windows Resource Kits\Tools\;C:\Program Files (x86)\Intel\Compiler\11.1\065\lib\Intel64;C:\Program Files (x86)\Intel\Compiler\11.1\065\lib\ia32;C:\Program Files (x86)\Intel\Compiler\11.1\065\mkl\em64t\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;c:\Program Files (x86)\Microsoft SQL Server\90\Tools\binn\;C:\Program Files\Microsoft Windows Performance Toolkit\;C:\Program Files (x86)\AMD\CodeAnalyst\bin PERL_BADLANG (unset) SHELL (unset)
Download test.bat.txt
text/plain 105b

Message body is not shown because sender requested not to inline it.

Download test1.c
text/x-csrc 480b

Message body is not shown because sender requested not to inline it.

Download test.pl
text/x-perl 389b

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 355b
First issue, why are there 2 wcslens for the same data in the loop? Second issue, I'm not sure what the Perl Interp's standards are for the last 2 parameters of WideCharToMultiByte. Since Perl is UTF8 aware/internally, maybe we should be using lpUsedDefaultChar and return a UTF8 string if it the flag is true after the length run of WideCharToMultiByte.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 745b
On Thu Jun 07 08:41:48 2012, bulk88. wrote: Show quoted text
> First issue, why are there 2 wcslens for the same data in the loop? >
I just maintained the same way it was handled earlier for GetEnvironmentStrings(..). The value could be saved and used for the next one just to avoid the two operations. Show quoted text
> Second issue, I'm not sure what the Perl Interp's standards are for
the Show quoted text
> last 2 parameters of WideCharToMultiByte. Since Perl is UTF8 > aware/internally, maybe we should be using lpUsedDefaultChar and
return Show quoted text
> a UTF8 string if it the flag is true after the length run of > WideCharToMultiByte.
That would require a huge change in other functions AFAIK. This is the least change I can think of without disturbing other implementations. -Karthik
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1007b
BTW, I need free ascii encdoing strings once they are processed. Once I hear from perl developers about my change, I will be able to make this change quickly. -Karthik On Thu Jun 07 08:49:05 2012, kartlee05 wrote: Show quoted text
> On Thu Jun 07 08:41:48 2012, bulk88. wrote:
> > First issue, why are there 2 wcslens for the same data in the loop? > >
> > I just maintained the same way it was handled earlier for > GetEnvironmentStrings(..). The value could be saved and used for the > next one just to avoid the two operations. >
> > Second issue, I'm not sure what the Perl Interp's standards are for
> the
> > last 2 parameters of WideCharToMultiByte. Since Perl is UTF8 > > aware/internally, maybe we should be using lpUsedDefaultChar and
> return
> > a UTF8 string if it the flag is true after the length run of > > WideCharToMultiByte.
> > That would require a huge change in other functions AFAIK. This is the > least change I can think of without disturbing other implementations. > > -Karthik >
CC: bugs-bitbucket [...] rt.perl.org
Subject: Re: [perl #113536] GetEnvironmentStrings() mess up the values for non-ascii strings
Date: Thu, 7 Jun 2012 22:33:33 +0100
To: perl5-porters [...] perl.org
From: Steve Hay <steve.m.hay [...] googlemail.com>
Download (untitled) / with headers
text/plain 2.2k
On 6 June 2012 22:59, Karthik Rajagopalan <perlbug-followup@perl.org> wrote: Show quoted text
> # New Ticket Created by  Karthik Rajagopalan > # Please include the string:  [perl #113536] > # in the subject line of all future correspondence about this issue. > # <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=113536 > > > > Hi, > > Before launching a sub-process using win32_spawnvp of win32.c, we try > to get the environment block of the parent process by > PerlEnv_get_children. This is defined to call CreateChildEnv(.)-> > CreateLocalEnvironmentStrings(..), which reside in perlhost.h file of > win32 directory. This in turn call kernel32.dll's > GetEnvironmentStrings(..). The ansi version of the function return the > strings > in OEM code page. So this mess up the string while passing to the > subprocess if it has non-ascii characters. The patch attached ( wrt > main branch 5.17  ) address the issue by calling > GetEnvironmentStringsW(..) and convert the strings to ANSI encoding in > a single shot. It is then used to construct the new environment block > for the sub-process. > > You will observe this issue in all versions of perl including 5.17 of > your main branch. > > To reproduce the issue, following the below recipe - >
Thanks for the report. I have reproduced this with bleadperl and am interested in your fix, but I have several concerns: - You've added a large duplicated block of code in place of what was originally a GetEnvironmentStrings() call. You should factor out your new code into its own function and simply call that from each place concerned. - Where do you free() the memory that was calloc()ed? If you factor out your code into its own function then you could have a corresponding 'free' function for the callers to use when they are done with the return value, much like they currently call FreeEnvironmentStrings(). - Is calloc() definitely the right thing to call? I'm wondering if it should be win32_calloc(), or are they the same thing? (I always get lost in this maze of #defines...) - Perhaps your WideCharToMultiByte() calls should make use of the WC_NO_BEST_FIT_CHARS flag like similar code in win32_ansipath() does (which I also notice calls win32_malloc() / win32_realloc() / win32_free() rather than just malloc() / realloc() / free()...).
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.8k
Hi Steve, Show quoted text
> > Thanks for the report. I have reproduced this with bleadperl and am > interested in your fix, but I have several concerns: > > - You've added a large duplicated block of code in place of what was > originally a GetEnvironmentStrings() call. You should factor out your > new code into its own function and simply call that from each place > concerned.
Sounds good. Show quoted text
> > - Where do you free() the memory that was calloc()ed? If you factor > out your code into its own function then you could have a > corresponding 'free' function for the callers to use when they are > done with the return value, much like they currently call > FreeEnvironmentStrings().
Just before the function CreateLocalEnvironmentStrings(..) and Cleanenv(..) return, I planned to have code - // release the ANSI encoding process environment strings // for CreateLocalEnvironmentStrings(..) if(lpAllocPtr) free((char*)lpAllocPtr); // for Clearenv(..) if(lpEnvPtr) free((char*)lpEnvPtr); You suggestion of moving to a separate function for caller to use, sounds good to me. Show quoted text
> > - Is calloc() definitely the right thing to call? I'm wondering if it > should be win32_calloc(), or are they the same thing? (I always get > lost in this maze of #defines...)
win32_calloc(..) in turn call calloc(..) ( see win32.c ). I can make use of win32_calloc(..) in new function. Show quoted text
> > - Perhaps your WideCharToMultiByte() calls should make use of the > WC_NO_BEST_FIT_CHARS flag like similar code in win32_ansipath() does > (which I also notice calls win32_malloc() / win32_realloc() / > win32_free() rather than just malloc() / realloc() / free()...).
Since we are not going to deal just with PATH in environment strings, do you think this flag make sense in this case? I will submit a patch based on your suggestion by today. BTW, did you get a chance to try with my patch? Did it solve the issue? -Karthik
CC: perl5-porters [...] perl.org
Subject: Re: [perl #113536] GetEnvironmentStrings() mess up the values for non-ascii strings
Date: Fri, 8 Jun 2012 21:17:02 +0100
To: perlbug-followup [...] perl.org
From: Steve Hay <steve.m.hay [...] googlemail.com>
Download (untitled) / with headers
text/plain 1.1k
On 7 June 2012 23:20, Karthik Rajagopalan via RT <perlbug-followup@perl.org> wrote: Show quoted text
>> >> - Perhaps your WideCharToMultiByte() calls should make use of the >> WC_NO_BEST_FIT_CHARS flag like similar code in win32_ansipath() does >> (which I also notice calls win32_malloc() / win32_realloc() / >> win32_free() rather than just malloc() / realloc() / free()...).
> > Since we are not going to deal just with PATH in environment strings, do > you think this flag make sense in this case?
I was thinking about things in the environment (like usernames!...) which contain characters that are not representable in the current ANSI codepage. What is the best thing to do with them, given that WideCharToMultiByte() cannot do exactly what is asked of it in such cases? The win32_ansipath() function uses that flag, and MSDN suggests using it for usernames too: "For strings that require validation, such as file, resource, and user names, the application should always use the WC_NO_BEST_FIT_CHARS flag." Show quoted text
> > I will submit a patch based on your suggestion by today. > > > BTW, did you get a chance to try with my patch? Did it solve the issue?
Yes, your patch fixed the problem for me. Looking forward to a revised version :-)
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.6k
Hi Shay, On Fri Jun 08 13:17:34 2012, shay wrote: Show quoted text
> On 7 June 2012 23:20, Karthik Rajagopalan via RT > <perlbug-followup@perl.org> wrote:
> >> > >> - Perhaps your WideCharToMultiByte() calls should make use of the > >> WC_NO_BEST_FIT_CHARS flag like similar code in win32_ansipath() does > >> (which I also notice calls win32_malloc() / win32_realloc() / > >> win32_free() rather than just malloc() / realloc() / free()...).
> > > > Since we are not going to deal just with PATH in environment strings,
do Show quoted text
> > you think this flag make sense in this case?
> > I was thinking about things in the environment (like usernames!...) > which contain characters that are not representable in the current > ANSI codepage. What is the best thing to do with them, given that > WideCharToMultiByte() cannot do exactly what is asked of it in such > cases? The win32_ansipath() function uses that flag, and MSDN suggests > using it for usernames too: "For strings that require validation, such > as file, resource, and user names, the application should always use > the WC_NO_BEST_FIT_CHARS flag."
I was thinking for env other than username. This could have been taken by default by MS folks if it is really necessary to consider. Anyway, I now used the same in my patch. Show quoted text
> >
> > > > I will submit a patch based on your suggestion by today. > > > > > > BTW, did you get a chance to try with my patch? Did it solve the issue?
> > Yes, your patch fixed the problem for me. Looking forward to a revised > version :-) >
I now have your comments incorporated in my attached patch. Please go- through it and let me know your opinion. -Karthik
From bb7c5e61c26c2634be8e508bf0514f48056b54a9 Mon Sep 17 00:00:00 2001 From: Karthik Rajagopalan <rajagopa@schrodinger.com> Date: Mon, 11 Jun 2012 15:50:34 -0400 Subject: [PATCH] Use GetEnvironmentStringsW(..) instead of GetEnvironmentStringsA(..). GetEnvironmentStringsA(..) return strings in the OEM code page. This can actually mangle the environment strings if it contain special characters. A better approach would be to get the utf-16 strings through GetEnvironmentStringsW(..) and convert them to ANSI code page. This is now done by win32_getenvironmentstrings(..). To free the block, you can use win32_freeenvironmentstrings(..). --- win32/perlhost.h | 8 ++++---- win32/win32.c | 34 ++++++++++++++++++++++++++++++++++ win32/win32iop.h | 2 ++ 3 files changed, 40 insertions(+), 4 deletions(-) diff --git a/win32/perlhost.h b/win32/perlhost.h index e8f5fb4..ae422ef 100644 --- a/win32/perlhost.h +++ b/win32/perlhost.h @@ -2262,7 +2262,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir) int nLength, compVal; // get the process environment strings - lpAllocPtr = lpTmp = (LPSTR)GetEnvironmentStrings(); + lpAllocPtr = lpTmp = (LPSTR)win32_getenvironmentstrings(); // step over current directory stuff while(*lpTmp == '=') @@ -2338,7 +2338,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir) } // release the process environment strings - FreeEnvironmentStrings(lpAllocPtr); + win32_freeenvironmentstrings(lpAllocPtr); return lpPtr; } @@ -2375,7 +2375,7 @@ CPerlHost::Clearenv(void) } /* get the process environment strings */ - lpStr = lpEnvPtr = (LPSTR)GetEnvironmentStrings(); + lpStr = lpEnvPtr = (LPSTR)win32_getenvironmentstrings(); /* step over current directory stuff */ while(*lpStr == '=') @@ -2394,7 +2394,7 @@ CPerlHost::Clearenv(void) lpStr += strlen(lpStr) + 1; } - FreeEnvironmentStrings(lpEnvPtr); + win32_freeenvironmentstrings(lpEnvPtr); } diff --git a/win32/win32.c b/win32/win32.c index 7f2444b..1c4c05c 100644 --- a/win32/win32.c +++ b/win32/win32.c @@ -1650,6 +1650,40 @@ win32_ansipath(const WCHAR *widename) } DllExport char * +win32_getenvironmentstrings(void) +{ + LPWSTR lpWStr, lpWTmp; + LPSTR lpStr, lpTmp; + DWORD size, env_len, wenvstrings_len = 0, aenvstrings_len = 0; + + // Get the process environment strings + lpWTmp = lpWStr = (LPWSTR) GetEnvironmentStringsW(); + for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { + env_len = wcslen(lpWTmp); + // calculate the size of the environment strings + wenvstrings_len += env_len + 1; + } + + /* Get the number of bytes required to store the UTF16 encoded string */ + aenvstrings_len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, + lpWStr, wenvstrings_len, NULL, 0, 0, 0); + lpTmp = lpStr = (char *)win32_calloc(aenvstrings_len, sizeof(char)); + if(!lpTmp) + out_of_memory(); + + /* Convert the string from UTF-16 encoding to UTF-8 encoding */ + WideCharToMultiByte(CP_ACP, 0, lpWStr, wenvstrings_len, lpStr, aenvstrings_len, 0, 0); + + return(lpStr); +} + +DllExport void +win32_freeenvironmentstrings(void* block) +{ + win32_free(block); +} + +DllExport char * win32_getenv(const char *name) { dTHX; diff --git a/win32/win32iop.h b/win32/win32iop.h index 373e3e3..cbc9716 100644 --- a/win32/win32iop.h +++ b/win32/win32iop.h @@ -126,6 +126,8 @@ DllExport void win32_rewinddir(DIR *dirp); DllExport int win32_closedir(DIR *dirp); DllExport DIR* win32_dirp_dup(DIR *const dirp, CLONE_PARAMS *const param); +DllExport char* win32_getenvironmentstrings(void); +DllExport void win32_freeenvironmentstrings(void *block); DllExport char* win32_getenv(const char *name); DllExport int win32_putenv(const char *name); -- 1.7.7.1
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.6k
On Mon Jun 11 12:57:53 2012, kartlee05 wrote: Show quoted text
> I now have your comments incorporated in my attached patch. Please go- > through it and let me know your opinion. > > -Karthik > >
Show quoted text
__________________________ + /* Convert the string from UTF-16 encoding to UTF-8 encoding */ + WideCharToMultiByte(CP_ACP, 0, lpWStr, wenvstrings_len, lpStr, aenvstrings_len, 0, 0);
__________________________ I didn't run your code, but you didn't convert to UTF8. You converted to legacy high ASCII, and the conversion was done with character approximation since the 2nd WCTMB didn't get the no best fit flag and since you didn't pass the UTF8 CP, the function will fail if no best fit is passed. I'll post a known good perl to UTF16 windows converter for comparison (its from Win32 module), but its an apples oranges comparison.
________________________________ SV * wstr_to_sv(pTHX_ WCHAR *wstr) { int wlen = (int)wcslen(wstr)+1; BOOL use_default = FALSE; int len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, wstr, wlen, NULL, 0, NULL, NULL); SV *sv = sv_2mortal(newSV(len)); len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, wstr, wlen, SvPVX(sv), len, NULL, &use_default); if (use_default) { len = WideCharToMultiByte(CP_UTF8, 0, wstr, wlen, NULL, 0, NULL, NULL); sv_grow(sv, len); len = WideCharToMultiByte(CP_UTF8, 0, wstr, wlen, SvPVX(sv), len, NULL, NULL); SvUTF8_on(sv); } /* Shouldn't really ever fail since we ask for the required length first, but who knows... */ if (len) { SvPOK_on(sv); SvCUR_set(sv, len-1); } return sv; }
______________________________________
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 2.1k
Hi, On Tue Jun 12 08:13:09 2012, bulk88. wrote: Show quoted text
> On Mon Jun 11 12:57:53 2012, kartlee05 wrote:
> > I now have your comments incorporated in my attached patch. Please
go- Show quoted text
> > through it and let me know your opinion. > > > > -Karthik > > > >
> __________________________ > + /* Convert the string from UTF-16 encoding to UTF-8 encoding */ > + WideCharToMultiByte(CP_ACP, 0, lpWStr, wenvstrings_len, lpStr, > aenvstrings_len, 0, 0); > __________________________ > I didn't run your code, but you didn't convert to UTF8. You converted
to Show quoted text
> legacy high ASCII, and the conversion was done with character > approximation since the 2nd WCTMB didn't get the no best fit flag and > since you didn't pass the UTF8 CP, the function will fail if no best
fit Show quoted text
> is passed.
That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in acutal conversion call from UTF-16 to ACP. I will revise the patch and sent it now. BTW, I don't think the win32 code would really work if UTF-8 is internally used for conversion. I see we use lot of calls toUpper, tolower which would not really work with utf-8 directly. So it is good if ACP is used. -Karthik Show quoted text
>I'll post a known good perl to UTF16 windows converter for > comparison (its from Win32 module), but its an apples oranges
comparison. Show quoted text
> > ________________________________ > SV * > wstr_to_sv(pTHX_ WCHAR *wstr) > { > int wlen = (int)wcslen(wstr)+1; > BOOL use_default = FALSE; > int len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, wstr, > wlen, NULL, 0, NULL, NULL); > SV *sv = sv_2mortal(newSV(len)); > > len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, wstr,
wlen, Show quoted text
> SvPVX(sv), len, NULL, &use_default); > if (use_default) { > len = WideCharToMultiByte(CP_UTF8, 0, wstr, wlen, NULL, 0,
NULL, Show quoted text
> NULL); > sv_grow(sv, len); > len = WideCharToMultiByte(CP_UTF8, 0, wstr, wlen, SvPVX(sv), > len, NULL, NULL); > SvUTF8_on(sv); > } > /* Shouldn't really ever fail since we ask for the required length > first, but who knows... */ > if (len) { > SvPOK_on(sv); > SvCUR_set(sv, len-1); > } > return sv; > } > ______________________________________
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 516b
Show quoted text
> That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in acutal > conversion call from UTF-16 to ACP. I will revise the patch and sent
it Show quoted text
> now. > > BTW, I don't think the win32 code would really work if UTF-8 is > internally used for conversion. I see we use lot of calls toUpper, > tolower which would not really work with utf-8 directly. So it is good > if ACP is used. > > -Karthik
I now fixed the patch to use WC_NO_BEST_FIT_CHARS in the actual conversion call. Please give a shot. -Karthik
From b77244181d5feea493dfc861434ea27a7b4825ef Mon Sep 17 00:00:00 2001 From: Karthik Rajagopalan <rajagopa@schrodinger.com> Date: Tue, 12 Jun 2012 11:37:29 -0400 Subject: [PATCH] Use GetEnvironmentStringsW(..) instead of GetEnvironmentStringsA(..). GetEnvironmentStringsA(..) return strings in the OEM code page. This can actually mangle the environment strings if it contain special characters. A better approach would be to get the utf-16 strings through GetEnvironmentStringsW(..) and convert them to ANSI code page. This is now done by win32_getenvironmentstrings(..). To free the block, you can use win32_freeenvironmentstrings(..). --- win32/perlhost.h | 8 ++++---- win32/win32.c | 35 +++++++++++++++++++++++++++++++++++ win32/win32iop.h | 2 ++ 3 files changed, 41 insertions(+), 4 deletions(-) diff --git a/win32/perlhost.h b/win32/perlhost.h index e8f5fb4..ae422ef 100644 --- a/win32/perlhost.h +++ b/win32/perlhost.h @@ -2262,7 +2262,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir) int nLength, compVal; // get the process environment strings - lpAllocPtr = lpTmp = (LPSTR)GetEnvironmentStrings(); + lpAllocPtr = lpTmp = (LPSTR)win32_getenvironmentstrings(); // step over current directory stuff while(*lpTmp == '=') @@ -2338,7 +2338,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir) } // release the process environment strings - FreeEnvironmentStrings(lpAllocPtr); + win32_freeenvironmentstrings(lpAllocPtr); return lpPtr; } @@ -2375,7 +2375,7 @@ CPerlHost::Clearenv(void) } /* get the process environment strings */ - lpStr = lpEnvPtr = (LPSTR)GetEnvironmentStrings(); + lpStr = lpEnvPtr = (LPSTR)win32_getenvironmentstrings(); /* step over current directory stuff */ while(*lpStr == '=') @@ -2394,7 +2394,7 @@ CPerlHost::Clearenv(void) lpStr += strlen(lpStr) + 1; } - FreeEnvironmentStrings(lpEnvPtr); + win32_freeenvironmentstrings(lpEnvPtr); } diff --git a/win32/win32.c b/win32/win32.c index 7f2444b..de7b9aa 100644 --- a/win32/win32.c +++ b/win32/win32.c @@ -1650,6 +1650,41 @@ win32_ansipath(const WCHAR *widename) } DllExport char * +win32_getenvironmentstrings(void) +{ + LPWSTR lpWStr, lpWTmp; + LPSTR lpStr, lpTmp; + DWORD size, env_len, wenvstrings_len = 0, aenvstrings_len = 0; + + // Get the process environment strings + lpWTmp = lpWStr = (LPWSTR) GetEnvironmentStringsW(); + for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { + env_len = wcslen(lpWTmp); + // calculate the size of the environment strings + wenvstrings_len += env_len + 1; + } + + /* Get the number of bytes required to store the UTF16 encoded string */ + aenvstrings_len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, + lpWStr, wenvstrings_len, NULL, 0, 0, 0); + lpTmp = lpStr = (char *)win32_calloc(aenvstrings_len, sizeof(char)); + if(!lpTmp) + out_of_memory(); + + /* Convert the string from UTF-16 encoding to UTF-8 encoding */ + WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, lpWStr, wenvstrings_len, lpStr, + aenvstrings_len, 0, 0); + + return(lpStr); +} + +DllExport void +win32_freeenvironmentstrings(void* block) +{ + win32_free(block); +} + +DllExport char * win32_getenv(const char *name) { dTHX; diff --git a/win32/win32iop.h b/win32/win32iop.h index 373e3e3..cbc9716 100644 --- a/win32/win32iop.h +++ b/win32/win32iop.h @@ -126,6 +126,8 @@ DllExport void win32_rewinddir(DIR *dirp); DllExport int win32_closedir(DIR *dirp); DllExport DIR* win32_dirp_dup(DIR *const dirp, CLONE_PARAMS *const param); +DllExport char* win32_getenvironmentstrings(void); +DllExport void win32_freeenvironmentstrings(void *block); DllExport char* win32_getenv(const char *name); DllExport int win32_putenv(const char *name); -- 1.7.7.1
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 340b
I've got a problem. I think your patch's code is unreachable. I compiled a blead 5.17.0 and applied your latest patch. win32_getenvironmentstrings is called by CPerlHost::CreateLocalEnvironmentStrings which is called by CreateChildEnv which is called by PerlEnvGetChildenv which is called by nothing and isn't in any file except perlhost.h.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 3.2k
On Tue Jun 12 16:48:28 2012, bulk88. wrote: Show quoted text
> I've got a problem. I think your patch's code is unreachable. I compiled > a blead 5.17.0 and applied your latest patch. > win32_getenvironmentstrings is called by > CPerlHost::CreateLocalEnvironmentStrings which is called by > CreateChildEnv which is called by PerlEnvGetChildenv which is called by > nothing and isn't in any file except perlhost.h.
Show quoted text
_______________________________________ C:\p517\perl\win32>perl -V Summary of my perl5 (revision 5 version 17 subversion 0) configuration: Platform: osname=MSWin32, osvers=5.1, archname=MSWin32-x86-multi-thread uname='' config_args='undef' hint=recommended, useposix=true, d_sigaction=undef useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=undef, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cl', ccflags ='-nologo -GF -W3 -Od -MD -Zi -DDEBUGGING -DWIN32 -D_CONSOL E -DNO_STRICT -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_S YS -DUSE_PERLIO -D_USE_32BIT_TIME_T', optimize='-Od -MD -Zi -DDEBUGGING', cppflags='-DWIN32' ccversion='13.10.6030', gccversion='', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksi ze=8 alignbytes=8, prototype=define Linker and Libraries: ld='link', ldflags ='-nologo -nodefaultlib -debug -libpath:"c:\perl517\lib\ CORE" -machine:x86' libpth="C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\lib" libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.l ib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32 .lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt. lib perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg 32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws 2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msv crt.lib libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl517.lib gnulibc_version='' Dynamic Linking: dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' ' cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -libpath:"c:\p erl517\lib\CORE" -machine:x86' Characteristics of this binary (from libperl): Compile-time options: DEBUGGING HAS_TIMES HAVE_INTERP_INTERN MULTIPLICITY PERLIO_LAYERS PERL_DONT_CREATE_GVSV PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS PERL_MALLOC_WRAP PERL_PRESERVE_IVUV PERL_TRACK_MEMPOOL PL_OP_SLAB_ALLOC USE_ITHREADS USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF Built under MSWin32 Compiled at Jun 12 2012 19:06:01 %ENV: PERL_JSON_BACKEND="JSON::XS" PERL_YAML_BACKEND="YAML" @INC: C:/perl517/site/lib C:/perl517/lib . C:\p517\perl\win32>
___________________________
RT-Send-CC: perl5-porters [...] perl.org
disregard the last 2 posts, i found the place where its called
Download (untitled) / with headers
text/plain 941b
You can't use CP_ACP unless your intending for your patch to only work with utf16 characters which are representable in the current system CP's high ascii. Is this what you want? From lpWStr Show quoted text
___________________________________________ 0x00010D40 5c 00 57 00 49 00 4e 00 44 00 4f 00 57 00 53 00 5c 00 54 00 45 00 4d 00 50 00 00 00 75 00 6e 00 \WINDOWS\TEMP␀un 0x00010D60 69 00 63 00 6f 00 64 00 65 00 3d 00 31 04 43 04 3b 04 3a 04 00 00 55 00 53 00 45 00 52 00 44 00 icode=булк␀USERD
___________________________________________ After conversion WideCharToMultiByte
___________________________________________ 0x00346B04 44 4f 57 53 5c 54 45 4d 50 00 75 6e 69 63 6f 64 DOWS\TEMP.unicod 0x00346B14 65 3d 3f 3f 3f 3f 00 55 53 45 52 44 4f 4d 41 49 e=????.USERDOMAI
___________________________________________ those "?" are real "?", 0x3f. The unicode does NOT pass through to the child process for me. I'll try more tomorrow.
Download env.pl
text/x-perl 1.6k
use utf8; # so literals and identifiers can be in UTF-8 use strict; # quote strings, declare variables use warnings; # on by default use warnings qw(FATAL utf8); # fatalize encoding glitches use open qw(:std :utf8); # undeclared streams in UTF-8 use Win32::API 0.70 qw ( ReadMemory ); use Encode; use Data::Dumper; use Devel::Peek qw( Dump ); my $SEVW = Win32::API::More->new('kernel32.dll', 'BOOL SetEnvironmentVariableW( LPWSTR lpName, LPWSTR lpValue ); '); my $GEVW = Win32::API::More->new('kernel32.dll', 'DWORD GetEnvironmentVariableW( LPWSTR lpName, LPWSTR lpBuffer, DWORD nSize ); '); my $strlen = Win32::API::More->new('kernel32.dll', 'lstrlenA', 'N', 'i'); my $name = "unicode\x00"; my $val = "булк"; my $widename = Encode::encode("UTF-16LE",$name,1); print "len widename=".length($widename)."\n"; my $wideval = Encode::encode("UTF-16LE",$val,1); print "len wideval=".length($wideval)."\n"; my $widevalres = "\x00" x 10; die "SEVW" if ! $SEVW->Call($widename, $wideval); print "len/res GEVW=".$GEVW->Call($widename, $widevalres, length($widevalres)/2)."\n"; $widevalres = decode("UTF-16LE", $widevalres, 1); print "GEVW=".$widevalres."\n"; print "Dumping GEVW\n"; DumpHex($widevalres); print "ENV HASH".$ENV{unicode}."\n"; print "Dumping %ENV way\n"; DumpHex($ENV{unicode}); system($^X." envread.pl"); sub DumpHex{ my $var = $_[0]; my $varplain; my $varptr = unpack('J', pack('p', $var)); my $realvarlen = $strlen->Call($varptr); $varplain = ReadMemory($varptr, $realvarlen); Dump($var); print "as Hex \"".unpack('H['.($realvarlen*2).']', $varplain)."\"\n"; }
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 162b
The attachment is a accidentally posted work in progress debugging scripts for the patch, it uses a release of Win32::API. I'll have better test scripts tomorrow.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 221b
On Tue Jun 12 22:57:14 2012, bulk88. wrote: Show quoted text
> The attachment is a accidentally posted work in progress debugging > scripts for the patch, it uses a FUTURE release of Win32::API. I'll
have better Show quoted text
> test scripts tomorrow.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.4k
On Tue Jun 12 08:41:15 2012, kartlee05 wrote: Show quoted text
>
> > That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in acutal > > conversion call from UTF-16 to ACP. I will revise the patch and sent
> it
> > now. > > > > BTW, I don't think the win32 code would really work if UTF-8 is > > internally used for conversion. I see we use lot of calls toUpper, > > tolower which would not really work with utf-8 directly. So it is good > > if ACP is used. > > > > -Karthik
> > I now fixed the patch to use WC_NO_BEST_FIT_CHARS in the actual > conversion call. Please give a shot. >
I think there is a bug in this and the previous patch (but not your original one): the calculation of 'wenvstrings_len' wrongly sets 'size' to 1 at the start of the for-loop, but never uses 'size' again. It should have set 'wenvstrings_len' to 1 instead, otherwise that ends up with the wrong value and causes many, many tests to fail when running the full regression test suite. After changing for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { to for(wenvstrings_len = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { the patch looks good to me code-wise, but I will await any further input from bulk88 before committing. Could you also fix up a couple of the comments, though, please? The comment about converting from UTF-16 to UTF-8 should say *to the ANSI codepage* instead. You also use C++ style comments in two places, but perl always uses C style comments since most files are C and not all C compilers accept C++ style comments.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.6k
Okay, here is the problem with the patch, the env var is NOT corrupted, only, if the env var is a utf8 char, that can be represented in the current system high ascii CP. If the utf8 char can not be represented in the current system high ascii CP, it is converted to 0x3F. On pre-patch Perls, the previous case results in a different letter, but not "?" appearing in the child process. I can't explain the letter Z from WE A. I made 2 perl files a parent and a reader and ran it on Perl 5.12 and 5.17 with OP's latest patch (shays code not included). The parent has a option to switch between a western europe char, and a cyrillic char. My system is CP 1252 normally. Make sure your console is chcp 65001 and Lucinda font is set. The parent (env2) was run as console as "perl -C7 env2.pl 1> resp512.txt 2>&1". All the files attached are in UTF8 not high ascii. Must open in a editor and manual switch to UTF8. Notepad and Write dont do it right. Note, that with WE A, 5.17 correctly passed through the WE A, but CYR b turned to ? substitution in the child process. Is this the correct behavior or the behavior we want or should the WideCharToMultiBytes be done to UTF8 and not system default CP? Then the issue is, do we want to pass a non UTF-16 narrow-ish env array with UTF8 through the ANSI/narrow CreateProcess that Perl uses? I decided NOT to study how %ENV works just to post this. I saw some strange behavior with %ENV and I wasn't sure whats correct and whats wrong behavior so I dropped the %ENV issue. If someone knows how %ENV is "supposed" to behave on windows with perl, or writes a "known good" script (on Unix or Windows) that uses %ENV for me to compare my results to, I can investigate the %ENV behavior I saw.
Download env2.pl
text/x-perl 2.6k
use strict; # quote strings, declare variables use warnings; # on by default use utf8; # so literals and identifiers can be in UTF-8 use Win32::API 0.70 qw ( ReadMemory ); use Encode; use Data::Dumper; use Devel::Peek qw( Dump ); binmode STDOUT, ":utf8"; my $SEVW = Win32::API::More->new('kernel32.dll', 'BOOL SetEnvironmentVariableW( LPWSTR lpName, LPWSTR lpValue ); '); my $GEVW = Win32::API::More->new('kernel32.dll', 'DWORD GetEnvironmentVariableW( LPWSTR lpName, LPWSTR lpBuffer, DWORD nSize ); '); my $strlen = Win32::API::More->new('kernel32.dll', 'lstrlenA', 'N', 'i'); system('chcp'); print "env2.pl here бл\n"; my $name = "unicode\x00"; $name = Encode::encode("UTF-16LE",$name,1); die "bad len" if length($name) != 16; my $val = "AAA". #"б" "Ä" ."AAA\x00"; my $origval = $val; chop($origval); $val = Encode::encode("UTF-16LE",$val,1); die "bad len" if length($name) != 16; die "SEVW" if ! $SEVW->Call($name, $val); my $get = GetEnvironmentVariableW("unicode\x00"); die "bad Encode" if $get ne $origval; DumpHex($get); print "perl ver parent $^V\n"; system($^X." -C7 envread.pl"); sub DumpHex{ my $var = $_[0]; my $realvarlen; if(@_ == 2 ){ $realvarlen = $_[1]; } my $varplain; my $varptr = unpack('J', pack('p', $var)); if(! defined $realvarlen){ $realvarlen = $strlen->Call($varptr); } $varplain = ReadMemory($varptr, $realvarlen); print "Dump in Perl\n"; Dump($var); print "as Hex bytes \"".unpack('H['.($realvarlen*2).']', $varplain)."\"\n"; print "to console bytes\n\"".$varplain."\"\n".Dumper($varplain); print "to console Perl\n\"".$var."\"\n".Dumper($var)."\n"; } sub GetEnvironmentVariableW{ my $name = $_[0]; $name .= "\x00";#8 to 16 conv wont make wide null otherwise $name = Encode::encode("UTF-16LE",$name,1); my $buflen = $GEVW->Call($name, undef, 0); #first get len, no buffer die "GetEnvironmentVariableW failed $!" if ! $buflen; my $val = "\x00" x $buflen; $val = Encode::encode("UTF-16LE",$val,1); #die "bad len" if length($val) != 16; {my $len = $GEVW->Call($name, $val, $buflen); die "GetEnvironmentVariableW failed $!" if ! $len || $len >= $buflen || $len != ($buflen-1); } $buflen --; $val= substr($val, 0, $buflen * 2); #chop off null #print "as utf16\n"; #DumpHex($val, $buflen * 2); #die "bad len" if length($val) != 14; #print "after decode\n"; $val = Encode::decode("UTF-16LE", $val, 1); #die "bad len" if length($val) != 7; #DumpHex($val); return $val; } END { }
Download resp517CYR.txt
text/plain 499b
Active code page: 65001 SV = PV(0xb17c5c) at 0xac764c REFCNT = 1 FLAGS = (PADMY,POK,pPOK,UTF8) PV = 0xa9138c "AAA\320\261AAA"\0 [UTF8 "AAA\x{431}AAA"] CUR = 8 LEN = 12 env2.pl here бл Dump in Perl as Hex bytes "414141d0b1414141" to console bytes "AAAбAAA" $VAR1 = "AAA\320\261AAA"; to console Perl "AAAбAAA" $VAR1 = "AAA\x{431}AAA"; perl ver parent v5.17.0 NEW PROCESS in ENVREAD.PL бл Active code page: 65001 unicode=AAA?AAA perl ver reader v5.17.0
Download resp517WE.txt
text/plain 498b
Active code page: 65001 SV = PV(0xb17c5c) at 0xac764c REFCNT = 1 FLAGS = (PADMY,POK,pPOK,UTF8) PV = 0xa9138c "AAA\303\204AAA"\0 [UTF8 "AAA\x{c4}AAA"] CUR = 8 LEN = 12 env2.pl here бл Dump in Perl as Hex bytes "414141c384414141" to console bytes "AAAÄAAA" $VAR1 = "AAA\303\204AAA"; to console Perl "AAAÄAAA" $VAR1 = "AAA\x{c4}AAA"; perl ver parent v5.17.0 NEW PROCESS in ENVREAD.PL бл Active code page: 65001 unicode=AAAÄAAA perl ver reader v5.17.0
Download resp512CYR.txt
text/plain 499b
Active code page: 65001 SV = PV(0x9d820c) at 0xa39a14 REFCNT = 1 FLAGS = (PADMY,POK,pPOK,UTF8) PV = 0xa46574 "AAA\320\261AAA"\0 [UTF8 "AAA\x{431}AAA"] CUR = 8 LEN = 12 env2.pl here бл Dump in Perl as Hex bytes "414141d0b1414141" to console bytes "AAAбAAA" $VAR1 = "AAA\320\261AAA"; to console Perl "AAAбAAA" $VAR1 = "AAA\x{431}AAA"; perl ver parent v5.12.2 NEW PROCESS in ENVREAD.PL бл Active code page: 65001 unicode=AAA?AAA perl ver reader v5.12.2
Download envread.pl
text/x-perl 2.1k
use strict; # quote strings, declare variables use warnings; # on by default use utf8; # so literals and identifiers can be in UTF-8 use Win32::API 0.70 qw ( ReadMemory ); use Encode; use Data::Dumper; use Devel::Peek qw( Dump ); binmode STDOUT, ":utf8"; print "\n\nNEW PROCESS\n\nin ENVREAD.PL\nбл\n"; system('chcp'); my $GEVW = Win32::API::More->new('kernel32.dll', 'DWORD GetEnvironmentVariableW( LPWSTR lpName, LPWSTR lpBuffer, DWORD nSize ); '); my $strlen = Win32::API::More->new('kernel32.dll', 'lstrlenA', 'N', 'i'); sub GetEnvironmentVariableW{ my $name = $_[0]; $name .= "\x00";#8 to 16 conv wont make wide null otherwise $name = Encode::encode("UTF-16LE",$name,1); my $buflen = $GEVW->Call($name, undef, 0); #first get len, no buffer die "GetEnvironmentVariableW failed $!" if ! $buflen; my $val = "\x00" x $buflen; $val = Encode::encode("UTF-16LE",$val,1); #die "bad len" if length($val) != 16; {my $len = $GEVW->Call($name, $val, $buflen); die "GetEnvironmentVariableW failed $!" if ! $len || $len >= $buflen || $len != ($buflen-1); } $buflen --; $val= substr($val, 0, $buflen * 2); #chop off null #print "as utf16\n"; #DumpHex($val, $buflen * 2); #die "bad len" if length($val) != 14; #print "after decode\n"; $val = Encode::decode("UTF-16LE", $val, 1); #die "bad len" if length($val) != 7; #DumpHex($val); return $val; } #my $val = GetEnvironmentVariableW("unicode"); #print "GEVW\n"; #DumpHex($val); #print "%ENV\n"; #DumpHex($ENV{unicode}); system('set unicode'); print "perl ver reader $^V\n"; sub DumpHex{ my $var = $_[0]; my $realvarlen; if(@_ == 2 ){ $realvarlen = $_[1]; } my $varplain; my $varptr = unpack('J', pack('p', $var)); if(! defined $realvarlen){ $realvarlen = $strlen->Call($varptr); } $varplain = ReadMemory($varptr, $realvarlen); Dump($var); print "as Hex bytes \"".unpack('H['.($realvarlen*2).']', $varplain)."\"\n"; print "to console bytes\n\"".$varplain."\"\n".Dumper($varplain); print "to console Perl\n\"".$var."\"\n".Dumper($var)."\n"; }
Download resp512WE.txt
text/plain 497b
Active code page: 65001 SV = PV(0x9d820c) at 0xa39a14 REFCNT = 1 FLAGS = (PADMY,POK,pPOK,UTF8) PV = 0xa46574 "AAA\303\204AAA"\0 [UTF8 "AAA\x{c4}AAA"] CUR = 8 LEN = 12 env2.pl here бл Dump in Perl as Hex bytes "414141c384414141" to console bytes "AAAÄAAA" $VAR1 = "AAA\303\204AAA"; to console Perl "AAAÄAAA" $VAR1 = "AAA\x{c4}AAA"; perl ver parent v5.12.2 NEW PROCESS in ENVREAD.PL бл Active code page: 65001 unicode=AAAZAAA perl ver reader v5.12.2
CC: <perl5-porters [...] perl.org>
Subject: RE: [perl #113536] GetEnvironmentStrings() mess up the values for non-ascii strings
Date: Thu, 14 Jun 2012 08:56:20 +0100
To: <perlbug-followup [...] perl.org>, "OtherRecipients of perl Ticket #113536" <"OtherRecipients of perl Ticket #113536:;" [...] planit.com>
From: "Steve Hay" <Steve.Hay [...] verosoftware.com>
Download (untitled) / with headers
text/plain 2.4k
bulk88 via RT wrote on 2012-06-13: Show quoted text
> Okay, here is the problem with the patch, the env var is NOT > corrupted, only, if the env var is a utf8 char, that can be > represented in the current system high ascii CP. If the utf8 char can > not be represented in the current system high ascii CP, it is > converted to 0x3F. On pre- patch Perls, the previous case results in a > different letter, but not "?" > appearing in the child process.
Thanks for the sample programs. I intend to have a look through them when I have more time, but if I understand you correctly the problem here is only that things are still not working if the environment contains characters which cannot be represented in the current ANSI codepage? That is to be expected, given the widechar (UTF-16) -> ANSI codepage conversion being done, and the ? character, of course, comes from the WideCharToMultiByte() calls when they fail to map a unicode character to the target encoding. On pre-patch perls something different happens because GetEnvironmentStringsA() was being used which returns things in the OEM codepage and must presumably have some algorithm of its own for handling unicode characters that cannot be mapped to that encoding, hence different characters (but still the wrong ones!) appearing instead of ?. I still think the patch is worthwhile (with the fix I posted previously) because it fixes things for the common case where the environment contains characters which *are* in the current ANSI codepage but either not in the current OEM codepage (because the GetEnvironmentStringsA() conversion from unicode to OEM would have failed) or else (more commonly) in the OEM codepage but at a different byte position (because perl then treated the OEM bytes as if they were the ANSI bytes which are normally used). So the only problem remaining is what happens in the case when characters outside of the current ANSI codepage appear in the environment, and that would be worth logging as a separate bug. It isn't such a common case, and there are likely to be problems in other areas of the code in that case anyway, e.g. you can't open a file whose name contains characters outside of the ANSI codepage without jumping through some hoops rather than using the built-in open() function. If you agree and have no other issues with the patch then I'll apply it and raise the non-ANSI characters problem as a separate bug (and attach your scripts etc).
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.7k
On Thu Jun 14 00:56:57 2012, Steve.Hay@verosoftware.com wrote: Show quoted text
> So the > only problem remaining is what happens in the case when characters > outside of the current ANSI codepage appear in the environment, and > that would be worth logging as a separate bug. It isn't such a > common case, and there are likely to be problems in other areas of > the code in that case anyway, e.g. you can't open a file whose name > contains characters outside of the ANSI codepage without jumping > through some hoops rather than using the built-in open() function. > If you agree and have no other issues with the patch then I'll > apply it and raise the non-ANSI characters problem as a separate > bug (and attach your scripts etc).
I agree. You can apply the patch. Ultimately one day all of %INC support in perlhost.h will have have to be made to use the wide apis so all wide char env vars roundtrip correctly between processes and through %INC. Note, to test this patch I used the raw wide win API since %INC was unusable. Currently perlhost.h (which implements %INC) either queries its own cache of env, or goes out to windows API. http://perl5.git.perl.org/perl.git/blob/3630f57ef8a29a646a6848f4e93d25ac47093a3c:/win32/perlhost.h#l2402 http://perl5.git.perl.org/perl.git/blob/3630f57ef8a29a646a6848f4e93d25ac47093a3c:/win32/perlhost.h#l2084 http://perl5.git.perl.org/perl.git/blob/3630f57ef8a29a646a6848f4e93d25ac47093a3c:/win32/perlhost.h#l2134 http://perl5.git.perl.org/perl.git/blob/3630f57ef8a29a646a6848f4e93d25ac47093a3c:/win32/win32.c#l1653 Plus hv.c will need some fixing to add the utf8 flag to the %INC if the env var isnt low ascii. Anyways, I'm done. You can apply the patch, and there should be enough notes in this ticket for someone in the future to work on the remainder of the problems.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.8k
On Wed Jun 13 00:22:43 2012, shay wrote: Show quoted text
> On Tue Jun 12 08:41:15 2012, kartlee05 wrote:
> >
> > > That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in
acutal Show quoted text
> > > conversion call from UTF-16 to ACP. I will revise the patch and
sent Show quoted text
> > it
> > > now. > > > > > > BTW, I don't think the win32 code would really work if UTF-8 is > > > internally used for conversion. I see we use lot of calls toUpper, > > > tolower which would not really work with utf-8 directly. So it is
good Show quoted text
> > > if ACP is used. > > > > > > -Karthik
> > > > I now fixed the patch to use WC_NO_BEST_FIT_CHARS in the actual > > conversion call. Please give a shot. > >
> > I think there is a bug in this and the previous patch (but not your > original one): the calculation of 'wenvstrings_len' wrongly sets
'size' Show quoted text
> to 1 at the start of the for-loop, but never uses 'size' again. It > should have set 'wenvstrings_len' to 1 instead, otherwise that ends up > with the wrong value and causes many, many tests to fail when running > the full regression test suite. > > After changing > > for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { > > to > > for(wenvstrings_len = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { > > the patch looks good to me code-wise, but I will await any further
input Show quoted text
> from bulk88 before committing.
I fixed the bug in my patch which failed to account for last '\0' char in the environment strings. Show quoted text
> > Could you also fix up a couple of the comments, though, please? The > comment about converting from UTF-16 to UTF-8 should say *to the ANSI > codepage* instead. You also use C++ style comments in two places, but > perl always uses C style comments since most files are C and not all C > compilers accept C++ style comments.
This is also fixed now. The revised patch is attached now. Please take a look. -Karthik
From f5bbd57635ff725971d736aed43832a57845491b Mon Sep 17 00:00:00 2001 From: Karthik Rajagopalan <rajagopa@schrodinger.com> Date: Thu, 14 Jun 2012 12:16:15 -0400 Subject: [PATCH] Use GetEnvironmentStringsW(..) instead of GetEnvironmentStringsA(..). GetEnvironmentStringsA(..) return strings in the OEM code page. This can actually mangle the environment strings if it contain special characters. A better approach would be to get the utf-16 strings through GetEnvironmentStringsW(..) and convert them to ANSI code page. This is now done by win32_getenvironmentstrings(..). To free the block, you can use win32_freeenvironmentstrings(..). --- win32/perlhost.h | 8 ++++---- win32/win32.c | 35 +++++++++++++++++++++++++++++++++++ win32/win32iop.h | 2 ++ 3 files changed, 41 insertions(+), 4 deletions(-) diff --git a/win32/perlhost.h b/win32/perlhost.h index e8f5fb4..ae422ef 100644 --- a/win32/perlhost.h +++ b/win32/perlhost.h @@ -2262,7 +2262,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir) int nLength, compVal; // get the process environment strings - lpAllocPtr = lpTmp = (LPSTR)GetEnvironmentStrings(); + lpAllocPtr = lpTmp = (LPSTR)win32_getenvironmentstrings(); // step over current directory stuff while(*lpTmp == '=') @@ -2338,7 +2338,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir) } // release the process environment strings - FreeEnvironmentStrings(lpAllocPtr); + win32_freeenvironmentstrings(lpAllocPtr); return lpPtr; } @@ -2375,7 +2375,7 @@ CPerlHost::Clearenv(void) } /* get the process environment strings */ - lpStr = lpEnvPtr = (LPSTR)GetEnvironmentStrings(); + lpStr = lpEnvPtr = (LPSTR)win32_getenvironmentstrings(); /* step over current directory stuff */ while(*lpStr == '=') @@ -2394,7 +2394,7 @@ CPerlHost::Clearenv(void) lpStr += strlen(lpStr) + 1; } - FreeEnvironmentStrings(lpEnvPtr); + win32_freeenvironmentstrings(lpEnvPtr); } diff --git a/win32/win32.c b/win32/win32.c index 7f2444b..e496837 100644 --- a/win32/win32.c +++ b/win32/win32.c @@ -1650,6 +1650,41 @@ win32_ansipath(const WCHAR *widename) } DllExport char * +win32_getenvironmentstrings(void) +{ + LPWSTR lpWStr, lpWTmp; + LPSTR lpStr, lpTmp; + DWORD env_len, wenvstrings_len = 0, aenvstrings_len = 0; + + /* Get the process environment strings */ + lpWTmp = lpWStr = (LPWSTR) GetEnvironmentStringsW(); + for(wenvstrings_len = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { + env_len = wcslen(lpWTmp); + /* calculate the size of the environment strings */ + wenvstrings_len += env_len + 1; + } + + /* Get the number of bytes required to store the UTF16 encoded string */ + aenvstrings_len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, + lpWStr, wenvstrings_len, NULL, 0, 0, 0); + lpTmp = lpStr = (char *)win32_calloc(aenvstrings_len, sizeof(char)); + if(!lpTmp) + out_of_memory(); + + /* Convert the string from UTF-16 encoding to ACP encoding */ + WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, lpWStr, wenvstrings_len, lpStr, + aenvstrings_len, 0, 0); + + return(lpStr); +} + +DllExport void +win32_freeenvironmentstrings(void* block) +{ + win32_free(block); +} + +DllExport char * win32_getenv(const char *name) { dTHX; diff --git a/win32/win32iop.h b/win32/win32iop.h index 373e3e3..cbc9716 100644 --- a/win32/win32iop.h +++ b/win32/win32iop.h @@ -126,6 +126,8 @@ DllExport void win32_rewinddir(DIR *dirp); DllExport int win32_closedir(DIR *dirp); DllExport DIR* win32_dirp_dup(DIR *const dirp, CLONE_PARAMS *const param); +DllExport char* win32_getenvironmentstrings(void); +DllExport void win32_freeenvironmentstrings(void *block); DllExport char* win32_getenv(const char *name); DllExport int win32_putenv(const char *name); -- 1.7.7.1
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 349b
On Tue Jun 12 22:54:32 2012, bulk88. wrote: Show quoted text
> You can't use CP_ACP unless your intending for your patch to only work > with utf16 characters which are representable in the current system CP's > high ascii. Is this what you want?
Yes, there are lot of places in win32 which still depend on ACP calls. So I just went ahead to do the same. -Karthik
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 693b
On Thu Jun 14 09:02:31 2012, bulk88. wrote: Show quoted text
> On Thu Jun 14 00:56:57 2012, Steve.Hay@verosoftware.com wrote:
> > So the > > only problem remaining is what happens in the case when
> characters
> > outside of the current ANSI codepage appear in the environment,
> and
> > that would be worth logging as a separate bug.
> > I agree. [...] Anyways, I'm done. You can apply the patch, > and > there should be enough notes in this ticket for someone in the future > to > work on the remainder of the problems.
I added a note to perltodo.pod in commit 799c141b0d referring to this ticket since the problem of Unicode in the environment (and in filenames etc) is already a known problem.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 162b
On Thu Jun 14 09:20:33 2012, kartlee05 wrote: Show quoted text
> The revised patch is attached now. Please take a look. >
Thanks, your patch is now applied as commit 4f46e52b00.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 409b
Hi Steve, On Fri Jun 15 00:57:16 2012, shay wrote: Show quoted text
> On Thu Jun 14 09:20:33 2012, kartlee05 wrote:
> > The revised patch is attached now. Please take a look. > >
> > Thanks, your patch is now applied as commit 4f46e52b00.
Thanks for applying this change. Will this be in 5.17? BTW, do you want me to take a look and suggest changes that we would need to fully support wide version for win32? -Karthik
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 529b
On Fri Jun 15 07:13:34 2012, kartlee05 wrote: Show quoted text
> Hi Steve, > > On Fri Jun 15 00:57:16 2012, shay wrote:
> > On Thu Jun 14 09:20:33 2012, kartlee05 wrote:
> > > The revised patch is attached now. Please take a look. > > >
> > > > Thanks, your patch is now applied as commit 4f46e52b00.
> > Thanks for applying this change. Will this be in 5.17?
I meant in next minor version for it? Show quoted text
>BTW, do you want > me to take a look and suggest changes that we would need to fully > support wide version for win32? > > -Karthik >
Subject: Re: [perl #113536] GetEnvironmentStrings() mess up the values for non-ascii strings
Date: Thu, 14 Jun 2012 09:16:10 -0400
To: "perlbug-followup [...] perl.org" <perlbug-followup [...] perl.org>
From: Kartlee Schrodinger <karthik.rajagopalan [...] schrodinger.com>
Download (untitled) / with headers
text/plain 1.8k
Sent from my iPhone On Jun 13, 2012, at 3:22 AM, "Steve Hay via RT" <perlbug-followup@perl.org> wrote: Show quoted text
> On Tue Jun 12 08:41:15 2012, kartlee05 wrote:
>>
>>> That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in acutal >>> conversion call from UTF-16 to ACP. I will revise the patch and sent
>> it
>>> now. >>> >>> BTW, I don't think the win32 code would really work if UTF-8 is >>> internally used for conversion. I see we use lot of calls toUpper, >>> tolower which would not really work with utf-8 directly. So it is good >>> if ACP is used. >>> >>> -Karthik
>> >> I now fixed the patch to use WC_NO_BEST_FIT_CHARS in the actual >> conversion call. Please give a shot. >>
> > I think there is a bug in this and the previous patch (but not your > original one): the calculation of 'wenvstrings_len' wrongly sets 'size' > to 1 at the start of the for-loop, but never uses 'size' again. It > should have set 'wenvstrings_len' to 1 instead, otherwise that ends up > with the wrong value and causes many, many tests to fail when running > the full regression test suite. > > After changing > > for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) { > > to > > for(wenvstrings_len = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) {
I agree. I have not taken care for the additional null char to indicate the end of environment strings in my last two patches. I will add this now and resend. Show quoted text
> > the patch looks good to me code-wise, but I will await any further input > from bulk88 before committing. > > Could you also fix up a couple of the comments, though, please? The > comment about converting from UTF-16 to UTF-8 should say *to the ANSI > codepage* instead. You also use C++ style comments in two places, but > perl always uses C style comments since most files are C and not all C > compilers accept C++ style comments.
Sure, I will do that and send you in an hour. -Karthik
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.1k
On Fri Jun 15 07:13:34 2012, kartlee05 wrote: Show quoted text
> Hi Steve, > > On Fri Jun 15 00:57:16 2012, shay wrote:
> > On Thu Jun 14 09:20:33 2012, kartlee05 wrote:
> > > The revised patch is attached now. Please take a look. > > >
> > > > Thanks, your patch is now applied as commit 4f46e52b00.
> > Thanks for applying this change. Will this be in 5.17? BTW, do you want > me to take a look and suggest changes that we would need to fully > support wide version for win32? >
The change will indeed be in 5.17.1, due out in a couple of days. Support for wide api calls on Win32 was actually removed some time ago (5.8.1) since it was not working correctly and it was deemed, IIRC, that it would be more worthwhile to address the wider issue of supporting Unicode in any filesystems that support it rather than fixing the broken support for just Windows NTFS, hence the items in Porting/todo.pod about "Unicode in Filenames", "Unicode in %ENV" and "Unicode and glob()". For sure, the solution on Windows must involve the wide apis, but what we really need first is a framework for hooking those calls into as described in the item "Virtualize operating system access" in Porting/todo.pod.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 252b
On Mon Jun 18 00:26:30 2012, shay wrote: Show quoted text
> > The change will indeed be in 5.17.1, due out in a couple of days. >
This ticket probably caused the leak reported at https://rt.perl.org/Ticket/Display.html?id=121676 -- bulk88 ~ bulk88 at hotmail.com


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org