Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GetEnvironmentStrings() mess up the values for non-ascii strings #12161

Closed
p5pRT opened this issue Jun 6, 2012 · 48 comments
Closed

GetEnvironmentStrings() mess up the values for non-ascii strings #12161

p5pRT opened this issue Jun 6, 2012 · 48 comments

Comments

@p5pRT
Copy link

p5pRT commented Jun 6, 2012

Migrated from rt.perl.org#113536 (status was 'resolved')

Searchable as RT113536$

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2012

From karthik.rajagopalan@schrodinger.com

Hi,

Before launching a sub-process using win32_spawnvp of win32.c, we try
to get the environment block of the parent process by
PerlEnv_get_children. This is defined to call CreateChildEnv(.)->
CreateLocalEnvironmentStrings(..), which reside in perlhost.h file of
win32 directory. This in turn call kernel32.dll's
GetEnvironmentStrings(..). The ansi version of the function return the
strings
in OEM code page. So this mess up the string while passing to the
subprocess if it has non-ascii characters. The patch attached ( wrt
main branch 5.17 ) address the issue by calling
GetEnvironmentStringsW(..) and convert the strings to ANSI encoding in
a single shot. It is then used to construct the new environment block
for the sub-process.

You will observe this issue in all versions of perl including 5.17 of
your main branch.

To reproduce the issue, following the below recipe -

0) Create a user account in windows with non-ascii character, say bærbar
1) Open a cmd shell with this special usename account.
2) Download test1.c, test.bat.txt, test.pl to a directory, say X
3) cd to X.
4) rename test.bat.txt test.bat
5) Run vcvcars32.bat or vcvars64.bat to get Visual Studio environment
( I used VS 2010 to compile it )
6) Run test.bat. This will give test1.exe binary.
7) Set your perl environment in PATH and run perl test.pl

The program should successfully launch test1.exe and get the
CSIDL_LOCAL_APPDATA of the user. Since the USERNAME, USERPROFILE env
get messed up at perl side, you won't be able to retrieve the
CSIDL_LOCAL_APPDATA of the user. With the patch, the program should
successfully retrieve the value.

The ANSI code page of my system while doing this test is
'windows-1252' and OEM code page is 'IBM437'.

My System Details



Flags​:
  category=core
  severity=critical


Site configuration information for perl 5.14.2​:

Configured by rajagopa at Wed May 23 12​:48​:29 2012.

Summary of my perl5 (revision 5 version 14 subversion 2) configuration​:

  Platform​:
  osname=MSWin32, osvers=6.1, archname=MSWin32-x64-multi-thread
  uname=''
  config_args='undef'
  hint=recommended, useposix=true, d_sigaction=undef
  useithreads=define, usemultiplicity=define
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=define, use64bitall=undef, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -Ox -GL
-fp​:precise -DWIN32 -D_CONSOLE -DNO_STRICT -DWIN64 -DCONSERVATIVE
-D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
-DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS
-DUSE_PERLIO',
  optimize='-MD -Zi -DNDEBUG -Ox -GL -fp​:precise',
  cppflags='-DWIN32'
  ccversion='16.00.30319.01', gccversion='', gccosandvers=''
  intsize=4, longsize=4, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8
  ivtype='__int64', ivsize=8, nvtype='double', nvsize=8,
Off_t='__int64', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='link', ldflags ='-nologo -nodefaultlib -debug -opt​:ref,icf
-ltcg -libpath​:"c​:\perl\lib\CORE" -machine​:AMD64
"/manifestdependency​:type='Win32'
name='Microsoft.Windows.Common-Controls' version='6.0.0.0'
processorArchitecture='*' publicKeyToken='6595b64144ccf1df'
language='*'"'
  libpth=\lib
  libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib
netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib
odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
  perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib
winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib
oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib
version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
  libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl514.lib
  gnulibc_version=''
  Dynamic Linking​:
  dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
  cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug
-opt​:ref,icf -ltcg -libpath​:"c​:\perl\lib\CORE" -machine​:AMD64
"/manifestdependency​:type='Win32'
name='Microsoft.Windows.Common-Controls' version='6.0.0.0'
processorArchitecture='*' publicKeyToken='6595b64144ccf1df'
language='*'"'

Locally applied patches​:


@​INC for perl 5.14.2​:
  c​:/msys/1.0/software/lib/Windows-x64/VS2010/perl-5.14.2/site/lib
  c​:/msys/1.0/software/lib/Windows-x64/VS2010/perl-5.14.2/lib
  .


Environment for perl 5.14.2​:
  HOME (unset)
  LANG (unset)
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=c​:\msys\1.0\software\lib\Windows-x64\VS2010\perl-5.14.2\bin;C​:\Program
Files (x86)\Microsoft Visual Studio
10.0\VC\BIN\amd64;C​:\Windows\Microsoft.NET\Framework64\v4.0.30319;C​:\Windows\Microsoft.NET\Framework64\v3.5;C​:\Program
Files (x86)\Microsoft Visual Studio 10.0\VC\VCPackages;C​:\Program
Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE;C​:\Program Files
(x86)\Microsoft Visual Studio 10.0\Common7\Tools;C​:\Program Files
(x86)\HTML Help Workshop;C​:\Program Files (x86)\Microsoft Visual
Studio 10.0\Team Tools\Performance Tools\x64;C​:\Program Files
(x86)\Microsoft Visual Studio 10.0\Team Tools\Performance
Tools;C​:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\bin\NETFX
4.0 Tools\x64;C​:\Program Files (x86)\Microsoft
SDKs\Windows\v7.0A\bin\x64;C​:\Program Files (x86)\Microsoft
SDKs\Windows\v7.0A\bin;C​:\Program Files (x86)\Windows Resource
Kits\Tools\;C​:\Program Files
(x86)\Intel\Compiler\11.1\065\lib\Intel64;C​:\Program Files
(x86)\Intel\Compiler\11.1\065\lib\ia32;C​:\Program Files
(x86)\Intel\Compiler\11.1\065\mkl\em64t\bin;C​:\Windows\system32;C​:\Windows;C​:\Windows\System32\Wbem;C​:\Windows\System32\WindowsPowerShell\v1.0\;c​:\Program
Files (x86)\Microsoft SQL Server\90\Tools\binn\;C​:\Program
Files\Microsoft Windows Performance Toolkit\;C​:\Program Files
(x86)\AMD\CodeAnalyst\bin
  PERL_BADLANG (unset)
  SHELL (unset)

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2012

From karthik.rajagopalan@schrodinger.com

cl.exe /c /MD test1.c
link.exe /NODEFAULTLIB test1.obj msvcrt.lib msvcprt.lib shell32.lib kernel32.lib

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2012

From karthik.rajagopalan@schrodinger.com

#include <stdio.h>
#include <Shlobj.h>
#include <windows.h>

int main()

{
	char local_appdata_path[1024];
	printf("ACP %d\n", GetACP());
	printf("OEMCP %d\n", GetOEMCP());
	
	if( SHGetFolderPath(NULL, CSIDL_LOCAL_APPDATA|CSIDL_FLAG_CREATE, NULL,
                      SHGFP_TYPE_CURRENT, local_appdata_path) < 0 )
    {
      puts("Failed to get CSIDL_LOCAL_APPDATA of user");
	
    }
	else {
	  puts("success in getting CSIDL_LOCAL_APPDATA of user");
    }
}

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2012

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2012

From karthik.rajagopalan@schrodinger.com

0001-Use-GetEnvironmentStringsW-.-instead-of-GetEnvironme.patch
From 011f192c113c53705f2bab641e12e37c747a8146 Mon Sep 17 00:00:00 2001
From: Karthik Rajagopalan <rajagopa@schrodinger.com>
Date: Wed, 6 Jun 2012 17:05:11 -0400
Subject: [PATCH] Use GetEnvironmentStringsW(..) instead of
 GetEnvironmentStringsA(..)

GetEnvironmentStringsA(..) return strings in the OEM code page. This
can actually mangle the environment strings if if contain special characters.
A better approach would be to get the utf-16 strings through GetEnvironmentStringsW(..)
and convert them to ANSI code page.
---
 win32/perlhost.h |   45 ++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/win32/perlhost.h b/win32/perlhost.h
index e8f5fb4..244c4ac 100644
--- a/win32/perlhost.h
+++ b/win32/perlhost.h
@@ -2258,11 +2258,26 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir)
 {
     dTHX;
     LPSTR lpStr, lpPtr, lpEnvPtr, lpTmp, lpLocalEnv, lpAllocPtr;
-    DWORD dwSize, dwEnvIndex;
+	LPWSTR lpWAllocPtr, lpWTmp;
+    DWORD dwSize, dwEnvIndex, ansienv_size;
     int nLength, compVal;
 
-    // get the process environment strings
-    lpAllocPtr = lpTmp = (LPSTR)GetEnvironmentStrings();
+    // Get the process environment strings
+    lpWAllocPtr = lpWTmp = (LPWSTR) GetEnvironmentStringsW();
+    for(dwSize = 1; *lpWTmp != '\0'; lpWTmp += wcslen(lpWTmp) + 1) {      
+        // calculate the size of the environment strings
+        dwSize += wcslen(lpWTmp) + 1;	  
+    }
+
+    /* Get the number of bytes required to store the UTF16 encoded string */
+    ansienv_size = WideCharToMultiByte(CP_ACP, 0, lpWAllocPtr, dwSize, NULL, 0, 0, 0);
+    lpAllocPtr = lpTmp = (char *)calloc(ansienv_size, sizeof(char));
+
+    /* Convert the string from UTF-16 encoding to ANSI encoding */
+    WideCharToMultiByte(CP_ACP, 0, lpWAllocPtr, dwSize, lpTmp, ansienv_size, 0, 0);
+
+    // release the process environment strings
+    FreeEnvironmentStringsW(lpWAllocPtr);
 
     // step over current directory stuff
     while(*lpTmp == '=')
@@ -2337,8 +2352,6 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir)
 	*lpStr = '\0';
     }
 
-    // release the process environment strings
-    FreeEnvironmentStrings(lpAllocPtr);
 
     return lpPtr;
 }
@@ -2364,6 +2377,10 @@ CPerlHost::Clearenv(void)
     dTHX;
     char ch;
     LPSTR lpPtr, lpStr, lpEnvPtr;
+    LPWSTR lpWStr, lpWEnvPtr;
+    DWORD ansienv_size = 0;
+    DWORD dwSize = 0;
+
     if (m_lppEnvList != NULL) {
 	/* set every entry to an empty string */
 	for(DWORD index = 0; index < m_dwEnvCount; ++index) {
@@ -2374,8 +2391,22 @@ CPerlHost::Clearenv(void)
 	}
     }
 
-    /* get the process environment strings */
-    lpStr = lpEnvPtr = (LPSTR)GetEnvironmentStrings();
+    /* Get the process environment strings */
+    lpWStr = lpWEnvPtr = (LPWSTR) GetEnvironmentStringsW();
+    for(dwSize = 1; *lpWEnvPtr != '\0'; lpWEnvPtr += wcslen(lpWEnvPtr) + 1) {      
+        // calculate the size of the environment strings
+        dwSize += wcslen(lpWEnvPtr) + 1;	  
+    }
+
+    /* Get the number of bytes required to store the UTF16 encoded string */
+    ansienv_size = WideCharToMultiByte(CP_ACP, 0, lpWStr, dwSize, NULL, 0, 0, 0);
+    lpStr = lpEnvPtr = (char *)calloc(ansienv_size, sizeof(char));
+
+    /* Convert the string from UTF-16 encoding to ANSI encoding */
+    WideCharToMultiByte(CP_ACP, 0, lpWStr, dwSize, lpStr, ansienv_size, 0, 0);
+
+    /* Free environment strings */
+    FreeEnvironmentStringsW(lpWStr);
 
     /* step over current directory stuff */
     while(*lpStr == '=')
-- 
1.7.7.1

@p5pRT
Copy link
Author

p5pRT commented Jun 7, 2012

From @bulk88

First issue, why are there 2 wcslens for the same data in the loop?

Second issue, I'm not sure what the Perl Interp's standards are for the
last 2 parameters of WideCharToMultiByte. Since Perl is UTF8
aware/internally, maybe we should be using lpUsedDefaultChar and return
a UTF8 string if it the flag is true after the length run of
WideCharToMultiByte.

@p5pRT
Copy link
Author

p5pRT commented Jun 7, 2012

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jun 7, 2012

From karthik.rajagopalan@schrodinger.com

On Thu Jun 07 08​:41​:48 2012, bulk88. wrote​:

First issue, why are there 2 wcslens for the same data in the loop?

I just maintained the same way it was handled earlier for
GetEnvironmentStrings(..). The value could be saved and used for the
next one just to avoid the two operations.

Second issue, I'm not sure what the Perl Interp's standards are for
the
last 2 parameters of WideCharToMultiByte. Since Perl is UTF8
aware/internally, maybe we should be using lpUsedDefaultChar and
return
a UTF8 string if it the flag is true after the length run of
WideCharToMultiByte.

That would require a huge change in other functions AFAIK. This is the
least change I can think of without disturbing other implementations.

-Karthik

@p5pRT
Copy link
Author

p5pRT commented Jun 7, 2012

From karthik.rajagopalan@schrodinger.com

BTW, I need free ascii encdoing strings once they are processed. Once I
hear from perl developers about my change, I will be able to make this
change quickly.

-Karthik

On Thu Jun 07 08​:49​:05 2012, kartlee05 wrote​:

On Thu Jun 07 08​:41​:48 2012, bulk88. wrote​:

First issue, why are there 2 wcslens for the same data in the loop?

I just maintained the same way it was handled earlier for
GetEnvironmentStrings(..). The value could be saved and used for the
next one just to avoid the two operations.

Second issue, I'm not sure what the Perl Interp's standards are for
the
last 2 parameters of WideCharToMultiByte. Since Perl is UTF8
aware/internally, maybe we should be using lpUsedDefaultChar and
return
a UTF8 string if it the flag is true after the length run of
WideCharToMultiByte.

That would require a huge change in other functions AFAIK. This is the
least change I can think of without disturbing other implementations.

-Karthik

@p5pRT
Copy link
Author

p5pRT commented Jun 7, 2012

From @steve-m-hay

On 6 June 2012 22​:59, Karthik Rajagopalan <perlbug-followup@​perl.org> wrote​:

# New Ticket Created by  Karthik Rajagopalan
# Please include the string​:  [perl #113536]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=113536 >

Hi,

Before launching a sub-process using win32_spawnvp of win32.c, we try
to get the environment block of the parent process by
PerlEnv_get_children. This is defined to call CreateChildEnv(.)->
CreateLocalEnvironmentStrings(..), which reside in perlhost.h file of
win32 directory. This in turn call kernel32.dll's
GetEnvironmentStrings(..). The ansi version of the function return the
strings
in OEM code page. So this mess up the string while passing to the
subprocess if it has non-ascii characters. The patch attached ( wrt
main branch 5.17  ) address the issue by calling
GetEnvironmentStringsW(..) and convert the strings to ANSI encoding in
a single shot. It is then used to construct the new environment block
for the sub-process.

You will observe this issue in all versions of perl including 5.17 of
your main branch.

To reproduce the issue, following the below recipe -

Thanks for the report. I have reproduced this with bleadperl and am
interested in your fix, but I have several concerns​:

- You've added a large duplicated block of code in place of what was
originally a GetEnvironmentStrings() call. You should factor out your
new code into its own function and simply call that from each place
concerned.

- Where do you free() the memory that was calloc()ed? If you factor
out your code into its own function then you could have a
corresponding 'free' function for the callers to use when they are
done with the return value, much like they currently call
FreeEnvironmentStrings().

- Is calloc() definitely the right thing to call? I'm wondering if it
should be win32_calloc(), or are they the same thing? (I always get
lost in this maze of #defines...)

- Perhaps your WideCharToMultiByte() calls should make use of the
WC_NO_BEST_FIT_CHARS flag like similar code in win32_ansipath() does
(which I also notice calls win32_malloc() / win32_realloc() /
win32_free() rather than just malloc() / realloc() / free()...).

@p5pRT
Copy link
Author

p5pRT commented Jun 7, 2012

From karthik.rajagopalan@schrodinger.com

Hi Steve,

Thanks for the report. I have reproduced this with bleadperl and am
interested in your fix, but I have several concerns​:

- You've added a large duplicated block of code in place of what was
originally a GetEnvironmentStrings() call. You should factor out your
new code into its own function and simply call that from each place
concerned.

Sounds good.

- Where do you free() the memory that was calloc()ed? If you factor
out your code into its own function then you could have a
corresponding 'free' function for the callers to use when they are
done with the return value, much like they currently call
FreeEnvironmentStrings().

Just before the function CreateLocalEnvironmentStrings(..) and
Cleanenv(..) return, I planned to have code -

  // release the ANSI encoding process environment strings
  // for CreateLocalEnvironmentStrings(..)
  if(lpAllocPtr)
  free((char*)lpAllocPtr);

  // for Clearenv(..)
  if(lpEnvPtr)
  free((char*)lpEnvPtr);

You suggestion of moving to a separate function for caller to use,
sounds good to me.

- Is calloc() definitely the right thing to call? I'm wondering if it
should be win32_calloc(), or are they the same thing? (I always get
lost in this maze of #defines...)

win32_calloc(..) in turn call calloc(..) ( see win32.c ). I can make use
of win32_calloc(..) in new function.

- Perhaps your WideCharToMultiByte() calls should make use of the
WC_NO_BEST_FIT_CHARS flag like similar code in win32_ansipath() does
(which I also notice calls win32_malloc() / win32_realloc() /
win32_free() rather than just malloc() / realloc() / free()...).

Since we are not going to deal just with PATH in environment strings, do
you think this flag make sense in this case?

I will submit a patch based on your suggestion by today.

BTW, did you get a chance to try with my patch? Did it solve the issue?

-Karthik

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2012

From @steve-m-hay

On 7 June 2012 23​:20, Karthik Rajagopalan via RT
<perlbug-followup@​perl.org> wrote​:

- Perhaps your WideCharToMultiByte() calls should make use of the
WC_NO_BEST_FIT_CHARS flag like similar code in win32_ansipath() does
(which I also notice calls win32_malloc() / win32_realloc() /
win32_free() rather than just malloc() / realloc() / free()...).

Since we are not going to deal just with PATH in environment strings, do
you think this flag make sense in this case?

I was thinking about things in the environment (like usernames!...)
which contain characters that are not representable in the current
ANSI codepage. What is the best thing to do with them, given that
WideCharToMultiByte() cannot do exactly what is asked of it in such
cases? The win32_ansipath() function uses that flag, and MSDN suggests
using it for usernames too​: "For strings that require validation, such
as file, resource, and user names, the application should always use
the WC_NO_BEST_FIT_CHARS flag."

I will submit a patch based on your suggestion by today.

BTW, did you get a chance to try with my patch? Did it solve the issue?

Yes, your patch fixed the problem for me. Looking forward to a revised
version :-)

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2012

From karthik.rajagopalan@schrodinger.com

Hi Shay,

On Fri Jun 08 13​:17​:34 2012, shay wrote​:

On 7 June 2012 23​:20, Karthik Rajagopalan via RT
<perlbug-followup@​perl.org> wrote​:

- Perhaps your WideCharToMultiByte() calls should make use of the
WC_NO_BEST_FIT_CHARS flag like similar code in win32_ansipath() does
(which I also notice calls win32_malloc() / win32_realloc() /
win32_free() rather than just malloc() / realloc() / free()...).

Since we are not going to deal just with PATH in environment strings,
do
you think this flag make sense in this case?

I was thinking about things in the environment (like usernames!...)
which contain characters that are not representable in the current
ANSI codepage. What is the best thing to do with them, given that
WideCharToMultiByte() cannot do exactly what is asked of it in such
cases? The win32_ansipath() function uses that flag, and MSDN suggests
using it for usernames too​: "For strings that require validation, such
as file, resource, and user names, the application should always use
the WC_NO_BEST_FIT_CHARS flag."

I was thinking for env other than username. This could have been taken by
default by MS folks if it is really necessary to consider. Anyway, I now
used the same in my patch.

I will submit a patch based on your suggestion by today.

BTW, did you get a chance to try with my patch? Did it solve the issue?

Yes, your patch fixed the problem for me. Looking forward to a revised
version :-)

I now have your comments incorporated in my attached patch. Please go-
through it and let me know your opinion.

-Karthik

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2012

From karthik.rajagopalan@schrodinger.com

0001-Use-GetEnvironmentStringsW-.-instead-of-GetEnvironme.patch
From bb7c5e61c26c2634be8e508bf0514f48056b54a9 Mon Sep 17 00:00:00 2001
From: Karthik Rajagopalan <rajagopa@schrodinger.com>
Date: Mon, 11 Jun 2012 15:50:34 -0400
Subject: [PATCH] Use GetEnvironmentStringsW(..) instead of
 GetEnvironmentStringsA(..).

GetEnvironmentStringsA(..) return strings in the OEM code page. This
can actually mangle the environment strings if it contain special characters.
A better approach would be to get the utf-16 strings through GetEnvironmentStringsW(..)
and convert them to ANSI code page. This is now done by win32_getenvironmentstrings(..).
To free the block, you can use win32_freeenvironmentstrings(..).
---
 win32/perlhost.h |    8 ++++----
 win32/win32.c    |   34 ++++++++++++++++++++++++++++++++++
 win32/win32iop.h |    2 ++
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/win32/perlhost.h b/win32/perlhost.h
index e8f5fb4..ae422ef 100644
--- a/win32/perlhost.h
+++ b/win32/perlhost.h
@@ -2262,7 +2262,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir)
     int nLength, compVal;
 
     // get the process environment strings
-    lpAllocPtr = lpTmp = (LPSTR)GetEnvironmentStrings();
+    lpAllocPtr = lpTmp = (LPSTR)win32_getenvironmentstrings();
 
     // step over current directory stuff
     while(*lpTmp == '=')
@@ -2338,7 +2338,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir)
     }
 
     // release the process environment strings
-    FreeEnvironmentStrings(lpAllocPtr);
+    win32_freeenvironmentstrings(lpAllocPtr);
 
     return lpPtr;
 }
@@ -2375,7 +2375,7 @@ CPerlHost::Clearenv(void)
     }
 
     /* get the process environment strings */
-    lpStr = lpEnvPtr = (LPSTR)GetEnvironmentStrings();
+    lpStr = lpEnvPtr = (LPSTR)win32_getenvironmentstrings();
 
     /* step over current directory stuff */
     while(*lpStr == '=')
@@ -2394,7 +2394,7 @@ CPerlHost::Clearenv(void)
 	lpStr += strlen(lpStr) + 1;
     }
 
-    FreeEnvironmentStrings(lpEnvPtr);
+    win32_freeenvironmentstrings(lpEnvPtr);
 }
 
 
diff --git a/win32/win32.c b/win32/win32.c
index 7f2444b..1c4c05c 100644
--- a/win32/win32.c
+++ b/win32/win32.c
@@ -1650,6 +1650,40 @@ win32_ansipath(const WCHAR *widename)
 }
 
 DllExport char *
+win32_getenvironmentstrings(void)
+{
+    LPWSTR lpWStr, lpWTmp;
+    LPSTR lpStr, lpTmp;
+    DWORD size, env_len, wenvstrings_len = 0, aenvstrings_len = 0;
+
+    // Get the process environment strings
+    lpWTmp = lpWStr = (LPWSTR) GetEnvironmentStringsW();
+    for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) {
+        env_len = wcslen(lpWTmp);
+        // calculate the size of the environment strings
+        wenvstrings_len += env_len + 1;
+    }
+
+    /* Get the number of bytes required to store the UTF16 encoded string */
+    aenvstrings_len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, 
+                                          lpWStr, wenvstrings_len, NULL, 0, 0, 0);
+    lpTmp = lpStr = (char *)win32_calloc(aenvstrings_len, sizeof(char));
+    if(!lpTmp)
+        out_of_memory();
+
+    /* Convert the string from UTF-16 encoding to UTF-8 encoding */
+    WideCharToMultiByte(CP_ACP, 0, lpWStr, wenvstrings_len, lpStr, aenvstrings_len, 0, 0);
+
+    return(lpStr);
+}
+
+DllExport void
+win32_freeenvironmentstrings(void* block)
+{
+    win32_free(block);
+}
+
+DllExport char *
 win32_getenv(const char *name)
 {
     dTHX;
diff --git a/win32/win32iop.h b/win32/win32iop.h
index 373e3e3..cbc9716 100644
--- a/win32/win32iop.h
+++ b/win32/win32iop.h
@@ -126,6 +126,8 @@ DllExport  void		win32_rewinddir(DIR *dirp);
 DllExport  int		win32_closedir(DIR *dirp);
 DllExport  DIR*		win32_dirp_dup(DIR *const dirp, CLONE_PARAMS *const param);
 
+DllExport  char*        win32_getenvironmentstrings(void);
+DllExport  void         win32_freeenvironmentstrings(void *block);
 DllExport  char*	win32_getenv(const char *name);
 DllExport  int		win32_putenv(const char *name);
 
-- 
1.7.7.1

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2012

From [Unknown Contact. See original ticket]

Hi Shay,

On Fri Jun 08 13​:17​:34 2012, shay wrote​:

On 7 June 2012 23​:20, Karthik Rajagopalan via RT
<perlbug-followup@​perl.org> wrote​:

- Perhaps your WideCharToMultiByte() calls should make use of the
WC_NO_BEST_FIT_CHARS flag like similar code in win32_ansipath() does
(which I also notice calls win32_malloc() / win32_realloc() /
win32_free() rather than just malloc() / realloc() / free()...).

Since we are not going to deal just with PATH in environment strings,
do
you think this flag make sense in this case?

I was thinking about things in the environment (like usernames!...)
which contain characters that are not representable in the current
ANSI codepage. What is the best thing to do with them, given that
WideCharToMultiByte() cannot do exactly what is asked of it in such
cases? The win32_ansipath() function uses that flag, and MSDN suggests
using it for usernames too​: "For strings that require validation, such
as file, resource, and user names, the application should always use
the WC_NO_BEST_FIT_CHARS flag."

I was thinking for env other than username. This could have been taken by
default by MS folks if it is really necessary to consider. Anyway, I now
used the same in my patch.

I will submit a patch based on your suggestion by today.

BTW, did you get a chance to try with my patch? Did it solve the issue?

Yes, your patch fixed the problem for me. Looking forward to a revised
version :-)

I now have your comments incorporated in my attached patch. Please go-
through it and let me know your opinion.

-Karthik

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2012

From @bulk88

On Mon Jun 11 12​:57​:53 2012, kartlee05 wrote​:

I now have your comments incorporated in my attached patch. Please go-
through it and let me know your opinion.

-Karthik

__________________________
+ /* Convert the string from UTF-16 encoding to UTF-8 encoding */
+ WideCharToMultiByte(CP_ACP, 0, lpWStr, wenvstrings_len, lpStr,
aenvstrings_len, 0, 0);
__________________________
I didn't run your code, but you didn't convert to UTF8. You converted to
legacy high ASCII, and the conversion was done with character
approximation since the 2nd WCTMB didn't get the no best fit flag and
since you didn't pass the UTF8 CP, the function will fail if no best fit
is passed. I'll post a known good perl to UTF16 windows converter for
comparison (its from Win32 module), but its an apples oranges comparison.

________________________________
SV *
wstr_to_sv(pTHX_ WCHAR *wstr)
{
  int wlen = (int)wcslen(wstr)+1;
  BOOL use_default = FALSE;
  int len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, wstr,
wlen, NULL, 0, NULL, NULL);
  SV *sv = sv_2mortal(newSV(len));

  len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, wstr, wlen,
SvPVX(sv), len, NULL, &use_default);
  if (use_default) {
  len = WideCharToMultiByte(CP_UTF8, 0, wstr, wlen, NULL, 0, NULL,
NULL);
  sv_grow(sv, len);
  len = WideCharToMultiByte(CP_UTF8, 0, wstr, wlen, SvPVX(sv),
len, NULL, NULL);
  SvUTF8_on(sv);
  }
  /* Shouldn't really ever fail since we ask for the required length
first, but who knows... */
  if (len) {
  SvPOK_on(sv);
  SvCUR_set(sv, len-1);
  }
  return sv;
}
______________________________________

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2012

From karthik.rajagopalan@schrodinger.com

Hi,

On Tue Jun 12 08​:13​:09 2012, bulk88. wrote​:

On Mon Jun 11 12​:57​:53 2012, kartlee05 wrote​:

I now have your comments incorporated in my attached patch. Please
go-
through it and let me know your opinion.

-Karthik

__________________________
+ /* Convert the string from UTF-16 encoding to UTF-8 encoding */
+ WideCharToMultiByte(CP_ACP, 0, lpWStr, wenvstrings_len, lpStr,
aenvstrings_len, 0, 0);
__________________________
I didn't run your code, but you didn't convert to UTF8. You converted
to
legacy high ASCII, and the conversion was done with character
approximation since the 2nd WCTMB didn't get the no best fit flag and
since you didn't pass the UTF8 CP, the function will fail if no best
fit
is passed.

That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in acutal
conversion call from UTF-16 to ACP. I will revise the patch and sent it
now.

BTW, I don't think the win32 code would really work if UTF-8 is
internally used for conversion. I see we use lot of calls toUpper,
tolower which would not really work with utf-8 directly. So it is good
if ACP is used.

-Karthik

I'll post a known good perl to UTF16 windows converter for
comparison (its from Win32 module), but its an apples oranges
comparison.

________________________________
SV *
wstr_to_sv(pTHX_ WCHAR *wstr)
{
int wlen = (int)wcslen(wstr)+1;
BOOL use_default = FALSE;
int len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, wstr,
wlen, NULL, 0, NULL, NULL);
SV *sv = sv_2mortal(newSV(len));

len = WideCharToMultiByte\(CP\_ACP\, WC\_NO\_BEST\_FIT\_CHARS\, wstr\, 

wlen,

SvPVX(sv), len, NULL, &use_default);
if (use_default) {
len = WideCharToMultiByte(CP_UTF8, 0, wstr, wlen, NULL, 0,
NULL,
NULL);
sv_grow(sv, len);
len = WideCharToMultiByte(CP_UTF8, 0, wstr, wlen, SvPVX(sv),
len, NULL, NULL);
SvUTF8_on(sv);
}
/* Shouldn't really ever fail since we ask for the required length
first, but who knows... */
if (len) {
SvPOK_on(sv);
SvCUR_set(sv, len-1);
}
return sv;
}
______________________________________

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2012

From karthik.rajagopalan@schrodinger.com

That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in acutal
conversion call from UTF-16 to ACP. I will revise the patch and sent
it
now.

BTW, I don't think the win32 code would really work if UTF-8 is
internally used for conversion. I see we use lot of calls toUpper,
tolower which would not really work with utf-8 directly. So it is good
if ACP is used.

-Karthik

I now fixed the patch to use WC_NO_BEST_FIT_CHARS in the actual
conversion call. Please give a shot.

-Karthik

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2012

From karthik.rajagopalan@schrodinger.com

0001-Use-GetEnvironmentStringsW-.-instead-of-GetEnvironme.patch
From b77244181d5feea493dfc861434ea27a7b4825ef Mon Sep 17 00:00:00 2001
From: Karthik Rajagopalan <rajagopa@schrodinger.com>
Date: Tue, 12 Jun 2012 11:37:29 -0400
Subject: [PATCH] Use GetEnvironmentStringsW(..) instead of
 GetEnvironmentStringsA(..).

GetEnvironmentStringsA(..) return strings in the OEM code page. This
can actually mangle the environment strings if it contain special characters.
A better approach would be to get the utf-16 strings through GetEnvironmentStringsW(..)
and convert them to ANSI code page. This is now done by win32_getenvironmentstrings(..).
To free the block, you can use win32_freeenvironmentstrings(..).
---
 win32/perlhost.h |    8 ++++----
 win32/win32.c    |   35 +++++++++++++++++++++++++++++++++++
 win32/win32iop.h |    2 ++
 3 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/win32/perlhost.h b/win32/perlhost.h
index e8f5fb4..ae422ef 100644
--- a/win32/perlhost.h
+++ b/win32/perlhost.h
@@ -2262,7 +2262,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir)
     int nLength, compVal;
 
     // get the process environment strings
-    lpAllocPtr = lpTmp = (LPSTR)GetEnvironmentStrings();
+    lpAllocPtr = lpTmp = (LPSTR)win32_getenvironmentstrings();
 
     // step over current directory stuff
     while(*lpTmp == '=')
@@ -2338,7 +2338,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir)
     }
 
     // release the process environment strings
-    FreeEnvironmentStrings(lpAllocPtr);
+    win32_freeenvironmentstrings(lpAllocPtr);
 
     return lpPtr;
 }
@@ -2375,7 +2375,7 @@ CPerlHost::Clearenv(void)
     }
 
     /* get the process environment strings */
-    lpStr = lpEnvPtr = (LPSTR)GetEnvironmentStrings();
+    lpStr = lpEnvPtr = (LPSTR)win32_getenvironmentstrings();
 
     /* step over current directory stuff */
     while(*lpStr == '=')
@@ -2394,7 +2394,7 @@ CPerlHost::Clearenv(void)
 	lpStr += strlen(lpStr) + 1;
     }
 
-    FreeEnvironmentStrings(lpEnvPtr);
+    win32_freeenvironmentstrings(lpEnvPtr);
 }
 
 
diff --git a/win32/win32.c b/win32/win32.c
index 7f2444b..de7b9aa 100644
--- a/win32/win32.c
+++ b/win32/win32.c
@@ -1650,6 +1650,41 @@ win32_ansipath(const WCHAR *widename)
 }
 
 DllExport char *
+win32_getenvironmentstrings(void)
+{
+    LPWSTR lpWStr, lpWTmp;
+    LPSTR lpStr, lpTmp;
+    DWORD size, env_len, wenvstrings_len = 0, aenvstrings_len = 0;
+
+    // Get the process environment strings
+    lpWTmp = lpWStr = (LPWSTR) GetEnvironmentStringsW();
+    for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) {
+        env_len = wcslen(lpWTmp);
+        // calculate the size of the environment strings
+        wenvstrings_len += env_len + 1;
+    }
+
+    /* Get the number of bytes required to store the UTF16 encoded string */
+    aenvstrings_len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, 
+                                          lpWStr, wenvstrings_len, NULL, 0, 0, 0);
+    lpTmp = lpStr = (char *)win32_calloc(aenvstrings_len, sizeof(char));
+    if(!lpTmp)
+        out_of_memory();
+
+    /* Convert the string from UTF-16 encoding to UTF-8 encoding */
+    WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, lpWStr, wenvstrings_len, lpStr, 
+                        aenvstrings_len, 0, 0);
+
+    return(lpStr);
+}
+
+DllExport void
+win32_freeenvironmentstrings(void* block)
+{
+    win32_free(block);
+}
+
+DllExport char *
 win32_getenv(const char *name)
 {
     dTHX;
diff --git a/win32/win32iop.h b/win32/win32iop.h
index 373e3e3..cbc9716 100644
--- a/win32/win32iop.h
+++ b/win32/win32iop.h
@@ -126,6 +126,8 @@ DllExport  void		win32_rewinddir(DIR *dirp);
 DllExport  int		win32_closedir(DIR *dirp);
 DllExport  DIR*		win32_dirp_dup(DIR *const dirp, CLONE_PARAMS *const param);
 
+DllExport  char*        win32_getenvironmentstrings(void);
+DllExport  void         win32_freeenvironmentstrings(void *block);
 DllExport  char*	win32_getenv(const char *name);
 DllExport  int		win32_putenv(const char *name);
 
-- 
1.7.7.1

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2012

From @bulk88

I've got a problem. I think your patch's code is unreachable. I compiled
a blead 5.17.0 and applied your latest patch.
win32_getenvironmentstrings is called by
CPerlHost​::CreateLocalEnvironmentStrings which is called by
CreateChildEnv which is called by PerlEnvGetChildenv which is called by
nothing and isn't in any file except perlhost.h.

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2012

From @bulk88

On Tue Jun 12 16​:48​:28 2012, bulk88. wrote​:

I've got a problem. I think your patch's code is unreachable. I compiled
a blead 5.17.0 and applied your latest patch.
win32_getenvironmentstrings is called by
CPerlHost​::CreateLocalEnvironmentStrings which is called by
CreateChildEnv which is called by PerlEnvGetChildenv which is called by
nothing and isn't in any file except perlhost.h.
_______________________________________
C​:\p517\perl\win32>perl -V
Summary of my perl5 (revision 5 version 17 subversion 0) configuration​:

  Platform​:
  osname=MSWin32, osvers=5.1, archname=MSWin32-x86-multi-thread
  uname=''
  config_args='undef'
  hint=recommended, useposix=true, d_sigaction=undef
  useithreads=define, usemultiplicity=define
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=undef, use64bitall=undef, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cl', ccflags ='-nologo -GF -W3 -Od -MD -Zi -DDEBUGGING -DWIN32
-D_CONSOL
E -DNO_STRICT -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT
-DPERL_IMPLICIT_S
YS -DUSE_PERLIO -D_USE_32BIT_TIME_T',
  optimize='-Od -MD -Zi -DDEBUGGING',
  cppflags='-DWIN32'
  ccversion='13.10.6030', gccversion='', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64',
lseeksi
ze=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='link', ldflags ='-nologo -nodefaultlib -debug
-libpath​:"c​:\perl517\lib\
CORE" -machine​:x86'
  libpth="C​:\Program Files\Microsoft Visual Studio .NET 2003\VC7\lib"
  libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib
comdlg32.l
ib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib
uuid.lib ws2_32
.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib
msvcrt.
lib
  perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib
comdlg
32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib
uuid.lib ws
2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib
comctl32.lib msv
crt.lib
  libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl517.lib
  gnulibc_version=''
  Dynamic Linking​:
  dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
  cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug
-libpath​:"c​:\p
erl517\lib\CORE" -machine​:x86'

Characteristics of this binary (from libperl)​:
  Compile-time options​: DEBUGGING HAS_TIMES HAVE_INTERP_INTERN MULTIPLICITY
  PERLIO_LAYERS PERL_DONT_CREATE_GVSV
  PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS
  PERL_MALLOC_WRAP PERL_PRESERVE_IVUV
  PERL_TRACK_MEMPOOL PL_OP_SLAB_ALLOC USE_ITHREADS
  USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE
  USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_PERLIO
  USE_PERL_ATOF
  Built under MSWin32
  Compiled at Jun 12 2012 19​:06​:01
  %ENV​:
  PERL_JSON_BACKEND="JSON​::XS"
  PERL_YAML_BACKEND="YAML"
  @​INC​:
  C​:/perl517/site/lib
  C​:/perl517/lib
  .

C​:\p517\perl\win32>
___________________________

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2012

From @bulk88

disregard the last 2 posts, i found the place where its called

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2012

From @bulk88

You can't use CP_ACP unless your intending for your patch to only work
with utf16 characters which are representable in the current system CP's
high ascii. Is this what you want?

From lpWStr
___________________________________________
0x00010D40 5c 00 57 00 49 00 4e 00 44 00 4f 00 57 00 53 00 5c 00 54 00
45 00 4d 00 50 00 00 00 75 00 6e 00 \WINDOWS\TEMP␀un
0x00010D60 69 00 63 00 6f 00 64 00 65 00 3d 00 31 04 43 04 3b 04 3a 04
00 00 55 00 53 00 45 00 52 00 44 00 icode=булк␀USERD
___________________________________________

After conversion WideCharToMultiByte
___________________________________________
0x00346B04 44 4f 57 53 5c 54 45 4d 50 00 75 6e 69 63 6f 64
DOWS\TEMP.unicod
0x00346B14 65 3d 3f 3f 3f 3f 00 55 53 45 52 44 4f 4d 41 49
e=????.USERDOMAI
___________________________________________
those "?" are real "?", 0x3f.

The unicode does NOT pass through to the child process for me. I'll try
more tomorrow.

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2012

From @bulk88

env.pl

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2012

From @bulk88

The attachment is a accidentally posted work in progress debugging
scripts for the patch, it uses a release of Win32​::API. I'll have better
test scripts tomorrow.

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2012

From @bulk88

On Tue Jun 12 22​:57​:14 2012, bulk88. wrote​:

The attachment is a accidentally posted work in progress debugging
scripts for the patch, it uses a FUTURE release of Win32​::API. I'll
have better
test scripts tomorrow.

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2012

From @steve-m-hay

On Tue Jun 12 08​:41​:15 2012, kartlee05 wrote​:

That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in acutal
conversion call from UTF-16 to ACP. I will revise the patch and sent
it
now.

BTW, I don't think the win32 code would really work if UTF-8 is
internally used for conversion. I see we use lot of calls toUpper,
tolower which would not really work with utf-8 directly. So it is good
if ACP is used.

-Karthik

I now fixed the patch to use WC_NO_BEST_FIT_CHARS in the actual
conversion call. Please give a shot.

I think there is a bug in this and the previous patch (but not your
original one)​: the calculation of 'wenvstrings_len' wrongly sets 'size'
to 1 at the start of the for-loop, but never uses 'size' again. It
should have set 'wenvstrings_len' to 1 instead, otherwise that ends up
with the wrong value and causes many, many tests to fail when running
the full regression test suite.

After changing

for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) {

to

for(wenvstrings_len = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) {

the patch looks good to me code-wise, but I will await any further input
from bulk88 before committing.

Could you also fix up a couple of the comments, though, please? The
comment about converting from UTF-16 to UTF-8 should say *to the ANSI
codepage* instead. You also use C++ style comments in two places, but
perl always uses C style comments since most files are C and not all C
compilers accept C++ style comments.

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2012

From @bulk88

Okay, here is the problem with the patch, the env var is NOT corrupted,
only, if the env var is a utf8 char, that can be represented in the
current system high ascii CP. If the utf8 char can not be represented in
the current system high ascii CP, it is converted to 0x3F. On pre-patch
Perls, the previous case results in a different letter, but not "?"
appearing in the child process. I can't explain the letter Z from WE A.
I made 2 perl files a parent and a reader and ran it on Perl 5.12 and
5.17 with
OP's latest patch (shays code not included). The parent has a option to
switch between a western europe char, and a cyrillic char. My system is
CP 1252 normally. Make sure your console is chcp 65001 and Lucinda font
is set. The parent (env2) was run as console as "perl -C7 env2.pl 1>
resp512.txt 2>&1". All the files attached are in UTF8 not high ascii.
Must open in a
editor and manual switch to UTF8. Notepad and Write dont do it right.
Note, that with WE A, 5.17 correctly passed through the WE A, but CYR
b turned to ? substitution in the child process. Is this the correct
behavior or the behavior we want or should the WideCharToMultiBytes be
done to UTF8 and not system default CP? Then the issue is, do we want to
pass a non UTF-16 narrow-ish env array with UTF8 through the ANSI/narrow
CreateProcess that Perl uses?

I decided NOT to study how %ENV works just to post this. I
saw some strange behavior with %ENV and I wasn't sure whats correct and
whats wrong behavior so I dropped the %ENV issue. If someone knows how
%ENV is "supposed" to behave on windows with perl, or writes a "known
good" script (on Unix or Windows) that uses %ENV for me to compare my
results to, I can investigate the %ENV behavior I saw.

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2012

From @bulk88

env2.pl

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2012

From @bulk88

Active code page​: 65001
SV = PV(0xb17c5c) at 0xac764c
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0xa9138c "AAA\320\261AAA"\0 [UTF8 "AAA\x{431}AAA"]
  CUR = 8
  LEN = 12
env2.pl here бл
Dump in Perl
as Hex bytes "414141d0b1414141"
to console bytes
"AAAбAAA"
$VAR1 = "AAA\320\261AAA";
to console Perl
"AAAбAAA"
$VAR1 = "AAA\x{431}AAA";

perl ver parent v5.17.0

NEW PROCESS

in ENVREAD.PL
бл
Active code page​: 65001
unicode=AAA?AAA
perl ver reader v5.17.0

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2012

From @bulk88

Active code page​: 65001
SV = PV(0xb17c5c) at 0xac764c
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0xa9138c "AAA\303\204AAA"\0 [UTF8 "AAA\x{c4}AAA"]
  CUR = 8
  LEN = 12
env2.pl here бл
Dump in Perl
as Hex bytes "414141c384414141"
to console bytes
"AAA�AAA"
$VAR1 = "AAA\303\204AAA";
to console Perl
"AAAÄAAA"
$VAR1 = "AAA\x{c4}AAA";

perl ver parent v5.17.0

NEW PROCESS

in ENVREAD.PL
бл
Active code page​: 65001
unicode=AAAÄAAA
perl ver reader v5.17.0

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2012

From @bulk88

Active code page​: 65001
SV = PV(0x9d820c) at 0xa39a14
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0xa46574 "AAA\320\261AAA"\0 [UTF8 "AAA\x{431}AAA"]
  CUR = 8
  LEN = 12
env2.pl here бл
Dump in Perl
as Hex bytes "414141d0b1414141"
to console bytes
"AAAбAAA"
$VAR1 = "AAA\320\261AAA";
to console Perl
"AAAбAAA"
$VAR1 = "AAA\x{431}AAA";

perl ver parent v5.12.2

NEW PROCESS

in ENVREAD.PL
бл
Active code page​: 65001
unicode=AAA?AAA
perl ver reader v5.12.2

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2012

From @bulk88

envread.pl

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2012

From @bulk88

Active code page​: 65001
SV = PV(0x9d820c) at 0xa39a14
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0xa46574 "AAA\303\204AAA"\0 [UTF8 "AAA\x{c4}AAA"]
  CUR = 8
  LEN = 12
env2.pl here бл
Dump in Perl
as Hex bytes "414141c384414141"
to console bytes
"AAA�AAA"
$VAR1 = "AAA\303\204AAA";
to console Perl
"AAAÄAAA"
$VAR1 = "AAA\x{c4}AAA";

perl ver parent v5.12.2

NEW PROCESS

in ENVREAD.PL
бл
Active code page​: 65001
unicode=AAAZAAA
perl ver reader v5.12.2

@p5pRT
Copy link
Author

p5pRT commented Jun 14, 2012

From @steve-m-hay

bulk88 via RT wrote on 2012-06-13​:

Okay, here is the problem with the patch, the env var is NOT
corrupted, only, if the env var is a utf8 char, that can be
represented in the current system high ascii CP. If the utf8 char can
not be represented in the current system high ascii CP, it is
converted to 0x3F. On pre- patch Perls, the previous case results in a
different letter, but not "?"
appearing in the child process.

Thanks for the sample programs. I intend to have a look through them when I have more time, but if I understand you correctly the problem here is only that things are still not working if the environment contains characters which cannot be represented in the current ANSI codepage?

That is to be expected, given the widechar (UTF-16) -> ANSI codepage conversion being done, and the ? character, of course, comes from the WideCharToMultiByte() calls when they fail to map a unicode character to the target encoding. On pre-patch perls something different happens because GetEnvironmentStringsA() was being used which returns things in the OEM codepage and must presumably have some algorithm of its own for handling unicode characters that cannot be mapped to that encoding, hence different characters (but still the wrong ones!) appearing instead of ?.

I still think the patch is worthwhile (with the fix I posted previously) because it fixes things for the common case where the environment contains characters which *are* in the current ANSI codepage but either not in the current OEM codepage (because the GetEnvironmentStringsA() conversion from unicode to OEM would have failed) or else (more commonly) in the OEM codepage but at a different byte position (because perl then treated the OEM bytes as if they were the ANSI bytes which are normally used).

So the only problem remaining is what happens in the case when characters outside of the current ANSI codepage appear in the environment, and that would be worth logging as a separate bug. It isn't such a common case, and there are likely to be problems in other areas of the code in that case anyway, e.g. you can't open a file whose name contains characters outside of the ANSI codepage without jumping through some hoops rather than using the built-in open() function.

If you agree and have no other issues with the patch then I'll apply it and raise the non-ANSI characters problem as a separate bug (and attach your scripts etc).

@p5pRT
Copy link
Author

p5pRT commented Jun 14, 2012

From @bulk88

On Thu Jun 14 00​:56​:57 2012, Steve.Hay@​verosoftware.com wrote​:

So the
only problem remaining is what happens in the case when characters
outside of the current ANSI codepage appear in the environment, and
that would be worth logging as a separate bug. It isn't such a
common case, and there are likely to be problems in other areas of
the code in that case anyway, e.g. you can't open a file whose name
contains characters outside of the ANSI codepage without jumping
through some hoops rather than using the built-in open() function.
If you agree and have no other issues with the patch then I'll
apply it and raise the non-ANSI characters problem as a separate
bug (and attach your scripts etc).

I agree. You can apply the patch. Ultimately one day all of %INC support
in perlhost.h will have have to be made to use the wide apis so all wide
char env vars roundtrip correctly between processes and through %INC.
Note, to test this patch I used the raw wide win API since %INC was
unusable. Currently perlhost.h (which implements %INC) either queries
its own cache of env, or goes out to windows API.
http​://perl5.git.perl.org/perl.git/blob/3630f57ef8a29a646a6848f4e93d25ac47093a3c​:/win32/perlhost.h#l2402
http​://perl5.git.perl.org/perl.git/blob/3630f57ef8a29a646a6848f4e93d25ac47093a3c​:/win32/perlhost.h#l2084
http​://perl5.git.perl.org/perl.git/blob/3630f57ef8a29a646a6848f4e93d25ac47093a3c​:/win32/perlhost.h#l2134
http​://perl5.git.perl.org/perl.git/blob/3630f57ef8a29a646a6848f4e93d25ac47093a3c​:/win32/win32.c#l1653
Plus hv.c will need some fixing to add the utf8 flag to the %INC if the
env var isnt low ascii. Anyways, I'm done. You can apply the patch, and
there should be enough notes in this ticket for someone in the future to
work on the remainder of the problems.

@p5pRT
Copy link
Author

p5pRT commented Jun 14, 2012

From karthik.rajagopalan@schrodinger.com

On Wed Jun 13 00​:22​:43 2012, shay wrote​:

On Tue Jun 12 08​:41​:15 2012, kartlee05 wrote​:

That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in
acutal
conversion call from UTF-16 to ACP. I will revise the patch and
sent
it
now.

BTW, I don't think the win32 code would really work if UTF-8 is
internally used for conversion. I see we use lot of calls toUpper,
tolower which would not really work with utf-8 directly. So it is
good
if ACP is used.

-Karthik

I now fixed the patch to use WC_NO_BEST_FIT_CHARS in the actual
conversion call. Please give a shot.

I think there is a bug in this and the previous patch (but not your
original one)​: the calculation of 'wenvstrings_len' wrongly sets
'size'
to 1 at the start of the for-loop, but never uses 'size' again. It
should have set 'wenvstrings_len' to 1 instead, otherwise that ends up
with the wrong value and causes many, many tests to fail when running
the full regression test suite.

After changing

for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) {

to

for(wenvstrings_len = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) {

the patch looks good to me code-wise, but I will await any further
input
from bulk88 before committing.

I fixed the bug in my patch which failed to account for last '\0' char
in the environment strings.

Could you also fix up a couple of the comments, though, please? The
comment about converting from UTF-16 to UTF-8 should say *to the ANSI
codepage* instead. You also use C++ style comments in two places, but
perl always uses C style comments since most files are C and not all C
compilers accept C++ style comments.

This is also fixed now.

The revised patch is attached now. Please take a look.

-Karthik

@p5pRT
Copy link
Author

p5pRT commented Jun 14, 2012

From karthik.rajagopalan@schrodinger.com

0001-Use-GetEnvironmentStringsW-.-instead-of-GetEnvironme.patch
From f5bbd57635ff725971d736aed43832a57845491b Mon Sep 17 00:00:00 2001
From: Karthik Rajagopalan <rajagopa@schrodinger.com>
Date: Thu, 14 Jun 2012 12:16:15 -0400
Subject: [PATCH] Use GetEnvironmentStringsW(..) instead of
 GetEnvironmentStringsA(..).

GetEnvironmentStringsA(..) return strings in the OEM code page. This
can actually mangle the environment strings if it contain special characters.
A better approach would be to get the utf-16 strings through GetEnvironmentStringsW(..)
and convert them to ANSI code page. This is now done by win32_getenvironmentstrings(..).
To free the block, you can use win32_freeenvironmentstrings(..).
---
 win32/perlhost.h |    8 ++++----
 win32/win32.c    |   35 +++++++++++++++++++++++++++++++++++
 win32/win32iop.h |    2 ++
 3 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/win32/perlhost.h b/win32/perlhost.h
index e8f5fb4..ae422ef 100644
--- a/win32/perlhost.h
+++ b/win32/perlhost.h
@@ -2262,7 +2262,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir)
     int nLength, compVal;
 
     // get the process environment strings
-    lpAllocPtr = lpTmp = (LPSTR)GetEnvironmentStrings();
+    lpAllocPtr = lpTmp = (LPSTR)win32_getenvironmentstrings();
 
     // step over current directory stuff
     while(*lpTmp == '=')
@@ -2338,7 +2338,7 @@ CPerlHost::CreateLocalEnvironmentStrings(VDir &vDir)
     }
 
     // release the process environment strings
-    FreeEnvironmentStrings(lpAllocPtr);
+    win32_freeenvironmentstrings(lpAllocPtr);
 
     return lpPtr;
 }
@@ -2375,7 +2375,7 @@ CPerlHost::Clearenv(void)
     }
 
     /* get the process environment strings */
-    lpStr = lpEnvPtr = (LPSTR)GetEnvironmentStrings();
+    lpStr = lpEnvPtr = (LPSTR)win32_getenvironmentstrings();
 
     /* step over current directory stuff */
     while(*lpStr == '=')
@@ -2394,7 +2394,7 @@ CPerlHost::Clearenv(void)
 	lpStr += strlen(lpStr) + 1;
     }
 
-    FreeEnvironmentStrings(lpEnvPtr);
+    win32_freeenvironmentstrings(lpEnvPtr);
 }
 
 
diff --git a/win32/win32.c b/win32/win32.c
index 7f2444b..e496837 100644
--- a/win32/win32.c
+++ b/win32/win32.c
@@ -1650,6 +1650,41 @@ win32_ansipath(const WCHAR *widename)
 }
 
 DllExport char *
+win32_getenvironmentstrings(void)
+{
+    LPWSTR lpWStr, lpWTmp;
+    LPSTR lpStr, lpTmp;
+    DWORD env_len, wenvstrings_len = 0, aenvstrings_len = 0;
+
+    /* Get the process environment strings */
+    lpWTmp = lpWStr = (LPWSTR) GetEnvironmentStringsW();
+    for(wenvstrings_len = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) {
+        env_len = wcslen(lpWTmp);
+        /* calculate the size of the environment strings */
+        wenvstrings_len += env_len + 1;
+    }
+
+    /* Get the number of bytes required to store the UTF16 encoded string */
+    aenvstrings_len = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, 
+                                          lpWStr, wenvstrings_len, NULL, 0, 0, 0);
+    lpTmp = lpStr = (char *)win32_calloc(aenvstrings_len, sizeof(char));
+    if(!lpTmp)
+        out_of_memory();
+
+    /* Convert the string from UTF-16 encoding to ACP encoding */
+    WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, lpWStr, wenvstrings_len, lpStr, 
+                        aenvstrings_len, 0, 0);
+
+    return(lpStr);
+}
+
+DllExport void
+win32_freeenvironmentstrings(void* block)
+{
+    win32_free(block);
+}
+
+DllExport char *
 win32_getenv(const char *name)
 {
     dTHX;
diff --git a/win32/win32iop.h b/win32/win32iop.h
index 373e3e3..cbc9716 100644
--- a/win32/win32iop.h
+++ b/win32/win32iop.h
@@ -126,6 +126,8 @@ DllExport  void		win32_rewinddir(DIR *dirp);
 DllExport  int		win32_closedir(DIR *dirp);
 DllExport  DIR*		win32_dirp_dup(DIR *const dirp, CLONE_PARAMS *const param);
 
+DllExport  char*        win32_getenvironmentstrings(void);
+DllExport  void         win32_freeenvironmentstrings(void *block);
 DllExport  char*	win32_getenv(const char *name);
 DllExport  int		win32_putenv(const char *name);
 
-- 
1.7.7.1

@p5pRT
Copy link
Author

p5pRT commented Jun 14, 2012

From karthik.rajagopalan@schrodinger.com

On Tue Jun 12 22​:54​:32 2012, bulk88. wrote​:

You can't use CP_ACP unless your intending for your patch to only work
with utf16 characters which are representable in the current system CP's
high ascii. Is this what you want?

Yes, there are lot of places in win32 which still depend on ACP calls. So
I just went ahead to do the same.

-Karthik

@p5pRT
Copy link
Author

p5pRT commented Jun 15, 2012

From @steve-m-hay

On Thu Jun 14 09​:02​:31 2012, bulk88. wrote​:

On Thu Jun 14 00​:56​:57 2012, Steve.Hay@​verosoftware.com wrote​:

So the
only problem remaining is what happens in the case when
characters
outside of the current ANSI codepage appear in the environment,
and
that would be worth logging as a separate bug.

I agree. [...] Anyways, I'm done. You can apply the patch,
and
there should be enough notes in this ticket for someone in the future
to
work on the remainder of the problems.

I added a note to perltodo.pod in commit 799c141 referring to this
ticket since the problem of Unicode in the environment (and in filenames
etc) is already a known problem.

@p5pRT
Copy link
Author

p5pRT commented Jun 15, 2012

From @steve-m-hay

On Thu Jun 14 09​:20​:33 2012, kartlee05 wrote​:

The revised patch is attached now. Please take a look.

Thanks, your patch is now applied as commit 4f46e52.

@p5pRT
Copy link
Author

p5pRT commented Jun 15, 2012

@steve-m-hay - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Jun 15, 2012
@p5pRT
Copy link
Author

p5pRT commented Jun 15, 2012

From karthik.rajagopalan@schrodinger.com

Hi Steve,

On Fri Jun 15 00​:57​:16 2012, shay wrote​:

On Thu Jun 14 09​:20​:33 2012, kartlee05 wrote​:

The revised patch is attached now. Please take a look.

Thanks, your patch is now applied as commit 4f46e52.

Thanks for applying this change. Will this be in 5.17? BTW, do you want
me to take a look and suggest changes that we would need to fully
support wide version for win32?

-Karthik

@p5pRT
Copy link
Author

p5pRT commented Jun 15, 2012

From karthik.rajagopalan@schrodinger.com

On Fri Jun 15 07​:13​:34 2012, kartlee05 wrote​:

Hi Steve,

On Fri Jun 15 00​:57​:16 2012, shay wrote​:

On Thu Jun 14 09​:20​:33 2012, kartlee05 wrote​:

The revised patch is attached now. Please take a look.

Thanks, your patch is now applied as commit 4f46e52.

Thanks for applying this change. Will this be in 5.17?

I meant in next minor version for it?

BTW, do you want
me to take a look and suggest changes that we would need to fully
support wide version for win32?

-Karthik

@p5pRT
Copy link
Author

p5pRT commented Jun 15, 2012

From [Unknown Contact. See original ticket]

On Fri Jun 15 07​:13​:34 2012, kartlee05 wrote​:

Hi Steve,

On Fri Jun 15 00​:57​:16 2012, shay wrote​:

On Thu Jun 14 09​:20​:33 2012, kartlee05 wrote​:

The revised patch is attached now. Please take a look.

Thanks, your patch is now applied as commit 4f46e52.

Thanks for applying this change. Will this be in 5.17?

I meant in next minor version for it?

BTW, do you want
me to take a look and suggest changes that we would need to fully
support wide version for win32?

-Karthik

@p5pRT
Copy link
Author

p5pRT commented Jun 17, 2012

From karthik.rajagopalan@schrodinger.com

Sent from my iPhone

On Jun 13, 2012, at 3​:22 AM, "Steve Hay via RT" <perlbug-followup@​perl.org> wrote​:

On Tue Jun 12 08​:41​:15 2012, kartlee05 wrote​:

That was mistake on my part not to pass WC_NO_BEST_FIT_CHARS in acutal
conversion call from UTF-16 to ACP. I will revise the patch and sent
it
now.

BTW, I don't think the win32 code would really work if UTF-8 is
internally used for conversion. I see we use lot of calls toUpper,
tolower which would not really work with utf-8 directly. So it is good
if ACP is used.

-Karthik

I now fixed the patch to use WC_NO_BEST_FIT_CHARS in the actual
conversion call. Please give a shot.

I think there is a bug in this and the previous patch (but not your
original one)​: the calculation of 'wenvstrings_len' wrongly sets 'size'
to 1 at the start of the for-loop, but never uses 'size' again. It
should have set 'wenvstrings_len' to 1 instead, otherwise that ends up
with the wrong value and causes many, many tests to fail when running
the full regression test suite.

After changing

for(size = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) {

to

for(wenvstrings_len = 1; *lpWTmp != '\0'; lpWTmp += env_len + 1) {

I agree. I have not taken care for the additional null char to indicate the end of environment strings in my last two patches. I will add this now and resend.

the patch looks good to me code-wise, but I will await any further input
from bulk88 before committing.

Could you also fix up a couple of the comments, though, please? The
comment about converting from UTF-16 to UTF-8 should say *to the ANSI
codepage* instead. You also use C++ style comments in two places, but
perl always uses C style comments since most files are C and not all C
compilers accept C++ style comments.

Sure, I will do that and send you in an hour.

-Karthik

@p5pRT
Copy link
Author

p5pRT commented Jun 18, 2012

From @steve-m-hay

On Fri Jun 15 07​:13​:34 2012, kartlee05 wrote​:

Hi Steve,

On Fri Jun 15 00​:57​:16 2012, shay wrote​:

On Thu Jun 14 09​:20​:33 2012, kartlee05 wrote​:

The revised patch is attached now. Please take a look.

Thanks, your patch is now applied as commit 4f46e52.

Thanks for applying this change. Will this be in 5.17? BTW, do you want
me to take a look and suggest changes that we would need to fully
support wide version for win32?

The change will indeed be in 5.17.1, due out in a couple of days.

Support for wide api calls on Win32 was actually removed some time ago
(5.8.1) since it was not working correctly and it was deemed, IIRC, that
it would be more worthwhile to address the wider issue of supporting
Unicode in any filesystems that support it rather than fixing the broken
support for just Windows NTFS, hence the items in Porting/todo.pod about
"Unicode in Filenames", "Unicode in %ENV" and "Unicode and glob()".

For sure, the solution on Windows must involve the wide apis, but what
we really need first is a framework for hooking those calls into as
described in the item "Virtualize operating system access" in
Porting/todo.pod.

@p5pRT
Copy link
Author

p5pRT commented May 13, 2014

From @bulk88

On Mon Jun 18 00​:26​:30 2012, shay wrote​:

The change will indeed be in 5.17.1, due out in a couple of days.

This ticket probably caused the leak reported at https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121676

--
bulk88 ~ bulk88 at hotmail.com

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant