Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@ARGV, -CA and Win32 #16681

Open
p5pRT opened this issue Sep 3, 2018 · 9 comments
Open

@ARGV, -CA and Win32 #16681

p5pRT opened this issue Sep 3, 2018 · 9 comments
Labels
distro-mswin32 type-core Unicode and System Calls Bad interactions of syscalls and UTF-8

Comments

@p5pRT
Copy link

p5pRT commented Sep 3, 2018

Migrated from rt.perl.org#133496 (status was 'open')

Searchable as RT133496$

@p5pRT
Copy link
Author

p5pRT commented Sep 3, 2018

From @tonycoz

Created by @tonycoz

Parsing non-ASCII command-lines on Win32 is moderately broken.

Whether in codepage 65001 or not, non-ASCII (or at least, non-system
codepage) characters are not preserved. For example​:

J​:\dev\perl\git\perl\win32>..\perl -I..\lib -MDevel​::Peek -Mutf8 -e "Dump(qq(αω)
)"
SV = PV(0x34f6f8) at 0x67c578
  REFCNT = 1
  FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK)
  PV = 0x688998 "a?"\0
  CUR = 2
  LEN = 10
  COW_REFCNT = 0

J​:\dev\perl\git\perl\win32>..\perl -I..\lib -MDevel​::Peek -Mutf8 -e "Dump(shift)
" αω
SV = PV(0x52efb8) at 0x5bbfa0
  REFCNT = 1
  FLAGS = (TEMP,POK,pPOK)
  PV = 0x5abfe8 "a?"\0
  CUR = 2
  LEN = 10

The -CA switch makes no difference.

While this is caused by the interaction between the C runtime and the
Win32 API, should we be bypassing that to get some sensible behaviour
out of it?

Simplest (for the user) would be for -CA to retrieve the UTF-16
command-line arguments and convert them to UTF-8 before @​ARGV is
populated, though this is complicated by us parsing argv while we're
updating it.

Relying only on -CA might also have some backcompat issues, since if
there's Win32 code that blindly sets -CA, it might break if we start
returning the real value. This change could be considered a bug fix
though.

To simplify the implementation we might depend on an environment
variable instead, but that has the same sort of problem that setting
PERL_UNICODE= by default had on file handling.

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.29.3:

Configured by tony at Mon Sep  3 10:54:31 2018.

Summary of my perl5 (revision 5 version 29 subversion 3) configuration:
   
  Platform:
    osname=MSWin32
    osvers=6.1.7601
    archname=MSWin32-x64-multi-thread
    uname=''
    config_args='undef'
    hint=recommended
    useposix=true
    d_sigaction=undef
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=undef
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='gcc'
    ccflags =' -s -O2 -DWIN32 -DWIN64 -DCONSERVATIVE  -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -fwrapv -fno-strict-aliasing -mms-bitfields'
    optimize='-s -O2'
    cppflags='-DWIN32'
    ccversion=''
    gccversion='4.9.2'
    gccosandvers=''
    intsize=4
    longsize=4
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='long long'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='g++'
    ldflags ='-s -L"c:\perl\lib\CORE" -L"C:\MinGW\lib"'
    libpth=C:\MinGW\lib
    libs= -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    perllibs= -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    libc=
    so=dll
    useshrplib=true
    libperl=libperl529.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs
    dlext=dll
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags='-mdll -s -L"c:\perl\lib\CORE" -L"C:\MinGW\lib"'



@INC for perl 5.29.3:
    J:/dev/perl/git/perl/lib


Environment for perl 5.29.3:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=c:\sperl-5.22.0.1-64bit-portable\perl\site\bin;c:\sperl-5.22.0.1-64bit-portable\perl\bin;c:\sperl-5.22.0.1-64bit-portable\c\bin;C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\ProgramData\Oracle\Java\javapath;C:\Program Files (x86)\Common Files\Intel\Shared Files\cpp\bin\Intel64;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\libnvvp\;C:\Program Files (x86)\Common Files\MICROSOFT SHARED\WINDOWS LIVE;C:\PROGRAM FILES (X86)\MIKTEX 2.8\MIKTEX\BIN;C:\Windows\SYSTEM32;C:\Windows;C:\Windows\SYSTEM32\WBEM;C:\APPS\GIT\GIT\CMD;C:\PROGRAM FILES (X86)\ORACLE\BERKELEY DB 11GR2 5.3.15\BIN;C:\Program Files\WIDCOMM\Bluetooth Software\;C:\Program Files\WIDCOMM\Bluetooth Software\syswow64;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\Microsoft SQL
Server\110\Tools\Binn\;C:\Program Files (x86)\Microsoft SDKs\TypeScript\1.0\;C:\Program Files\Microsoft SQL Server\120\Tools\Binn\;C:\Users\tony\.dnx\bin;C:\Program Files\Microsoft DNX\Dnvm\;C:\Program Files (x86)\nodejs\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;J:\postgresql\pg96\bin;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\Git\cmd;C:\Users\tony\.dnx\bin;C:\Users\tony\AppData\Roaming\npm;C:\Program Files (x86)\Dr. Memory\bin\
    PERL_BADLANG (unset)
    SHELL (unset)

@p5pRT
Copy link
Author

p5pRT commented Sep 3, 2018

From @xenu

On Sun, 02 Sep 2018 18​:26​:01 -0700
"Tony Cook \(via RT\)" <perlbug-followup@​perl.org> wrote​:

Parsing non-ASCII command-lines on Win32 is moderately broken.

Whether in codepage 65001 or not, non-ASCII (or at least, non-system
codepage) characters are not preserved. For example​:

J​:\dev\perl\git\perl\win32>..\perl -I..\lib -MDevel​::Peek -Mutf8 -e "Dump(qq(??)
)"
SV = PV(0x34f6f8) at 0x67c578
REFCNT = 1
FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK)
PV = 0x688998 "a?"\0
CUR = 2
LEN = 10
COW_REFCNT = 0

J​:\dev\perl\git\perl\win32>..\perl -I..\lib -MDevel​::Peek -Mutf8 -e "Dump(shift)
" ??
SV = PV(0x52efb8) at 0x5bbfa0
REFCNT = 1
FLAGS = (TEMP,POK,pPOK)
PV = 0x5abfe8 "a?"\0
CUR = 2
LEN = 10

The -CA switch makes no difference.

While this is caused by the interaction between the C runtime and the
Win32 API, should we be bypassing that to get some sensible behaviour
out of it?

Simplest (for the user) would be for -CA to retrieve the UTF-16
command-line arguments and convert them to UTF-8 before @​ARGV is
populated, though this is complicated by us parsing argv while we're
updating it.

Relying only on -CA might also have some backcompat issues, since if
there's Win32 code that blindly sets -CA, it might break if we start
returning the real value. This change could be considered a bug fix
though.

To simplify the implementation we might depend on an environment
variable instead, but that has the same sort of problem that setting
PERL_UNICODE= by default had on file handling.

While this ticket is about @​ARGV and not filenames, I consider it a
duplicate of https://rt.perl.org/Public/Bug/Display.html?id=130831 (see
also a recent thread on p5p​:
https://www.nntp.perl.org/group/perl.perl5.porters/2018/08/msg251899.html).

It's *exactly* the same problem. In this case, command line arguments
are being converted from the console codepage to the system codepage
(usually they're not the same!). If a character is invalid or doesn't
exist in the system codepage, it is being replaced with either '?' (most
locales) or REPLACEMENT CHARACTER (when *system* codepage is set to
65001, which is possible only on Windows 10).

While this brings *horrible* experience to perl users, I don't think
it's a bug. Also, changing this behaviour would obviously break stuff.

@p5pRT
Copy link
Author

p5pRT commented Sep 3, 2018

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Sep 5, 2018

From @tonycoz

On Sun, 02 Sep 2018 19​:49​:57 -0700, me@​xenu.pl wrote​:

While this ticket is about @​ARGV and not filenames, I consider it a
duplicate of https://rt.perl.org/Public/Bug/Display.html?id=130831
(see
also a recent thread on p5p​:
https://www.nntp.perl.org/group/perl.perl5.porters/2018/08/msg251899.html).

It's *exactly* the same problem. In this case, command line arguments
are being converted from the console codepage to the system codepage
(usually they're not the same!). If a character is invalid or doesn't
exist in the system codepage, it is being replaced with either '?'
(most
locales) or REPLACEMENT CHARACTER (when *system* codepage is set to
65001, which is possible only on Windows 10).

It's certainly a strongly related problem, but it's not the same problem.

It came up while I was diagnosing code that accepted a filename (the name in the test case) and called the Win32 specific APIs to open it and failed.

While this brings *horrible* experience to perl users, I don't think
it's a bug. Also, changing this behaviour would obviously break stuff.

I think it's something we can improve.

The main issue right now is code that accepts strings from the command-line get nonsensical results - unless the caller does nonsensical things.

The attached patch modifies perl to re-generate argv from the UTF-16 command-line if it sees a -CA switch, and it works for me for commands run from the command prompt.

However, one test fails​:

run/switchC.t ..
1..15
ok 1 - -CO​: no warning on UTF-8 output
ok 2 - -C2​: no warning on UTF-8 output
ok 3 - -CI​: read in UTF-8 input
ok 4 - -CE​: UTF-8 stderr
ok 5 - -Co​: auto-UTF-8 open for output
ok 6 - -Ci​: auto-UTF-8 open for input
ok 7 - -Ci​: auto-UTF-8 open for input affects the current file
ok 8 - -Ci​: auto-UTF-8 open for input has file scope
# Failed test 9 - -CA​: @​ARGV at run/switchC.t line 78
# got '196'
not ok 9 - -CA​: @​ARGV# expected /(?^s​:^256(?​:\r?\n)?$)/

ok 10 - \#!perl -C
ok 11 - \#!perl -C followed by another switch
ok 12 - \#!perl -C<num> followed by another switch
ok 13 - \#!perl -C with different -C on command line
ok 14 - \#!perl -C but not command line
ok 15 - perl -C00 [perl \#123991]
Failed 1/15 subtests

This fails because backticks have the same type of problem - backticks don't know about unicode either. By the time the chr(256) gets to win32_popen() all it sees is "\xc4\c80", and it can't tell if that was Latin1 or UTF-8, so it's passed through as 00C4 0080 rather than 0100.

Tony

@p5pRT
Copy link
Author

p5pRT commented Sep 5, 2018

From @tonycoz

0001-perl-133496-convert-utf-16-arguments-to-utf-8-if-CA-.patch
From 3c90673de0140195fa62034ce7b49f7fd9949d8c Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Wed, 5 Sep 2018 13:24:30 +1000
Subject: (perl #133496) convert utf-16 arguments to utf-8 if -CA set

perl generally uses the ANSI/multi-byte APIs on Win32, which means
that the Unicode values that Windows keeps can be corrupted when
accepted by perl.

When the -CA switch is supplied this change refetches and splits the
UTF-16 the command line from the system and converts each argument
to UTF-8, replacing the original argv.

This has two effects:

- any other following arguments, especially including -[eE] code, are
  encoded as UTF-8, so -Mutf8 works sensibly

- @ARGV is properly populated with Unicode arguments instead of
  the ANSI-fied version of the UTF-16 arguments.

With this change, the test for -CA in run/switchC.t fails, since it
ends up passes "\xc4\x80" to system, which ends up as UTF-16 00c4 0080
rather than 0100.
---
 perl.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/perl.c b/perl.c
index 30ee577483..10a3510c11 100644
--- a/perl.c
+++ b/perl.c
@@ -42,6 +42,10 @@
 #include "nwutil.h"	
 #endif
 
+#ifdef WIN32
+#include <shellapi.h>
+#endif
+
 #ifdef DEBUG_LEAKING_SCALARS_FORK_DUMP
 #  ifdef I_SYSUIO
 #    include <sys/uio.h>
@@ -1557,6 +1561,55 @@ Perl_call_atexit(pTHX_ ATEXIT_t fn, void *ptr)
     ++PL_exitlistlen;
 }
 
+#ifdef WIN32
+
+#define win32_upgrade_argv(pargv) S_win32_upgrade_argv(aTHX_ (pargv))
+
+STATIC void
+S_win32_upgrade_argv(pTHX_ char ***pargv) {
+    LPWSTR cmdline = GetCommandLineW();
+    int num_args;
+    LPWSTR *argsw = CommandLineToArgvW(cmdline, &num_args);
+    STRLEN argsc_sz =
+        WideCharToMultiByte(CP_UTF8, 0, cmdline, -1, 
+                            NULL, 0, 0, 0);
+    /* maybe combine these into one block */
+    /* FIXME: leaking this memory for now */
+    LPSTR *argvc = safemalloc(sizeof(LPSTR) * (num_args+1) + argsc_sz);
+    LPSTR argsc = (LPSTR)(argvc + num_args+1);
+    LPSTR argsc_end = argsc + argsc_sz;
+    LPSTR dest = argsc;
+    int i;
+            
+    for (i = 0; i < num_args; ++i) {
+        argvc[i] = dest;
+        dest += WideCharToMultiByte(CP_UTF8, 0, argsw[i], -1,
+                                    dest, argsc_end - dest, 0, 0);
+    }
+    argvc[i] = NULL;
+
+    LocalFree(argsw);
+
+    /* this assumes both the CRT and CommandLineToArgW() split
+       the command-line in the same way
+    */
+    assert(num_args == PL_origargc);
+    *pargv = argvc + (*pargv - PL_origargv);
+
+    SAVEFREEPV(argvc);
+}
+
+#define WIN32_UPGRADE_ARGV \
+    STMT_START { \
+        if (!replaced_args && (PL_unicode & PERL_UNICODE_ARGV_FLAG)) { \
+            win32_upgrade_argv(&argv);                                 \
+            replaced_args = TRUE;                                      \
+        } \
+    } STMT_END
+#else
+#define WIN32_UPGRADE_ARGV NOOP
+#endif
+
 /*
 =for apidoc Am|int|perl_parse|PerlInterpreter *my_perl|XSINIT_t xsinit|int argc|char **argv|char **env
 
@@ -2034,6 +2087,9 @@ S_parse_body(pTHX_ char **env, XSINIT_t xsinit)
     SV *linestr_sv = NULL;
     bool add_read_e_script = FALSE;
     U32 lex_start_flags = 0;
+#ifdef WIN32
+    bool replaced_args = FALSE;
+#endif
 
     PERL_SET_PHASE(PERL_PHASE_START);
 
@@ -2072,8 +2128,9 @@ S_parse_body(pTHX_ char **env, XSINIT_t xsinit)
 	case 'W':
 	case 'X':
 	case 'w':
-	    if ((s = moreswitches(s)))
+	    if ((s = moreswitches(s))) {
 		goto reswitch;
+            }
 	    break;
 
 	case 't':
@@ -2192,6 +2249,7 @@ S_parse_body(pTHX_ char **env, XSINIT_t xsinit)
 	default:
 	    Perl_croak(aTHX_ "Unrecognized switch: -%s  (-h will show valid options)",s);
 	}
+        WIN32_UPGRADE_ARGV;
     }
     }
 
@@ -2265,6 +2323,7 @@ S_parse_body(pTHX_ char **env, XSINIT_t xsinit)
 #endif
 		} else {
 		    moreswitches(d);
+                    WIN32_UPGRADE_ARGV;
 		}
 	    }
 	}
@@ -2387,6 +2446,7 @@ S_parse_body(pTHX_ char **env, XSINIT_t xsinit)
 	    linestr_sv = newSV_type(SVt_PV);
 	    lex_start_flags |= LEX_START_COPIED;
 	    find_beginning(linestr_sv, rsfp);
+            WIN32_UPGRADE_ARGV;
 	    if (cddir && PerlDir_chdir( (char *)cddir ) < 0)
 		Perl_croak(aTHX_ "Can't chdir to %s",cddir);
 	}
-- 
2.14.1.windows.1

@p5pRT
Copy link
Author

p5pRT commented Sep 5, 2018

From @tonycoz

On Tue, 04 Sep 2018 20​:40​:16 -0700, tonyc wrote​:

While this brings *horrible* experience to perl users, I don't think
it's a bug. Also, changing this behaviour would obviously break
stuff.

I think it's something we can improve.

The main issue right now is code that accepts strings from the
command-line get nonsensical results - unless the caller does
nonsensical things.

The attached patch modifies perl to re-generate argv from the UTF-16
command-line if it sees a -CA switch, and it works for me for commands
run from the command prompt.

Also, it breaks embedding, so don't apply this patch.

Maybe an alternative is to not make it depend on the -CA switch, but on the current code page.

If the current code page is 65001 then main() (in win32/runperl.c) could do the conversion to utf-8 I do in my patch.

The program then depends on the normal -CA behaviour to treat that as UTF-8, so perl code sees Unicode in @​ARGV.

It does mean that a user has to do something unusual (chcp 65001) to get reasonable behaviour.

Tony

@p5pRT
Copy link
Author

p5pRT commented Sep 6, 2018

From @khwilliamson

On 09/05/2018 05​:20 PM, Tony Cook via RT wrote​:

On Tue, 04 Sep 2018 20​:40​:16 -0700, tonyc wrote​:

While this brings *horrible* experience to perl users, I don't think
it's a bug. Also, changing this behaviour would obviously break
stuff.

I think it's something we can improve.

The main issue right now is code that accepts strings from the
command-line get nonsensical results - unless the caller does
nonsensical things.

The attached patch modifies perl to re-generate argv from the UTF-16
command-line if it sees a -CA switch, and it works for me for commands
run from the command prompt.

Also, it breaks embedding, so don't apply this patch.

Maybe an alternative is to not make it depend on the -CA switch, but on the current code page.

If the current code page is 65001 then main() (in win32/runperl.c) could do the conversion to utf-8 I do in my patch.

The program then depends on the normal -CA behaviour to treat that as UTF-8, so perl code sees Unicode in @​ARGV.

It does mean that a user has to do something unusual (chcp 65001) to get reasonable behaviour.

Tony

---
via perlbug​: queue​: perl5 status​: open
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=133496

I haven't looked at this thread in detail, but using script runs can be
used to disambiguate some things. Perhaps that would help, so think
about that possibility. I have unreleased code for Pod​::Simple that
improves CP1252 vs UTF-8 detection that might be instructive.

@p5pRT
Copy link
Author

p5pRT commented Sep 6, 2018

From @xenu

On Wed, 05 Sep 2018 16​:20​:03 -0700
"Tony Cook via RT" <perlbug-followup@​perl.org> wrote​:

Also, it breaks embedding, so don't apply this patch.

Maybe an alternative is to not make it depend on the -CA switch, but on the current code page.

If the current code page is 65001 then main() (in win32/runperl.c) could do the conversion to utf-8 I do in my patch.

The program then depends on the normal -CA behaviour to treat that as UTF-8, so perl code sees Unicode in @​ARGV.

It does mean that a user has to do something unusual (chcp 65001) to get reasonable behaviour.

Tony

You mean the console codepage? There are some problem with that approach.

Console codepages don't exist in windows subsystem applications (like
wperl.exe), GetConsoleCP() returns 0 in them​:

C​:\Users\xenu>wperl -MWin32 -E "open my($fh), '>', 'a.txt'; print {$fh} Win32​::GetConsoleCP()"
C​:\Users\xenu>type a.txt
0

Another problem is that it won't cover situations where it's impossible
to change console codepage, for example when perl.exe is launched via
explorer.exe (e.g. via .lnk shortcut or when some file extension is
associated with a perl script).

I think that the only reasonable way to fix the win32 unicode bug is to
introduce a way to globally force utf-8 everywhere, i.e. @​ARGV,
filenames and env variables. -C flag used to serve this exact purpose[1],
but this functionality was removed in 5.8.1.

IMO we should reintroduce that switch.

On second thought, I think, in the long run, we should enable unicode
handling by default and add a switch which would restore the old
behaviour for scripts that rely on it. IMO that would be the most
reasonable approach, because the current behaviour is *completely*
broken and I'm pretty sure that changing it would fix more code than it
would break.

[1] - https://perldoc.pl/5.8.0/perlrun#-C

@p5pRT
Copy link
Author

p5pRT commented Sep 11, 2018

From @tonycoz

On Thu, 06 Sep 2018 11​:44​:09 -0700, me@​xenu.pl wrote​:

On Wed, 05 Sep 2018 16​:20​:03 -0700
"Tony Cook via RT" <perlbug-followup@​perl.org> wrote​:

Also, it breaks embedding, so don't apply this patch.

Maybe an alternative is to not make it depend on the -CA switch, but
on the current code page.

If the current code page is 65001 then main() (in win32/runperl.c)
could do the conversion to utf-8 I do in my patch.

The program then depends on the normal -CA behaviour to treat that as
UTF-8, so perl code sees Unicode in @​ARGV.

It does mean that a user has to do something unusual (chcp 65001) to
get reasonable behaviour.

Tony

You mean the console codepage? There are some problem with that
approach.

Console codepages don't exist in windows subsystem applications (like
wperl.exe), GetConsoleCP() returns 0 in them​:

C​:\Users\xenu>wperl -MWin32 -E "open my($fh), '>', 'a.txt'; print
{$fh} Win32​::GetConsoleCP()"
C​:\Users\xenu>type a.txt
0

Another problem is that it won't cover situations where it's
impossible
to change console codepage, for example when perl.exe is launched via
explorer.exe (e.g. via .lnk shortcut or when some file extension is
associated with a perl script).

I think that the only reasonable way to fix the win32 unicode bug is
to
introduce a way to globally force utf-8 everywhere, i.e. @​ARGV,
filenames and env variables. -C flag used to serve this exact
purpose[1],
but this functionality was removed in 5.8.1.

The argv handling looks similar to what my patch does - with the same problem for embedding.

The wide system calls handling appears to assume all SVs are UTF-8 encoded, even without the SVf_UTF8 flag set.

IMO we should reintroduce that switch.

On second thought, I think, in the long run, we should enable unicode
handling by default and add a switch which would restore the old
behaviour for scripts that rely on it. IMO that would be the most
reasonable approach, because the current behaviour is *completely*
broken and I'm pretty sure that changing it would fix more code than
it
would break.

I think fixing @​ARGV would be reasonably painless for backcompat, but the rest wide-character support is too likely to break things, I think.

I wonder how much CPAN testing was done with -C for perl 5.8.0.

Tony

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
distro-mswin32 type-core Unicode and System Calls Bad interactions of syscalls and UTF-8
Projects
None yet
Development

No branches or pull requests

2 participants