Windows %ENV and utf8 #15855

Open
p5pRT opened this issue Jan 31, 2017 · 12 comments
Labels: distro-mswin32; do not merge (Don't merge this PR, at least for now); type-core; Unicode and System Calls (Bad interactions of syscalls and UTF-8)

Comments

@p5pRT

p5pRT commented Jan 31, 2017

Migrated from rt.perl.org#130683 (status was 'open')

Searchable as RT130683$

@p5pRT
Author

p5pRT commented Jan 31, 2017

From Gordon.Weekly@mathworks.com

Created by gweekly@mathworks.com

To: perlbug@perl.org
Message-Id: <5.20.2_9828_1485889356@mail-vif.mathworks.com>
From: gweekly@mathworks.com
Cc: gordon@weekly.org
Reply-To: gweekly@mathworks.com
Subject: Windows %ENV and utf8

This is a bug report for perl from gweekly@mathworks.com,
generated with the help of perlbug 1.40 running under perl 5.20.2.

-----------------------------------------------------------------
There is a bug in the Windows handling of the %ENV hash since perl-5.18.
Specifically, if a variable with the UTF8 flag set is used as the key to delete an entry,
an aberrant entry appears in the hash.

The problem derives from the way Perl itself handles the built-in %ENV hash.

See my posting in the PerlMonks thread:
http://perlmonks.pair.com/index.pl?node_id=1180341

# This script attempts to expose the bug that if a utf8 variable is
# used as the key to delete a hash entry (that doesn't already exist)
# a non-deletable key comes into existence, in violation of the
# documentation for delete that says "Setting a hash element to the
# undefined value does not remove its key, but deleting it does..."
# It also shows that something is definitely wrong, as the hash can
# end up with two identical keys.

# The DATA area shows the results for running this script, making it
# evident the bug was introduced in perl-5.18 and has persisted since.
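
For readers unfamiliar with the flag in question, here is a minimal sketch (not part of the original report) of how two strings can both be eq 'SAM' while only one carries Perl's internal UTF8 flag. It uses utf8::upgrade to set the flag, on the assumption that this mirrors what Encode::decode does in the script; the variable names are illustrative. In an ordinary hash both spellings address the same entry, which is the behaviour the report expects from %ENV as well.

```perl
use strict;
use warnings;

my $plain   = 'SAM';
my $flagged = 'SAM';
utf8::upgrade($flagged);    # set the internal UTF8 flag; the characters are unchanged

# The two strings compare equal even though their internal flags differ.
print "strings are eq\n" if $plain eq $flagged;
printf "plain has UTF8 flag:   %s\n", utf8::is_utf8($plain)   ? 'yes' : 'no';
printf "flagged has UTF8 flag: %s\n", utf8::is_utf8($flagged) ? 'yes' : 'no';

# An ordinary (non-magical) hash treats both as the same key.
my %h = ( $plain => 1 );
print "ordinary hash finds the flagged key\n" if exists $h{$flagged};
delete $h{$flagged};
printf "keys left after delete: %d\n", scalar keys %h;
```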

Perl Info

Flags:
    category=core
    severity=high

Site configuration information for perl 5.20.2:

Configured by batserve at Thu Dec  1 12:24:33 2016.

Summary of my perl5 (revision 5 version 20 subversion 2) configuration:

  Platform:
    osname=MSWin32, osvers=6.1, archname=MSWin32-x64-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-DWIN32 -D_WIN32_WINNT=0x0503 -DWINVER=0x0503 -D_WIN32_IE=0x0600 -D_CRT_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -D_SECURE_SCL=0 /D_MBCS /D_WINDOWS -D__SSE__ -D__SSE2__ -FC -MD -nologo -Z7     -O2 -GS- -O1 -MD -Zi -DNDEBUG -GL -fp:precise -DWIN32 -D_CONSOLE -DNO_STRICT -DWIN64 -DCONSERVATIVE -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE  -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO',
    optimize='-O1 -MD -Zi -DNDEBUG -GL -fp:precise',
    cppflags='-DWIN32'
    ccversion='18.00.31101', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='__int64', ivsize=8, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-incremental:NO -map -nologo -debug -largeaddressaware -errorReport:prompt -dynamicbase:no'
    libpth=\lib
    libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
    perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl520.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -incremental:NO -map -nologo -debug -largeaddressaware -errorReport:prompt -dynamicbase:no'



@INC for perl 5.20.2:
    //mathworks/hub/win64/apps/bat/apps/perl/perl-5.20.2-mw-016/site/lib
    //mathworks/hub/win64/apps/bat/apps/perl/perl-5.20.2-mw-016/lib
    .


Environment for perl 5.20.2:
    HOME=C:\emacs243
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=C:\Perl64\site\bin;C:\Perl64\bin;C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\MathWorks\DevelTools\bin;C:\Program Files\Perforce;H:\mytools;C:\windows\System32\WindowsPowerShell\v1.0\;C:\windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\Perforce\DVCS\;;C:\Strawberry-perl-5.20.2.1\c\bin;C:\Strawberry-perl-5.20.2.1\perl\site\bin;C:\Strawberry-perl-5.20.2.1\perl\bin
    PERL_BADLANG (unset)
    SHELL (unset)

@p5pRT
Author

p5pRT commented Feb 3, 2017

From @jkeenan


I am attaching a file downloaded from perlmonks with the OP's source code and running results.

Thank you very much.
Jim Keenan

--
James E Keenan (jkeenan@cpan.org)

@p5pRT
Author

p5pRT commented Feb 3, 2017

From @jkeenan

1180341.pl

# -*- perl -*-
# badperl   25-Jan-2017 11:04 gweekly

# This script attempts to expose the bug that if a utf8 variable is
# used as the key to delete a hash entry (that doesn't already exist)
# a non-deletable key comes into existence, in violation of the
# documentation for delete that says "Setting a hash element to the
# undefined value does not remove its key, but deleting it does..."
# It also shows that something is definitely wrong, as the hash can
# end up with two identical keys.

# The DATA area shows the results for running this script, making it
# evident the bug was introduced in perl-5.18 and has persisted since.

use strict;
use warnings;
use Devel::Peek; # Dump
use Encode;      # decode

print "Running Perl $]\n";

# Pre-condition
if ( exists $ENV{SAM} ) {
    print "Note: removing SAM=$ENV{SAM} from the environment\n";
    delete $ENV{SAM};
}

my $utf8 = Encode::decode('utf8','SAM');

my $fixed = substr $utf8, 0;

# Confirm $utf8 and $fixed both are eq 'SAM';
die 'utf8 is not SAM'  if $utf8 ne 'SAM';
die 'fixed is not SAM' if $fixed ne 'SAM';
die 'fixed is not utf8' if $fixed ne $utf8;

print "Deleting ENV{\$utf8} where utf8 eq SAM\n";
delete $ENV{$utf8}; # Here badness happens

if ( defined $ENV{SAM} ) {
    die "WRONG: ENV{SAM} is defined: '$ENV{SAM}'\n";
}
if ( defined $ENV{$utf8} ) {
    die "WRONG: ENV{\$utf8} is defined: '$ENV{$utf8}'\n";
}

if ( exists $ENV{SAM} ) {
    die "WRONG: ENV{SAM} exists\n";
}
print exists $ENV{$utf8}  ? "WRONG: ENV{\$utf8} exists\n"
    :                       "OKAY: ENV{\$utf8} does not exist\n";

if ( my @sams = grep {$_ eq 'SAM'} keys %ENV ) {
    if ( @sams > 1 ) {
        die "Surprise: ENV has ". @sams ." SAM keys: @sams\n";
    }
    else {
        print "WRONG: ENV has the key 'SAM' - @sams\n";
    }
    if ( exists $ENV{$fixed} ) {
        die "Surprise: \$ENV{\$fixed} DOES exist\n";
    }
}

print "Now, assign a new value:\n";
$ENV{$utf8}='newVal';

if ( ! defined $ENV{SAM} ) {
    die " ENV{SAM} is not defined\n";
}
print defined $ENV{$utf8} ? "OKAY: ENV{\$utf8} is defined: '$ENV{$utf8}'\n"
    :                       "WRONG: ENV{\$utf8} is not defined\n";

if ( ! exists $ENV{SAM} ) {
    die " ENV{SAM} does not exist\n";
}

print exists $ENV{$utf8}  ? "OKAY: ENV{\$utf8} exists\n"
    :                       "WRONG: ENV{\$utf8} does not exist\n";

my @sams = grep {$_ eq 'SAM'} keys %ENV
    or die "No SAM keys?";

if ( @sams > 1 ) {
    print "WRONG: ENV has ". @sams ." SAM keys: @sams\n";
    for ( @sams ) {
        Dump $_;
        my $ans = $ENV{$_} || '<undef>';
        print "  Value = $ans\n";
    }
}

if ( ! exists $ENV{$fixed} ) {
    die "Surprise: \$ENV{\$fixed} does NOT exist\n";
}

print "Now, delete the entry\n";
delete $ENV{$fixed};

if ( exists $ENV{$fixed} ) {
    die "Surprise: \$ENV{\$fixed} does exist\n";
}

print "Done testing\n";

__END__

C:\Users\gweekly>c:\perl\bin\perl i:\bin\badperl
Running Perl 5.008008
Deleting ENV{$utf8} where utf8 eq SAM
OKAY: ENV{$utf8} does not exist
Now, assign a new value:
OKAY: ENV{$utf8} is defined: 'newVal'
OKAY: ENV{$utf8} exists
Now, delete the entry
Done testing

C:\Users\gweekly>c:\ActivePerl-5.14.2\bin\perl i:\bin\badperl
Running Perl 5.014002
Deleting ENV{$utf8} where utf8 eq SAM
OKAY: ENV{$utf8} does not exist
Now, assign a new value:
OKAY: ENV{$utf8} is defined: 'newVal'
OKAY: ENV{$utf8} exists
Now, delete the entry
Done testing

C:\Users\gweekly>c:\ActivePerl5.16.3\bin\perl i:\bin\badperl
Running Perl 5.016003
Deleting ENV{$utf8} where utf8 eq SAM
OKAY: ENV{$utf8} does not exist
Now, assign a new value:
OKAY: ENV{$utf8} is defined: 'newVal'
OKAY: ENV{$utf8} exists
Now, delete the entry
Done testing

C:\Users\gweekly>c:\ActivePerl5.18.1\bin\perl i:\bin\badperl
Running Perl 5.018001
Deleting ENV{$utf8} where utf8 eq SAM
WRONG: ENV{$utf8} exists
WRONG: ENV has the key 'SAM' - SAM
Now, assign a new value:
OKAY: ENV{$utf8} is defined: 'newVal'
OKAY: ENV{$utf8} exists
WRONG: ENV has 2 SAM keys: SAM SAM
SV = PV(0x2890b2c) at 0x5015b4
  REFCNT = 2
  FLAGS = (POK,IsCOW,pPOK,UTF8)
  PV = 0x288ea78 "SAM" [UTF8 "SAM"]
  CUR = 3
  LEN = 0
  Value = newVal
SV = PV(0x2890b24) at 0x5015cc
  REFCNT = 2
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x288e988 "SAM"
  CUR = 3
  LEN = 0
  Value = newVal
Now, delete the entry
Done testing

C:\Users\gweekly>\\mathworks\hub\win64\apps\bat\perl\latest520\bin\perl i:\bin\badperl
Running Perl 5.020002
Deleting ENV{$utf8} where utf8 eq SAM
WRONG: ENV{$utf8} exists
WRONG: ENV has the key 'SAM' - SAM
Now, assign a new value:
OKAY: ENV{$utf8} is defined: 'newVal'
OKAY: ENV{$utf8} exists
WRONG: ENV has 2 SAM keys: SAM SAM
SV = PV(0x23556c8) at 0x3f1970
  REFCNT = 2
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x2361e78 "SAM"
  CUR = 3
  LEN = 0
  Value = newVal
SV = PV(0x23556b8) at 0x3f1910
  REFCNT = 2
  FLAGS = (POK,IsCOW,pPOK,UTF8)
  PV = 0x2361f68 "SAM" [UTF8 "SAM"]
  CUR = 3
  LEN = 0
  Value = newVal
Now, delete the entry
Done testing

C:\Users\gweekly>c:\Strawberry-perl-5.20.2.1\perl\bin\perl i:\bin\badperl
Running Perl 5.020002
Deleting ENV{$utf8} where utf8 eq SAM
WRONG: ENV{$utf8} exists
WRONG: ENV has the key 'SAM' - SAM
Now, assign a new value:
OKAY: ENV{$utf8} is defined: 'newVal'
OKAY: ENV{$utf8} exists
WRONG: ENV has 2 SAM keys: SAM SAM
SV = PV(0x25562b8) at 0x5dd698
  REFCNT = 2
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x2527e58 "SAM"
  CUR = 3
  LEN = 0
  Value = newVal
SV = PV(0x25562a8) at 0x5dd6c8
  REFCNT = 2
  FLAGS = (POK,IsCOW,pPOK,UTF8)
  PV = 0x2527db8 "SAM" [UTF8 "SAM"]
  CUR = 3
  LEN = 0
  Value = newVal
Now, delete the entry
Done testing

C:\Users\gweekly>c:\perl64\bin\perl i:\bin\badperl
Running Perl 5.024001
Deleting ENV{$utf8} where utf8 eq SAM
WRONG: ENV{$utf8} exists
WRONG: ENV has the key 'SAM' - SAM
Now, assign a new value:
OKAY: ENV{$utf8} is defined: 'newVal'
OKAY: ENV{$utf8} exists
WRONG: ENV has 2 SAM keys: SAM SAM
SV = PV(0x3ed3d8) at 0x30f488
  REFCNT = 2
  FLAGS = (POK,IsCOW,pPOK,UTF8)
  PV = 0x30b8b8 "SAM" [UTF8 "SAM"]
  CUR = 3
  LEN = 0
  Value = newVal
SV = PV(0x3ed3e8) at 0x30f470
  REFCNT = 2
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x23bc458 "SAM"
  CUR = 3
  LEN = 0
  Value = newVal
Now, delete the entry
Done testing
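
The Devel::Peek dumps above differ only in the UTF8 flag of the two 'SAM' keys. A lighter-weight check, sketched here with a hypothetical helper keys_named (not from the original script), lists every key equal to a given name together with its flag, using the core utf8::is_utf8 function.

```perl
use strict;
use warnings;

# Hypothetical helper: return one record per key in %$hash that is eq $name,
# noting whether that key carries the internal UTF8 flag.
sub keys_named {
    my ($hash, $name) = @_;
    return map  { +{ key => $_, utf8 => utf8::is_utf8($_) ? 1 : 0 } }
           grep { $_ eq $name } keys %$hash;
}

# Usage: print whatever keys named 'SAM' are currently visible in %ENV.
# On the affected builds shown above, two records would be expected after
# the script's assignment step.
for my $rec ( keys_named( \%ENV, 'SAM' ) ) {
    print "key=$rec->{key} utf8=$rec->{utf8}\n";
}
```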

@p5pRT
Author

p5pRT commented Feb 3, 2017

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Feb 13, 2017

From @tonycoz

On Tue, 31 Jan 2017 11:38:34 -0800, Gordon.Weekly@mathworks.com wrote:

There is a BUG in the Windows handling of the %ENV hash since perl-5.18
Specifically if a UTF8 variable is used to delete an entry
an aberrant entry appears in the hash.

The problem derives from the way Perl itself handles the built-in %ENV hash

This bisects down to:

0ddecb9 is the first bad commit
commit 0ddecb9
Author: Ruslan Zakirov <ruz@bestpractical.com>
Date: Sat Oct 6 02:30:18 2012 +0400

  there is no obvious reason not to set flags

  I don't see any reason not to set flags properly in this
  branch. It doesn't look like any useful optimization.

  It's probably even a bug, but probably it can only be hit from
  a XS code. To hit the bug keysv should be provided, be UTF8
  and not SvIsCOW_shared_hash, but with flags containing
  HVhek_KEYCANONICAL.

Reverting that fixes the problem; I haven't tried to track down exactly why yet.

Tony
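
Since the report and the commit message both point at keys carrying the UTF8 flag, one possible stopgap for affected scripts is to clear the flag before the key reaches %ENV. The sketch below is an assumption: it was not proposed in this thread and has not been verified on an affected Windows build. env_key is a hypothetical helper name and GR_DEMO_VAR is a made-up variable name.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $key with the UTF8 flag cleared
# when the string can be represented as bytes; otherwise return it unchanged.
sub env_key {
    my ($key) = @_;
    my $copy = $key;
    utf8::downgrade($copy, 1);    # second argument: do not die on wide characters
    return $copy;
}

my $name = 'GR_DEMO_VAR';
utf8::upgrade($name);             # simulate a key that arrived with the UTF8 flag

delete $ENV{ env_key($name) };    # the key passed to %ENV is a plain byte string
print exists $ENV{ env_key($name) } ? "still present\n" : "not present\n";
```

The helper works on a copy so the caller's variable keeps whatever flag it had; only the value handed to %ENV is changed.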

@p5pRT
Author

p5pRT commented Mar 12, 2018

From @khwilliamson


Tony, Any reason not to proceed with the reverting?
--
Karl Williamson

@p5pRT
Author

p5pRT commented Mar 13, 2018

From @tonycoz

On Mon, Mar 12, 2018 at 01:12:45PM -0700, Karl Williamson via RT wrote:

Tony, Any reason not to proceed with the reverting?

Because I don't know the cause of the problem, or if the original
patch actually fixed a problem (it has no tests.)

Tony

@p5pRT
Author

p5pRT commented Mar 9, 2019

From @khwilliamson

On Mon, 12 Mar 2018 17:25:50 -0700, tonyc wrote:

Because I don't know the cause of the problem, or if the original
patch actually fixed a problem (it has no tests.)

Tony

How about we revert this early in 5.31, and see what happens?
--
Karl Williamson

@toddr
Member

toddr commented Feb 14, 2020

@khwilliamson did this get reverted?

@khwilliamson
Contributor

I believe it fell through the cracks

@khwilliamson
Contributor

Should there be a 5.33.1 milestone that applies to this?

@toddr added this to the 5.33.1 milestone on Feb 14, 2020

@toddr
Member

toddr commented Feb 14, 2020

Should there be a 5.33.1 milestone that applies to this?

That's a really good idea. Done!

@xsawyerx added the "do not merge" label on Jun 20, 2020
khwilliamson added a commit that referenced this issue on Jul 30, 2020
@xenu added the "Unicode and System Calls" label on Oct 20, 2021
@hvds removed this from the 5.33.1 milestone on Mar 25, 2022
6 participants