Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent use of characters above \xFF in bitwise string ops #17022

Closed
p5pRT opened this issue May 25, 2019 · 7 comments
Closed

Prevent use of characters above \xFF in bitwise string ops #17022

p5pRT opened this issue May 25, 2019 · 7 comments
Milestone

Comments

@p5pRT
Copy link

p5pRT commented May 25, 2019

Migrated from rt.perl.org#134140 (status was 'pending release')

Searchable as RT134140$

@p5pRT
Copy link
Author

p5pRT commented May 25, 2019

From @jkeenan

pod/perldeprecation.pod contains this entry for perl-5.32​:

#####
Use of code points over 0xFF in string bitwise operators

The string bitwise operators, "&", "|", "^", and "~", treat
their operands as strings of bytes. As such, values above
0xFF are nonsensical. Some instances of these have been
deprecated since Perl 5.24, and were made fatal in 5.28,
but it turns out that in cases where the wide characters
did not affect the end result, no deprecation notice was
raised, and so remain legal. Now, all occurrences either
are fatal or raise a deprecation warning, so that the
remaining legal occurrences will be fatal in 5.32.

An example of this is

"" & "\x{100}"

The wide character is not used in the "&" operation because
the left operand is shorter. This now warns anyway.
#####

This entry was made in the following commit​:

#####
commit ba52ce1 Author​:
Karl Williamson <khw@​cpan.org> Date​: Tue Dec 19 16​:03​:39
2017 -0700

Deprecate above \xFF in bitwise string ops

This is already a fatal error for operations whose outcome
depends on them, but in things like

"abc" & "def\x{100}"

the wide character doesn't actually need to participate in
the AND, and so perl doesn't. As a result of the discussion
in the thread beginning with
http​://nntp.perl.org/group/perl.perl5.porters/244884, it
was decided to deprecate these ones too.
#####

Make it so.

@p5pRT
Copy link
Author

p5pRT commented May 25, 2019

From @jkeenan

Summary of my perl5 (revision 5 version 31 subversion 0) configuration​:
  Commit id​: 58f4626
  Platform​:
  osname=linux
  osvers=4.15.0-50-generic
  archname=x86_64-linux
  uname='linux zareason 4.15.0-50-generic #54-ubuntu smp mon may 6 18​:46​:08 utc 2019 x86_64 x86_64 x86_64 gnulinux '
  config_args='-des -Dusedevel -Dusemymalloc=y'
  hint=recommended
  useposix=true
  d_sigaction=define
  useithreads=undef
  usemultiplicity=undef
  use64bitint=define
  use64bitall=define
  uselongdouble=undef
  usemymalloc=y
  default_inc_excludes_dot=define
  bincompat5005=undef
  Compiler​:
  cc='cc'
  ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
  optimize='-O2'
  cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
  ccversion=''
  gccversion='7.4.0'
  gccosandvers=''
  intsize=4
  longsize=8
  ptrsize=8
  doublesize=8
  byteorder=12345678
  doublekind=3
  d_longlong=define
  longlongsize=8
  d_longdbl=define
  longdblsize=16
  longdblkind=3
  ivtype='long'
  ivsize=8
  nvtype='double'
  nvsize=8
  Off_t='off_t'
  lseeksize=8
  alignbytes=8
  prototype=define
  Linker and Libraries​:
  ld='cc'
  ldflags =' -fstack-protector-strong -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /lib64 /usr/lib64
  libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
  perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
  libc=libc-2.27.so
  so=so
  useshrplib=false
  libperl=libperl.a
  gnulibc_version='2.27'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs
  dlext=so
  d_dlsymun=undef
  ccdlflags='-Wl,-E'
  cccdlflags='-fPIC'
  lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'

Characteristics of this binary (from libperl)​:
  Compile-time options​:
  HAS_TIMES
  MYMALLOC
  PERLIO_LAYERS
  PERL_COPY_ON_WRITE
  PERL_DONT_CREATE_GVSV
  PERL_MALLOC_WRAP
  PERL_OP_PARENT
  PERL_PRESERVE_IVUV
  PERL_USE_DEVEL
  USE_64_BIT_ALL
  USE_64_BIT_INT
  USE_LARGE_FILES
  USE_LOCALE
  USE_LOCALE_COLLATE
  USE_LOCALE_CTYPE
  USE_LOCALE_NUMERIC
  USE_LOCALE_TIME
  USE_PERLIO
  USE_PERL_ATOF
  Built under linux
  Compiled at May 22 2019 14​:58​:04
  %ENV​:
  PERL2DIR="/home/jkeenan/gitwork/perl2"
  PERLBREW_HOME="/home/jkeenan/.perlbrew"
  PERLBREW_MANPATH="/home/jkeenan/perl5/perlbrew/perls/perl-5.28.0/man"
  PERLBREW_PATH="/home/jkeenan/perl5/perlbrew/bin​:/home/jkeenan/perl5/perlbrew/perls/perl-5.28.0/bin"
  PERLBREW_PERL="perl-5.28.0"
  PERLBREW_ROOT="/home/jkeenan/perl5/perlbrew"
  PERLBREW_SHELLRC_VERSION="0.84"
  PERLBREW_VERSION="0.84"
  PERL_WORKDIR="/home/jkeenan/gitwork/perl"
  @​INC​:
  lib
  /usr/local/lib/perl5/site_perl/5.31.0/x86_64-linux
  /usr/local/lib/perl5/site_perl/5.31.0
  /usr/local/lib/perl5/5.31.0/x86_64-linux
  /usr/local/lib/perl5/5.31.0

@p5pRT
Copy link
Author

p5pRT commented May 28, 2019

From @jkeenan

On Sat, 25 May 2019 02​:13​:08 GMT, jkeenan@​pobox.com wrote​:

pod/perldeprecation.pod contains this entry for perl-5.32​:

#####
Use of code points over 0xFF in string bitwise operators

The string bitwise operators, "&", "|", "^", and "~", treat
their operands as strings of bytes. As such, values above
0xFF are nonsensical. Some instances of these have been
deprecated since Perl 5.24, and were made fatal in 5.28,
but it turns out that in cases where the wide characters
did not affect the end result, no deprecation notice was
raised, and so remain legal. Now, all occurrences either
are fatal or raise a deprecation warning, so that the
remaining legal occurrences will be fatal in 5.32.

An example of this is

"" & "\x{100}"

The wide character is not used in the "&" operation because
the left operand is shorter. This now warns anyway.
#####

This entry was made in the following commit​:

#####
commit ba52ce1 Author​:
Karl Williamson <khw@​cpan.org> Date​: Tue Dec 19 16​:03​:39
2017 -0700

Deprecate above \xFF in bitwise string ops

This is already a fatal error for operations whose outcome
depends on them, but in things like

"abc" & "def\x{100}"

the wide character doesn't actually need to participate in
the AND, and so perl doesn't. As a result of the discussion
in the thread beginning with
http​://nntp.perl.org/group/perl.perl5.porters/244884, it
was decided to deprecate these ones too.
#####

Make it so.

Please review patch attached. Available for smoking in this branch​:

smoke-me/jkeenan/rt-134140-bitwise

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented May 28, 2019

From @jkeenan

0001-Use-of-code-points-over-0xFF-in-string-bitwise-opera.patch
From 9c60d4d1fc88f4c92b7dd6b09691e8ef5828109d Mon Sep 17 00:00:00 2001
From: James E Keenan <jkeenan@cpan.org>
Date: Mon, 27 May 2019 13:18:10 -0400
Subject: [PATCH] Use of code points over 0xFF in string bitwise operators

Implement complete fatalization.  Some instances of these were fatalized
in 5.28.  However, in cases where the wide characters did not affect the
end result, no deprecation notice was raised.  So they remained legal,
though deprecated.  Now, all occurrences are fatal (as of 5.32).

Modify source code in doop.c.  Adapt test file.  Update perldiag and
perldeprecation.

For:  RT 134140
---
 doop.c                  | 27 ++----------
 pod/perldeprecation.pod |  4 +-
 pod/perldiag.pod        |  2 +-
 t/op/bop.t              | 95 +++++++++++------------------------------
 4 files changed, 33 insertions(+), 95 deletions(-)

diff --git a/doop.c b/doop.c
index 54e35f10a6..c348afe292 100644
--- a/doop.c
+++ b/doop.c
@@ -1087,30 +1087,11 @@ Perl_do_vop(pTHX_ I32 optype, SV *sv, SV *left, SV *right)
      * on zeros without having to do it.  In the case of '&', the result is
      * zero, and the dangling portion is simply discarded.  For '|' and '^', the
      * result is the same as the other operand, so the dangling part is just
-     * appended to the final result, unchanged.  We currently accept above-FF
-     * code points in the dangling portion, as that's how it has long worked,
-     * and code depends on it staying that way.  But it is now fatal for
-     * above-FF to appear in the portion that does get operated on.  Hence, any
-     * above-FF must come only in the longer operand, and only in its dangling
-     * portion.  That means that at least one of the operands has to be
-     * entirely non-UTF-8, and the length of that operand has to be before the
-     * first above-FF in the other */
+     * appended to the final result, unchanged.  As of perl-5.32, we no longer
+     * accept above-FF code points in the dangling portion.
+     */
     if (left_utf8 || right_utf8) {
-        if (left_utf8) {
-            if (right_utf8 || rightlen > leftlen) {
-                Perl_croak(aTHX_ FATAL_ABOVE_FF_MSG, PL_op_desc[optype]);
-            }
-            len = rightlen;
-        }
-        else if (right_utf8) {
-            if (leftlen > rightlen) {
-                Perl_croak(aTHX_ FATAL_ABOVE_FF_MSG, PL_op_desc[optype]);
-            }
-            len = leftlen;
-        }
-
-        Perl_ck_warner_d(aTHX_ packWARN(WARN_DEPRECATED),
-                               DEPRECATED_ABOVE_FF_MSG, PL_op_desc[optype]);
+        Perl_croak(aTHX_ FATAL_ABOVE_FF_MSG, PL_op_desc[optype]);
     }
     else {  /* Neither is UTF-8 */
         len = MIN(leftlen, rightlen);
diff --git a/pod/perldeprecation.pod b/pod/perldeprecation.pod
index 5213eca229..5395e11ac4 100644
--- a/pod/perldeprecation.pod
+++ b/pod/perldeprecation.pod
@@ -65,14 +65,14 @@ nonsensical. Some instances of these have been deprecated since Perl
 the wide characters did not affect the end result, no deprecation
 notice was raised, and so remain legal.  Now, all occurrences either are
 fatal or raise a deprecation warning, so that the remaining legal
-occurrences will be fatal in 5.32.
+occurrences become fatal in 5.32.
 
 An example of this is
 
  "" & "\x{100}"
 
 The wide character is not used in the C<&> operation because the left
-operand is shorter.  This now warns anyway.
+operand is shorter.  This now throws an exception.
 
 =head3 hostname() doesn't accept any arguments
 
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index f69b1b8367..a7176c342e 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -7338,7 +7338,7 @@ C<~>) on a string containing a code point over 0xFF.  The string bitwise
 operators treat their operands as strings of bytes, and values beyond
 0xFF are nonsensical in this context.
 
-This became fatal in Perl 5.28.
+Certain instances became fatal in Perl 5.28; others in perl 5.32.
 
 =item Use of strings with code points over 0xFF as arguments to C<vec>
 is deprecated. This will be a fatal error in Perl 5.32
diff --git a/t/op/bop.t b/t/op/bop.t
index 411d253a7b..666dfb8114 100644
--- a/t/op/bop.t
+++ b/t/op/bop.t
@@ -18,7 +18,7 @@ BEGIN {
 # If you find tests are failing, please try adding names to tests to track
 # down where the failure is, and supply your new names as a patch.
 # (Just-in-time test naming)
-plan tests => 504;
+plan tests => 491;
 
 # numerics
 ok ((0xdead & 0xbeef) == 0x9ead);
@@ -613,73 +613,30 @@ foreach my $op_info ([and => "&"], [or => "|"], [xor => "^"]) {
 }
 
 {
-    # Since these are temporary, and it was a pain to make them into loops,
-    # the code is just rolled out.
-    local $SIG{__WARN__} = sub { push @warnings, @_; };
-
-    undef @warnings;
-    is("abc" & "abc\x{100}", "abc", '"abc" & "abc\x{100}" works');
-    if (! is(@warnings, 1, "... but returned a single warning")) {
-        diag join "\n", @warnings;
-    }
-    like ($warnings[0], qr /^Use of strings with code points over 0xFF as (?#
-                            )arguments to bitwise and \(&\) operator (?#
-                            )is deprecated/,
-                        "... which is the expected warning");
-    undef @warnings;
-    is("abc" | "abc\x{100}", "abc\x{100}", '"abc" | "abc\x{100}" works');
-    if (! is(@warnings, 1, "... but returned a single warning")) {
-        diag join "\n", @warnings;
-    }
-    like ($warnings[0], qr /^Use of strings with code points over 0xFF as (?#
-                            )arguments to bitwise or \(|\) operator (?#
-                            )is deprecated/,
-                        "... which is the expected warning");
-    undef @warnings;
-    is("abc" ^ "abc\x{100}", "\0\0\0\x{100}", '"abc" ^ "abc\x{100}" works');
-    if (! is(@warnings, 1, "... but returned a single warning")) {
-        diag join "\n", @warnings;
-    }
-    like ($warnings[0], qr /^Use of strings with code points over 0xFF as (?#
-                            )arguments to bitwise xor \(\^\) operator (?#
-                            )is deprecated/,
-                        "... which is the expected warning");
-    undef @warnings;
-    is("abc\x{100}" & "abc", "abc", '"abc\x{100}" & "abc" works');
-    if (! is(@warnings, 1, "... but returned a single warning")) {
-        diag join "\n", @warnings;
-    }
-    like ($warnings[0], qr /^Use of strings with code points over 0xFF as (?#
-                            )arguments to bitwise and \(&\) operator (?#
-                            )is deprecated/,
-                        "... which is the expected warning");
-    undef @warnings;
-    is("abc\x{100}" | "abc", "abc\x{100}", '"abc\x{100}" | "abc" works');
-    if (! is(@warnings, 1, "... but returned a single warning")) {
-        diag join "\n", @warnings;
-    }
-    like ($warnings[0], qr /^Use of strings with code points over 0xFF as (?#
-                            )arguments to bitwise or \(|\) operator (?#
-                            )is deprecated/,
-                        "... which is the expected warning");
-    undef @warnings;
-    is("abc\x{100}" ^ "abc", "\0\0\0\x{100}", '"abc\x{100}" ^ "abc" works');
-    if (! is(@warnings, 1, "... but returned a single warning")) {
-        diag join "\n", @warnings;
-    }
-    like ($warnings[0], qr /^Use of strings with code points over 0xFF as (?#
-                            )arguments to bitwise xor \(\^\) operator (?#
-                            )is deprecated/,
-                        "... which is the expected warning");
-    no warnings 'deprecated';
-    undef @warnings;
-    my $foo = "abc" & "abc\x{100}";
-    $foo = "abc" | "abc\x{100}";
-    $foo = "abc" ^ "abc\x{100}";
-    $foo = "abc\x{100}" & "abc";
-    $foo = "abc\x{100}" | "abc";
-    $foo = "abc\x{100}" ^ "abc";
-    if (! is(@warnings, 0, "... And none of the last 6 main tests warns when 'deprecated' is off")) {
-        diag join "\n", @warnings;
+    # RT 134140 fatalizations
+    my %op_pairs = (
+        and => { low => 'and', high => '&', regex => qr/&/  },
+        or  => { low => 'or',  high => '|', regex => qr/\|/ },
+        xor => { low => 'xor', high => '^', regex => qr/\^/ },
+    );
+    my @combos = (
+        { string  => '"abc" & "abc\x{100}"',  op_pair => $op_pairs{and} },
+        { string  => '"abc" | "abc\x{100}"',  op_pair => $op_pairs{or}  },
+        { string  => '"abc" ^ "abc\x{100}"',  op_pair => $op_pairs{xor} },
+        { string  => '"abc\x{100}" & "abc"',  op_pair => $op_pairs{and} },
+        { string  => '"abc\x{100}" | "abc"',  op_pair => $op_pairs{or}  },
+        { string  => '"abc\x{100}" ^ "abc"',  op_pair => $op_pairs{xor} },
+
+    );
+
+    # Use of strings with code points over 0xFF as arguments to %s operator is not allowed
+    for my $h (@combos) {
+        my $s1 = "Use of strings with code points over 0xFF as arguments to bitwise";
+        my $s2 = "operator is not allowed";
+        my $expected  = qr/$s1 $h->{op_pair}->{low} \($h->{op_pair}->{regex}\) $s2/;
+        my $description = "$s1 $h->{op_pair}->{low} ($h->{op_pair}->{high}) operator is not allowed";
+        local $@;
+        eval $h->{string};
+        like $@, $expected, $description;
     }
 }
-- 
2.17.1

@p5pRT
Copy link
Author

p5pRT commented May 28, 2019

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented May 31, 2019

From @khwilliamson

Thanks, applied as c8b94fe
--
Karl Williamson

@p5pRT
Copy link
Author

p5pRT commented May 31, 2019

@khwilliamson - Status changed from 'open' to 'pending release'

@p5pRT p5pRT closed this as completed May 31, 2019
@toddr toddr added this to the 5.32.0 milestone Oct 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants