Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5.20 regression: '"X" !~ /[x]/i', when pattern is UTF-8 #14051

Closed
p5pRT opened this issue Aug 30, 2014 · 16 comments
Closed

5.20 regression: '"X" !~ /[x]/i', when pattern is UTF-8 #14051

p5pRT opened this issue Aug 30, 2014 · 16 comments

Comments

@p5pRT
Copy link

p5pRT commented Aug 30, 2014

Migrated from rt.perl.org#122655 (status was 'resolved')

Searchable as RT122655$

@p5pRT
Copy link
Author

p5pRT commented Aug 30, 2014

From @khwilliamson

This is a bug report for perl from khw@​khw.(none),
generated with the help of perlbug 1.40 running under perl 5.21.4.


A regression was introduced by this​:
commit 0ba8fae
Author​: Karl Williamson <public@​khwilliamson.com>
Date​: Sat Feb 15 14​:45​:03 2014 -0700

  Convert more EXACTFish nodes to EXACT when possible

  Under /i matching, many characters match only themselves, such a
  punctuation. If a node contains only such characters it can be an
EXACT
  node. The optimizer gets better hints when dealing with EXACT nodes
  than ones with folding.

  This changes the alloc_maybe_populate() function to look for
  possibilities of non-folding input.

A patch is smoking.



Flags​:
  category=core
  severity=high


Site configuration information for perl 5.21.4​:

Configured by khw at Sat Aug 30 06​:56​:50 MDT 2014.

Summary of my perl5 (revision 5 version 21 subversion 4) configuration​:
  Commit id​: ef0f35a
  Platform​:
  osname=linux, osvers=3.13.0-35-generic,
archname=x86_64-linux-thread-multi-ld
  uname='linux khw 3.13.0-35-generic #62-ubuntu smp fri aug 15
01​:58​:42 utc 2014 x86_64 x86_64 x86_64 gnulinux '
  config_args='-des -Uversiononly -Dprefix=/home/khw/blead -Dusedevel
-D'optimize=-g' -A'optimize=-g' -A'optimize=-O0'
-Accflags='-DPERL_BOOL_AS_CHAR' -Dman1dir=none -Dman3dir=none
-DDEBUGGING -DDEBUGGING -Dcc=g++ -Dusemorebits -Dusethreads'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  use64bitint=define, use64bitall=define, uselongdouble=define
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='g++', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_BOOL_AS_CHAR
-fwrapv -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
-D_FORTIFY_SOURCE=2',
  optimize='-g -g -O0',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_BOOL_AS_CHAR -fwrapv
-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include'
  ccversion='', gccversion='4.8.2', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define,
longdblsize=16, longdblkind=3
  ivtype='long', ivsize=8, nvtype='long double', nvsize=16,
Off_t='off_t', lseeksize=8
  alignbytes=16, prototype=define
  Linker and Libraries​:
  ld='g++', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/include/c++/4.8 /usr/include/x86_64-linux-gnu/c++/4.8
/usr/include/c++/4.8/backward /usr/local/lib
/usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed
/usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib
/usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
  libs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
  libc=libc-2.19.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.19'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -g -g -O0 -L/usr/local/lib
-fstack-protector'


@​INC for perl 5.21.4​:
  /home/khw/perl/blead/lib
  /home/khw/blead/lib/perl5/site_perl/5.21.4/x86_64-linux-thread-multi-ld
  /home/khw/blead/lib/perl5/site_perl/5.21.4
  /home/khw/blead/lib/perl5/5.21.4/x86_64-linux-thread-multi-ld
  /home/khw/blead/lib/perl5/5.21.4
  /home/khw/blead/lib/perl5/site_perl/5.21.3
  /home/khw/blead/lib/perl5/site_perl/5.21.2
  /home/khw/blead/lib/perl5/site_perl/5.21.1
  /home/khw/blead/lib/perl5/site_perl/5.20.0
  /home/khw/blead/lib/perl5/site_perl/5.19.12
  /home/khw/blead/lib/perl5/site_perl/5.19.11
  /home/khw/blead/lib/perl5/site_perl/5.19.10
  /home/khw/blead/lib/perl5/site_perl
  .


Environment for perl 5.21.4​:
  HOME=/home/khw
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)

PATH=/home/khw/bin​:/home/khw/perl5/perlbrew/bin​:/home/khw/print/bin​:/bin​:/usr/local/sbin​:/usr/local/bin​:/usr/sbin​:/usr/bin​:/sbin​:/usr/games​:/usr/local/games​:/home/khw/iands/www​:/home/khw/cxoffice/bin
  PERL5OPT=-w
  PERL_BADLANG (unset)
  PERL_POD_PEDANTIC=1
  SHELL=/bin/ksh

@p5pRT
Copy link
Author

p5pRT commented Aug 30, 2014

From @khwilliamson

I stumbled across this problem today. Since this is a 5.20 regression, it should go into a 5.20 maint release. Is it worth delaying 5.20.1 for?

The patch commitdiff is
http​://perl5.git.perl.org/perl.git/commitdiff/d2f82fd903e8eb20371a55c435f76f2365cd5202

The problem is I should have been testing for the characters being part of a fold pair, instead of the fold of one being different from itself. The latter test only works on uppercase input.
--
Karl Williamson

@p5pRT
Copy link
Author

p5pRT commented Aug 30, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Aug 30, 2014

From @jkeenan

On Sat Aug 30 10​:34​:24 2014, khw wrote​:

I stumbled across this problem today. Since this is a 5.20
regression, it should go into a 5.20 maint release. Is it worth
delaying 5.20.1 for?

Regression confirmed​:

#####
$ cat 122655-regex.pl
use Test​::More qw(no_plan);
my $x = qr/[x]/i;
utf8​::upgrade($x);
like("X", qr/$x/, "UTF-8 of /[x]/i matches upper case");
#####
$ perl -v | head -2 | tail -1
This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux

$ perl 122655-regex.pl
ok 1 - UTF-8 of /[x]/i matches upper case
1..1
#####
$ perl -v | head -2 | tail -1
This is perl 5, version 20, subversion 0 (v5.20.0) built for x86_64-linux

$ perl 122655-regex.pl
not ok 1 - UTF-8 of /[x]/i matches upper case
# Failed test 'UTF-8 of /[x]/i matches upper case'
# at 122655-regex.pl line 4.
# 'X'
# doesn't match '(?^u​:(?^i​:[x]))'
1..1
# Looks like you failed 1 test of 1.
#####

@p5pRT
Copy link
Author

p5pRT commented Aug 30, 2014

From @jkeenan

On Sat Aug 30 10​:34​:24 2014, khw wrote​:

I stumbled across this problem today. Since this is a 5.20
regression, it should go into a 5.20 maint release. Is it worth
delaying 5.20.1 for?

Since, as I understand it, the main purpose of maintenance releases is to correct regressions that have crept in to a major release, I vote Yes on delaying 5.20.1 to get this in. (Of course, I'm not doing the release, so I don't know how much this burdens the Release Manager.)

Thank you very much.
Jim Keenan

@p5pRT
Copy link
Author

p5pRT commented Aug 31, 2014

From @jkeenan

On 8/30/14 3​:28 PM, James E Keenan via RT wrote​:

On Sat Aug 30 10​:34​:24 2014, khw wrote​:

I stumbled across this problem today. Since this is a 5.20
regression, it should go into a 5.20 maint release. Is it worth
delaying 5.20.1 for?

Since, as I understand it, the main purpose of maintenance releases is to correct regressions that have crept in to a major release, I vote Yes on delaying 5.20.1 to get this in. (Of course, I'm not doing the release, so I don't know how much this burdens the Release Manager.)

I checked out the maint-5.20 branch, then checked out a new branch on
top of that in which I applied khw's patch from earlier in this RT. All
tests PASSed.

I'm attaching the patch as it would apply to maint-5.20 branch. Since
both files to be patched have changed considerably in blead since
5.20.0, the patch was very manual. It may be useful for maint-5.20, but
double-check my handiwork.

Thank you very much.
Jim Keenan

@p5pRT
Copy link
Author

p5pRT commented Aug 31, 2014

From @jkeenan

122655-Correct-5.20-regression-X-x-i.maint.5.20.patch
From 1081e0b54d547a5e2a62087689194e5c5f005876 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Sat, 30 Aug 2014 16:08:58 -0400
Subject: [PATCH] Correct  5.20 regression: '"X" !~ /[x]/i'

This problem occurs only when the pattern is UTF-8, contains a single ASCII
lowercase letter.  It does not match its uppercase counterpart.

For RT #122655
---
 regcomp.c  |   14 ++++++++++----
 t/re/pat.t |    8 +++++++-
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/regcomp.c b/regcomp.c
index 8d4ebda..e991999 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -10976,10 +10976,16 @@ S_alloc_maybe_populate_EXACT(pTHX_ RExC_state_t *pRExC_state,
                           EBCDIC, but it works there, as the extra invariants
                           fold to themselves) */
                     *character = toFOLD((U8) code_point);
-                    if (downgradable
-                        && *character == code_point
-                        && ! HAS_NONLATIN1_FOLD_CLOSURE(code_point))
-                    {
+
+                    /* We can downgrade to an EXACT node if this character
+                     * isn't a folding one.  Note that this assumes that
+                     * nothing above Latin1 folds to some other invariant than
+                     * one of these alphabetics; otherwise we would also have
+                     * to check:
+                     *  && (! HAS_NONLATIN1_FOLD_CLOSURE(code_point)
+                     *      || ASCII_FOLD_RESTRICTED))
+                     */
+                    if (downgradable && PL_fold[code_point] == code_point) {
                         OP(node) = EXACT;
                     }
                 }
diff --git a/t/re/pat.t b/t/re/pat.t
index 04f8b84..51838f9 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@@ -20,7 +20,7 @@ BEGIN {
     require './test.pl';
 }
 
-plan tests => 721;  # Update this when adding/deleting tests.
+plan tests => 722;  # Update this when adding/deleting tests.
 
 run_tests() unless caller;
 
@@ -1582,6 +1582,12 @@ EOP
 
 
 
+    {   # Was getting optimized into EXACT (non-folding node)
+        my $x = qr/[x]/i;
+        utf8::upgrade($x);
+        like("X", qr/$x/, "UTF-8 of /[x]/i matches upper case");
+    }
+
 } # End of sub run_tests
 
 1;
-- 
1.6.3.2

@p5pRT
Copy link
Author

p5pRT commented Aug 31, 2014

From @cpansprout

On Sat Aug 30 17​:26​:16 2014, jkeen@​verizon.net wrote​:

On 8/30/14 3​:28 PM, James E Keenan via RT wrote​:

On Sat Aug 30 10​:34​:24 2014, khw wrote​:

I stumbled across this problem today. Since this is a 5.20
regression, it should go into a 5.20 maint release. Is it worth
delaying 5.20.1 for?

Since, as I understand it, the main purpose of maintenance releases
is to correct regressions that have crept in to a major release, I
vote Yes on delaying 5.20.1 to get this in. (Of course, I'm not
doing the release, so I don't know how much this burdens the Release
Manager.)

I checked out the maint-5.20 branch, then checked out a new branch on
top of that in which I applied khw's patch from earlier in this RT.
All
tests PASSed.

I'm attaching the patch as it would apply to maint-5.20 branch. Since
both files to be patched have changed considerably in blead since
5.20.0, the patch was very manual. It may be useful for maint-5.20,
but
double-check my handiwork.

I would have to leave that to someone who knows the regexp engine better.

But backporting gets my +1.

(Whether it should delay 5.20.1 or go into 5.20.2 I decline to opine [hey, that rhymes!].)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 1, 2014

From @khwilliamson

Now in blead as b6e093f
--
Karl Williamson

@p5pRT
Copy link
Author

p5pRT commented Sep 1, 2014

@khwilliamson - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented Sep 1, 2014

From @khwilliamson

On Sat Aug 30 17​:26​:16 2014, jkeen@​verizon.net wrote​:

On 8/30/14 3​:28 PM, James E Keenan via RT wrote​:

On Sat Aug 30 10​:34​:24 2014, khw wrote​:

I stumbled across this problem today. Since this is a 5.20
regression, it should go into a 5.20 maint release. Is it worth
delaying 5.20.1 for?

Since, as I understand it, the main purpose of maintenance releases
is to correct regressions that have crept in to a major release, I
vote Yes on delaying 5.20.1 to get this in. (Of course, I'm not
doing the release, so I don't know how much this burdens the Release
Manager.)

I checked out the maint-5.20 branch, then checked out a new branch on
top of that in which I applied khw's patch from earlier in this RT.
All
tests PASSed.

I'm attaching the patch as it would apply to maint-5.20 branch. Since
both files to be patched have changed considerably in blead since
5.20.0, the patch was very manual. It may be useful for maint-5.20,
but
double-check my handiwork.

Thank you very much.
Jim Keenan

Your maint patch looks good to me
--
Karl Williamson

@p5pRT
Copy link
Author

p5pRT commented Sep 1, 2014

From @jkeenan

On Mon Sep 01 08​:04​:18 2014, khw wrote​:

On Sat Aug 30 17​:26​:16 2014, jkeen@​verizon.net wrote​:

On 8/30/14 3​:28 PM, James E Keenan via RT wrote​:

On Sat Aug 30 10​:34​:24 2014, khw wrote​:

I stumbled across this problem today. Since this is a 5.20
regression, it should go into a 5.20 maint release. Is it worth
delaying 5.20.1 for?

Since, as I understand it, the main purpose of maintenance releases
is to correct regressions that have crept in to a major release, I
vote Yes on delaying 5.20.1 to get this in. (Of course, I'm not
doing the release, so I don't know how much this burdens the Release
Manager.)

I checked out the maint-5.20 branch, then checked out a new branch on
top of that in which I applied khw's patch from earlier in this RT.
All
tests PASSed.

I'm attaching the patch as it would apply to maint-5.20 branch. Since
both files to be patched have changed considerably in blead since
5.20.0, the patch was very manual. It may be useful for maint-5.20,
but
double-check my handiwork.

Thank you very much.
Jim Keenan

Your maint patch looks good to me

I posted a request for it to go into 5.20.1 in one of the relevant perl5-porters mailing list threads.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Sep 2, 2014

From @steve-m-hay

On 1 Sep 2014 20​:59, "James E Keenan via RT" <perlbug-followup@​perl.org>
wrote​:

On Mon Sep 01 08​:04​:18 2014, khw wrote​:

On Sat Aug 30 17​:26​:16 2014, jkeen@​verizon.net wrote​:

On 8/30/14 3​:28 PM, James E Keenan via RT wrote​:

On Sat Aug 30 10​:34​:24 2014, khw wrote​:

I stumbled across this problem today. Since this is a 5.20
regression, it should go into a 5.20 maint release. Is it worth
delaying 5.20.1 for?

Since, as I understand it, the main purpose of maintenance releases
is to correct regressions that have crept in to a major release, I
vote Yes on delaying 5.20.1 to get this in. (Of course, I'm not
doing the release, so I don't know how much this burdens the Release
Manager.)

I checked out the maint-5.20 branch, then checked out a new branch on
top of that in which I applied khw's patch from earlier in this RT.
All
tests PASSed.

I'm attaching the patch as it would apply to maint-5.20 branch. Since
both files to be patched have changed considerably in blead since
5.20.0, the patch was very manual. It may be useful for maint-5.20,
but
double-check my handiwork.

Thank you very much.
Jim Keenan

Your maint patch looks good to me

I posted a request for it to go into 5.20.1 in one of the relevant
perl5-porters mailing list threads.

Thanks guys. I will roll out an RC2 with this when I return from vacation
on Saturday.

@p5pRT
Copy link
Author

p5pRT commented Sep 7, 2014

From @steve-m-hay

On Mon Sep 01 23​:13​:19 2014, shay wrote​:

Thanks guys. I will roll out an RC2 with this when I return from vacation
on Saturday.

One day late, but now applied in commit db6d387.

@p5pRT
Copy link
Author

p5pRT commented Jun 2, 2015

From @khwilliamson

Thanks for submitting this ticket

The issue should be resolved with the release today of Perl v5.22. If you find that the problem persists, feel free to reopen this ticket

--
Karl Williamson for the Perl 5 porters team

@p5pRT
Copy link
Author

p5pRT commented Jun 2, 2015

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant