[PATCH] regex with \p{IsAlpha} doesn't match properly #9720

Closed
p5pRT opened this issue Apr 23, 2009 · 7 comments

Comments

p5pRT commented Apr 23, 2009

Migrated from rt.perl.org#65054 (status was 'resolved')

Searchable as RT65054$

p5pRT commented Apr 23, 2009

From @ntyni

As reported by Srdjan <srdjan@catalyst.net.nz> in
<http://bugs.debian.org/230144>, this match currently
fails:

"abcd" =~ /x*\p{IsAlpha}/

while this supposedly equivalent match succeeds:

"abcd" =~ /x*[[:alpha:]]/

Reported at 5.8.2, verified with 5.8.8, 5.10.0 and current blead.
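
For anyone wanting to check a particular perl build, the two matches above can be wrapped in a small standalone script. This is only a sketch: the helper name `match_results` is made up for illustration, and no expected output is shown because the result depends on the perl version being tested.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the report): returns a hash of
# pattern label => 1/0 for the two patterns from the report.
sub match_results {
    my ($str) = @_;
    return (
        'x*\p{IsAlpha}' => ($str =~ /x*\p{IsAlpha}/ ? 1 : 0),
        'x*[[:alpha:]]' => ($str =~ /x*[[:alpha:]]/ ? 1 : 0),
    );
}

# Same subject string as in the report.
my %result = match_results("abcd");
for my $label (sort keys %result) {
    printf "%-15s %s\n", $label, $result{$label} ? "matches" : "fails";
}
```

On an affected perl the `\p{IsAlpha}` line would be expected to report a failure, per the description above.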

Output with -Dr:

  Compiling REx "x*\p{IsAlpha}"
  synthetic stclass "ANYOF[x+utf8::IsAlpha]".
  Final program:
  1: STAR (4)
  2: EXACT <x> (0)
  4: ANYOF[{unicode}+utf8::IsAlpha] (16)
  16: END (0)
  stclass ANYOF[x+utf8::IsAlpha] minlen 1
  Omitting $` $& $' support.

  EXECUTING...

  Matching REx "x*\p{IsAlpha}" against "a"
  Matching stclass ANYOF[x+utf8::IsAlpha] against "a" (1 chars)
  Contradicts stclass... [regexec_flags]
  Match failed
  Freeing REx: "x*\p{IsAlpha}"
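
The -Dr switch only works on a perl built with -DDEBUGGING. A comparable trace can usually be obtained on an ordinary build with the core `re` pragma. The script below is a minimal sketch of that approach; the trace goes to STDERR and its exact format varies between perl versions, so it will not necessarily match the output above line for line.

```perl
use strict;
use warnings;

# Lexically scoped: enables regex compile/execute tracing on STDERR
# for patterns compiled in this scope.
use re 'debug';

# Same pattern and subject string as in the report.
my $ok = ("abcd" =~ /x*\p{IsAlpha}/) ? 1 : 0;

print $ok ? "matched\n" : "match failed\n";
```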

It seems that the problem has to do with losing the UNICODE_ALL flag
inside S_study_chunk(). The second attachment is a naive patch that fixes
this for me without introducing other test failures. That doesn't mean
it's correct, of course, but I hope it helps.

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.10.0:

Configured by Debian Project at Mon Apr 13 05:15:55 UTC 2009.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.26-1-openvz-amd64, archname=x86_64-linux-gnu-thread-multi
    uname='linux minerva 2.6.26-1-openvz-amd64 #1 smp fri mar 13 19:02:24 utc 2009 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.0 -Dsitearch=/usr/local/lib/perl/5.10.0 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.0 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.3.3', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.9.so, so=so, useshrplib=true, libperl=libperl.so.5.10.0
    gnulibc_version='2.9'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib'

Locally applied patches:
    


@INC for perl 5.10.0:
    /etc/perl
    /usr/local/lib/perl/5.10.0
    /usr/local/share/perl/5.10.0
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.10
    /usr/share/perl/5.10
    /usr/local/lib/site_perl
    .


Environment for perl 5.10.0:
    HOME=/home/niko
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_CTYPE=fi_FI.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/niko/bin:/home/niko/bin:/home/niko/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/sbin:/usr/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/zsh


p5pRT commented Apr 23, 2009

From @ntyni

0001-Add-TODO-test-for-x-p-IsAlpha-match-failure.patch
From bf52a1792a253b8bfd58b82d5b56581b2d6be397 Mon Sep 17 00:00:00 2001
From: Niko Tyni <ntyni@debian.org>
Date: Thu, 23 Apr 2009 00:00:27 +0300
Subject: [PATCH] Add TODO test for /x*\p{IsAlpha}/ match failure

As reported by Srdjan <srdjan@catalyst.net.nz> in
<http://bugs.debian.org/230144>, this match currently
fails:

 "abcd" =~ /x*\p{IsAlpha}/

while this supposedly equivalent match succeeds:

 "abcd" =~ /x*[[:alpha:]]/
---
 t/uni/class.t |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/t/uni/class.t b/t/uni/class.t
index 4620ca0..b755aae 100644
--- a/t/uni/class.t
+++ b/t/uni/class.t
@@ -4,7 +4,7 @@ BEGIN {
     require "test.pl";
 }
 
-plan tests => 5092;
+plan tests => 5094;
 
 sub MyUniClass {
   <<END;
@@ -71,6 +71,13 @@ is(($str =~ /(\p{Other::Class}+)/)[0], '@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_');
 # make sure it finds class in other OTHER package
 is(($str =~ /(\p{A::B::Intersection}+)/)[0], '@ABCDEFGHIJKLMNO');
 
+# see http://bugs.debian.org/230144
+{
+is(($str =~ /x*([[:alpha:]])/)[0], 'A');
+local $::TODO = "currently `contradicts stdclass'";
+is(($str =~ /x*(\p{IsAlpha})/)[0], 'A');
+}
+
 # all of these should look in lib/unicore/bc/AL.pl
 $str = "\x{070D}\x{070E}\x{070F}\x{0710}\x{0711}";
 is(($str =~ /(\P{BidiClass: ArabicLetter}+)/)[0], "\x{070E}\x{070F}");
-- 
1.5.6.5

p5pRT commented Apr 23, 2009

From @ntyni

0002-Fix-x-p-IsAlpha-match-failure.patch
From 33c4880b1d1e03d0b33adf341c6f93f9387f51ba Mon Sep 17 00:00:00 2001
From: Niko Tyni <ntyni@debian.org>
Date: Thu, 23 Apr 2009 00:03:52 +0300
Subject: [PATCH] Fix /x*\p{IsAlpha}/ match failure

regexec_flags() says 'Contradicts stclass...' because
the ANYOF_UNICODE flag is lost while cl_and()ing
the classes.
---
 regcomp.c     |    2 +-
 t/uni/class.t |    3 ---
 2 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/regcomp.c b/regcomp.c
index e061528..150e7d0 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -3665,7 +3665,7 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
 		
 		}
 		if (flags & SCF_DO_STCLASS_OR)
-		    cl_and(data->start_class, and_withp);
+		    cl_or(pRExC_state, data->start_class, and_withp);
 		flags &= ~SCF_DO_STCLASS;
 	    }
 	}
diff --git a/t/uni/class.t b/t/uni/class.t
index b755aae..3acaf76 100644
--- a/t/uni/class.t
+++ b/t/uni/class.t
@@ -72,11 +72,8 @@ is(($str =~ /(\p{Other::Class}+)/)[0], '@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_');
 is(($str =~ /(\p{A::B::Intersection}+)/)[0], '@ABCDEFGHIJKLMNO');
 
 # see http://bugs.debian.org/230144
-{
 is(($str =~ /x*([[:alpha:]])/)[0], 'A');
-local $::TODO = "currently `contradicts stdclass'";
 is(($str =~ /x*(\p{IsAlpha})/)[0], 'A');
-}
 
 # all of these should look in lib/unicore/bc/AL.pl
 $str = "\x{070D}\x{070E}\x{070F}\x{0710}\x{0711}";
-- 
1.5.6.5
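
To make the commit message's point about a flag being lost more concrete, here is a toy model in Perl. It is an assumption-laden illustration, not the data structure or logic of regcomp.c: a "class" is just a set of characters plus one boolean standing in for a Unicode-property flag, and `class_and` / `class_or` are hypothetical helpers named after the idea of intersecting versus unioning two such classes.

```perl
use strict;
use warnings;

# Toy model only. A class is { chars => { ... }, unicode => 0|1 }.
# This is NOT the structure used by cl_and()/cl_or() in regcomp.c.

# Intersection: keep shared characters; the flag survives only if
# both sides carry it.
sub class_and {
    my ($l, $r) = @_;
    my %chars = map { $_ => 1 } grep { $r->{chars}{$_} } keys %{ $l->{chars} };
    return { chars => \%chars, unicode => ($l->{unicode} && $r->{unicode}) ? 1 : 0 };
}

# Union: keep all characters; the flag survives if either side has it.
sub class_or {
    my ($l, $r) = @_;
    my %chars = (%{ $l->{chars} }, %{ $r->{chars} });
    return { chars => \%chars, unicode => ($l->{unicode} || $r->{unicode}) ? 1 : 0 };
}

my $x_only   = { chars => { x => 1 }, unicode => 0 };  # stand-in for /x*/
my $is_alpha = { chars => {},         unicode => 1 };  # stand-in for /\p{IsAlpha}/

my $anded = class_and($x_only, $is_alpha);
my $ored  = class_or($x_only, $is_alpha);

printf "and: chars=[%s] unicode=%d\n",
    join('', sort keys %{ $anded->{chars} }), $anded->{unicode};
printf "or:  chars=[%s] unicode=%d\n",
    join('', sort keys %{ $ored->{chars} }), $ored->{unicode};
```

In this model the intersection drops the flag as soon as one operand lacks it, while the union keeps it. Whether that mirrors the exact code path in S_study_chunk() is not something this sketch establishes.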

p5pRT commented Nov 11, 2009

jarich@perltraining.com.au - Status changed from 'new' to 'open'

p5pRT commented Mar 25, 2011

From @khwilliamson

This was fixed in 5.13.11
--Karl Williamson

p5pRT commented Mar 25, 2011

@khwilliamson - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Mar 25, 2011