\N{} incompatibility in 5.12+ #10367

Closed
p5pRT opened this issue May 8, 2010 · 13 comments

Comments

p5pRT commented May 8, 2010

Migrated from rt.perl.org#74978 (status was 'resolved')

Searchable as RT74978$

p5pRT commented May 8, 2010

From tokuhirom@gpath.example.org

Created by tokuhirom@gpath.example.org

The following one-liner fails with perl 5.12.0:

perl -e 'use charnames ":full"; /\N{FULLWIDTH LEFT PARENTHESIS}./;print "ok\n";'

Invalid hexadecimal number in \N{U+...} in regex; marked by <-- HERE in m/\N{U+FF08} <-- HERE ./ at -e line 1.

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.12.0:

Configured by tokuhirom at Wed Apr 28 17:18:47 JST 2010.

Summary of my perl5 (revision 5 version 12 subversion 0) configuration:
   
  Platform:
    osname=linux, osvers=2.6.31-17-server, archname=x86_64-linux
    uname='linux gpath 2.6.31-17-server #54-ubuntu smp thu dec 10 18:06:56 utc 2009 x86_64 gnulinux '
    config_args='-d -Dprefix=/usr/local/app/perl-5.12.0/ -Duse64bitint'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.4.1', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.10.1.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.10.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

Locally applied patches:
    


@INC for perl 5.12.0:
    /usr/local/app/perl-5.12.0/lib/site_perl/5.12.0/x86_64-linux
    /usr/local/app/perl-5.12.0/lib/site_perl/5.12.0
    /usr/local/app/perl-5.12.0/lib/5.12.0/x86_64-linux
    /usr/local/app/perl-5.12.0/lib/5.12.0
    .


Environment for perl 5.12.0:
    HOME=/home/tokuhirom
    LANG=ja_JP.UTF-8
    LANGUAGE (unset)
    LC_DATE=C
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/tokuhirom/bin:/home/tokuhirom/local/bin/:/usr/local/bin/:/usr/local/app/perl-5.12.0/bin/:/usr/local/app/perl/bin/:/usr/local/mysql/bin/:/usr/local/bin/:/home/tokuhirom/bin:/home/tokuhirom/local/bin/:/usr/local/bin/:/usr/local/app/perl-5.12.0/bin/:/usr/local/app/perl/bin/:/usr/local/mysql/bin/:/usr/local/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/tokuhirom/share/dotfiles/local/bin/:/home/tokuhirom/share/dotfiles/local/bin/
    PERL_AUTOINSTALL=--defaultdeps
    PERL_BADLANG=0
    PERL_CPANM_DEV=1
    SHELL=/bin/zsh

p5pRT commented May 8, 2010

From @tokuhirom

Created by tokuhirom@gmail.com

The following one-liner works in perl 5.10.0, but fails with perl 5.12.0:

% perl -e 'use charnames ":full"; /\N{FULLWIDTH LEFT PARENTHESIS}./;print "ok\n";'
Invalid hexadecimal number in \N{U+...} in regex; marked by <-- HERE in m/\N{U+FF08} <-- HERE ./ at -e line 1.


p5pRT commented May 8, 2010

From @khwilliamson

Tokuhiro Matsuno (via RT) wrote:

# New Ticket Created by "Tokuhiro Matsuno"
# Please include the string: [perl #74982]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=74982 >

This is a bug report for perl from tokuhirom@gmail.com,
generated with the help of perlbug 1.39 running under perl 5.12.0.

-----------------------------------------------------------------
[Please describe your issue here]

The following one-liner works in perl 5.10.0, but fails with perl 5.12.0:

% perl -e 'use charnames ":full"; /\N{FULLWIDTH LEFT PARENTHESIS}./;print "ok\n";'
Invalid hexadecimal number in \N{U+...} in regex; marked by <-- HERE in m/\N{U+FF08} <-- HERE ./ at -e line 1.

[Please do not change anything below this line]
-----------------------------------------------------------------

Thanks for the bug report. I was the one who introduced the bug. I'm
sorry. I will have a patch available today. In the meantime, the
problem turns out to be the period just after the '}'. If you remove
that, it will work.

p5pRT commented May 8, 2010

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented May 8, 2010

From @obra

Thanks for the bug report. I was the one who introduced the bug.
I'm sorry. I will have a patch available today. In the meantime,
the problem turns out to be the period just after the '}'. If you
remove that, it will work.

I'm going to hold 5.12.1 RC1 for this.

Best,
Jesse
--

p5pRT commented May 8, 2010

From @khwilliamson

Attached is a minimal patch to fix this. There are two other commits
that add comments to a .t file so that someone later won't have to work
as hard as I did at finding where to put the tests for something similar.

p5pRT commented May 8, 2010

From @khwilliamson

0001-Comment-where-to-find-file-s-format.patch
From ce65c312b89d6f851ca46d24719e07bce288ee99 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@khw-desktop.(none)>
Date: Sat, 8 May 2010 13:12:53 -0600
Subject: [PATCH] Comment where to find file's format

---
 t/re/re_tests |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/t/re/re_tests b/t/re/re_tests
index 1807ffc..b7471d9 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1,5 +1,5 @@
 # This stops me getting screenfulls of syntax errors every time I accidentally
-# run this file via a shell glob
+# run this file via a shell glob.  Format of this file is given in regexp.t
 __END__
 abc	abc	y	$&	abc
 abc	abc	y	$-[0]	0
-- 
1.5.6.3

p5pRT commented May 8, 2010

From @khwilliamson

0002-Note-in-comment-that-many-N-.-tests-won-t-work-h.patch
From 50e44d09a829eed4eeabf9ce78d3374a5f785d4f Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@khw-desktop.(none)>
Date: Sat, 8 May 2010 13:38:27 -0600
Subject: [PATCH] Note in comment that many \N{...} tests won't work here

---
 t/re/re_tests |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/t/re/re_tests b/t/re/re_tests
index b7471d9..c550b5a 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1,5 +1,7 @@
 # This stops me getting screenfulls of syntax errors every time I accidentally
 # run this file via a shell glob.  Format of this file is given in regexp.t
+# Can't use \N{VALID NAME TEST} here because need 'use charnames'; but can use 
+# \N{U+valid} here.
 __END__
 abc	abc	y	$&	abc
 abc	abc	y	$-[0]	0
-- 
1.5.6.3

p5pRT commented May 8, 2010

From @khwilliamson

0003-PATCH-perl-74978-dot-after-breaks-N.patch
From 1bb86a94fea493dd6213e60ed8e19b51b8ceea0c Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@khw-desktop.(none)>
Date: Sat, 8 May 2010 14:06:10 -0600
Subject: [PATCH] PATCH [perl #74978] dot after } breaks \N{}

The problem is that a dot can come between the braces in \N{foo.bar},
but when searching for it, I didn't stop looking at the right brace, so
it generated an error inappropriately.

This is essentially a minimum patch; efficiency could be improved
slightly with a little more work.
---
 regcomp.c  |    8 +++-----
 t/re/pat.t |    8 +++++++-
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/regcomp.c b/regcomp.c
index f665f0b..be5acdb 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -6762,11 +6762,10 @@ S_reg_namedseq(pTHX_ RExC_state_t *pRExC_state, UV *valuep, I32 *flagp)
 	    | PERL_SCAN_DISALLOW_PREFIX
 	    | (SIZE_ONLY ? PERL_SCAN_SILENT_ILLDIGIT : 0);
     
-	char * endchar = strchr(RExC_parse, '.');
-	if (endchar) {
+	char * endchar = RExC_parse + strcspn(RExC_parse, ".}");
+	if (endchar < endbrace) {
 	    ckWARNreg(endchar, "Using just the first character returned by \\N{} in character class");
 	}
-	else endchar = endbrace;
 
 	length_of_hex = (STRLEN)(endchar - RExC_parse);
 	*valuep = grok_hex(RExC_parse, &length_of_hex, &flags, NULL);
@@ -6817,8 +6816,7 @@ S_reg_namedseq(pTHX_ RExC_state_t *pRExC_state, UV *valuep, I32 *flagp)
 
 	    /* Code points are separated by dots.  If none, there is only one
 	     * code point, and is terminated by the brace */
-	    endchar = strchr(RExC_parse, '.');
-	    if (! endchar) endchar = endbrace;
+	    endchar = RExC_parse + strcspn(RExC_parse, ".}");
 
 	    /* The values are Unicode even on EBCDIC machines */
 	    length_of_hex = (STRLEN)(endchar - RExC_parse);
diff --git a/t/re/pat.t b/t/re/pat.t
index 40ae52e..7b9594c 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@@ -23,7 +23,7 @@ BEGIN {
 }
 
 
-plan tests => 297;  # Update this when adding/deleting tests.
+plan tests => 299;  # Update this when adding/deleting tests.
 
 run_tests() unless caller;
 
@@ -987,6 +987,12 @@ sub run_tests {
         ok "abbbbc" =~ m/\N{3,4}/ && $& eq "abbb", '"abbbbc" =~ m/\N{3,4}/ && $& eq "abbb"';
     }
 
+    {
+        use charnames ":full";
+        local $Message = '[perl #74982] Period coming after \N{}';
+        ok "\x{ff08}." =~ m/\N{FULLWIDTH LEFT PARENTHESIS}./ && $& eq "\x{ff08}.";
+        ok "\x{ff08}." =~ m/[\N{FULLWIDTH LEFT PARENTHESIS}]./ && $& eq "\x{ff08}.";
+    }
 
 } # End of sub run_tests
 
-- 
1.5.6.3

p5pRT commented May 8, 2010

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented May 8, 2010

From @obra

On Sat, May 08, 2010 at 02:18:33PM -0600, karl williamson wrote:

Attached is a minimal patch to fix this. There are two other
commits that add comments to a .t file so that someone later won't
have to work as hard as I did at finding where to put the tests for
something similar.

Thanks. Applied. +1 to backport the code patch for .1.

-Jesse

p5pRT commented May 9, 2010

From @xdg

On Sat, May 8, 2010 at 5:34 PM, Jesse Vincent <jesse@fsck.com> wrote:

On Sat, May 08, 2010 at 02:18:33PM -0600, karl williamson wrote:

Attached is a minimal patch to fix this.  There are two other
commits that add comments to a .t file so that someone later won't
have to work as hard as I did at finding where to put the tests for
something similar.

Thanks. Applied.  +1 to backport the code patch for .1.

-Jesse

agreed. +1 to backport

p5pRT commented May 10, 2010

@rgs - Status changed from 'open' to 'resolved'
