regexps on ranges with wide characters fail once in perl -de 0 #7851

Closed
p5pRT opened this issue Mar 26, 2005 · 7 comments

@p5pRT

p5pRT commented Mar 26, 2005

Migrated from rt.perl.org#34577 (status was 'resolved')

Searchable as RT34577$

@p5pRT

p5pRT commented Mar 26, 2005

From eric+perlbug@w3.org

Created by eric+perlbug@w3.org

The following regexp fails the first time and succeeds the second time
when executed in the perl debugger:
  "\x{4E01}" =~ m/[\x{4E00}-\x{4E02}]/

[[
spam:/home/eric$ perl -de 0

Loading DB routines from perl5db.pl version 1.25
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(-e:1): 0
  DB<1> p "\x{4E01}" =~ m/[\x{4E00}-\x{4E02}]/

  DB<2> p "\x{4E01}" =~ m/[\x{4E00}-\x{4E02}]/
1
]]

I noticed this behavior a few months ago as well, so it has survived
several revs of Debian sid's perl du jour.
--
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
  Shonan Fujisawa Campus, Keio University,
  5322 Endo, Fujisawa, Kanagawa 252-8520
  JAPAN
  +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell: +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.4:

Configured by Debian Project at Thu Feb  3 01:11:27 EST 2005.

Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
    osname=linux, osvers=2.4.27-ti1211, archname=i386-linux-thread-multi
    uname='linux kosh 2.4.27-ti1211 #1 sun sep 19 18:17:45 est 2004 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.4 -Dsitearch=/usr/local/lib/perl/5.8.4 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.4 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='3.3.5 (Debian 1:3.3.5-8)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so.5.8.4
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.4:
    /etc/perl
    /usr/local/lib/perl/5.8.4
    /usr/local/share/perl/5.8.4
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    /usr/local/lib/perl/5.8.3
    /usr/local/share/perl/5.8.3
    .


Environment for perl v5.8.4:
    HOME=/home/eric
    LANG=en_US
    LANGUAGE (unset)
    LC_CTYPE=en_US.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=:/home/eric/bin:/usr/local/sbin:/usr/local/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/sbin:/usr/games:/bin:/usr/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT

p5pRT commented Mar 26, 2005

From @nwc10

Thanks for the bug report.

On Sat, Mar 26, 2005 at 06:44:49AM -0000, eric+perlbug @ w3. org wrote:

> main::(-e:1): 0
> DB<1> p "\x{4E01}" =~ m/[\x{4E00}-\x{4E02}]/
>
> DB<2> p "\x{4E01}" =~ m/[\x{4E00}-\x{4E02}]/
> 1
> ]]
>
> I noticed this behavior a few months ago as well, so it has survived
> several revs of Debian sid's perl du jour.

It's still present in the development version of perl. I'm not sure if it's
related to bug 33185, which also involves UTF8 initialisation problems.

Nicholas Clark

@p5pRT

p5pRT commented Mar 26, 2005

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Mar 28, 2005

From eric+perlbug@w3.org

On Sat, Mar 26, 2005 at 01:45:03PM -0000, Nicholas Clark via RT wrote:

> Thanks for the bug report.
>
> On Sat, Mar 26, 2005 at 06:44:49AM -0000, eric+perlbug @ w3. org wrote:
>
> > main::(-e:1): 0
> > DB<1> p "\x{4E01}" =~ m/[\x{4E00}-\x{4E02}]/
> >
> > DB<2> p "\x{4E01}" =~ m/[\x{4E00}-\x{4E02}]/
> > 1
> > ]]
> >
> > I noticed this behavior a few months ago as well, so it has survived
> > several revs of Debian sid's perl du jour.
>
> It's still present in the development version of perl. I'm not sure if it's
> related to bug 33185, which also involves UTF8 initialisation problems.

Here's a transcript of a potentially related symptom:
[[
  DB<1> $T_NCCHAR1 =
  "(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:[A-Z])|(?:[a-z]))|(?:[\x{00C0}-\x{00D6}]))|(?:[\x{00D8}-\x{00F6}]))|(?:[\x{00F8}-\x{02FF}]))|(?:[\x{0370}-\x{037D}]))|(?:[\x{037F}-\x{1FFF}]))|(?:[\x{200C}-\x{200D}]))|(?:[\x{2070}-\x{218F}]))|(?:[\x{2C00}-\x{2FEF}]))|(?:[\x{3001}-\x{D7FF}]))|(?:[\x{F900}-\x{FFFE}])";

  DB<2> $regexp = $T_NCCHAR1

  DB<3> p "asdf" =~ m/\G($regexp)/gc

Debugger terminated
]]

where "Debugger terminated" means I killed it after 10 hours of using
99% of my CPU (I hadn't noticed it before; guess I'm not working hard
enough). This is repeatable on my system (which may have a funny
utf8_heavy.pl).
--
-eric


@p5pRT

p5pRT commented Apr 22, 2005

From @iabyn

On Sat, Mar 26, 2005 at 01:44:34PM +0000, Nicholas Clark wrote:

> Thanks for the bug report.
>
> On Sat, Mar 26, 2005 at 06:44:49AM -0000, eric+perlbug @ w3. org wrote:
>
> > main::(-e:1): 0
> > DB<1> p "\x{4E01}" =~ m/[\x{4E00}-\x{4E02}]/
> >
> > DB<2> p "\x{4E01}" =~ m/[\x{4E00}-\x{4E02}]/
> > 1
> > ]]
> >
> > I noticed this behavior a few months ago as well, so it has survived
> > several revs of Debian sid's perl du jour.
>
> It's still present in the development version of perl. I'm not sure if it's
> related to bug 33185, which also involves UTF8 initialisation problems.

It's no longer present in bleedperl: it disappeared somewhere between
#24074 and #24149, so my fix #24084 for bug 33185 appears to have fixed
this too, especially given that the problem goes away if you do a
'use utf8' in the debugger first.
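
A check along the following lines could be run as a plain script to confirm the behaviour on a given perl. It is only a sketch: it runs outside the debugger, so it may not reproduce the original failure on an affected build, and the string `eval` is an assumption standing in for the way the debugger evaluates each typed command. The variable names are illustrative.

```perl
use strict;
use warnings;

# The expression from the report, kept as source text so that it is
# compiled afresh on every attempt (an approximation of the debugger,
# not its exact code path).
my $expr = '"\x{4E01}" =~ m/[\x{4E00}-\x{4E02}]/ ? 1 : 0';

my @results;
for my $attempt (1 .. 2) {
    my $matched = eval $expr;
    die "eval failed: $@" if $@;
    push @results, $matched;
    print "attempt $attempt: ", ($matched ? "matched" : "no match"), "\n";
}
```

On a perl that has the fix, both attempts should report a match. A first attempt that fails while the second succeeds would correspond to the symptom in the report.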

--
The Enterprise successfully ferries an alien VIP from one place to another
without serious incident.
  -- Things That Never Happen in "Star Trek" #7

@p5pRT

p5pRT commented Apr 22, 2005

From @iabyn

On Sun, Mar 27, 2005 at 07:24:07PM -0500, Eric Prud'hommeaux wrote:

> > It's still present in the development version of perl. I'm not sure if it's
> > related to bug 33185, which also involves UTF8 initialisation problems.
>
> Here's a transcript of a potentially related symptom:
> [[
> DB<1> $T_NCCHAR1 =
> "(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:[A-Z])|(?:[a-z]))|(?:[\x{00C0}-\x{00D6}]))|(?:[\x{00D8}-\x{00F6}]))|(?:[\x{00F8}-\x{02FF}]))|(?:[\x{0370}-\x{037D}]))|(?:[\x{037F}-\x{1FFF}]))|(?:[\x{200C}-\x{200D}]))|(?:[\x{2070}-\x{218F}]))|(?:[\x{2C00}-\x{2FEF}]))|(?:[\x{3001}-\x{D7FF}]))|(?:[\x{F900}-\x{FFFE}])";
>
> DB<2> $regexp = $T_NCCHAR1
>
> DB<3> p "asdf" =~ m/\G($regexp)/gc
>
> Debugger terminated
> ]]
>
> where "Debugger terminated" means I killed it after 10 hours of using
> 99% of my CPU (I hadn't noticed it before; guess I'm not working hard
> enough). This is repeatable on my system (which may have a funny
> utf8_heavy.pl).

This also seems fixed in the latest bleedperl.
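
For anyone wanting to re-check the second symptom on their own build, a minimal script might look like the sketch below. It is written under assumptions: it runs outside the debugger, and it stores the pattern in a single-quoted string so the `\x{...}` escapes reach the regexp engine intact, whereas the transcript used a double-quoted string. The `@chars` array is an illustrative name.

```perl
use strict;
use warnings;

# Pattern copied from the transcript. Single quotes are used here so the
# \x{...} escapes are interpreted by the regexp engine (the transcript
# used double quotes, which expand them before the pattern is compiled).
my $T_NCCHAR1 = '(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:[A-Z])|(?:[a-z]))|(?:[\x{00C0}-\x{00D6}]))|(?:[\x{00D8}-\x{00F6}]))|(?:[\x{00F8}-\x{02FF}]))|(?:[\x{0370}-\x{037D}]))|(?:[\x{037F}-\x{1FFF}]))|(?:[\x{200C}-\x{200D}]))|(?:[\x{2070}-\x{218F}]))|(?:[\x{2C00}-\x{2FEF}]))|(?:[\x{3001}-\x{D7FF}]))|(?:[\x{F900}-\x{FFFE}])';

my $text = "asdf";
my @chars;

# Walk the string one matching character at a time, anchored with \G.
while ($text =~ m/\G($T_NCCHAR1)/gc) {
    push @chars, $1;
}

print "matched ", scalar(@chars), " characters: @chars\n";
```

If the script returns promptly and reports four matched characters, the hang described in the transcript is not occurring in that context.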

--
The Enterprise is involved in a bizarre time-warp experience which is in
some way unconnected with the Late 20th Century.
  -- Things That Never Happen in "Star Trek" #14

@p5pRT p5pRT closed this as completed Apr 22, 2005
@p5pRT

p5pRT commented Apr 22, 2005

@iabyn - Status changed from 'open' to 'resolved'
