POSIX character class [:upper:] is broken #11975

Closed
p5pRT opened this issue Feb 27, 2012 · 8 comments

p5pRT commented Feb 27, 2012

Migrated from rt.perl.org#111400 (status was 'resolved')

Searchable as RT111400$

p5pRT commented Feb 27, 2012

From rkitover@cpan.org

This is a bug report for perl from rkitover@cpan.org,
generated with the help of perlbug 1.39 running under perl 5.15.8.


Running this command:

perl -le 'use utf8; print "is uppercase" if "ḿ" =~ /^[[:upper:]]\z/'

prints "is uppercase" but should not, since ḿ (U+1E3F LATIN SMALL LETTER M WITH ACUTE) is a lowercase letter

This command:

perl -le 'use utf8; print "wtf" if "\x{1e3f}" =~ /^[[:upper:]]\z/'

prints "wtf" but should not
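A quick way to check this behaviour on a given perl is to compare the bracketed POSIX class with \p{XPosixUpper}, which perlrecharclass documents as the full-Unicode equivalent of [[:upper:]], across the block that contains U+1E3F. This is a minimal sketch, assuming perl 5.14 or later; the variable names are illustrative, and scanning the Latin Extended Additional block (U+1E00..U+1EFF) is an assumption, not part of the original report. It dies, and therefore exits non-zero, on any disagreement.

```perl
use strict;
use warnings;

# Assumption: perl 5.14+, where \p{XPosixUpper} is available.
# [[:upper:]] and \p{XPosixUpper} should agree on every code point checked here.
my @mismatches;
for my $cp (0x1E00 .. 0x1EFF) {    # Latin Extended Additional block
    my $ch       = chr $cp;        # chr() above 0xFF yields a UTF-8 string
    my $posix    = $ch =~ /^[[:upper:]]\z/      ? 1 : 0;
    my $property = $ch =~ /^\p{XPosixUpper}\z/ ? 1 : 0;
    push @mismatches,
        sprintf("U+%04X: [[:upper:]]=%d \\p{XPosixUpper}=%d", $cp, $posix, $property)
        if $posix != $property;
}

# The character from the report (U+1E3F) and its uppercase partner (U+1E3E).
my $capital_is_upper = "\x{1E3E}" =~ /^[[:upper:]]\z/ ? 1 : 0;
my $small_is_upper   = "\x{1E3F}" =~ /^[[:upper:]]\z/ ? 1 : 0;

# Dying instead of printing makes a failure produce a non-zero exit status.
die join("\n", "[[:upper:]] disagrees with \\p{XPosixUpper}:", @mismatches), "\n"
    if @mismatches;
die "U+1E3F (small m with acute) matched [[:upper:]]\n" if $small_is_upper;

print "ok: U+1E3E upper=$capital_is_upper, U+1E3F upper=$small_is_upper\n";
```

On a perl that contains the fix, the script should report U+1E3E as uppercase and U+1E3F as not uppercase, and exit zero.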



Flags:
  category=core
  severity=high


Site configuration information for perl 5.15.8:

Configured by rkitover at Sun Feb 26 21:49:34 EST 2012.

Summary of my perl5 (revision 5 version 15 subversion 8) configuration:
  Snapshot of: a892b81
  Platform:
  osname=linux, osvers=3.2.2, archname=i686-linux-64int-ld
  uname='linux hlagh 3.2.2 #2 smp preempt mon jan 30 20:32:07 est 2012 i686 gnulinux '
  config_args='-de -Dprefix=/home/rkitover/perl5/perlbrew/perls/perl-blead -DDEBUGGING -Doptimize=-ggdb3 -Dusemorebits -Dusedevel -Acc=mygcc'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=undef, usemultiplicity=undef
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=define, use64bitall=undef, uselongdouble=define
  usemymalloc=n, bincompat5005=undef
  Compiler:
  cc=' mygcc', ccflags ='-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-ggdb3',
  cppflags='-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.6.2', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
  alignbytes=4, prototype=define
  Linker and Libraries:
  ld=' mygcc', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/local/lib /lib/i386-linux-gnu /lib/../lib /usr/lib/i386-linux-gnu /usr/lib/../lib /lib /usr/lib
  libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
  libc=, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.13'
  Dynamic Linking:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -ggdb3 -L/usr/local/lib -fstack-protector'

Locally applied patches:
 


@INC for perl 5.15.8:
  /home/rkitover/perl5/perlbrew/perls/perl-blead/lib/site_perl/5.15.8/i686-linux-64int-ld
  /home/rkitover/perl5/perlbrew/perls/perl-blead/lib/site_perl/5.15.8
  /home/rkitover/perl5/perlbrew/perls/perl-blead/lib/5.15.8/i686-linux-64int-ld
  /home/rkitover/perl5/perlbrew/perls/perl-blead/lib/5.15.8
  .


Environment for perl 5.15.8:
  HOME=/home/rkitover
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LC_CTYPE=en_US.UTF-8
  LD_LIBRARY_PATH=/home/rkitover/lib:/usr/local/lib:/lib:/usr/lib:/usr/lib/i386-linux-gnu:/home/rkitover/instantclient_10_2:/usr/lib/oracle/xe/app/oracle/product/10.2.0/server/lib:/opt/sqlanywhere11/lib32:/home/informix/lib:/home/informix/lib/esql
  LOGDIR (unset)
  PATH=/home/rkitover/perl5/perlbrew/bin:/home/rkitover/perl5/perlbrew/perls/perl-blead/bin:/home/rkitover/bin:/home/rkitover/perl5/bin:/home/rkitover/instantclient_10_2:/usr/lib/oracle/xe/app/oracle/product/10.2.0/server/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/bin/X11:/usr/games:/opt/cxoffice/bin:/opt/cxgames/bin:/home/rkitover/android-sdk/tools:/home/informix/bin:/home/sybase/DBISQL/bin:/home/sybase/OCS-15_0/bin:/home/sybase/ASE-15_0/bin:/home/sybase/ASE-15_0/install:/home/db2inst1/sqllib/bin:/home/db2inst1/sqllib/adm:/home/db2inst1/sqllib/misc:/home/db2inst1/sqllib/db2tss/bin
  PERLBREW_HOME=/home/rkitover/.perlbrew
  PERLBREW_MANPATH=/home/rkitover/perl5/perlbrew/perls/perl-5.14.2/man
  PERLBREW_PATH=/home/rkitover/perl5/perlbrew/bin:/home/rkitover/perl5/perlbrew/perls/perl-blead/bin
  PERLBREW_PERL=perl-blead
  PERLBREW_ROOT=/home/rkitover/perl5/perlbrew
  PERLBREW_VERSION=0.27
  PERLDB_OPTS=NonStop
  PERL_AUTOINSTALL_PREFER_CPAN=1
  PERL_BADLANG (unset)
  PERL_CPANM_OPT=--notest --mirror http://cpan.cpantesters.org/
  PERL_DBD_ODBC_PREFER_UNIXODBC=1
  PERL_DL_NONLAZY=1
  PERL_MM_OPT=INSTALLMAN1DIR=none INSTALLMAN3DIR=none
  SHELL=zsh

p5pRT commented Feb 28, 2012

From @jkeenan

On Mon Feb 27 13:38:06 2012, caelum wrote:

-----------------------------------------------------------------
Running this command:

perl -le 'use utf8; print "is uppercase" if "ḿ" =~ /^[[:upper:]]\z/'

prints "is uppercase" but should not

This command:

perl -le 'use utf8; print "wtf" if "\x{1e3f}" =~ /^[[:upper:]]\z/'

prints "wtf" but should not

Using the attached program, 111400.pl, in Perl 5.14.2, I cannot
reproduce your results. My program's output DWIMs:

#####
$ perl 111400.pl
2. is lowercase

3. is uppercase

6. is lowercase

8. wtf
#####

But I notice that you are using 5.15.8. Could you try with some version
of Perl 5.14 so that we can see if this is a bug in devel?

Thank you very much.
Jim Keenan

p5pRT commented Feb 28, 2012

From @jkeenan

111400.pl

p5pRT commented Feb 28, 2012

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Feb 28, 2012

From @jkeenan

On Mon Feb 27 18:07:32 2012, jkeenan wrote:

But I notice that you are using 5.15.8. Could you try with some version
of Perl 5.14 so that we can see if this is a bug in devel?

Hmm, when I run it with blead, I *do* reproduce your results.

#####
$ ./perl -Ilib ~/learn/perl/p5p/111400.pl
2. is lowercase

3. is uppercase

5. is uppercase
6. is lowercase

7. wtf
8. wtf
#####

So we have a bug confirmed as having been introduced somewhere in Perl 5.15.

jimk

p5pRT commented Feb 28, 2012

From @nwc10

On Mon, Feb 27, 2012 at 01:38:07PM -0800, rkitover@cpan.org wrote:

Running this command:

perl -le 'use utf8; print "is uppercase" if "ḿ" =~ /^[[:upper:]]\z/'

prints "is uppercase" but should not

This command:

perl -le 'use utf8; print "wtf" if "\x{1e3f}" =~ /^[[:upper:]]\z/'

prints "wtf" but should not

Running this:

../perl/Porting/bisect.pl -le 'use utf8; die "wtf" if "\x{1e3f}" =~ /^[[:upper:]]\z/'

[yes, really, it's that simple. I changed the print to a die, to make the
error case exit non-zero, and everything else is the same]

finds this:

HEAD is now at ea317cc regcomp.c: Use compiled-in inversion lists
bad - non-zero exit from ./perl -Ilib -l -e use utf8; die "wtf" if "\x{1e3f}" =~ /^[[:upper:]]\z/
ea317cc is the first bad commit
commit ea317cc
Author: Karl Williamson <public@khwilliamson.com>
Date: Sat Feb 4 17:08:58 2012 -0700

  regcomp.c: Use compiled-in inversion lists

  This uses the compiled inversion lists to generate Posix character
  classes and things like \v, \s inside bracketed character classes.

  This paves the way for future optimizations, and fixes the bug which has
  no formal bug number that /[[:ascii:]]/i matched non-Ascii characters,
  such as the Kelvin sign, unlike /\p{ascii}/i.

:040000 040000 d06dc60300803101cef2e81a1d2f8cd5fb00172a a3d57076077cd1581c3d4599720266b05d39e42a M pod
:100644 100644 f55b36188149ae584e27ef899f34b83d39ba6d3d 157e06ed1c58cb9a8b0cbc99d32f431791a92c6e M regcomp.c
:040000 040000 ee3c635187a759a3145483fcf76addd727efbeb3 8f60aaeccf02689f1679883ba32f68869b2031b9 M t
:100644 100644 e61f5746abefa6dce30ed7c913a7af0e49b3e7b9 b9efb366b0f3cbe92ec3eb510fb09318291733b7 M utf8.c
bisect run success
That took 1948 seconds

Nicholas Clark
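The commit message quoted above also mentions a related symptom: /[[:ascii:]]/i matching non-ASCII characters such as the Kelvin sign, unlike /\p{ascii}/i. U+212A KELVIN SIGN case-folds to ASCII "k", which is why /i matters here. The sketch below probes that character with and without /i; the labels and the %matched hash are hypothetical names, and it makes no claim about what a particular perl prints for the two /i lines. Going by the commit message, the intent is for the [[:ascii:]] /i line to agree with the \p{ASCII} /i line.

```perl
use strict;
use warnings;

# U+212A KELVIN SIGN case-folds to ASCII "k", so it is a useful probe for
# whether an ASCII-only class leaks a non-ASCII character under /i.
my $kelvin = "\x{212A}";

# Each entry: [ label, 1 if the pattern matched the Kelvin sign, else 0 ].
my @checks = (
    [ '[[:ascii:]]'    => $kelvin =~ /^[[:ascii:]]\z/  ? 1 : 0 ],
    [ '\p{ASCII}'      => $kelvin =~ /^\p{ASCII}\z/    ? 1 : 0 ],
    [ '[[:ascii:]] /i' => $kelvin =~ /^[[:ascii:]]\z/i ? 1 : 0 ],
    [ '\p{ASCII} /i'   => $kelvin =~ /^\p{ASCII}\z/i   ? 1 : 0 ],
);

printf "%-15s %s\n", $_->[0], ($_->[1] ? 'matches' : 'no match') for @checks;

# Results keyed by label, for any later comparison.
my %matched = map { $_->[0] => $_->[1] } @checks;
```

Without /i, neither form should match, since U+212A lies outside ASCII. The /i rows are the ones the commit was concerned with.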

p5pRT commented Feb 28, 2012

From @khwilliamson

Fixed by commit b4069bc

Thanks for finding this

--
Karl Williamson

p5pRT commented Feb 28, 2012

@khwilliamson - Status changed from 'open' to 'resolved'
