multi-char fold + its fold in char class #11304

Closed
p5pRT opened this issue May 3, 2011 · 7 comments

@p5pRT

p5pRT commented May 3, 2011

Migrated from rt.perl.org#89774 (status was 'resolved')

Searchable as RT89774$

@p5pRT

p5pRT commented May 3, 2011

From @khwilliamson

This is a bug report for perl from public@khwilliamson.com,
generated with the help of perlbug 1.39 running under perl 5.14.0.


If a bracketed character class used in case-insensitive matching
includes a character that has a multi-char fold, and it also includes
the first character of that fold, the multi-char fold is never matched;
only the first character of the fold is.
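
A minimal sketch of the case described above, assuming \xdf (whose fold
is "ss") as the character with the multi-char fold and /iu to force
Unicode rules. The helper name whole_match is hypothetical and not part
of the report. The "ss" line is the case reported as failing on 5.14.0;
the script only prints what the running perl does.

```perl
use strict;
use warnings;

# Hypothetical helper: does $str, taken as a whole, match the single
# pattern element $element under /iu?
sub whole_match {
    my ($element, $str) = @_;
    return $str =~ /\A$element\z/iu ? 1 : 0;
}

# \xdf plus "s", the first character of its fold "ss".
my $class = '[\xdfs]';

for my $case ( [ 's', "s" ], [ '\xdf', "\xdf" ], [ 'ss', "ss" ] ) {
    my ($label, $str) = @$case;
    printf "%-5s %s\n", $label,
        whole_match($class, $str) ? "matches" : "does not match";
}
```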

Spotted by Nicholas Clark.


Flags:
  category=core
  severity=low


Site configuration information for perl 5.14.0:

Configured by khw at Tue May 3 08:46:17 MDT 2011.

Summary of my perl5 (revision 5 version 14 subversion 0) configuration:
  Commit id: 2cf7ccf
  Platform:
  osname=linux, osvers=2.6.35-28-generic-pae,
archname=i686-linux-thread-multi-64int-ld
  uname='linux karl 2.6.35-28-generic-pae #50-ubuntu smp fri mar 18
20:43:15 utc 2011 i686 gnulinux '
  config_args='-des -Dprefix=/home/khw/blead -Dusedevel
-D'optimize=-ggdb3' -A'optimize=-ggdb3' -A'optimize=-O0' -Dman1dir=none
-Dman3dir=none -DDEBUGGING -Dusemorebits -Dusethreads'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=define, use64bitall=undef, uselongdouble=define
  usemymalloc=n, bincompat5005=undef
  Compiler:
  cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-O0 -ggdb3',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.4.5', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long long', ivsize=8, nvtype='long double', nvsize=12,
Off_t='off_t', lseeksize=8
  alignbytes=4, prototype=define
  Linker and Libraries:
  ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/local/lib /lib/../lib /usr/lib/../lib /lib /usr/lib
/usr/lib/i686-linux-gnu
  libs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
  libc=/lib/libc-2.12.1.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.12.1'
  Dynamic Linking:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -ggdb3 -ggdb3 -O0
-L/usr/local/lib -fstack-protector'

Locally applied patches:
  RC1


@INC for perl 5.14.0:
  /home/khw/blead/lib/perl5/site_perl/5.14.0/i686-linux-thread-multi-64int-ld
  /home/khw/blead/lib/perl5/site_perl/5.14.0
  /home/khw/blead/lib/perl5/5.14.0/i686-linux-thread-multi-64int-ld
  /home/khw/blead/lib/perl5/5.14.0
  /home/khw/blead/lib/perl5/site_perl
  .


Environment for perl 5.14.0:
  HOME=/home/khw
  LANG=en_US.UTF-8
  LANGUAGE=en_US:en
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/home/khw/bin:/home/khw/print/bin:/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/usr/games:/home/khw/cxoffice/bin
  PERL5OPT=-w
  PERL_BADLANG (unset)
  SHELL=/bin/ksh

@p5pRT

p5pRT commented Oct 14, 2012

From @khwilliamson

Fixed by commit
9d53c45

--
Karl Williamson

@p5pRT

p5pRT commented Oct 14, 2012

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Oct 14, 2012

@khwilliamson - Status changed from 'open' to 'resolved'

p5pRT closed this as completed Oct 14, 2012

@p5pRT

p5pRT commented Oct 14, 2012

From @cpansprout

On Sun Oct 14 08:12:41 2012, khw wrote:

> Fixed by commit
> 9d53c45

Does this affect look-behind?

--

Father Chrysostomos

@p5pRT

p5pRT commented Oct 14, 2012

From @khwilliamson

On 10/14/2012 11:53 AM, Father Chrysostomos via RT wrote:

> On Sun Oct 14 08:12:41 2012, khw wrote:
>
>> Fixed by commit
>> 9d53c45
>
> Does this affect look-behind?

I don't understand your question. Look-behind on classes that contain
multi-char folds did not work before and still does not, because we
currently support only fixed-length look-behind, and these classes are
by definition not fixed-length. However, the fix includes a change that
no one objected to earlier: the multi-char fold of a character is
considered only if that character is mentioned explicitly and
individually in the character class. So look-behind of /[\0-\xff]/i
should now start working, because the single character in that range
that has a multi-char fold (\xdf) is not mentioned explicitly.
/[\0-\xff\xdf]/i will still match 1 or 2 characters, and so cannot be
used in look-behind.
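
The distinction above can be probed with a short script. This is only a
sketch: the helper name lookbehind_compiles is hypothetical, /iu is
assumed in order to force Unicode rules, and the result for the class
that lists \xdf explicitly depends on the perl version, since later
perls may accept some variable-length look-behind.

```perl
use strict;
use warnings;

# Hypothetical helper: try to compile a look-behind on $class under /iu
# and report whether the regex compiler accepted it.
sub lookbehind_compiles {
    my ($class) = @_;
    no warnings;    # some perls warn about the construct instead of dying
    my $re = eval { qr/(?<=$class)b/iu };
    return defined $re ? 1 : 0;
}

# First class: \xdf only falls inside the range.
# Second class: \xdf is also listed explicitly.
for my $class ('[\0-\xff]', '[\0-\xff\xdf]') {
    printf "%-16s %s\n", $class,
        lookbehind_compiles($class) ? "compiles" : "rejected";
}
```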

@p5pRT

p5pRT commented Oct 14, 2012

From @cpansprout

On Sun Oct 14 12:08:01 2012, public@khwilliamson.com wrote:

> On 10/14/2012 11:53 AM, Father Chrysostomos via RT wrote:
>
>> On Sun Oct 14 08:12:41 2012, khw wrote:
>>
>>> Fixed by commit
>>> 9d53c45
>>
>> Does this affect look-behind?
>
> I don't understand your question. Look-behind on classes that contain
> multi-char folds did not work before and still does not, because we
> currently support only fixed-length look-behind, and these classes are
> by definition not fixed-length. However, the fix includes a change that
> no one objected to earlier: the multi-char fold of a character is
> considered only if that character is mentioned explicitly and
> individually in the character class. So look-behind of /[\0-\xff]/i
> should now start working, because the single character in that range
> that has a multi-char fold (\xdf) is not mentioned explicitly.
> /[\0-\xff\xdf]/i will still match 1 or 2 characters, and so cannot be
> used in look-behind.

Thank you. I couldn’t remember the details.

--

Father Chrysostomos
