Regular Expression Matching Bug #1189

Closed
p5pRT opened this issue Feb 16, 2000 · 2 comments

p5pRT commented Feb 16, 2000

Migrated from rt.perl.org#2158 (status was 'resolved')

Searchable as RT2158$

p5pRT commented Feb 16, 2000

From ishisone@sra.co.jp

This is a bug report for perl from ishisone@sra.co.jp,
generated with the help of perlbug 1.27 running under perl v5.5.650.

  "\n\n" =~ /\n $ \n/x and print "OK1\n";
  "\n\n" =~ /\n* $ \n/x and print "OK2\n";
  "\n\n" =~ /\n+ $ \n/x and print "OK3\n";

All three of the matches above should succeed, but with the current
perl only the first one does.

The regexec.c:regmatch() routine has a small optimization (avoiding
unnecessary backtracking) for patterns such as 'a+$', but the code
overlooks the fact that '$' can match both before and after a trailing
newline.
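
As a small illustration of that last point (not part of the original report), the sketch below records where a match ends for a few end-of-string anchors. The helper name match_end and the sample string "ab\n" are made up for the example; \Z and \z are included only for comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset where the match ends,
# or -1 when the pattern does not match at all.
sub match_end {
    my ($string, $re) = @_;
    return $string =~ $re ? $+[0] : -1;
}

my $s = "ab\n";    # three characters, the last one a newline

# '$' (without /m) matches before the trailing newline ...
printf "%-8s ends at %d\n", 'ab$',    match_end($s, qr/ab$/);      # 2
# ... and also at the very end of the string.
printf "%-8s ends at %d\n", 'ab\n$',  match_end($s, qr/ab\n$/);    # 3
# \Z behaves like '$' here.
printf "%-8s ends at %d\n", 'ab\Z',   match_end($s, qr/ab\Z/);     # 2
# \z matches only at the very end, so 'ab\z' fails on this string.
printf "%-8s ends at %d\n", 'ab\z',   match_end($s, qr/ab\z/);     # -1
printf "%-8s ends at %d\n", 'ab\n\z', match_end($s, qr/ab\n\z/);   # 3
```

Because '$' has two candidate positions on a string that ends in a newline, a greedy repeat in front of it cannot always keep everything it consumed.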

Here's the patch to v5.5.650. This fixes the above bug, and also
makes \z (EOS) use this optimization.

*** regexec.c.org Mon Feb 7 04:33:00 2000
--- regexec.c Tue Feb 15 18:25:00 2000
***************
*** 3039,3046 ****
  n = regrepeat(scan, n);
  locinput = PL_reginput;
  if (ln < n && PL_regkind[(U8)OP(next)] == EOL &&
! (!PL_multiline || OP(next) == SEOL))
  ln = n; /* why back off? */
  REGCP_SET;
  if (paren) {
  while (n >= ln) {
--- 3039,3052 ----
  n = regrepeat(scan, n);
  locinput = PL_reginput;
  if (ln < n && PL_regkind[(U8)OP(next)] == EOL &&
! (!PL_multiline || OP(next) == SEOL || OP(next) == EOS)) {
  ln = n; /* why back off? */
+ /* ...because $ and \Z can match before *and* after
+ newline at the end. Consider "\n\n" =~ /\n+\Z\n/.
+ We should back off by one in this case. */
+ if (UCHARAT(PL_reginput - 1) == '\n' && OP(next) != EOS)
+ ln--;
+ }
  REGCP_SET;
  if (paren) {
  while (n >= ln) {
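
A possible regression check for this change, offered as a sketch rather than as the test that went into the perl test suite: it reruns the three matches from the report, the \Z case from the patch comment, and a \z case that should remain a non-match because \z only matches at the very end of the string. The sub name failing_cases and the table layout are assumptions made for the example.

```perl
use strict;
use warnings;

# Each entry: label, compiled pattern, whether "\n\n" is expected to match.
my @cases = (
    [ '\n $ \n',   qr/\n $ \n/x,   1 ],
    [ '\n* $ \n',  qr/\n* $ \n/x,  1 ],
    [ '\n+ $ \n',  qr/\n+ $ \n/x,  1 ],
    [ '\n+ \Z \n', qr/\n+ \Z \n/x, 1 ],   # case named in the patch comment
    [ '\n+ \z \n', qr/\n+ \z \n/x, 0 ],   # nothing can follow \z
);

# Hypothetical helper: return the labels whose result differs from the
# expectation for the given string.
sub failing_cases {
    my ($string) = @_;
    my @failed;
    for my $case (@cases) {
        my ($label, $re, $want) = @$case;
        my $got = ($string =~ $re) ? 1 : 0;
        push @failed, $label if $got != $want;
    }
    return @failed;
}

my @failed = failing_cases("\n\n");
print @failed ? "not ok: @failed\n" : "ok\n";
```

On a perl that still has the bug described above, the second and third entries would be expected to show up in the list of failures.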


Site configuration information for perl v5.5.650:

Configured by ishisone at Tue Feb 15 09:29:56 JST 2000.

Summary of my perl5 (revision 5.0 version 5 subversion 650) configuration:
  Platform:
  osname=freebsd, osvers=2.2.8-release, archname=i386-freebsd
  uname='freebsd srapc459.sra.co.jp 2.2.8-release freebsd 2.2.8-release #23: fri oct 22 18:15:23 jst 1999 ishisone@srapc459.sra.co.jp:usrsrcsyscompilesrapc459.v6 i386 '
  config_args=''
  hint=recommended, useposix=true, d_sigaction=define
  usethreads=undef use5005threads=undef useithreads=undef
  usesocks=undef useperlio=undef d_sfio=undef
  use64bits=undef uselargefiles=define usemultiplicity=undef
  Compiler:
  cc='cc', optimize='-O', gccversion=2.7.2.1
  cppflags='-I/usr/local/include'
  ccflags ='-I/usr/local/include'
  stdchar='char', d_stdstdio=undef, usevfork=true
  intsize=4, longsize=4, ptrsize=4, doublesize=8
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
  ld='ld', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib
  libs=-lm -lc -lcrypt
  libc=/usr/lib/libc.so.3.1, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
  cccdlflags='-DPIC -fpic', lddlflags='-Bshareable -L/usr/local/lib'

Locally applied patches:
 


@INC for perl v5.5.650:
  /amd/a/srapc451/mnt3/home/mgr/ishisone/lib/perl5
  /usr/local/lib/perl5/5.5.650/i386-freebsd
  /usr/local/lib/perl5/5.5.650
  /usr/local/lib/perl5/site_perl/5.5.650/i386-freebsd
  /usr/local/lib/perl5/site_perl/5.5.650
  /usr/local/lib/perl5/site_perl/5.005/i386-freebsd
  /usr/local/lib/perl5/site_perl/5.005
  /usr/local/lib/perl5/site_perl
  .


Environment for perl v5.5.650:
  HOME=/amd/a/srapc451/mnt3/home/mgr/ishisone
  LANG=ja_JP.EUC
  LANGUAGE (unset)
  LC_COLLATE=C
  LC_TIME=C
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/amd/a/srapc451/mnt3/home/mgr/ishisone/bin:/amd/a/srapc451/mnt3/home/mgr/ishisone/bin/i386-freebsd2:/usr/X11R6/bin:/usr/local/bin:/usr/local/sbin:/usr/sra/bin:/usr/local/tdoc/bin:/usr/local/emacs/bin:/usr/new/mh:/usr/local/bin/mh:/usr/local/v6/bin:/usr/local/v6/sbin:/usr/ucb:/usr/bin:/usr/new:/bin:/etc:/usr/etc:/usr/sbin:/sbin:/amd/a/srapc451/mnt3/home/mgr/ishisone/bin/lastresort:
  PERL5LIB=/amd/a/srapc451/mnt3/home/mgr/ishisone/lib/perl5
  PERL_BADLANG=0
  SHELL=/usr/local/bin/bash

p5pRT commented Feb 16, 2000

From [Unknown Contact. See original ticket]

Makoto Ishisone writes:

> "\n\n" =~ /\n $ \n/x and print "OK1\n";
> "\n\n" =~ /\n* $ \n/x and print "OK2\n";
> "\n\n" =~ /\n+ $ \n/x and print "OK3\n";
>
> All of the above 3 matches should be successful, but with the current
> perl only the first one succeeds.
>
> The regexec.c:regmatch() routine has a small optimization (avoiding
> unnecessary backtracking) for patterns such as 'a+$', but the code
> forgot the fact that '$' can match before and after newline.
>
> Here's the patch to v5.5.650. This fixes the above bug, and also
> makes \z (EOS) use this optimization.
>
> *** regexec.c.org Mon Feb 7 04:33:00 2000
> --- regexec.c Tue Feb 15 18:25:00 2000
> ***************
> --- 3039,3052 ----
> n = regrepeat(scan, n);
> locinput = PL_reginput;
> if (ln < n && PL_regkind[(U8)OP(next)] == EOL &&
> - (!PL_multiline || OP(next) == SEOL))
> + (!PL_multiline || OP(next) == SEOL || OP(next) == EOS)) {
> ln = n; /* why back off? */
> + /* ...because $ and \Z can match before *and* after
> + newline at the end. Consider "\n\n" =~ /\n+\Z\n/.
> + We should back off by one in this case. */
> + if (UCHARAT(PL_reginput - 1) == '\n' && OP(next) != EOS)
> + ln--;
> + }
> REGCP_SET;

This looks OK, but aren't there other similar places?

Ilya
