Unexpected regex failure. #8210

Closed
p5pRT opened this issue Nov 15, 2005 · 6 comments

p5pRT commented Nov 15, 2005

Migrated from rt.perl.org#37688 (status was 'resolved')

Searchable as RT37688$

p5pRT commented Nov 15, 2005

From @Abigail

Created by @Abigail

  #!/usr/bin/perl

  use strict;
  use warnings;

  $_ = "A B";
  print /^(.)\s+.(?(1))/ ? "match\n" : "no match\n";
  print /^(.)\s+.$(?(1))/ ? "match\n" : "no match\n";
  print /^(.)\s+.(?(1))$/ ? "match\n" : "no match\n";
  print /^(.)\s+.$(?(1))$/ ? "match\n" : "no match\n";
  print /^(.)\s.$(?(1))/ ? "match\n" : "no match\n";
  __END__
  match
  no match
  match
  match
  match

I would have expected that all the cases matched. This behaviour first
happened in 5.6.0, and persists in 5.9.2. With 5.005 and 5.005_04, I
get 'match' five times, as expected.

Replacing the conditional '(?(1))' with another conditional still causes
the second regex to fail. But a different placement of $ (or an extra $),
or replacing \s+ with \s, makes the regex match. Removing the parentheses
(and replacing the conditional with another conditional that's always
true) makes the regex match as well.

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl v5.8.7:

Configured by abigail at Wed Jun  1 21:50:09 CEST 2005.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=linux, osvers=2.4.18-bf2.4, archname=i686-linux-64int-ld
    uname='linux alexandra 2.4.18-bf2.4 #1 son apr 14 09:53:28 cest 2002 i686 unknown '
    config_args='-des -Dusemorebits -Uversiononly -Dmydomain=.abigail.nl -Dcf_email=abigail@abigail.nl -Dperladmin=abigail@abigail.nl -Doptimize=-g -Dcc=gcc -Dprefix=/opt/perl'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=undef uselongdouble=define
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-g',
    cppflags='-DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='3.0.4', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.2.5.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.2.5'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    no-syntax-warnings
    defined-or


@INC for perl v5.8.7:
    /home/abigail/Perl
    /opt/perl/lib/5.8.7/i686-linux-64int-ld
    /opt/perl/lib/5.8.7
    /opt/perl/lib/site_perl/5.8.7/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.7
    /opt/perl/lib/site_perl/5.8.6/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.6
    /opt/perl/lib/site_perl/5.8.5/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.5
    /opt/perl/lib/site_perl/5.8.4/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.4
    /opt/perl/lib/site_perl/5.8.3/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.3
    /opt/perl/lib/site_perl/5.8.2/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.2
    /opt/perl/lib/site_perl/5.8.1/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.1
    /opt/perl/lib/site_perl/5.8.0/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.0
    /opt/perl/lib/site_perl
    .


Environment for perl v5.8.7:
    HOME=/home/abigail
    LANG=C
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/home/abigail/Lib:/usr/local/lib:/usr/lib:/lib:/usr/X11R6/lib
    LOGDIR (unset)
    PATH=/home/abigail/Bin:/opt/perl/bin:/usr/local/bin:/usr/local/X11/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/games:/usr/share/texmf/bin:/opt/Acrobat/bin:/opt/java/blackdown/j2sdk1.3.1/bin:/usr/local/games/bin
    PERL5LIB=/home/abigail/Perl
    PERLDIR=/opt/perl
    PERL_BADLANG (unset)
    SHELL=/bin/bash

p5pRT commented Nov 16, 2005

From @hvds

"abigail@abigail.nl (via RT)" <perlbug-followup@perl.org> wrote:
: $_ = "A B";
: print /^(.)\s+.$(?(1))/ ? "match\n" : "no match\n";
[...]
: no match

Hmm:

zen% perl -Dr -e '"A B" =~ /^(.)\s+.$(?(1))/' 2>&1 | grep 'Guessing\|STCLASS'
Guessing start of match, REx `^(.)\s+.$(?(1))' against `A B'...
This position contradicts STCLASS...
zen% grep -nF 'contradicts STCLASS' regexec.c
889: "This position contradicts STCLASS...\n") );
zen% gdb ./perl
(gdb) set args -e '"A B" =~ /^(.)\s+.$(?(1))/'
(gdb) break regexec.c:878
Breakpoint 1 at 0x81040d8: file regexec.c, line 878.
(gdb) run
Breakpoint 1, Perl_re_intuit_start (prog=0x815c1d0, sv=0x8154b60,
  strpos=0x8165e90 "A B", strend=0x8165e93 "", flags=3, data=0x0)
  at regexec.c:878
878 s = find_byclass(prog, prog->regstclass, s, endpos, 1);
(gdb) p endpos
$1 = 0x1 <Address 0x1 out of bounds>
(gdb) p check_at
$2 = 0x0
(gdb) shell grep -nF 'check_at =' regexec.c
416: char *check_at = Nullch; /* check substr found at this pos */
597: check_at = s;
(gdb) break 597
Breakpoint 2 at 0x810317f: file regexec.c, line 597.
(gdb) run
Breakpoint 1, Perl_re_intuit_start (prog=0x815c1d0, sv=0x8154b60,
  strpos=0x8165e90 "A B", strend=0x8165e93 "", flags=3, data=0x0)
  at regexec.c:878
878 s = find_byclass(prog, prog->regstclass, s, endpos, 1);
(gdb)

So, we fail the STCLASS check due to a bum endpos, acquired because we are
assuming check_at is set to something useful, but we never reached the line
where it would have been set.

Further investigation (setting breakpoints on all the labels after line 597)
showed that it was the 'goto success_at_start' that skipped past the setting
of check_at, and inserting a 'check_at = s;' just before that goto allows
the match to succeed.
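
The control-flow problem described above can be illustrated with a small standalone C sketch. This is not the code from regexec.c: the function names `end_limit_buggy` and `end_limit_fixed`, the `anchored` flag, the `NOT_SET` sentinel, and the use of integer offsets in place of `char *` pointers are all assumptions made for illustration. It only shows how a `goto` that jumps past an assignment leaves a placeholder value in a variable that later code treats as a real position.

```c
#include <stddef.h>

/* Hypothetical sentinel standing in for the "not yet set" initial value. */
#define NOT_SET ((ptrdiff_t)-1)

/* Buggy shape: the shortcut jumps past the only assignment to check_at. */
ptrdiff_t end_limit_buggy(ptrdiff_t s, int anchored, ptrdiff_t class_len)
{
    ptrdiff_t check_at = NOT_SET;

    if (anchored)
        goto success_at_start;   /* skips the assignment below */

    check_at = s;                /* only reached on the slow path */

success_at_start:
    /* Later code assumes check_at holds a real position. */
    return check_at + class_len;
}

/* Fixed shape: record the position before taking the shortcut. */
ptrdiff_t end_limit_fixed(ptrdiff_t s, int anchored, ptrdiff_t class_len)
{
    ptrdiff_t check_at = NOT_SET;

    if (anchored) {
        check_at = s;            /* the one-line change */
        goto success_at_start;
    }

    check_at = s;

success_at_start:
    return check_at + class_len;
}
```

With `s = 5` and `class_len = 1`, the buggy version returns 6 on the slow path but 0 on the shortcut path, because the bound is derived from the sentinel rather than from `s`; the fixed version returns 6 on both paths.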

This is plumber programming I'm afraid, I don't have time right now to
explore the full meaning of all this and establish for sure whether this
is the correct and ideal fix, or characterise the precise failure modes
that this would address. But intuit_start is probably the least
maintainable piece of code in perl, and waiting for that could take years.

The patch does at least pass all existing tests and the new one. It would
probably be useful to slap some asserts around the place as well, but I'll
leave that for someone with more time on their hands.
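
One possible form for such an assert, again as a standalone sketch rather than code from regexec.c (the names `end_limit_checked`, `NOT_SET` and `str_len` are hypothetical): state the invariant at the point where the value is consumed, so that a skipped assignment aborts a debugging build instead of silently producing a bad bound.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "position was never recorded". */
#define NOT_SET ((ptrdiff_t)-1)

/* Compute an end bound from check_at, clamped to the string length.
 * The asserts document what the caller must have established; the
 * standard assert macro compiles to nothing when NDEBUG is defined. */
ptrdiff_t end_limit_checked(ptrdiff_t check_at, ptrdiff_t class_len,
                            ptrdiff_t str_len)
{
    assert(check_at != NOT_SET);                   /* was it ever set? */
    assert(check_at >= 0 && check_at <= str_len);  /* is it in range?  */

    ptrdiff_t end = check_at + class_len;
    return end > str_len ? str_len : end;
}
```

Placing the check next to the use, rather than next to the assignment, catches any path that skips the assignment, not only the one already known.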

Hugo

Inline Patch
--- t/op/re_tests.old	Sun Mar 27 15:26:05 2005
+++ t/op/re_tests	Wed Nov 16 14:15:17 2005
@@ -958,3 +958,4 @@
 (a|aa|aaa|aaaa|aaaaa|aaaaaa)(??{$1&&"foo"})(b|c)	aaaaaaaaaaaaaaab	n	-	-
 ^(a*?)(?!(aa|aaaa)*$)	aaaaaaaaaaaaaaaaaaaa	y	$1	a	# [perl #34195]
 ^(a*?)(?!(aa|aaaa)*$)(?=a\z)	aaaaaaaa	y	$1	aaaaaaa
+^(.)\s+.$(?(1))	A B	y	$1	A	# [perl #37688]
--- regexec.c.old	Fri Nov  4 19:07:37 2005
+++ regexec.c	Wed Nov 16 14:32:46 2005
@@ -518,6 +518,7 @@
 		     || ((slen = SvCUR(check)) > 1
 			 && memNE(SvPVX_const(check), s, slen)))
 		goto report_neq;
+	    check_at = s;
 	    goto success_at_start;
 	  }
 	}

p5pRT commented Nov 16, 2005

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Nov 16, 2005

From @rgs

hv@crypt.org wrote:

> This is plumber programming I'm afraid, I don't have time right now to
> explore the full meaning of all this and establish for sure whether this
> is the correct and ideal fix, or characterise the precise failure modes
> that this would address. But intuit_start is probably the least
> maintainable piece of code in perl, and waiting for that could take years.
>
> The patch does at least pass all existing tests and the new one. It would
> probably be useful to slap some asserts around the place as well, but I'll
> leave that for someone with more time on their hands.

Thanks, applied to blead as change #26137.

p5pRT commented Nov 16, 2005

From @andk

On Tue, 15 Nov 2005 15:12:46 -0800, "abigail@abigail.nl (via RT)" <perlbug-followup@perl.org> said:

  > This is a bug report for perl from abigail@abigail.nl,
  > generated with the help of perlbug 1.35 running under perl v5.8.7.

  > -----------------------------------------------------------------
  > [Please enter your report here]

  > #!/usr/bin/perl

  > use strict;
  > use warnings;

  > $_ = "A B";
  > print /^(.)\s+.(?(1))/ ? "match\n" : "no match\n";
  > print /^(.)\s+.$(?(1))/ ? "match\n" : "no match\n";
  > print /^(.)\s+.(?(1))$/ ? "match\n" : "no match\n";
  > print /^(.)\s+.$(?(1))$/ ? "match\n" : "no match\n";
  > print /^(.)\s.$(?(1))/ ? "match\n" : "no match\n";
  > __END__
  > match
  > no match
  > match
  > match
  > match

  > I would have expected that all the cases matched. This behaviour first
  > happened in 5.6.0, and persists in 5.9.2. With 5.005 and 5.005_04, I
  > get 'match' five times, as expected.

  > ccversion='', gccversion='3.0.4', gccosandvers=''

A binary search did not find the responsible patch because the result
depends on the gcc version. The same perl compiled with gcc
3.2.3 (20030415) prints "match" five times, but compiled with gcc
4.0.3 (20051023) it gives the same result as above.

Does anybody have other interesting observations on this?

--
andreas

p5pRT commented Jul 3, 2006

@smpeters - Status changed from 'open' to 'resolved'
