Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex: Alternations within negative lookahead assertions #7807

Closed
p5pRT opened this issue Feb 20, 2005 · 16 comments
Closed

Regex: Alternations within negative lookahead assertions #7807

p5pRT opened this issue Feb 20, 2005 · 16 comments

Comments

@p5pRT
Copy link

p5pRT commented Feb 20, 2005

Migrated from rt.perl.org#34195 (status was 'resolved')

Searchable as RT34195$

@p5pRT
Copy link
Author

p5pRT commented Feb 20, 2005

From mike@mikero.com

Created by mike@mikero.com

## this seems to be related to Ticket #23030

## /^(aa|aaaa)*$/ is equivalent to /^(aa)*$/
## they both match strings of a's of even length

## this works​:

if ( ("a" x 19) !~ /^(aa)*$/ ) {
  print "19 a's don't match /^(aa)*\$/\n";
}

## and so does this​:

if ( ("a" x 19) !~ /^(aa|aaaa)*$/ ) {
  print "19 a's don't match /^(aa|aaaa)*\$/\n";
}

## thus ("a" x 20) should match /^(a*?)(?!(aa|aaaa)*$)/
## with $1 = "a", but it doesn't!

if ( ("a" x 20) =~ /^(a*?)(?!(aa|aaaa)*$)/ ) {
  print "first matched​: ($1|$')\n";
  ## doesn't work!
}

## it works without the alternation

if ( ("a" x 20) =~ /^(a*?)(?!(aa)*$)/ ) {
  print "second matched​: ($1|$')\n";
}

## changing the * to + causes it to match with
## $1 = ("a" x 19), which is closer, but still
## incorrect

if ( ("a" x 20) =~ /^(a*?)(?!(aa|aaaa)+$)/ ) {
  print "third matched​: ($1|$')\n";
}

## also, changing the order of (aaaa|aa)* also doesn't work.

Perl Info



Flags:
    category=core
    severity=low

Site configuration information for perl v5.8.4:

Configured by Debian Project at Thu Feb  3 01:11:27 EST 2005.

Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
    osname=linux, osvers=2.4.27-ti1211, archname=i386-linux-thread-multi
    uname='linux kosh 2.4.27-ti1211 #1 sun sep 19 18:17:45 est 2004 i686
    gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN
    -Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr
    -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8
    -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5
    -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local
    -Dsitelib=/usr/local/share/perl/5.8.4
    -Dsitearch=/usr/local/lib/perl/5.8.4 -Dman1dir=/usr/share/man/man1
    -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1
    -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl
    -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm
    -Duseshrplib -Dlibperl=libperl.so.5.8.4 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define
    usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
    -DDEBIAN -fno-strict-aliasing -I/usr/local/include
    -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN
    -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='3.3.5 (Debian 1:3.3.5-8)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
    lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=true,
    libperl=libperl.so.5.8.4
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.4:
    /etc/perl
    /usr/local/lib/perl/5.8.4
    /usr/local/share/perl/5.8.4
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    /usr/local/lib/perl/5.8.3
    /usr/local/share/perl/5.8.3
    /usr/local/lib/perl/5.8.2
    /usr/local/share/perl/5.8.2
    /usr/local/lib/perl/5.8.1
    /usr/local/share/perl/5.8.1
    /usr/local/lib/perl/5.8.0
    /usr/local/share/perl/5.8.0
    .

@p5pRT
Copy link
Author

p5pRT commented Feb 21, 2005

From @demerphq

On 20 Feb 2005 22​:22​:46 -0000, via RT Mike Rosulek
<perlbug-followup@​perl.org> wrote​:

# New Ticket Created by Mike Rosulek
# Please include the string​: [perl #34195]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=34195 >

This is a bug report for perl from mike@​mikero.com,
generated with the help of perlbug 1.35 running under perl v5.8.4.

-----------------------------------------------------------------
[Please enter your report here]

## this seems to be related to Ticket #23030

## /^(aa|aaaa)*$/ is equivalent to /^(aa)*$/
## they both match strings of a's of even length

## this works​:

if ( ("a" x 19) !~ /^(aa)*$/ ) {
print "19 a's don't match /^(aa)*\$/\n";
}

## and so does this​:

if ( ("a" x 19) !~ /^(aa|aaaa)*$/ ) {
print "19 a's don't match /^(aa|aaaa)*\$/\n";
}

## thus ("a" x 20) should match /^(a*?)(?!(aa|aaaa)*$)/
## with $1 = "a", but it doesn't!

Yeah I agree. Blead perl shows this problem too. It looks like it has
to do with caching from the debug output​:

"Detected a super-linear match, switching on caching..."

Is reported just a bit before the incorrect fail.

yves

@p5pRT
Copy link
Author

p5pRT commented Feb 21, 2005

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Feb 21, 2005

From @hvds

Mike Rosulek (via RT) <perlbug-followup@​perl.org> wrote​:
[...]
:if ( ("a" x 20) =~ /^(a*?)(?!(aa|aaaa)*$)/ ) {
: print "first matched​: ($1|$')\n";
: ## doesn't work!
:}

This first fails with ("a" x 8), and it appears that this is because
patch #20538 for [perl #23030] is insufficient. (See​:
  http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-08/msg00511.html
for the details and the followup for the patch.)

The test case there was​:
  "......abef" =~ /.*a(?!(b|cd)*e).*f/
incorrectly reporting a match.

I'm not sure that I fully understand the failure mode in this case, but I
think the problem is that we need to be able to distinguish between cached
successes and cached failures, which I think means that the cache cannot
work without implementing the rather more expensive option (a) from the
above message.

Another option would be to disable the cache whenever we're inside one or
more negative assertions, but I suspect even that would involve a fairly
large patch to implement.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Feb 21, 2005

From @hvds

hv@​crypt.org wrote​:
:Mike Rosulek (via RT) <perlbug-followup@​perl.org> wrote​:
:[...]
:​:if ( ("a" x 20) =~ /^(a*?)(?!(aa|aaaa)*$)/ ) {
:​: print "first matched​: ($1|$')\n";
:​: ## doesn't work!
:​:}
:
:This first fails with ("a" x 8), and it appears that this is because
:patch #20538 for [perl #23030] is insufficient. (See​:
: http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-08/msg00511.html
:for the details and the followup for the patch.)
:
:The test case there was​:
: "......abef" =~ /.*a(?!(b|cd)*e).*f/
:incorrectly reporting a match.
:
:I'm not sure that I fully understand the failure mode in this case, but I
:think the problem is that we need to be able to distinguish between cached
:successes and cached failures, which I think means that the cache cannot
:work without implementing the rather more expensive option (a) from the
:above message.
:
:Another option would be to disable the cache whenever we're inside one or
:more negative assertions, but I suspect even that would involve a fairly
:large patch to implement.

There is a third way​: let the failure/success sense of the cache be
determined by the first result, and remove cached results of the wrong
sense. That requires only an extra two bits in the cache​: one for the
sense, and a second to say that the first is valid (ie we've already
seen the first result).

The patch below against latest bleadperl is a quick and dirty first
attempt at such a fix, which passes all tests (including the new one) here;
a cleaner attempt (feel free, that's all I have time for today) would hoick
the calculations of o and b up a level to avoid redoing them, and thus also
avoid relying on the value of locinput being the same at that point (not
sure if that is guaranteed).

This patch reverses patch #20538 as a side benefit; it could do with more
eyeballs or some extra bedding-in time before going into the maintenance
branch, but in principle it should be suitable for that.

Hugo

Inline Patch
--- t/op/re_tests.old	Tue Aug 26 07:35:32 2003
+++ t/op/re_tests	Mon Feb 21 17:46:17 2005
@@ -943,3 +943,5 @@
 .*a(?!(b|cd)*e).*f	......abef	n	-	-	# [perl #23030]
 x(?#	x	c	-	Sequence (?#... not terminated
 :x(?#:	x	c	-	Sequence (?#... not terminated
+^(a*?)(?!(aa|aaaa)*$)	aaaaaaaaaaaaaaaaaaaa	y	$1	a	# [perl #34195]
+^(a*?)(?!(aa|aaaa)*$)(?=a\z)	aaaaaaaa	y	$1	aaaaaaa
--- regexec.c.old	Sat Jan 22 06:22:46 2005
+++ regexec.c	Mon Feb 21 17:32:41 2005
@@ -98,7 +98,6 @@
 #define RF_warned	2		/* warned about big count? */
 #define RF_evaled	4		/* Did an EVAL with setting? */
 #define RF_utf8		8		/* String contains multibyte chars? */
-#define RF_false	16		/* odd number of nested negatives */
 
 #define UTF ((PL_reg_flags & RF_utf8) != 0)
 
@@ -2256,6 +2255,50 @@
 #define sayNO_SILENT goto do_no
 #define saySAME(x) if (x) goto yes; else goto no
 
+#define POSCACHE_SUCCESS 0	/* caching success rather than failure */
+#define POSCACHE_SEEN 1		/* we know what we're caching */
+#define POSCACHE_START 2	/* the real cache: this bit maps to pos 0 */
+#define CACHEsayYES STMT_START { \
+    if (PL_reg_poscache) { \
+	if (!(PL_reg_poscache[0] & (1<<POSCACHE_SEEN))) \
+	    PL_reg_poscache[0] |= (1<<POSCACHE_SUCCESS) || (1<<POSCACHE_SEEN); \
+        else if (!(PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS))) { \
+	    /* cache records failure, but this is success */ \
+	    I32 o = locinput - PL_bostr, b; \
+	    o = (scan->flags & 0xf) - 1 + POSCACHE_START + o * (scan->flags>>4); \
+	    b = o % 8; \
+	    o /= 8; \
+	    DEBUG_r( \
+		PerlIO_printf(Perl_debug_log, \
+		    "%*s  (remove success from failure cache)\n", \
+		    REPORT_CODE_OFF+PL_regindent*2, "") \
+	    ); \
+	    PL_reg_poscache[o] &= ~(1<<b); \
+	} \
+    } \
+    sayYES; \
+} STMT_END
+#define CACHEsayNO STMT_START { \
+    if (PL_reg_poscache) { \
+	if (!(PL_reg_poscache[0] & (1<<POSCACHE_SEEN))) \
+	    PL_reg_poscache[0] |= (1<<POSCACHE_SEEN); \
+        else if ((PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS))) { \
+	    /* cache records success, but this is failure */ \
+	    I32 o = locinput - PL_bostr, b; \
+	    o = (scan->flags & 0xf) - 1 + POSCACHE_START + o * (scan->flags>>4); \
+	    b = o % 8; \
+	    o /= 8; \
+	    DEBUG_r( \
+		PerlIO_printf(Perl_debug_log, \
+		    "%*s  (remove failure from success cache)\n", \
+		    REPORT_CODE_OFF+PL_regindent*2, "") \
+	    ); \
+	    PL_reg_poscache[o] &= ~(1<<b); \
+	} \
+    } \
+    sayNO; \
+} STMT_END
+
 #define REPORT_CODE_OFF 24
 
 /*
@@ -3194,7 +3237,7 @@
 		    PL_reg_leftiter = PL_reg_maxiter;
 		}
 		if (PL_reg_leftiter-- == 0) {
-		    I32 size = (PL_reg_maxiter + 7)/8;
+		    I32 size = (PL_reg_maxiter + 7 + POSCACHE_START)/8;
 		    if (PL_reg_poscache) {
 			if ((I32)PL_reg_poscache_size < size) {
 			    Renew(PL_reg_poscache, size, char);
@@ -3215,7 +3258,7 @@
 		if (PL_reg_leftiter < 0) {
 		    I32 o = locinput - PL_bostr, b;
 
-		    o = (scan->flags & 0xf) - 1 + o * (scan->flags>>4);
+		    o = (scan->flags & 0xf) - 1 + POSCACHE_START + o * (scan->flags>>4);
 		    b = o % 8;
 		    o /= 8;
 		    if (PL_reg_poscache[o] & (1<<b)) {
@@ -3224,9 +3267,11 @@
 				      "%*s  already tried at this position...\n",
 				      REPORT_CODE_OFF+PL_regindent*2, "")
 			);
-			if (PL_reg_flags & RF_false)
+			if (PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS))
+			    /* cache records success */
 			    sayYES;
 			else
+			    /* cache records failure */
 			    sayNO_SILENT;
 		    }
 		    PL_reg_poscache[o] |= (1<<b);
@@ -3243,7 +3288,7 @@
 		    REGCP_SET(lastcp);
 		    if (regmatch(cc->next)) {
 			regcpblow(cp);
-			sayYES;	/* All done. */
+			CACHEsayYES;	/* All done. */
 		    }
 		    REGCP_UNWIND(lastcp);
 		    regcppop();
@@ -3259,7 +3304,7 @@
 				 "Complex regular subexpression recursion",
 				 REG_INFTY - 1);
 			}
-			sayNO;
+			CACHEsayNO;
 		    }
 
 		    DEBUG_r(
@@ -3275,13 +3320,13 @@
 		    REGCP_SET(lastcp);
 		    if (regmatch(cc->scan)) {
 			regcpblow(cp);
-			sayYES;
+			CACHEsayYES;
 		    }
 		    REGCP_UNWIND(lastcp);
 		    regcppop();
 		    cc->cur = n - 1;
 		    cc->lastloc = lastloc;
-		    sayNO;
+		    CACHEsayNO;
 		}
 
 		/* Prefer scan over next for maximal matching. */
@@ -3293,7 +3338,7 @@
 		    REGCP_SET(lastcp);
 		    if (regmatch(cc->scan)) {
 			regcpblow(cp);
-			sayYES;
+			CACHEsayYES;
 		    }
 		    REGCP_UNWIND(lastcp);
 		    regcppop();		/* Restore some previous $<digit>s? */
@@ -3317,13 +3362,13 @@
 		if (PL_regcc)
 		    ln = PL_regcc->cur;
 		if (regmatch(cc->next))
-		    sayYES;
+		    CACHEsayYES;
 		if (PL_regcc)
 		    PL_regcc->cur = ln;
 		PL_regcc = cc;
 		cc->cur = n - 1;
 		cc->lastloc = lastloc;
-		sayNO;
+		CACHEsayNO;
 	    }
 	    /* NOT REACHED */
 	case BRANCHJ:
@@ -3860,7 +3905,6 @@
 	    }
 	    else
 		PL_reginput = locinput;
-	    PL_reg_flags ^= RF_false;
 	    goto do_ifmatch;
 	case IFMATCH:
 	    n = 1;
@@ -3876,8 +3920,6 @@
 	  do_ifmatch:
 	    inner = NEXTOPER(NEXTOPER(scan));
 	    if (regmatch(inner) != n) {
-		if (n == 0)
-		    PL_reg_flags ^= RF_false;
 	      say_no:
 		if (logical) {
 		    logical = 0;
@@ -3887,8 +3929,6 @@
 		else
 		    sayNO;
 	    }
-	    if (n == 0)
-		PL_reg_flags ^= RF_false;
 	  say_yes:
 	    if (logical) {
 		logical = 0;

@p5pRT
Copy link
Author

p5pRT commented Feb 22, 2005

From @demerphq

On Mon, 21 Feb 2005 17​:59​:49 +0000, hv@​crypt.org <hv@​crypt.org> wrote​:

The patch below against latest bleadperl is a quick and dirty first
attempt at such a fix, which passes all tests (including the new one) here;

FWIW​: I applied this to blead, and it passed all tests, I also built
spamassissin and passed all tests with it applied, and ran mass-check
with it too. Although the latter only in combination with the TRIE
patch. Everything looked fine.

Cheers,
Yves

--
First they ignore you, then they laugh at you, then they fight you,
then you win.
  +Gandhi

@p5pRT
Copy link
Author

p5pRT commented Mar 20, 2005

From @demerphq

On Mon, 21 Feb 2005 17​:59​:49 +0000, hv@​crypt.org <hv@​crypt.org> wrote​:

hv@​crypt.org wrote​:
:Mike Rosulek (via RT) <perlbug-followup@​perl.org> wrote​:
:[...]
:​:if ( ("a" x 20) =~ /^(a*?)(?!(aa|aaaa)*$)/ ) {
:​: print "first matched​: ($1|$')\n";
:​: ## doesn't work!
:​:}
:
:This first fails with ("a" x 8), and it appears that this is because
:patch #20538 for [perl #23030] is insufficient. (See​:
: http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-08/msg00511.html
:for the details and the followup for the patch.)
:
:The test case there was​:
: "......abef" =~ /.*a(?!(b|cd)*e).*f/
:incorrectly reporting a match.
:
:I'm not sure that I fully understand the failure mode in this case, but I
:think the problem is that we need to be able to distinguish between cached
:successes and cached failures, which I think means that the cache cannot
:work without implementing the rather more expensive option (a) from the
:above message.
:
:Another option would be to disable the cache whenever we're inside one or
:more negative assertions, but I suspect even that would involve a fairly
:large patch to implement.

There is a third way​: let the failure/success sense of the cache be
determined by the first result, and remove cached results of the wrong
sense. That requires only an extra two bits in the cache​: one for the
sense, and a second to say that the first is valid (ie we've already
seen the first result).

The patch below against latest bleadperl is a quick and dirty first
attempt at such a fix, which passes all tests (including the new one) here;
a cleaner attempt (feel free, that's all I have time for today) would hoick
the calculations of o and b up a level to avoid redoing them, and thus also
avoid relying on the value of locinput being the same at that point (not
sure if that is guaranteed).

This patch reverses patch #20538 as a side benefit; it could do with more
eyeballs or some extra bedding-in time before going into the maintenance
branch, but in principle it should be suitable for that.

Hugo

<Snip inlined patch>

I was just wondering if this patch got overlooked? With all the trie
stuff at the same time it occured to me it might have been....

Cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Mar 21, 2005

From @rgs

hv@​crypt.org wrote​:

The patch below against latest bleadperl is a quick and dirty first
attempt at such a fix, which passes all tests (including the new one) here;
a cleaner attempt (feel free, that's all I have time for today) would hoick
the calculations of o and b up a level to avoid redoing them, and thus also
avoid relying on the value of locinput being the same at that point (not
sure if that is guaranteed).

This patch reverses patch #20538 as a side benefit; it could do with more
eyeballs or some extra bedding-in time before going into the maintenance
branch, but in principle it should be suitable for that.

Thanks, applied as #24053 to bleadperl.

@p5pRT
Copy link
Author

p5pRT commented Mar 21, 2005

@rgs - Status changed from 'open' to 'resolved'

@p5pRT
Copy link
Author

p5pRT commented Mar 22, 2005

From @rgs

Rafael Garcia-Suarez wrote​:

hv@​crypt.org wrote​:

The patch below against latest bleadperl is a quick and dirty first
attempt at such a fix, which passes all tests (including the new one) here;
a cleaner attempt (feel free, that's all I have time for today) would hoick
the calculations of o and b up a level to avoid redoing them, and thus also
avoid relying on the value of locinput being the same at that point (not
sure if that is guaranteed).

This patch reverses patch #20538 as a side benefit; it could do with more
eyeballs or some extra bedding-in time before going into the maintenance
branch, but in principle it should be suitable for that.

Thanks, applied as #24053 to bleadperl.

Well, in fact I'm going to revert it, since it makes installman segfault when
installing perltoc. (tested with a threaded build on linux.)

@p5pRT
Copy link
Author

p5pRT commented Mar 22, 2005

@rgs - Status changed from 'resolved' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Mar 22, 2005

From @demerphq

On Tue, 22 Mar 2005 12​:01​:30 +0100, Rafael Garcia-Suarez
<rgarciasuarez@​mandrakesoft.com> wrote​:

Rafael Garcia-Suarez wrote​:

hv@​crypt.org wrote​:

The patch below against latest bleadperl is a quick and dirty first
attempt at such a fix, which passes all tests (including the new one) here;
a cleaner attempt (feel free, that's all I have time for today) would hoick
the calculations of o and b up a level to avoid redoing them, and thus also
avoid relying on the value of locinput being the same at that point (not
sure if that is guaranteed).

This patch reverses patch #20538 as a side benefit; it could do with more
eyeballs or some extra bedding-in time before going into the maintenance
branch, but in principle it should be suitable for that.

Thanks, applied as #24053 to bleadperl.

Well, in fact I'm going to revert it, since it makes installman segfault when
installing perltoc. (tested with a threaded build on linux.)

I can reproduce the problem here as well. I get a seg fault that
appears to ultimately come from a FREETMPS call in pp_nextstate. The
regex that is responsible is the following one from lib/pod/Man.pm

  # func(n) is a reference to a manual page. Make it \fIfunc\fR\|(n).
  s{
  ( \b | \\s-1 )
  ( [A-Za-z_] (?​:[.​:\w]|\\-|\\s-?[01])+ )
  (
  \( \d [a-z]* \)
  )
  } { $1 . '\f(IS' . $2 . '\f(IE\|' . $3 }egx;

It goes crazy around the following point in perltoc.pod​:

=head2 CPAN - query, download and build perl modules from CPAN sites

=over 4

=item SYNOPSIS

=item STATUS

=item DESCRIPTION

=over 4

=item Interactive Mode

Searching for authors, bundles, distribution files and modules, make, test,
install, clean modules or distributions, get, readme, look module or
distribution, ls author, Signals

=item CPAN​::Shell

=item autobundle

=item recompile

=item The four C<CPAN​::*> Classes​: Author, Bundle, Module, Distribution

=item Programmer's interface

expand($type,@​things), expandany(@​things), Programming Examples

=item Methods in the other Classes

CPAN​::Author​::as_glimpse(), CPAN​::Author​::as_string(),
CPAN​::Author​::email(), CPAN​::Author​::fullname(), CPAN​::Author​::name(),
CPAN​::Bundle​::as_glimpse(), CPAN​::Bundle​::as_string(),
CPAN​::Bundle​::clean(), CPAN​::Bundle​::contains(),

perltoc.tmp has the following lines at around this point​:

.PD

.Sh "\s-1CPAN\s0 \- query, download and build perl modules from
\s-1CPAN\s0 sites"

.IX Subsection "CPAN - query, download and build perl modules from CPAN sites"

.IP "\s-1SYNOPSIS\s0" 4

.IX Item "SYNOPSIS"

.PD 0

.IP <segfault happened here>

The output from debug for the regex starts here....

Guessed​: match at offset 0
Matching REx `
  ( \b | \\s-1 )
  ( [A-Za-z_] (?​:[.​:\w]|\\-|\\s-?[01])+ )
  (
  \( \d [a-z]* \)
  )
  ...' against `\f(ISCPAN​::Author​::as_glimpse()\f(IE,
\f(ISCPAN​::Author​::as_...'
  Setting an EVAL scope, savestack=308
  0 <> <\f(ISCPAN​::A> | 1​: OPEN1
  0 <> <\f(ISCPAN​::A> | 3​: BRANCH
  Setting an EVAL scope, savestack=318
  0 <> <\f(ISCPAN​::A> | 4​: BOUND
  failed...
  0 <> <\f(ISCPAN​::A> | 6​: EXACT <\\s-1>
Guessed​: match at offset 0
Matching REx `
  ( \b | \\s-1 )
  ( [A-Za-z_] (?​:[.​:\w]|\\-|\\s-?[01])+ )
  (
  \( \d [a-z]* \)
  )
  ...' against `\f(ISCPAN​::Author​::as_glimpse()\f(IE,
\f(ISCPAN​::Author​::as_...'
  Setting an EVAL scope, savestack=308
  0 <> <\f(ISCPAN​::A> | 1​: OPEN1
  0 <> <\f(ISCPAN​::A> | 3​: BRANCH
  Setting an EVAL scope, savestack=318
  0 <> <\f(ISCPAN​::A> | 4​: BOUND
  failed...
  0 <> <\f(ISCPAN​::A> | 6​: EXACT <\\s-1>

Proceeds for a very long time and eventually craps out around here​:

1225 <strib> <ution​::> | 42​:
  EXACT <\\s>
 
  failed...
  Clearing an EVAL scope, savestack=598..608
  restoring \2 to -1(1210)..-1(no)
  restoring \2..\3 to undef
 
  failed, try continuation...
1225 <strib> <ution​::> | 61​:
  NOTHING
1225 <strib> <ution​::> | 62​:
  CLOSE2
1225 <strib> <ution​::> | 64​:
  OPEN3
1225 <strib> <ution​::> | 66​:
  EXACT <(>
 
  failed...
 
  (remove failure from success cache)
 
  failed...
1224 <istri> <bution​:> | 39​:
  EXACT <\\->
 
  failed...
1224 <istri> <bution​:> | 42​:
  EXACT <\\s>
 
  failed...
  Clearing an EVAL scope, savestack=578..588
  restoring \2 to -1(1210)..-1(no)
  restoring \2..\3 to undef
 
  failed, try continuation...
1224 <istri> <bution​:> | 61​:
  NOTHING
1224 <istri> <bution​:> | 62​:
  CLOSE2
1224 <istri> <bution​:> | 64​:
  OPEN3
1224 <istri> <bution​:> | 66​:
  EXACT <(>

Regards,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Mar 24, 2005

From @hvds

demerphq <demerphq@​gmail.com> wrote​:
:On Tue, 22 Mar 2005 12​:01​:30 +0100, Rafael Garcia-Suarez
:<rgarciasuarez@​mandrakesoft.com> wrote​:
:> Well, in fact I'm going to revert it, since it makes installman segfault when
:> installing perltoc. (tested with a threaded build on linux.)
:
:I can reproduce the problem here as well. I get a seg fault that
:appears to ultimately come from a FREETMPS call in pp_nextstate. The
:regex that is responsible is the following one from lib/pod/Man.pm
:
: # func(n) is a reference to a manual page. Make it \fIfunc\fR\|(n).
: s{
: ( \b | \\s-1 )
: ( [A-Za-z_] (?​:[.​:\w]|\\-|\\s-?[01])+ )
: (
: \( \d [a-z]* \)
: )
: } { $1 . '\f(IS' . $2 . '\f(IE\|' . $3 }egx;
[...]

Haven't got too far with this yet, but I've managed to cut the code
to reproduce down to​:
  perl -00nle '1 while s{<[^<>]*>}{}g;/^(=|\s)/||print' pod/perltoc.pod | ./perl -00nle 's/-/\\-/g; s{ \\-\\-}{}g; s{ \b ( [A-Z] (?​: [A-Z+​:\d_\$&] | \\- )* ) (?= [\s\]] ) }{ " s1" . $1 . "s0" }egx; s{(\b|s1)([A-Za-z_]([​:\w]|s[01])+\(\))}{$1$2}g'

.. which might help a bit.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Mar 24, 2005

From @hvds

Earlier I wrote​:
:Haven't got too far with this yet, but I've managed to cut the code
:to reproduce down to​:
: perl -00nle '1 while s{<[^<>]*>}{}g;/^(=|\s)/||print' pod/perltoc.pod | ./perl -00nle 's/-/\\-/g; s{ \\-\\-}{}g; s{ \b ( [A-Z] (?​: [A-Z+​:\d_\$&] | \\- )* ) (?= [\s\]] ) }{ " s1" . $1 . "s0" }egx; s{(\b|s1)([A-Za-z_]([​:\w]|s[01])+\(\))}{$1$2}g'
:
:.. which might help a bit.

I managed to cut it further to​:
  perl -le 's/(\d+)/"b"x$1/eg, print for @​ARGV' aa6Xa1c a68ca10Xca10 | ./perl -nle 's{a([ab]|xx)+c}{}g'

I wasn't really able to prove it, but I believe the problem was exactly
what I worried about before​:
:a cleaner attempt (feel free, that's all I have time for today) would hoick
:the calculations of o and b up a level to avoid redoing them, and thus also
:avoid relying on the value of locinput being the same at that point (not
:sure if that is guaranteed).

.. and further confusion was caused by the fact that the CACHEsay* macros
were assuming they had work to do if PL_reg_poscache existed, when that
might have been created by a recursive match after we started this
iteration.

The patch below should fix it, but I haven't worked out if there's an
easy way to introduce that test case above​: in particular I haven't
managed to find a way to reproduce this short variant without getting
the strings piped in.

Somewhat worrying is that I've been seeing occasional things like​:
  t/run/switcht.............................Missing right curly or square bracket at ../lib/Config_heavy.pl line 1160, at end of line
  syntax error at ../lib/Config_heavy.pl line 1160, at EOF
  Compilation failed in require at ../lib/Config.pm line 66.
  # Looks like you planned 11 tests but ran 0.
  FAILED at test 1
.. on various t/run and t/io tests, but they haven't been reproducible,
and I suspect this is some current blead instability not caused by this
patch.

Hugo

Inline Patch
--- t/op/re_tests.old	Thu Mar 24 19:36:17 2005
+++ t/op/re_tests	Thu Mar 24 19:36:26 2005
@@ -956,3 +956,5 @@
 (a|aa|aaa|aaaa|aaaaa|aaaaaa)(b|c)	aaaaaaaaaaaaaaab	y	$1$2	aaaaaab
 (a|aa|aaa|aaaa|aaaaa|aaaaaa)(??{$1&&""})(b|c)	aaaaaaaaaaaaaaab	y	$1$2	aaaaaab
 (a|aa|aaa|aaaa|aaaaa|aaaaaa)(??{$1&&"foo"})(b|c)	aaaaaaaaaaaaaaab	n	-	-
+^(a*?)(?!(aa|aaaa)*$)	aaaaaaaaaaaaaaaaaaaa	y	$1	a	# [perl #34195]
+^(a*?)(?!(aa|aaaa)*$)(?=a\z)	aaaaaaaa	y	$1	aaaaaaa
--- regexec.c.old	Tue Mar 22 11:24:32 2005
+++ regexec.c	Thu Mar 24 19:32:11 2005
@@ -98,7 +98,6 @@
 #define RF_warned	2		/* warned about big count? */
 #define RF_evaled	4		/* Did an EVAL with setting? */
 #define RF_utf8		8		/* String contains multibyte chars? */
-#define RF_false	16		/* odd number of nested negatives */
 
 #define UTF ((PL_reg_flags & RF_utf8) != 0)
 
@@ -2265,6 +2264,42 @@
 #define sayNO_SILENT goto do_no
 #define saySAME(x) if (x) goto yes; else goto no
 
+#define POSCACHE_SUCCESS 0	/* caching success rather than failure */
+#define POSCACHE_SEEN 1		/* we know what we're caching */
+#define POSCACHE_START 2	/* the real cache: this bit maps to pos 0 */
+#define CACHEsayYES STMT_START { \
+    if (cache_offset | cache_bit) { \
+	if (!(PL_reg_poscache[0] & (1<<POSCACHE_SEEN))) \
+	    PL_reg_poscache[0] |= (1<<POSCACHE_SUCCESS) || (1<<POSCACHE_SEEN); \
+        else if (!(PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS))) { \
+	    /* cache records failure, but this is success */ \
+	    DEBUG_r( \
+		PerlIO_printf(Perl_debug_log, \
+		    "%*s  (remove success from failure cache)\n", \
+		    REPORT_CODE_OFF+PL_regindent*2, "") \
+	    ); \
+	    PL_reg_poscache[cache_offset] &= ~(1<<cache_bit); \
+	} \
+    } \
+    sayYES; \
+} STMT_END
+#define CACHEsayNO STMT_START { \
+    if (cache_offset | cache_bit) { \
+	if (!(PL_reg_poscache[0] & (1<<POSCACHE_SEEN))) \
+	    PL_reg_poscache[0] |= (1<<POSCACHE_SEEN); \
+        else if ((PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS))) { \
+	    /* cache records success, but this is failure */ \
+	    DEBUG_r( \
+		PerlIO_printf(Perl_debug_log, \
+		    "%*s  (remove failure from success cache)\n", \
+		    REPORT_CODE_OFF+PL_regindent*2, "") \
+	    ); \
+	    PL_reg_poscache[cache_offset] &= ~(1<<cache_bit); \
+	} \
+    } \
+    sayNO; \
+} STMT_END
+
 /* this is used to determine how far from the left messages like
    'failed...' are printed. Currently 29 makes these messages line
    up with the opcode they refer to. Earlier perls used 25 which
@@ -3450,6 +3485,7 @@
 		CHECKPOINT cp, lastcp;
 		CURCUR* cc = PL_regcc;
 		char *lastloc = cc->lastloc; /* Detection of 0-len. */
+		I32 cache_offset = 0, cache_bit = 0;
 		
 		n = cc->cur + 1;	/* how many we know we matched */
 		PL_reginput = locinput;
@@ -3502,7 +3538,7 @@
 		    PL_reg_leftiter = PL_reg_maxiter;
 		}
 		if (PL_reg_leftiter-- == 0) {
-		    I32 size = (PL_reg_maxiter + 7)/8;
+		    I32 size = (PL_reg_maxiter + 7 + POSCACHE_START)/8;
 		    if (PL_reg_poscache) {
 			if ((I32)PL_reg_poscache_size < size) {
 			    Renew(PL_reg_poscache, size, char);
@@ -3521,23 +3557,26 @@
 			);
 		}
 		if (PL_reg_leftiter < 0) {
-		    I32 o = locinput - PL_bostr, b;
+		    cache_offset = locinput - PL_bostr;
 
-		    o = (scan->flags & 0xf) - 1 + o * (scan->flags>>4);
-		    b = o % 8;
-		    o /= 8;
-		    if (PL_reg_poscache[o] & (1<<b)) {
+		    cache_offset = (scan->flags & 0xf) - 1 + POSCACHE_START
+			    + cache_offset * (scan->flags>>4);
+		    cache_bit = cache_offset % 8;
+		    cache_offset /= 8;
+		    if (PL_reg_poscache[cache_offset] & (1<<cache_bit)) {
 		    DEBUG_EXECUTE_r(
 			PerlIO_printf(Perl_debug_log,
 				      "%*s  already tried at this position...\n",
 				      REPORT_CODE_OFF+PL_regindent*2, "")
 			);
-			if (PL_reg_flags & RF_false)
+			if (PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS))
+			    /* cache records success */
 			    sayYES;
 			else
+			    /* cache records failure */
 			    sayNO_SILENT;
 		    }
-		    PL_reg_poscache[o] |= (1<<b);
+		    PL_reg_poscache[cache_offset] |= (1<<cache_bit);
 		}
 		}
 
@@ -3551,7 +3590,7 @@
 		    REGCP_SET(lastcp);
 		    if (regmatch(cc->next)) {
 			regcpblow(cp);
-			sayYES;	/* All done. */
+			CACHEsayYES;	/* All done. */
 		    }
 		    REGCP_UNWIND(lastcp);
 		    regcppop();
@@ -3567,7 +3606,7 @@
 				 "Complex regular subexpression recursion",
 				 REG_INFTY - 1);
 			}
-			sayNO;
+			CACHEsayNO;
 		    }
 
 		    DEBUG_EXECUTE_r(
@@ -3583,13 +3622,13 @@
 		    REGCP_SET(lastcp);
 		    if (regmatch(cc->scan)) {
 			regcpblow(cp);
-			sayYES;
+			CACHEsayYES;
 		    }
 		    REGCP_UNWIND(lastcp);
 		    regcppop();
 		    cc->cur = n - 1;
 		    cc->lastloc = lastloc;
-		    sayNO;
+		    CACHEsayNO;
 		}
 
 		/* Prefer scan over next for maximal matching. */
@@ -3601,7 +3640,7 @@
 		    REGCP_SET(lastcp);
 		    if (regmatch(cc->scan)) {
 			regcpblow(cp);
-			sayYES;
+			CACHEsayYES;
 		    }
 		    REGCP_UNWIND(lastcp);
 		    regcppop();		/* Restore some previous $<digit>s? */
@@ -3625,13 +3664,13 @@
 		if (PL_regcc)
 		    ln = PL_regcc->cur;
 		if (regmatch(cc->next))
-		    sayYES;
+		    CACHEsayYES;
 		if (PL_regcc)
 		    PL_regcc->cur = ln;
 		PL_regcc = cc;
 		cc->cur = n - 1;
 		cc->lastloc = lastloc;
-		sayNO;
+		CACHEsayNO;
 	    }
 	    /* NOT REACHED */
 	case BRANCHJ:
@@ -4168,7 +4207,6 @@
 	    }
 	    else
 		PL_reginput = locinput;
-	    PL_reg_flags ^= RF_false;
 	    goto do_ifmatch;
 	case IFMATCH:
 	    n = 1;
@@ -4184,8 +4222,6 @@
 	  do_ifmatch:
 	    inner = NEXTOPER(NEXTOPER(scan));
 	    if (regmatch(inner) != n) {
-		if (n == 0)
-		    PL_reg_flags ^= RF_false;
 	      say_no:
 		if (logical) {
 		    logical = 0;
@@ -4195,8 +4231,6 @@
 		else
 		    sayNO;
 	    }
-	    if (n == 0)
-		PL_reg_flags ^= RF_false;
 	  say_yes:
 	    if (logical) {
 		logical = 0;

@p5pRT
Copy link
Author

p5pRT commented Mar 27, 2005

From @rgs

hv@​crypt.org wrote​:

The patch below should fix it, but I haven't worked out if there's an
easy way to introduce that test case above​: in particular I haven't
managed to find a way to reproduce this short variant without getting
the strings piped in.

Anyway​: Thanks, applied as #24086.

@p5pRT p5pRT closed this as completed Mar 27, 2005
@p5pRT
Copy link
Author

p5pRT commented Mar 27, 2005

@rgs - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant