Skip Menu |
 
Report information
Id: 34195
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: rosulek <mike [at] mikero.com>
Cc:
AdminCc:

Operating System: Linux
PatchStatus: (no value)
Severity: low
Type: core
Perl Version: 5.8.4
Fixed In: (no value)



Subject: Regex: Alternations within negative lookahead assertions
Date: Sun, 20 Feb 2005 16:22:21 -0600
To: perlbug [...] perl.org
From: "Mike Rosulek" <mike [...] mikero.com>
Download (untitled) / with headers
text/plain 4.1k
This is a bug report for perl from mike@mikero.com, generated with the help of perlbug 1.35 running under perl v5.8.4. ----------------------------------------------------------------- [Please enter your report here] ## this seems to be related to Ticket #23030 ## /^(aa|aaaa)*$/ is equivalent to /^(aa)*$/ ## they both match strings of a's of even length ## this works: if ( ("a" x 19) !~ /^(aa)*$/ ) { print "19 a's don't match /^(aa)*\$/\n"; } ## and so does this: if ( ("a" x 19) !~ /^(aa|aaaa)*$/ ) { print "19 a's don't match /^(aa|aaaa)*\$/\n"; } ## thus ("a" x 20) should match /^(a*?)(?!(aa|aaaa)*$)/ ## with $1 = "a", but it doesn't! if ( ("a" x 20) =~ /^(a*?)(?!(aa|aaaa)*$)/ ) { print "first matched: ($1|$')\n"; ## doesn't work! } ## it works without the alternation if ( ("a" x 20) =~ /^(a*?)(?!(aa)*$)/ ) { print "second matched: ($1|$')\n"; } ## changing the * to + causes it to match with ## $1 = ("a" x 19), which is closer, but still ## incorrect if ( ("a" x 20) =~ /^(a*?)(?!(aa|aaaa)+$)/ ) { print "third matched: ($1|$')\n"; } ## also, changing the order of (aaaa|aa)* also doesn't work. [Please do not change anything below this line] ----------------------------------------------------------------- --- Flags: category=core severity=low --- Site configuration information for perl v5.8.4: Configured by Debian Project at Thu Feb 3 01:11:27 EST 2005. Summary of my perl5 (revision 5 version 8 subversion 4) configuration: Platform: osname=linux, osvers=2.4.27-ti1211, archname=i386-linux-thread-multi uname='linux kosh 2.4.27-ti1211 #1 sun sep 19 18:17:45 est 2004 i686 gnulinux ' config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.4 -Dsitearch=/usr/local/lib/perl/5.8.4 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.4 -Dd_dosuid -des' hint=recommended, useposix=true, d_sigaction=define usethreads=define use5005threads=undef useithreads=define usemultiplicity=define useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2', cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include' ccversion='', gccversion='3.3.5 (Debian 1:3.3.5-8)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt perllibs=-ldl -lm -lpthread -lc -lcrypt libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so.5.8.4 gnulibc_version='2.3.2' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib' Locally applied patches: --- @INC for perl v5.8.4: /etc/perl /usr/local/lib/perl/5.8.4 /usr/local/share/perl/5.8.4 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl /usr/local/lib/perl/5.8.3 /usr/local/share/perl/5.8.3 /usr/local/lib/perl/5.8.2 /usr/local/share/perl/5.8.2 /usr/local/lib/perl/5.8.1 /usr/local/share/perl/5.8.1 /usr/local/lib/perl/5.8.0 /usr/local/share/perl/5.8.0 .
Subject: Re: [perl #34195] Regex: Alternations within negative lookahead assertions
Date: Mon, 21 Feb 2005 11:29:12 +0100
To: perl5-porters [...] perl.org
From: demerphq <demerphq [...] gmail.com>
Download (untitled) / with headers
text/plain 1.2k
On 20 Feb 2005 22:22:46 -0000, via RT Mike Rosulek <perlbug-followup@perl.org> wrote: Show quoted text
> # New Ticket Created by Mike Rosulek > # Please include the string: [perl #34195] > # in the subject line of all future correspondence about this issue. > # <URL: https://rt.perl.org/rt3/Ticket/Display.html?id=34195 > > > This is a bug report for perl from mike@mikero.com, > generated with the help of perlbug 1.35 running under perl v5.8.4. > > ----------------------------------------------------------------- > [Please enter your report here] > > ## this seems to be related to Ticket #23030 > > ## /^(aa|aaaa)*$/ is equivalent to /^(aa)*$/ > ## they both match strings of a's of even length > > ## this works: > > if ( ("a" x 19) !~ /^(aa)*$/ ) { > print "19 a's don't match /^(aa)*\$/\n"; > } > > ## and so does this: > > if ( ("a" x 19) !~ /^(aa|aaaa)*$/ ) { > print "19 a's don't match /^(aa|aaaa)*\$/\n"; > } > > ## thus ("a" x 20) should match /^(a*?)(?!(aa|aaaa)*$)/ > ## with $1 = "a", but it doesn't!
Yeah I agree. Blead perl shows this problem too. It looks like it has to do with caching from the debug output: "Detected a super-linear match, switching on caching..." Is reported just a bit before the incorrect fail. yves
Subject: Re: [perl #34195] Regex: Alternations within negative lookahead assertions
Date: Mon, 21 Feb 2005 11:47:10 +0000
To: perl5-porters [...] perl.org
From: hv [...] crypt.org
Download (untitled) / with headers
text/plain 1012b
Mike Rosulek (via RT) <perlbug-followup@perl.org> wrote: [...] :if ( ("a" x 20) =~ /^(a*?)(?!(aa|aaaa)*$)/ ) { : print "first matched: ($1|$')\n"; : ## doesn't work! :} This first fails with ("a" x 8), and it appears that this is because patch #20538 for [perl #23030] is insufficient. (See: http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-08/msg00511.html for the details and the followup for the patch.) The test case there was: "......abef" =~ /.*a(?!(b|cd)*e).*f/ incorrectly reporting a match. I'm not sure that I fully understand the failure mode in this case, but I think the problem is that we need to be able to distinguish between cached successes and cached failures, which I think means that the cache cannot work without implementing the rather more expensive option (a) from the above message. Another option would be to disable the cache whenever we're inside one or more negative assertions, but I suspect even that would involve a fairly large patch to implement. Hugo
Subject: Re: [perl #34195] Regex: Alternations within negative lookahead assertions
Date: Mon, 21 Feb 2005 17:59:49 +0000
To: perl5-porters [...] perl.org
From: hv [...] crypt.org
hv@crypt.org wrote: :Mike Rosulek (via RT) <perlbug-followup@perl.org> wrote: :[...] ::if ( ("a" x 20) =~ /^(a*?)(?!(aa|aaaa)*$)/ ) { :: print "first matched: ($1|$')\n"; :: ## doesn't work! ::} : :This first fails with ("a" x 8), and it appears that this is because :patch #20538 for [perl #23030] is insufficient. (See: : http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-08/msg00511.html :for the details and the followup for the patch.) : :The test case there was: : "......abef" =~ /.*a(?!(b|cd)*e).*f/ :incorrectly reporting a match. : :I'm not sure that I fully understand the failure mode in this case, but I :think the problem is that we need to be able to distinguish between cached :successes and cached failures, which I think means that the cache cannot :work without implementing the rather more expensive option (a) from the :above message. : :Another option would be to disable the cache whenever we're inside one or :more negative assertions, but I suspect even that would involve a fairly :large patch to implement. There is a third way: let the failure/success sense of the cache be determined by the first result, and remove cached results of the wrong sense. That requires only an extra two bits in the cache: one for the sense, and a second to say that the first is valid (ie we've already seen the first result). The patch below against latest bleadperl is a quick and dirty first attempt at such a fix, which passes all tests (including the new one) here; a cleaner attempt (feel free, that's all I have time for today) would hoick the calculations of o and b up a level to avoid redoing them, and thus also avoid relying on the value of locinput being the same at that point (not sure if that is guaranteed). This patch reverses patch #20538 as a side benefit; it could do with more eyeballs or some extra bedding-in time before going into the maintenance branch, but in principle it should be suitable for that. Hugo --- t/op/re_tests.old Tue Aug 26 07:35:32 2003 +++ t/op/re_tests Mon Feb 21 17:46:17 2005 @@ -943,3 +943,5 @@ .*a(?!(b|cd)*e).*f ......abef n - - # [perl #23030] x(?# x c - Sequence (?#... not terminated :x(?#: x c - Sequence (?#... not terminated +^(a*?)(?!(aa|aaaa)*$) aaaaaaaaaaaaaaaaaaaa y $1 a # [perl #34195] +^(a*?)(?!(aa|aaaa)*$)(?=a\z) aaaaaaaa y $1 aaaaaaa --- regexec.c.old Sat Jan 22 06:22:46 2005 +++ regexec.c Mon Feb 21 17:32:41 2005 @@ -98,7 +98,6 @@ #define RF_warned 2 /* warned about big count? */ #define RF_evaled 4 /* Did an EVAL with setting? */ #define RF_utf8 8 /* String contains multibyte chars? */ -#define RF_false 16 /* odd number of nested negatives */ #define UTF ((PL_reg_flags & RF_utf8) != 0) @@ -2256,6 +2255,50 @@ #define sayNO_SILENT goto do_no #define saySAME(x) if (x) goto yes; else goto no +#define POSCACHE_SUCCESS 0 /* caching success rather than failure */ +#define POSCACHE_SEEN 1 /* we know what we're caching */ +#define POSCACHE_START 2 /* the real cache: this bit maps to pos 0 */ +#define CACHEsayYES STMT_START { \ + if (PL_reg_poscache) { \ + if (!(PL_reg_poscache[0] & (1<<POSCACHE_SEEN))) \ + PL_reg_poscache[0] |= (1<<POSCACHE_SUCCESS) || (1<<POSCACHE_SEEN); \ + else if (!(PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS))) { \ + /* cache records failure, but this is success */ \ + I32 o = locinput - PL_bostr, b; \ + o = (scan->flags & 0xf) - 1 + POSCACHE_START + o * (scan->flags>>4); \ + b = o % 8; \ + o /= 8; \ + DEBUG_r( \ + PerlIO_printf(Perl_debug_log, \ + "%*s (remove success from failure cache)\n", \ + REPORT_CODE_OFF+PL_regindent*2, "") \ + ); \ + PL_reg_poscache[o] &= ~(1<<b); \ + } \ + } \ + sayYES; \ +} STMT_END +#define CACHEsayNO STMT_START { \ + if (PL_reg_poscache) { \ + if (!(PL_reg_poscache[0] & (1<<POSCACHE_SEEN))) \ + PL_reg_poscache[0] |= (1<<POSCACHE_SEEN); \ + else if ((PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS))) { \ + /* cache records success, but this is failure */ \ + I32 o = locinput - PL_bostr, b; \ + o = (scan->flags & 0xf) - 1 + POSCACHE_START + o * (scan->flags>>4); \ + b = o % 8; \ + o /= 8; \ + DEBUG_r( \ + PerlIO_printf(Perl_debug_log, \ + "%*s (remove failure from success cache)\n", \ + REPORT_CODE_OFF+PL_regindent*2, "") \ + ); \ + PL_reg_poscache[o] &= ~(1<<b); \ + } \ + } \ + sayNO; \ +} STMT_END + #define REPORT_CODE_OFF 24 /* @@ -3194,7 +3237,7 @@ PL_reg_leftiter = PL_reg_maxiter; } if (PL_reg_leftiter-- == 0) { - I32 size = (PL_reg_maxiter + 7)/8; + I32 size = (PL_reg_maxiter + 7 + POSCACHE_START)/8; if (PL_reg_poscache) { if ((I32)PL_reg_poscache_size < size) { Renew(PL_reg_poscache, size, char); @@ -3215,7 +3258,7 @@ if (PL_reg_leftiter < 0) { I32 o = locinput - PL_bostr, b; - o = (scan->flags & 0xf) - 1 + o * (scan->flags>>4); + o = (scan->flags & 0xf) - 1 + POSCACHE_START + o * (scan->flags>>4); b = o % 8; o /= 8; if (PL_reg_poscache[o] & (1<<b)) { @@ -3224,9 +3267,11 @@ "%*s already tried at this position...\n", REPORT_CODE_OFF+PL_regindent*2, "") ); - if (PL_reg_flags & RF_false) + if (PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS)) + /* cache records success */ sayYES; else + /* cache records failure */ sayNO_SILENT; } PL_reg_poscache[o] |= (1<<b); @@ -3243,7 +3288,7 @@ REGCP_SET(lastcp); if (regmatch(cc->next)) { regcpblow(cp); - sayYES; /* All done. */ + CACHEsayYES; /* All done. */ } REGCP_UNWIND(lastcp); regcppop(); @@ -3259,7 +3304,7 @@ "Complex regular subexpression recursion", REG_INFTY - 1); } - sayNO; + CACHEsayNO; } DEBUG_r( @@ -3275,13 +3320,13 @@ REGCP_SET(lastcp); if (regmatch(cc->scan)) { regcpblow(cp); - sayYES; + CACHEsayYES; } REGCP_UNWIND(lastcp); regcppop(); cc->cur = n - 1; cc->lastloc = lastloc; - sayNO; + CACHEsayNO; } /* Prefer scan over next for maximal matching. */ @@ -3293,7 +3338,7 @@ REGCP_SET(lastcp); if (regmatch(cc->scan)) { regcpblow(cp); - sayYES; + CACHEsayYES; } REGCP_UNWIND(lastcp); regcppop(); /* Restore some previous $<digit>s? */ @@ -3317,13 +3362,13 @@ if (PL_regcc) ln = PL_regcc->cur; if (regmatch(cc->next)) - sayYES; + CACHEsayYES; if (PL_regcc) PL_regcc->cur = ln; PL_regcc = cc; cc->cur = n - 1; cc->lastloc = lastloc; - sayNO; + CACHEsayNO; } /* NOT REACHED */ case BRANCHJ: @@ -3860,7 +3905,6 @@ } else PL_reginput = locinput; - PL_reg_flags ^= RF_false; goto do_ifmatch; case IFMATCH: n = 1; @@ -3876,8 +3920,6 @@ do_ifmatch: inner = NEXTOPER(NEXTOPER(scan)); if (regmatch(inner) != n) { - if (n == 0) - PL_reg_flags ^= RF_false; say_no: if (logical) { logical = 0; @@ -3887,8 +3929,6 @@ else sayNO; } - if (n == 0) - PL_reg_flags ^= RF_false; say_yes: if (logical) { logical = 0;
CC: perl5-porters [...] perl.org
Subject: Re: [perl #34195] Regex: Alternations within negative lookahead assertions
Date: Tue, 22 Feb 2005 09:23:20 +0100
To: "hv [...] crypt.org" <hv [...] crypt.org>
From: demerphq <demerphq [...] gmail.com>
Download (untitled) / with headers
text/plain 569b
On Mon, 21 Feb 2005 17:59:49 +0000, hv@crypt.org <hv@crypt.org> wrote: Show quoted text
> The patch below against latest bleadperl is a quick and dirty first > attempt at such a fix, which passes all tests (including the new one) here;
FWIW: I applied this to blead, and it passed all tests, I also built spamassissin and passed all tests with it applied, and ran mass-check with it too. Although the latter only in combination with the TRIE patch. Everything looked fine. Cheers, Yves -- First they ignore you, then they laugh at you, then they fight you, then you win. +Gandhi
CC: "hv [...] crypt.org" <hv [...] crypt.org>
Subject: Re: [perl #34195] Regex: Alternations within negative lookahead assertions
Date: Sun, 20 Mar 2005 10:14:08 +0100
To: perl5-porters [...] perl.org
From: demerphq <demerphq [...] gmail.com>
Download (untitled) / with headers
text/plain 2.2k
On Mon, 21 Feb 2005 17:59:49 +0000, hv@crypt.org <hv@crypt.org> wrote: Show quoted text
> hv@crypt.org wrote: > :Mike Rosulek (via RT) <perlbug-followup@perl.org> wrote: > :[...] > ::if ( ("a" x 20) =~ /^(a*?)(?!(aa|aaaa)*$)/ ) { > :: print "first matched: ($1|$')\n"; > :: ## doesn't work! > ::} > : > :This first fails with ("a" x 8), and it appears that this is because > :patch #20538 for [perl #23030] is insufficient. (See: > : http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-08/msg00511.html > :for the details and the followup for the patch.) > : > :The test case there was: > : "......abef" =~ /.*a(?!(b|cd)*e).*f/ > :incorrectly reporting a match. > : > :I'm not sure that I fully understand the failure mode in this case, but I > :think the problem is that we need to be able to distinguish between cached > :successes and cached failures, which I think means that the cache cannot > :work without implementing the rather more expensive option (a) from the > :above message. > : > :Another option would be to disable the cache whenever we're inside one or > :more negative assertions, but I suspect even that would involve a fairly > :large patch to implement. > > There is a third way: let the failure/success sense of the cache be > determined by the first result, and remove cached results of the wrong > sense. That requires only an extra two bits in the cache: one for the > sense, and a second to say that the first is valid (ie we've already > seen the first result). > > The patch below against latest bleadperl is a quick and dirty first > attempt at such a fix, which passes all tests (including the new one) here; > a cleaner attempt (feel free, that's all I have time for today) would hoick > the calculations of o and b up a level to avoid redoing them, and thus also > avoid relying on the value of locinput being the same at that point (not > sure if that is guaranteed). > > This patch reverses patch #20538 as a side benefit; it could do with more > eyeballs or some extra bedding-in time before going into the maintenance > branch, but in principle it should be suitable for that. > > Hugo
<Snip inlined patch> I was just wondering if this patch got overlooked? With all the trie stuff at the same time it occured to me it might have been.... Cheers, Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
Subject: Re: [perl #34195] Regex: Alternations within negative lookahead assertions
Date: Mon, 21 Mar 2005 14:43:25 +0100
To: perl5-porters [...] perl.org
From: Rafael Garcia-Suarez <rgarciasuarez [...] mandrakesoft.com>
Download (untitled) / with headers
text/plain 688b
hv@crypt.org wrote: Show quoted text
> > The patch below against latest bleadperl is a quick and dirty first > attempt at such a fix, which passes all tests (including the new one) here; > a cleaner attempt (feel free, that's all I have time for today) would hoick > the calculations of o and b up a level to avoid redoing them, and thus also > avoid relying on the value of locinput being the same at that point (not > sure if that is guaranteed). > > This patch reverses patch #20538 as a side benefit; it could do with more > eyeballs or some extra bedding-in time before going into the maintenance > branch, but in principle it should be suitable for that.
Thanks, applied as #24053 to bleadperl.
CC: perl5-porters [...] perl.org
Subject: Re: [perl #34195] Regex: Alternations within negative lookahead assertions
Date: Tue, 22 Mar 2005 12:01:30 +0100
To: hv [...] crypt.org
From: Rafael Garcia-Suarez <rgarciasuarez [...] mandrakesoft.com>
Download (untitled) / with headers
text/plain 884b
Rafael Garcia-Suarez wrote: Show quoted text
> hv@crypt.org wrote:
> > > > The patch below against latest bleadperl is a quick and dirty first > > attempt at such a fix, which passes all tests (including the new one) here; > > a cleaner attempt (feel free, that's all I have time for today) would hoick > > the calculations of o and b up a level to avoid redoing them, and thus also > > avoid relying on the value of locinput being the same at that point (not > > sure if that is guaranteed). > > > > This patch reverses patch #20538 as a side benefit; it could do with more > > eyeballs or some extra bedding-in time before going into the maintenance > > branch, but in principle it should be suitable for that.
> > Thanks, applied as #24053 to bleadperl.
Well, in fact I'm going to revert it, since it makes installman segfault when installing perltoc. (tested with a threaded build on linux.)
CC: hv [...] crypt.org, perl5-porters [...] perl.org
Subject: Re: [perl #34195] Regex: Alternations within negative lookahead assertions
Date: Tue, 22 Mar 2005 16:49:55 +0100
To: Rafael Garcia-Suarez <rgarciasuarez [...] mandrakesoft.com>
From: demerphq <demerphq [...] gmail.com>
On Tue, 22 Mar 2005 12:01:30 +0100, Rafael Garcia-Suarez <rgarciasuarez@mandrakesoft.com> wrote: Show quoted text
> Rafael Garcia-Suarez wrote:
> > hv@crypt.org wrote:
> > > > > > The patch below against latest bleadperl is a quick and dirty first > > > attempt at such a fix, which passes all tests (including the new one) here; > > > a cleaner attempt (feel free, that's all I have time for today) would hoick > > > the calculations of o and b up a level to avoid redoing them, and thus also > > > avoid relying on the value of locinput being the same at that point (not > > > sure if that is guaranteed). > > > > > > This patch reverses patch #20538 as a side benefit; it could do with more > > > eyeballs or some extra bedding-in time before going into the maintenance > > > branch, but in principle it should be suitable for that.
> > > > Thanks, applied as #24053 to bleadperl.
> > Well, in fact I'm going to revert it, since it makes installman segfault when > installing perltoc. (tested with a threaded build on linux.) >
I can reproduce the problem here as well. I get a seg fault that appears to ultimately come from a FREETMPS call in pp_nextstate. The regex that is responsible is the following one from lib/pod/Man.pm # func(n) is a reference to a manual page. Make it \fIfunc\fR\|(n). s{ ( \b | \\s-1 ) ( [A-Za-z_] (?:[.:\w]|\\-|\\s-?[01])+ ) ( \( \d [a-z]* \) ) } { $1 . '\f(IS' . $2 . '\f(IE\|' . $3 }egx; It goes crazy around the following point in perltoc.pod: =head2 CPAN - query, download and build perl modules from CPAN sites =over 4 =item SYNOPSIS =item STATUS =item DESCRIPTION =over 4 =item Interactive Mode Searching for authors, bundles, distribution files and modules, make, test, install, clean modules or distributions, get, readme, look module or distribution, ls author, Signals =item CPAN::Shell =item autobundle =item recompile =item The four C<CPAN::*> Classes: Author, Bundle, Module, Distribution =item Programmer's interface expand($type,@things), expandany(@things), Programming Examples =item Methods in the other Classes CPAN::Author::as_glimpse(), CPAN::Author::as_string(), CPAN::Author::email(), CPAN::Author::fullname(), CPAN::Author::name(), CPAN::Bundle::as_glimpse(), CPAN::Bundle::as_string(), CPAN::Bundle::clean(), CPAN::Bundle::contains(), perltoc.tmp has the following lines at around this point: .PD .Sh "\s-1CPAN\s0 \- query, download and build perl modules from \s-1CPAN\s0 sites" .IX Subsection "CPAN - query, download and build perl modules from CPAN sites" .IP "\s-1SYNOPSIS\s0" 4 .IX Item "SYNOPSIS" .PD 0 .IP <segfault happened here> The output from debug for the regex starts here.... Guessed: match at offset 0 Matching REx ` ( \b | \\s-1 ) ( [A-Za-z_] (?:[.:\w]|\\-|\\s-?[01])+ ) ( \( \d [a-z]* \) ) ...' against `\f(ISCPAN::Author::as_glimpse()\f(IE, \f(ISCPAN::Author::as_...' Setting an EVAL scope, savestack=308 0 <> <\f(ISCPAN::A> | 1: OPEN1 0 <> <\f(ISCPAN::A> | 3: BRANCH Setting an EVAL scope, savestack=318 0 <> <\f(ISCPAN::A> | 4: BOUND failed... 0 <> <\f(ISCPAN::A> | 6: EXACT <\\s-1> Guessed: match at offset 0 Matching REx ` ( \b | \\s-1 ) ( [A-Za-z_] (?:[.:\w]|\\-|\\s-?[01])+ ) ( \( \d [a-z]* \) ) ...' against `\f(ISCPAN::Author::as_glimpse()\f(IE, \f(ISCPAN::Author::as_...' Setting an EVAL scope, savestack=308 0 <> <\f(ISCPAN::A> | 1: OPEN1 0 <> <\f(ISCPAN::A> | 3: BRANCH Setting an EVAL scope, savestack=318 0 <> <\f(ISCPAN::A> | 4: BOUND failed... 0 <> <\f(ISCPAN::A> | 6: EXACT <\\s-1> Proceeds for a very long time and eventually craps out around here: 1225 <strib> <ution::> | 42: EXACT <\\s> failed... Clearing an EVAL scope, savestack=598..608 restoring \2 to -1(1210)..-1(no) restoring \2..\3 to undef failed, try continuation... 1225 <strib> <ution::> | 61: NOTHING 1225 <strib> <ution::> | 62: CLOSE2 1225 <strib> <ution::> | 64: OPEN3 1225 <strib> <ution::> | 66: EXACT <(> failed... (remove failure from success cache) failed... 1224 <istri> <bution:> | 39: EXACT <\\-> failed... 1224 <istri> <bution:> | 42: EXACT <\\s> failed... Clearing an EVAL scope, savestack=578..588 restoring \2 to -1(1210)..-1(no) restoring \2..\3 to undef failed, try continuation... 1224 <istri> <bution:> | 61: NOTHING 1224 <istri> <bution:> | 62: CLOSE2 1224 <istri> <bution:> | 64: OPEN3 1224 <istri> <bution:> | 66: EXACT <(> Regards, Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
CC: demerphq [...] gmail.com
Subject: Re: [perl #34195] Regex: Alternations within negative lookahead assertions
Date: Thu, 24 Mar 2005 04:24:43 +0000
To: perl5-porters [...] perl.org
From: hv [...] crypt.org
demerphq <demerphq@gmail.com> wrote: :On Tue, 22 Mar 2005 12:01:30 +0100, Rafael Garcia-Suarez :<rgarciasuarez@mandrakesoft.com> wrote: :> Well, in fact I'm going to revert it, since it makes installman segfault when :> installing perltoc. (tested with a threaded build on linux.) : :I can reproduce the problem here as well. I get a seg fault that :appears to ultimately come from a FREETMPS call in pp_nextstate. The :regex that is responsible is the following one from lib/pod/Man.pm : : # func(n) is a reference to a manual page. Make it \fIfunc\fR\|(n). : s{ : ( \b | \\s-1 ) : ( [A-Za-z_] (?:[.:\w]|\\-|\\s-?[01])+ ) : ( : \( \d [a-z]* \) : ) : } { $1 . '\f(IS' . $2 . '\f(IE\|' . $3 }egx; [...] Haven't got too far with this yet, but I've managed to cut the code to reproduce down to: perl -00nle '1 while s{<[^<>]*>}{}g;/^(=|\s)/||print' pod/perltoc.pod | ./perl -00nle 's/-/\\-/g; s{ \\-\\-}{}g; s{ \b ( [A-Z] (?: [A-Z+:\d_\$&] | \\- )* ) (?= [\s\]] ) }{ " s1" . $1 . "s0" }egx; s{(\b|s1)([A-Za-z_]([:\w]|s[01])+\(\))}{$1$2}g' .. which might help a bit. Hugo
CC: demerphq [...] gmail.com
Subject: Re: [perl #34195] Regex: Alternations within negative lookahead assertions
Date: Thu, 24 Mar 2005 19:51:08 +0000
To: perl5-porters [...] perl.org
From: hv [...] crypt.org
Download (untitled) / with headers
text/plain 7.5k
Earlier I wrote: :Haven't got too far with this yet, but I've managed to cut the code :to reproduce down to: : perl -00nle '1 while s{<[^<>]*>}{}g;/^(=|\s)/||print' pod/perltoc.pod | ./perl -00nle 's/-/\\-/g; s{ \\-\\-}{}g; s{ \b ( [A-Z] (?: [A-Z+:\d_\$&] | \\- )* ) (?= [\s\]] ) }{ " s1" . $1 . "s0" }egx; s{(\b|s1)([A-Za-z_]([:\w]|s[01])+\(\))}{$1$2}g' : :.. which might help a bit. I managed to cut it further to: perl -le 's/(\d+)/"b"x$1/eg, print for @ARGV' aa6Xa1c a68ca10Xca10 | ./perl -nle 's{a([ab]|xx)+c}{}g' I wasn't really able to prove it, but I believe the problem was exactly what I worried about before: :a cleaner attempt (feel free, that's all I have time for today) would hoick :the calculations of o and b up a level to avoid redoing them, and thus also :avoid relying on the value of locinput being the same at that point (not :sure if that is guaranteed). .. and further confusion was caused by the fact that the CACHEsay* macros were assuming they had work to do if PL_reg_poscache existed, when that might have been created by a recursive match after we started this iteration. The patch below should fix it, but I haven't worked out if there's an easy way to introduce that test case above: in particular I haven't managed to find a way to reproduce this short variant without getting the strings piped in. Somewhat worrying is that I've been seeing occasional things like: t/run/switcht.............................Missing right curly or square bracket at ../lib/Config_heavy.pl line 1160, at end of line syntax error at ../lib/Config_heavy.pl line 1160, at EOF Compilation failed in require at ../lib/Config.pm line 66. # Looks like you planned 11 tests but ran 0. FAILED at test 1 .. on various t/run and t/io tests, but they haven't been reproducible, and I suspect this is some current blead instability not caused by this patch. Hugo --- t/op/re_tests.old Thu Mar 24 19:36:17 2005 +++ t/op/re_tests Thu Mar 24 19:36:26 2005 @@ -956,3 +956,5 @@ (a|aa|aaa|aaaa|aaaaa|aaaaaa)(b|c) aaaaaaaaaaaaaaab y $1$2 aaaaaab (a|aa|aaa|aaaa|aaaaa|aaaaaa)(??{$1&&""})(b|c) aaaaaaaaaaaaaaab y $1$2 aaaaaab (a|aa|aaa|aaaa|aaaaa|aaaaaa)(??{$1&&"foo"})(b|c) aaaaaaaaaaaaaaab n - - +^(a*?)(?!(aa|aaaa)*$) aaaaaaaaaaaaaaaaaaaa y $1 a # [perl #34195] +^(a*?)(?!(aa|aaaa)*$)(?=a\z) aaaaaaaa y $1 aaaaaaa --- regexec.c.old Tue Mar 22 11:24:32 2005 +++ regexec.c Thu Mar 24 19:32:11 2005 @@ -98,7 +98,6 @@ #define RF_warned 2 /* warned about big count? */ #define RF_evaled 4 /* Did an EVAL with setting? */ #define RF_utf8 8 /* String contains multibyte chars? */ -#define RF_false 16 /* odd number of nested negatives */ #define UTF ((PL_reg_flags & RF_utf8) != 0) @@ -2265,6 +2264,42 @@ #define sayNO_SILENT goto do_no #define saySAME(x) if (x) goto yes; else goto no +#define POSCACHE_SUCCESS 0 /* caching success rather than failure */ +#define POSCACHE_SEEN 1 /* we know what we're caching */ +#define POSCACHE_START 2 /* the real cache: this bit maps to pos 0 */ +#define CACHEsayYES STMT_START { \ + if (cache_offset | cache_bit) { \ + if (!(PL_reg_poscache[0] & (1<<POSCACHE_SEEN))) \ + PL_reg_poscache[0] |= (1<<POSCACHE_SUCCESS) || (1<<POSCACHE_SEEN); \ + else if (!(PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS))) { \ + /* cache records failure, but this is success */ \ + DEBUG_r( \ + PerlIO_printf(Perl_debug_log, \ + "%*s (remove success from failure cache)\n", \ + REPORT_CODE_OFF+PL_regindent*2, "") \ + ); \ + PL_reg_poscache[cache_offset] &= ~(1<<cache_bit); \ + } \ + } \ + sayYES; \ +} STMT_END +#define CACHEsayNO STMT_START { \ + if (cache_offset | cache_bit) { \ + if (!(PL_reg_poscache[0] & (1<<POSCACHE_SEEN))) \ + PL_reg_poscache[0] |= (1<<POSCACHE_SEEN); \ + else if ((PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS))) { \ + /* cache records success, but this is failure */ \ + DEBUG_r( \ + PerlIO_printf(Perl_debug_log, \ + "%*s (remove failure from success cache)\n", \ + REPORT_CODE_OFF+PL_regindent*2, "") \ + ); \ + PL_reg_poscache[cache_offset] &= ~(1<<cache_bit); \ + } \ + } \ + sayNO; \ +} STMT_END + /* this is used to determine how far from the left messages like 'failed...' are printed. Currently 29 makes these messages line up with the opcode they refer to. Earlier perls used 25 which @@ -3450,6 +3485,7 @@ CHECKPOINT cp, lastcp; CURCUR* cc = PL_regcc; char *lastloc = cc->lastloc; /* Detection of 0-len. */ + I32 cache_offset = 0, cache_bit = 0; n = cc->cur + 1; /* how many we know we matched */ PL_reginput = locinput; @@ -3502,7 +3538,7 @@ PL_reg_leftiter = PL_reg_maxiter; } if (PL_reg_leftiter-- == 0) { - I32 size = (PL_reg_maxiter + 7)/8; + I32 size = (PL_reg_maxiter + 7 + POSCACHE_START)/8; if (PL_reg_poscache) { if ((I32)PL_reg_poscache_size < size) { Renew(PL_reg_poscache, size, char); @@ -3521,23 +3557,26 @@ ); } if (PL_reg_leftiter < 0) { - I32 o = locinput - PL_bostr, b; + cache_offset = locinput - PL_bostr; - o = (scan->flags & 0xf) - 1 + o * (scan->flags>>4); - b = o % 8; - o /= 8; - if (PL_reg_poscache[o] & (1<<b)) { + cache_offset = (scan->flags & 0xf) - 1 + POSCACHE_START + + cache_offset * (scan->flags>>4); + cache_bit = cache_offset % 8; + cache_offset /= 8; + if (PL_reg_poscache[cache_offset] & (1<<cache_bit)) { DEBUG_EXECUTE_r( PerlIO_printf(Perl_debug_log, "%*s already tried at this position...\n", REPORT_CODE_OFF+PL_regindent*2, "") ); - if (PL_reg_flags & RF_false) + if (PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS)) + /* cache records success */ sayYES; else + /* cache records failure */ sayNO_SILENT; } - PL_reg_poscache[o] |= (1<<b); + PL_reg_poscache[cache_offset] |= (1<<cache_bit); } } @@ -3551,7 +3590,7 @@ REGCP_SET(lastcp); if (regmatch(cc->next)) { regcpblow(cp); - sayYES; /* All done. */ + CACHEsayYES; /* All done. */ } REGCP_UNWIND(lastcp); regcppop(); @@ -3567,7 +3606,7 @@ "Complex regular subexpression recursion", REG_INFTY - 1); } - sayNO; + CACHEsayNO; } DEBUG_EXECUTE_r( @@ -3583,13 +3622,13 @@ REGCP_SET(lastcp); if (regmatch(cc->scan)) { regcpblow(cp); - sayYES; + CACHEsayYES; } REGCP_UNWIND(lastcp); regcppop(); cc->cur = n - 1; cc->lastloc = lastloc; - sayNO; + CACHEsayNO; } /* Prefer scan over next for maximal matching. */ @@ -3601,7 +3640,7 @@ REGCP_SET(lastcp); if (regmatch(cc->scan)) { regcpblow(cp); - sayYES; + CACHEsayYES; } REGCP_UNWIND(lastcp); regcppop(); /* Restore some previous $<digit>s? */ @@ -3625,13 +3664,13 @@ if (PL_regcc) ln = PL_regcc->cur; if (regmatch(cc->next)) - sayYES; + CACHEsayYES; if (PL_regcc) PL_regcc->cur = ln; PL_regcc = cc; cc->cur = n - 1; cc->lastloc = lastloc; - sayNO; + CACHEsayNO; } /* NOT REACHED */ case BRANCHJ: @@ -4168,7 +4207,6 @@ } else PL_reginput = locinput; - PL_reg_flags ^= RF_false; goto do_ifmatch; case IFMATCH: n = 1; @@ -4184,8 +4222,6 @@ do_ifmatch: inner = NEXTOPER(NEXTOPER(scan)); if (regmatch(inner) != n) { - if (n == 0) - PL_reg_flags ^= RF_false; say_no: if (logical) { logical = 0; @@ -4195,8 +4231,6 @@ else sayNO; } - if (n == 0) - PL_reg_flags ^= RF_false; say_yes: if (logical) { logical = 0;
Subject: Re: [perl #34195] Regex: Alternations within negative lookahead assertions
Date: Sun, 27 Mar 2005 16:27:19 +0200
To: perl5-porters [...] perl.org
From: Rafael Garcia-Suarez <rgarciasuarez [...] mandrakesoft.com>
Download (untitled) / with headers
text/plain 295b
hv@crypt.org wrote: Show quoted text
> The patch below should fix it, but I haven't worked out if there's an > easy way to introduce that test case above: in particular I haven't > managed to find a way to reproduce this short variant without getting > the strings piped in.
Anyway: Thanks, applied as #24086.


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org