Regex: Alternations within negative lookahead assertions #7807
From mike@mikero.com
Created by mike@mikero.com

## this seems to be related to Ticket #23030
## /^(aa|aaaa)*$/ is equivalent to /^(aa)*$/
## this works:
if ( ("a" x 19) !~ /^(aa)*$/ ) {
## and so does this:
if ( ("a" x 19) !~ /^(aa|aaaa)*$/ ) {
## thus ("a" x 20) should match /^(a*?)(?!(aa|aaaa)*$)/
if ( ("a" x 20) =~ /^(a*?)(?!(aa|aaaa)*$)/ ) {
## it works without the alternation
if ( ("a" x 20) =~ /^(a*?)(?!(aa)*$)/ ) {
## changing the * to + causes it to match with
if ( ("a" x 20) =~ /^(a*?)(?!(aa|aaaa)+$)/ ) {
## also, changing the order of (aaaa|aa)* also doesn't work.

Perl Info
From @demerphq
On 20 Feb 2005 22:22:46 -0000, via RT Mike Rosulek

Yeah, I agree. Blead perl shows this problem too. It looks like it has "Detected a super-linear match, switching on caching..." reported just a bit before the incorrect fail.

yves

The RT System itself - Status changed from 'new' to 'open'

From @hvds
Mike Rosulek (via RT) <perlbug-followup@perl.org> wrote:

This first fails with ("a" x 8), and it appears that this is because

The test case there was:

I'm not sure that I fully understand the failure mode in this case, but I

Another option would be to disable the cache whenever we're inside one or

Hugo

From @hvds
hv@crypt.org wrote:

There is a third way: let the failure/success sense of the cache be

The patch below against latest bleadperl is a quick and dirty first

This patch reverses patch #20538 as a side benefit; it could do with more

Hugo

Inline Patch

From @demerphq
On Mon, 21 Feb 2005 17:59:49 +0000, hv@crypt.org <hv@crypt.org> wrote:

FWIW: I applied this to blead, and it passed all tests, I also built

Cheers,

From @demerphq
On Mon, 21 Feb 2005 17:59:49 +0000, hv@crypt.org <hv@crypt.org> wrote:

<Snip inlined patch>

I was just wondering if this patch got overlooked? With all the trie

Cheers,

From @rgs
hv@crypt.org wrote:

Thanks, applied as #24053 to bleadperl.

@rgs - Status changed from 'open' to 'resolved'

From @rgs
Rafael Garcia-Suarez wrote:

Well, in fact I'm going to revert it, since it makes installman segfault when

@rgs - Status changed from 'resolved' to 'open'

From @demerphq
On Tue, 22 Mar 2005 12:01:30 +0100, Rafael Garcia-Suarez

I can reproduce the problem here as well. I get a seg fault that

# func(n) is a reference to a manual page. Make it \fIfunc\fR\|(n).

It goes crazy around the following point in perltoc.pod:

=head2 CPAN - query, download and build perl modules from CPAN sites
=over 4
=item SYNOPSIS
=item STATUS
=item DESCRIPTION
=over 4
=item Interactive Mode
Searching for authors, bundles, distribution files and modules, make, test,
=item CPAN::Shell
=item autobundle
=item recompile
=item The four C<CPAN::*> Classes: Author, Bundle, Module, Distribution
=item Programmer's interface
expand($type,@things), expandany(@things), Programming Examples
=item Methods in the other Classes
CPAN::Author::as_glimpse(), CPAN::Author::as_string(),

perltoc.tmp has the following lines at around this point:

.PD
.Sh "\s-1CPAN\s0 \- query, download and build perl modules from
.IX Subsection "CPAN - query, download and build perl modules from CPAN sites"
.IP "\s-1SYNOPSIS\s0" 4
.IX Item "SYNOPSIS"
.PD 0
.IP <segfault happened here>

The output from debug for the regex starts here....

Guessed: match at offset 0

Proceeds for a very long time and eventually craps out around here:

1225 <strib> <ution::> | 42:

Regards,

From @hvds
demerphq <demerphq@gmail.com> wrote:

Haven't got too far with this yet, but I've managed to cut the code

.. which might help a bit.

Hugo

From @hvds
Earlier I wrote:

I managed to cut it further to:

I wasn't really able to prove it, but I believe the problem was exactly

.. and further confusion was caused by the fact that the CACHEsay* macros

The patch below should fix it, but I haven't worked out if there's an

Somewhat worrying is that I've been seeing occasional things like:

Hugo

Inline Patch
--- t/op/re_tests.old Thu Mar 24 19:36:17 2005
+++ t/op/re_tests Thu Mar 24 19:36:26 2005
@@ -956,3 +956,5 @@
(a|aa|aaa|aaaa|aaaaa|aaaaaa)(b|c) aaaaaaaaaaaaaaab y $1$2 aaaaaab
(a|aa|aaa|aaaa|aaaaa|aaaaaa)(??{$1&&""})(b|c) aaaaaaaaaaaaaaab y $1$2 aaaaaab
(a|aa|aaa|aaaa|aaaaa|aaaaaa)(??{$1&&"foo"})(b|c) aaaaaaaaaaaaaaab n - -
+^(a*?)(?!(aa|aaaa)*$) aaaaaaaaaaaaaaaaaaaa y $1 a # [perl #34195]
+^(a*?)(?!(aa|aaaa)*$)(?=a\z) aaaaaaaa y $1 aaaaaaa
--- regexec.c.old Tue Mar 22 11:24:32 2005
+++ regexec.c Thu Mar 24 19:32:11 2005
@@ -98,7 +98,6 @@
#define RF_warned 2 /* warned about big count? */
#define RF_evaled 4 /* Did an EVAL with setting? */
#define RF_utf8 8 /* String contains multibyte chars? */
-#define RF_false 16 /* odd number of nested negatives */
#define UTF ((PL_reg_flags & RF_utf8) != 0)
@@ -2265,6 +2264,42 @@
#define sayNO_SILENT goto do_no
#define saySAME(x) if (x) goto yes; else goto no
+#define POSCACHE_SUCCESS 0 /* caching success rather than failure */
+#define POSCACHE_SEEN 1 /* we know what we're caching */
+#define POSCACHE_START 2 /* the real cache: this bit maps to pos 0 */
+#define CACHEsayYES STMT_START { \
+ if (cache_offset | cache_bit) { \
+ if (!(PL_reg_poscache[0] & (1<<POSCACHE_SEEN))) \
+ PL_reg_poscache[0] |= (1<<POSCACHE_SUCCESS) || (1<<POSCACHE_SEEN); \
+ else if (!(PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS))) { \
+ /* cache records failure, but this is success */ \
+ DEBUG_r( \
+ PerlIO_printf(Perl_debug_log, \
+ "%*s (remove success from failure cache)\n", \
+ REPORT_CODE_OFF+PL_regindent*2, "") \
+ ); \
+ PL_reg_poscache[cache_offset] &= ~(1<<cache_bit); \
+ } \
+ } \
+ sayYES; \
+} STMT_END
+#define CACHEsayNO STMT_START { \
+ if (cache_offset | cache_bit) { \
+ if (!(PL_reg_poscache[0] & (1<<POSCACHE_SEEN))) \
+ PL_reg_poscache[0] |= (1<<POSCACHE_SEEN); \
+ else if ((PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS))) { \
+ /* cache records success, but this is failure */ \
+ DEBUG_r( \
+ PerlIO_printf(Perl_debug_log, \
+ "%*s (remove failure from success cache)\n", \
+ REPORT_CODE_OFF+PL_regindent*2, "") \
+ ); \
+ PL_reg_poscache[cache_offset] &= ~(1<<cache_bit); \
+ } \
+ } \
+ sayNO; \
+} STMT_END
+
/* this is used to determine how far from the left messages like
'failed...' are printed. Currently 29 makes these messages line
up with the opcode they refer to. Earlier perls used 25 which
@@ -3450,6 +3485,7 @@
CHECKPOINT cp, lastcp;
CURCUR* cc = PL_regcc;
char *lastloc = cc->lastloc; /* Detection of 0-len. */
+ I32 cache_offset = 0, cache_bit = 0;
n = cc->cur + 1; /* how many we know we matched */
PL_reginput = locinput;
@@ -3502,7 +3538,7 @@
PL_reg_leftiter = PL_reg_maxiter;
}
if (PL_reg_leftiter-- == 0) {
- I32 size = (PL_reg_maxiter + 7)/8;
+ I32 size = (PL_reg_maxiter + 7 + POSCACHE_START)/8;
if (PL_reg_poscache) {
if ((I32)PL_reg_poscache_size < size) {
Renew(PL_reg_poscache, size, char);
@@ -3521,23 +3557,26 @@
);
}
if (PL_reg_leftiter < 0) {
- I32 o = locinput - PL_bostr, b;
+ cache_offset = locinput - PL_bostr;
- o = (scan->flags & 0xf) - 1 + o * (scan->flags>>4);
- b = o % 8;
- o /= 8;
- if (PL_reg_poscache[o] & (1<<b)) {
+ cache_offset = (scan->flags & 0xf) - 1 + POSCACHE_START
+ + cache_offset * (scan->flags>>4);
+ cache_bit = cache_offset % 8;
+ cache_offset /= 8;
+ if (PL_reg_poscache[cache_offset] & (1<<cache_bit)) {
DEBUG_EXECUTE_r(
PerlIO_printf(Perl_debug_log,
"%*s already tried at this position...\n",
REPORT_CODE_OFF+PL_regindent*2, "")
);
- if (PL_reg_flags & RF_false)
+ if (PL_reg_poscache[0] & (1<<POSCACHE_SUCCESS))
+ /* cache records success */
sayYES;
else
+ /* cache records failure */
sayNO_SILENT;
}
- PL_reg_poscache[o] |= (1<<b);
+ PL_reg_poscache[cache_offset] |= (1<<cache_bit);
}
}
@@ -3551,7 +3590,7 @@
REGCP_SET(lastcp);
if (regmatch(cc->next)) {
regcpblow(cp);
- sayYES; /* All done. */
+ CACHEsayYES; /* All done. */
}
REGCP_UNWIND(lastcp);
regcppop();
@@ -3567,7 +3606,7 @@
"Complex regular subexpression recursion",
REG_INFTY - 1);
}
- sayNO;
+ CACHEsayNO;
}
DEBUG_EXECUTE_r(
@@ -3583,13 +3622,13 @@
REGCP_SET(lastcp);
if (regmatch(cc->scan)) {
regcpblow(cp);
- sayYES;
+ CACHEsayYES;
}
REGCP_UNWIND(lastcp);
regcppop();
cc->cur = n - 1;
cc->lastloc = lastloc;
- sayNO;
+ CACHEsayNO;
}
/* Prefer scan over next for maximal matching. */
@@ -3601,7 +3640,7 @@
REGCP_SET(lastcp);
if (regmatch(cc->scan)) {
regcpblow(cp);
- sayYES;
+ CACHEsayYES;
}
REGCP_UNWIND(lastcp);
regcppop(); /* Restore some previous $<digit>s? */
@@ -3625,13 +3664,13 @@
if (PL_regcc)
ln = PL_regcc->cur;
if (regmatch(cc->next))
- sayYES;
+ CACHEsayYES;
if (PL_regcc)
PL_regcc->cur = ln;
PL_regcc = cc;
cc->cur = n - 1;
cc->lastloc = lastloc;
- sayNO;
+ CACHEsayNO;
}
/* NOT REACHED */
case BRANCHJ:
@@ -4168,7 +4207,6 @@
}
else
PL_reginput = locinput;
- PL_reg_flags ^= RF_false;
goto do_ifmatch;
case IFMATCH:
n = 1;
@@ -4184,8 +4222,6 @@
do_ifmatch:
inner = NEXTOPER(NEXTOPER(scan));
if (regmatch(inner) != n) {
- if (n == 0)
- PL_reg_flags ^= RF_false;
say_no:
if (logical) {
logical = 0;
@@ -4195,8 +4231,6 @@
else
sayNO;
}
- if (n == 0)
- PL_reg_flags ^= RF_false;
say_yes:
if (logical) {
logical = 0;

From @rgs
hv@crypt.org wrote:

Anyway: Thanks, applied as #24086.

@rgs - Status changed from 'open' to 'resolved'

Migrated from rt.perl.org#34195 (status was 'resolved')
Searchable as RT34195$