Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexec: super-linear cache can prevent a valid match #10823

Closed
p5pRT opened this issue Nov 13, 2010 · 8 comments
Closed

regexec: super-linear cache can prevent a valid match #10823

p5pRT opened this issue Nov 13, 2010 · 8 comments

Comments

@p5pRT
Copy link

p5pRT commented Nov 13, 2010

Migrated from rt.perl.org#79152 (status was 'resolved')

Searchable as RT79152$

@p5pRT
Copy link
Author

p5pRT commented Nov 13, 2010

From nick@cleaton.net

Created by nick@cleaton.net

This is a bug report for perl from nick@​cleaton.net,
generated with the help of perlbug 1.36 running under perl 5.10.0.

-----------------------------------------------------------------
The super-linear cache in regexec.c can prevent a valid match
from being detected. For example​:

print "yay\n" if 'xayxay' =~ /(q1|.)*(q2|.)*(x(a|bc)*y){2,}/;

This should match, but it doesn't because the cache fails to
distinguish between matching the final xay to x(a|bc)*y as the
first instance of the {2,} and matching it in the same position
as the second instance.

This bug is also present in blead.

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.10.0:

Configured by Debian Project at Fri Jun 26 19:28:31 UTC 2009.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.24-23-server, archname=i486-linux-gnu-thread-multi
    uname='linux rothera 2.6.24-23-server #1 smp wed apr 1 22:22:14 utc 2009 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.0 -Dsitearch=/usr/local/lib/perl/5.10.0 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.0 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.3.2', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /usr/lib64
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.8.90.so, so=so, useshrplib=true, libperl=libperl.so.5.10.0
    gnulibc_version='2.8.90'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib'

Locally applied patches:
    


@INC for perl 5.10.0:
    /etc/perl
    /usr/local/lib/perl/5.10.0
    /usr/local/share/perl/5.10.0
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.10
    /usr/share/perl/5.10
    /usr/local/lib/site_perl
    .


Environment for perl 5.10.0:
    HOME=/home/nick
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/home/nick/streamline-1.8.1/lib
    LOGDIR (unset)
    PATH=/home/nick/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/nick/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Nov 22, 2010

From nick@cleaton.net

Patch.

@p5pRT
Copy link
Author

p5pRT commented Nov 22, 2010

From nick@cleaton.net

Inline Patch
diff --git a/regcomp.c b/regcomp.c
index 7c7f526..779d0fc 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -3217,13 +3217,16 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
 		    f |= SCF_DO_STCLASS_AND;
 		    f &= ~SCF_DO_STCLASS_OR;
 		}
-		/* These are the cases when once a subexpression
-		   fails at a particular position, it cannot succeed
-		   even after backtracking at the enclosing scope.
-
-		   XXXX what if minimal match and we are at the
-		        initial run of {n,m}? */
-		if ((mincount != maxcount - 1) && (maxcount != REG_INFTY))
+	        /* Exclude from super-linear cache processing any {n,m}
+		   regops for which the combination of input pos and regex
+		   pos is not enough information to determine if a match
+		   will be possible.
+
+		   For example, in the regex /foo(bar\s*){4,8}baz/ with the
+		   regex pos at the \s*, the prospects for a match depend not
+		   only on the input position but also on how many (bar\s*)
+		   repeats into the {4,8} we are. */
+               if ((mincount > 1) || (maxcount > 1 && maxcount != REG_INFTY))
 		    f &= ~SCF_WHILEM_VISITED_POS;
 
 		/* This will finish on WHILEM, setting scan, or on NULL: */
diff --git a/t/re/re_tests b/t/re/re_tests
index 66a47cc..02da1e1 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1482,5 +1482,10 @@ abc\N{def	-	c	-	\\N{NAME} must be resolved by the lexer
 [\0005]	5\000	y	$&	5
 [\_]	_	y	$&	_
 
+# RT #79152
+(q1|.)*(q2|.)*(x(a|bc)*y){2,}	xayxay	y	$&	xayxay
+(q1|.)*(q2|.)*(x(a|bc)*y){2,3}	xayxay	y	$&	xayxay
+(q1|z)*(q2|z)*z{15}-.*?(x(a|bc)*y){2,3}Z	zzzzzzzzzzzzzzzz-xayxayxayxayZ	y	$&	zzzzzzzzzzzzzzzz-xayxayxayxayZ
+
 (?:(?:)foo|bar|zot|rt78356)	foo	y	$&	foo
 # vim: softtabstop=0 noexpandtab

@p5pRT
Copy link
Author

p5pRT commented Nov 22, 2010

From nick@cleaton.net

This seems to do the trick, but I don't know regcomp.c well enough to be
sure that the change is correct and that the comment is not misleading.

Patch against blead attached, review please.

Nick

@p5pRT
Copy link
Author

p5pRT commented Nov 22, 2010

From nick@cleaton.net

Inline Patch
diff --git a/regcomp.c b/regcomp.c
index 7c7f526..779d0fc 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -3217,13 +3217,16 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
 		    f |= SCF_DO_STCLASS_AND;
 		    f &= ~SCF_DO_STCLASS_OR;
 		}
-		/* These are the cases when once a subexpression
-		   fails at a particular position, it cannot succeed
-		   even after backtracking at the enclosing scope.
-
-		   XXXX what if minimal match and we are at the
-		        initial run of {n,m}? */
-		if ((mincount != maxcount - 1) && (maxcount != REG_INFTY))
+	        /* Exclude from super-linear cache processing any {n,m}
+		   regops for which the combination of input pos and regex
+		   pos is not enough information to determine if a match
+		   will be possible.
+
+		   For example, in the regex /foo(bar\s*){4,8}baz/ with the
+		   regex pos at the \s*, the prospects for a match depend not
+		   only on the input position but also on how many (bar\s*)
+		   repeats into the {4,8} we are. */
+               if ((mincount > 1) || (maxcount > 1 && maxcount != REG_INFTY))
 		    f &= ~SCF_WHILEM_VISITED_POS;
 
 		/* This will finish on WHILEM, setting scan, or on NULL: */
diff --git a/t/re/re_tests b/t/re/re_tests
index 66a47cc..02da1e1 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1482,5 +1482,10 @@ abc\N{def	-	c	-	\\N{NAME} must be resolved by the lexer
 [\0005]	5\000	y	$&	5
 [\_]	_	y	$&	_
 
+# RT #79152
+(q1|.)*(q2|.)*(x(a|bc)*y){2,}	xayxay	y	$&	xayxay
+(q1|.)*(q2|.)*(x(a|bc)*y){2,3}	xayxay	y	$&	xayxay
+(q1|z)*(q2|z)*z{15}-.*?(x(a|bc)*y){2,3}Z	zzzzzzzzzzzzzzzz-xayxayxayxayZ	y	$&	zzzzzzzzzzzzzzzz-xayxayxayxayZ
+
 (?:(?:)foo|bar|zot|rt78356)	foo	y	$&	foo
 # vim: softtabstop=0 noexpandtab

@p5pRT
Copy link
Author

p5pRT commented Nov 30, 2010

From @cpansprout

On Mon Nov 22 00​:16​:34 2010, ncleaton wrote​:

Patch against blead attached, review please.

Thank you. Applied as 779bcb7.

@p5pRT
Copy link
Author

p5pRT commented Nov 30, 2010

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Nov 30, 2010

@cpansprout - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant