
regex /.*\z/ doesn't match strings ending with \n #8905

Closed
p5pRT opened this issue May 21, 2007 · 8 comments

Comments

@p5pRT

p5pRT commented May 21, 2007

Migrated from rt.perl.org#43015 (status was 'resolved')

Searchable as RT43015$

@p5pRT
Author

p5pRT commented May 21, 2007

From @moritz

Created by moritz@faui2k3.org

This is a bug report for perl from moritz@faui2k3.org,
generated with the help of perlbug 1.35 running under perl v5.8.8.

-----------------------------------------------------------------
On perl 5.8.8, the regex
/.*\z/
doesn't match strings ending with \n. Since .* can match an empty string as
well, _any_ string should match that regex.

This topic was discussed on perlmonks: http://www.perlmonks.org/?node_id=616538

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl v5.8.8:

Configured by Debian Project at Wed Dec  6 23:17:41 UTC 2006.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.18.3, archname=i486-linux-gnu-thread-multi
    uname='linux saens 2.6.18.3 #1 smp sat nov 25 13:39:52 est 2006 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.8 -Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.1.2 20061115 (prerelease) (Debian 4.1.1-20)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.6.so, so=so, useshrplib=true, libperl=libperl.so.5.8.8
    gnulibc_version='2.3.6'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.8:
    /etc/perl
    /usr/local/lib/perl/5.8.8
    /usr/local/share/perl/5.8.8
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    /usr/local/lib/perl/5.8.4
    /usr/local/share/perl/5.8.4
    .


Environment for perl v5.8.8:
    HOME=/home/moritz
    LANG=EN_US.UTF-8
    LANGUAGE=C
    LC_ALL=C
    LC_CTYPE=de_DE
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/bin:/sbin:/usr/bin:/usr/sbin:/home/moritz/bin:/usr/games:/usr/local/Eiffel54/studio/spec/linux-glibc2.1/bin:/usr/bin/X11:/usr/local/bin:
    PERL6LIB=/home/moritz/pugs/blib6/lib
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Author

p5pRT commented May 21, 2007

From schubiger@gmail.com

On Mon, May 21, 2007 at 06:50:15AM -0700, moritz@faui2k3.org (via RT) wrote:

> On perl 5.8.8, the regex
> /.*\z/
> doesn't match strings ending with \n. Since .* can match an empty string as
> well, _any_ string should match that regex.
>
> This topic was discussed on perlmonks: http://www.perlmonks.org/?node_id=616538

I can reproduce it with bleadperl #31251 (but I'm not sure
whether to consider it a bug or not). If it is, could it be
that it is somehow related to regcomp.c (lines 6580-6589)?

    case '.':
        nextchar(pRExC_state);
        if (RExC_flags & RXf_PMf_SINGLELINE)
            ret = reg_node(pRExC_state, SANY);
        else
            ret = reg_node(pRExC_state, REG_ANY);
        *flagp |= HASWIDTH|SIMPLE;
        RExC_naughty++;
        Set_Node_Length(ret, 1); /* MJD */
        break;

@p5pRT
Author

p5pRT commented May 21, 2007

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented May 21, 2007

From @demerphq

On 5/21/07, Steven Schubiger <schubiger@gmail.com> wrote:

> On Mon, May 21, 2007 at 06:50:15AM -0700, moritz@faui2k3.org (via RT) wrote:
>
>> On perl 5.8.8, the regex
>> /.*\z/
>> doesn't match strings ending with \n. Since .* can match an empty string as
>> well, _any_ string should match that regex.
>>
>> This topic was discussed on perlmonks: http://www.perlmonks.org/?node_id=616538
>
> I can reproduce it with bleadperl #31251 (but I'm not sure
> whether to consider it a bug or not). If it is, could it be
> that it is somehow related to regcomp.c (lines 6580-6589)?
>
>     case '.':
>         nextchar(pRExC_state);
>         if (RExC_flags & RXf_PMf_SINGLELINE)
>             ret = reg_node(pRExC_state, SANY);
>         else
>             ret = reg_node(pRExC_state, REG_ANY);
>         *flagp |= HASWIDTH|SIMPLE;
>         RExC_naughty++;
>         Set_Node_Length(ret, 1); /* MJD */
>         break;

nah, this is just the code for parsing the . (dot) construct. the
problem probably lies in the optimiser, which means study_chunk() in
regcomp.c and various places in regexec.c

cheers.
yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Author

p5pRT commented May 28, 2007

From @demerphq

On 5/21/07, via RT moritz@faui2k3.org <perlbug-followup@perl.org> wrote:

> # New Ticket Created by moritz@faui2k3.org
> # Please include the string: [perl #43015]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=43015 >
>
> This is a bug report for perl from moritz@faui2k3.org,
> generated with the help of perlbug 1.35 running under perl v5.8.8.
>
> -----------------------------------------------------------------
> On perl 5.8.8, the regex
> /.*\z/
> doesn't match strings ending with \n. Since .* can match an empty string as
> well, _any_ string should match that regex.
>
> This topic was discussed on perlmonks: http://www.perlmonks.org/?node_id=616538

The attached patch fixes this case, and also, possibly as a side effect,
enables various optimisations that might not have been enabled before.

Once applied I believe that this bug can be closed.

Gotta say tho, the optimiser is a scary place.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Author

p5pRT commented May 28, 2007

From @demerphq

fix_z.patch
Index: D:/dev/perl/ver/zoro/regcomp.c
===================================================================
--- D:/dev/perl/ver/zoro/regcomp.c	(revision 1874)
+++ D:/dev/perl/ver/zoro/regcomp.c	(revision 1875)
@@ -4121,7 +4121,6 @@
     char*  exp = SvPV((SV*)pattern, plen);
     char* xend = exp + plen;
     regnode *scan;
-    regnode *first;
     I32 flags;
     I32 minlen = 0;
     I32 sawplus = 0;
@@ -4145,7 +4144,7 @@
         PerlIO_printf(Perl_debug_log, "%sCompiling REx%s %s\n",
 		       PL_colors[4],PL_colors[5],s);
     });
-
+    
 redo_first_pass:
     RExC_precomp = exp;
     RExC_flags = pm_flags;
@@ -4381,18 +4380,20 @@
 	struct regnode_charclass_class ch_class; /* pointed to by data */
 	int stclass_flag;
 	I32 last_close = 0; /* pointed to by data */
-
-	first = scan;
+        regnode *first= scan;
+        regnode *first_next= regnext(first);
+	
 	/* Skip introductions and multiplicators >= 1. */
 	while ((OP(first) == OPEN && (sawopen = 1)) ||
 	       /* An OR of *one* alternative - should not happen now. */
-	    (OP(first) == BRANCH && OP(regnext(first)) != BRANCH) ||
+	    (OP(first) == BRANCH && OP(first_next) != BRANCH) ||
 	    /* for now we can't handle lookbehind IFMATCH*/
 	    (OP(first) == IFMATCH && !first->flags) || 
 	    (OP(first) == PLUS) ||
 	    (OP(first) == MINMOD) ||
 	       /* An {n,m} with n>0 */
-	    (PL_regkind[OP(first)] == CURLY && ARG1(first) > 0) ) 
+	    (PL_regkind[OP(first)] == CURLY && ARG1(first) > 0) ||
+	    (OP(first) == NOTHING && PL_regkind[OP(first_next)] != END )) 
 	{
 	        
 		if (OP(first) == PLUS)
@@ -4404,6 +4405,7 @@
 		    first += EXTRA_STEP_2ARGS;
 		} else  /* XXX possible optimisation for /(?=)/  */
 		    first = NEXTOPER(first);
+		first_next= regnext(first);
 	}
 
 	/* Starting-point info. */
Index: D:/dev/perl/ver/zoro/t/op/re_tests
===================================================================
--- D:/dev/perl/ver/zoro/t/op/re_tests	(revision 1874)
+++ D:/dev/perl/ver/zoro/t/op/re_tests	(revision 1875)
@@ -1324,3 +1324,5 @@
 foo(\h)bar	foo\tbar	y	$1	\t
 (\H)(\h)	foo\tbar	y	$1-$2	o-\t
 (\h)(\H)	foo\tbar	y	$1-$2	\t-b
+
+.*\z	foo\n	y	-	-
Index: D:/dev/perl/ver/zoro/regexec.c
===================================================================
--- D:/dev/perl/ver/zoro/regexec.c	(revision 1874)
+++ D:/dev/perl/ver/zoro/regexec.c	(revision 1875)
@@ -1845,7 +1845,7 @@
 		    if (regtry(&reginfo, &s))
 			goto got_it;
 		  after_try:
-		    if (s >= end)
+		    if (s > end)
 			goto phooey;
 		    if (prog->extflags & RXf_USE_INTUIT) {
 			s = re_intuit_start(prog, sv, s + 1, strend, flags, NULL);

@p5pRT
Author

p5pRT commented May 29, 2007

@rgs - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed May 29, 2007
@p5pRT
Author

p5pRT commented May 29, 2007

From @rgs

On 28/05/07, demerphq <demerphq@gmail.com> wrote:

> The attached patch fixes this case, and also, possibly as a side effect,
> enables various optimisations that might not have been enabled before.
>
> Once applied I believe that this bug can be closed.
>
> Gotta say tho, the optimiser is a scary place.

Hic sunt leones ("here be lions"). Thanks, applied as #31303.
