Report information
Id: 122171
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: dietmar.schindler [at] manroland-web.com
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: (no value)



Date: Tue, 24 Jun 2014 07:45:02 +0000
Subject: Regular expression with lookahead does not match, while it should.
From: <dietmar.schindler [...] manroland-web.com>
To: <perlbug [...] perl.org>
This is a bug report for perl from dietmar.schindler@manroland-web.com,
generated with the help of perlbug 1.40 running under perl 5.20.0.

-----------------------------------------------------------------
I observed the following bug in Perl versions 5.10.1, 5.18.2 and 5.20.0,
while versions 5.6.2 and 5.8.7 behaved correctly.
Regular expressions similar to /(?=.*P)P/ do not match certain text
which they should match. Example:
    " P" =~ /(?=.*P)P/
yields no match, i. e.
    print " P" =~ /(?=.*P)P/
prints nothing instead of 1.
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=high
---
Site configuration information for perl 5.20.0:

Configured by strawberry-perl at Fri May 30 08:28:40 2014.

Summary of my perl5 (revision 5 version 20 subversion 0) configuration:

  Platform:
    osname=MSWin32, osvers=6.2, archname=MSWin32-x64-multi-thread
    uname='Win32 strawberry-perl 5.20.0.1 #1 Fri May 30 08:27:00 2014 x64'
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags =' -s -O2 -DWIN32 -DWIN64 -DCONSERVATIVE -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -fwrapv -fno-strict-aliasing -mms-bitfields',
    optimize='-s -O2',
    cppflags='-DWIN32'
    ccversion='', gccversion='4.8.2', gccosandvers=''
    intsize=4, longsize=4, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='long long', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='g++', ldflags ='-s -L"C:\STRAWB~1\perl\lib\CORE" -L"C:\STRAWB~1\c\lib"'
    libpth=C:\STRAWB~1\c\lib C:\STRAWB~1\c\x86_64-w64-mingw32\lib C:\STRAWB~1\c\lib\gcc\x86_64-w64-mingw32\4.8.2
    libs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    perllibs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    libc=, so=dll, useshrplib=true, libperl=libperl520.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=xs.dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-mdll -s -L"C:\STRAWB~1\perl\lib\CORE" -L"C:\STRAWB~1\c\lib"'

---
@INC for perl 5.20.0:
    C:/Strawberry/perl/site/lib
    C:/Strawberry/perl/vendor/lib
    C:/Strawberry/perl/lib
    .

---
Environment for perl 5.20.0:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=C:\Program Files (x86)\AMD APP\bin\x86_64;C:\Program Files (x86)\AMD APP\bin\x86;C:\oracle\ora11g_64bit\bin;C:\oracle\ora11g_32bit\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\ATI Technologies\ATI.ACE\Core-Static;C:\Program Files\Microsoft Network Monitor 3\;C:\Strawberry\c\bin;C:\Strawberry\perl\site\bin;C:\Strawberry\perl\bin
    PERL_BADLANG (unset)
    SHELL (unset)
RT-Send-CC: perl5-porters [...] perl.org
On Tue Jun 24 00:45:27 2014, dietmar.schindler@manroland-web.com wrote:
> This is a bug report for perl from dietmar.schindler@manroland-web.com,
> generated with the help of perlbug 1.40 running under perl 5.20.0.
>
> -----------------------------------------------------------------
> I observed the following bug in Perl versions 5.10.1, 5.18.2 and
> 5.20.0, while versions 5.6.2 and 5.8.7 behaved correctly.
> Regular expressions similar to /(?=.*P)P/ do not match certain text
> which they should match. Example:
> " P" =~ /(?=.*P)P/
> yields no match, i. e.
> print " P" =~ /(?=.*P)P/
> prints nothing instead of 1.
> --------------------------------------------------------------

I don't believe this is a bug in Perl 5.

According to 'perldoc perlre' in blead: "(?=pattern)" is used as "A zero-width positive look-ahead assertion."

If we are to look ahead, we have to have something in the pattern *from which we are looking ahead*. I don't see anything like that in your pattern. Did you perhaps want a positive look-behind pattern "(?<=pattern)"?

Thank you very much.
Jim Keenan
To: "perlbug-followup [...] perl.org" <perlbug-followup [...] perl.org>
CC: Perl5 Porters <perl5-porters [...] perl.org>
From: Eirik Berg Hanssen <Eirik-Berg.Hanssen [...] allverden.no>
Subject: Re: [perl #122171] Regular expression with lookahead does not match, while it should.
Date: Tue, 24 Jun 2014 23:39:45 +0200

On Tue, Jun 24, 2014 at 10:52 PM, James E Keenan via RT <perlbug-followup@perl.org> wrote:
> I don't believe this is a bug in Perl 5.
>
> According to 'perldoc perlre' in blead:  "(?=pattern)" is used as "A zero-width positive look-ahead assertion."
>
> If we are to look ahead, we have to have something in the pattern *from which we are looking ahead*.  I don't see anything like that in your pattern.  Did you perhaps want a positive look-behind pattern "(?<=pattern)"?


  Huh?  It's supposed to be looking ahead from the starting position, which will advance on backtracking, since this pattern is not anchored.  Observe:

$ perl -wE '"foo bar" =~ /(?=[bo])(?{ say pos })(?!)/'
1
2
4
$

  Observe also:

$ perl -E 'say " P" =~ /(?=P)P/'
1
$ perl -E 'say " P" =~ /(?=.*P)P/'

$

  Why shouldn't .* match the empty string?  I'm pretty sure this is a genuine bug in Perl 5.


Eirik
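
The semantics described above (an unanchored match tries each start offset in turn, and the zero-width lookahead is evaluated from that offset) can be sketched by hand. This is only an illustration, not how the regex engine is implemented; `first_match_offset` is a hypothetical helper name, and it uses explicitly anchored sub-patterns so that it does not depend on the behaviour under discussion.

```perl
use strict;
use warnings;

# Hypothetical helper: emulate /(?=.*P)P/ by hand, trying each start
# offset in turn, as an unanchored match is expected to do.
sub first_match_offset {
    my ($text) = @_;
    for my $start (0 .. length($text)) {
        my $rest = substr($text, $start);
        # Lookahead (?=.*P): a "P" must follow somewhere on the current line.
        next unless $rest =~ /\A.*P/;
        # The lookahead consumed nothing, so "P" is tried at the same offset.
        return $start if $rest =~ /\AP/;
    }
    return -1;    # no start offset works
}

print first_match_offset(" P"), "\n";    # 1
print first_match_offset("xyz"), "\n";   # -1
```

Under these semantics " P" matches at offset 1, which is the result the reporter expected from /(?=.*P)P/.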
From: Karl Williamson <public [...] khwilliamson.com>
Subject: Re: [perl #122171] Regular expression with lookahead does not match, while it should.
Date: Tue, 24 Jun 2014 16:28:57 -0600
To: Eirik Berg Hanssen <Eirik-Berg.Hanssen [...] allverden.no>, "perlbug-followup [...] perl.org" <perlbug-followup [...] perl.org>
CC: Perl5 Porters <perl5-porters [...] perl.org>, Yves Orton <yves.orton [...] booking.com>
On 06/24/2014 03:39 PM, Eirik Berg Hanssen wrote:
> On Tue, Jun 24, 2014 at 10:52 PM, James E Keenan via RT
> <perlbug-followup@perl.org <mailto:perlbug-followup@perl.org>> wrote:
>
>     I don't believe this is a bug in Perl 5.
>
>     According to 'perldoc perlre' in blead: "(?=pattern)" is used as "A
>     zero-width positive look-ahead assertion."
>
>     If we are to look ahead, we have to have something in the pattern
>     *from which we are looking ahead*. I don't see anything like that
>     in your pattern. Did you perhaps want a positive look-behind
>     pattern "(?<=pattern)"?
>
> Huh? It's supposed to be looking ahead from the starting position,
> which will advance on backtracking, since this pattern is not anchored.
> Observe:
>
> $ perl -wE '"foo bar" =~ /(?=[bo])(?{ say pos })(?!)/'
> 1
> 2
> 4
> $
>
> Observe also:
>
> $ perl -E 'say " P" =~ /(?=P)P/'
> 1
> $ perl -E 'say " P" =~ /(?=.*P)P/'
>
> $
>
> Why shouldn't .* match the empty string? I'm pretty sure this is a
> genuine bug in Perl 5.
>
> Eirik

The commit that changed the behavior from the original to the current was this:

07be1b83a6b2d24b492356181ddf70e1c7917ae3 is the first bad commit
commit 07be1b83a6b2d24b492356181ddf70e1c7917ae3
Author: Yves Orton <demerphq@gmail.com>
Date:   Fri Jun 9 02:56:37 2006 +0200

    Re: [PATCH] Better version of the Aho-Corasick patch and lots of benchmarks.
    Message-ID: <9b18b3110606081556t779de698r82f361d82a05fbc8@mail.gmail.com>

    (with tweaks)
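
For readers who want to repeat such a bisection, a check script along the following lines could be handed to `git bisect run`, which treats exit status 0 as "good" and 1 as "bad". This is a sketch under assumptions: the thread does not show the script that was actually used, and `bisect_status` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical check for `git bisect run`: status 0 means the pattern
# matches (good), status 1 means it does not (bad).
sub bisect_status {
    return " P" =~ /(?=.*P)P/ ? 0 : 1;
}

my $status = bisect_status();
print $status ? "bad\n" : "good\n";

# A real bisect script would finish with:  exit $status;
```

Each bisection step would also need to rebuild perl before running the check; that part is omitted here.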
From: demerphq <demerphq [...] gmail.com>
Date: Wed, 25 Jun 2014 08:36:11 +0200
Subject: Re: [perl #122171] Regular expression with lookahead does not match, while it should.
CC: Eirik Berg Hanssen <Eirik-Berg.Hanssen [...] allverden.no>, "perlbug-followup [...] perl.org" <perlbug-followup [...] perl.org>, Perl5 Porters <perl5-porters [...] perl.org>, Yves Orton <yves.orton [...] booking.com>
To: Karl Williamson <public [...] khwilliamson.com>
On 25 June 2014 00:28, Karl Williamson <public@khwilliamson.com> wrote:
On 06/24/2014 03:39 PM, Eirik Berg Hanssen wrote:

On Tue, Jun 24, 2014 at 10:52 PM, James E Keenan via RT
<perlbug-followup@perl.org <mailto:perlbug-followup@perl.org>> wrote:

    I don't believe this is a bug in Perl 5.

    According to 'perldoc perlre' in blead:  "(?=pattern)" is used as "A
    zero-width positive look-ahead assertion."

    If we are to look ahead, we have to have something in the pattern
    *from which we are looking ahead*.  I don't see anything like that
    in your pattern.  Did you perhaps want a positive look-behind
    pattern "(?<=pattern)"?



   Huh?  It's supposed to be looking ahead from the starting position,
which will advance on backtracking, since this pattern is not anchored.
Observe:

$ perl -wE '"foo bar" =~ /(?=[bo])(?{ say pos })(?!)/'
1
2
4
$

   Observe also:

$ perl -E 'say " P" =~ /(?=P)P/'
1
$ perl -E 'say " P" =~ /(?=.*P)P/'

$

   Why shouldn't .* match the empty string?  I'm pretty sure this is a
genuine bug in Perl 5.


Eirik

The commit that changed the behavior from the original to the current was this:
07be1b83a6b2d24b492356181ddf70e1c7917ae3 is the first bad commit
commit 07be1b83a6b2d24b492356181ddf70e1c7917ae3
Author: Yves Orton <demerphq@gmail.com>
Date:   Fri Jun 9 02:56:37 2006 +0200

    Re: [PATCH] Better version of the Aho-Corasick patch and lots of benchmarks.
    Message-ID: <9b18b3110606081556t779de698r82f361d82a05fbc8@mail.gmail.com>

    (with tweaks)


That surprises me given this:

$ ./perl -Ilib -Mre=debug -le'" P" =~ /(?=.*P)P/'
Compiling REx "(?=.*P)P"
Final program:
   1: IFMATCH[0] (9)
   3:   STAR (5)
   4:     REG_ANY (0)
   5:   EXACT <P> (7)
   7:   SUCCEED (0)
   8: TAIL (9)
   9: EXACT <P> (11)
  11: END (0)
anchored "P" at 0 (checking anchored) anchored(MBOL) implicit minlen 1 
Matching REx "(?=.*P)P" against " P"
Intuit: trying to determine minimum start position...
  Did not find anchored substr "P"...
Match rejected by optimizer
Freeing REx: "(?=.*P)P"


Why did it decide it is anchored? It shouldn't have decided that. 

If you bisected maybe this is a false positive.

Yves



--
perl -Mre=debug -e "/just|another|perl|hacker/"
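
One way to see why the implicit line-start anchoring reported in the debug output above is unsafe for this pattern: for a plain /.*P/, any line that can match at all can match from its start, so restricting attempts to line starts loses nothing. Once the .* sits inside a lookahead and is followed by a literal, that no longer holds. The sketch below lists the workable start offsets by hand; `start_offsets` is a hypothetical helper, and the checks use explicit \A anchors so the result does not depend on the optimiser.

```perl
use strict;
use warnings;

# Hypothetical helper: list the offsets in $text at which $check succeeds
# when given the text from that offset onward.
sub start_offsets {
    my ($text, $check) = @_;
    return grep {
        my $rest = substr($text, $_);
        $check->($rest);
    } 0 .. length($text) - 1;
}

my $text = " P";

# Plain /.*P/: offset 0 (the line start) is among the workable offsets.
my @plain = start_offsets($text, sub { $_[0] =~ /\A.*P/ });

# /(?=.*P)P/ emulated by hand: lookahead, then "P" at the same offset.
my @look = start_offsets($text, sub { $_[0] =~ /\A.*P/ && $_[0] =~ /\AP/ });

print "plain: @plain\n";       # plain: 0 1
print "lookahead: @look\n";    # lookahead: 1
```

The only workable offset for the lookahead form is 1, which is not a line start, so an optimiser that only tries line starts rejects a string that should match.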
From: demerphq <demerphq [...] gmail.com>
Date: Wed, 25 Jun 2014 08:43:35 +0200
Subject: Re: [perl #122171] Regular expression with lookahead does not match, while it should.
CC: Eirik Berg Hanssen <Eirik-Berg.Hanssen [...] allverden.no>, "perlbug-followup [...] perl.org" <perlbug-followup [...] perl.org>, Perl5 Porters <perl5-porters [...] perl.org>, Yves Orton <yves.orton [...] booking.com>
To: Karl Williamson <public [...] khwilliamson.com>, Dave Mitchell <davem [...] iabyn.com>
On 25 June 2014 08:36, demerphq <demerphq@gmail.com> wrote:
On 25 June 2014 00:28, Karl Williamson <public@khwilliamson.com> wrote:

The commit that changed the behavior from the original to the current was this:
07be1b83a6b2d24b492356181ddf70e1c7917ae3 is the first bad commit
commit 07be1b83a6b2d24b492356181ddf70e1c7917ae3
Author: Yves Orton <demerphq@gmail.com>
Date:   Fri Jun 9 02:56:37 2006 +0200

    Re: [PATCH] Better version of the Aho-Corasick patch and lots of benchmarks.
    Message-ID: <9b18b3110606081556t779de698r82f361d82a05fbc8@mail.gmail.com>

    (with tweaks)


That surprises me given this:

$ ./perl -Ilib -Mre=debug -le'" P" =~ /(?=.*P)P/'
Compiling REx "(?=.*P)P"
Final program:
   1: IFMATCH[0] (9)
   3:   STAR (5)
   4:     REG_ANY (0)
   5:   EXACT <P> (7)
   7:   SUCCEED (0)
   8: TAIL (9)
   9: EXACT <P> (11)
  11: END (0)
anchored "P" at 0 (checking anchored) anchored(MBOL) implicit minlen 1 
Matching REx "(?=.*P)P" against " P"
Intuit: trying to determine minimum start position...
  Did not find anchored substr "P"...
Match rejected by optimizer
Freeing REx: "(?=.*P)P"


Why did it decide it is anchored? It shouldn't have decided that. 

If you bisected maybe this is a false positive.

Or maybe not, I see I did actually touch a bit of logic related to this in that patch. (Oh the bad old days before git.)

Anyway, it is clearly the optimiser going wrong, as you can see below. Maybe Dave M has some ideas?

If I find some time to debug further I will let you know what I find.

Yves

$ ./perl -Ilib -Mre=Debug,ALL -le'" P" =~ /(?=.*P)P/'
Assembling pattern from 1 elements
Compiling REx "(?=.*P)P"
Starting first pass (sizing)
 >(?=.*P)P<     |   1|  reg    
                |    |    brnc   
                |    |      piec   
                |    |        atom   
 >?=.*P)P<      |    |          reg    
 >.*P)P<        |    |            brnc   
                |    |              piec   
                |    |                atom   
 >P)P<          |   2|                inst - STAR
                |   3|              piec   
                |    |                atom   
 >)P<           |   6|            inst - IFMATCH
 >P<            |   9|      piec   
                |    |        atom   
Required size 11 nodes
Starting second pass (creation)
 >(?=.*P)P<     |   1|  reg    
                |    |    brnc   
                |    |      piec   
                |    |        atom   
 >?=.*P)P<      |    |          reg    
 >.*P)P<        |    |            brnc   
                |    |              piec   
                |    |                atom   
 >P)P<          |   2|                inst - STAR
                |   3|              piec   
                |    |                atom   
 >)P<           |   5|              tail~ STAR (1) -> EXACT
                |   6|          lsbr~ tying lastbr STAR (1) to ender SUCCEED (5) offset 4
                |    |            tail~ STAR (1)  
                |    |                ~ EXACT <P> (3) -> SUCCEED
                |    |            inst - IFMATCH
                |   9|            tsdy~ IFMATCH[0] (1) -> END
                |    |                ~ attach to TAIL (8) offset to 7
 >P<            |    |      piec   
                |    |        atom   
 ><             |  11|      tail~ IFMATCH[0] (1)  
                |    |          ~ TAIL (8) -> EXACT
                |  12|  lsbr~ tying lastbr IFMATCH[0] (1) to ender END (11) offset 10
                |    |    tail~ IFMATCH[0] (1)  
                |    |        ~ TAIL (8)  
                |    |        ~ EXACT <P> (9) -> END
first:>  3: STAR (5)
first:>  4: REG_ANY (0)
first at 4
RExC_seen: REG_UNBOUNDED_QUANTIFIER_SEEN 
study_chunk stopparen=-1 depth=0 recursed_depth=0 
Peep:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'' 0:0/0 *Fixed:'' @ 0 Float: '' @ 0/0
Peep>  1: IFMATCH[0] (8)
  study_chunk stopparen=-1 depth=1 recursed_depth=0 
  Peep:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 
  Peep>  3: STAR (5)
  study_chunk stopparen=-1 depth=1 recursed_depth=0 
  Peep:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 INF 
  Peep>  5: EXACT <P> (7)
    join>  5: EXACT <P> (7)
  study_chunk stopparen=-1 depth=1 recursed_depth=0 
  Peep:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 INF 
  Peep>  7: SUCCEED (0)
  pre-fin:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 INF 
  post-fin:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 INF 
study_chunk stopparen=-1 depth=0 recursed_depth=0 
Peep:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'' 0:0/0 *Fixed:'' @ 0 Float: '' @ 0/0
Peep>  9: EXACT <P> (11)
  join>  9: EXACT <P> (11)
pre-fin:Pos:1/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'P' 1:0/0 *Fixed:'' @ 0 Float: '' @ 0/0
post-fin:Pos:1/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'P' 1:0/0 *Fixed:'' @ 0 Float: '' @ 0/0
commit: Pos:1/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'P' -1:0/0 *Fixed:'P' @ 0 Float: '' @ 0/0
minlen: 1 r->minlen:0 maxlen:0
RExC_seen: REG_UNBOUNDED_QUANTIFIER_SEEN 
Final program:
   1: IFMATCH[0] (9)
   3:   STAR (5)
   4:     REG_ANY (0)
   5:   EXACT <P> (7)
   7:   SUCCEED (0)
   8: TAIL (9)
   9: EXACT <P> (11)
  11: END (0)
anchored "P" at 0 (checking anchored) anchored(MBOL) implicit minlen 1 
r->extflags: IS_ANCHORED UNBOUNDED_QUANTIFIER_SEEN USE_INTUIT_NOML USE_INTUIT_ML 
r->intflags: IMPLICIT ANCH_MBOL 
Matching REx "(?=.*P)P" against " P"
Intuit: trying to determine minimum start position...
  substrs[0]: min=0 max=0 end shift=0 useful=100 utf8=0 [PVMG("P"\0)]
  substrs[2]: min=0 max=0 end shift=0 useful=100 utf8=0 [PVMG("P"\0)]
  At restart: rx_origin=0 Check offset min: 0 Start shift: 0 End shift 0 Real end Shift: 0
  fbm_instr len=1 str=< >
  Did not find anchored substr "P"...
Match rejected by optimizer
Freeing REx: "(?=.*P)P"
 


--
perl -Mre=debug -e "/just|another|perl|hacker/"
From: demerphq <demerphq [...] gmail.com>
Date: Wed, 25 Jun 2014 09:33:16 +0200
Subject: Re: [perl #122171] Regular expression with lookahead does not match, while it should.
To: Karl Williamson <public [...] khwilliamson.com>, Dave Mitchell <davem [...] iabyn.com>
CC: Eirik Berg Hanssen <Eirik-Berg.Hanssen [...] allverden.no>, "perlbug-followup [...] perl.org" <perlbug-followup [...] perl.org>, Perl5 Porters <perl5-porters [...] perl.org>, Yves Orton <yves.orton [...] booking.com>
On 25 June 2014 08:43, demerphq <demerphq@gmail.com> wrote:
On 25 June 2014 08:36, demerphq <demerphq@gmail.com> wrote:
On 25 June 2014 00:28, Karl Williamson <public@khwilliamson.com> wrote:

The commit that changed the behavior from the original to the current was this:
07be1b83a6b2d24b492356181ddf70e1c7917ae3 is the first bad commit
commit 07be1b83a6b2d24b492356181ddf70e1c7917ae3
Author: Yves Orton <demerphq@gmail.com>
Date:   Fri Jun 9 02:56:37 2006 +0200

    Re: [PATCH] Better version of the Aho-Corasick patch and lots of benchmarks.
    Message-ID: <9b18b3110606081556t779de698r82f361d82a05fbc8@mail.gmail.com>

    (with tweaks)



... 
If you bisected maybe this is a false positive.

Or maybe not, I see I did actually touch a bit of logic related to this in that patch. (Oh the bad old days before git.)


I pushed a fix. Since I am "retired" right now I did not push tests. Someone else can do that.

Yves 

--
perl -Mre=debug -e "/just|another|perl|hacker/"
From: "Matthew Horsfall (alh)" <wolfsage [...] gmail.com>
Subject: Re: [perl #122171] Regular expression with lookahead does not match, while it should.
Date: Wed, 25 Jun 2014 10:39:29 -0400
To: demerphq <demerphq [...] gmail.com>
CC: Karl Williamson <public [...] khwilliamson.com>, Dave Mitchell <davem [...] iabyn.com>, Eirik Berg Hanssen <Eirik-Berg.Hanssen [...] allverden.no>, "perlbug-followup [...] perl.org" <perlbug-followup [...] perl.org>, Perl5 Porters <perl5-porters [...] perl.org>, Yves Orton <yves.orton [...] booking.com>
On Wed, Jun 25, 2014 at 3:33 AM, demerphq <demerphq@gmail.com> wrote:
> On 25 June 2014 08:43, demerphq <demerphq@gmail.com> wrote:
> I pushed a fix. Since I am "retired" right now I did not push tests. Someone
> else can do that.

Thanks. I've added a small test case in 06ab3c1. More would be good though.

--
Matthew Horsfall (alh)
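
As an illustration of what further regression tests might look like, here is a minimal sketch using Test::More. It is not the test committed in 06ab3c1, the case list is hypothetical, and using Test::More is an assumption made for readability. The first case is the one from the original report, so it should fail on the affected versions (5.10.1, 5.18.2 and 5.20.0) and pass once the fix is in.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical cases; not the test that was committed in 06ab3c1.
my @cases = (
    [ ' P',   qr/(?=.*P)P/, 'lookahead with .* and a leading space' ],
    [ ' P',   qr/(?=P)P/,   'lookahead without .*' ],
    [ "\n P", qr/(?=.*P)P/, 'match on second line, not at line start' ],
    [ 'P',    qr/(?=.*P)P/, 'match at offset 0' ],
);

for my $case (@cases) {
    my ($text, $re, $name) = @$case;
    like($text, $re, $name);
}

done_testing();
```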
RT-Send-CC: perl5-porters [...] perl.org
Fixed by c445c5b05df9dacbd0b21a477ba20c6f45e23263, with a test added by 06ab3c155013050b54920c009cdc2c564e86a6a3.

--
Karl Williamson

