Report information
Id: 68564
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: ikegami [at] adaelis.com
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: (no value)



Subject: /g failure with zero-width patterns
Date: Sun, 16 Aug 2009 04:15:16 -0400
To: perlbug [...] perl.org
From: Eric Brine <ikegami [...] adaelis.com>
This is a bug report for perl from ikegami@adaelis.com,
generated with the help of perlbug 1.36 running under perl 5.10.0.

-----------------------------------------------------------------
[Please enter your report here]


A regression was introduced into 5.10.0 concerning /g and zero-width patterns. The demo speaks for itself:

>c:\progs\perl589\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
abc
bc
c

>c:\progs\perl5100\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
abc

I expect the 5.8 behaviour. 5.6.x and 5.8.x behave like 5.8.9. I've been told 5.10.1-RC1 behaves like 5.10.0. I don't know about blead.

Eric "ikegami" Brine


[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=low
---
Site configuration information for perl 5.10.0:

Configured by SYSTEM at Wed Sep  3 13:16:08 2008.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=MSWin32, osvers=5.00, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DUSE_SITECUSTOMIZE -DPRIVLIB_LAST_IN_INC -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='12.00.8804', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf  -libpath:"C:\progs\perl5100\lib\CORE"  -machine:x86'
    libpth=\lib
    libs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib msvcrt.lib
    perllibs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl510.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf  -libpath:"C:\progs\perl5100\lib\CORE"  -machine:x86'

Locally applied patches:
    ACTIVEPERL_LOCAL_PATCHES_ENTRY
    33741 avoids segfaults invoking S_raise_signal() (on Linux)
    33763 Win32 process ids can have more than 16 bits
    32809 Load 'loadable object' with non-default file extension
    32728 64-bit fix for Time::Local

---
@INC for perl 5.10.0:
    c:/progs/perl5100/site/lib
    c:/progs/perl5100/lib
    .

---
Environment for perl 5.10.0:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=c:\bin;c:\progs\perl5100\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\WBEM
    PERL_BADLANG (unset)
    SHELL (unset)

Subject: Re: [perl #68564] /g failure with zero-width patterns
Date: Sun, 16 Aug 2009 14:43:15 +0100
To: perl5-porters [...] perl.org
From: hv [...] crypt.org
"Eric Brine" (via RT) <perlbug-followup@perl.org> wrote:
:A regression was introduced into 5.10.0 concerning /g and zero-width
:patterns. The demo speaks for itself:
:
:>c:\progs\perl589\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
:abc
:bc
:c
:
:>c:\progs\perl5100\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
:abc

With -Dr the first difference between the two cases is that the failing
case discovers optimizations "stclass NSPACE plus minlen 0" whereas the
other finds only "minlen 0".

The failure case searches a second time for a match starting from offset 0,
finds the /\S+/ as before, and then bails with:
  Match possible, but length=0 is smaller than requested=1, failing!
  Contradicts stclass... [regexec_flags]
  Match failed
instead of advancing to offset 1 and trying again.

I'll look further to see if there's an opportunity to improve stclass
support to cope with the case, or whether we must instead revert to
either not detecting or throwing away the stclass in this scenario.

Hugo
CC: perl5-porters [...] perl.org
Subject: Re: [perl #68564] /g failure with zero-width patterns
Date: Mon, 17 Aug 2009 16:36:08 +0200
To: hv [...] crypt.org
From: demerphq <demerphq [...] gmail.com>
2009/8/16 <hv@crypt.org>:
> "Eric Brine" (via RT) <perlbug-followup@perl.org> wrote:
> :A regression was introduced into 5.10.0 concerning /g and zero-width
> :patterns. The demo speaks for itself:
> :
> :>c:\progs\perl589\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
> :abc
> :bc
> :c
> :
> :>c:\progs\perl5100\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
> :abc
>
> With -Dr the first difference between the two cases is that the failing
> case discovers optimizations "stclass NSPACE plus minlen 0" whereas the
> other finds only "minlen 0".
>
> The failure case searches a second time for a match starting from offset 0,
> finds the /\S+/ as before, and then bails with:
>   Match possible, but length=0 is smaller than requested=1, failing!
>   Contradicts stclass... [regexec_flags]
>   Match failed
> instead of advancing to offset 1 and trying again.
>
> I'll look further to see if there's an opportunity to improve stclass
> support to cope with the case, or whether we must instead revert to
> either not detecting or throwing away the stclass in this scenario.

This rings a bell for me. Is this really a regression or is it a bug fix?

Why do we assume the first is correct?

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
CC: hv [...] crypt.org, perl5-porters [...] perl.org
Subject: Re: [perl #68564] /g failure with zero-width patterns
Date: Mon, 17 Aug 2009 16:56:01 +0100
To: demerphq <demerphq [...] gmail.com>
From: Dave Mitchell <davem [...] iabyn.com>
On Mon, Aug 17, 2009 at 04:36:08PM +0200, demerphq wrote:
> 2009/8/16 <hv@crypt.org>:
> > "Eric Brine" (via RT) <perlbug-followup@perl.org> wrote:
> > :A regression was introduced into 5.10.0 concerning /g and zero-width
> > :patterns. The demo speaks for itself:
> > :
> > :>c:\progs\perl589\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
> > :abc
> > :bc
> > :c
> > :
> > :>c:\progs\perl5100\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
> > :abc
> >
> > With -Dr the first difference between the two cases is that the failing
> > case discovers optimizations "stclass NSPACE plus minlen 0" whereas the
> > other finds only "minlen 0".
> >
> > The failure case searches a second time for a match starting from offset 0,
> > finds the /\S+/ as before, and then bails with:
> >   Match possible, but length=0 is smaller than requested=1, failing!
> >   Contradicts stclass... [regexec_flags]
> >   Match failed
> > instead of advancing to offset 1 and trying again.
> >
> > I'll look further to see if there's an opportunity to improve stclass
> > support to cope with the case, or whether we must instead revert to
> > either not detecting or throwing away the stclass in this scenario.
>
> This rings a bell for me. Is this really a regression or is it a bug fix?
>
> Why do we assume the first is correct?

Well, I'd expect

a) a look-ahead assertion not to affect pos (so apart from causing
   additional possible match failures, it should kind of be invisible),
b) which means that the first match should leave pos unaffected,
c) which, following the 'if pos unchanged bump it by 1 to avoid infinite
   loops' rule, means that I would expect the pattern to be tried against
   <abc >, <bc >, <c >, < > in turn (and fail on the fourth match).

--
The Enterprise is involved in a bizarre time-warp experience which is in
some way unconnected with the Late 20th Century.
    -- Things That Never Happen in "Star Trek" #14
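
The 'bump pos by 1 after a zero-width match' rule described above can be modelled by hand. The following is only a sketch of that rule, not how perl implements //g internally: emulate_global_match is a hypothetical helper written for this illustration, and it restarts a plain (non-/g) match on the remainder of the string, one character further along whenever the previous match was empty.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl): emulate //g by hand, restarting the
# search one character later whenever the previous match was zero-width.
sub emulate_global_match {
    my ($string, $re) = @_;
    my @captures;
    my $offset = 0;
    while ($offset <= length $string) {
        my $rest = substr $string, $offset;
        last unless $rest =~ $re;            # plain match, no /g
        push @captures, $1;
        my ($start, $end) = ($-[0], $+[0]);  # match span within $rest
        # Zero-width match: bump by one so the loop cannot get stuck.
        $offset += $end + ($start == $end ? 1 : 0);
    }
    return @captures;
}

my @emulated = emulate_global_match('abc ', qr/(?=(\S+))/);
my @builtin  = 'abc ' =~ /(?=(\S+))/g;

print "emulated: @emulated\n";
print "builtin:  @builtin\n";
```

The emulation tries the pattern against <abc >, <bc >, <c > and < > in turn, so it collects abc, bc and c. On a perl that behaves like 5.8.x the builtin line should list the same three captures; on 5.10.0 it reportedly lists only abc.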
CC: hv [...] crypt.org, perl5-porters [...] perl.org
Subject: Re: [perl #68564] /g failure with zero-width patterns
Date: Mon, 17 Aug 2009 18:11:25 +0200
To: demerphq <demerphq [...] gmail.com>
From: Eirik Berg Hanssen <Eirik-Berg.Hanssen [...] allverden.no>
demerphq <demerphq@gmail.com> writes:
> 2009/8/16 <hv@crypt.org>:
>> "Eric Brine" (via RT) <perlbug-followup@perl.org> wrote:
>> :A regression was introduced into 5.10.0 concerning /g and zero-width
>> :patterns. The demo speaks for itself:
>> :
>> :>c:\progs\perl589\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
>> :abc
>> :bc
>> :c
>> :
>> :>c:\progs\perl5100\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
>> :abc
>
> This rings a bell for me. Is this really a regression or is it a bug fix?
>
> Why do we assume the first is correct?

  Because, since the match is zero-length, we expect[1] pos() to be
advanced by 1 before trying to match again.  Instead, it seems to be
advanced by the length of the captures or something ...

sidhekin@blackbox[18:07:50]~$ 588perl -le '$x=" abc xyz "; print pos $x, " $1 ", length($&) while $x =~ /(?=(\S+))/g'
1 abc 0
2 bc 0
3 c 0
5 xyz 0
6 yz 0
7 z 0
sidhekin@blackbox[18:07:56]~$ 5100perl -le '$x=" abc xyz "; print pos $x, " $1 ", length($&) while $x =~ /(?=(\S+))/g'
1 abc 0
5 xyz 0
sidhekin@blackbox[18:07:58]~$

[1]: Disclaimer: Our expectations may well be the problem. :-\

Eirik
--
Machine. Unexpectedly, I'd invented a time
                -- Alan Moore
CC: hv [...] crypt.org, perl5-porters [...] perl.org
Subject: Re: [perl #68564] /g failure with zero-width patterns
Date: Mon, 17 Aug 2009 12:37:45 -0400
To: demerphq <demerphq [...] gmail.com>
From: Eric Brine <ikegami [...] adaelis.com>
On Mon, Aug 17, 2009 at 10:36 AM, demerphq <demerphq@gmail.com> wrote:
> > "Eric Brine" (via RT) <perlbug-followup@perl.org> wrote:
> > :>c:\progs\perl589\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
> > :abc
> > :bc
> > :c
> > :
> > :>c:\progs\perl5100\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
> > :abc
>
> This rings a bell for me. Is this really a regression or is it a bug fix?
>
> Why do we assume the first is correct?
>

Well something is broken. With 5.10.0 (debian lenny):

$ perl -wle'print for "abc " =~ /(?=(\S+))/g'
abc

$ perl -wle'print for "abc " =~ /(?{})(?=(\S+))/g'
abc
bc
c

$ perl -wle'print for "abc " =~ /x*(?=(\S+))/g'
abc
bc
c

These should all return the same.
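
The three one-liners above can be folded into one small consistency check. This is a sketch, not part of the original report: the hash name and labels are made up for the illustration, and it assumes that all three spellings ought to capture abc, bc and c, as they did on 5.8.x.

```perl
use strict;
use warnings;

my $text = 'abc ';

# Three spellings of the same zero-width match. The labels are only for
# display; the patterns are the ones from the report.
my %captures = (
    'bare lookahead'   => [ $text =~ /(?=(\S+))/g ],
    'empty code block' => [ $text =~ /(?{})(?=(\S+))/g ],
    'x* prefix'        => [ $text =~ /x*(?=(\S+))/g ],
);

for my $label (sort keys %captures) {
    printf "%-16s => %s\n", $label, join ' ', @{ $captures{$label} };
}

# Collapse the results to the set of distinct capture lists.
my %distinct = map { join(' ', @$_) => 1 } values %captures;
print scalar(keys %distinct) == 1 ? "consistent\n" : "INCONSISTENT\n";
```

On the 5.10.0 described in this ticket the first entry would be expected to differ from the other two, so the script would report INCONSISTENT there.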
CC: demerphq <demerphq [...] gmail.com>, hv [...] crypt.org, perl5-porters [...] perl.org
Subject: Re: [perl #68564] /g failure with zero-width patterns
Date: Mon, 17 Aug 2009 22:55:09 +0200
To: Eric Brine <ikegami [...] adaelis.com>
From: Abigail <abigail [...] abigail.be>
On Mon, Aug 17, 2009 at 12:37:45PM -0400, Eric Brine wrote:
> On Mon, Aug 17, 2009 at 10:36 AM, demerphq <demerphq@gmail.com> wrote:
>
> > > "Eric Brine" (via RT) <perlbug-followup@perl.org> wrote:
> > > :>c:\progs\perl589\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
> > > :abc
> > > :bc
> > > :c
> > > :
> > > :>c:\progs\perl5100\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
> > > :abc
> >
> > This rings a bell for me. Is this really a regression or is it a bug fix?
> >
> > Why do we assume the first is correct?
> >
>
> Well something is broken. With 5.10.0 (debian lenny):
>
> $ perl -wle'print for "abc " =~ /(?=(\S+))/g'
> abc
>
> $ perl -wle'print for "abc " =~ /(?{})(?=(\S+))/g'
> abc
> bc
> c
>
> $ perl -wle'print for "abc " =~ /x*(?=(\S+))/g'
> abc
> bc
> c
>
> These should all return the same.

Something related (broken only in 5.10, fixed in 5.10.1-RC):

$ /opt/perl/5.8.8/bin/perl -wle 'print for "abc" =~ /(?=)(?=(\S+))/g'
abc
bc
c
$ /opt/perl/5.10.0/bin/perl -wle 'print for "abc" =~ /(?=)(?=(\S+))/g'
$
$ /opt/perl/5.10.1-RC1/bin/perl -wle 'print for "abc" =~ /(?=)(?=(\S+))/g'
abc
bc
c

Not sure what blead does - the last pull I did didn't build.

Abigail
RT-Send-CC: perl5-porters [...] perl.org
On Sun Aug 16 01:15:58 2009, ikegami@adaelis.com wrote:
> A regression was introduced into 5.10.0 concerning /g and zero-width
> patterns. The demo speaks for itself:
>
> >c:\progs\perl589\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
> abc
> bc
> c
>
> >c:\progs\perl5100\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
> abc

This was broken by commit 07be1b83a6b2d24b492356181ddf70e1c7917ae3,
which extended stclass optimisations to (?=).

I tried following the code paths for + and for {1,} (which are meant to
be identical, but only {1,} was working). I noticed they diverged as a
result of + having PREGf_SKIP set.

So I fixed it by not setting PREGf_SKIP if the + is inside a (?=).

I really don’t understand this code, and would much appreciate any
feedback as to whether this fix (or ‘fix’) will break anything else.
Download patch.text
text/plain 2.6k
From: Father Chrysostomos <sprout@cpan.org>

[perl #68564] /g failure with zero-width patterns

To make "abc " =~ /(?=(\S+))/g work again (it should return "abc",
"bc", "c" on each successive match, respectively; it was returning
just "abc"), this commit disables the PREGf_SKIP optimisation on a PLUS
if it is inside an IFMATCH (?=).

diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index b478273..bb1af44 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -698,6 +698,13 @@ Stringifying a scalar containing -0.0 no longer has the affect of turning
 false into true
 L<[perl #45133]|http://rt.perl.org/rt3/Public/Bug/Display.html?id=45133>.
 
+=item *
+
+Using a regular expression lookhead assertion containing a subpattern
+quantified with a C<+> inside a global pattern (i.e.,
+C<"abc " =~ /(?=(\S+))/g>) now works again, as it did in 5.8.x
+L<[perl #68564]|http://rt.perl.org/rt3/Public/Bug/Display.html?id=68564>.
+
 =back
 
 =head1 Known Problems
diff --git a/regcomp.c b/regcomp.c
index d6f3523..c3867e9 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -4595,6 +4595,7 @@ reStudy:
     I32 last_close = 0; /* pointed to by data */
     regnode *first= scan;
     regnode *first_next= regnext(first);
+    bool lookahead = FALSE;
 
     /*
      * Skip introductions and multiplicators >= 1
@@ -4611,7 +4612,7 @@ reStudy:
             /* An OR of *one* alternative - should not happen now. */
             (OP(first) == BRANCH && OP(first_next) != BRANCH) ||
             /* for now we can't handle lookbehind IFMATCH*/
-            (OP(first) == IFMATCH && !first->flags) ||
+            (OP(first) == IFMATCH && !first->flags && (lookahead = 1)) ||
             (OP(first) == PLUS) ||
             (OP(first) == MINMOD) ||
             /* An {n,m} with n>0 */
@@ -4699,7 +4700,8 @@ reStudy:
             goto again;
         }
         if (sawplus && (!sawopen || !RExC_sawback)
-            && !(RExC_seen & REG_SEEN_EVAL)) /* May examine pos and $& */
+            && !(RExC_seen & REG_SEEN_EVAL)
+            && !lookahead) /* May examine pos and $& */
             /* x+ must match at the 1st pos of run of x's */
             r->intflags |= PREGf_SKIP;
 
diff --git a/t/re/pat_rt_report.t b/t/re/pat_rt_report.t
index 33b6f7c..cf176ec 100644
--- a/t/re/pat_rt_report.t
+++ b/t/re/pat_rt_report.t
@@ -21,7 +21,7 @@ BEGIN {
 }
 
 
-plan tests => 2510;  # Update this when adding/deleting tests.
+plan tests => 2511;  # Update this when adding/deleting tests.
 
 run_tests() unless caller;
 
@@ -1181,6 +1181,14 @@ sub run_tests {
         iseq($first, $second);
     }
+
+    {
+        local $BugId = 68564;   # minimal CURLYM limited to 32767 matches
+        iseq
+            join("-", "abc " =~ /(?=(\S+))/g),
+            "abc-bc-c",
+            "[perl #68564] stclass optimisation does not break + inside (?=)";
+    }
 } # End of sub run_tests
 
 1;
CC: Eric Brine <ikegami [...] adaelis.com>, demerphq <demerphq [...] gmail.com>, hv [...] crypt.org, perl5-porters [...] perl.org
Subject: Re: [perl #68564] /g failure with zero-width patterns
Date: Mon, 11 Oct 2010 15:39:21 +0100
To: Abigail <abigail [...] abigail.be>, Ævar Arnfjörð Bjarmason <avar [...] cpan.org>
From: Nicholas Clark <nick [...] ccl4.org>
On Mon, Aug 17, 2009 at 10:55:09PM +0200, Abigail wrote:
> Something related (broken only in 5.10, fixed in 5.10.1-RC):
>
> $ /opt/perl/5.8.8/bin/perl -wle 'print for "abc" =~ /(?=)(?=(\S+))/g'
> abc
> bc
> c
> $ /opt/perl/5.10.0/bin/perl -wle 'print for "abc" =~ /(?=)(?=(\S+))/g'
> $
> $ /opt/perl/5.10.1-RC1/bin/perl -wle 'print for "abc" =~ /(?=)(?=(\S+))/g'
> abc
> bc
> c
>
>
> Not sure what blead does - the last pull I did didn't build.

I bisected with

#!/bin/sh

git clean -dxf

touch .patchnum
touch .sha1
touch unpushed.h

# If you can use ccache, add -Dcc=ccache\ gcc -Dld=gcc to the Configure line
# if Encode is not needed for the test, you can speed up the bisect by
# excluding it from the runs with -Dnoextensions=Encode
sh Configure -des -Dusedevel -Uusethreads -Doptimize="-g" -Dcc=ccache\ gcc -Dld=gcc -Dnoextensions=IPC/SysV\ Encode\ DB_File
test -f config.sh || exit 125
# Correct makefile for newer GNU gcc
perl -ni -we 'print unless /<(?:built-in|command)/' makefile x2p/makefile
# if you just need miniperl, replace test_prep with miniperl
make -j5 miniperl
[ -x ./miniperl ] || exit 125
./miniperl -wle 'push @a, $_ for "abc" =~ /(?=)(?=(\S+))/g; exit !!@a'
ret=$?
[ $ret -gt 127 ] && ret=127
git clean -dxf
exit $ret

and found that this was fixed by

commit 89c6a13e141e02cc3af670ab47c1d41ac4e81ba0
Author: Ævar Arnfjörð Bjarmason <avar@cpan.org>
Date:   Thu Apr 10 00:38:52 2008 +0000

    Re: [perl #52672] regexp failure: (?=) turns into OPFAIL
    From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
    Message-ID: <51dd1af80804091738r15d37763lf900d59f8bcc5e81@mail.gmail.com>

    p4raw-id: //depot/perl@33667

diff --git a/regcomp.c b/regcomp.c
index 05b54ee..07d5535 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -5691,6 +5691,8 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
                     RExC_seen |= REG_SEEN_LOOKBEHIND;
                     RExC_parse++;
                 case '=':           /* (?=...) */
+                    RExC_seen_zerolen++;
+                    break;
                 case '!':           /* (?!...) */
                     RExC_seen_zerolen++;
                     if (*RExC_parse == ')') {
diff --git a/t/op/re_tests b/t/op/re_tests
index 3df3745..6894458 100644
--- a/t/op/re_tests
+++ b/t/op/re_tests
@@ -414,6 +414,7 @@ a[-]?c	ac	y	$&	ac
 '(abc)\1'i	ABCABC	y	$1	ABC
 '([a-c]*)\1'i	ABCABC	y	$1	ABC
 a(?!b).	abad	y	$&	ad
+(?=)a	a	y	$&	a
 a(?=d).	abad	y	$&	ad
 a(?=c|d).	abad	y	$&	ad
 a(?:b|c|d)(.)	ace	y	$1	e

I don't know if that test is equivalent to your code.
Should your code be added as an additional test?

Nicholas Clark
Subject: Re: [perl #68564] /g failure with zero-width patterns
Date: Mon, 11 Oct 2010 09:08:02 -0600
To: perlbug-followup [...] perl.org
From: karl williamson <public [...] khwilliamson.com>
Father Chrysostomos via RT wrote:
> On Sun Aug 16 01:15:58 2009, ikegami@adaelis.com wrote:
>> A regression was introduced into 5.10.0 concerning /g and zero-width
>> patterns. The demo speaks for itself:
>>
>>> c:\progs\perl589\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
>> abc
>> bc
>> c
>>
>>> c:\progs\perl5100\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
>> abc
>
> This was broken by commit 07be1b83a6b2d24b492356181ddf70e1c7917ae3,
> which extended stclass optimisations to (?=).
>
> I tried following the code paths for + and for {1,} (which are meant to
> be identical, but only {1,} was working). I noticed they diverged as a
> result of + having PREGf_SKIP set.
>
> So I fixed it by not setting PREGf_SKIP if the + is inside a (?=).
>
> I really don’t understand this code, and would much appreciate any
> feedback as to whether this fix (or ‘fix’) will break anything else.
>
>
I'm afraid I don't know this code either.
Subject: Re: [perl #68564] /g failure with zero-width patterns
Date: Mon, 11 Oct 2010 21:23:20 -0400
To: Abigail <abigail [...] abigail.be>, Ævar Arnfjörð <avar [...] cpan.org>, Eric Brine <ikegami [...] adaelis.com>, demerphq <demerphq [...] gmail.com>, hv [...] crypt.org, perl5-porters [...] perl.org
From: Eric Brine <ikegami [...] adaelis.com>

Message body is not shown because sender requested not to inline it.

On Mon, Oct 11, 2010 at 10:39 AM, Nicholas Clark <nick@ccl4.org> wrote:
> I don't know if that test is equivalent to your code.
> Should your code be added as an additional test?

I don't see why not. Especially since it still fails for blead.

Patch with test attached.

- Eric

Subject: Re: [perl #68564] /g failure with zero-width patterns
Date: Mon, 11 Oct 2010 21:44:50 -0400
To: Abigail <abigail [...] abigail.be>, Ævar Arnfjörð <avar [...] cpan.org>, Eric Brine <ikegami [...] adaelis.com>, demerphq <demerphq [...] gmail.com>, hv [...] crypt.org, perl5-porters [...] perl.org
From: Eric Brine <ikegami [...] adaelis.com>

Message body is not shown because sender requested not to inline it.

On Mon, Oct 11, 2010 at 9:23 PM, Eric Brine <ikegami@adaelis.com> wrote:
> On Mon, Oct 11, 2010 at 10:39 AM, Nicholas Clark <nick@ccl4.org> wrote:
> > I don't know if that test is equivalent to your code.
> > Should your code be added as an additional test?
>
> I don't see why not. Especially since it still fails for blead.
>
> Patch with test attached.
>
> - Eric


Forgot to increment number of tests. Updated patch attached.
CC: perl5-porters [...] perl.org
Subject: Re: [perl #68564] /g failure with zero-width patterns
Date: Wed, 3 Nov 2010 10:41:07 +0100
To: perlbug-followup [...] perl.org
From: demerphq <demerphq [...] gmail.com>
On 10 October 2010 23:27, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:
> On Sun Aug 16 01:15:58 2009, ikegami@adaelis.com wrote:
>> A regression was introduced into 5.10.0 concerning /g and zero-width
>> patterns. The demo speaks for itself:
>>
>> >c:\progs\perl589\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
>> abc
>> bc
>> c
>>
>> >c:\progs\perl5100\bin\perl -wle"print for 'abc ' =~ /(?=(\S+))/g"
>> abc
>
> This was broken by commit 07be1b83a6b2d24b492356181ddf70e1c7917ae3,
> which extended stclass optimisations to (?=).
>
> I tried following the code paths for + and for {1,} (which are meant to
> be identical, but only {1,} was working). I noticed they diverged as a
> result of + having PREGf_SKIP set.
>
> So I fixed it by not setting PREGf_SKIP if the + is inside a (?=).
>
> I really don’t understand this code, and would much appreciate any
> feedback as to whether this fix (or ‘fix’) will break anything else.

I have looked into this more deeply and applied a modified version of
your patch, which IMO was more or less exactly correct.

Here is the commit message I wrote:

commit e7f38d0fe17e7a846c0ed55e71ebb120a336b887
Author: Yves Orton <demerphq@gmail.com>
Date:   Wed Nov 3 10:23:00 2010 +0100

    fix 68564: /g failure with zero-width patterns

    This is based on a patch by Father Chrysostomos <sprout@cpan.org>

    The start class optimisation has two modes, "try every valid start
    position" (doevery) and "flip flop mode" (!doevery) where it tries
    only the first valid start position in a sequence.

    Consider /(\d+)X/ and the string "123456Y", now we know that if we fail
    to match X after matching "123456" then we will also fail to match after
    "23456" (assuming no evil tricks are in place, which disable the
    optimisation anyway), so we know we can skip forward until the check
    /fails/ and only then start looking for a real match. This is flip-flop
    mode.

    Now consider the case with zero-width lookahead under /g: /(?=(\d+)X)/.
    In this case we have an additional failure mode, that is failure when
    we match a zero-width string twice at the same pos(). So now, the
    "flip-flop" logic breaks as it /is/ possible that we could match at
    "23456" when we couldn't match at "123456" because of the zero-length
    twice at the same pos() rule. For instance:

      print $1 for "123"=~/(?=(\d+))/g

    should first match "123". Since $& is zero length, pos() is not
    incremented. We then match again, successfully, except that the match
    is rejected despite technical-success because its $& is also zero
    length and pos() has not advanced. If the flip-flop mode is enabled we
    won't retry until we find a failing character first.

    The point here is that it makes perfect sense to disable the
    "flip-flop" mode optimisation when the start class is inside a
    lookahead as it really doesn't apply.

IMO your patch was quite right, although I had to dig fairly deep to
understand why.

Thanks for the patch.

BTW, I am a bit curious if there are any other flaws in the flip-flop
logic. I tried reasonably hard to make it fail without the zero-width
lookahead, and was unable to find a failure case, but I still kinda feel
like there might be some interesting edge case.

Cheers,
yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
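
The two start-class modes described in the commit message can be pictured with a toy model. This is a simplified sketch and not the code in regexec.c: candidate_starts is a hypothetical function that lists the offsets at which a scan would attempt a match, assuming every attempt either fails or is rejected so that the scan keeps going.

```perl
use strict;
use warnings;

# Hypothetical model: return the offsets at which a start-class scan would
# attempt a match. With $doevery true every in-class position is tried; with
# it false ("flip-flop" mode) only the first position of each run is tried.
sub candidate_starts {
    my ($string, $in_class, $doevery) = @_;
    my @starts;
    my $in_run = 0;
    for my $i (0 .. length($string) - 1) {
        if ($in_class->(substr $string, $i, 1)) {
            push @starts, $i if $doevery || !$in_run;
            $in_run = 1;
        }
        else {
            $in_run = 0;   # class failed: the next in-class char starts a new run
        }
    }
    return @starts;
}

my $is_digit = sub { $_[0] =~ /\d/ };

my @every     = candidate_starts('123456Y', $is_digit, 1);
my @flip_flop = candidate_starts('123456Y', $is_digit, 0);

print "doevery:   @every\n";
print "flip-flop: @flip_flop\n";
```

In this model doevery offers offsets 0 through 5 while flip-flop offers only offset 0. For /(\d+)X/ that shortcut is harmless, but for /(?=(\d+))/g the retry at offset 0 is rejected as a repeated zero-width match, and flip-flop mode then has no later offset left to try within the same run of digits.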

