Skip Menu |
Report information
Id: 126697
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: mauke- <l.mai [at] web.de>
Cc:
AdminCc:

Operating System: Linux
PatchStatus: (no value)
Severity: low
Type: core
Perl Version: 5.22.0
Fixed In: 5.25.10



Subject: local() in embedded code in regex not working as expected
To: perlbug [...] perl.org
Date: Fri, 20 Nov 2015 20:59:20 +0100
From: l.mai [...] web.de
Download (untitled) / with headers
text/plain 5.3k
This is a bug report for perl from l.mai@web.de, generated with the help of perlbug 1.40 running under perl 5.22.0. ----------------------------------------------------------------- [Please describe your issue here] #!/usr/bin/perl use strict; use warnings; my $str = 'ABC'; my @words; { my @accum; our $count; $str =~ m{ \A (?{ $count = 0; }) (?: ( AB | A | BC ) (?{ local $accum[@accum] = $1; local $count = $count + 1; print "! count=$count; accum=[@accum]; pos=${\pos}\n"; }) )* \z (?{ print "!!! count=$count; accum=[@accum]\n"; @words = @accum; }) }x or die; } print "words=[@words]\n"; __END__ This program tries to match the string "ABC" as a sequence of segments, each of which is either "AB" or "A" or "BC", while also recording which segments were used. For debugging purposes it also temporarily records the number of segments used in $count. According to the description in perlre, (?{ local $foo = ... }) should set $foo for the current backtracking attempt; i.e. backtracking over the (?{ }) group should restore the old value of the variable. Therefore I expect the correct output of this to be: ! count=1; accum=[AB]; pos=2 ! count=1; accum=[A]; pos=1 ! count=2; accum=[A BC]; pos=3 !!! count=2; accum=[AB A BC] words=[A BC] However, the actual output I get is: ! count=1; accum=[AB]; pos=2 ! count=2; accum=[AB A]; pos=1 ! count=3; accum=[AB A BC]; pos=3 !!! count=3; accum=[AB A BC] words=[AB A BC] That is, despite pos() resetting from 2 to 1, the old values of $count and @accum are not restored. I get this unexpected output from 5.22.0, 5.23.5, 5.20.1, 5.18.2, 5.16.3, and 5.14.4. 5.6.2, however, says: ! count=1; accum=[AB]; pos=2 Use of uninitialized value in join or string at (re_eval 2) line 4. ! count=1; accum=[ A]; pos=1 Use of uninitialized value in join or string at (re_eval 2) line 4. ! count=2; accum=[ A BC]; pos=3 Use of uninitialized value in join or string at (re_eval 3) line 2. !!! count=2; accum=[ A BC] Use of uninitialized value in join or string at try.pl line 32. words=[ A BC] This is almost correct. It properly resets $count and $accum[0]; it just doesn't shrink the array down after undoing the assignment to $accum[0]. But that seems to be a general issue in 5.6: 'my @foo; { local $foo[0] = "A"; }' keeps @foo == 1 and $foo[0] = undef. (I wasn't able to test with 5.8 because 'perlbrew install perl-5.8.9' fails at ../ext/Errno/t/Errno...............................FAILED--Further testing stopped: No errno's are exported which I believe is a general problem with gcc 5.) Is this indeed a bug in (at least) 5.14+ or am I missing something? [Please do not change anything below this line] ----------------------------------------------------------------- --- Flags: category=core severity=low --- Site configuration information for perl 5.22.0: Configured by mauke at Sat Jul 4 14:56:57 CEST 2015. Summary of my perl5 (revision 5 version 22 subversion 0) configuration: Platform: osname=linux, osvers=4.0.1-1-arch, archname=i686-linux uname='linux simplicio 4.0.1-1-arch #1 smp preempt wed apr 29 12:15:20 cest 2015 i686 gnulinux ' config_args='' hint=recommended, useposix=true, d_sigaction=define useithreads=undef, usemultiplicity=undef use64bitint=undef, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2', cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion='', gccversion='5.1.0', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234, doublekind=3 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12, longdblkind=3 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='cc', ldflags ='-fstack-protector -L/usr/local/lib' libpth=/usr/local/lib /usr/lib/gcc/i686-pc-linux-gnu/5.1.0/include-fixed /usr/lib /lib libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc libc=libc-2.21.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='2.21' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector' --- @INC for perl 5.22.0: /home/mauke/usr/lib/perl5/site_perl/5.22.0/i686-linux /home/mauke/usr/lib/perl5/site_perl/5.22.0 /home/mauke/usr/lib/perl5/5.22.0/i686-linux /home/mauke/usr/lib/perl5/5.22.0 . --- Environment for perl 5.22.0: HOME=/home/mauke LANG=en_US.UTF-8 LANGUAGE (unset) LC_COLLATE=POSIX LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/mauke/perl5/perlbrew/bin:/home/mauke/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl PERLBREW_BASHRC_VERSION=0.69 PERLBREW_HOME=/home/mauke/.perlbrew PERLBREW_ROOT=/home/mauke/perl5/perlbrew PERL_BADLANG (unset) PERL_UNICODE=SAL SHELL=/bin/bash
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.8k
On Fri, 20 Nov 2015 11:59:43 -0800, mauke- wrote: Show quoted text
> This program tries to match the string "ABC" as a sequence of segments, each of > which is either "AB" or "A" or "BC", while also recording which segments were > used. > > For debugging purposes it also temporarily records the number of segments used > in $count. > > According to the description in perlre, (?{ local $foo = ... }) should set $foo > for the current backtracking attempt; i.e. backtracking over the (?{ }) group > should restore the old value of the variable. > > Therefore I expect the correct output of this to be: > ! count=1; accum=[AB]; pos=2 > ! count=1; accum=[A]; pos=1 > ! count=2; accum=[A BC]; pos=3 > !!! count=2; accum=[AB A BC] > words=[A BC] > > However, the actual output I get is: > ! count=1; accum=[AB]; pos=2 > ! count=2; accum=[AB A]; pos=1 > ! count=3; accum=[AB A BC]; pos=3 > !!! count=3; accum=[AB A BC] > words=[AB A BC]
[...] Show quoted text
> Is this indeed a bug in (at least) 5.14+ or am I missing something?
I don't know if it is a bug, but it appears to be TRIE-specific - if I modify the core capture from '( AB | A | BC )' to '( AB | [AZ] | BC )' to avoid the trie optimization, I get your expected output: ! count=1; accum=[AB]; pos=2 ! count=1; accum=[A]; pos=1 ! count=2; accum=[A BC]; pos=3 !!! count=2; accum=[A BC] words=[A BC] We generally expect that optimizations may alter the course or predictability of backtracking behaviour, but I think it's entirely possible that this is simply a bug. From the -Dr output, I suspect here may be the signal that we've bypassed whatever triggers the unwinding of localization on backtracking: | 2| TRIE matched word #2, continuing | 2| only one match left, short-circuiting: #2 <A> If that's true, we might see similar effects for things like capture buffers that follow a trie. Yves/DaveM, any clue on this? Hugo
To: Hugo van der Sanden via RT <perlbug-followup [...] perl.org>
From: Dave Mitchell <davem [...] iabyn.com>
Subject: Re: [perl #126697] local() in embedded code in regex not working as expected
Date: Tue, 14 Feb 2017 18:15:33 +0000
CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 4.5k
On Tue, Feb 07, 2017 at 01:40:32PM -0800, Hugo van der Sanden via RT wrote: Show quoted text
> On Fri, 20 Nov 2015 11:59:43 -0800, mauke- wrote:
> > This program tries to match the string "ABC" as a sequence of segments, each of > > which is either "AB" or "A" or "BC", while also recording which segments were > > used. > > > > For debugging purposes it also temporarily records the number of segments used > > in $count. > > > > According to the description in perlre, (?{ local $foo = ... }) should set $foo > > for the current backtracking attempt; i.e. backtracking over the (?{ }) group > > should restore the old value of the variable. > > > > Therefore I expect the correct output of this to be: > > ! count=1; accum=[AB]; pos=2 > > ! count=1; accum=[A]; pos=1 > > ! count=2; accum=[A BC]; pos=3 > > !!! count=2; accum=[AB A BC] > > words=[A BC] > > > > However, the actual output I get is: > > ! count=1; accum=[AB]; pos=2 > > ! count=2; accum=[AB A]; pos=1 > > ! count=3; accum=[AB A BC]; pos=3 > > !!! count=3; accum=[AB A BC] > > words=[AB A BC]
> [...]
> > Is this indeed a bug in (at least) 5.14+ or am I missing something?
> > I don't know if it is a bug, but it appears to be TRIE-specific - if I modify the core capture from '( AB | A | BC )' to '( AB | [AZ] | BC )' to avoid the trie optimization, I get your expected output: > > ! count=1; accum=[AB]; pos=2 > ! count=1; accum=[A]; pos=1 > ! count=2; accum=[A BC]; pos=3 > !!! count=2; accum=[A BC] > words=[A BC] > > We generally expect that optimizations may alter the course or predictability of backtracking behaviour, but I think it's entirely possible that this is simply a bug. From the -Dr output, I suspect here may be the signal that we've bypassed whatever triggers the unwinding of localization on backtracking: > | 2| TRIE matched word #2, continuing > | 2| only one match left, short-circuiting: #2 <A> > > If that's true, we might see similar effects for things like capture buffers that follow a trie. > > Yves/DaveM, any clue on this?
Fixed by the commit below, which is part of a lager branch I've just merged that fixes up various things related to (?{...}), WHILEM and scope; and also makes some things faster, e.g. this is now 25% faster: $s = ("a" x 1000); $s =~ /^(?:(.)(.))*?[XY]/ for 1..10_000; Here are the commits in the merge: ca81c32 [MERGE] regex (?{...}) and WHILEM scope fixups 7758414 S_regmatch: eliminate WHILEM_A_min paren saving bb414e1 S_regmatch: eliminate WHILEM_B paren saving cbb658a Add a comment on why TRIE.jump does a UNWIND_PAREN 4ee1652 clear savestack on (?{...}) failure and backtrack f4197b8 -Mre=Debug,ALL: indicate regex state stack pushes 4b9c7ca fix pad/scope issue in re_evals and here is the commit which fixes this ticket: commit 4ee16520199a0e11bf4dbdbac71f0a64e5510271 Author: David Mitchell <davem@iabyn.com> AuthorDate: Tue Feb 14 15:59:57 2017 +0000 Commit: David Mitchell <davem@iabyn.com> CommitDate: Tue Feb 14 17:49:58 2017 +0000 clear savestack on (?{...}) failure and backtrack RT #126697 In a regex, after executing a (?{...}) code block, if we fail and backtrack over the codeblock, we're supposed to unwind the savestack, so that for any example any local()s within the code block are undone. It turns out that a backtracking state isn't pushed for (?{...}), only for postponed evals ( i.e. (??{...})). This means that it relies on one of the earlier backtracking states to clear the savestack on its behalf. This can't always be relied upon, and the ticket above contains code where this falls down; in particular: 'ABC' =~ m{ \A (?: (?: AB | A | BC ) (?{ local $count = $count + 1; print "! count=$count; ; pos=${\pos}\n"; }) )* \z }x Here we end up relying on TRIE_next to do the cleaning up, but TRIE_next doesn't, since there's nothing it would be responsible for that needs cleaning up. The solution to this is to push a backtrack state for every (?{...}) as well as every (??{...}). The sole job of that state is to do a LEAVE_SCOPE(ST.lastcp). The existing backtrack state EVAL_AB has been renamed EVAL_postponed_AB to make it clear it's only used on postponed /(??{A})B/ regexes, and a new state has been added, EVAL_B, which is only called when backtracking after failing something in the B in /(?{...})B/. -- My get-up-and-go just got up and went.
Download (untitled) / with headers
text/plain 313b
Thank you for filing this report. You have helped make Perl better. With the release today of Perl 5.26.0, this and 210 other issues have been resolved. Perl 5.26.0 may be downloaded via: https://metacpan.org/release/XSAWYERX/perl-5.26.0 If you find that the problem persists, feel free to reopen this ticket.


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org