Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

local() in embedded code in regex not working as expected #15056

Closed
p5pRT opened this issue Nov 20, 2015 · 7 comments
Closed

local() in embedded code in regex not working as expected #15056

p5pRT opened this issue Nov 20, 2015 · 7 comments

Comments

@p5pRT
Copy link

p5pRT commented Nov 20, 2015

Migrated from rt.perl.org#126697 (status was 'resolved')

Searchable as RT126697$

@p5pRT
Copy link
Author

p5pRT commented Nov 20, 2015

From @mauke

Created by @mauke

#!/usr/bin/perl
use strict;
use warnings;

my $str = 'ABC';

my @​words;
{
  my @​accum;
  our $count;

  $str =~ m{
  \A
  (?{ $count = 0; })
  (?​:
  ( AB | A | BC )
  (?{
  local $accum[@​accum] = $1;
  local $count = $count + 1;
  print "! count=$count; accum=[@​accum]; pos=${\pos}\n";
  })
  )*
  \z
  (?{
  print "!!! count=$count; accum=[@​accum]\n";
  @​words = @​accum;
  })
  }x
  or die;
}

print "words=[@​words]\n";
__END__

This program tries to match the string "ABC" as a sequence of segments, each of
which is either "AB" or "A" or "BC", while also recording which segments were
used.

For debugging purposes it also temporarily records the number of segments used
in $count.

According to the description in perlre, (?{ local $foo = ... }) should set $foo
for the current backtracking attempt; i.e. backtracking over the (?{ }) group
should restore the old value of the variable.

Therefore I expect the correct output of this to be​:
! count=1; accum=[AB]; pos=2
! count=1; accum=[A]; pos=1
! count=2; accum=[A BC]; pos=3
!!! count=2; accum=[AB A BC]
words=[A BC]

However, the actual output I get is​:
! count=1; accum=[AB]; pos=2
! count=2; accum=[AB A]; pos=1
! count=3; accum=[AB A BC]; pos=3
!!! count=3; accum=[AB A BC]
words=[AB A BC]

That is, despite pos() resetting from 2 to 1, the old values of $count and
@​accum are not restored.

I get this unexpected output from 5.22.0, 5.23.5, 5.20.1, 5.18.2, 5.16.3, and
5.14.4.

5.6.2, however, says​:
! count=1; accum=[AB]; pos=2
Use of uninitialized value in join or string at (re_eval 2) line 4.
! count=1; accum=[ A]; pos=1
Use of uninitialized value in join or string at (re_eval 2) line 4.
! count=2; accum=[ A BC]; pos=3
Use of uninitialized value in join or string at (re_eval 3) line 2.
!!! count=2; accum=[ A BC]
Use of uninitialized value in join or string at try.pl line 32.
words=[ A BC]

This is almost correct. It properly resets $count and $accum[0]; it just
doesn't shrink the array down after undoing the assignment to $accum[0]. But
that seems to be a general issue in 5.6​: 'my @​foo; { local $foo[0] = "A"; }'
keeps @​foo == 1 and $foo[0] = undef.

(I wasn't able to test with 5.8 because 'perlbrew install perl-5.8.9' fails at
../ext/Errno/t/Errno...............................FAILED--Further testing stopped​: No errno's are exported
which I believe is a general problem with gcc 5.)

Is this indeed a bug in (at least) 5.14+ or am I missing something?

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.22.0:

Configured by mauke at Sat Jul  4 14:56:57 CEST 2015.

Summary of my perl5 (revision 5 version 22 subversion 0) configuration:
   
  Platform:
    osname=linux, osvers=4.0.1-1-arch, archname=i686-linux
    uname='linux simplicio 4.0.1-1-arch #1 smp preempt wed apr 29 12:15:20 cest 2015 i686 gnulinux '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='5.1.0', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234, doublekind=3
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12, longdblkind=3
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/i686-pc-linux-gnu/5.1.0/include-fixed /usr/lib /lib
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.21.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.21'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'



@INC for perl 5.22.0:
    /home/mauke/usr/lib/perl5/site_perl/5.22.0/i686-linux
    /home/mauke/usr/lib/perl5/site_perl/5.22.0
    /home/mauke/usr/lib/perl5/5.22.0/i686-linux
    /home/mauke/usr/lib/perl5/5.22.0
    .


Environment for perl 5.22.0:
    HOME=/home/mauke
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=POSIX
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/mauke/perl5/perlbrew/bin:/home/mauke/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
    PERLBREW_BASHRC_VERSION=0.69
    PERLBREW_HOME=/home/mauke/.perlbrew
    PERLBREW_ROOT=/home/mauke/perl5/perlbrew
    PERL_BADLANG (unset)
    PERL_UNICODE=SAL
    SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Feb 7, 2017

From @hvds

On Fri, 20 Nov 2015 11​:59​:43 -0800, mauke- wrote​:

This program tries to match the string "ABC" as a sequence of segments, each of
which is either "AB" or "A" or "BC", while also recording which segments were
used.

For debugging purposes it also temporarily records the number of segments used
in $count.

According to the description in perlre, (?{ local $foo = ... }) should set $foo
for the current backtracking attempt; i.e. backtracking over the (?{ }) group
should restore the old value of the variable.

Therefore I expect the correct output of this to be​:
! count=1; accum=[AB]; pos=2
! count=1; accum=[A]; pos=1
! count=2; accum=[A BC]; pos=3
!!! count=2; accum=[AB A BC]
words=[A BC]

However, the actual output I get is​:
! count=1; accum=[AB]; pos=2
! count=2; accum=[AB A]; pos=1
! count=3; accum=[AB A BC]; pos=3
!!! count=3; accum=[AB A BC]
words=[AB A BC]
[...]
Is this indeed a bug in (at least) 5.14+ or am I missing something?

I don't know if it is a bug, but it appears to be TRIE-specific - if I modify the core capture from '( AB | A | BC )' to '( AB | [AZ] | BC )' to avoid the trie optimization, I get your expected output​:

! count=1; accum=[AB]; pos=2
! count=1; accum=[A]; pos=1
! count=2; accum=[A BC]; pos=3
!!! count=2; accum=[A BC]
words=[A BC]

We generally expect that optimizations may alter the course or predictability of backtracking behaviour, but I think it's entirely possible that this is simply a bug. From the -Dr output, I suspect here may be the signal that we've bypassed whatever triggers the unwinding of localization on backtracking​:
  | 2| TRIE matched word #2, continuing
  | 2| only one match left, short-circuiting​: #2 <A>

If that's true, we might see similar effects for things like capture buffers that follow a trie.

Yves/DaveM, any clue on this?

Hugo

@p5pRT
Copy link
Author

p5pRT commented Feb 7, 2017

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Feb 14, 2017

From @iabyn

On Tue, Feb 07, 2017 at 01​:40​:32PM -0800, Hugo van der Sanden via RT wrote​:

On Fri, 20 Nov 2015 11​:59​:43 -0800, mauke- wrote​:

This program tries to match the string "ABC" as a sequence of segments, each of
which is either "AB" or "A" or "BC", while also recording which segments were
used.

For debugging purposes it also temporarily records the number of segments used
in $count.

According to the description in perlre, (?{ local $foo = ... }) should set $foo
for the current backtracking attempt; i.e. backtracking over the (?{ }) group
should restore the old value of the variable.

Therefore I expect the correct output of this to be​:
! count=1; accum=[AB]; pos=2
! count=1; accum=[A]; pos=1
! count=2; accum=[A BC]; pos=3
!!! count=2; accum=[AB A BC]
words=[A BC]

However, the actual output I get is​:
! count=1; accum=[AB]; pos=2
! count=2; accum=[AB A]; pos=1
! count=3; accum=[AB A BC]; pos=3
!!! count=3; accum=[AB A BC]
words=[AB A BC]
[...]
Is this indeed a bug in (at least) 5.14+ or am I missing something?

I don't know if it is a bug, but it appears to be TRIE-specific - if I modify the core capture from '( AB | A | BC )' to '( AB | [AZ] | BC )' to avoid the trie optimization, I get your expected output​:

! count=1; accum=[AB]; pos=2
! count=1; accum=[A]; pos=1
! count=2; accum=[A BC]; pos=3
!!! count=2; accum=[A BC]
words=[A BC]

We generally expect that optimizations may alter the course or predictability of backtracking behaviour, but I think it's entirely possible that this is simply a bug. From the -Dr output, I suspect here may be the signal that we've bypassed whatever triggers the unwinding of localization on backtracking​:
| 2| TRIE matched word #2, continuing
| 2| only one match left, short-circuiting​: #2 <A>

If that's true, we might see similar effects for things like capture buffers that follow a trie.

Yves/DaveM, any clue on this?

Fixed by the commit below, which is part of a lager branch I've just
merged that fixes up various things related to (?{...}), WHILEM and scope;
and also makes some things faster, e.g. this is now 25% faster​:

  $s = ("a" x 1000);
  $s =~ /^(?​:(.)(.))*?[XY]/ for 1..10_000;

Here are the commits in the merge​:

  ca81c32 [MERGE] regex (?{...}) and WHILEM scope fixups
  7758414 S_regmatch​: eliminate WHILEM_A_min paren saving
  bb414e1 S_regmatch​: eliminate WHILEM_B paren saving
  cbb658a Add a comment on why TRIE.jump does a UNWIND_PAREN
  4ee1652 clear savestack on (?{...}) failure and backtrack
  f4197b8 -Mre=Debug,ALL​: indicate regex state stack pushes
  4b9c7ca fix pad/scope issue in re_evals

and here is the commit which fixes this ticket​:

commit 4ee1652
Author​: David Mitchell <davem@​iabyn.com>
AuthorDate​: Tue Feb 14 15​:59​:57 2017 +0000
Commit​: David Mitchell <davem@​iabyn.com>
CommitDate​: Tue Feb 14 17​:49​:58 2017 +0000

  clear savestack on (?{...}) failure and backtrack
 
  RT #126697
 
  In a regex, after executing a (?{...}) code block, if we fail and
  backtrack over the codeblock, we're supposed to unwind the savestack, so
  that for any example any local()s within the code block are undone.
 
  It turns out that a backtracking state isn't pushed for (?{...}), only
  for postponed evals ( i.e. (??{...})). This means that it relies on one
  of the earlier backtracking states to clear the savestack on its behalf.
  This can't always be relied upon, and the ticket above contains code where
  this falls down; in particular​:
 
  'ABC' =~ m{
  \A
  (?​:
  (?​: AB | A | BC )
  (?{
  local $count = $count + 1;
  print "! count=$count; ; pos=${\pos}\n";
  })
  )*
  \z
  }x
 
  Here we end up relying on TRIE_next to do the cleaning up, but TRIE_next
  doesn't, since there's nothing it would be responsible for that needs
  cleaning up.
 
  The solution to this is to push a backtrack state for every (?{...}) as
  well as every (??{...}). The sole job of that state is to do a
  LEAVE_SCOPE(ST.lastcp).
 
  The existing backtrack state EVAL_AB has been renamed EVAL_postponed_AB
  to make it clear it's only used on postponed /(??{A})B/ regexes, and a new
  state has been added, EVAL_B, which is only called when backtracking after
  failing something in the B in /(?{...})B/.

--
My get-up-and-go just got up and went.

@p5pRT
Copy link
Author

p5pRT commented Feb 15, 2017

@iabyn - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented May 30, 2017

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.26.0, this and 210 other issues have been
resolved.

Perl 5.26.0 may be downloaded via​:
https://metacpan.org/release/XSAWYERX/perl-5.26.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT
Copy link
Author

p5pRT commented May 30, 2017

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant