Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexec.c stack overflow #15669

Closed
p5pRT opened this issue Oct 17, 2016 · 22 comments
Closed

regexec.c stack overflow #15669

p5pRT opened this issue Oct 17, 2016 · 22 comments

Comments

@p5pRT
Copy link

p5pRT commented Oct 17, 2016

Migrated from rt.perl.org#129903 (status was 'open')

Searchable as RT129903$

@p5pRT
Copy link
Author

p5pRT commented Oct 17, 2016

From @geeknik

I'm not convinced that this is an actual bug, but #p5p was silent when I
asked about it. Affects Perl back to 5.20.2 including v5.25.6
(v5.25.5-114-g87af8d5). Valgrind fails pretty quickly but gdb just goes on
forever.

perl -e '/(?{m}(0)},s\/\/\/})//0'

==6615==ERROR​: AddressSanitizer​: stack-overflow on address 0x7ffc78f88ca8
(pc 0x0000004c0d4c bp 0x7ffc78f89500 sp 0x7ffc78f88cb0 T0)
  #0 0x4c0d4b in calloc (/root/perl/perl+0x4c0d4b)
  #1 0x7f9301 in Perl_safesyscalloc /root/perl/util.c​:442​:18
  #2 0xb45aed in Perl_regexec_flags /root/perl/regexec.c​:3128​:9
  #3 0x8cf9c5 in Perl_pp_subst /root/perl/pp_hot.c​:2981​:10
  #4 0x7f4483 in Perl_runops_debug /root/perl/dump.c​:2246​:23
  #5 0xb984b6 in S_regmatch /root/perl/regexec.c​:6888​:3
  #6 0xb7337c in S_regtry /root/perl/regexec.c​:3622​:14
...
...
  #247 0xb4b82b in Perl_regexec_flags /root/perl/regexec.c​:3489​:7
  #248 0x8cf9c5 in Perl_pp_subst /root/perl/pp_hot.c​:2981​:10
  #249 0x7f4483 in Perl_runops_debug /root/perl/dump.c​:2246​:23
  #250 0xb984b6 in S_regmatch /root/perl/regexec.c​:6888​:3
  #251 0xb7337c in S_regtry /root/perl/regexec.c​:3622​:14

SUMMARY​: AddressSanitizer​: stack-overflow ??​:0 calloc
==6615==ABORTING

==19424== Stack overflow in thread 1​: can't grow stack to 0xffe801f90
==19424==
==19424== Process terminating with default action of signal 11 (SIGSEGV)
==19424== Access not within mapped region at address 0xFFE801F90
==19424== at 0x5B4137​: S_regtry (regexec.c​:3578)
==19424== If you believe this happened as a result of a stack
==19424== overflow in your program's main thread (unlikely but
==19424== possible), you can try to increase the size of the
==19424== main thread stack using the --main-stacksize= flag.
==19424== The main thread stack size used in this run was 8388608.
==19424== Stack overflow in thread 1​: can't grow stack to 0xffe801f88
==19424==
==19424== Process terminating with default action of signal 11 (SIGSEGV)
==19424== Access not within mapped region at address 0xFFE801F88
==19424== at 0x4A236C0​: _vgnU_freeres (vg_preloaded.c​:58)
==19424== If you believe this happened as a result of a stack
==19424== overflow in your program's main thread (unlikely but
==19424== possible), you can try to increase the size of the
==19424== main thread stack using the --main-stacksize= flag.
==19424== The main thread stack size used in this run was 8388608.
Segmentation fault

@p5pRT
Copy link
Author

p5pRT commented Oct 18, 2016

From @cpansprout

On Mon Oct 17 10​:26​:19 2016, brian.carpenter@​gmail.com wrote​:

I'm not convinced that this is an actual bug,

I think it is.

but #p5p was silent when I
asked about it. Affects Perl back to 5.20.2 including v5.25.6
(v5.25.5-114-g87af8d5). Valgrind fails pretty quickly but gdb just goes on
forever.

perl -e '/(?{m}(0)},s\/\/\/})//0'

That is nonsensical code.
$ perl5.18.3 -e '/(?{m}(0)},s\/\/\/})//0'
Segmentation fault​: 11
$ perl5.14.4 -e '/(?{m}(0)},s\/\/\/})//0'
Sequence (?{...}) not terminated or not {}-balanced in regex; marked by <-- HERE in m/(?{ <-- HERE m}(0)},s///})/ at -e line 1.

I do not have 5.16 handy. The output from 5.14 is what I would expect.

==6615==ERROR​: AddressSanitizer​: stack-overflow on address 0x7ffc78f88ca8
(pc 0x0000004c0d4c bp 0x7ffc78f89500 sp 0x7ffc78f88cb0 T0)
#0 0x4c0d4b in calloc (/root/perl/perl+0x4c0d4b)
#1 0x7f9301 in Perl_safesyscalloc /root/perl/util.c​:442​:18
#2 0xb45aed in Perl_regexec_flags /root/perl/regexec.c​:3128​:9
#3 0x8cf9c5 in Perl_pp_subst /root/perl/pp_hot.c​:2981​:10
#4 0x7f4483 in Perl_runops_debug /root/perl/dump.c​:2246​:23
#5 0xb984b6 in S_regmatch /root/perl/regexec.c​:6888​:3
#6 0xb7337c in S_regtry /root/perl/regexec.c​:3622​:14

It should not even be getting that far. It should fail at compile time.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 18, 2016

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Oct 18, 2016

From @dcollinsn

$ perl5.14.0 -e '/(?{m}(0)},s\/\/\/})//0'
Sequence (?{...}) not terminated or not {}-balanced in regex; marked by <--
HERE in m/(?{ <-- HERE m}(0)},s///})/ at -e line 1.
$ perl5.18.0 -e '/(?{m}(0)},s\/\/\/})//0'
Segmentation fault
$ perl5.16.0 -e '/(?{m}(0)},s\/\/\/})//0'
Sequence (?{...}) not terminated or not {}-balanced in regex; marked by <--
HERE in m/(?{ <-- HERE m}(0)},s///})/ at -e line 1.
$ perl5.17.0 -e '/(?{m}(0)},s\/\/\/})//0'
Sequence (?{...}) not terminated or not {}-balanced in regex; marked by <--
HERE in m/(?{ <-- HERE m}(0)},s///})/ at -e line 1.
$ perl5.17.5 -e '/(?{m}(0)},s\/\/\/})//0'
Segmentation fault
$ perl5.17.3 -e '/(?{m}(0)},s\/\/\/})//0'
Segmentation fault
$ perl5.17.1 -e '/(?{m}(0)},s\/\/\/})//0'
Segmentation fault
$ perl Porting/bisect.pl --start=v5.17.0 --end=v5.17.1 --crash -- ./perl
-Ilib -e '/(?{m}(0)},s\/\/\/})//0'
68e2671 is the first bad commit
commit 68e2671
Author​: David Mitchell <davem@​iabyn.com>
Date​: Thu Aug 25 11​:41​:49 2011 +0100

  Mostly complete fix for literal /(?{..})/ blocks

  Change the way that code blocks in patterns are parsed and executed,
  especially as regards lexical and scoping behaviour.

  (Note that this fix only applies to literal code blocks appearing within
  patterns​: run-time patterns, and literals within qr//, are still done
the
  old broken way for now).

  This change means that for literal /(?{..})/ and /(??{..})/​:

  * the code block is now fully parsed in the same pass as the surrounding
  code, which means that the compiler no longer just does a simplistic
  count of balancing {} to find the limits of the code block;
  i.e. stuff like /(?{ $x = "{" })/ now works (in the same way
  that subscripts in double quoted strings always have​: "$a{'{'}" )

  * Error and warning messages will now appear to emanate from the main
body
  rather than an re_eval; e.g. the output from

  #!/usr/bin/perl
  /(?{ warn "boo" })/

  has changed from

  boo at (re_eval 1) line 1.

  to

  boo at /tmp/p line 2.

  * scope and closures now behave as you might expect; for example

  for my $x (qw(a b c)) { "" =~ /(?{ print $x })/ }

  now prints "abc" rather than ""

  * with recursion, it now finds the lexical within the appropriate depth
  of pad​: this code now prints "012" rather than "000"​:

  sub recurse {
  my ($n) = @​_;
  return if $n > 2;
  "" =~ /^(?{print $n})/;
  recurse($n+1);
  }
  recurse(0);

  * an earlier fix that stopped 'my' declarations within code blocks
causing
  crashes, required the accumulating of two SAVECOMPPADs on the stack
for
  each iteration of the code block; this is no longer needed;

  * UNITCHECK blocks within literal code blocks are now run as part of the
  main body of code (run-time code blocks still trigger an immediate
  call to the UNITCHECK block though)

  This is all achieved by building upon the efforts of the commits which
led
  up to this; those altered the parser to parse literal code blocks
  directly, but up until now those code blocks were discarded by
  Perl_pmruntime and the block re-compiled using the original re_eval
  mechanism. As of this commit, for the non-qr and non-runtime variants,
  those code blocks are no longer thrown away. Instead​:

  * the LISTOP generated by the parser, which contains all the code
  blocks plus OP_CONSTs that collectively make up the literal pattern,
  is now stored in a new field in PMOPs, called op_code_list. For
example
  in /A(?{BLOCK})C/, the listop stored in op_code_list looks like

  LIST
  PUSHMARK
  CONST['A']
  NULL/special (aka a DO block)
  BLOCK
  CONST['(?{BLOCK})']
  CONST['B']

  * each of the code blocks has its last op set to null and is
individually
  run through the peephole optimiser, so each one becomes a little
  self-contained block of code, rather than a list of blocks that run
into
  each other;

  * then in re_op_compile(), we concatenate the list of CONSTs to produce
a
  string to be compiled, but at the same time we note any DO blocks and
  note the start and end positions of the corresponding
CONST['(?{BLOCK})'];

  * (if the current regex engine isn't the built-in perl one, then we just
  throw away the code blocks and pass the concatenated string to the
engine)

  * then during regex compilation, whenever we encounter a '(?{', we see
if
  it matches the index of one of the pre-compiled blocks, and if so, we
  store a pointer to that block in an 'l' data slot, and use the end
index
  to skip over the text of the code body. Conversely, if the index
doesn't
  match, then we know that it's a run-time pattern and (for now),
compile
  it in the old way.

  * During execution, when an EVAL op is encountered, if data->what is
'l',
  then we just use the pad that was in effect when the pattern was
called;
  i.e. we use the current pad slot of the currently executing CV that
the
  pattern is embedded within.

:100644 100644 75ee7e7bae9366e089980b8a7d346198645cdfe9
8f2fa766c6cc393d413d59b55ad6f896ec5c7010 M dump.c
:100644 100644 5cc98874ae03eae36d4207071ed587d51e80ebed
4d82c7cc5797c4a97f2251f4f1487aa090efda99 M op.c
:100644 100644 6aa16f5725db2ae0a6a01b0d5b34b6446e8b5ebf
f267da2c9c752b765151eca4c7009230c97b82ed M op.h
:040000 040000 3454e32e87872db5fef604d816c7ca37dc487166
f7b4543ded1cbaf299521b31150d1c38e5b11509 M pod
:100644 100644 394502b314a65eea36bca2f52e31f50048f0060d
b45122c1a79df44e6094c1001fc7b3456543b740 M regcomp.c
:100644 100644 0fdb0058b8b5fce8763f7d0da05b187d460838b6
a9da0c97e12a8038ef0bc49870737fe62a7c01e6 M regcomp.h
:100644 100644 bb845a79216c447b2fa7ec08cee46602b05d3461
f384c4d32c11fe1917e7e8fbcd998b60dc1355c3 M regexec.c
:040000 040000 a2c61b299bd88c2308af219684388ae3deea371d
4ae03f25b1ebcee1518e0dc6ccc7ae23b1433f61 M t
bisect run success
That took 772 seconds.

--
Dan Collins

@p5pRT
Copy link
Author

p5pRT commented Oct 18, 2016

From @demerphq

On 18 October 2016 at 03​:17, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Mon Oct 17 10​:26​:19 2016, brian.carpenter@​gmail.com wrote​:

I'm not convinced that this is an actual bug,

I think it is.

but #p5p was silent when I
asked about it. Affects Perl back to 5.20.2 including v5.25.6
(v5.25.5-114-g87af8d5). Valgrind fails pretty quickly but gdb just goes on
forever.

perl -e '/0'

That is nonsensical code.

Do you mean just the '/0' or the full pattern?

With some editing I read the original code as being the same as

(m/(?{
  m!(0)!,
  s!!!
})/) / 0

Which is strange but IMO valid code.

$ perl5.18.3 -e '/(?{m}(0)},s\/\/\/})//0'
Segmentation fault​: 11
$ perl5.14.4 -e '/(?{m}(0)},s\/\/\/})//0'
Sequence (?{...}) not terminated or not {}-balanced in regex; marked by <-- HERE in m/(?{ <-- HERE m}(0)},s///})/ at -e line 1.

I do not have 5.16 handy. The output from 5.14 is what I would expect.

I am not sure why. Is it because you expect the first } to close the
(?{ , while I dont? The inside of the (?{ ... }) is code, and should
follow nesting rules, so m}(0)} should be parsed and extracted as a
balanced construct, just as it would if it were q}foo}, and then the
s/// gets parsed out, to find the trailing }) which finish the (?{})
construct.

I would say the 5.14.4 behavior is a misparse.

If this is legal​:

perl -le'print q}foo}'
foo

then I would expect this​:

m}(0)}

to be legal too, and i would expect it to be legal inside of a (?{ ...
}) construct in a pattern.

==6615==ERROR​: AddressSanitizer​: stack-overflow on address 0x7ffc78f88ca8
(pc 0x0000004c0d4c bp 0x7ffc78f89500 sp 0x7ffc78f88cb0 T0)
#0 0x4c0d4b in calloc (/root/perl/perl+0x4c0d4b)
#1 0x7f9301 in Perl_safesyscalloc /root/perl/util.c​:442​:18
#2 0xb45aed in Perl_regexec_flags /root/perl/regexec.c​:3128​:9
#3 0x8cf9c5 in Perl_pp_subst /root/perl/pp_hot.c​:2981​:10
#4 0x7f4483 in Perl_runops_debug /root/perl/dump.c​:2246​:23
#5 0xb984b6 in S_regmatch /root/perl/regexec.c​:6888​:3
#6 0xb7337c in S_regtry /root/perl/regexec.c​:3622​:14

It should not even be getting that far. It should fail at compile time.

Looking forward to hearing more on why.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Oct 18, 2016

From @demerphq

On 18 October 2016 at 10​:00, demerphq <demerphq@​gmail.com> wrote​:

On 18 October 2016 at 03​:17, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Mon Oct 17 10​:26​:19 2016, brian.carpenter@​gmail.com wrote​:

I'm not convinced that this is an actual bug,

I think it is.

but #p5p was silent when I
asked about it. Affects Perl back to 5.20.2 including v5.25.6
(v5.25.5-114-g87af8d5). Valgrind fails pretty quickly but gdb just goes on
forever.

perl -e '/0'

That is nonsensical code.

Do you mean just the '/0' or the full pattern?

With some editing I read the original code as being the same as

(m/(?{
m!(0)!,
s!!!
})/) / 0

Which is strange but IMO valid code.

$ perl5.18.3 -e '/(?{m}(0)},s\/\/\/})//0'
Segmentation fault​: 11
$ perl5.14.4 -e '/(?{m}(0)},s\/\/\/})//0'
Sequence (?{...}) not terminated or not {}-balanced in regex; marked by <-- HERE in m/(?{ <-- HERE m}(0)},s///})/ at -e line 1.

I do not have 5.16 handy. The output from 5.14 is what I would expect.

I am not sure why. Is it because you expect the first } to close the
(?{ , while I dont? The inside of the (?{ ... }) is code, and should
follow nesting rules, so m}(0)} should be parsed and extracted as a
balanced construct, just as it would if it were q}foo}, and then the
s/// gets parsed out, to find the trailing }) which finish the (?{})
construct.

I would say the 5.14.4 behavior is a misparse.

If this is legal​:

perl -le'print q}foo}'
foo

then I would expect this​:

m}(0)}

to be legal too, and i would expect it to be legal inside of a (?{ ...
}) construct in a pattern.

==6615==ERROR​: AddressSanitizer​: stack-overflow on address 0x7ffc78f88ca8
(pc 0x0000004c0d4c bp 0x7ffc78f89500 sp 0x7ffc78f88cb0 T0)
#0 0x4c0d4b in calloc (/root/perl/perl+0x4c0d4b)
#1 0x7f9301 in Perl_safesyscalloc /root/perl/util.c​:442​:18
#2 0xb45aed in Perl_regexec_flags /root/perl/regexec.c​:3128​:9
#3 0x8cf9c5 in Perl_pp_subst /root/perl/pp_hot.c​:2981​:10
#4 0x7f4483 in Perl_runops_debug /root/perl/dump.c​:2246​:23
#5 0xb984b6 in S_regmatch /root/perl/regexec.c​:6888​:3
#6 0xb7337c in S_regtry /root/perl/regexec.c​:3622​:14

It should not even be getting that far. It should fail at compile time.

Looking forward to hearing more on why.

I dont think the simplified version of the code should infinite loop
on the stack.

For this code (because $_ is the empty string) we should perform an
outer match once, then a match using qr/(0)/ twice, once via the
"empty pattern special case" in the s///, both of which should fail,
and then the outer match should stop. We should not loop. Apparently
the problem is related to the empty-pattern special case behavior. If
I change the pattern from

m/(?{ m!(0)!, s!!! })/

to

m/(?{ m!(0)!, s!(0)!! })/

which should be equivalent valgrind is happy and we get what I expect,
a division by zero error​:

$ ./perl -le'(m/(?{ m!(0)!, s!(0)!! })/) / 0'
Illegal division by zero at -e line 1.

$ ./perl -le'(m/(?{ m!(0)!, s!!! })/) / 0'
Segmentation fault

And here is the valgrind output​:

$ valgrind ./perl -le'(m/(?{ m!(0)!, s!!! })/) / 0'
==25240== Memcheck, a memory error detector
==25240== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==25240== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==25240== Command​: ./perl -le(m/(?{\ m!(0)!,\ s!!!\ })/)\ /\ 0
==25240==
==25240== Stack overflow in thread 1​: can't grow stack to 0xffe801f88
==25240==
==25240== Process terminating with default action of signal 11 (SIGSEGV)
==25240== Access not within mapped region at address 0xFFE801F88
==25240== at 0x5D01B0​: Perl_sv_2pv_flags (sv.c​:2939)
==25240== If you believe this happened as a result of a stack
==25240== overflow in your program's main thread (unlikely but
==25240== possible), you can try to increase the size of the
==25240== main thread stack using the --main-stacksize= flag.
==25240== The main thread stack size used in this run was 8388608.
==25240== Stack overflow in thread 1​: can't grow stack to 0xffe801f68
==25240==
==25240== Process terminating with default action of signal 11 (SIGSEGV)
==25240== Access not within mapped region at address 0xFFE801F68
==25240== at 0x4A256B0​: _vgnU_freeres (in
/usr/lib/valgrind/vgpreload_core-amd64-linux.so)
==25240== If you believe this happened as a result of a stack
==25240== overflow in your program's main thread (unlikely but
==25240== possible), you can try to increase the size of the
==25240== main thread stack using the --main-stacksize= flag.
==25240== The main thread stack size used in this run was 8388608.
==25240==
==25240== HEAP SUMMARY​:
==25240== in use at exit​: 7,449,285 bytes in 11,675 blocks
==25240== total heap usage​: 11,825 allocs, 150 frees, 8,150,685
bytes allocated
==25240==
==25240== LEAK SUMMARY​:
==25240== definitely lost​: 0 bytes in 0 blocks
==25240== indirectly lost​: 0 bytes in 0 blocks
==25240== possibly lost​: 0 bytes in 0 blocks
==25240== still reachable​: 7,449,285 bytes in 11,675 blocks
==25240== suppressed​: 0 bytes in 0 blocks
==25240== Rerun with --leak-check=full to see details of leaked memory
==25240==
==25240== For counts of detected and suppressed errors, rerun with​: -v
==25240== ERROR SUMMARY​: 0 errors from 0 contexts (suppressed​: 0 from 0)
Segmentation fault

This is definitely a bug.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Oct 18, 2016

From @demerphq

On 18 October 2016 at 10​:17, demerphq <demerphq@​gmail.com> wrote​:

... Apparently
the problem is related to the empty-pattern special case behavior.
[...]

This is definitely a bug.

And yet another example of the empty pattern special case causing
problems. Sigh, so much trouble for so little value.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Oct 18, 2016

From @demerphq

On 18 October 2016 at 10​:21, demerphq <demerphq@​gmail.com> wrote​:

On 18 October 2016 at 10​:17, demerphq <demerphq@​gmail.com> wrote​:

... Apparently
the problem is related to the empty-pattern special case behavior.
[...]

This is definitely a bug.

And yet another example of the empty pattern special case causing
problems. Sigh, so much trouble for so little value.

Actually this case can be boiled down to​:

./perl -le'/(?{ s!!! })/'

which causes an infinite loop due to PL_curpm being set at the start
of the match, which the s!!! then sees as the "last successful match".
IMO it shouldn't as the outer match hasn't actually finished yet, but
because of when and how we set up PL_curpm at match start we have this
problem inside of (?{ }) constructs.

I think we can fix it by checking PL_curpm against PL_reg_curpm when
we do the empty pattern check; running tests now.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Oct 18, 2016

From @hvds

On Tue Oct 18 02​:30​:01 2016, demerphq wrote​:

I think we can fix it by checking PL_curpm against PL_reg_curpm when
we do the empty pattern check; running tests now.

And do what? Even if they're the same, it still might also legitimately be the last successful match, no?

It seems to me we can't offer the promised behaviour unless there's a variable that really is set only at the point we complete a successful match.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Oct 18, 2016

From @demerphq

On 18 October 2016 at 12​:03, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

On Tue Oct 18 02​:30​:01 2016, demerphq wrote​:

I think we can fix it by checking PL_curpm against PL_reg_curpm when
we do the empty pattern check; running tests now.

And do what? Even if they're the same, it still might also legitimately be the last successful match, no?

No I don't think so, and all the tests pass.

As far as I can tell PL_reg_curpm is a special opcode which we use
when we are setting up PL_curpm before executing a (?{...}) or (??{
... }) code block so that things like $1 and stuff like that point at
the current pattern. Once the code is executed PL_curpm is unwound to
no longer point at PL_reg_curpm. Essentially the problem comes down to
PL_curpm being the place where things like $1 and friends are derived,
as well as how we find the pattern for the last successful match, but
inside of a (?{ ... }) we want $1 to point at the *current* pattern,
not the *last* successfully matched one. This conflict currently
causes use of the empty pattern in a code block to point at the
pattern that contains the code, and trigger infinite recursion. So to
fix this I think we would have to split PL_curpm in two, and have one
var for $1 and friends representing "the last successful matched
pattern or when in a regex code construct the currently executing
pattern", and another for "last successful matched pattern" which
would be use by the empty pattern magic.

On the other hand we can simply forbid the use of the empty pattern
inside of a regex code block and have a much simpler patch. :-)

Yves

@p5pRT
Copy link
Author

p5pRT commented Oct 18, 2016

From @cpansprout

On Tue Oct 18 01​:01​:21 2016, demerphq wrote​:

On 18 October 2016 at 03​:17, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Mon Oct 17 10​:26​:19 2016, brian.carpenter@​gmail.com wrote​:

I'm not convinced that this is an actual bug,

I think it is.

but #p5p was silent when I
asked about it. Affects Perl back to 5.20.2 including v5.25.6
(v5.25.5-114-g87af8d5). Valgrind fails pretty quickly but gdb just
goes on
forever.

perl -e '/0'

That is nonsensical code.

Do you mean just the '/0' or the full pattern?

I do not know what happened, but the line that I quoted was​:

perl -e '/(?{m}(0)},s\/\/\/})//0'

not​:

perl -e '/0'

With some editing I read the original code as being the same as

(m/(?{
m!(0)!,
s!!!
})/) / 0

I see now that I misread the code. the } for the m}} delimiter was what threw me off.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 18, 2016

From @cpansprout

On Tue Oct 18 04​:08​:30 2016, demerphq wrote​:

On 18 October 2016 at 12​:03, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

On Tue Oct 18 02​:30​:01 2016, demerphq wrote​:

I think we can fix it by checking PL_curpm against PL_reg_curpm when
we do the empty pattern check; running tests now.

And do what? Even if they're the same, it still might also
legitimately be the last successful match, no?

No I don't think so, and all the tests pass.

As far as I can tell PL_reg_curpm is a special opcode which we use
when we are setting up PL_curpm before executing a (?{...}) or (??{
... }) code block so that things like $1 and stuff like that point at
the current pattern. Once the code is executed PL_curpm is unwound to
no longer point at PL_reg_curpm. Essentially the problem comes down to
PL_curpm being the place where things like $1 and friends are derived,
as well as how we find the pattern for the last successful match, but
inside of a (?{ ... }) we want $1 to point at the *current* pattern,
not the *last* successfully matched one. This conflict currently
causes use of the empty pattern in a code block to point at the
pattern that contains the code, and trigger infinite recursion. So to
fix this I think we would have to split PL_curpm in two, and have one
var for $1 and friends representing "the last successful matched
pattern or when in a regex code construct the currently executing
pattern", and another for "last successful matched pattern" which
would be use by the empty pattern magic.

On the other hand we can simply forbid the use of the empty pattern
inside of a regex code block and have a much simpler patch. :-)

PL_curpm does far too many things. This is not the only bug that results.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 19, 2016

From @demerphq

On 18 Oct 2016 23​:46, "Father Chrysostomos via RT" <
perlbug-followup@​perl.org> wrote​:

On Tue Oct 18 04​:08​:30 2016, demerphq wrote​:

On 18 October 2016 at 12​:03, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

On Tue Oct 18 02​:30​:01 2016, demerphq wrote​:

I think we can fix it by checking PL_curpm against PL_reg_curpm when
we do the empty pattern check; running tests now.

And do what? Even if they're the same, it still might also
legitimately be the last successful match, no?

No I don't think so, and all the tests pass.

As far as I can tell PL_reg_curpm is a special opcode which we use
when we are setting up PL_curpm before executing a (?{...}) or (??{
... }) code block so that things like $1 and stuff like that point at
the current pattern. Once the code is executed PL_curpm is unwound to
no longer point at PL_reg_curpm. Essentially the problem comes down to
PL_curpm being the place where things like $1 and friends are derived,
as well as how we find the pattern for the last successful match, but
inside of a (?{ ... }) we want $1 to point at the *current* pattern,
not the *last* successfully matched one. This conflict currently
causes use of the empty pattern in a code block to point at the
pattern that contains the code, and trigger infinite recursion. So to
fix this I think we would have to split PL_curpm in two, and have one
var for $1 and friends representing "the last successful matched
pattern or when in a regex code construct the currently executing
pattern", and another for "last successful matched pattern" which
would be use by the empty pattern magic.

On the other hand we can simply forbid the use of the empty pattern
inside of a regex code block and have a much simpler patch. :-)

PL_curpm does far too many things. This is not the only bug that results.

Can you expand on that thought a bit please?

Yves

@p5pRT
Copy link
Author

p5pRT commented Oct 19, 2016

From @demerphq

On 19 October 2016 at 09​:22, demerphq <demerphq@​gmail.com> wrote​:

On 18 Oct 2016 23​:46, "Father Chrysostomos via RT"
<perlbug-followup@​perl.org> wrote​:

On Tue Oct 18 04​:08​:30 2016, demerphq wrote​:

On 18 October 2016 at 12​:03, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

On Tue Oct 18 02​:30​:01 2016, demerphq wrote​:

I think we can fix it by checking PL_curpm against PL_reg_curpm when
we do the empty pattern check; running tests now.

And do what? Even if they're the same, it still might also
legitimately be the last successful match, no?

No I don't think so, and all the tests pass.

As far as I can tell PL_reg_curpm is a special opcode which we use
when we are setting up PL_curpm before executing a (?{...}) or (??{
... }) code block so that things like $1 and stuff like that point at
the current pattern. Once the code is executed PL_curpm is unwound to
no longer point at PL_reg_curpm. Essentially the problem comes down to
PL_curpm being the place where things like $1 and friends are derived,
as well as how we find the pattern for the last successful match, but
inside of a (?{ ... }) we want $1 to point at the *current* pattern,
not the *last* successfully matched one. This conflict currently
causes use of the empty pattern in a code block to point at the
pattern that contains the code, and trigger infinite recursion. So to
fix this I think we would have to split PL_curpm in two, and have one
var for $1 and friends representing "the last successful matched
pattern or when in a regex code construct the currently executing
pattern", and another for "last successful matched pattern" which
would be use by the empty pattern magic.

On the other hand we can simply forbid the use of the empty pattern
inside of a regex code block and have a much simpler patch. :-)

PL_curpm does far too many things. This is not the only bug that results.

Can you expand on that thought a bit please?

BTW, I pushed a patch to forbid use of the empty pattern inside of
regex code blocks.

Once I understand what else you worry about with PL_curpm I will do a
patch to make it work properly.

But I figure a run time catchable fatal exception is better than a C
stack overflow until we can make it work properly.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Oct 19, 2016

From @cpansprout

On Wed Oct 19 00​:23​:19 2016, demerphq wrote​:

On 18 Oct 2016 23​:46, "Father Chrysostomos via RT" <
perlbug-followup@​perl.org> wrote​:

PL_curpm does far too many things. This is not the only bug that results.

Can you expand on that thought a bit please?

Having PL_curpm point to the actual pattern in the op tree results in different recursion levels sharing the same set of match variables​:

$u = ",echle etn sJ";
$t = "\nrka rPrhoatu";
$_ = $u.$t;
sub foo { s/(.)//s or return; bar(); print chop $$1 }
sub bar { s/(.)//s or return; foo(); print chop $$1 }
foo

(I thought I had reported this before, but I cannot find the ticket now.)

Using a qr// works.

Similarly, having m|(?{ // })| try to use the *current* pattern instead of the last pattern to match successfully—solely due to an implementation detail—seems wrong.

PL_curpm is serving three purposes​: the last match op, the last set of matched buffers ($1 etc.), and the innermost pattern enclosing the currently executing code.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 19, 2016

From @demerphq

On 19 October 2016 at 22​:51, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Wed Oct 19 00​:23​:19 2016, demerphq wrote​:

On 18 Oct 2016 23​:46, "Father Chrysostomos via RT" <
perlbug-followup@​perl.org> wrote​:

PL_curpm does far too many things. This is not the only bug that results.

Can you expand on that thought a bit please?

Having PL_curpm point to the actual pattern in the op tree results in different recursion levels sharing the same set of match variables​:

$u = ",echle etn sJ";
$t = "\nrka rPrhoatu";
$_ = $u.$t;
sub foo { s/(.)//s or return; bar(); print chop $$1 }
sub bar { s/(.)//s or return; foo(); print chop $$1 }
foo

(I thought I had reported this before, but I cannot find the ticket now.)

Wow. It took me a while to work that through. :-)

I see what you mean. That is going to be a lot harder to fix than the
other issues with PL_curpm IMO.

Using a qr// works.

Similarly, having m|(?{ // })| try to use the *current* pattern instead of the last pattern to match successfully—solely due to an implementation detail—seems wrong.

Yes I agree.

PL_curpm is serving three purposes​: the last match op, the last set of matched buffers ($1 etc.), and the innermost pattern enclosing the currently executing code.

Yes, I see what you mean. I think I might have put it differently, but
i get your point now for sure.

Yves

@p5pRT
Copy link
Author

p5pRT commented Oct 20, 2016

From @demerphq

On 19 October 2016 at 23​:29, demerphq <demerphq@​gmail.com> wrote​:

On 19 October 2016 at 22​:51, Father Chrysostomos via RT

Similarly, having m|(?{ // })| try to use the *current* pattern instead of the last pattern to match successfully—solely due to an implementation detail—seems wrong.

Yes I agree.

So following up for this I put together a quick hack to make it
possible to use the empty pattern in regexen, but honestly I dont see
how it is very useful, it is all too easy to create an infinite loop
(which luckily seems to be easy to detect).

./perl -le'$_="aa"; /a/; /(?{ s!!x! })/; print; //; print'
xa
Infinite recursion via empty pattern at -e line 1.

I personally think this is no less ugly than the previous patch, but
if you prefer it I can push it.

FWIW, this seems to be yet another good reason to get rid of the empty pattern.

Yves

@p5pRT
Copy link
Author

p5pRT commented Oct 20, 2016

From @demerphq

On 20 October 2016 at 09​:37, demerphq <demerphq@​gmail.com> wrote​:

On 19 October 2016 at 23​:29, demerphq <demerphq@​gmail.com> wrote​:

On 19 October 2016 at 22​:51, Father Chrysostomos via RT

Similarly, having m|(?{ // })| try to use the *current* pattern instead of the last pattern to match successfully—solely due to an implementation detail—seems wrong.

Yes I agree.

So following up for this I put together a quick hack to make it
possible to use the empty pattern in regexen, but honestly I dont see
how it is very useful, it is all too easy to create an infinite loop
(which luckily seems to be easy to detect).

./perl -le'$_="aa"; /a/; /(?{ s!!x! })/; print; //; print'
xa
Infinite recursion via empty pattern at -e line 1.

I personally think this is no less ugly than the previous patch, but
if you prefer it I can push it.

FWIW, this seems to be yet another good reason to get rid of the empty pattern.

Oh, here is another reason​:

./perl -le'$_="ab"; my $p=qr/(?{ s!!x! })/; /$p/; print; /a/; /$p/;
print; /b/; /$p/; print; //;'
xab
xxb
xxx
Infinite recursion via empty pattern at -e line 1.

Crazy.

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Oct 20, 2016

From @cpansprout

On Thu Oct 20 00​:37​:45 2016, demerphq wrote​:

On 19 October 2016 at 23​:29, demerphq <demerphq@​gmail.com> wrote​:
So following up for this I put together a quick hack to make it
possible to use the empty pattern in regexen, but honestly I dont see
how it is very useful, it is all too easy to create an infinite loop
(which luckily seems to be easy to detect).

./perl -le'$_="aa"; /a/; /(?{ s!!x! })/; print; //; print'
xa
Infinite recursion via empty pattern at -e line 1.

I personally think this is no less ugly than the previous patch, but
if you prefer it I can push it.

I would prefer it, but I do not feel strongly about it.

FWIW, this seems to be yet another good reason to get rid of the empty
pattern.

I am still opposed to that, unfortunately. :-)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 20, 2016

From @cpansprout

On Thu Oct 20 00​:42​:10 2016, demerphq wrote​:

On 20 October 2016 at 09​:37, demerphq <demerphq@​gmail.com> wrote​:

On 19 October 2016 at 23​:29, demerphq <demerphq@​gmail.com> wrote​:

On 19 October 2016 at 22​:51, Father Chrysostomos via RT

Similarly, having m|(?{ // })| try to use the *current* pattern
instead of the last pattern to match successfully—solely due to an
implementation detail—seems wrong.

Yes I agree.

So following up for this I put together a quick hack to make it
possible to use the empty pattern in regexen, but honestly I dont see
how it is very useful, it is all too easy to create an infinite loop
(which luckily seems to be easy to detect).

./perl -le'$_="aa"; /a/; /(?{ s!!x! })/; print; //; print'
xa
Infinite recursion via empty pattern at -e line 1.

I personally think this is no less ugly than the previous patch, but
if you prefer it I can push it.

FWIW, this seems to be yet another good reason to get rid of the
empty pattern.

Oh, here is another reason​:

./perl -le'$_="ab"; my $p=qr/(?{ s!!x! })/; /$p/; print; /a/; /$p/;
print; /b/; /$p/; print; //;'
xab
xxb
xxx
Infinite recursion via empty pattern at -e line 1.

Crazy.

I would agree that the *Perl* code there is crazy and the author of it gets what he deserves.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Nov 1, 2016

From @demerphq

On 20 October 2016 at 22​:12, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Thu Oct 20 00​:37​:45 2016, demerphq wrote​:

On 19 October 2016 at 23​:29, demerphq <demerphq@​gmail.com> wrote​:
So following up for this I put together a quick hack to make it
possible to use the empty pattern in regexen, but honestly I dont see
how it is very useful, it is all too easy to create an infinite loop
(which luckily seems to be easy to detect).

./perl -le'$_="aa"; /a/; /(?{ s!!x! })/; print; //; print'
xa
Infinite recursion via empty pattern at -e line 1.

I personally think this is no less ugly than the previous patch, but
if you prefer it I can push it.

I would prefer it, but I do not feel strongly about it.

Well I just push a patch to meet your preference.

FWIW, this seems to be yet another good reason to get rid of the empty
pattern.

I am still opposed to that, unfortunately. :-)

We can debate that one (again) in another thread. :-)

Yves

@hvds
Copy link
Contributor

hvds commented Jan 23, 2020

Looks like the final fix for this was 5585e75 (20 Oct 2016), closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants