Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capture lookup inside regex is not threadsafe #6436

Open
p6rt opened this issue Aug 9, 2017 · 5 comments
Open

Capture lookup inside regex is not threadsafe #6436

p6rt opened this issue Aug 9, 2017 · 5 comments
Labels
conc regex Regular expressions, pattern matching, user-defined grammars, tokens and rules testneeded

Comments

@p6rt
Copy link

p6rt commented Aug 9, 2017

Migrated from rt.perl.org#131870 (status was 'open')

Searchable as RT131870$

@p6rt
Copy link
Author

p6rt commented Aug 9, 2017

From @smls

When the same regex is used concurrently from multiple `start` threads,
looking up already matched captures like $0 or $<foo> from an embedded
code block *inside* that regex is unreliable​:

  ➜ await do for ^10 -> $i { start { "A".match​: /(“A”) { say "BOOM​:
$i) ", $0 if not $0 }/ } }
  BOOM​: 2) Nil
  BOOM​: 4) Nil
  BOOM​: 2) 「A」
  BOOM​: 3) 「A」
  BOOM​: 6) Nil

That example shouldn't print anything, because `$0` should always be a
truthy `Match` object after the `(“A”)`.

But in practice, it demonstrates the incorrect behavior shown above on
most runs​: *Some* iterations (different ones each time) seem to see `$0`
as `Nil` or as a falsy `Match`.

Additional failure modes that occur more rarely​:
- "Type check failed in binding to parameter '<anon>';" error
- hanging indefinitely

Increase the `^10` to `^1000` or so to guarantee failure.


Bisect finds​:

  https://gist.github.com/c04a40f14712ce5a67dc171f87d3bcdb
  rakudo/rakudo@08e39ee

However​:

‎ <‎Zoffix‎>‎ It's not a regression tho
‎ ‎<‎Zoffix‎>‎ Or maybe it is.
‎ ‎<‎Zoffix‎>‎ It's just before that commit, it looks like the $/ was
shared around, so it wasn't necesarily "not $0"
‎ ‎<‎Zoffix‎>‎ c​: 08e39ee2653b1ae^ await ^100 .map​: -> $i { $/ = 42;
start { "$i".match​: /(\d+)/; if $i != +$0 { say "Boom​: $i $0" } } }
‎ ‎<‎committable6‎>‎ Zoffix,
https://gist.github.com/557c6716c35afeee06cd90a3daf22c47
‎ ‎<‎Zoffix‎>‎ Yup.


This is Rakudo version 2017.07-136-gda4a0f50a built on MoarVM version
2017.07-318-g604da4d0
implementing Perl 6.c.

@p6rt
Copy link
Author

p6rt commented Aug 9, 2017

From @smls

Also notable is that multiple iterations somehow see the same value for `$i` (as observed in the output listing above).

I've sumbitted a separate issue for that (RT #​131871), because the Capture-lookup bug of the current issue occurs even when removing the `$i`.

@p6rt
Copy link
Author

p6rt commented Aug 9, 2017

From @smls

Heh, on further thought, they may be the same issue after all.

In `/(...) { ... $i ... $0 ... }`, both the $i and the $0 (or $/ rather) are outer lexicals from the point of view of the curly block, right?

So it might be a general problem with the way that *code blocks inside regexes* access outer lexicals.

(It doesn't happen with code blocks *outside* of regexes, and as the other ticket demonstrates, also doesn't happen inside the regex-portion of regexes.)

@p6rt
Copy link
Author

p6rt commented Mar 11, 2018

From @dogbert17

On Wed, 09 Aug 2017 13​:32​:58 -0700, smls75@​gmail.com wrote​:

Heh, on further thought, they may be the same issue after all.

In `/(...) { ... $i ... $0 ... }`, both the $i and the $0 (or $/
rather) are outer lexicals from the point of view of the curly block,
right?

So it might be a general problem with the way that *code blocks inside
regexes* access outer lexicals.

(It doesn't happen with code blocks *outside* of regexes, and as the
other ticket demonstrates, also doesn't happen inside the regex-
portion of regexes.)

It was initially broken with commit (2016-08-03) rakudo/rakudo@08e39ee
and fixed with the better-sched merge, i.e.
7c18112c59d20413b82356e5c48b38d8a66fc7ea
c285b489c6629ccdf0c4cb11d2d695b9ef1f890c
7fcab1067de4757bfdf2fdd1c66893ce4ab06e1b
89b9ac7830bdc195cb303f5241641e0dbe0ebbde
683037be698d0bdc21b3c23588085b2d076d7a0a
b5605c2dd6d361b705a59136c8ad641f245a5da5
c50d35a90e66346157b31cd92643c2a64e801c24
de311f46a98f13a5b0211d2585fbd9b17ce1bf2c
340d8ed3bb4b45af85708771bea396cf862a7330
3b98fb9e396d040a8cb2c32d23cee54a5e88f878
596611c8fdc3baf119bc94a8ea30efc0a12cf673
80b49320cf854ac68a17cdd216575ee26e380325
61a77e60a7d936415503d8916fcc7546569e9135

@p6rt
Copy link
Author

p6rt commented Mar 11, 2018

The RT System itself - Status changed from 'new' to 'open'

@p6rt p6rt added conc regex Regular expressions, pattern matching, user-defined grammars, tokens and rules testneeded labels Jan 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conc regex Regular expressions, pattern matching, user-defined grammars, tokens and rules testneeded
Projects
None yet
Development

No branches or pull requests

1 participant