Literal array interpolation in regex doesn't match as expected. #4779

Open
p6rt opened this issue Nov 23, 2015 · 4 comments
@p6rt

p6rt commented Nov 23, 2015

Migrated from rt.perl.org#126713 (status was 'new')

Searchable as RT126713$

@p6rt

p6rt commented Nov 23, 2015

From @peschwa

Consider the following two regexen and their matching output, or lack thereof:

11:39 < psch> m: say "abcd" ~~ /^(a | b | bc | cd)*?$/; my @a = < a b bc cd >; say "abcd" ~~ /^(@a)*?$/
11:39 <+camelia> rakudo-moar : OUTPUT«「abcd」␤ 0 => 「a」␤ 0 => 「b」␤ 0 => 「cd」␤Nil␤»

S05 says:

An interpolated array:

  / @cmds /

is matched as if it were an alternation of its literal elements. Ordinarily it matches using junctive semantics:

  / [ $(@cmds[0]) | $(@cmds[1]) | $(@cmds[2]) | ... ] /

Taking this literally, the long form still matches:

11:51 <psch> m: my @a = < a b bc cd >; say "abcd" ~~ /^( $(@a[0]) | $(@a[1]) | $(@a[2]) | $(@a[3]) )*?$/
11:51 <camelia> rakudo-moar : OUTPUT«「abcd」␤ 0 => 「a」␤ 0 => 「b」␤ 0 => 「cd」␤»

Given the S05 quote, I'd expect the interpolated array to behave like either of the other two regexen and produce the same match.

@p6rt

p6rt commented Dec 20, 2015

From @peschwa

03:39 < Juerd> psch: In bug reports I try to read significance in every character. So given ^(@a)*?$ I wonder why it's anchored, and why there's a ? after the *
03:39 < psch> the ? after * allows backtracking
03:39 < Juerd> Yes, but I'm trying to figure out why that was needed for the bug to trigger
03:40 < Juerd> And if it wasn't, then it's not golfed enough yet :)
03:40 < psch> m: my @a = < a b bc cd >; say "abcd" ~~ /(@a)*/
03:40 <+camelia> rakudo-moar 091ee7: OUTPUT«「abc」␤ 0 => 「a」␤ 0 => 「bc」␤»
03:40 < psch> right, the report could've been more golfed i guess
03:41 < Juerd> And it might have been clearer if the two attempts were not in a single m:
03:41 < psch> i'm not confident enough to predict the right behavior i'd say. quoting the synopsis alludes to that
03:41 < psch> yes, that's definitely true
03:41 < Juerd> 'cause it took me, well, a few minutes, to realise that I should have been looking at that Nil in the output, not the rest.

@p6rt

p6rt commented Jul 11, 2016

From @peschwa

19:24 < psch> m: my @a = < a b bc cd >; say "abcd" ~~ /^([||@a])*?$/
19:24 <+camelia> rakudo-moar d075c8: OUTPUT«「abcd」␤ 0 => 「a」␤ 0 => 「b」␤ 0 => 「cd」␤»
19:25 < psch> m: my @a = < a b bc cd >; say "abcd" ~~ /^([|@a])*?$/
19:25 <+camelia> rakudo-moar d075c8: OUTPUT«Nil␤»
19:25 < psch> so, yeah, i suppose it's "explicit or implicit LTM alternating interpolation of arrays matches differently than bareword LTM alternations"
19:26 < psch> but well, it's still about what S05 actually means, in a way
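
For anyone trying to reproduce this, here is a minimal sketch that puts each variant from this thread in its own statement, so the interesting result is not buried in a combined output line. The variable name `$s` is only for illustration. The comments record what camelia printed in the logs above (rakudo-moar builds from 2015 and 2016); other builds may behave differently, so treat them as reported results rather than expected ones.

```raku
my @a = < a b bc cd >;
my $s = "abcd";

# Literal LTM alternation: reported above as matching 「abcd」 (a, b, cd).
say $s ~~ /^ (a | b | bc | cd)*? $/;

# Bare array interpolation: reported above as Nil.
say $s ~~ /^ (@a)*? $/;

# Explicit LTM alternation over the array: reported above as Nil.
say $s ~~ /^ ([|@a])*? $/;

# Explicit sequential alternation over the array: reported above as matching 「abcd」.
say $s ~~ /^ ([||@a])*? $/;
```

The unanchored, greedy form `/(@a)*/` from the earlier log can be added the same way if a smaller golf is wanted.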

@p6rt p6rt added the Bug label Jan 5, 2020