Report information
Id: 130910
Status: open
Priority: 0/
Queue: perl6

Owner: Nobody
Requestors: alex.jakimenko [at] gmail.com
Cc:
AdminCc:

Severity: (no value)
Tag: (no value)
Platform: (no value)
Patch Status: (no value)
VM: (no value)



Subject: [WEIRD] Sometimes parameterized regexes print nonsense about the number of arguments (/<smth(42)>/)
Here are 3 examples that work as expected:

Code:
my regex meh($t) { xy }; say 'xy' ~~ /^ <meh(42)> $/

Result:
「xy」
 meh => 「xy」


Code:
my regex meh($t) { xy }; say 'ab' ~~ /^ <meh(42)> $/

Result:
Nil


Code:
my regex meh($t) { .. }; say 'xy' ~~ /^ <meh(42)> $/

Result:
「xy」
 meh => 「xy」


And here is one that doesn't:

Code:
my regex meh($t) { .. }; say 'xyz' ~~ /^ <meh(42)> $/

Result:
Too few positionals passed; expected 2 arguments but got 1
  in regex meh at <tmp> line 1
  in block <unit> at <tmp> line 1


Why? What second argument is it expecting?
Nil is the right answer here.
On Fri, 03 Mar 2017 20:25:27 -0800, alex.jakimenko@gmail.com wrote:
> [...]
Some more examples:

[23:10] <MasterDuke> m: my regex meh($t, $s) { .. }; say 'xyz' ~~ /^ <meh(1)> $/
[23:10] <evalable6> MasterDuke, rakudo-moar 11ee2fe17: OUTPUT: «(exit code 1) Too few positionals passed; expected 3 arguments but got 2␤ in regex meh at /tmp/cCSpiqVwyt line 1␤ in block <unit> at /tmp/cCSpiqVwyt line 1␤»
[23:11] <MasterDuke> m: my regex meh($t, $s) { .. }; say 'xyz' ~~ /^ <meh(1, 2)> $/
[23:11] <evalable6> MasterDuke, rakudo-moar 11ee2fe17: OUTPUT: «(exit code 1) Too few positionals passed; expected 3 arguments but got 1␤ in regex meh at /tmp/FVoRYxVyfG line 1␤ in block <unit> at /tmp/FVoRYxVyfG line 1␤»
[23:11] <MasterDuke> m: my regex meh($t, $s) { .. }; say 'xyz' ~~ /^ <meh(1, 2, 3)> $/
[23:11] <evalable6> MasterDuke, rakudo-moar 11ee2fe17: OUTPUT: «(exit code 1) Too many positionals passed; expected 3 arguments but got 4␤ in regex meh at /tmp/RumqR3rpLh line 1␤ in block <unit> at /tmp/RumqR3rpLh line 1␤»
[23:11] <MasterDuke> very weird
[23:11] <MasterDuke> m: my regex meh($t, $s) { .. }; say 'xyz' ~~ /^ <meh(1, 2, 3, 4)> $/
[23:11] <evalable6> MasterDuke, rakudo-moar 11ee2fe17: OUTPUT: «(exit code 1) Too many positionals passed; expected 3 arguments but got 5␤ in regex meh at /tmp/Mtn6BJJxmP line 1␤ in block <unit> at /tmp/Mtn6BJJxmP line 1␤»
It has to do with backtracking, because:

1) The problem disappears when `:ratchet` mode is enabled in the top-level regex:

    ➜ my regex meh ($t) { . };
    ➜ say "ab" ~~ /^ :ratchet <meh(1)> $ /;
    Nil

2) The problem disappears when the named regex is made a `token`:

    ➜ my token meh ($t) { . };
    ➜ say "ab" ~~ /^ <meh(1)> $ /;
    Nil

Of course, the regex engine could avoid backtracking entirely in that example, but maybe it's just not optimized enough to know that.

Here's a different example in which backtracking is actually necessary:

    my regex meh ($t) { { say "meh start"} .+? { say "meh end"} }
    say "abcde" ~~ / ^ <meh(42)> { say '$<meh> = ', $<meh> } $ /;

It outputs:

    meh start
    meh end
    $<meh> = 「abcde」
    Too few positionals passed; expected 2 arguments but got 1
      in regex meh at [...]

Note how the error message appears after having reached the end of the regex for the first time, just before it would have backtracked into `meh` for the first time.

In comparison, when removing the parameterization of `meh`, the example prints the following (note how it backtracked into `meh` four times, like it should):

    meh start
    meh end
    $<meh> = 「a」
    meh end
    $<meh> = 「ab」
    meh end
    $<meh> = 「abc」
    meh end
    $<meh> = 「abcd」
    meh end
    $<meh> = 「abcde」

In summary, what *appears* to be happening is this:

- If a named subrule is called with parameters...
- And it matched...
- But then the regex engine wants to backtrack into it...
- Then it "calls" the subrule again, but fails to pass the parameters again.
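The summary above can be illustrated with a toy model. This is a minimal Python sketch, not Rakudo or NQP code: `Cursor`, `meh`, `backtrack_into` and `demo` are hypothetical names, and the model only assumes that the engine remembers the subrule itself as the place to resume, without the arguments it was first called with.

```python
# Toy model only (NOT Rakudo/NQP code); every name here is hypothetical.

class Cursor:
    """Minimal stand-in for a match cursor that remembers how to resume."""
    def __init__(self, target):
        self.target = target
        self.restart = None  # callable to invoke when backtracking into the subrule


def meh(cursor, t):
    """A 'parameterized subrule': needs the cursor plus one argument."""
    # The rule itself is saved as the restart point; its argument is not.
    cursor.restart = meh
    return cursor


def backtrack_into(cursor):
    """Resume a matched subrule by invoking the saved code with the cursor only."""
    return cursor.restart(cursor)


def demo():
    cur = meh(Cursor("xyz"), 42)  # first call: the argument is passed
    try:
        backtrack_into(cur)       # second call: the argument is lost
    except TypeError as err:
        return str(err)
    return "no error"
```

In this model the first call succeeds and the failure only shows up on re-entry, with the arity check counting the cursor as one of the arguments. That mirrors the "expected 2 arguments but got 1" message, under the assumption stated above.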
Sorry, copy-pasto in the second-to-last output listing. It is:

    meh start
    meh end
    $<meh> = 「a」
    Too few positionals passed; expected 2 arguments but got 1
      in regex meh at [...]
This is also an issue in nqp.

    $ nqp -e 'grammar f { regex TOP { ^ <foo(42)> $ }; regex foo($i) { .. } }; nqp::say(f.parse("aaa"));'
    Too few positionals passed; expected 2 arguments but got 1

Fixing it in nqp first is probably the best first step. To that end I investigated some, and it looks like this will require some fairly tricky modifications.

Currently, a Cursor fills in its $!regexsub attribute by getting the callercode of the rule that called a .cursor_start_* method. This code has the param checking instructions at the top. Then, when the cursor is matched, .cursor_pass copies this code reference into $!restart. The regex node code (made by .regex_mast, which is called by .as_mast, which simply inserts the .regex_mast instructions inline with the rest of the code .as_mast generates) will call cursor_next when backtracking. If it finds code in $!restart, .cursor_next invokes it with no arguments. The .as_mast code skips calling the .regex_mast code when invoked with a function pointer in $!restart, so it only unwinds the cursor stack (based on the backtrack stack). However, the code that checks the parameter count comes before the as_mast code in the frame and gets hit before execution gets there.

You can observe this behavior by making the positional optional:

    $ nqp -e 'grammar f { regex TOP { ^ <foo(42)> $ }; regex foo($i?) { {nqp::say($i)} .. } }; f.parse("aaa");'
    42

...noting that the 42 is only said once, on the first call where the match occurs, not on the second call during the backtrack.

There is also a cursor_more in NQP, which seems to be unused in NQP and which calls $!regexsub with nothing but a new cursor as a parameter. In rakudo, cursor_next and cursor_more are replicated under different names, along with an additional one used for exhaustive/overlapping, and then renamed pointers to those functions are thrown into a grist mill of code where it is hard to enumerate the places in which they are called.

Long story short, it does not look like passing args along down the call chain is practical. What would be required is either some way to move the param checks for everything but the invocant down into the regex_mast instructions, or taking a curry closure around the params and putting that in regexsub instead.

As a side note, it has been expressed before that having a way to fire a phaser (or code otherwise attached somehow) when a block in a regex is backtracked over would be useful in building some interesting constructs. It is speculated in S04/"Definition of Success" that a block that gets backtracked over should fire UNDO (which implies that KEEP would not be fired until the whole match succeeds). I would guess we would only want to keep half-finished frames around to do that when there actually were user-defined phasers to fire, for performance reasons. Also, any block where the return value is used for interpolation or assertion would obviously not be compatible with this premise.
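The "curry closure" option mentioned in this comment can be sketched in a toy model. This is a hedged Python illustration, not NQP code: `Cursor`, `meh`, `call_subrule` and `backtrack_into` are hypothetical names, and the only assumption is that the call site wraps the rule and its arguments in a closure once, and that this closure is what gets stored for later restarts.

```python
# Toy model only (NOT NQP code); every name here is hypothetical.

class Cursor:
    """Minimal stand-in for a match cursor that remembers how to resume."""
    def __init__(self, target):
        self.target = target
        self.restart = None  # callable to invoke when backtracking into the subrule


def meh(cursor, t):
    """A 'parameterized subrule': needs the cursor plus one argument."""
    return (cursor.target, t)


def call_subrule(cursor, rule, *args):
    """Call a subrule, storing a closure over its arguments as the restart point."""
    def curried(cur):
        # The closure re-supplies the original arguments on every invocation.
        return rule(cur, *args)
    cursor.restart = curried
    return curried(cursor)


def backtrack_into(cursor):
    """Resume by invoking the saved code with the cursor only, as before."""
    return cursor.restart(cursor)


cur = Cursor("xyz")
first = call_subrule(cur, meh, 42)  # initial call with the argument
again = backtrack_into(cur)         # re-entry still sees the argument
```

The restart path still passes nothing but the cursor, so code along the call chain would not need to know about the arguments. Whether this maps cleanly onto how $!regexsub and $!restart are actually used is an open question that this sketch does not answer.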

