regcomp.c:6195: void S_pat_upgrade_to_utf8(RExC_state_t *const, char **, STRLEN *, int): Assertion `*(d - 1) == ')'' failed #15838
From @dur-randir

Created by @dur-randir

While fuzzing perl v5.25.9-35-g32207c637b built with afl and run hexdump -C 0042 to cause an assertion failure. It crashes on perls dating back to at (gdb) bt Perl Info
|
From @dur-randir |
From @hvds

On Thu, 26 Jan 2017 02:19:19 -0800, randir wrote:

We're hitting S_pat_upgrade_to_utf8() with a code block of "(?{<<})\n\n". My initial suspicion is that that's fine, and the assumption that the last char of such a code block must be ')' is wrong, but I don't know.

There's also a similar assertion in S_compile_runtime_code() line 6670:

Hugo |
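
To make the failing condition concrete, here is a minimal sketch of the check implied by the assertion in the issue title, `*(d - 1) == ')'`. The function name `block_text_ends_in_paren` is hypothetical and this is not the code in regcomp.c; it only tests whether the last byte of a code-block string is ')'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the condition behind the assertion;
 * this is NOT the code in regcomp.c. Returns 1 if the last byte
 * of the code-block text is ')', otherwise 0. */
int block_text_ends_in_paren(const char *text, size_t len)
{
    assert(text != NULL);
    return len > 0 && text[len - 1] == ')';
}
```

Called with the 9-byte string "(?{<<})\n\n" quoted above, the check fails because the last byte is a newline; called with the 7-byte "(?{<<})" alone, it passes.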
The RT System itself - Status changed from 'new' to 'open' |
From @iabyn

On Sun, Jan 29, 2017 at 08:17:33AM -0800, Hugo van der Sanden via RT wrote:

Hmmm... the assertion is correct, the toker is very wrong.

When compile-time code is seen in a pattern, the code is parsed, so that for /abc(?{...})def/ the toker returns this sequence of tokens:

FUNC, '(', const("abc"), 'DO', '{', ...., '}', '(?{...})', 'def', ')'

As well as the individual parsed tokens for the code block, the text of

The problem with m{\x{100}(?{<<EOF}) is that the stringification of the code block is being returned by yylex() as "(?{<<EOF})\nx\nEOF" rather than what I'd expect: "(?{\"x\n\"})" (or similar). But to a certain extent it depends on how heredocs are supposed to operate

This is all too horrible to contemplate at the moment.

-- |
From @cpansprout

On Mon, 30 Jan 2017 08:42:51 -0800, davem wrote:
I fixed up the deparsing of code blocks, by actually deparsing the code inside the regexp, instead of just stringifying it. Prior to that, I did many fix-ups in the parsing of here-docs, but I don’t recall doing anything specific to (?{...}) blocks; in fact, I think it predated your rewrite of those blocks.
What’s funny is that the length of the string that is supposed to represent the stringification of the code block amounts to the length of the code block plus the length of the trailing here-doc. But the code that gets used is a string of that length taken indiscriminately from the source code, beginning at the start of the code block.

In other words, m{\x{100}(?{<<EOF})123456789 produces the token PV("(?{<<EOF})123456") because the here-doc is 6 characters long ("x\nEOF\n" or maybe "\nx\nEOF"--I don’t know which).

So I can get past the assertion by putting a parenthesis at the right spot:

print qr{\x{100}((?{<<EOF})12345)

This gives me (?^u:\x{100}((?{<<EOF})12345)12345) which is completely wrong.

Traditionally the stringification of a regular expression with a here-doc body outside the pattern has not included the here-doc body. It still behaves that way:

$ ./perl -lIlib -e 'print qr/(?{<<EOF})/' -eEOF

I think that is acceptable. There is really no way to make it behave correctly when stringified and then recompiled as a regexp (which is generally true of code blocks, which may or may not work).

So can we do something similar with here-doc bodies inside the pattern? (Actually, I thought we were already doing that. Look for the ‘Paranoia’ comment in toke.c. Why is that not working?)

--
Father Chrysostomos |
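
The length arithmetic described above can be reproduced outside perl. The sketch below is an illustration only: `naive_block_text` is a hypothetical helper, not anything in toke.c, and it assumes the behaviour is exactly "copy code-block length plus here-doc length bytes, starting at the code block".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT perl source: copy the first
 * block_len + heredoc_len bytes of src into out and NUL-terminate.
 * Returns the number of bytes copied, or 0 if src is shorter than
 * that or out is too small to hold the result. */
size_t naive_block_text(const char *src, size_t block_len,
                        size_t heredoc_len, char *out, size_t out_size)
{
    assert(src != NULL && out != NULL);
    size_t want = block_len + heredoc_len;
    if (want >= out_size || strlen(src) < want)
        return 0;
    memcpy(out, src, want);
    out[want] = '\0';
    return want;
}
```

With a 10-byte code block "(?{<<EOF})" and a 6-byte here-doc, the source text "(?{<<EOF})123456789" yields "(?{<<EOF})123456", matching the token reported above. The source text "(?{<<EOF})12345)" yields a 16-byte string whose last byte is ')', which is why placing a parenthesis at that position gets past the assertion.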
Migrated from rt.perl.org#130648 (status was 'open')
Searchable as RT130648$