New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regcomp: heap-buffer-overflow read in S_grok_bslash_N (perl-blead) #16555
Comments
From @Etsukata# Summary - a crafted regular expression can cause heap-buffer-overflow read during ## Affected Versions - perl-blead(Commit: 77dddf9) # PoC - dump a variable to stderr # Note - related ticket: regcomp: heap-buffer-overflow read in S_grok_bslash_N |
From @khwilliamsonOn 05/11/2018 05:28 PM, Eiichi Tsukata (via RT) wrote:
Thank you for finding this. The full solution to this is somewhat complicated. I have three patches The patch changes the code not to fix the underlying problem, but to Running the PoC in older perls results in not panicking, but a warning The second attached patch fixes the underlying cause of this bug. You The third attached patch is simply white space changes for the second
|
From @khwilliamson0008-PATCH-perl-133194-Heap-buffer-overflow.patchFrom 9ac24c8b60dd40531da9bf9eac5339a0940b7132 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Sat, 12 May 2018 22:20:33 -0600
Subject: [PATCH 8/8] PATCH: [perl #133194] Heap buffer overflow
On debugging builds, the code in the ticket instead fails an assert. I
thought I had previous failures figured out, but it turns out I was
wrong. To prevent potential security failures by going ahead and
printing memory known to be outside what we are expecting on
non-debugging builds, instead panic.
This doesn't fix the underlying problem of how we came to be trying to
output bad stuff, but it keeps all such cases from being security
bugs.
---
regcomp.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/regcomp.c b/regcomp.c
index c3e5550d87..6bad27e9b9 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -657,7 +657,13 @@ static const scan_data_t zero_scan_data = {
UTF8fARG(UTF, \
(xI(xC) > eC) /* Don't run off end */ \
? eC - sC /* Length before the <--HERE */ \
- : ( __ASSERT_(xI_offset(xC) >= 0) xI_offset(xC) ), \
+ : ((xI_offset(xC) >= 0) \
+ ? xI_offset(xC) \
+ : (Perl_croak(aTHX_ "panic: %s: %d: negative offset: %" \
+ IVdf " trying to output message for " \
+ " pattern %.*s", \
+ __FILE__, __LINE__, xI_offset(xC), \
+ ((int) (eC - sC)), sC), 0)), \
sC), /* The input pattern printed up to the <--HERE */ \
UTF8fARG(UTF, \
(xI(xC) > eC) ? 0 : eC - xI(xC), /* Length after <--HERE */ \
--
2.11.0
|
From @khwilliamson0006-regcomp.c-Avoid-a-panic-on-malformed-qr-N-.-i.patchFrom b2f3cb2b8190162fc775d34d58a9bf142ed66bcb Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Sat, 12 May 2018 21:40:02 -0600
Subject: [PATCH 6/8] regcomp.c: Avoid a panic on malformed qr/\N{..}/i
Input constructs that expand to more than one character are handled very
very specially when they occur within a bracketed character class. What
happens effectively is that they are removed from the class and parsed
separately, using the regular code in regcomp.c to generate something
like a trie for them. The other characters within the class are handled
normally.
The specially handled stuff is parsed from a separate string. In the
case where that stuff is of the form \N{U+...}, I neglected to
adequately consider that the syntax could trigger an error. When such
an error is raised, it can violate our assumptions about the state of
things, and lead to a panic.
THe code actually parses the construct twice. The first time while
deciding if this expands to multiple characters (so that it can be
separated out), and the second time to actually figure out and return
the expansion. This commit fixes the bug by adding error checking
during the first pass. Previously, the minimal amount of work was done
to be able to find the count of characters in the expansion. Now, more
work is done to do the checking, as we go along with the counting. This
actually results in less special case code needing to be executed, so
there is a net code removal from this commit.
---
regcomp.c | 48 +++++++++++++++++++-----------------------------
1 file changed, 19 insertions(+), 29 deletions(-)
diff --git a/regcomp.c b/regcomp.c
index e1a048ff97..7fb5fbaf8e 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -12236,8 +12236,8 @@ S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state,
* *node_p, nor *code_point_p, nor *flagp.
*
* If <cp_count> is not NULL, the caller wants to know the length (in code
- * points) that this \N sequence matches. This is set even if the function
- * returns FALSE, as detailed below.
+ * points) that this \N sequence matches. This is set, and the input is
+ * parsed for errors, even if the function returns FALSE, as detailed below.
*
* There are 5 possibilities here, as detailed in the next 5 paragraphs.
*
@@ -12460,31 +12460,9 @@ S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state,
}
/* Here, looks like its really a multiple character sequence. Fail
- * if that's not what the caller wants. */
- if (! node_p) {
-
- /* But even if failing, we count the code points if requested, and
- * don't back up up the pointer as the caller is expected to
- * handle this situation */
- if (cp_count) {
- char * dot = RExC_parse + 1;
- do {
- dot = (char *) memchr(dot, '.', endbrace - dot);
- if (! dot) {
- break;
- }
- count++;
- dot++;
- } while (dot < endbrace);
- count++;
-
- *cp_count = count;
- RExC_parse = endbrace;
- nextchar(pRExC_state);
- }
- else { /* Back up the pointer. */
- RExC_parse = p;
- }
+ * if that's not what the caller wants. But continue with counting
+ * and error checking if they still want a count */
+ if (! node_p && ! cp_count) {
return FALSE;
}
@@ -12492,25 +12470,37 @@ S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state,
* form \x{char1}\x{char2}... and then call reg recursively to
* parse it (enclosing in "(?: ... )" ). That way, it retains its
* atomicness, while not having to worry about special handling
- * that some code points may have. */
+ * that some code points may have. We don't create a subpattern,
+ * but go through the motions of code point counting and error
+ * checking, if the caller doesn't want a node returned. */
- if (count == 1) {
+ if (node_p && count == 1) {
substitute_parse = newSVpvs("?:");
}
do_concat:
+ if (node_p) {
/* Convert to notation the rest of the code understands */
sv_catpv(substitute_parse, "\\x{");
sv_catpvn(substitute_parse, start_digit, RExC_parse - start_digit);
sv_catpv(substitute_parse, "}");
+ }
/* Move to after the dot (or ending brace the final time through.)
* */
RExC_parse++;
+ count++;
} while (RExC_parse < endbrace);
+ if (! node_p) { /* Doesn't want the node */
+ assert (cp_count);
+
+ *cp_count = count;
+ return FALSE;
+ }
+
sv_catpv(substitute_parse, ")");
#ifdef EBCDIC
--
2.11.0
|
From @khwilliamson0007-regcomp.c-White-space-only.patchFrom 1222fa4d7aaab552d07a95c74668cdb44eb41f86 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Sat, 12 May 2018 21:58:56 -0600
Subject: [PATCH 7/8] regcomp.c: White space only
Indent code that now has a newly formed block around it.
---
regcomp.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/regcomp.c b/regcomp.c
index 7fb5fbaf8e..c3e5550d87 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -12481,10 +12481,11 @@ S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state,
do_concat:
if (node_p) {
- /* Convert to notation the rest of the code understands */
- sv_catpv(substitute_parse, "\\x{");
- sv_catpvn(substitute_parse, start_digit, RExC_parse - start_digit);
- sv_catpv(substitute_parse, "}");
+ /* Convert to notation the rest of the code understands */
+ sv_catpv(substitute_parse, "\\x{");
+ sv_catpvn(substitute_parse, start_digit,
+ RExC_parse - start_digit);
+ sv_catpv(substitute_parse, "}");
}
/* Move to after the dot (or ending brace the final time through.)
--
2.11.0
|
The RT System itself - Status changed from 'new' to 'open' |
From @khwilliamsonAgain, thanks for finding and reporting this This is not a security bug, as it doesn't happen in any stable release. All 3 patches have been approved by the pumpking for 5.28. Applied as -- |
@khwilliamson - Status changed from 'open' to 'resolved' |
From @geeknikThis assertion failure is triggered with Perl v5.27.11-39-g3cacfe7. The ./perl -e '$p00="[\\N{U+$]N}0";/$p00/' perl: regcomp.c:12459: S_grok_bslash_N: Assertion `((IV) |
From @khwilliamsonOn 05/24/2018 04:46 PM, Brian Carpenter (via RT) wrote:
This was fixed by commit bc929b6 PATCH: [perl #133194] Heap buffer overflow On debugging builds, the code in the ticket instead fails an This doesn't fix the underlying problem of how we came to be trying to I'll merge this ticket to that one |
The RT System itself - Status changed from 'new' to 'open' |
Migrated from rt.perl.org#133194 (status was 'resolved')
Searchable as RT133194$
The text was updated successfully, but these errors were encountered: