regcomp: heap-buffer-overflow read in S_grok_bslash_N (perl-blead) #16555

p5pRT · 2018-05-11T23:28:41Z

Migrated from rt.perl.org#133194 (status was 'resolved')

Searchable as RT133194$

p5pRT · 2018-05-11T23:28:42Z

From @Etsukata

# Summary

- a crafted regular expression can cause heap-buffer-overflow read during
compilation which leads to information leak (ex: secret variable, source
code)

## Affected Versions

- perl-blead(Commit: 77dddf9)

# PoC

- dump a variable to stderr
```
$ ~/perl5/perlbrew/perls/perl-blead/bin/perl -e
'$x="SECRET";$r="[\\N{U+a.*" . "a" x 139 . "}a]"; qr/$r/;'
Invalid hexadecimal number in \N{U+...} in regex; marked by <-- HERE in m/
<-- HERE
!SECRET!main1?.?[\N{U+a.*aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}a]/
at -e line 1.
```

# Note

- related ticket: regcomp: heap-buffer-overflow read in S_grok_bslash_N
(perl-5.26.2) https://rt-archive.perl.org/perl5/Ticket/Display.html?id=133192

p5pRT · 2018-05-13T05:16:57Z

From @khwilliamson

On 05/11/2018 05:28 PM, Eiichi Tsukata (via RT) wrote:

# New Ticket Created by Eiichi Tsukata
# Please include the string: [perl #133194]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=133194 >

# Summary

- a crafted regular expression can cause heap-buffer-overflow read during
compilation which leads to information leak (ex: secret variable, source
code)

## Affected Versions

- perl-blead(Commit: 77dddf9)

# PoC

- dump a variable to stderr
```
$ ~/perl5/perlbrew/perls/perl-blead/bin/perl -e
'$x="SECRET";$r="[\\N{U+a.*" . "a" x 139 . "}a]"; qr/$r/;'
Invalid hexadecimal number in \N{U+...} in regex; marked by <-- HERE in m/
<-- HERE
!SECRET!main1?.?[\N{U+a.*aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}a]/
at -e line 1.

Thank you for finding this.

The full solution to this is somewhat complicated. I have three patches
that do it, but my guess is that we will only want to do the first one
attached for 5.28. (It does not yet include a test with the PoC of this
ticket)

The patch changes the code not to fix the underlying problem, but to
instead panic without dumping out stuff that shouldn't get output. That
way, bugs from not only this situation, but also any in all other code
that calls these macros will not be security bugs, but merely regular bugs.

Running the PoC in older perls results in not panicking, but a warning
(if enabled) and the pattern compiles to something that wasn't intended.
So this is not a regression introduced by 5.26 -- the input is handled
improperly in all releases I tested.

The second attached patch fixes the underlying cause of this bug. You
can read the commit text, as it is somewhat subtle. It involves some
restructuring, and it might be too late in our freeze to do this, even
though it results in more lines removed than added.

The third attached patch is simply white space changes for the second
patch, so that the difference listing of the second is easier to grok

```

# Note

- related ticket: regcomp: heap-buffer-overflow read in S_grok_bslash_N
(perl-5.26.2) https://rt-archive.perl.org/perl5/Ticket/Display.html?id=133192

p5pRT · 2018-05-13T05:16:57Z

From @khwilliamson

0008-PATCH-perl-133194-Heap-buffer-overflow.patch

From 9ac24c8b60dd40531da9bf9eac5339a0940b7132 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Sat, 12 May 2018 22:20:33 -0600
Subject: [PATCH 8/8] PATCH: [perl #133194] Heap buffer overflow

On debugging builds, the code in the ticket instead fails an assert.  I
thought I had previous failures figured out, but it turns out I was
wrong.  To prevent potential security failures by going ahead and
printing memory known to be outside what we are expecting on
non-debugging builds, instead panic.

This doesn't fix the underlying problem of how we came to be trying to
output bad stuff, but it keeps all such cases from being security
bugs.
---
 regcomp.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/regcomp.c b/regcomp.c
index c3e5550d87..6bad27e9b9 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -657,7 +657,13 @@ static const scan_data_t zero_scan_data = {
     UTF8fARG(UTF,                                                           \
              (xI(xC) > eC) /* Don't run off end */                          \
               ? eC - sC   /* Length before the <--HERE */                   \
-              : ( __ASSERT_(xI_offset(xC) >= 0) xI_offset(xC) ),            \
+              : ((xI_offset(xC) >= 0)                                       \
+                 ? xI_offset(xC)                                            \
+                 : (Perl_croak(aTHX_ "panic: %s: %d: negative offset: %"    \
+                                    IVdf " trying to output message for "   \
+                                    " pattern %.*s",                        \
+                                    __FILE__, __LINE__, xI_offset(xC),      \
+                                    ((int) (eC - sC)), sC), 0)),            \
              sC),         /* The input pattern printed up to the <--HERE */ \
     UTF8fARG(UTF,                                                           \
              (xI(xC) > eC) ? 0 : eC - xI(xC), /* Length after <--HERE */    \
-- 
2.11.0

p5pRT · 2018-05-13T05:16:57Z

From @khwilliamson

0006-regcomp.c-Avoid-a-panic-on-malformed-qr-N-.-i.patch

From b2f3cb2b8190162fc775d34d58a9bf142ed66bcb Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Sat, 12 May 2018 21:40:02 -0600
Subject: [PATCH 6/8] regcomp.c: Avoid a panic on malformed qr/\N{..}/i

Input constructs that expand to more than one character are handled very
very specially when they occur within a bracketed character class.  What
happens effectively is that they are removed from the class and parsed
separately, using the regular code in regcomp.c to generate something
like a trie for them.  The other characters within the class are handled
normally.

The specially handled stuff is parsed from a separate string.  In the
case where that stuff is of the form \N{U+...}, I neglected to
adequately consider that the syntax could trigger an error.  When such
an error is raised, it can violate our assumptions about the state of
things, and lead to a panic.

THe code actually parses the construct twice.  The first time while
deciding if this expands to multiple characters (so that it can be
separated out), and the second time to actually figure out and return
the expansion.  This commit fixes the bug by adding error checking
during the first pass.  Previously, the minimal amount of work was done
to be able to find the count of characters in the expansion.  Now, more
work is done to do the checking, as we go along with the counting.  This
actually results in less special case code needing to be executed, so
there is a net code removal from this commit.
---
 regcomp.c | 48 +++++++++++++++++++-----------------------------
 1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/regcomp.c b/regcomp.c
index e1a048ff97..7fb5fbaf8e 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -12236,8 +12236,8 @@ S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state,
   * *node_p, nor *code_point_p, nor *flagp.
   *
   * If <cp_count> is not NULL, the caller wants to know the length (in code
-  * points) that this \N sequence matches.  This is set even if the function
-  * returns FALSE, as detailed below.
+  * points) that this \N sequence matches.  This is set, and the input is
+  * parsed for errors, even if the function returns FALSE, as detailed below.
   *
   * There are 5 possibilities here, as detailed in the next 5 paragraphs.
   *
@@ -12460,31 +12460,9 @@ S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state,
             }
 
             /* Here, looks like its really a multiple character sequence.  Fail
-             * if that's not what the caller wants. */
-            if (! node_p) {
-
-                /* But even if failing, we count the code points if requested, and
-                 * don't back up up the pointer as the caller is expected to
-                 * handle this situation */
-                if (cp_count) {
-                    char * dot = RExC_parse + 1;
-                    do {
-                        dot = (char *) memchr(dot, '.', endbrace - dot);
-                        if (! dot) {
-                            break;
-                        }
-                        count++;
-                        dot++;
-                    } while (dot < endbrace);
-                    count++;
-
-                    *cp_count = count;
-                    RExC_parse = endbrace;
-                    nextchar(pRExC_state);
-                }
-                else {  /* Back up the pointer. */
-                    RExC_parse = p;
-                }
+             * if that's not what the caller wants.  But continue with counting
+             * and error checking if they still want a count */
+            if (! node_p && ! cp_count) {
                 return FALSE;
             }
 
@@ -12492,25 +12470,37 @@ S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state,
              * form \x{char1}\x{char2}...  and then call reg recursively to
              * parse it (enclosing in "(?: ... )" ).  That way, it retains its
              * atomicness, while not having to worry about special handling
-             * that some code points may have. */
+             * that some code points may have.  We don't create a subpattern,
+             * but go through the motions of code point counting and error
+             * checking, if the caller doesn't want a node returned. */
 
-            if (count == 1) {
+            if (node_p && count == 1) {
                 substitute_parse = newSVpvs("?:");
             }
 
           do_concat:
 
+            if (node_p) {
             /* Convert to notation the rest of the code understands */
             sv_catpv(substitute_parse, "\\x{");
             sv_catpvn(substitute_parse, start_digit, RExC_parse - start_digit);
             sv_catpv(substitute_parse, "}");
+            }
 
             /* Move to after the dot (or ending brace the final time through.)
              * */
             RExC_parse++;
+            count++;
 
         } while (RExC_parse < endbrace);
 
+        if (! node_p) { /* Doesn't want the node */
+            assert (cp_count);
+
+            *cp_count = count;
+            return FALSE;
+        }
+
         sv_catpv(substitute_parse, ")");
 
 #ifdef EBCDIC
-- 
2.11.0

p5pRT · 2018-05-13T05:16:57Z

From @khwilliamson

0007-regcomp.c-White-space-only.patch

From 1222fa4d7aaab552d07a95c74668cdb44eb41f86 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Sat, 12 May 2018 21:58:56 -0600
Subject: [PATCH 7/8] regcomp.c:  White space only

Indent code that now has a newly formed block around it.
---
 regcomp.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/regcomp.c b/regcomp.c
index 7fb5fbaf8e..c3e5550d87 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -12481,10 +12481,11 @@ S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state,
           do_concat:
 
             if (node_p) {
-            /* Convert to notation the rest of the code understands */
-            sv_catpv(substitute_parse, "\\x{");
-            sv_catpvn(substitute_parse, start_digit, RExC_parse - start_digit);
-            sv_catpv(substitute_parse, "}");
+                /* Convert to notation the rest of the code understands */
+                sv_catpv(substitute_parse, "\\x{");
+                sv_catpvn(substitute_parse, start_digit,
+                                            RExC_parse - start_digit);
+                sv_catpv(substitute_parse, "}");
             }
 
             /* Move to after the dot (or ending brace the final time through.)
-- 
2.11.0

p5pRT · 2018-05-13T05:16:57Z

The RT System itself - Status changed from 'new' to 'open'

p5pRT · 2018-05-19T15:14:30Z

From @khwilliamson

Again, thanks for finding and reporting this

This is not a security bug, as it doesn't happen in any stable release.

All 3 patches have been approved by the pumpking for 5.28. Applied as
bc929b6
94a3864
22806e5

--
Karl Williamson

p5pRT · 2018-05-19T15:14:31Z

@khwilliamson - Status changed from 'open' to 'resolved'

p5pRT · 2018-05-24T22:46:14Z

From @geeknik

This assertion failure is triggered with Perl v5.27.11-39-g3cacfe7. The
testcase is pretty similar to #132163 which was fixed in March.

./perl -e '$p00="[\\N{U+$]N}0";/$p00/'

perl: regcomp.c:12459: S_grok_bslash_N: Assertion `((IV)
((pRExC_state->precomp_adj) + ((pRExC_state->parse) -
(pRExC_state->adjusted_start)))) >= 0' failed.
Aborted

p5pRT · 2018-05-24T23:12:43Z

From @khwilliamson

On 05/24/2018 04:46 PM, Brian Carpenter (via RT) wrote:

# New Ticket Created by Brian Carpenter
# Please include the string: [perl #133222]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=133222 >

This assertion failure is triggered with Perl v5.27.11-39-g3cacfe7. The
testcase is pretty similar to #132163 which was fixed in March.

./perl -e '$p00="[\\N{U+$]N}0";/$p00/'

perl: regcomp.c:12459: S_grok_bslash_N: Assertion `((IV)
((pRExC_state->precomp_adj) + ((pRExC_state->parse) -
(pRExC_state->adjusted_start)))) >= 0' failed.
Aborted

This was fixed by

commit bc929b6
Author: Karl Williamson <khw@cpan.org>
Date: Sat May 12 22:20:33 2018 -0600

PATCH: [perl #133194] Heap buffer overflow

On debugging builds, the code in the ticket instead fails an
assert. I
thought I had previous failures figured out, but it turns out I was
wrong. To prevent potential security failures by going ahead and
printing memory known to be outside what we are expecting on
non-debugging builds, instead panic.

This doesn't fix the underlying problem of how we came to be trying to
output bad stuff, but it keeps all such cases from being security
bugs.

I'll merge this ticket to that one

p5pRT · 2018-05-24T23:12:43Z

The RT System itself - Status changed from 'new' to 'open'

p5pRT closed this as completed May 19, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

regcomp: heap-buffer-overflow read in S_grok_bslash_N (perl-blead) #16555

regcomp: heap-buffer-overflow read in S_grok_bslash_N (perl-blead) #16555

p5pRT commented May 11, 2018

p5pRT commented May 11, 2018

p5pRT commented May 13, 2018

p5pRT commented May 13, 2018

p5pRT commented May 13, 2018

p5pRT commented May 13, 2018

p5pRT commented May 13, 2018

p5pRT commented May 19, 2018

p5pRT commented May 19, 2018

p5pRT commented May 24, 2018

p5pRT commented May 24, 2018

p5pRT commented May 24, 2018

regcomp: heap-buffer-overflow read in S_grok_bslash_N (perl-blead) #16555

regcomp: heap-buffer-overflow read in S_grok_bslash_N (perl-blead) #16555

Comments

p5pRT commented May 11, 2018

p5pRT commented May 11, 2018

From @Etsukata

p5pRT commented May 13, 2018

From @khwilliamson

p5pRT commented May 13, 2018

From @khwilliamson

p5pRT commented May 13, 2018

From @khwilliamson

p5pRT commented May 13, 2018

From @khwilliamson

p5pRT commented May 13, 2018

p5pRT commented May 19, 2018

From @khwilliamson

p5pRT commented May 19, 2018

p5pRT commented May 24, 2018

From @geeknik

p5pRT commented May 24, 2018

From @khwilliamson

p5pRT commented May 24, 2018