Skip Menu |
Report information
Id: 132063
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: imdb95 [at] gmail.com
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: (no value)
Type: (no value)
Perl Version: (no value)
Fixed In: (no value)

Attachments
0002-PATCH-perl-132063-Heap-buffer-overflow.patch
0004-Subject-PATCH-perl-132063-Heap-buffer-overflow.patch
0006-utf8.c-Don-t-dump-malformation-past-first-NUL.patch
0375-5.24-maint-perl-132063-Heap-buffer-overflow-in-byte_.patch
5.24-0002-perl-132063-Heap-buffer-overflow.patch
5.24-0003-v5.24.3-fix-TRIE_READ_CHAR-and-DECL_TRIE_TYPE-to-acc.patch
5.24-0004-perl-132063-we-should-no-longer-warn-for-this-code.patch
5.26-0002-perl-132063-Heap-buffer-overflow.patch
5.26-0003-5.26.1-fix-TRIE_READ_CHAR-and-DECL_TRIE_TYPE-to-acco.patch
5.26-0004-perl-132063-we-should-no-longer-warn-for-this-code.patch
5.26-0005-utf8.c-Don-t-dump-malformation-past-first-NUL.patch
blead-0002-perl-132063-Heap-buffer-overflow.patch
blead-0003-fix-TRIE_READ_CHAR-and-DECL_TRIE_TYPE-to-account-for.patch
blead-0004-perl-132063-we-should-no-longer-warn-for-this-code.patch
blead-0005-utf8.c-Don-t-dump-malformation-past-first-NUL.patch
blead_fix_132063.patch
maxlen.log
v5224_fix_132063.patch
v5243_fix_132063.patch
v5261_fix_132063.patch



To: Tony Cook via RT <perl5-security-report [...] perl.org>
Subject: Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
From: Manh Nguyen <imdb95 [...] gmail.com>
Date: Mon, 11 Sep 2017 09:31:57 +0700
Download (untitled) / with headers
text/plain 5.7k
Greetings,
With crafted regex match, I have found a heap-over-flow in function Perl__byte_dump_string, which would lead to memory leak.

**********Build Date & Hardware**********
Version: Version: the dev version (https://perl5.git.perl.org/perl.git)
manh@manh-VirtualBox:~/Fuzzing/afl/perl$ ./perl/perl -v

This is perl 5, version 27, subversion 4 (v5.27.4 (v5.27.3-14-gd2dccc0)) built for x86_64-linux

Copyright 1987-2017, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
--------------
OS: Ubuntu 16.04 Desktop
manh@manh-VirtualBox:~/Fuzzing/afl/perl$ uname -a
Linux manh-VirtualBox 4.4.0-92-generic #115-Ubuntu SMP Thu Aug 10 09:04:33 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
--------------
Compilation:
AFL_USE_ASAN=1 ./Configure -des -Dusedevel -DDEBUGGING -Dcc=afl-clang-fast -Doptimize=-O0\ -g && AFL_USE_ASAN=1 make

**********Reproduce**********
manh@manh-VirtualBox:~/Fuzzing/afl/perl$ ./perl/perl -e '$x="(?il)\x{100}|\x{100}"; "\xff" =~ /$x/;'
=================================================================
==11464==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000001b3a at pc 0x000000c66a61 bp 0x7ffd2e7fafb0 sp 0x7ffd2e7fafa8
READ of size 1 at 0x602000001b3a thread T0
    #0 0xc66a60 in Perl__byte_dump_string /home/manh/Fuzzing/afl/perl/perl/utf8.c:977:39
    #1 0xc66a60 in S_unexpected_non_continuation_text /home/manh/Fuzzing/afl/perl/perl/utf8.c:1036
    #2 0xc66a60 in Perl_utf8n_to_uvchr_error /home/manh/Fuzzing/afl/perl/perl/utf8.c:1707
    #3 0xc5d678 in Perl__force_out_malformed_utf8_message /home/manh/Fuzzing/afl/perl/perl/utf8.c:90:12
    #4 0xc726ae in Perl__to_utf8_fold_flags /home/manh/Fuzzing/afl/perl/perl/utf8.c:3583:5
    #5 0xbf7b22 in S_find_byclass /home/manh/Fuzzing/afl/perl/perl/regexec.c:2626:25
    #6 0xbde716 in Perl_regexec_flags /home/manh/Fuzzing/afl/perl/perl/regexec.c:3389:13
    #7 0x91ee92 in Perl_pp_match /home/manh/Fuzzing/afl/perl/perl/pp_hot.c:2222:10
    #8 0x8396bc in Perl_runops_debug /home/manh/Fuzzing/afl/perl/perl/dump.c:2486:23
    #9 0x5e0342 in S_run_body /home/manh/Fuzzing/afl/perl/perl/perl.c
    #10 0x5e0342 in perl_run /home/manh/Fuzzing/afl/perl/perl/perl.c:2484
    #11 0x5095cb in main /home/manh/Fuzzing/afl/perl/perl/perlmain.c:154:9
    #12 0x7fc01990082f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #13 0x435928 in _start (/home/manh/Fuzzing/afl/perl/perl/perl+0x435928)

0x602000001b3a is located 0 bytes to the right of 10-byte region [0x602000001b30,0x602000001b3a)
allocated by thread T0 here:
    #0 0x4dc62c in malloc /scratch/llvm/clang-4/xenial/final/llvm.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:66:3
    #1 0x83ed2b in Perl_safesysmalloc /home/manh/Fuzzing/afl/perl/perl/util.c:153:21

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/manh/Fuzzing/afl/perl/perl/utf8.c:977:39 in Perl__byte_dump_string
Shadow bytes around the buggy address:
  0x0c047fff8310: fa fa 00 02 fa fa 00 06 fa fa 00 02 fa fa 00 02
  0x0c047fff8320: fa fa 00 fa fa fa 00 02 fa fa 00 fa fa fa 00 03
  0x0c047fff8330: fa fa 00 05 fa fa 02 fa fa fa fd fd fa fa fd fd
  0x0c047fff8340: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
  0x0c047fff8350: fa fa 00 04 fa fa fd fa fa fa fd fd fa fa fd fd
=>0x0c047fff8360: fa fa fd fa fa fa 00[02]fa fa fd fa fa fa fd fd
  0x0c047fff8370: fa fa fd fd fa fa fd fa fa fa fd fd fa fa 00 02
  0x0c047fff8380: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
  0x0c047fff8390: fa fa fd fa fa fa fd fd fa fa fd fd fa fa fd fd
  0x0c047fff83a0: fa fa fd fd fa fa fd fd fa fa fd fa fa fa 00 00
  0x0c047fff83b0: fa fa fd fd fa fa fd fd fa fa 00 00 fa fa fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==11464==ABORTING
**********More info**********
If perl is compiled with gcc, in non-debug mode
    ./Configure -des -Dusedevel -Dcc=gcc -Doptimize=-O0\ -ggdb
then perl will leak some bytes after the crafted utf8 string:
root@manh-VirtualBox:/home# ./perl/perl -MConfig -e 'print $Config{optimize}, "\n"'
-O0 -ggdb
root@manh-VirtualBox:/home# ./perl/perl -MConfig -e 'print $Config{ccflags}, "\n"'
-fwrapv -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2
root@manh-VirtualBox:/home# ./perl/perl -MConfig -e 'print $Config{cc}, "\n"'
gcc
root@manh-VirtualBox:/home# ./perl/perl -e '$x="(?il)\x{100}|\x{100}"; "\xff" =~ /$x/;'
Malformed UTF-8 character: \xff\x00\xfa\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00 (unexpected non-continuation byte 0x00, immediately after start byte 0xff; need 13 bytes, got 1) in pattern match (m//) at -e line 1.
Malformed UTF-8 character (fatal) at -e line 1.
root@manh-VirtualBox:/home# ./perl/perl -e '$x="(?il)\x{100}|\x{100}"; "\xff" =~ /$x/;'
Malformed UTF-8 character: \xff\x00\xb0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 (unexpected non-continuation byte 0x00, immediately after start byte 0xff; need 13 bytes, got 1) in pattern match (m//) at -e line 1.
Malformed UTF-8 character (fatal) at -e line 1.

Best,
Manh
RT-Send-CC: perl5-security-report [...] perl.org, perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 441b
I looked at this enough so I believe I understand the cause. In several places in the trie handling code in regexec.c, it passes UTF8_MAXLEN to utf8n_to_uvchr() instead of the actual space available. I grepped blead for other possible problems with this, or its synonym UTF8_MAXBYTES. Fortunately there aren't many. I haven't investigated the attached results to see whether the others are problematic or not. But these certainly are.
Subject: maxlen.log
Download maxlen.log
text/x-log 809b
./pp_pack.c:2585: if (utf8) end -= UTF8_MAXLEN-1; ./pp_pack.c:2598: end = start+SvLEN(cat)-UTF8_MAXLEN; ./pp_pack.c:2612: end = start+SvLEN(cat)-UTF8_MAXLEN; ./pp_pack.c:2644: if (!utf8) end -= UTF8_MAXLEN; ./pp_pack.c:2665: end = start+SvLEN(cat)-UTF8_MAXLEN; ./regexec.c:1507: uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \ ./regexec.c:1525: uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \ ./regexec.c:1544: uvc = utf8n_to_uvchr( (const U8*) uc, UTF8_MAXLEN, &len, uniflags ); \ ./regexec.c:5812: uvc = utf8n_to_uvchr((U8*)uc, UTF8_MAXLEN, &len, ./regexec.c:5825: uvc = utf8n_to_uvchr(uscan, UTF8_MAXLEN, &len, ./mathoms.c:1722: return utf8_to_uvchr_buf(s, s + UTF8_MAXBYTES, retlen);
Download (untitled) / with headers
text/plain 441b
I looked at this enough so I believe I understand the cause. In several places in the trie handling code in regexec.c, it passes UTF8_MAXLEN to utf8n_to_uvchr() instead of the actual space available. I grepped blead for other possible problems with this, or its synonym UTF8_MAXBYTES. Fortunately there aren't many. I haven't investigated the attached results to see whether the others are problematic or not. But these certainly are.
Subject: maxlen.log
Download maxlen.log
text/x-log 809b
./pp_pack.c:2585: if (utf8) end -= UTF8_MAXLEN-1; ./pp_pack.c:2598: end = start+SvLEN(cat)-UTF8_MAXLEN; ./pp_pack.c:2612: end = start+SvLEN(cat)-UTF8_MAXLEN; ./pp_pack.c:2644: if (!utf8) end -= UTF8_MAXLEN; ./pp_pack.c:2665: end = start+SvLEN(cat)-UTF8_MAXLEN; ./regexec.c:1507: uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \ ./regexec.c:1525: uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \ ./regexec.c:1544: uvc = utf8n_to_uvchr( (const U8*) uc, UTF8_MAXLEN, &len, uniflags ); \ ./regexec.c:5812: uvc = utf8n_to_uvchr((U8*)uc, UTF8_MAXLEN, &len, ./regexec.c:5825: uvc = utf8n_to_uvchr(uscan, UTF8_MAXLEN, &len, ./mathoms.c:1722: return utf8_to_uvchr_buf(s, s + UTF8_MAXBYTES, retlen);
Subject: Re: [perl #132063] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
From: Karl Williamson <public [...] khwilliamson.com>
Date: Tue, 26 Sep 2017 23:03:12 -0600
CC: perl5-security-report [...] perl.org
To: rt-comment [...] perl.org
Download (untitled) / with headers
text/plain 547b
On 09/26/2017 10:28 PM, Karl Williamson via RT wrote: Show quoted text
> I looked at this enough so I believe I understand the cause. > In several places in the trie handling code in regexec.c, it passes UTF8_MAXLEN to utf8n_to_uvchr() instead of the actual space available. > > I grepped blead for other possible problems with this, or its synonym UTF8_MAXBYTES. Fortunately there aren't many. I haven't investigated the attached results to see whether the others are problematic or not. But these certainly are. >
By "these", I meant the ones in regexec.c
Date: Mon, 22 Jan 2018 13:47:27 -0700
From: Karl Williamson <public [...] khwilliamson.com>
Subject: Re: [perl #132063] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
To: rt-comment [...] perl.org
CC: perl5-security-report [...] perl.org, Yves Orton <yves.orton [...] booking.com>
Download (untitled) / with headers
text/plain 1.1k
On 09/26/2017 11:03 PM, Karl Williamson wrote: Show quoted text
> On 09/26/2017 10:28 PM, Karl Williamson via RT wrote:
>> I looked at this enough so I believe I understand the cause. >> In several places in the trie handling code in regexec.c, it passes >> UTF8_MAXLEN to utf8n_to_uvchr() instead of the actual space available. >> >> I grepped blead for other possible problems with this, or its synonym >> UTF8_MAXBYTES.  Fortunately there aren't many.  I haven't investigated >> the attached results to see whether the others are problematic or >> not.  But these certainly are. >>
> > By "these", I meant the ones in regexec.c >
This should be fixed by the attached patch. The consequences of this bug are that if a perl program were to allow user input of patterns, this one could be used to read up to 11 bytes of memory beyond the end of the scalar being matched. I don't know if this is a real security issue or not. I tried the test without using /l as a pattern modifier, and it did not fail, so the attacker would have to furnish that, or the pattern would need to be executed within the scope of 'use locale' Tell me how to proceed.

Message body is not shown because sender requested not to inline it.

RT-Send-CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 1.5k
On Tue, 26 Sep 2017 21:28:51 -0700, khw wrote: Show quoted text
> I looked at this enough so I believe I understand the cause. > In several places in the trie handling code in regexec.c, it passes > UTF8_MAXLEN to utf8n_to_uvchr() instead of the actual space available. > > I grepped blead for other possible problems with this, or its synonym > UTF8_MAXBYTES. Fortunately there aren't many. I haven't investigated > the attached results to see whether the others are problematic or not. > But these certainly are.
./pp_pack.c:2585: if (utf8) end -= UTF8_MAXLEN-1; ./pp_pack.c:2598: end = start+SvLEN(cat)-UTF8_MAXLEN; ./pp_pack.c:2612: end = start+SvLEN(cat)-UTF8_MAXLEN; ./pp_pack.c:2644: if (!utf8) end -= UTF8_MAXLEN; ./pp_pack.c:2665: end = start+SvLEN(cat)-UTF8_MAXLEN; These can both lead to end pointing before the beginning of the allocated PV for the W and U pack formats. This is classified as undefined behaviour by the C standard. The end pointer is only used for comparisons to ensure that enough space is allocated when adding new characters to the buffer, with the small offset (UTF8_MAXLEN) the pointer is unlikely to wrap on modern systems. There's no memory accesses through the end pointers, so while I think the pack() issues are bugs, I don't think they're security issues. Show quoted text
> The consequences of this bug are that if a perl program were to allow > user input of patterns, this one could be used to read up to 11 bytes of > memory beyond the end of the scalar being matched. > > I don't know if this is a real security issue or not.
It sounds like a security issue to me. Tony
RT-Send-CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 1.4k
On Wed, 24 Jan 2018 20:26:09 -0800, tonyc wrote: Show quoted text
> It sounds like a security issue to me.
Karl asked me in IRC what needs to be done with security patches. I believe we need to do the following (not necessarily in this order). 1) produce a fix (Karl's done this), including fixes for currently supported maint branches. These should not be pushed to the public repository unless the issue is public. Hopefully you'll get votes from the people monitoring the security list for the change to be included in maint. 2) Allocate a CVE number, this is done with the form at: https://cveform.mitre.org/ If the issue is not public yet then the description should be vague and the rest of form should not contain the details of the problem. You might want to think about those details for when we update the CVE. I can do this if you can get me to understand the issue. 3) Notify downstreams, with the fixes. Last time Sawyer did this. 4) Deal with queries from downstreams, it's more than likely they'll want to know: a) whether the issue applies to the unsupported maint releases they ship, and if it does b) whether the fixes work for those older maints and possibly platform issues. 5) Pick a publication date in consultation with downstreams. On publication date: 6) push the fix to blead, maint 7) Update the CVE entry with the details of the issue (if the issue wasn't public.) 8) Once downstreams are ready, make the ticket public. Tony
From: Karl Williamson <public [...] khwilliamson.com>
Date: Wed, 31 Jan 2018 21:00:09 -0700
Subject: Re: [perl #132063] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
To: perl5-security-report-followup [...] perl.org
CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 2.5k
On 01/31/2018 08:30 PM, Tony Cook via RT wrote: Show quoted text
> On Wed, 24 Jan 2018 20:26:09 -0800, tonyc wrote:
>> It sounds like a security issue to me.
> > Karl asked me in IRC what needs to be done with security patches. > > I believe we need to do the following (not necessarily in this order). > > 1) produce a fix (Karl's done this), including fixes for currently supported maint branches. These should not be pushed to the public repository unless the issue is public. > > Hopefully you'll get votes from the people monitoring the security list for the change to be included in maint. > > 2) Allocate a CVE number, this is done with the form at: > > https://cveform.mitre.org/ > > If the issue is not public yet then the description should be vague and the rest of form should not contain the details of the problem. You might want to think about those details for when we update the CVE. > > I can do this if you can get me to understand the issue.
So is this a real security problem? The original code is '$x="(?il)\x{100}|\x{100}"; "\xff" =~ /$x/;' The trick here is that the pattern has to compile as a trie. It doesn't have to be this particular pattern. In UTF-8, the length of a sequence to form a character is given by the first byte in the sequence. The maximum length of a character is 13 bytes. If only a partial character is furnished, and the code doesn't guard against it, it can lead to reading beyond the buffer end. The maximum it could read is 12 bytes. It turns out that tries are vulnerable, because they just assume there is enough space, so they go ahead and try to access it. It is almost certain that the bytes that are off the end won't be legal UTF-8 continuation bytes, so the translator will detect that this is malformed, and will output a message showing what those bytes are (unless it segfaults). Thus someone knowing that there is a trie, can feed it input that will show them up to 12 bytes beyond the end of the end of the buffer that their string is in. Show quoted text
> > 3) Notify downstreams, with the fixes. Last time Sawyer did this. > > 4) Deal with queries from downstreams, it's more than likely they'll want to know: > > a) whether the issue applies to the unsupported maint releases they ship, and if it does > > b) whether the fixes work for those older maints > > and possibly platform issues. > > 5) Pick a publication date in consultation with downstreams. > > On publication date: > > 6) push the fix to blead, maint > > 7) Update the CVE entry with the details of the issue (if the issue wasn't public.) > > 8) Once downstreams are ready, make the ticket public. > > Tony >
From: Karl Williamson <public [...] khwilliamson.com>
CC: perl5-security-report [...] perl.org
To: perl5-security-report-followup [...] perl.org
Date: Wed, 31 Jan 2018 21:34:27 -0700
Subject: Re: [perl #132063] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
Download (untitled) / with headers
text/plain 1.9k
On 01/24/2018 09:26 PM, Tony Cook via RT wrote: Show quoted text
> On Tue, 26 Sep 2017 21:28:51 -0700, khw wrote:
>> I looked at this enough so I believe I understand the cause. >> In several places in the trie handling code in regexec.c, it passes >> UTF8_MAXLEN to utf8n_to_uvchr() instead of the actual space available. >> >> I grepped blead for other possible problems with this, or its synonym >> UTF8_MAXBYTES. Fortunately there aren't many. I haven't investigated >> the attached results to see whether the others are problematic or not. >> But these certainly are.
> > ./pp_pack.c:2585: if (utf8) end -= UTF8_MAXLEN-1; > ./pp_pack.c:2598: end = start+SvLEN(cat)-UTF8_MAXLEN; > ./pp_pack.c:2612: end = start+SvLEN(cat)-UTF8_MAXLEN; > ./pp_pack.c:2644: if (!utf8) end -= UTF8_MAXLEN; > ./pp_pack.c:2665: end = start+SvLEN(cat)-UTF8_MAXLEN; > > These can both lead to end pointing before the beginning of the allocated PV for the W and U pack formats. > > This is classified as undefined behaviour by the C standard. > > The end pointer is only used for comparisons to ensure that enough space is allocated when adding new characters to the buffer, with the small offset (UTF8_MAXLEN) the pointer is unlikely to wrap on modern systems. > > There's no memory accesses through the end pointers, so while I think the pack() issues are bugs, I don't think they're security issues. >
>> The consequences of this bug are that if a perl program were to allow >> user input of patterns, this one could be used to read up to 11 bytes of >> memory beyond the end of the scalar being matched. >> >> I don't know if this is a real security issue or not.
> > It sounds like a security issue to me. > > Tony >
Hmm. I missed this email until just now. My analysis is slightly different in the email I just sent. I said 12 bytes; 11 is more realistic given the liklihood of a trailing NUL. And the attacker doesn't have to specify the pattern. It could happen for various tries, given malicious input to match against.
Subject: Re: [perl #132063] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
Date: Thu, 1 Feb 2018 16:21:27 +1100
From: Tony Cook <tony [...] develop-help.com>
To: Karl Williamson <public [...] khwilliamson.com>
CC: perl5-security-report-followup [...] perl.org, perl5-security-report [...] perl.org
On Wed, Jan 31, 2018 at 09:00:09PM -0700, Karl Williamson wrote: Show quoted text
> > 2) Allocate a CVE number, this is done with the form at: > > > > https://cveform.mitre.org/ > > > > If the issue is not public yet then the description should be vague and the rest of form should not contain the details of the problem. You might want to think about those details for when we update the CVE. > > > > I can do this if you can get me to understand the issue.
> > So is this a real security problem? > > The original code is > > '$x="(?il)\x{100}|\x{100}"; "\xff" =~ /$x/;' > > The trick here is that the pattern has to compile as a trie. It doesn't > have to be this particular pattern. In UTF-8, the length of a sequence to > form a character is given by the first byte in the sequence. The maximum > length of a character is 13 bytes. If only a partial character is > furnished, and the code doesn't guard against it, it can lead to reading > beyond the buffer end. The maximum it could read is 12 bytes. > > It turns out that tries are vulnerable, because they just assume there is > enough space, so they go ahead and try to access it. It is almost certain > that the bytes that are off the end won't be legal UTF-8 continuation bytes, > so the translator will detect that this is malformed, and will output a > message showing what those bytes are (unless it segfaults). Thus someone > knowing that there is a trie, can feed it input that will show them up to 12 > bytes beyond the end of the end of the buffer that their string is in.
There's two potential attacks that I can see with this bug: 1) the regexp engine can access beyond the end of a malloced block. This can crash perl, leading to a denial of service if the perl process is running as some sort of long-term process. 2) the error message may reveal sensitive information, eg. if the memory used by the SV had been previously used for a password, it could reveal part of the password. This all assumes that the regexp engine is intended to work with untrusted regexps - if it isn't, we should probably remove C< use re 'eval';> Tony
From: Sawyer X <xsawyerx [...] gmail.com>
To: perl5-security-report-followup [...] perl.org
Subject: Re: [perl #132063] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
Date: Thu, 1 Feb 2018 09:59:28 +0100
Download (untitled) / with headers
text/plain 2.1k
On 1 February 2018 at 04:30, Tony Cook via RT <perl5-security-report-followup@perl.org> wrote: Show quoted text
> > On Wed, 24 Jan 2018 20:26:09 -0800, tonyc wrote:
> > It sounds like a security issue to me.
> > Karl asked me in IRC what needs to be done with security patches. > > I believe we need to do the following (not necessarily in this order). > > 1) produce a fix (Karl's done this), including fixes for currently supported maint branches. These should not be pushed to the public repository unless the issue is public. > > Hopefully you'll get votes from the people monitoring the security list for the change to be included in maint.
Show quoted text
> > > 2) Allocate a CVE number, this is done with the form at: > > https://cveform.mitre.org/ > > If the issue is not public yet then the description should be vague and the rest of form should not contain the details of the problem. You might want to think about those details for when we update the CVE. > > I can do this if you can get me to understand the issue.
Thanks, Tony. Show quoted text
> > > 3) Notify downstreams, with the fixes. Last time Sawyer did this.
Yes. I handle that. Show quoted text
> > > 4) Deal with queries from downstreams, it's more than likely they'll want to know: > > a) whether the issue applies to the unsupported maint releases they ship, and if it does > > b) whether the fixes work for those older maints > > and possibly platform issues.
This happens sometimes and when it does, I normally contact the person who fixed it for more details, if I cannot answer it myself. Show quoted text
> > > 5) Pick a publication date in consultation with downstreams.
This is usually about two weeks, a month, or far longer. Vendors usually need at least two weeks, but sometimes prefer a month or more. It depends on several factors, including the urgency, the likliness of it being publicly known or discovered soon, whether we also released a maint with a fix, etc. Show quoted text
> > > On publication date: > > 6) push the fix to blead, maint > > 7) Update the CVE entry with the details of the issue (if the issue wasn't public.) > > 8) Once downstreams are ready, make the ticket public.
These can be done prior to a release, too, to avoid the fix being public until a release is made.
Subject: Re: [perl #132063] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
From: demerphq <demerphq [...] gmail.com>
Date: Fri, 2 Feb 2018 21:41:33 +0100
To: rt-comment [...] perl.org
CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 1.7k
On 27 September 2017 at 06:28, Karl Williamson via RT <rt-comment@perl.org> wrote: Show quoted text
> I looked at this enough so I believe I understand the cause. > In several places in the trie handling code in regexec.c, it passes UTF8_MAXLEN to utf8n_to_uvchr() instead of the actual space available. > > I grepped blead for other possible problems with this, or its synonym UTF8_MAXBYTES. Fortunately there aren't many. I haven't investigated the attached results to see whether the others are problematic or not. But these certainly are.
I have reviewed this ticket and I disagree with Karls analysis. U8 foldbuf[ UTF8_MAXBYTES_CASE + 1 ]; Anyway, when we use UTF8_MAXBYTES in uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); if you check you will find that uscan points into foldbuf[] mentioned above. The code in uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, foldbuf, &foldlen, \ flags); \ would have to construct a malformed utf8 sequence inside of foldbuf. What is actually is happening is that the logic to choose what strategy we use for reading data in the trie code does not account for latin strings when under case folding locale dependent logic. The code falls into the utf8 reading branch of the logic, which it should not do. What is needed is the patch I am attaching here for blead. This affects v5.22 and later, so we need patches for 22, 24, and 26, and one for blead. Me personally, if i had found this without the security report I would have just pushed the blead fix, so I dont know if these patches need to sequestered orwhatever. I will let you guys figure this out. All i know is that m//li is not very common, or we would definitely have seen and heard about this bug before. Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

RT-Send-CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 749b
I agree with Yves' analysis. Mine was flawed about the cause of this particular issue. I think I still think that when dealing with UTF-8, it's best to use the real space available size, but I'd have to examine it more closely in light of the new analysis. What Yves didn't mention is that this bug requires the following things in order to manifest: 1) the pattern is compiled under /il 2) the pattern does not contain any characters below 256 3) the target string is not UTF-8. Items 1) and 2) together would be extremely rare in the field, unless the attacker gets to choose the pattern as well. No non-UTF-8 string may match this pattern, so if we could just short circuit the rest of trying the match, it would work. -- Karl Williamson
RT-Send-CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 5.3k
On Fri, 02 Feb 2018 12:41:41 -0800, demerphq wrote: Show quoted text
> On 27 September 2017 at 06:28, Karl Williamson via RT > <rt-comment@perl.org> wrote:
> > I looked at this enough so I believe I understand the cause. > > In several places in the trie handling code in regexec.c, it passes > > UTF8_MAXLEN to utf8n_to_uvchr() instead of the actual space > > available. > > > > I grepped blead for other possible problems with this, or its synonym > > UTF8_MAXBYTES. Fortunately there aren't many. I haven't > > investigated the attached results to see whether the others are > > problematic or not. But these certainly are.
> > I have reviewed this ticket and I disagree with Karls analysis. > > U8 foldbuf[ UTF8_MAXBYTES_CASE + 1 ]; > > Anyway, when we use UTF8_MAXBYTES in > > uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags > ); > > if you check you will find that uscan points into foldbuf[] mentioned > above. The code in > > uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, > foldbuf, &foldlen, \ > > flags); \ > > would have to construct a malformed utf8 sequence inside of foldbuf.
No, it points into the string we're matching: tony@mars:.../git/perl$ valgrind --vgdb-error=1 --vgdb=full -q ./perl -Ilib -e '$x="(?il)\x{100}|\x{100}"; "\xff" =~ /$x/;' ==25312== ==25312== TO DEBUG THIS PROCESS USING GDB: start GDB like this ==25312== /path/to/gdb ./perl ==25312== and then give GDB the following command ==25312== target remote | /usr/lib/valgrind/../../bin/vgdb --pid=25312 ==25312== --pid is optional if only one valgrind process is running ==25312== ==25312== Conditional jump or move depends on uninitialised value(s) ==25312== at 0x2E9522: Perl__byte_dump_string (utf8.c:991) ==25312== by 0x2E9629: S_unexpected_non_continuation_text (utf8.c:1036) ==25312== by 0x2EA52B: Perl_utf8n_to_uvchr_msgs (utf8.c:1935) ... tony@mars:.../git/perl$ gdb ./perl ... Reading symbols from lib/auto/re/re.so...done. 0x00000000002e9522 in Perl__byte_dump_string (start=0x5f82be0 "\377", len=13, format=false) at utf8.c:991 991 if (high_nibble < 10) { (gdb) p start $1 = (const U8 * const) 0x5f82be0 "\377" (gdb) bt #0 0x00000000002e9522 in Perl__byte_dump_string (start=0x5f82be0 "\377", len=13, format=false) at utf8.c:991 #1 0x00000000002e962a in S_unexpected_non_continuation_text ( s=0x5f82be0 "\377", print_len=13, non_cont_byte_pos=1, expect_len=13) at utf8.c:1036 #2 0x00000000002ea52c in Perl_utf8n_to_uvchr_msgs (s=0x5f82be1 "", curlen=1, retlen=0x0, flags=0, errors=0xffeffffb4, msgs=0x0) at utf8.c:1935 #3 0x00000000002e88ee in Perl__force_out_malformed_utf8_message ( p=0x5f82be0 "\377", e=0x5f82bed "", flags=0, die_here=true) at utf8.c:90 #4 0x00000000002eeb44 in Perl__to_utf8_fold_flags (p=0x5f82be0 "\377", e=0x5f82bed "", ustrp=0xfff00023a "", lenp=0xfff000120, flags=2 '\002', file=0x37041d "", line=0) at utf8.c:3856 #5 0x00000000002d2bdc in S_find_byclass (prog=0x61254c0, c=0x7059540, s=0x5f82be0 "\377", strend=0x5f82be1 "", reginfo=0xfff000390) at regexec.c:3003 #6 0x00000000002d5117 in Perl_regexec_flags (rx=0x5f835e0, stringarg=0x5f82be0 "\377", strend=0x5f82be1 "", strbeg=0x5f82be0 "\377", minend=0, sv=0x5f735d8, data=0x0, flags=97) at regexec.c:3766 ... (gdb) up 6 #6 0x00000000002d5117 in Perl_regexec_flags (rx=0x5f835e0, stringarg=0x5f82be0 "\377", strend=0x5f82be1 "", strbeg=0x5f82be0 "\377", minend=0, sv=0x5f735d8, data=0x0, flags=97) at regexec.c:3766 3766 if (find_byclass(prog, c, s, strend, reginfo)) (gdb) p foldbuf No symbol "foldbuf" in current context. (gdb) down #5 0x00000000002d2bdc in S_find_byclass (prog=0x61254c0, c=0x7059540, s=0x5f82be0 "\377", strend=0x5f82be1 "", reginfo=0xfff000390) at regexec.c:3003 3003 REXEC_TRIE_READ_CHAR(trie_type, trie, (gdb) p foldbuf $2 = "\000\000\000\000\000\000x\272\372\005\000\000\000" (gdb) p &foldbuf $3 = (U8 (*)[14]) 0xfff00023a (gdb) down #4 0x00000000002eeb44 in Perl__to_utf8_fold_flags (p=0x5f82be0 "\377", e=0x5f82bed "", ustrp=0xfff00023a "", lenp=0xfff000120, flags=2 '\002', file=0x37041d "", line=0) at utf8.c:3856 so the bytes being dumped come from the SV, not from foldbuf[] If I continue until we get the invalid access: Program received signal SIGTRAP, Trace/breakpoint trap. 0x00000000002e94c3 in Perl__byte_dump_string (start=0x5f82be0 "\377", len=13, format=false) at utf8.c:978 978 const unsigned high_nibble = (*s & 0xF0) >> 4; (gdb) p s $4 = (const U8 *) 0x5f82bea "" Still in the string we're matching. Show quoted text
> What is actually is happening is that the logic to choose what > strategy we use for reading data in the trie code does not account for > latin strings when under case folding locale dependent logic. The code > falls into the utf8 reading branch of the logic, which it should not > do.
I suspect code that uses the :utf8 layer to read from a file could hit this bug with this confusion[1], but I tend to think that such code is buggy in itself. I don't know if the regexp engine is intended to handle badly encoded SVf_UTF8 SVs. Show quoted text
> What is needed is the patch I am attaching here for blead. > > This affects v5.22 and later, so we need patches for 22, 24, and 26, > and one for blead.
Thanks for the version range, saves me having to work it out for the CVE. Tony [1] since :utf8 could return invalidly encoded SVf_UTF8 SVs.
RT-Send-CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 2.2k
On Fri, 02 Feb 2018 13:15:17 -0800, khw wrote: Show quoted text
> I agree with Yves' analysis. Mine was flawed about the cause of this > particular issue. I think I still think that when dealing with UTF-8, > it's best to use the real space available size, but I'd have to > examine it more closely in light of the new analysis. > > What Yves didn't mention is that this bug requires the following > things in order to manifest: > > 1) the pattern is compiled under /il > 2) the pattern does not contain any characters below 256 > 3) the target string is not UTF-8. > > Items 1) and 2) together would be extremely rare in the field, unless > the attacker gets to choose the pattern as well. > > No non-UTF-8 string may match this pattern, so if we could just short > circuit the rest of trying the match, it would work.
I can imagine an older application containing/managing non-Unicode data allowing a regular expression search that could trigger this issue. CVE form entry for once this is public: Vulnerability type: Buffer Overflow Vendor of the product: Perl5 Porters Product: perl Version: 5.22 through 5.26 Has vendor confirmed or acknowledged the vulnerability? Yes Attack vector: Context dependent Impact: Denial of service (it can crash perl) Information disclosure (it exposes the bytes beyond the end of the NUL) - the other choices for Impact are: Code Execution - I don't think is this possible Escalation of Privilege - nope Other - not that I can think of Affected Components: regexec.c Perl_regexec_flags() Attack vector Matching a locale dependent regular expression against a non-UTF-8 encoded scalar could result in a heap buffer read overflow while reporting an error message. That error message includes bytes beyond the end of the string, and possibly beyond the end of the buffer, providing a potential information disclosure if the memory had contained any sensitive information. Suggested description of the vulnerability for use in the CVE Matching a crafted locale dependent regular expression can cause a heap buffer read overflow and potentially information disclosure. Discoverer(s)/Credits Nguyen Duc Manh <imdb95@gmail.com> Reference(s) https://rt.perl.org/Public/Bug/Display.html?id=132063 Additional Information (blank)
CC: perl5-security-report [...] perl.org
To: perl5-security-report-followup [...] perl.org
Date: Mon, 5 Feb 2018 09:09:00 +0100
Subject: Re: [perl #132063] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
From: demerphq <demerphq [...] gmail.com>
Download (untitled) / with headers
text/plain 8.6k
On 5 February 2018 at 05:32, Tony Cook via RT <perl5-security-report-followup@perl.org> wrote: Show quoted text
> On Fri, 02 Feb 2018 12:41:41 -0800, demerphq wrote:
>> On 27 September 2017 at 06:28, Karl Williamson via RT >> <rt-comment@perl.org> wrote:
>> > I looked at this enough so I believe I understand the cause. >> > In several places in the trie handling code in regexec.c, it passes >> > UTF8_MAXLEN to utf8n_to_uvchr() instead of the actual space >> > available. >> > >> > I grepped blead for other possible problems with this, or its synonym >> > UTF8_MAXBYTES. Fortunately there aren't many. I haven't >> > investigated the attached results to see whether the others are >> > problematic or not. But these certainly are.
>> >> I have reviewed this ticket and I disagree with Karls analysis. >> >> U8 foldbuf[ UTF8_MAXBYTES_CASE + 1 ]; >> >> Anyway, when we use UTF8_MAXBYTES in >> >> uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags >> ); >> >> if you check you will find that uscan points into foldbuf[] mentioned >> above. The code in >> >> uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, >> foldbuf, &foldlen, \ >> >> flags); \ >> >> would have to construct a malformed utf8 sequence inside of foldbuf.
> > No, it points into the string we're matching:
Yes, because it is not going through the correct code path. It is trying to read a latin-1 string as utf8, and blowing up. If it went through the right code path it would not blow up. IOW, Karl analysed this being a problem in code that is not involved. This ticket is extremely misleading FWIW. The bug says its about Perl__byte_dump_string. It isnt. That is a red-herring. That comes from the utf8 error throwing code which is triggered by other logic failures further up the stack. If you look at the stack the problem is caused here Show quoted text
> #4 0x00000000002eeb44 in Perl__to_utf8_fold_flags (p=0x5f82be0 "\377", > e=0x5f82bed "", ustrp=0xfff00023a "", lenp=0xfff000120, flags=2 '\002', > file=0x37041d "", line=0) at utf8.c:3856
while we are in Show quoted text
> #5 0x00000000002d2bdc in S_find_byclass (prog=0x61254c0, c=0x7059540, > s=0x5f82be0 "\377", strend=0x5f82be1 "", reginfo=0xfff000390) > at regexec.c:3003
That Perl__to_utf8_fold_flags() is the same thing as uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, foldbuf, &foldlen, flags); And as you can see the *target* buffer is foldbuf. The *source* is the string, which we have misidentified as utf8, even though we know it is not as the utf8 flag is NOT set. We should *never* end up in the code to read utf8, however a bug in the logic for handling the EXACTLU8 variants of the TRIE code does not account for the fact that target string might not be unicode (in which case it cannot match), and mistakenly tries to read a _raw_ \xFF as though it was utf8. If you look at the patch I posted you can see how this was fixed. Show quoted text
> tony@mars:.../git/perl$ valgrind --vgdb-error=1 --vgdb=full -q ./perl -Ilib -e '$x="(?il)\x{100}|\x{100}"; "\xff" =~ /$x/;' > ==25312== > ==25312== TO DEBUG THIS PROCESS USING GDB: start GDB like this > ==25312== /path/to/gdb ./perl > ==25312== and then give GDB the following command > ==25312== target remote | /usr/lib/valgrind/../../bin/vgdb --pid=25312 > ==25312== --pid is optional if only one valgrind process is running > ==25312== > ==25312== Conditional jump or move depends on uninitialised value(s) > ==25312== at 0x2E9522: Perl__byte_dump_string (utf8.c:991) > ==25312== by 0x2E9629: S_unexpected_non_continuation_text (utf8.c:1036) > ==25312== by 0x2EA52B: Perl_utf8n_to_uvchr_msgs (utf8.c:1935) > ... > > > tony@mars:.../git/perl$ gdb ./perl > ... > Reading symbols from lib/auto/re/re.so...done. > 0x00000000002e9522 in Perl__byte_dump_string (start=0x5f82be0 "\377", len=13, > format=false) at utf8.c:991 > 991 if (high_nibble < 10) { > (gdb) p start > $1 = (const U8 * const) 0x5f82be0 "\377" > (gdb) bt > #0 0x00000000002e9522 in Perl__byte_dump_string (start=0x5f82be0 "\377", > len=13, format=false) at utf8.c:991 > #1 0x00000000002e962a in S_unexpected_non_continuation_text ( > s=0x5f82be0 "\377", print_len=13, non_cont_byte_pos=1, expect_len=13) > at utf8.c:1036 > #2 0x00000000002ea52c in Perl_utf8n_to_uvchr_msgs (s=0x5f82be1 "", curlen=1, > retlen=0x0, flags=0, errors=0xffeffffb4, msgs=0x0) at utf8.c:1935 > #3 0x00000000002e88ee in Perl__force_out_malformed_utf8_message ( > p=0x5f82be0 "\377", e=0x5f82bed "", flags=0, die_here=true) at utf8.c:90 > #4 0x00000000002eeb44 in Perl__to_utf8_fold_flags (p=0x5f82be0 "\377", > e=0x5f82bed "", ustrp=0xfff00023a "", lenp=0xfff000120, flags=2 '\002', > file=0x37041d "", line=0) at utf8.c:3856 > #5 0x00000000002d2bdc in S_find_byclass (prog=0x61254c0, c=0x7059540, > s=0x5f82be0 "\377", strend=0x5f82be1 "", reginfo=0xfff000390) > at regexec.c:3003 > #6 0x00000000002d5117 in Perl_regexec_flags (rx=0x5f835e0, > stringarg=0x5f82be0 "\377", strend=0x5f82be1 "", strbeg=0x5f82be0 "\377", > minend=0, sv=0x5f735d8, data=0x0, flags=97) at regexec.c:3766 > ... > (gdb) up 6 > #6 0x00000000002d5117 in Perl_regexec_flags (rx=0x5f835e0, > stringarg=0x5f82be0 "\377", strend=0x5f82be1 "", strbeg=0x5f82be0 "\377", > minend=0, sv=0x5f735d8, data=0x0, flags=97) at regexec.c:3766 > 3766 if (find_byclass(prog, c, s, strend, reginfo)) > (gdb) p foldbuf > No symbol "foldbuf" in current context. > (gdb) down > #5 0x00000000002d2bdc in S_find_byclass (prog=0x61254c0, c=0x7059540, > s=0x5f82be0 "\377", strend=0x5f82be1 "", reginfo=0xfff000390) > at regexec.c:3003 > 3003 REXEC_TRIE_READ_CHAR(trie_type, trie, > (gdb) p foldbuf > $2 = "\000\000\000\000\000\000x\272\372\005\000\000\000" > (gdb) p &foldbuf > $3 = (U8 (*)[14]) 0xfff00023a > (gdb) down > #4 0x00000000002eeb44 in Perl__to_utf8_fold_flags (p=0x5f82be0 "\377", > e=0x5f82bed "", ustrp=0xfff00023a "", lenp=0xfff000120, flags=2 '\002', > file=0x37041d "", line=0) at utf8.c:3856 > > so the bytes being dumped come from the SV, not from foldbuf[]
That wasnt the point. That function reads a character of utf8, and then writes the folded equivalent to a buffer, in this case foldbuf[]. The place were UTF8_MAXLEN is relevant is when we fold, we have to assume the input string is wellformed, or could end up validating it inside of a quadratic loop, which is unnacceptable in terms of performance. However in this case it is intentionally (albeit incorrectly) being fed latin1. Show quoted text
> > If I continue until we get the invalid access: > > Program received signal SIGTRAP, Trace/breakpoint trap. > 0x00000000002e94c3 in Perl__byte_dump_string (start=0x5f82be0 "\377", len=13, > format=false) at utf8.c:978 > 978 const unsigned high_nibble = (*s & 0xF0) >> 4; > (gdb) p s > $4 = (const U8 *) 0x5f82bea "" > > Still in the string we're matching.
And still not relevant. If you construct a sting with a single 0xFF byte in it, and then feed it to ANY of our utf8 handling code it will blow up. The thing is you shouldnt be able to construct a string with a single 0xFF byte in it and have it be utf8-on. Show quoted text
>> What is actually is happening is that the logic to choose what >> strategy we use for reading data in the trie code does not account for >> latin strings when under case folding locale dependent logic. The code >> falls into the utf8 reading branch of the logic, which it should not >> do.
> > I suspect code that uses the :utf8 layer to read from a file could hit this bug with this confusion[1], but I tend to think that such code is buggy in itself.
Well, I would need to a see a proper case. This ticket isn't such. Show quoted text
> I don't know if the regexp engine is intended to handle badly encoded SVf_UTF8 SVs.
No, it is not, and it shouldn't be. If you load malformed utf8 without validating it you get to keep the results. The regex engine might read the same utf8 character millions of times during matching, we cannot defend against deliberate malformed input without paying a disproportionate costs. I adamantly refuse to accept that the regex engine should guard against malformed input in any way other than doing a prescan over the target string. No such validation should occur internally or it could end up being repeated over and over and over and over and over and over. Show quoted text
>> What is needed is the patch I am attaching here for blead. >> >> This affects v5.22 and later, so we need patches for 22, 24, and 26, >> and one for blead.
> > Thanks for the version range, saves me having to work it out for the CVE. > > Tony > > [1] since :utf8 could return invalidly encoded SVf_UTF8 SVs.
Then that is a bug in the :utf8 layer, not in the regex engine, and should be addressed there. Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
From: Karl Williamson <public [...] khwilliamson.com>
CC: perl5-security-report [...] perl.org
To: demerphq <demerphq [...] gmail.com>, perl5-security-report-followup [...] perl.org
Date: Mon, 5 Feb 2018 11:04:16 -0700
Subject: Re: [perl #132063] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
Download (untitled) / with headers
text/plain 9.8k
On 02/05/2018 01:09 AM, demerphq wrote: Show quoted text
> On 5 February 2018 at 05:32, Tony Cook via RT > <perl5-security-report-followup@perl.org> wrote:
>> On Fri, 02 Feb 2018 12:41:41 -0800, demerphq wrote:
>>> On 27 September 2017 at 06:28, Karl Williamson via RT >>> <rt-comment@perl.org> wrote:
>>>> I looked at this enough so I believe I understand the cause. >>>> In several places in the trie handling code in regexec.c, it passes >>>> UTF8_MAXLEN to utf8n_to_uvchr() instead of the actual space >>>> available. >>>> >>>> I grepped blead for other possible problems with this, or its synonym >>>> UTF8_MAXBYTES. Fortunately there aren't many. I haven't >>>> investigated the attached results to see whether the others are >>>> problematic or not. But these certainly are.
>>> >>> I have reviewed this ticket and I disagree with Karls analysis. >>> >>> U8 foldbuf[ UTF8_MAXBYTES_CASE + 1 ]; >>> >>> Anyway, when we use UTF8_MAXBYTES in >>> >>> uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags >>> ); >>> >>> if you check you will find that uscan points into foldbuf[] mentioned >>> above. The code in >>> >>> uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, >>> foldbuf, &foldlen, \ >>> >>> flags); \ >>> >>> would have to construct a malformed utf8 sequence inside of foldbuf.
>> >> No, it points into the string we're matching:
> > Yes, because it is not going through the correct code path. It is > trying to read a latin-1 string as utf8, and blowing up. > > If it went through the right code path it would not blow up. IOW, Karl > analysed this being a problem in code that is not involved. > > This ticket is extremely misleading FWIW. > > The bug says its about Perl__byte_dump_string. It isnt. That is a > red-herring. That comes from the utf8 error throwing code which is > triggered by other logic failures further up the stack. > > If you look at the stack the problem is caused here >
>> #4 0x00000000002eeb44 in Perl__to_utf8_fold_flags (p=0x5f82be0 "\377", >> e=0x5f82bed "", ustrp=0xfff00023a "", lenp=0xfff000120, flags=2 '\002', >> file=0x37041d "", line=0) at utf8.c:3856
> > while we are in >
>> #5 0x00000000002d2bdc in S_find_byclass (prog=0x61254c0, c=0x7059540, >> s=0x5f82be0 "\377", strend=0x5f82be1 "", reginfo=0xfff000390) >> at regexec.c:3003
> > That Perl__to_utf8_fold_flags() is the same thing as > > > uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, foldbuf, &foldlen, flags); > > And as you can see the *target* buffer is foldbuf. The *source* is the > string, which we have misidentified as utf8, even though we know it is > not as the utf8 flag is NOT set. > > We should *never* end up in the code to read utf8, however a bug in > the logic for handling the EXACTLU8 variants of the TRIE code does not > account for the fact that target string might not be unicode (in which > case it cannot match), and mistakenly tries to read a _raw_ \xFF as > though it was utf8. > > If you look at the patch I posted you can see how this was fixed. > >
>> tony@mars:.../git/perl$ valgrind --vgdb-error=1 --vgdb=full -q ./perl -Ilib -e '$x="(?il)\x{100}|\x{100}"; "\xff" =~ /$x/;' >> ==25312== >> ==25312== TO DEBUG THIS PROCESS USING GDB: start GDB like this >> ==25312== /path/to/gdb ./perl >> ==25312== and then give GDB the following command >> ==25312== target remote | /usr/lib/valgrind/../../bin/vgdb --pid=25312 >> ==25312== --pid is optional if only one valgrind process is running >> ==25312== >> ==25312== Conditional jump or move depends on uninitialised value(s) >> ==25312== at 0x2E9522: Perl__byte_dump_string (utf8.c:991) >> ==25312== by 0x2E9629: S_unexpected_non_continuation_text (utf8.c:1036) >> ==25312== by 0x2EA52B: Perl_utf8n_to_uvchr_msgs (utf8.c:1935) >> ... >> >> >> tony@mars:.../git/perl$ gdb ./perl >> ... >> Reading symbols from lib/auto/re/re.so...done. >> 0x00000000002e9522 in Perl__byte_dump_string (start=0x5f82be0 "\377", len=13, >> format=false) at utf8.c:991 >> 991 if (high_nibble < 10) { >> (gdb) p start >> $1 = (const U8 * const) 0x5f82be0 "\377" >> (gdb) bt >> #0 0x00000000002e9522 in Perl__byte_dump_string (start=0x5f82be0 "\377", >> len=13, format=false) at utf8.c:991 >> #1 0x00000000002e962a in S_unexpected_non_continuation_text ( >> s=0x5f82be0 "\377", print_len=13, non_cont_byte_pos=1, expect_len=13) >> at utf8.c:1036 >> #2 0x00000000002ea52c in Perl_utf8n_to_uvchr_msgs (s=0x5f82be1 "", curlen=1, >> retlen=0x0, flags=0, errors=0xffeffffb4, msgs=0x0) at utf8.c:1935 >> #3 0x00000000002e88ee in Perl__force_out_malformed_utf8_message ( >> p=0x5f82be0 "\377", e=0x5f82bed "", flags=0, die_here=true) at utf8.c:90 >> #4 0x00000000002eeb44 in Perl__to_utf8_fold_flags (p=0x5f82be0 "\377", >> e=0x5f82bed "", ustrp=0xfff00023a "", lenp=0xfff000120, flags=2 '\002', >> file=0x37041d "", line=0) at utf8.c:3856 >> #5 0x00000000002d2bdc in S_find_byclass (prog=0x61254c0, c=0x7059540, >> s=0x5f82be0 "\377", strend=0x5f82be1 "", reginfo=0xfff000390) >> at regexec.c:3003 >> #6 0x00000000002d5117 in Perl_regexec_flags (rx=0x5f835e0, >> stringarg=0x5f82be0 "\377", strend=0x5f82be1 "", strbeg=0x5f82be0 "\377", >> minend=0, sv=0x5f735d8, data=0x0, flags=97) at regexec.c:3766 >> ... >> (gdb) up 6 >> #6 0x00000000002d5117 in Perl_regexec_flags (rx=0x5f835e0, >> stringarg=0x5f82be0 "\377", strend=0x5f82be1 "", strbeg=0x5f82be0 "\377", >> minend=0, sv=0x5f735d8, data=0x0, flags=97) at regexec.c:3766 >> 3766 if (find_byclass(prog, c, s, strend, reginfo)) >> (gdb) p foldbuf >> No symbol "foldbuf" in current context. >> (gdb) down >> #5 0x00000000002d2bdc in S_find_byclass (prog=0x61254c0, c=0x7059540, >> s=0x5f82be0 "\377", strend=0x5f82be1 "", reginfo=0xfff000390) >> at regexec.c:3003 >> 3003 REXEC_TRIE_READ_CHAR(trie_type, trie, >> (gdb) p foldbuf >> $2 = "\000\000\000\000\000\000x\272\372\005\000\000\000" >> (gdb) p &foldbuf >> $3 = (U8 (*)[14]) 0xfff00023a >> (gdb) down >> #4 0x00000000002eeb44 in Perl__to_utf8_fold_flags (p=0x5f82be0 "\377", >> e=0x5f82bed "", ustrp=0xfff00023a "", lenp=0xfff000120, flags=2 '\002', >> file=0x37041d "", line=0) at utf8.c:3856 >> >> so the bytes being dumped come from the SV, not from foldbuf[]
> > That wasnt the point. That function reads a character of utf8, and > then writes the folded equivalent to a buffer, in this case foldbuf[]. > The place were UTF8_MAXLEN is relevant is when we fold, we have to > assume the input string is wellformed, or could end up validating it > inside of a quadratic loop, which is unnacceptable in terms of > performance. > > However in this case it is intentionally (albeit incorrectly) being fed latin1. >
>> >> If I continue until we get the invalid access: >> >> Program received signal SIGTRAP, Trace/breakpoint trap. >> 0x00000000002e94c3 in Perl__byte_dump_string (start=0x5f82be0 "\377", len=13, >> format=false) at utf8.c:978 >> 978 const unsigned high_nibble = (*s & 0xF0) >> 4; >> (gdb) p s >> $4 = (const U8 *) 0x5f82bea "" >> >> Still in the string we're matching.
> > And still not relevant. If you construct a sting with a single 0xFF > byte in it, and then feed it to ANY of our utf8 handling code it will > blow up. > > The thing is you shouldnt be able to construct a string with a single > 0xFF byte in it and have it be utf8-on. >
>>> What is actually is happening is that the logic to choose what >>> strategy we use for reading data in the trie code does not account for >>> latin strings when under case folding locale dependent logic. The code >>> falls into the utf8 reading branch of the logic, which it should not >>> do.
>> >> I suspect code that uses the :utf8 layer to read from a file could hit this bug with this confusion[1], but I tend to think that such code is buggy in itself.
> > Well, I would need to a see a proper case. This ticket isn't such. >
>> I don't know if the regexp engine is intended to handle badly encoded SVf_UTF8 SVs.
> > No, it is not, and it shouldn't be. If you load malformed utf8 without > validating it you get to keep the results. > > The regex engine might read the same utf8 character millions of times > during matching, we cannot defend against deliberate malformed input > without paying a disproportionate costs. I adamantly refuse to accept > that the regex engine should guard against malformed input in any way > other than doing a prescan over the target string. No such validation > should occur internally or it could end up being repeated over and > over and over and over and over and over.
Yves is correct that the trie code is wrong (and I made it wrong) and if it were correct, this bug would not have happened. The problem is that it is incorrectly branching to a portion of the code that expects it is handling UTF-8. And the input isn't UTF-8, and the code that assumes it is UTF-8 should not have been branched to. But I also think we should look that any UTF-8 we handle isn't too long for the space available. What I diagnosed was code that passed in a made-up number as to the available length. We shouldn't be making up numbers. The actual length is known, and we should use that. The made-up number doesn't save any time, as the translator uses that value; it might as well use the correct one. So this bug wouldn't have manifested as being a security issue if we had used the correct length, instead of a made-up one. I want to emphasize that no extra time is required here to process using the correct length as opposed to a made-up one. Show quoted text
>
>>> What is needed is the patch I am attaching here for blead. >>> >>> This affects v5.22 and later, so we need patches for 22, 24, and 26, >>> and one for blead.
>> >> Thanks for the version range, saves me having to work it out for the CVE. >> >> Tony >> >> [1] since :utf8 could return invalidly encoded SVf_UTF8 SVs.
> > Then that is a bug in the :utf8 layer, not in the regex engine, and > should be addressed there. > > Yves >
Subject: Re: [perl #132063] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
Date: Mon, 5 Feb 2018 21:38:02 +0100
To: Karl Williamson <public [...] khwilliamson.com>
CC: perl5-security-report-followup [...] perl.org, perl5-security-report [...] perl.org
From: demerphq <demerphq [...] gmail.com>
On 5 February 2018 at 19:04, Karl Williamson <public@khwilliamson.com> wrote: Show quoted text
> On 02/05/2018 01:09 AM, demerphq wrote:
>> >> On 5 February 2018 at 05:32, Tony Cook via RT >> <perl5-security-report-followup@perl.org> wrote:
>>> >>> On Fri, 02 Feb 2018 12:41:41 -0800, demerphq wrote:
Show quoted text
>> The regex engine might read the same utf8 character millions of times >> during matching, we cannot defend against deliberate malformed input >> without paying a disproportionate costs. I adamantly refuse to accept >> that the regex engine should guard against malformed input in any way >> other than doing a prescan over the target string. No such validation >> should occur internally or it could end up being repeated over and >> over and over and over and over and over.
> > > Yves is correct that the trie code is wrong (and I made it wrong) and if it > were correct, this bug would not have happened. The problem is that it is > incorrectly branching to a portion of the code that expects it is handling > UTF-8. And the input isn't UTF-8, and the code that assumes it is UTF-8 > should not have been branched to. > > But I also think we should look that any UTF-8 we handle isn't too long for > the space available. What I diagnosed was code that passed in a made-up > number as to the available length.
Well that really isnt clear to me at all. I assume you mean the patch from the 22nd. The changes with utf8_to_uvchr_buf_flags() I am not fond of. The UTF8_MAXLEN should just be replaced with "foldlen" in pretty much every place it is used in the TRIE code. Once you do that replacement, there is only one other use of utf8_to_uvchr_buf_flags which I would prefer see just gets fixed explicitly. What is left after that looks fine to me. Show quoted text
> We shouldn't be making up numbers. The > actual length is known, and we should use that. The made-up number doesn't > save any time, as the translator uses that value; it might as well use the > correct one.
Sure, but we dont need to do the things that patch that does. foldlen tells us how much is left in the buffer. Show quoted text
> So this bug wouldn't have manifested as being a security issue if we had > used the correct length, instead of a made-up one.
Actually I beg to differ. The *security* issue comes from the utf8 code, and the logic in Perl__byte_dump_string(). The security issue doesn't come from the trie code at all, nor does it come from any call utf8_to_uvchr_buf_flags(), but rather from the call to _toFOLD_utf8_flags() and then from the internal error throwing logic in the utf8 code. If we see a bad utf8 sequence we should not dump any of its bytes as it is a security violation. Also notice that the problem in that code is that it trusts UTFSKIP(), not anything to do with "a made up value", like UTF8_MAXLEN. If you look at the stack trace: READ of size 1 at 0x602000001b3a thread T0 #0 0xc66a60 in Perl__byte_dump_string /home/manh/Fuzzing/afl/perl/perl/utf8.c:977:39 #1 0xc66a60 in S_unexpected_non_continuation_text /home/manh/Fuzzing/afl/perl/perl/utf8.c:1036 #2 0xc66a60 in Perl_utf8n_to_uvchr_error /home/manh/Fuzzing/afl/perl/perl/utf8.c:1707 #3 0xc5d678 in Perl__force_out_malformed_utf8_message /home/manh/Fuzzing/afl/perl/perl/utf8.c:90:12 #4 0xc726ae in Perl__to_utf8_fold_flags The security issues comes from S_unexpected_non_continuation_text() or Perl_byte_dump_string() leaking data from obviously corrupt input. To fix the security issue we should fix them. On the other hand the bug that leads to the security issue is that we treat latin as utf8, and/or use UTF8-skip and not actual buffer length in the call to _to_utf8_fold_flags(). But it doesnt have anything to do with reading foldbuf, which is where we use UTF8_MAXLEN. Anyway, yes we can and should change the UTF8_MAXLEN, and stop using UTF8SKIP like this, but we should use foldlen, and do the other fixes explicitly. Show quoted text
> I want to emphasize that no extra time is required here to process using the > correct length as opposed to a made-up one.
Yeah I get it. Lets redo that patch reflecting the responses above and then I see no problem with it. Yves
RT-Send-CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 4.7k
On Mon, 05 Feb 2018 00:09:08 -0800, demerphq wrote: Show quoted text
> On 5 February 2018 at 05:32, Tony Cook via RT > <perl5-security-report-followup@perl.org> wrote:
> > On Fri, 02 Feb 2018 12:41:41 -0800, demerphq wrote:
> >> On 27 September 2017 at 06:28, Karl Williamson via RT > >> <rt-comment@perl.org> wrote:
> >> > I looked at this enough so I believe I understand the cause. > >> > In several places in the trie handling code in regexec.c, it > >> > passes > >> > UTF8_MAXLEN to utf8n_to_uvchr() instead of the actual space > >> > available. > >> > > >> > I grepped blead for other possible problems with this, or its > >> > synonym > >> > UTF8_MAXBYTES. Fortunately there aren't many. I haven't > >> > investigated the attached results to see whether the others are > >> > problematic or not. But these certainly are.
> >> > >> I have reviewed this ticket and I disagree with Karls analysis. > >> > >> U8 foldbuf[ UTF8_MAXBYTES_CASE + 1 ]; > >> > >> Anyway, when we use UTF8_MAXBYTES in > >> > >> uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags > >> ); > >> > >> if you check you will find that uscan points into foldbuf[] > >> mentioned > >> above. The code in > >> > >> uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, > >> foldbuf, &foldlen, \ > >> > >> flags); \ > >> > >> would have to construct a malformed utf8 sequence inside of foldbuf.
> > > > No, it points into the string we're matching:
> > Yes, because it is not going through the correct code path. It is > trying to read a latin-1 string as utf8, and blowing up. > > If it went through the right code path it would not blow up. IOW, Karl > analysed this being a problem in code that is not involved.
It's true the bug isn't in Perl__byte_dump_string(), but that is where an attacker takes advantage of the bug. Show quoted text
> > This ticket is extremely misleading FWIW. > > The bug says its about Perl__byte_dump_string. It isnt. That is a > red-herring. That comes from the utf8 error throwing code which is > triggered by other logic failures further up the stack.
The title simply says there's a heap buffer overflow in Perl__byte_dump_string(), which is the case, it just happens because of a bug elsewhere. Show quoted text
> > If you look at the stack the problem is caused here >
> > #4 0x00000000002eeb44 in Perl__to_utf8_fold_flags (p=0x5f82be0 > > "\377", > > e=0x5f82bed "", ustrp=0xfff00023a "", lenp=0xfff000120, flags=2 > > '\002', > > file=0x37041d "", line=0) at utf8.c:3856
> > while we are in >
> > #5 0x00000000002d2bdc in S_find_byclass (prog=0x61254c0, > > c=0x7059540, > > s=0x5f82be0 "\377", strend=0x5f82be1 "", reginfo=0xfff000390) > > at regexec.c:3003
> > That Perl__to_utf8_fold_flags() is the same thing as > > > uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, foldbuf, &foldlen, > flags); > > And as you can see the *target* buffer is foldbuf. The *source* is the > string, which we have misidentified as utf8, even though we know it is > not as the utf8 flag is NOT set. > > We should *never* end up in the code to read utf8, however a bug in > the logic for handling the EXACTLU8 variants of the TRIE code does not > account for the fact that target string might not be unicode (in which > case it cannot match), and mistakenly tries to read a _raw_ \xFF as > though it was utf8. > > If you look at the patch I posted you can see how this was fixed.
Sorry, I misunderstood the point you were trying to make. Show quoted text
> > I don't know if the regexp engine is intended to handle badly encoded > > SVf_UTF8 SVs.
> > No, it is not, and it shouldn't be. If you load malformed utf8 without > validating it you get to keep the results. > > The regex engine might read the same utf8 character millions of times > during matching, we cannot defend against deliberate malformed input > without paying a disproportionate costs. I adamantly refuse to accept > that the regex engine should guard against malformed input in any way > other than doing a prescan over the target string. No such validation > should occur internally or it could end up being repeated over and > over and over and over and over and over.
The regular expression engine does have code supporting badly encoded inputs - all the calls to utf8n_to_uvchr() and its variants could be replaced with calls to a much simpler function with less error checking/handling if it wasn't trying to handle such badly encoded data (by croaking instead of crashing). I think your change is needed to fix both the behaviour bug (treating the Latin-1 data as UTF-8) which results in a mistaken call that results in a heap buffer read overflow. I think Karl's change is useful to improve robustness - it's not a direct fix the behaviour bug - which would have resulted in an error with only Karl's change, rather than a heap buffer read overflow. Tony
RT-Send-CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 137b
On Sun, 04 Feb 2018 20:40:45 -0800, tonyc wrote: Show quoted text
> CVE form entry for once this is public:
I've requested a CVE id for this issue. Tony
Subject: Re: [perl #132063] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
Date: Tue, 6 Feb 2018 14:36:40 +0200
To: perl5-security-report-followup [...] perl.org
CC: Perl 5 Security Report <perl5-security-report [...] perl.org>
From: Sawyer X <xsawyerx [...] gmail.com>
Download (untitled) / with headers
text/plain 362b
Please let me know when we decide on a patch (or patches, depending on version). I will notify vendors once I have it. On 6 February 2018 at 07:47, Tony Cook via RT <perl5-security-report-followup@perl.org> wrote: Show quoted text
> On Sun, 04 Feb 2018 20:40:45 -0800, tonyc wrote:
>> CVE form entry for once this is public:
> > I've requested a CVE id for this issue. > > Tony
From: Karl Williamson <public [...] khwilliamson.com>
Date: Tue, 6 Feb 2018 16:01:37 -0700
Subject: Re: [perl #132063] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
To: demerphq <demerphq [...] gmail.com>
CC: perl5-security-report-followup [...] perl.org, perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 6.6k
On 02/05/2018 01:38 PM, demerphq wrote: Show quoted text
> On 5 February 2018 at 19:04, Karl Williamson <public@khwilliamson.com> wrote:
>> On 02/05/2018 01:09 AM, demerphq wrote:
>>> >>> On 5 February 2018 at 05:32, Tony Cook via RT >>> <perl5-security-report-followup@perl.org> wrote:
>>>> >>>> On Fri, 02 Feb 2018 12:41:41 -0800, demerphq wrote:
>
>>> The regex engine might read the same utf8 character millions of times >>> during matching, we cannot defend against deliberate malformed input >>> without paying a disproportionate costs. I adamantly refuse to accept >>> that the regex engine should guard against malformed input in any way >>> other than doing a prescan over the target string. No such validation >>> should occur internally or it could end up being repeated over and >>> over and over and over and over and over.
>> >> >> Yves is correct that the trie code is wrong (and I made it wrong) and if it >> were correct, this bug would not have happened. The problem is that it is >> incorrectly branching to a portion of the code that expects it is handling >> UTF-8. And the input isn't UTF-8, and the code that assumes it is UTF-8 >> should not have been branched to. >> >> But I also think we should look that any UTF-8 we handle isn't too long for >> the space available. What I diagnosed was code that passed in a made-up >> number as to the available length.
> > Well that really isnt clear to me at all. I assume you mean the patch > from the 22nd. > > The changes with utf8_to_uvchr_buf_flags() I am not fond of. The > UTF8_MAXLEN should just be replaced with "foldlen" in pretty much > every place it is used in the TRIE code. Once you do that replacement, > there is only one other use of utf8_to_uvchr_buf_flags which I would > prefer see just gets fixed explicitly. > > What is left after that looks fine to me. >
>> We shouldn't be making up numbers. The >> actual length is known, and we should use that. The made-up number doesn't >> save any time, as the translator uses that value; it might as well use the >> correct one.
> > Sure, but we dont need to do the things that patch that does. foldlen > tells us how much is left in the buffer. >
>> So this bug wouldn't have manifested as being a security issue if we had >> used the correct length, instead of a made-up one.
> > Actually I beg to differ. The *security* issue comes from the utf8 > code, and the logic in Perl__byte_dump_string(). > > The security issue doesn't come from the trie code at all, nor does it > come from any call utf8_to_uvchr_buf_flags(), but rather from the call > to > > _toFOLD_utf8_flags() > > and then from the internal error throwing logic in the utf8 code. If > we see a bad utf8 sequence we should not dump any of its bytes as it > is a security violation. Also notice that the problem in that code is > that it trusts UTFSKIP(), not anything to do with "a made up value", > like UTF8_MAXLEN. > > If you look at the stack trace: > > READ of size 1 at 0x602000001b3a thread T0 > #0 0xc66a60 in Perl__byte_dump_string > /home/manh/Fuzzing/afl/perl/perl/utf8.c:977:39 > #1 0xc66a60 in S_unexpected_non_continuation_text > /home/manh/Fuzzing/afl/perl/perl/utf8.c:1036 > #2 0xc66a60 in Perl_utf8n_to_uvchr_error > /home/manh/Fuzzing/afl/perl/perl/utf8.c:1707 > #3 0xc5d678 in Perl__force_out_malformed_utf8_message > /home/manh/Fuzzing/afl/perl/perl/utf8.c:90:12 > #4 0xc726ae in Perl__to_utf8_fold_flags > > The security issues comes from S_unexpected_non_continuation_text() or > Perl_byte_dump_string() leaking data from obviously corrupt input. To > fix the security issue we should fix them. > > On the other hand the bug that leads to the security issue is that we > treat latin as utf8, and/or use UTF8-skip and not actual buffer length > in the call to _to_utf8_fold_flags(). But it doesnt have anything to > do with reading foldbuf, which is where we use UTF8_MAXLEN. > > Anyway, yes we can and should change the UTF8_MAXLEN, and stop using > UTF8SKIP like this, but we should use foldlen, and do the other fixes > explicitly. >
>> I want to emphasize that no extra time is required here to process using the >> correct length as opposed to a made-up one.
> > Yeah I get it. Lets redo that patch reflecting the responses above and > then I see no problem with it. > > Yves >
The point of dumping out the data is to save a lot of debugging time. It's not clear if you are arguing that when we find a UTF-8 malformation that we shouldn't give any data to help the recipient discern what the problem is, out of fear of running off the end of the buffer. It occurs to me as I write, that the dump function could assume that a NUL it encounters is the terminating NUL of the string, and to stop dumping there. NULs can occur in Perl strings, but not in these contexts. That would have prevented this issue. I could change blead to do this, marked as a defensive coding measure, and not give an attacker any real information as to this vector. The function already has a flag to indicate that this particular call has doubt about if we're too close to the buffer end to safely display the whole malformation. It then dumps just to the first error byte (which we've already had to read in order to determine that it's malformed) But it is also good housekeeping to not assume that there is sufficient space, hence my original patch. In working with UTF-8 over the years, I have also come to believe that it's easier and safer to give the upper limit pointer in calling these functions, rather than a length, which varies as one moves along the input string. which is what I did in my original patch. But attached is a revision based on Yves' objections. I do think it's better to pass foldbuf + sizeof(foldbuf) than foldlen, because that avoids any problems with the calculated foldlen getting too big. It's just safer. And this is the final place in blead that still just just passes this MAX value without checking, though there are some still that pass UTF8SKIP as the length. So now we have three things This security bug becomes a non-security one if I just stop dumping a malformation when it finds a NUL Or, this security bug becomes non-security if the calls to the utf8 translator are changed to not assume that there is space available Or, this security bug becomes non-security if the trie handler is fixed. I now think the first one is what we should do for the security fix. It also would prevent other similar problems that we haven't discovered yet. I haven't looked at backporting. The dumping has been revised in recent releases, so it may be that the trie bug isn't a security issue except recently. The attached patch is testing for something that won't happen if either of the other two changes are done.

Message body is not shown because sender requested not to inline it.

RT-Send-CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 217b
On Mon, 05 Feb 2018 21:47:34 -0800, tonyc wrote: Show quoted text
> On Sun, 04 Feb 2018 20:40:45 -0800, tonyc wrote:
> > CVE form entry for once this is public:
> > I've requested a CVE id for this issue.
This is CVE-2018-6798 Tony
Subject: Re: [perl #132063] [CVE-2018-6798] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
Date: Tue, 13 Feb 2018 07:45:51 -0700
From: Karl Williamson <public [...] khwilliamson.com>
To: perl5-security-report-followup [...] perl.org
CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 591b
On 02/07/2018 03:23 PM, Tony Cook via RT wrote: Show quoted text
> On Mon, 05 Feb 2018 21:47:34 -0800, tonyc wrote:
>> On Sun, 04 Feb 2018 20:40:45 -0800, tonyc wrote:
>>> CVE form entry for once this is public:
>> >> I've requested a CVE id for this issue.
> > This is CVE-2018-6798 > > Tony >
Attached is my utf8.c patch that solves this at the lowest level. I wrote the commit message in such a way as to not indicate we're solving a particular bug, and its far enough away from the scene of the crime that we might be able to sneak it into blead earlier than CVE disclosure date. I don't know.

Message body is not shown because sender requested not to inline it.

CC: perl5-security-report [...] perl.org
To: perl5-security-report-followup [...] perl.org
From: Karl Williamson <public [...] khwilliamson.com>
Date: Wed, 14 Feb 2018 20:19:34 -0700
Subject: Re: [perl #132063] [CVE-2018-6798] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
Download (untitled) / with headers
text/plain 344b
On 02/07/2018 03:23 PM, Tony Cook via RT wrote: Show quoted text
> On Mon, 05 Feb 2018 21:47:34 -0800, tonyc wrote:
>> On Sun, 04 Feb 2018 20:40:45 -0800, tonyc wrote:
>>> CVE form entry for once this is public:
>> >> I've requested a CVE id for this issue.
> > This is CVE-2018-6798 > > Tony >
Attached a patch for 5.24, since the other one doesn't apply.

Message body is not shown because sender requested not to inline it.

RT-Send-CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 427b
On Wed, 07 Feb 2018 14:23:08 -0800, tonyc wrote: Show quoted text
> On Mon, 05 Feb 2018 21:47:34 -0800, tonyc wrote:
> > On Sun, 04 Feb 2018 20:40:45 -0800, tonyc wrote:
> > > CVE form entry for once this is public:
> > > > I've requested a CVE id for this issue.
> > This is CVE-2018-6798
Patches for 5.24 for this ticket, all merged and de-conflicted. These apply after the changes in 131844 (though there should be no conflicts there.)
Subject: 5.24-0002-perl-132063-Heap-buffer-overflow.patch
From 29231d73407542051a287cab5e18546e5a622f4a Mon Sep 17 00:00:00 2001 From: Karl Williamson <khw@cpan.org> Date: Tue, 6 Feb 2018 14:50:48 -0700 Subject: [perl #132063]: Heap buffer overflow The proximal cause is several instances in regexec.c of the code assuming that the input was valid UTF-8, whereas the input was too short for what the start byte claimed it would be. I grepped through the core for any other similar uses, and did not find any. --- regexec.c | 29 ++++++++++++++++------------- t/lib/warnings/regexec | 7 +++++++ 2 files changed, 23 insertions(+), 13 deletions(-) diff --git a/regexec.c b/regexec.c index 5735b997fd..ea432c39d3 100644 --- a/regexec.c +++ b/regexec.c @@ -1466,7 +1466,9 @@ Perl_re_intuit_start(pTHX_ ? trie_utf8_fold \ : trie_latin_utf8_fold))) -#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \ +/* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is + * 'foldbuf+sizeof(foldbuf)' */ +#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uc_end, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \ STMT_START { \ STRLEN skiplen; \ U8 flags = FOLD_FLAGS_FULL; \ @@ -1474,7 +1476,7 @@ STMT_START { case trie_flu8: \ _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \ if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \ - _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc)); \ + _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end - uc); \ } \ goto do_trie_utf8_fold; \ case trie_utf8_exactfa_fold: \ @@ -1483,7 +1485,7 @@ STMT_START { case trie_utf8_fold: \ do_trie_utf8_fold: \ if ( foldlen>0 ) { \ - uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \ + uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \ foldlen -= len; \ uscan += len; \ len=0; \ @@ -1500,7 +1502,7 @@ STMT_START { /* FALLTHROUGH */ \ case trie_latin_utf8_fold: \ if ( foldlen>0 ) { \ - uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \ + uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \ foldlen -= len; \ uscan += len; \ len=0; \ @@ -1519,7 +1521,7 @@ STMT_START { } \ /* FALLTHROUGH */ \ case trie_utf8: \ - uvc = utf8n_to_uvchr( (const U8*) uc, UTF8_MAXLEN, &len, uniflags ); \ + uvc = utf8n_to_uvchr( (const U8*) uc, uc_end - uc, &len, uniflags ); \ break; \ case trie_plain: \ uvc = (UV)*uc; \ @@ -2599,10 +2601,10 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, } points[pointpos++ % maxlen]= uc; if (foldlen || uc < (U8*)strend) { - REXEC_TRIE_READ_CHAR(trie_type, trie, - widecharmap, uc, - uscan, len, uvc, charid, foldlen, - foldbuf, uniflags); + REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, + (U8 *) strend, uscan, len, uvc, + charid, foldlen, foldbuf, + uniflags); DEBUG_TRIE_EXECUTE_r({ dump_exec_pos( (char *)uc, c, strend, real_start, s, utf8_target, 0); @@ -5511,8 +5513,9 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) if ( base && (foldlen || uc < (U8*)(reginfo->strend))) { I32 offset; REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, - uscan, len, uvc, charid, foldlen, - foldbuf, uniflags); + (U8 *) reginfo->strend, uscan, + len, uvc, charid, foldlen, + foldbuf, uniflags); charcount++; if (foldlen>0) ST.longfold = TRUE; @@ -5642,8 +5645,8 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) while (foldlen) { if (!--chars) break; - uvc = utf8n_to_uvchr(uscan, UTF8_MAXLEN, &len, - uniflags); + uvc = utf8n_to_uvchr(uscan, foldlen, &len, + uniflags); uscan += len; foldlen -= len; } diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec index 900dd6ee7f..6635142dea 100644 --- a/t/lib/warnings/regexec +++ b/t/lib/warnings/regexec @@ -260,3 +260,10 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale); "k" =~ /(?[ \N{KELVIN SIGN} ])/i; ":" =~ /(?[ \: ])/; EXPECT +######## +# NAME perl #132063, read beyond buffer end +# OPTION fatal +"\xff" =~ /(?il)\x{100}|\x{100}/; +EXPECT +Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2. +Malformed UTF-8 character (fatal) at - line 2. -- 2.11.0
Subject: 5.24-0003-v5.24.3-fix-TRIE_READ_CHAR-and-DECL_TRIE_TYPE-to-acc.patch
From a59dc1f157bd0f626b6b864b9996480f9bac44aa Mon Sep 17 00:00:00 2001 From: Yves Orton <demerphq@gmail.com> Date: Mon, 19 Feb 2018 13:49:46 +1100 Subject: v5.24.3: fix TRIE_READ_CHAR and DECL_TRIE_TYPE to account for non-utf8 target --- regexec.c | 14 ++++++++++---- t/re/re_tests | 1 + 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/regexec.c b/regexec.c index ea432c39d3..ff8e89cb65 100644 --- a/regexec.c +++ b/regexec.c @@ -1451,7 +1451,7 @@ Perl_re_intuit_start(pTHX_ #define DECL_TRIE_TYPE(scan) \ const enum { trie_plain, trie_utf8, trie_utf8_fold, trie_latin_utf8_fold, \ trie_utf8_exactfa_fold, trie_latin_utf8_exactfa_fold, \ - trie_utf8l, trie_flu8 } \ + trie_utf8l, trie_flu8, trie_flu8_latin } \ trie_type = ((scan->flags == EXACT) \ ? (utf8_target ? trie_utf8 : trie_plain) \ : (scan->flags == EXACTL) \ @@ -1461,10 +1461,12 @@ Perl_re_intuit_start(pTHX_ ? trie_utf8_exactfa_fold \ : trie_latin_utf8_exactfa_fold) \ : (scan->flags == EXACTFLU8 \ - ? trie_flu8 \ + ? (utf8_target \ + ? trie_flu8 \ + : trie_flu8_latin) \ : (utf8_target \ ? trie_utf8_fold \ - : trie_latin_utf8_fold))) + : trie_latin_utf8_fold))) /* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is * 'foldbuf+sizeof(foldbuf)' */ @@ -1475,7 +1477,7 @@ STMT_START { switch (trie_type) { \ case trie_flu8: \ _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \ - if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \ + if (UTF8_IS_ABOVE_LATIN1(*uc)) { \ _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end - uc); \ } \ goto do_trie_utf8_fold; \ @@ -1497,10 +1499,14 @@ STMT_START { uscan = foldbuf + skiplen; \ } \ break; \ + case trie_flu8_latin: \ + _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \ + goto do_trie_latin_utf8_fold; \ case trie_latin_utf8_exactfa_fold: \ flags |= FOLD_FLAGS_NOMIX_ASCII; \ /* FALLTHROUGH */ \ case trie_latin_utf8_fold: \ + do_trie_latin_utf8_fold: \ if ( foldlen>0 ) { \ uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \ foldlen -= len; \ diff --git a/t/re/re_tests b/t/re/re_tests index 7e8522da98..ab7ddbb848 100644 --- a/t/re/re_tests +++ b/t/re/re_tests @@ -1968,6 +1968,7 @@ ab(?#Comment){2}c abbc y $& abbc (?:.||)(?|)000000000@ 000000000@ y $& 000000000@ # [perl #126405] aa$|a(?R)a|a aaa y $& aaa # [perl 128420] recursive matches (?:\1|a)([bcd])\1(?:(?R)|e)\1 abbaccaddedcb y $& abbaccaddedcb # [perl 128420] recursive match with backreferences +(?il)\x{100}|\x{100}|\x{FF} \xFF y $& \xFF # Keep these lines at the end of the file # vim: softtabstop=0 noexpandtab -- 2.11.0
Subject: 5.24-0004-perl-132063-we-should-no-longer-warn-for-this-code.patch
From 9dd4e0280eca2ba666cc0671ec3724610ed7d366 Mon Sep 17 00:00:00 2001 From: Tony Cook <tony@develop-help.com> Date: Mon, 19 Feb 2018 15:11:42 +1100 Subject: (perl #132063) we should no longer warn for this code The first patch for 132063 prevented the buffer read overflow when dumping the warning but didn't fix the underlying problem. The next change treats the supplied buffer correctly, preventing the non-UTF-8 SV from being treated as UTF-8, preventing the warning. --- t/lib/warnings/regexec | 3 --- 1 file changed, 3 deletions(-) diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec index 6635142dea..c370ddc3c7 100644 --- a/t/lib/warnings/regexec +++ b/t/lib/warnings/regexec @@ -262,8 +262,5 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale); EXPECT ######## # NAME perl #132063, read beyond buffer end -# OPTION fatal "\xff" =~ /(?il)\x{100}|\x{100}/; EXPECT -Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2. -Malformed UTF-8 character (fatal) at - line 2. -- 2.11.0
RT-Send-CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 294b
On Wed, 07 Feb 2018 14:23:08 -0800, tonyc wrote: Show quoted text
> On Mon, 05 Feb 2018 21:47:34 -0800, tonyc wrote:
> > On Sun, 04 Feb 2018 20:40:45 -0800, tonyc wrote:
> > > CVE form entry for once this is public:
> > > > I've requested a CVE id for this issue.
> > This is CVE-2018-6798
Patches for 5.26.
Subject: 5.26-0002-perl-132063-Heap-buffer-overflow.patch
From 7dc63e39d2d84920efd005092bc4d03b6ab24e1c Mon Sep 17 00:00:00 2001 From: Karl Williamson <khw@cpan.org> Date: Tue, 6 Feb 2018 14:50:48 -0700 Subject: [perl #132063]: Heap buffer overflow The proximal cause is several instances in regexec.c of the code assuming that the input was valid UTF-8, whereas the input was too short for what the start byte claimed it would be. I grepped through the core for any other similar uses, and did not find any. --- regexec.c | 33 ++++++++++++++++++--------------- t/lib/warnings/regexec | 7 +++++++ 2 files changed, 25 insertions(+), 15 deletions(-) diff --git a/regexec.c b/regexec.c index 134b196fc4..fa888823bd 100644 --- a/regexec.c +++ b/regexec.c @@ -1487,7 +1487,9 @@ Perl_re_intuit_start(pTHX_ ? trie_utf8_fold \ : trie_latin_utf8_fold))) -#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \ +/* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is + * 'foldbuf+sizeof(foldbuf)' */ +#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uc_end, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \ STMT_START { \ STRLEN skiplen; \ U8 flags = FOLD_FLAGS_FULL; \ @@ -1495,7 +1497,7 @@ STMT_START { case trie_flu8: \ _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \ if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \ - _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc)); \ + _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end - uc); \ } \ goto do_trie_utf8_fold; \ case trie_utf8_exactfa_fold: \ @@ -1504,14 +1506,14 @@ STMT_START { case trie_utf8_fold: \ do_trie_utf8_fold: \ if ( foldlen>0 ) { \ - uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \ + uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \ foldlen -= len; \ uscan += len; \ len=0; \ } else { \ - len = UTF8SKIP(uc); \ - uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, foldbuf, &foldlen, \ + uvc = _toFOLD_utf8_flags( (const U8*) uc, uc_end, foldbuf, &foldlen, \ flags); \ + len = UTF8SKIP(uc); \ skiplen = UVCHR_SKIP( uvc ); \ foldlen -= skiplen; \ uscan = foldbuf + skiplen; \ @@ -1522,7 +1524,7 @@ STMT_START { /* FALLTHROUGH */ \ case trie_latin_utf8_fold: \ if ( foldlen>0 ) { \ - uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \ + uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \ foldlen -= len; \ uscan += len; \ len=0; \ @@ -1541,7 +1543,7 @@ STMT_START { } \ /* FALLTHROUGH */ \ case trie_utf8: \ - uvc = utf8n_to_uvchr( (const U8*) uc, UTF8_MAXLEN, &len, uniflags ); \ + uvc = utf8n_to_uvchr( (const U8*) uc, uc_end - uc, &len, uniflags ); \ break; \ case trie_plain: \ uvc = (UV)*uc; \ @@ -2623,10 +2625,10 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, } points[pointpos++ % maxlen]= uc; if (foldlen || uc < (U8*)strend) { - REXEC_TRIE_READ_CHAR(trie_type, trie, - widecharmap, uc, - uscan, len, uvc, charid, foldlen, - foldbuf, uniflags); + REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, + (U8 *) strend, uscan, len, uvc, + charid, foldlen, foldbuf, + uniflags); DEBUG_TRIE_EXECUTE_r({ dump_exec_pos( (char *)uc, c, strend, real_start, s, utf8_target, 0); @@ -5686,8 +5688,9 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) if ( base && (foldlen || uc < (U8*)(reginfo->strend))) { I32 offset; REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, - uscan, len, uvc, charid, foldlen, - foldbuf, uniflags); + (U8 *) reginfo->strend, uscan, + len, uvc, charid, foldlen, + foldbuf, uniflags); charcount++; if (foldlen>0) ST.longfold = TRUE; @@ -5822,8 +5825,8 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) while (foldlen) { if (!--chars) break; - uvc = utf8n_to_uvchr(uscan, UTF8_MAXLEN, &len, - uniflags); + uvc = utf8n_to_uvchr(uscan, foldlen, &len, + uniflags); uscan += len; foldlen -= len; } diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec index 900dd6ee7f..6635142dea 100644 --- a/t/lib/warnings/regexec +++ b/t/lib/warnings/regexec @@ -260,3 +260,10 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale); "k" =~ /(?[ \N{KELVIN SIGN} ])/i; ":" =~ /(?[ \: ])/; EXPECT +######## +# NAME perl #132063, read beyond buffer end +# OPTION fatal +"\xff" =~ /(?il)\x{100}|\x{100}/; +EXPECT +Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2. +Malformed UTF-8 character (fatal) at - line 2. -- 2.11.0
Subject: 5.26-0003-5.26.1-fix-TRIE_READ_CHAR-and-DECL_TRIE_TYPE-to-acco.patch
From d58a6811d830c2f2f850a03a18129c38cb732791 Mon Sep 17 00:00:00 2001 From: Yves Orton <demerphq@gmail.com> Date: Tue, 13 Feb 2018 16:11:55 +1100 Subject: 5.26.1: fix TRIE_READ_CHAR and DECL_TRIE_TYPE to account for non-utf8 target --- regexec.c | 14 ++++++++++---- t/re/re_tests | 1 + 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/regexec.c b/regexec.c index fa888823bd..cf81b07e30 100644 --- a/regexec.c +++ b/regexec.c @@ -1472,7 +1472,7 @@ Perl_re_intuit_start(pTHX_ #define DECL_TRIE_TYPE(scan) \ const enum { trie_plain, trie_utf8, trie_utf8_fold, trie_latin_utf8_fold, \ trie_utf8_exactfa_fold, trie_latin_utf8_exactfa_fold, \ - trie_utf8l, trie_flu8 } \ + trie_utf8l, trie_flu8, trie_flu8_latin } \ trie_type = ((scan->flags == EXACT) \ ? (utf8_target ? trie_utf8 : trie_plain) \ : (scan->flags == EXACTL) \ @@ -1482,10 +1482,12 @@ Perl_re_intuit_start(pTHX_ ? trie_utf8_exactfa_fold \ : trie_latin_utf8_exactfa_fold) \ : (scan->flags == EXACTFLU8 \ - ? trie_flu8 \ + ? (utf8_target \ + ? trie_flu8 \ + : trie_flu8_latin) \ : (utf8_target \ ? trie_utf8_fold \ - : trie_latin_utf8_fold))) + : trie_latin_utf8_fold))) /* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is * 'foldbuf+sizeof(foldbuf)' */ @@ -1496,7 +1498,7 @@ STMT_START { switch (trie_type) { \ case trie_flu8: \ _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \ - if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \ + if (UTF8_IS_ABOVE_LATIN1(*uc)) { \ _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end - uc); \ } \ goto do_trie_utf8_fold; \ @@ -1519,10 +1521,14 @@ STMT_START { uscan = foldbuf + skiplen; \ } \ break; \ + case trie_flu8_latin: \ + _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \ + goto do_trie_latin_utf8_fold; \ case trie_latin_utf8_exactfa_fold: \ flags |= FOLD_FLAGS_NOMIX_ASCII; \ /* FALLTHROUGH */ \ case trie_latin_utf8_fold: \ + do_trie_latin_utf8_fold: \ if ( foldlen>0 ) { \ uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \ foldlen -= len; \ diff --git a/t/re/re_tests b/t/re/re_tests index 410fceadac..78baed6ffc 100644 --- a/t/re/re_tests +++ b/t/re/re_tests @@ -1985,6 +1985,7 @@ AB\s+\x{100} AB \x{100}X y - - /(?x)[a b]/xx \N{SPACE} yS $& # Note a space char here /(?xx)[a b]/x \N{SPACE} n - - /(?-x:[a b])/xx \N{SPACE} yS $& # Note a space char here +(?il)\x{100}|\x{100}|\x{FF} \xFF y $& \xFF # Keep these lines at the end of the file # vim: softtabstop=0 noexpandtab -- 2.11.0
Subject: 5.26-0004-perl-132063-we-should-no-longer-warn-for-this-code.patch
From 3c4ea4886efdb477335f6931d0a553e818ee172f Mon Sep 17 00:00:00 2001 From: Tony Cook <tony@develop-help.com> Date: Mon, 19 Feb 2018 15:11:42 +1100 Subject: (perl #132063) we should no longer warn for this code The first patch for 132063 prevented the buffer read overflow when dumping the warning but didn't fix the underlying problem. The next change treats the supplied buffer correctly, preventing the non-UTF-8 SV from being treated as UTF-8, preventing the warning. --- t/lib/warnings/regexec | 3 --- 1 file changed, 3 deletions(-) diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec index 6635142dea..c370ddc3c7 100644 --- a/t/lib/warnings/regexec +++ b/t/lib/warnings/regexec @@ -262,8 +262,5 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale); EXPECT ######## # NAME perl #132063, read beyond buffer end -# OPTION fatal "\xff" =~ /(?il)\x{100}|\x{100}/; EXPECT -Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2. -Malformed UTF-8 character (fatal) at - line 2. -- 2.11.0
Subject: 5.26-0005-utf8.c-Don-t-dump-malformation-past-first-NUL.patch
From d72ba890c8d8ac800c9d00a1f542deca11551f33 Mon Sep 17 00:00:00 2001 From: Karl Williamson <khw@cpan.org> Date: Tue, 13 Feb 2018 07:03:43 -0700 Subject: utf8.c: Don't dump malformation past first NUL When a UTF-8 string contains a malformation, the bytes are dumped out as a debugging aid. One should exercise caution, however, and not dump out bytes that are actually past the end of the string. Commit 99a765e9e37 from 2016 added the capability to signal to the dumping routines that we're not sure where the string ends, and to dump the minimal possible. It occurred to me that an additional safety measure can be easily added, which this commit does. And that is, in the dumping routines to stop at the first NUL. All strings automatically get a traiing NUL added, even if they contain embedded NULs. A NUL can never be part of a malformation, and so its presence likely signals the end of the string. --- utf8.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/utf8.c b/utf8.c index a3d5f61b64..61346f0cb6 100644 --- a/utf8.c +++ b/utf8.c @@ -810,7 +810,7 @@ Perl__byte_dump_string(pTHX_ const U8 * s, const STRLEN len, const bool format) PERL_STATIC_INLINE char * S_unexpected_non_continuation_text(pTHX_ const U8 * const s, - /* How many bytes to print */ + /* Max number of bytes to print */ STRLEN print_len, /* Which one is the non-continuation */ @@ -826,6 +826,8 @@ S_unexpected_non_continuation_text(pTHX_ const U8 * const s, ? "immediately" : Perl_form(aTHX_ "%d bytes", (int) non_cont_byte_pos); + const U8 * x = s + non_cont_byte_pos; + const U8 * e = s + print_len; PERL_ARGS_ASSERT_UNEXPECTED_NON_CONTINUATION_TEXT; @@ -833,10 +835,20 @@ S_unexpected_non_continuation_text(pTHX_ const U8 * const s, * calculated, it's likely faster to pass it; verify under DEBUGGING */ assert(expect_len == UTF8SKIP(s)); + /* As a defensive coding measure, don't output anything past a NUL. Such + * bytes shouldn't be in the middle of a malformation, and could mark the + * end of the allocated string, and what comes after is undefined */ + for (; x < e; x++) { + if (*x == '\0') { + x++; /* Output this particular NUL */ + break; + } + } + return Perl_form(aTHX_ "%s: %s (unexpected non-continuation byte 0x%02x," " %s after start byte 0x%02x; need %d bytes, got %d)", malformed_text, - _byte_dump_string(s, print_len, 0), + _byte_dump_string(s, x - s, 0), *(s + non_cont_byte_pos), where, *s, -- 2.11.0
RT-Send-CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 294b
On Wed, 07 Feb 2018 14:23:08 -0800, tonyc wrote: Show quoted text
> On Mon, 05 Feb 2018 21:47:34 -0800, tonyc wrote:
> > On Sun, 04 Feb 2018 20:40:45 -0800, tonyc wrote:
> > > CVE form entry for once this is public:
> > > > I've requested a CVE id for this issue.
> > This is CVE-2018-6798
Patches for blead.
Subject: blead-0002-perl-132063-Heap-buffer-overflow.patch
From af3b52eb96c0db51ca5efb7c605f59b4c89c3ae3 Mon Sep 17 00:00:00 2001 From: Karl Williamson <khw@cpan.org> Date: Tue, 6 Feb 2018 14:50:48 -0700 Subject: [perl #132063]: Heap buffer overflow The proximal cause is several instances in regexec.c of the code assuming that the input was valid UTF-8, whereas the input was too short for what the start byte claimed it would be. I grepped through the core for any other similar uses, and did not find any. --- regexec.c | 33 ++++++++++++++++++--------------- t/lib/warnings/regexec | 7 +++++++ 2 files changed, 25 insertions(+), 15 deletions(-) diff --git a/regexec.c b/regexec.c index 1cda2e8e88..47793e6e16 100644 --- a/regexec.c +++ b/regexec.c @@ -1822,7 +1822,9 @@ Perl_re_intuit_start(pTHX_ ? trie_utf8_fold \ : trie_latin_utf8_fold))) -#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \ +/* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is + * 'foldbuf+sizeof(foldbuf)' */ +#define REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, uc_end, uscan, len, uvc, charid, foldlen, foldbuf, uniflags) \ STMT_START { \ STRLEN skiplen; \ U8 flags = FOLD_FLAGS_FULL; \ @@ -1830,7 +1832,7 @@ STMT_START { case trie_flu8: \ _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \ if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \ - _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc + UTF8SKIP(uc)); \ + _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end - uc); \ } \ goto do_trie_utf8_fold; \ case trie_utf8_exactfa_fold: \ @@ -1839,14 +1841,14 @@ STMT_START { case trie_utf8_fold: \ do_trie_utf8_fold: \ if ( foldlen>0 ) { \ - uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \ + uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \ foldlen -= len; \ uscan += len; \ len=0; \ } else { \ - len = UTF8SKIP(uc); \ - uvc = _toFOLD_utf8_flags( (const U8*) uc, uc + len, foldbuf, &foldlen, \ + uvc = _toFOLD_utf8_flags( (const U8*) uc, uc_end, foldbuf, &foldlen, \ flags); \ + len = UTF8SKIP(uc); \ skiplen = UVCHR_SKIP( uvc ); \ foldlen -= skiplen; \ uscan = foldbuf + skiplen; \ @@ -1857,7 +1859,7 @@ STMT_START { /* FALLTHROUGH */ \ case trie_latin_utf8_fold: \ if ( foldlen>0 ) { \ - uvc = utf8n_to_uvchr( (const U8*) uscan, UTF8_MAXLEN, &len, uniflags ); \ + uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \ foldlen -= len; \ uscan += len; \ len=0; \ @@ -1876,7 +1878,7 @@ STMT_START { } \ /* FALLTHROUGH */ \ case trie_utf8: \ - uvc = utf8n_to_uvchr( (const U8*) uc, UTF8_MAXLEN, &len, uniflags ); \ + uvc = utf8n_to_uvchr( (const U8*) uc, uc_end - uc, &len, uniflags ); \ break; \ case trie_plain: \ uvc = (UV)*uc; \ @@ -3038,10 +3040,10 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, } points[pointpos++ % maxlen]= uc; if (foldlen || uc < (U8*)strend) { - REXEC_TRIE_READ_CHAR(trie_type, trie, - widecharmap, uc, - uscan, len, uvc, charid, foldlen, - foldbuf, uniflags); + REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, + (U8 *) strend, uscan, len, uvc, + charid, foldlen, foldbuf, + uniflags); DEBUG_TRIE_EXECUTE_r({ dump_exec_pos( (char *)uc, c, strend, real_start, s, utf8_target, 0); @@ -6100,8 +6102,9 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) if ( base && (foldlen || uc < (U8*)(reginfo->strend))) { I32 offset; REXEC_TRIE_READ_CHAR(trie_type, trie, widecharmap, uc, - uscan, len, uvc, charid, foldlen, - foldbuf, uniflags); + (U8 *) reginfo->strend, uscan, + len, uvc, charid, foldlen, + foldbuf, uniflags); charcount++; if (foldlen>0) ST.longfold = TRUE; @@ -6236,8 +6239,8 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) while (foldlen) { if (!--chars) break; - uvc = utf8n_to_uvchr(uscan, UTF8_MAXLEN, &len, - uniflags); + uvc = utf8n_to_uvchr(uscan, foldlen, &len, + uniflags); uscan += len; foldlen -= len; } diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec index 900dd6ee7f..6635142dea 100644 --- a/t/lib/warnings/regexec +++ b/t/lib/warnings/regexec @@ -260,3 +260,10 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale); "k" =~ /(?[ \N{KELVIN SIGN} ])/i; ":" =~ /(?[ \: ])/; EXPECT +######## +# NAME perl #132063, read beyond buffer end +# OPTION fatal +"\xff" =~ /(?il)\x{100}|\x{100}/; +EXPECT +Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2. +Malformed UTF-8 character (fatal) at - line 2. -- 2.11.0
Subject: blead-0003-fix-TRIE_READ_CHAR-and-DECL_TRIE_TYPE-to-account-for.patch
From 7cb7018008ced2cac377126364a3842d3f91d275 Mon Sep 17 00:00:00 2001 From: Yves Orton <demerphq@gmail.com> Date: Wed, 14 Feb 2018 10:29:26 +1100 Subject: fix TRIE_READ_CHAR and DECL_TRIE_TYPE to account for non-utf8 target --- regexec.c | 14 ++++++++++---- t/re/re_tests | 2 +- 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/regexec.c b/regexec.c index 47793e6e16..1630b676ac 100644 --- a/regexec.c +++ b/regexec.c @@ -1807,7 +1807,7 @@ Perl_re_intuit_start(pTHX_ #define DECL_TRIE_TYPE(scan) \ const enum { trie_plain, trie_utf8, trie_utf8_fold, trie_latin_utf8_fold, \ trie_utf8_exactfa_fold, trie_latin_utf8_exactfa_fold, \ - trie_utf8l, trie_flu8 } \ + trie_utf8l, trie_flu8, trie_flu8_latin } \ trie_type = ((scan->flags == EXACT) \ ? (utf8_target ? trie_utf8 : trie_plain) \ : (scan->flags == EXACTL) \ @@ -1817,10 +1817,12 @@ Perl_re_intuit_start(pTHX_ ? trie_utf8_exactfa_fold \ : trie_latin_utf8_exactfa_fold) \ : (scan->flags == EXACTFLU8 \ - ? trie_flu8 \ + ? (utf8_target \ + ? trie_flu8 \ + : trie_flu8_latin) \ : (utf8_target \ ? trie_utf8_fold \ - : trie_latin_utf8_fold))) + : trie_latin_utf8_fold))) /* 'uscan' is set to foldbuf, and incremented, so below the end of uscan is * 'foldbuf+sizeof(foldbuf)' */ @@ -1831,7 +1833,7 @@ STMT_START { switch (trie_type) { \ case trie_flu8: \ _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \ - if (utf8_target && UTF8_IS_ABOVE_LATIN1(*uc)) { \ + if (UTF8_IS_ABOVE_LATIN1(*uc)) { \ _CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG(uc, uc_end - uc); \ } \ goto do_trie_utf8_fold; \ @@ -1854,10 +1856,14 @@ STMT_START { uscan = foldbuf + skiplen; \ } \ break; \ + case trie_flu8_latin: \ + _CHECK_AND_WARN_PROBLEMATIC_LOCALE; \ + goto do_trie_latin_utf8_fold; \ case trie_latin_utf8_exactfa_fold: \ flags |= FOLD_FLAGS_NOMIX_ASCII; \ /* FALLTHROUGH */ \ case trie_latin_utf8_fold: \ + do_trie_latin_utf8_fold: \ if ( foldlen>0 ) { \ uvc = utf8n_to_uvchr( (const U8*) uscan, foldlen, &len, uniflags ); \ foldlen -= len; \ diff --git a/t/re/re_tests b/t/re/re_tests index 61b8c875e2..3db034c67e 100644 --- a/t/re/re_tests +++ b/t/re/re_tests @@ -1989,7 +1989,7 @@ AB\s+\x{100} AB \x{100}X y - - ([[:ascii:]]+)\x81 a\x80b\x81 y $& b\x81 [[:^ascii:]]+b \x80a\x81b y $& \x81b [[:^ascii:]]+b \x80a\x81\x{100}b y $& \x81\x{100}b - +(?il)\x{100}|\x{100}|\x{FF} \xFF y $& \xFF # Keep these lines at the end of the file # vim: softtabstop=0 noexpandtab -- 2.11.0
Subject: blead-0004-perl-132063-we-should-no-longer-warn-for-this-code.patch
From f7234720f553e29e4629b88d3928052def520da3 Mon Sep 17 00:00:00 2001 From: Tony Cook <tony@develop-help.com> Date: Mon, 19 Feb 2018 15:11:42 +1100 Subject: (perl #132063) we should no longer warn for this code The first patch for 132063 prevented the buffer read overflow when dumping the warning but didn't fix the underlying problem. The next change treats the supplied buffer correctly, preventing the non-UTF-8 SV from being treated as UTF-8, preventing the warning. --- t/lib/warnings/regexec | 3 --- 1 file changed, 3 deletions(-) diff --git a/t/lib/warnings/regexec b/t/lib/warnings/regexec index 6635142dea..c370ddc3c7 100644 --- a/t/lib/warnings/regexec +++ b/t/lib/warnings/regexec @@ -262,8 +262,5 @@ setlocale(&POSIX::LC_CTYPE, $utf8_locale); EXPECT ######## # NAME perl #132063, read beyond buffer end -# OPTION fatal "\xff" =~ /(?il)\x{100}|\x{100}/; EXPECT -Malformed UTF-8 character: \xff (too short; 1 byte available, need 13) in pattern match (m//) at - line 2. -Malformed UTF-8 character (fatal) at - line 2. -- 2.11.0
Subject: blead-0005-utf8.c-Don-t-dump-malformation-past-first-NUL.patch
From f348493a414bc87cf78b4c882c409e2d4b775595 Mon Sep 17 00:00:00 2001 From: Karl Williamson <khw@cpan.org> Date: Tue, 13 Feb 2018 07:03:43 -0700 Subject: utf8.c: Don't dump malformation past first NUL When a UTF-8 string contains a malformation, the bytes are dumped out as a debugging aid. One should exercise caution, however, and not dump out bytes that are actually past the end of the string. Commit 99a765e9e37 from 2016 added the capability to signal to the dumping routines that we're not sure where the string ends, and to dump the minimal possible. It occurred to me that an additional safety measure can be easily added, which this commit does. And that is, in the dumping routines to stop at the first NUL. All strings automatically get a traiing NUL added, even if they contain embedded NULs. A NUL can never be part of a malformation, and so its presence likely signals the end of the string. --- utf8.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/utf8.c b/utf8.c index a30182a43f..4bec553d56 100644 --- a/utf8.c +++ b/utf8.c @@ -1121,7 +1121,7 @@ Perl__byte_dump_string(pTHX_ const U8 * const start, const STRLEN len, const boo PERL_STATIC_INLINE char * S_unexpected_non_continuation_text(pTHX_ const U8 * const s, - /* How many bytes to print */ + /* Max number of bytes to print */ STRLEN print_len, /* Which one is the non-continuation */ @@ -1137,6 +1137,8 @@ S_unexpected_non_continuation_text(pTHX_ const U8 * const s, ? "immediately" : Perl_form(aTHX_ "%d bytes", (int) non_cont_byte_pos); + const U8 * x = s + non_cont_byte_pos; + const U8 * e = s + print_len; PERL_ARGS_ASSERT_UNEXPECTED_NON_CONTINUATION_TEXT; @@ -1144,10 +1146,20 @@ S_unexpected_non_continuation_text(pTHX_ const U8 * const s, * calculated, it's likely faster to pass it; verify under DEBUGGING */ assert(expect_len == UTF8SKIP(s)); + /* As a defensive coding measure, don't output anything past a NUL. Such + * bytes shouldn't be in the middle of a malformation, and could mark the + * end of the allocated string, and what comes after is undefined */ + for (; x < e; x++) { + if (*x == '\0') { + x++; /* Output this particular NUL */ + break; + } + } + return Perl_form(aTHX_ "%s: %s (unexpected non-continuation byte 0x%02x," " %s after start byte 0x%02x; need %d bytes, got %d)", malformed_text, - _byte_dump_string(s, print_len, 0), + _byte_dump_string(s, x - s, 0), *(s + non_cont_byte_pos), where, *s, -- 2.11.0
From: Sawyer X <xsawyerx [...] gmail.com>
CC: Perl 5 Security Report <perl5-security-report [...] perl.org>
Date: Sat, 24 Feb 2018 18:56:17 +0200
Subject: Re: [perl #132063] [CVE-2018-6798] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
To: perl5-security-report-followup [...] perl.org
The public disclosure date is set for March 15th.
RT-Send-CC: perl5-security-report [...] perl.org
Download (untitled) / with headers
text/plain 143b
On Sat, 24 Feb 2018 08:56:44 -0800, xsawyerx@gmail.com wrote: Show quoted text
> The public disclosure date is set for March 15th.
This has been delayed. Tony
To: perl5-security-report-followup [...] perl.org
CC: Perl 5 Security Report <perl5-security-report [...] perl.org>
From: Sawyer X <xsawyerx [...] gmail.com>
Date: Tue, 20 Mar 2018 00:26:20 +0200
Subject: Re: [perl #132063] [CVE-2018-6798] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
Download (untitled) / with headers
text/plain 312b
The disclosure date has been postponed officially to April 14th. On 19 March 2018 at 00:13, Tony Cook via RT <perl5-security-report-followup@perl.org> wrote: Show quoted text
> On Sat, 24 Feb 2018 08:56:44 -0800, xsawyerx@gmail.com wrote:
>> The public disclosure date is set for March 15th.
> > This has been delayed. > > Tony
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 152b
On Mon, 19 Mar 2018 15:26:47 -0700, xsawyerx@gmail.com wrote: Show quoted text
> The disclosure date has been postponed officially to April 14th.
Moved to public queue.
Download (untitled) / with headers
text/plain 317b
Thank you for filing this report. You have helped make Perl better. With the release yesterday of Perl 5.28.0, this and 185 other issues have been resolved. Perl 5.28.0 may be downloaded via: https://metacpan.org/release/XSAWYERX/perl-5.28.0 If you find that the problem persists, feel free to reopen this ticket.


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org