heap-buffer-overflow in S__byte_dump_string (utf8.c:709) #15659

Closed
p5pRT opened this issue Oct 15, 2016 · 9 comments

Comments

p5pRT commented Oct 15, 2016

Migrated from rt.perl.org#129887 (status was 'resolved')

Searchable as RT129887$

p5pRT commented Oct 15, 2016

From @geeknik

Triggered with AFL+ASAN in Perl v5.25.6 (v5.25.5-104-gaff2be5). Note: If
you can't trigger a crash under GDB, you'll need to LD_PRELOAD the
libdislocator.so that comes with AFL.

Passing malformed UTF-8 to "_Perl_IDStart" is deprecated at test273 line 3.

==5806==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60400000a1b3 at pc 0x000000bc488e bp 0x7fff6d3270b0 sp 0x7fff6d3270a8
READ of size 1 at 0x60400000a1b3 thread T0
  #0 0xbc488d in S__byte_dump_string /root/perl/utf8.c:709:9
  #1 0xbc488d in S_unexpected_non_continuation_text /root/perl/utf8.c:763
  #2 0xbc488d in Perl_utf8n_to_uvchr_error /root/perl/utf8.c:1345
  #3 0xbc87d7 in S_is_utf8_common /root/perl/utf8.c:2342:17
  #4 0x684f74 in S_parse_ident /root/perl/toke.c:8917:24
  #5 0x688d16 in S_scan_ident /root/perl/toke.c:9017:9
  #6 0x65384d in Perl_yylex /root/perl/toke.c:6331:6
  #7 0x6addde in Perl_yyparse /root/perl/perly.c:334:19
  #8 0x59c581 in S_parse_body /root/perl/perl.c:2374:9
  #9 0x59291c in perl_parse /root/perl/perl.c:1689:2
  #10 0x4de5e5 in main /root/perl/perlmain.c:121:18
  #11 0x7f63ba823b44 in __libc_start_main /build/glibc-daoqzt/glibc-2.19/csu/libc-start.c:287
  #12 0x4de27c in _start (/root/perl/perl+0x4de27c)

0x60400000a1b3 is located 0 bytes to the right of 35-byte region [0x60400000a190,0x60400000a1b3)
allocated by thread T0 here:
  #0 0x4c0eee in realloc (/root/perl/perl+0x4c0eee)
  #1 0x7f89e6 in Perl_safesysrealloc /root/perl/util.c:274:18
  #2 0x632853 in Perl_yylex /root/perl/toke.c:6600:6
  #3 0x6addde in Perl_yyparse /root/perl/perly.c:334:19
  #4 0x59c581 in S_parse_body /root/perl/perl.c:2374:9
  #5 0x59291c in perl_parse /root/perl/perl.c:1689:2
  #6 0x4de5e5 in main /root/perl/perlmain.c:121:18
  #7 0x7f63ba823b44 in __libc_start_main /build/glibc-daoqzt/glibc-2.19/csu/libc-start.c:287

SUMMARY: AddressSanitizer: heap-buffer-overflow /root/perl/utf8.c:709 S__byte_dump_string
Shadow bytes around the buggy address​:
  0x0c087fff93e0​: fa fa fd fd fd fd fd fd fa fa fd fd fd fd fd fa
  0x0c087fff93f0​: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 06 fa
  0x0c087fff9400​: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 02
  0x0c087fff9410​: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 00 fa
  0x0c087fff9420​: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 00 fa
=>0x0c087fff9430​: fa fa 00 00 00 00[03]fa fa fa 00 00 00 00 00 00
  0x0c087fff9440​: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 07 fa
  0x0c087fff9450​: fa fa fd fd fd fd fd fd fa fa 00 00 00 00 00 02
  0x0c087fff9460​: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 00
  0x0c087fff9470​: fa fa 00 00 00 00 03 fa fa fa 00 00 00 00 03 fa
  0x0c087fff9480​: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 00 fa
Shadow byte legend (one shadow byte represents 8 application bytes)​:
  Addressable​: 00
  Partially addressable​: 01 02 03 04 05 06 07
  Heap left redzone​: fa
  Heap right redzone​: fb
  Freed heap region​: fd
  Stack left redzone​: f1
  Stack mid redzone​: f2
  Stack right redzone​: f3
  Stack partial redzone​: f4
  Stack after return​: f5
  Stack use after scope​: f8
  Global redzone​: f9
  Global init order​: f6
  Poisoned by user​: f7
  Container overflow​: fc
  ASan internal​: fe
==5806==ABORTING

Valgrind + non-ASAN Perl​:

Passing malformed UTF-8 to "_Perl_IDStart" is deprecated at test273 line 3.
==5797== Invalid read of size 1
==5797== at 0x5CA273​: S__byte_dump_string (utf8.c​:709)
==5797== by 0x5CB052​: S_unexpected_non_continuation_text (utf8.c​:760)
==5797== by 0x5CB052​: Perl_utf8n_to_uvchr_error (utf8.c​:1344)
==5797== by 0x5D2BE6​: S_is_utf8_common (utf8.c​:2342)
==5797== by 0x5D2BE6​: Perl__is_utf8_perl_idstart (utf8.c​:2384)
==5797== by 0x4683A8​: S_parse_ident (toke.c​:8917)
==5797== by 0x4683A8​: S_scan_ident (toke.c​:9017)
==5797== by 0x4768E0​: Perl_yylex (toke.c​:6331)
==5797== by 0x48AA3A​: Perl_yyparse (perly.c​:334)
==5797== by 0x450F87​: S_parse_body (perl.c​:2374)
==5797== by 0x452B1C​: perl_parse (perl.c​:1689)
==5797== by 0x4218FF​: main (perlmain.c​:121)
==5797== Address 0x5f83013 is 0 bytes after a block of size 35 alloc'd
==5797== at 0x4C2AF2E​: realloc (vg_replace_malloc.c​:692)
==5797== by 0x4D9DDF​: Perl_safesysrealloc (util.c​:274)
==5797== by 0x46A502​: S_scan_str (toke.c​:10213)
==5797== by 0x4729DD​: Perl_yylex (toke.c​:6600)
==5797== by 0x48AA3A​: Perl_yyparse (perly.c​:334)
==5797== by 0x450F87​: S_parse_body (perl.c​:2374)
==5797== by 0x452B1C​: perl_parse (perl.c​:1689)
==5797== by 0x4218FF​: main (perlmain.c​:121)
==5797==
Malformed UTF-8 character​:
\xff\x80\x69\x6e\x64\x00\x20\x20\x00\x00\x00\x00\x00 (unexpected
non-continuation byte 0x69, 2 bytes after start byte 0xff; need 13 bytes,
got 2) at test273 line 3.
Passing malformed UTF-8 to "_Perl_IDStart" is deprecated at test273 line 3.
==5797== Invalid read of size 1
==5797== at 0x5CA273​: S__byte_dump_string (utf8.c​:709)
==5797== by 0x5CB052​: S_unexpected_non_continuation_text (utf8.c​:760)
==5797== by 0x5CB052​: Perl_utf8n_to_uvchr_error (utf8.c​:1344)
==5797== by 0x5D2BE6​: S_is_utf8_common (utf8.c​:2342)
==5797== by 0x5D2BE6​: Perl__is_utf8_perl_idstart (utf8.c​:2384)
==5797== by 0x468E72​: S_scan_ident (toke.c​:9055)
==5797== by 0x4768E0​: Perl_yylex (toke.c​:6331)
==5797== by 0x48AA3A​: Perl_yyparse (perly.c​:334)
==5797== by 0x450F87​: S_parse_body (perl.c​:2374)
==5797== by 0x452B1C​: perl_parse (perl.c​:1689)
==5797== by 0x4218FF​: main (perlmain.c​:121)
==5797== Address 0x5f83013 is 0 bytes after a block of size 35 alloc'd
==5797== at 0x4C2AF2E​: realloc (vg_replace_malloc.c​:692)
==5797== by 0x4D9DDF​: Perl_safesysrealloc (util.c​:274)
==5797== by 0x46A502​: S_scan_str (toke.c​:10213)
==5797== by 0x4729DD​: Perl_yylex (toke.c​:6600)
==5797== by 0x48AA3A​: Perl_yyparse (perly.c​:334)
==5797== by 0x450F87​: S_parse_body (perl.c​:2374)
==5797== by 0x452B1C​: perl_parse (perl.c​:1689)
==5797== by 0x4218FF​: main (perlmain.c​:121)
==5797==
Malformed UTF-8 character​:
\xff\x80\x69\x6e\x64\x00\x20\x20\x00\x00\x00\x00\x00 (unexpected
non-continuation byte 0x69, 2 bytes after start byte 0xff; need 13 bytes,
got 2) at test273 line 3.
Malformed UTF-8 character​: \xff\x80\x69\x6e\x64\x00\x20\x20 (unexpected
non-continuation byte 0x69, 2 bytes after start byte 0xff; need 13 bytes,
got 2) at test273 line 3.
Constant(qq)​: $^H{q} is not defined at test273 line 2, within string
Global symbol "$hWml" requires explicit package name (did you forget to
declare "my $hWml"?) at test273 line 2.
Constant(qq)​: $^H{q} is not defined at test273 line 3, near "Sust (;$hWml"
  (Might be a runaway multi-line "" string starting on line 2)
Constant(qq)​: $^H{q} is not defined at test273 line 3, near "$hWml=~ s/S[C/
/g
$"
syntax error at test273 line 3, near "$hWml=~ s/S[C/ /g
$▒▒ind "
Constant(q)​: $^H{q} is not defined at test273 line 3, near "$▒▒ind "
  'o'"
Constant(0)​: $^H{integer} is not defined at test273 line 3, at end of line
Global symbol "$B" requires explicit package name (did you forget to
declare "my $B"?) at test273 line 3.
Execution of test273 aborted due to compilation errors.

gdb + non-ASAN Perl + libdislocator.so:

Program received signal SIGSEGV, Segmentation fault.
S__byte_dump_string (s=0x7ffff65f0000 "", len=<optimized out>) at utf8.c:709
709 const unsigned high_nibble = (*s & 0xF0) >> 4;
(gdb) bt
#0 S__byte_dump_string (s=0x7ffff65f0000 "", len=<optimized out>) at utf8.c:709
#1 0x00000000005cb053 in S_unexpected_non_continuation_text (expect_len=<optimized out>, non_cont_byte_pos=<optimized out>, print_len=<optimized out>, s=<optimized out>) at utf8.c:760
#2 Perl_utf8n_to_uvchr_error (s=0x7ffff65efff7 "\377\200ind", curlen=10, retlen=0x30, flags=0, errors=0x7fffffffdbbc) at utf8.c:1344
#3 0x00000000005d2be7 in S_is_utf8_common (invlist=<optimized out>, swashname=<optimized out>, swash=<optimized out>, p=<optimized out>) at utf8.c:2342
#4 Perl__is_utf8_perl_idstart (p=0x7ffff65efff7 "\377\200ind") at utf8.c:2384
#5 0x00000000004683a9 in S_parse_ident (check_dollar=<optimized out>, is_utf8=<optimized out>, allow_package=<optimized out>, e=<optimized out>, d=<optimized out>, s=<optimized out>) at toke.c:8917
#6 S_scan_ident (s=0x7ffff6585fcb "\\xff\\x80\\x69\\x6e\\x64\\x00\\x20\\x20\\x00", 'A' <repeats 17 times>, dest=0x7ffff665eee1 "hWml", destlen=48, ck_uni=-161980429) at toke.c:9017
#7 0x00000000004768e1 in Perl_yylex () at toke.c:6331
#8 0x000000000048aa3b in Perl_yyparse (gramtype=171) at perly.c:334
#9 0x0000000000450f88 in S_parse_body (env=env@entry=0x0, xsinit=xsinit@entry=0x421a90 <xs_init>) at perl.c:2374
#10 0x0000000000452b1d in perl_parse (my_perl=<optimized out>, xsinit=xsinit@entry=0x421a90 <xs_init>, argc=2, argv=0x7fffffffe688, env=env@entry=0x0) at perl.c:1689
#11 0x0000000000421900 in main (argc=2, argv=0x7fffffffe688, env=0x7fffffffe6a0) at perlmain.c:121
(gdb) i r
rax 0x7ffff6585ff2 140737326374898
rbx 0x7ffff65f0004 140737326809092
rcx 0x7ffff6585ff3 140737326374899
rdx 0x30 48
rsi 0xa 10
rdi 0x7ffff6585fcb 140737326374859
rbp 0x7ffff65efff7 0x7ffff65efff7
rsp 0x7fffffffdb20 0x7fffffffdb20
r8 0x30 48
r9 0x7ffff65f0000 140737326809088
r10 0x22 34
r11 0x246 582
r12 0x7ffff6585fcb 140737326374859
r13 0x0 0
r14 0xd 13
r15 0x7fffffffdbbc 140737488346044
rip 0x5ca273 0x5ca273 <S__byte_dump_string+115>
eflags 0x10297 [ CF PF AF SF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb) list
704 Newx(output, output_len, char);
705 SAVEFREEPV(output);
706
707 d = output;
708 for (; s < e; s++) {
709 const unsigned high_nibble = (*s & 0xF0) >> 4;
710 const unsigned low_nibble = (*s & 0x0F);
711
712 *d++ = '\\';
713 *d++ = 'x';
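
For readers following along, the loop in the listing above can be sketched as a standalone function. This is only an illustration, not the code in utf8.c: `byte_dump_bounded` and `clamp_len` are hypothetical names, and the sketch assumes the caller knows how many bytes the buffer really holds, which is the piece of information missing in the crash above (13 bytes wanted, fewer available).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not Perl's S__byte_dump_string: formats the bytes in
 * [s, e) as "\xNN" escapes. The caller must guarantee that e does not point
 * past the end of the allocation that s belongs to. Returns a malloc'd,
 * NUL-terminated string (caller frees), or NULL on allocation failure. */
static char *byte_dump_bounded(const unsigned char *s, const unsigned char *e)
{
    static const char hex[] = "0123456789abcdef";
    size_t len;
    char *output, *d;

    assert(s != NULL && e != NULL && s <= e);
    len = (size_t)(e - s);

    output = malloc(len * 4 + 1);   /* 4 output chars per input byte + NUL */
    if (output == NULL)
        return NULL;

    d = output;
    for (; s < e; s++) {
        *d++ = '\\';
        *d++ = 'x';
        *d++ = hex[(*s & 0xF0) >> 4];   /* high nibble */
        *d++ = hex[*s & 0x0F];          /* low nibble */
    }
    *d = '\0';
    return output;
}

/* Hypothetical helper: clamp the number of bytes we would like to dump to
 * the number of bytes the buffer actually holds. */
static size_t clamp_len(size_t wanted, size_t available)
{
    return wanted < available ? wanted : available;
}
```

With a 5-byte buffer and a wanted length of 13, `clamp_len(13, 5)` yields 5, so the dump covers only the bytes that exist instead of running off the end of the allocation.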

p5pRT commented Oct 15, 2016

From @geeknik

test273.gz

p5pRT commented Oct 20, 2016

From @khwilliamson

On 10/15/2016 03​:40 PM, Brian Carpenter (via RT) wrote​:

# New Ticket Created by Brian Carpenter
# Please include the string​: [perl #129887]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=129887 >

Triggered with AFL+ASAN in Perl v5.25.6 (v5.25.5-104-gaff2be5). Note: If
you can't trigger a crash under GDB, you'll need to LD_PRELOAD the
libdislocator.so that comes with AFL.

Passing malformed UTF-8 to "_Perl_IDStart" is deprecated at test273 line 3.

This is fundamentally the result of poor initial design in the UTF-8
character classification macros. They assume well-formed UTF-8, and
hence have no length parameter that can be checked. This was made
worse recently by the changes to utf8.c that print the entire byte
sequence that appears to be wrong, when in fact the tail of that
sequence can lie beyond the end of the buffer. I have just fixed this
particular case with commit 3cc6a05, which avoids reading past a NUL,
which is "typically" appended to strings.
One could argue that one should not try to read past the first erroneous
byte, and that would reduce the chances further. But that doesn't
bring the probability of reading past the buffer to zero. What I think
should happen is that all the macros that can cause this should have new
versions that take a length parameter, like utf8_hop_safe() just added
by Tony. The existing macros can be turned into function calls that can
be deprecated, or can raise deprecation warnings as part of their
expansions. This could result in a flood of messages unless a hash were
kept so that they only raise the warning once per invocation place.
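
To make the "take a length parameter" idea concrete, here is a rough sketch of what a length-aware check could look like. It is not the perlapi interface: `utf8_expected_len` and `utf8_char_fits` are hypothetical names, and the sketch assumes standard UTF-8 (1 to 4 bytes) rather than Perl's extended form, in which a 0xff start byte implies the 13-byte sequence seen in the log. It checks structure only and does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: sequence length implied by a start byte under
 * standard UTF-8 rules. Returns 0 for bytes that cannot start a standard
 * sequence (continuation bytes and 0xF8..0xFF). */
static size_t utf8_expected_len(unsigned char b)
{
    if (b < 0x80)           return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Hypothetical length-aware check: true only if a structurally complete
 * sequence starts at s and lies entirely inside [s, e). It never reads at
 * or beyond e, which is the property the unsafe macros cannot offer. */
static bool utf8_char_fits(const unsigned char *s, const unsigned char *e)
{
    size_t need, i;

    assert(s != NULL && e != NULL);
    if (s >= e)
        return false;

    need = utf8_expected_len(*s);
    if (need == 0 || need > (size_t)(e - s))
        return false;               /* bad start byte, or sequence cut off */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)  /* not a continuation byte */
            return false;
    }
    return true;
}
```

The key design point is that the comparison `need > e - s` happens before any continuation byte is read, so a truncated sequence at the end of a buffer is rejected without touching memory past `e`.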

This particular ticket isn't a security issue, since the problem has not
made it even into a 5.25 development release. But I'm leaving it in
this queue for the moment because of the larger issues brought out by
it, and I want to get feedback about them.
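
As an aside on the once-per-invocation-place warning mentioned above: one cheap way to get that effect at the C level, without a hash, is a static flag inside the macro expansion, so each call site carries its own "already warned" state. This swaps in a different technique from the hash suggested above and is only a sketch; `WARN_ONCE_DEPRECATED`, `warn_deprecated`, and `warnings_emitted` are hypothetical names, the flag is not thread-safe, and it tracks C call sites rather than Perl source locations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Counter kept only so the behaviour can be observed in a test. */
static int warnings_emitted = 0;

/* Hypothetical warning routine: reports the deprecated name and the C
 * source location of the call site. */
static void warn_deprecated(const char *name, const char *file, int line)
{
    fprintf(stderr, "%s is deprecated at %s line %d.\n", name, file, line);
    warnings_emitted++;
}

/* Each expansion of this macro gets its own static flag, so a given call
 * site warns the first time it is executed and stays silent afterwards. */
#define WARN_ONCE_DEPRECATED(name)                          \
    do {                                                    \
        static bool warned_here = false;                    \
        if (!warned_here) {                                 \
            warned_here = true;                             \
            warn_deprecated((name), __FILE__, __LINE__);    \
        }                                                   \
    } while (0)
```

A call site inside a loop then produces a single message no matter how many times the loop runs, while a second, separate call site produces its own single message.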

p5pRT commented Oct 20, 2016

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Jan 17, 2017

From @tonycoz

On Wed, 19 Oct 2016 20​:46​:15 -0700, public@​khwilliamson.com wrote​:

This is fundamentally the result of poor initial design in the utf8
character classification macros. They assume well-formed UTF-8, and
hence do not have a length parameter that can be checked. This was made
worse recently by the changes to utf8.c that print the entire byte
sequence that appears to be wrong, when in fact the tail of that
sequence can be beyond the buffer end. I have just fixed this
particular case by commit 3cc6a05 which
works to not read past a NUL, which is "typically" appended to strings.
One could argue that one should not try to read past the first erroneous
byte, and that would minimize the chances further. But that doesn't
bring the probability of reading past the buffer to zero. What I think
should happen is that all the macros that can cause this should have new
versions that take a length parameter, like utf8_hop_safe() just added
by Tony. The existing macros can be turned into function calls that can
be deprecated, or can raise deprecation warnings as part of their
expansions. This could result in a flood of messages unless a hash was
kept so that they only raise the warning once per invocation place.

This particular ticket isn't a security issue, since the problem has not
made it even into a 5.25 development release. But I'm leaving it in
this queue for the moment because of the larger issues brought out by
it, and I want to get feedback about them.

Would you have a list of UTF-8 functions/macros that don't take a length limit and their replacements (if any)?

This would give others a checklist we can use to​:

- add replacements where they don't exist

- deprecate them where they aren't already deprecated

- replace the use of the deprecated functions in core

- possibly add the replacements to Devel​::PPPort

Tony

p5pRT commented Jan 17, 2017

From @khwilliamson

I forgot about this ticket. These have all been fixed in core, are all
documented in perlapi, and the unsafe versions are all deprecated.

It took a bunch of commits to accomplish this, but
da8c1a9
a239b1e

are the ones that added them, with surrounding commits involved in
converting core uses and deprecating the unsafe ones.

I'm reluctant to put them in PPPort, as the unsafe ones weren't in it,
and our UTF-8 handling hasn't been really very good until fairly
recently. And there has been no clamoring for this functionality.

So this ticket can be closed. Should it be moved to the public queue?
I don't understand the motivation for moving the others that have gone
there.

p5pRT commented Jan 17, 2017

From @tonycoz

On Mon, Jan 16, 2017 at 09​:29​:43PM -0700, Karl Williamson wrote​:

I forgot about this ticket. These have all been fixed in core, are all
documented in perlapi, and the unsafe versions are all deprecated.

It took a bunch of commits to accomplish this, but
da8c1a9
a239b1e

are the ones that added them, with surrounding commits involved in
converting core uses and deprecating the unsafe ones.

Thanks.

I'm reluctant to put them in PPPort, as the unsafe ones weren't in it, and
our UTF-8 handling hasn't been really very good until fairly recently. And
there has been no clamoring for this functionality.

If the unsafe ones aren't there, we can leave the safe ones out for now.

So this ticket can be closed. Should it be moved to the public queue? I
don't understand the motivation for moving the others that have gone there.

This is a public project; ideally we keep security tickets private
only long enough to satisfy the requirements of security and then make
them public.

Tony

p5pRT commented Jan 19, 2017

From @khwilliamson

Moved to public queue and closed
--
Karl Williamson

@p5pRT p5pRT closed this as completed Jan 19, 2017
p5pRT commented Jan 19, 2017

@khwilliamson - Status changed from 'open' to 'resolved'
