Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perl_re_op_compile(SV **const, int, OP *, const regexp_engine *, REGEXP *, _Bool *, U32, U32): Assertion `*(pRExC_state->end) == '\0'' failed (regcomp.c:7012) #15606

Closed
p5pRT opened this issue Sep 16, 2016 · 38 comments

Comments

@p5pRT
Copy link

p5pRT commented Sep 16, 2016

Migrated from rt.perl.org#129287 (status was 'resolved')

Searchable as RT129287$

@p5pRT
Copy link
Author

p5pRT commented Aug 19, 2016

From @geeknik

perl -e 'v300&O|0' triggers a heap-buffer-overflow in Perl_my_atof2
(numeric.c​:1349). This was found with AFL, ASAN and libdislocator.so and
affects v5.25.4 (v5.25.3-305-g8c6b0c7). Perl 5.20.2 returns errors, doesn't
crash.

==23567==ERROR​: AddressSanitizer​: heap-buffer-overflow on address
0x60200000e1ba at pc 0x0000004abd02 bp 0x7ffced70a210 sp 0x7ffced7099d0
READ of size 11 at 0x60200000e1ba thread T0
  #0 0x4abd01 in __interceptor_strlen (/root/perl/perl+0x4abd01)
  #1 0xc2edf7 in Perl_my_atof2 /root/perl/numeric.c​:1349​:28
  #2 0xc2e7a5 in Perl_my_atof /root/perl/numeric.c​:1244​:13
  #3 0x99bbfc in S_sv_setnv /root/perl/sv.c​:2113​:9
  #4 0x8fd5fe in S_sv_2iuv_common /root/perl/sv.c​:2298​:13
  #5 0x900fd5 in Perl_sv_2uv_flags /root/perl/sv.c​:2574​:6
  #6 0x9c7738 in Perl_pp_bit_or /root/perl/pp.c​:2463​:35
  #7 0x7f1d93 in Perl_runops_debug /root/perl/dump.c​:2234​:23
  #8 0x5a11d6 in S_run_body /root/perl/perl.c​:2524​:2
  #9 0x5a11d6 in perl_run /root/perl/perl.c​:2447
  #10 0x4de85d in main /root/perl/perlmain.c​:123​:9
  #11 0x7ff026dedb44 in __libc_start_main
/build/glibc-uPj9cH/glibc-2.19/csu/libc-start.c​:287
  #12 0x4de4cc in _start (/root/perl/perl+0x4de4cc)

0x60200000e1ba is located 0 bytes to the right of 10-byte region
[0x60200000e1b0,0x60200000e1ba)
allocated by thread T0 here​:
  #0 0x4c0e4b in malloc (/root/perl/perl+0x4c0e4b)
  #1 0x7f5bd7 in Perl_safesysmalloc /root/perl/util.c​:153​:21

SUMMARY​: AddressSanitizer​: heap-buffer-overflow ??​:0 __interceptor_strlen
Shadow bytes around the buggy address​:
  0x0c047fff9be0​: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9bf0​: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9c00​: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9c10​: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9c20​: fa fa fa fa fa fa fd fd fa fa 00 02 fa fa fd fd
=>0x0c047fff9c30​: fa fa fd fd fa fa 00[02]fa fa fd fd fa fa fd fd
  0x0c047fff9c40​: fa fa 00 02 fa fa 05 fa fa fa 00 02 fa fa 00 02
  0x0c047fff9c50​: fa fa fd fd fa fa fd fd fa fa 02 fa fa fa 00 02
  0x0c047fff9c60​: fa fa 00 07 fa fa 00 fa fa fa 00 02 fa fa 05 fa
  0x0c047fff9c70​: fa fa 00 02 fa fa 06 fa fa fa 00 02 fa fa 05 fa
  0x0c047fff9c80​: fa fa 00 05 fa fa 04 fa fa fa 05 fa fa fa 05 fa
Shadow byte legend (one shadow byte represents 8 application bytes)​:
  Addressable​: 00
  Partially addressable​: 01 02 03 04 05 06 07
  Heap left redzone​: fa
  Heap right redzone​: fb
  Freed heap region​: fd
  Stack left redzone​: f1
  Stack mid redzone​: f2
  Stack right redzone​: f3
  Stack partial redzone​: f4
  Stack after return​: f5
  Stack use after scope​: f8
  Global redzone​: f9
  Global init order​: f6
  Poisoned by user​: f7
  Container overflow​: fc
  ASan internal​: fe
==23567==ABORTING

@p5pRT
Copy link
Author

p5pRT commented Aug 23, 2016

From @tonycoz

On Fri Aug 19 12​:44​:43 2016, brian.carpenter@​gmail.com wrote​:

perl -e 'v300&O|0' triggers a heap-buffer-overflow in Perl_my_atof2
(numeric.c​:1349). This was found with AFL, ASAN and libdislocator.so and
affects v5.25.4 (v5.25.3-305-g8c6b0c7). Perl 5.20.2 returns errors, doesn't
crash.

==23567==ERROR​: AddressSanitizer​: heap-buffer-overflow on address
0x60200000e1ba at pc 0x0000004abd02 bp 0x7ffced70a210 sp 0x7ffced7099d0
READ of size 11 at 0x60200000e1ba thread T0
#0 0x4abd01 in __interceptor_strlen (/root/perl/perl+0x4abd01)
#1 0xc2edf7 in Perl_my_atof2 /root/perl/numeric.c​:1349​:28
#2 0xc2e7a5 in Perl_my_atof /root/perl/numeric.c​:1244​:13
#3 0x99bbfc in S_sv_setnv /root/perl/sv.c​:2113​:9
#4 0x8fd5fe in S_sv_2iuv_common /root/perl/sv.c​:2298​:13
#5 0x900fd5 in Perl_sv_2uv_flags /root/perl/sv.c​:2574​:6
#6 0x9c7738 in Perl_pp_bit_or /root/perl/pp.c​:2463​:35
#7 0x7f1d93 in Perl_runops_debug /root/perl/dump.c​:2234​:23
#8 0x5a11d6 in S_run_body /root/perl/perl.c​:2524​:2
#9 0x5a11d6 in perl_run /root/perl/perl.c​:2447
#10 0x4de85d in main /root/perl/perlmain.c​:123​:9
#11 0x7ff026dedb44 in __libc_start_main
/build/glibc-uPj9cH/glibc-2.19/csu/libc-start.c​:287
#12 0x4de4cc in _start (/root/perl/perl+0x4de4cc)

Also produces uninitialized access warnings from valgrind​:

==13615== Conditional jump or move depends on uninitialised value(s)
==13615== at 0x4C2C1B8​: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13615== by 0x716772​: Perl_my_atof2 (numeric.c​:1349)
==13615== by 0x716502​: Perl_my_atof (numeric.c​:1244)
==13615== by 0x5C8003​: S_sv_setnv (sv.c​:2113)
==13615== by 0x5C9C33​: S_sv_2iuv_common (sv.c​:2298)
==13615== by 0x5CC200​: Perl_sv_2uv_flags (sv.c​:2574)
==13615== by 0x62DD67​: Perl_pp_bit_or (pp.c​:2463)
==13615== by 0x55906B​: Perl_runops_debug (dump.c​:2234)
==13615== by 0x462A94​: S_run_body (perl.c​:2525)
==13615== by 0x4620BF​: perl_run (perl.c​:2448)
==13615== by 0x41EFDD​: main (perlmain.c​:123)
==13615==

The UTF-8 string path of bit-and in do_vop() doesn't NUL terminate
the resulting string.

The bit-or op then attempts to or the result of that and a number, so
attempts to convert that result into a number and strlen() attempts to
access the undefined bytes following the result of the bit-and.

I couldn't work up a test for it that would fail without sanitize or valgrind.

I can see a small chance of this being a security issue, eg. if an SV was re-used for the result of bit-and and that value was used in a context
where NUL termination is required (like opening a file), but that seems unlikely for a string bit-and result.

Tony

@p5pRT
Copy link
Author

p5pRT commented Aug 23, 2016

From @tonycoz

0001-perl-128998-NUL-terminate-the-result-of-bit-on-UTF-8.patch
From bca1770e25759434e3e158f55ddcbf2cbe39c0a3 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Tue, 23 Aug 2016 10:46:32 +1000
Subject: [PATCH] (perl #128998) NUL terminate the result of bit & on UTF-8
 strings

---
 doop.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doop.c b/doop.c
index ad9172a..f9e308b 100644
--- a/doop.c
+++ b/doop.c
@@ -1093,6 +1093,7 @@ Perl_do_vop(pTHX_ I32 optype, SV *sv, SV *left, SV *right)
 	    if (sv == left || sv == right)
 		(void)sv_usepvn(sv, dcorig, needlen);
 	    SvCUR_set(sv, dc - dcorig);
+            *SvEND(sv) = '\0';
 	    break;
 	case OP_BIT_XOR:
 	    while (lulen && rulen) {
-- 
2.1.4

@p5pRT
Copy link
Author

p5pRT commented Aug 23, 2016

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Aug 23, 2016

From zefram@fysh.org

Tony Cook via RT wrote​:

The UTF-8 string path of bit-and in do_vop() doesn't NUL terminate
the resulting string.

With this being a problem here, there are going to be similar problems
from other sources of unterminated strings. It's semi-approved for
XS modules to release unterminated PVs into general use, which they
generate by mmapping files, for example. Vulnerabilities could arise
by combining such a module with anything that depends on nul termination.

I couldn't work up a test for it that would fail without sanitize or valgrind.

On some, but not all, builds I can detect the problem thus​:

$ perl -lwe 'print length unpack("p", pack("p", v300 & "O"))'
4

Devel​::Peek shows lack of nul termination on all builds that I tried,
but its results aren't available back to Perl code. In any case, the
extra content beyond the end of the string is garbage, so this kind of
test will be unreliable. I haven't found a way to reliably control the
extra string content. It's not as simple as initialising the SV that
will hold the result​: &= yields nul termination.

I can see a small chance of this being a security issue, eg. if an SV
was re-used for the result of bit-and and that value was used in a context
where NUL termination is required (like opening a file),

This problem does occur​:

$ perl -lwe ' open(FOO, ">", (v335 & "O"))'
$ echo O* | od -tx1
0000000 4f 69 44 02 0a
0000005

Unlikely, yes, but the semantics are notionally guaranteed. I think it
is proper to treat this as a security issue.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Aug 23, 2016

From @dcollinsn

Hello!

While fuzzing the a quadmath build of the perl interpreter, I encountered the following testcase​:

perl -e '0|v1000&E'

This appears to be creating a string with a codepoint over 255. The output​:

$ ../bin/perl -e '0|v1000&E'
Operator or semicolon missing before &E at -e line 1.
Ambiguous use of & resolved as operator & at -e line 1.
Use of strings with code points over 0xFF as arguments to bitwise and (&) operator is deprecated at -e line 1.
$ echo $?
0

Under Valgrind, with --track-origins=yes​:

Operator or semicolon missing before &E at -e line 1.
Ambiguous use of & resolved as operator & at -e line 1.
Use of strings with code points over 0xFF as arguments to bitwise and (&) operator is deprecated at -e line 1.
==45631== Conditional jump or move depends on uninitialised value(s)
==45631== at 0x4C2BBD8​: strlen (vg_replace_strmem.c​:454)
==45631== by 0xDF4F77​: Perl_my_atof2 (numeric.c​:1349)
==45631== by 0xDF4F77​: Perl_my_atof (numeric.c​:1210)
==45631== by 0x93B64B​: S_sv_setnv (sv.c​:2113)
==45631== by 0x9D1321​: S_sv_2iuv_common (sv.c​:2298)
==45631== by 0x9D880F​: Perl_sv_2uv_flags (sv.c​:2574)
==45631== by 0xA80A65​: Perl_pp_bit_or (pp.c​:2464)
==45631== by 0x7E0833​: Perl_runops_debug (dump.c​:2234)
==45631== by 0x53A0D8​: S_run_body (perl.c​:2525)
==45631== by 0x53A0D8​: perl_run (perl.c​:2448)
==45631== by 0x429A47​: main (perlmain.c​:123)
==45631== Uninitialised value was created by a heap allocation
==45631== at 0x4C28C0F​: malloc (vg_replace_malloc.c​:299)
==45631== by 0x7EF59C​: Perl_safesysmalloc (util.c​:153)
==45631== by 0x99646F​: Perl_sv_grow (sv.c​:1605)
==45631== by 0x9CB173​: Perl_sv_setpvn (sv.c​:4898)
==45631== by 0xC324AE​: Perl_do_vop (doop.c​:1011)
==45631== by 0xA7E15E​: Perl_pp_bit_and (pp.c​:2407)
==45631== by 0x7E0833​: Perl_runops_debug (dump.c​:2234)
==45631== by 0x45DB75​: S_fold_constants (op.c​:4512)
==45631== by 0x665A10​: Perl_yyparse (perly.y​:970)
==45631== by 0x530344​: S_parse_body (perl.c​:2373)
==45631== by 0x537586​: perl_parse (perl.c​:1689)
==45631== by 0x4297B7​: main (perlmain.c​:121)
==45631==
==45631==
==45631== HEAP SUMMARY​:
==45631== in use at exit​: 104,090 bytes in 517 blocks
==45631== total heap usage​: 700 allocs, 183 frees, 148,350 bytes allocated
==45631==
==45631== LEAK SUMMARY​:
==45631== definitely lost​: 0 bytes in 0 blocks
==45631== indirectly lost​: 0 bytes in 0 blocks
==45631== possibly lost​: 0 bytes in 0 blocks
==45631== still reachable​: 104,090 bytes in 517 blocks
==45631== suppressed​: 0 bytes in 0 blocks
==45631== Rerun with --leak-check=full to see details of leaked memory
==45631==
==45631== For counts of detected and suppressed errors, rerun with​: -v
==45631== ERROR SUMMARY​: 1 errors from 1 contexts (suppressed​: 0 from 0)

And under GDB, using AFL's libdislocator to turn the invalid access into a SEGV​:

(gdb) run
Starting program​: /usr/local/perl-afl/bin/perl -e 0\|v1000\&E
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Operator or semicolon missing before &E at -e line 1.
Ambiguous use of & resolved as operator & at -e line 1.
Use of strings with code points over 0xFF as arguments to bitwise and (&) operator is deprecated at -e line 1.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6b452b0 in strlen () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007ffff6b452b0 in strlen () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000df4f78 in Perl_my_atof2 (value=<synthetic pointer>, orig=0x7ffff637fff6 "@​AAAAAAAAA") at numeric.c​:1349
#2 Perl_my_atof (s=0x7ffff637fff6 "@​AAAAAAAAA") at numeric.c​:1210
#3 0x000000000093b64c in S_sv_setnv (sv=sv@​entry=0x7ffff660ae98, numtype=numtype@​entry=0) at sv.c​:2113
#4 0x00000000009d1322 in S_sv_2iuv_common (sv=sv@​entry=0x7ffff660ae98) at sv.c​:2298
#5 0x00000000009d8810 in Perl_sv_2uv_flags (sv=sv@​entry=0x7ffff660ae98, flags=flags@​entry=0) at sv.c​:2574
#6 0x0000000000a80a66 in Perl_pp_bit_or () at pp.c​:2464
#7 0x00000000007e0834 in Perl_runops_debug () at dump.c​:2234
#8 0x000000000053a0d9 in S_run_body (oldscope=1) at perl.c​:2525
#9 perl_run (my_perl=<optimized out>) at perl.c​:2448
#10 0x0000000000429a48 in main (argc=<optimized out>, argv=<optimized out>, env=<optimized out>) at perlmain.c​:123
(gdb) f 1
#1 0x0000000000df4f78 in Perl_my_atof2 (value=<synthetic pointer>, orig=0x7ffff637fff6 "@​AAAAAAAAA") at numeric.c​:1349
1349 const char* send = s + strlen(orig); /* one past the last */
(gdb) l
1344 Perl_my_atof2(pTHX_ const char* orig, NV* value)
1345 {
1346 const char* s = orig;
1347 NV result[3] = {0.0, 0.0, 0.0};
1348 #if defined(USE_PERL_ATOF) || defined(USE_QUADMATH)
1349 const char* send = s + strlen(orig); /* one past the last */
1350 bool negative = 0;
1351 #endif
1352 #if defined(USE_PERL_ATOF) && !defined(USE_QUADMATH)
1353 UV accumulator[2] = {0,0}; /* before/after dp */
(gdb) info locals
s = 0x7ffff637fff6 "@​AAAAAAAAA"
send = <optimized out>
negative = <optimized out>
(gdb) p s[10]
$1 = 0 '\000'

So fold_constants is trying to reduce this, something is disappointed because the v-string uses codepoints over 255. Because this is a quadmath build, we take this branch​:

1209 #ifdef USE_QUADMATH
1210 Perl_my_atof2(aTHX_ s, &x);
1211 return x;
1212 #else

Now, that string /should/ just read "@​". Breaking on Perl_my_atof2 without libdislocator loaded yields the following​:

Breakpoint 1, Perl_my_atof (s=0x122a200 "@​") at numeric.c​:1210
1210 Perl_my_atof2(aTHX_ s, &x);
(gdb) p s[1]
$1 = 0 '\000'
(gdb) l
1205 NV
1206 Perl_my_atof(pTHX_ const char* s)
1207 {
1208 NV x = 0.0;
1209 #ifdef USE_QUADMATH
1210 Perl_my_atof2(aTHX_ s, &x);
1211 return x;
1212 #else
1213 # ifdef USE_LOCALE_NUMERIC
1214 PERL_ARGS_ASSERT_MY_ATOF;
(gdb) s
Perl_my_atof2 (value=<synthetic pointer>, orig=0x122a200 "@​") at numeric.c​:1349
1349 const char* send = s + strlen(orig); /* one past the last */
(gdb) s
1369 while (isSPACE(*s))
(gdb) p send
$2 = 0x122a201 ""

The "AAAAA" is an uninitialized poison pattern. The root cause *seems* to be that this string is not being null-terminated when it is created, presumably for the same reason as the deprecation warning. This is happening in Perl_do_vop. The case OP_BIT_AND around line 1077 does not append a null. There are three cases here​: AND, OR, and XOR. OR falls through to mop_up_utf, which appends a NUL. XOR jumps to it. AND breaks from the case structure, skipping mop_up_utf. The patch which I am currently testing corrects this issue, eliminating the valgrind noise and the SEGV under libdislocator. I'm only waiting for a clean test run.

A test is difficult to add since the only outwardly visible symptoms require that the memory allocated to the constant-folded SV be non-zeroed, or that valgrind be used.

--
Respectfully,
Dan Collins

@p5pRT
Copy link
Author

p5pRT commented Aug 23, 2016

From @dcollinsn

Upon further review, the AND case is deliberately different than the OR and XOR cases. I can tell because when I applied my patch that has the AND case just jump to mop_up_utf, the test suite failed a lot in t/op/bop.t. So instead I'll be a bit less heavy-handed and just pull in the line that NUL-terminates the string. This passes all tests (woo!).

My question - there was a recent discussion about whether perl strings could be assumed to be null-terminated. The outcome, as I understand it, was that perl strings would usually be null-terminated, but XS code could produce SVs with unterminated strings. So while this patch fixes one issue by null-terminating this particular string, a question​: Is it safe, in the general case, for Perl_my_atof2() to use strlen? If not, there are surely more similar bugs hiding out. (That discussion was in RT #128170)

--
Respectfully,
Dan Collins

@p5pRT
Copy link
Author

p5pRT commented Aug 23, 2016

From @dcollinsn

0001-RT-129058-Null-terminate-strings-after-utf8-bitwise-.patch
From d92e756c7ac47f94845cd28b96f17a8afc89c436 Mon Sep 17 00:00:00 2001
From: Dan Collins <dcollinsn@gmail.com>
Date: Tue, 23 Aug 2016 13:18:47 -0400
Subject: [PATCH] [RT #129058] Null-terminate strings after utf8 bitwise and

The bug report pertains to the testcase:

    perl -e '0|v1000&E'

The bitwise AND operation was not properly null-terminating the output
string. This is evidently fine for most cases, as the LEN and CUR
are properly set, but in the quadmath build where I discovered this
error, the subsequent bitwise OR with a number caused perl to call
Perl_my_atof2, which in turn calls strlen, on the constant string
'v1000'&'E'. This causes strlen to read uninitialized memory.

This patch adds a null-termination to the AND case in Perl_do_vop.
---
 doop.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doop.c b/doop.c
index ad9172a..f9e308b 100644
--- a/doop.c
+++ b/doop.c
@@ -1093,6 +1093,7 @@ Perl_do_vop(pTHX_ I32 optype, SV *sv, SV *left, SV *right)
 	    if (sv == left || sv == right)
 		(void)sv_usepvn(sv, dcorig, needlen);
 	    SvCUR_set(sv, dc - dcorig);
+            *SvEND(sv) = '\0';
 	    break;
 	case OP_BIT_XOR:
 	    while (lulen && rulen) {
-- 
2.8.1

@p5pRT
Copy link
Author

p5pRT commented Aug 23, 2016

From zefram@fysh.org

Dan Collins via RT wrote​:

The outcome, as I understand it, was that perl strings would usually
be null-terminated, but XS code could produce SVs with unterminated
strings. So while this patch fixes one issue by null-terminating this
particular string, a question​: Is it safe, in the general case, for
Perl_my_atof2() to use strlen?

There is an unresolved conflict between these two. It's kind-of accepted
for XS modules to generate unterminated strings, particularly by mmapping,
but it's also even more accepted for code using Perl scalars to assume
nul termination. (That's the point of nul-terminating most strings,
after all.) Each of these practices can individually coexist with the
bulk of Perl XS-level code (both core and XS modules) that follows the
Postel principle, not assuming nul termination on its inputs while always
generating nul terminated outputs. This means that code deviating from
the dominant behaviour in one way or the other is seen to work, so there
isn't much pressure on developers to stick to the Postel behaviour.

Obviously the two non-Postel behaviours (assuming nul termination on
input and omitting it on output) can't coexist with each other. There is
no resolution to this issue that lets us keep both. We have to make a
once-and-for-all choice, which we've been lamentably unwilling to do.

I think it is more workable to insist on nul termination. This seems
to be the original intent, that it be safe to pass Perl strings directly
to C library functions taking nul-terminated strings. We should declare
code generating unterminated strings to be faulty. This would interfere
with some current mmap tricks, but there are ways to reformulate them
that don't break Perl's string model. We can enforce nul termination
by having debugging builds check nul termination, as an assertion,
at opportune moments.

To go the other way is also possible, but with more hassle. Although it
would rescue the mmap crowd, it would require copying of strings (to nul
terminate them) in many more bits of C code, many of which execute more
frequently. For code receiving strings to not depend on nul termination
can't be enforced by an assertion​: it could only be policed by more
specialised means such as valgrind.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Aug 23, 2016

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Aug 24, 2016

From @tonycoz

This issue is now public in #129058, though the security implications
haven't been mentioned.

On Mon Aug 22 19​:00​:31 2016, zefram@​fysh.org wrote​:

Tony Cook via RT wrote​:

The UTF-8 string path of bit-and in do_vop() doesn't NUL terminate
the resulting string.

With this being a problem here, there are going to be similar problems
from other sources of unterminated strings. It's semi-approved for
XS modules to release unterminated PVs into general use, which they
generate by mmapping files, for example. Vulnerabilities could arise
by combining such a module with anything that depends on nul
termination.

That "semi-approval" has always seemed questionable to me.

I couldn't work up a test for it that would fail without sanitize or
valgrind.

On some, but not all, builds I can detect the problem thus​:

$ perl -lwe 'print length unpack("p", pack("p", v300 & "O"))'
4

Devel​::Peek shows lack of nul termination on all builds that I tried,
but its results aren't available back to Perl code. In any case, the
extra content beyond the end of the string is garbage, so this kind of
test will be unreliable. I haven't found a way to reliably control
the
extra string content. It's not as simple as initialising the SV that
will hold the result​: &= yields nul termination.

I tried writing a simple xsub to check for NUL termination, but I suspect
the copies in entersub is messing that up.

Devel​::Peek​::Dump() shows the missing NUL, but it's implemented as a
custom op, so avoids the copies.

I can see a small chance of this being a security issue, eg. if an SV
was re-used for the result of bit-and and that value was used in a
context
where NUL termination is required (like opening a file),

This problem does occur​:

$ perl -lwe ' open(FOO, ">", (v335 & "O"))'
$ echo O* | od -tx1
0000000 4f 69 44 02 0a
0000005

Unlikely, yes, but the semantics are notionally guaranteed. I think
it
is proper to treat this as a security issue.

Ok.

Does it deserve a release, or just a published fix? (#129058 includes the
same fix I gave, with a longer commit message.)

Tony

@p5pRT
Copy link
Author

p5pRT commented Sep 5, 2016

From @iabyn

On Tue, Aug 23, 2016 at 06​:15​:07PM -0700, Tony Cook via RT wrote​:

This issue is now public in #129058, though the security implications
haven't been mentioned.

On Mon Aug 22 19​:00​:31 2016, zefram@​fysh.org wrote​:

Tony Cook via RT wrote​:

The UTF-8 string path of bit-and in do_vop() doesn't NUL terminate
the resulting string.

With this being a problem here, there are going to be similar problems
from other sources of unterminated strings. It's semi-approved for
XS modules to release unterminated PVs into general use, which they
generate by mmapping files, for example. Vulnerabilities could arise
by combining such a module with anything that depends on nul
termination.

That "semi-approval" has always seemed questionable to me.

I couldn't work up a test for it that would fail without sanitize or
valgrind.

On some, but not all, builds I can detect the problem thus​:

$ perl -lwe 'print length unpack("p", pack("p", v300 & "O"))'
4

Devel​::Peek shows lack of nul termination on all builds that I tried,
but its results aren't available back to Perl code. In any case, the
extra content beyond the end of the string is garbage, so this kind of
test will be unreliable. I haven't found a way to reliably control
the
extra string content. It's not as simple as initialising the SV that
will hold the result​: &= yields nul termination.

I tried writing a simple xsub to check for NUL termination, but I suspect
the copies in entersub is messing that up.

Devel​::Peek​::Dump() shows the missing NUL, but it's implemented as a
custom op, so avoids the copies.

I can see a small chance of this being a security issue, eg. if an SV
was re-used for the result of bit-and and that value was used in a
context
where NUL termination is required (like opening a file),

This problem does occur​:

$ perl -lwe ' open(FOO, ">", (v335 & "O"))'
$ echo O* | od -tx1
0000000 4f 69 44 02 0a
0000005

Unlikely, yes, but the semantics are notionally guaranteed. I think
it
is proper to treat this as a security issue.

Ok.

Does it deserve a release, or just a published fix? (#129058 includes the
same fix I gave, with a longer commit message.)

I think this is obscure enough to just push the fix and have a peldelta entry.

But thinking further, there are two issues here​:

* Perl_do_vop() returned an unterminated string; fixed (but not pushed
  yet);

* Perl_atof and atof2 don't accept a length arg. This is arguably a design
flaw. I suggest we introduce Perl_atofn say, with Perl_atof(s) being
defined as Perl_atofn(s, strlen(s)) for backcompat. Ditto Perl_atof2.

atof is only used in a few places in core.

--
My Dad used to say 'always fight fire with fire', which is probably why
he got thrown out of the fire brigade.

@p5pRT
Copy link
Author

p5pRT commented Sep 16, 2016

From @geeknik

Triggered in v5.25.5 (v5.25.4-130-g7aa7bbc) w/ AFL + ASAN.

od -tx1 assert78
0000000 73 70 6c 69 74 00 44 26 2e 35 30 30 2e 30
0000016

./perl assert78
perl​: regcomp.c​:7012​: REGEXP *Perl_re_op_compile(SV **const, int, OP *, const regexp_engine *, REGEXP *, _Bool *, U32, U32)​: Assertion `*(pRExC_state->end) == '\0'' failed.
Aborted

@p5pRT
Copy link
Author

p5pRT commented Sep 16, 2016

From @geeknik

assert78.gz

@p5pRT
Copy link
Author

p5pRT commented Sep 16, 2016

From @dcollinsn

On Fri Sep 16 14​:40​:54 2016, brian.carpenter@​gmail.com wrote​:

Triggered in v5.25.5 (v5.25.4-130-g7aa7bbc) w/ AFL + ASAN.

od -tx1 assert78
0000000 73 70 6c 69 74 00 44 26 2e 35 30 30 2e 30
0000016

./perl assert78
perl​: regcomp.c​:7012​: REGEXP *Perl_re_op_compile(SV **const, int, OP
*, const regexp_engine *, REGEXP *, _Bool *, U32, U32)​: Assertion
`*(pRExC_state->end) == '\0'' failed.
Aborted

==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4DC93D​: Perl_re_op_compile (regcomp.c​:7012)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C46​: S_reg.constprop.51 (regcomp.c​:10602)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4A4B97​: S_skip_to_be_ignored_text (regcomp.c​:18242)
==26698== by 0x4C7159​: S_regatom (regcomp.c​:13418)
==26698== by 0x4CA069​: S_regpiece (regcomp.c​:11666)
==26698== by 0x4CA069​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4A4B97​: S_skip_to_be_ignored_text (regcomp.c​:18242)
==26698== by 0x4C72C5​: S_regatom (regcomp.c​:13814)
==26698== by 0x4CA069​: S_regpiece (regcomp.c​:11666)
==26698== by 0x4CA069​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA082​: S_regpiece (regcomp.c​:11677)
==26698== by 0x4CA082​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA08E​: S_regpiece (regcomp.c​:11808)
==26698== by 0x4CA08E​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA098​: S_regpiece (regcomp.c​:11808)
==26698== by 0x4CA098​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C7F​: S_reg.constprop.51 (regcomp.c​:11340)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C9A​: S_reg.constprop.51 (regcomp.c​:11363)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C46​: S_reg.constprop.51 (regcomp.c​:10602)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4C72DC​: S_regatom (regcomp.c​:13816)
==26698== by 0x4CA069​: S_regpiece (regcomp.c​:11666)
==26698== by 0x4CA069​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA082​: S_regpiece (regcomp.c​:11677)
==26698== by 0x4CA082​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA08E​: S_regpiece (regcomp.c​:11808)
==26698== by 0x4CA08E​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA098​: S_regpiece (regcomp.c​:11808)
==26698== by 0x4CA098​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C7F​: S_reg.constprop.51 (regcomp.c​:11340)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C9A​: S_reg.constprop.51 (regcomp.c​:11363)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698==
==26698== HEAP SUMMARY​:
==26698== in use at exit​: 112,854 bytes in 525 blocks
==26698== total heap usage​: 702 allocs, 177 frees, 148,431 bytes allocated
==26698==
==26698== LEAK SUMMARY​:
==26698== definitely lost​: 0 bytes in 0 blocks
==26698== indirectly lost​: 0 bytes in 0 blocks
==26698== possibly lost​: 0 bytes in 0 blocks
==26698== still reachable​: 112,854 bytes in 525 blocks
==26698== suppressed​: 0 bytes in 0 blocks
==26698== Rerun with --leak-check=full to see details of leaked memory
==26698==
==26698== For counts of detected and suppressed errors, rerun with​: -v
==26698== Use --track-origins=yes to see where uninitialised values come from
==26698== ERROR SUMMARY​: 18 errors from 16 contexts (suppressed​: 0 from 0)

(gdb) r
Starting program​: /home/dcollins/toolchain/perldebug/perl -Ilib assert78
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, Perl_re_op_compile (patternp=<optimized out>, pat_count=<optimized out>, expr=<optimized out>, eng=0x6a80a0 <PL_core_reg_engine>, old_re=0x0, is_bare_re=<optimized out>,
  orig_rx_flags=2048, pm_flags=0) at regcomp.c​:7012
7012 assert(*RExC_end == '\0');
(gdb) bt
#0 Perl_re_op_compile (patternp=<optimized out>, pat_count=<optimized out>, expr=<optimized out>, eng=0x6a80a0 <PL_core_reg_engine>, old_re=0x0, is_bare_re=<optimized out>, orig_rx_flags=2048,
  pm_flags=0) at regcomp.c​:7012
#1 0x000000000044e20c in Perl_pmruntime (o=0x9440b0, expr=expr@​entry=0x944180, floor=0, isreg=false, repl=0x0) at op.c​:5756
#2 0x000000000044eb5f in Perl_ck_split (o=0x944138) at op.c​:11147
#3 0x0000000000436784 in Perl_op_convert_list (type=156, flags=<optimized out>, flags@​entry=0, o=0x944138) at op.c​:4782
#4 0x000000000049d068 in Perl_yyparse (gramtype=gramtype@​entry=258) at perly.y​:883
#5 0x00000000004593f0 in S_parse_body (env=env@​entry=0x0, xsinit=xsinit@​entry=0x4211c0 <xs_init>) at perl.c​:2374
#6 0x000000000045ae6d in perl_parse (my_perl=<optimized out>, xsinit=xsinit@​entry=0x4211c0 <xs_init>, argc=3, argv=<optimized out>, env=env@​entry=0x0) at perl.c​:1689
#7 0x0000000000421020 in main (argc=<optimized out>, argv=<optimized out>, env=<optimized out>) at perlmain.c​:121
(gdb) p *pRExC_state
$14 = {flags = 2304, pm_flags = 0, precomp = 0x942050 "", precomp_end = 0x942051 "\302\005\367\377\177", rx_sv = 0x20, rx = 0x74696c704426, rxi = 0xa, start = 0x942050 "",
  end = 0x942051 "\302\005\367\377\177", parse = 0x942050 "", adjusted_start = 0x942050 "", precomp_adj = 0, whilem_seen = 0, emit_start = 0x7fffffffddc8, emit_bound = 0x7fffffffddc0,
  emit = 0x7fffffffdcf8, emit_dummy = {flags = 2 '\002', type = 0 '\000', next_off = 0, arg1 = 0,
  bitmap = "\n\000\000\000\000\000\000\000@​", '\000' <repeats 15 times>, "\002\000\000\000\060\000\000", classflags = 0, invlist = 0x0}, naughty = 0, sawback = 0, seen = 0, size = 0, npar = 1,
  nestroot = 0, extralen = 0, seen_zerolen = 0, open_parens = 0x0, close_parens = 0x0, end_op = 0x0, utf8 = 536870912, orig_utf8 = 536870912, uni_semantics = 0, paren_names = 0x0, recurse = 0x0,
  recurse_count = 0, study_chunk_recursed = 0x0, study_chunk_recursed_bytes = 0, in_lookbehind = 0, contains_locale = 0, contains_i = 0, override_recoding = 0, in_multi_char_class = 0,
  code_blocks = 0x0, num_code_blocks = 0, code_index = 0, maxlen = 0, frame_head = 0x0, frame_last = 0x0, frame_count = 0, warn_text = 0x0, runtime_code_qr = 0x0,
  lastparse = 0xffffffffffffffff <error​: Cannot access memory at address 0xffffffffffffffff>, lastnum = 7, paren_name_list = 0x0, study_chunk_recursed_count = 9715808, mysv1 = 0x1, mysv2 = 0x0,
  seen_unfolded_sharp_s = false, strict = false, study_started = false}
(gdb)

--
Respectfully,
Dan Collins

@p5pRT
Copy link
Author

p5pRT commented Sep 16, 2016

From [Unknown Contact. See original ticket]

On Fri Sep 16 14​:40​:54 2016, brian.carpenter@​gmail.com wrote​:

Triggered in v5.25.5 (v5.25.4-130-g7aa7bbc) w/ AFL + ASAN.

od -tx1 assert78
0000000 73 70 6c 69 74 00 44 26 2e 35 30 30 2e 30
0000016

./perl assert78
perl​: regcomp.c​:7012​: REGEXP *Perl_re_op_compile(SV **const, int, OP
*, const regexp_engine *, REGEXP *, _Bool *, U32, U32)​: Assertion
`*(pRExC_state->end) == '\0'' failed.
Aborted

==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4DC93D​: Perl_re_op_compile (regcomp.c​:7012)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C46​: S_reg.constprop.51 (regcomp.c​:10602)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4A4B97​: S_skip_to_be_ignored_text (regcomp.c​:18242)
==26698== by 0x4C7159​: S_regatom (regcomp.c​:13418)
==26698== by 0x4CA069​: S_regpiece (regcomp.c​:11666)
==26698== by 0x4CA069​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4A4B97​: S_skip_to_be_ignored_text (regcomp.c​:18242)
==26698== by 0x4C72C5​: S_regatom (regcomp.c​:13814)
==26698== by 0x4CA069​: S_regpiece (regcomp.c​:11666)
==26698== by 0x4CA069​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA082​: S_regpiece (regcomp.c​:11677)
==26698== by 0x4CA082​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA08E​: S_regpiece (regcomp.c​:11808)
==26698== by 0x4CA08E​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA098​: S_regpiece (regcomp.c​:11808)
==26698== by 0x4CA098​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C7F​: S_reg.constprop.51 (regcomp.c​:11340)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C9A​: S_reg.constprop.51 (regcomp.c​:11363)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C46​: S_reg.constprop.51 (regcomp.c​:10602)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4C72DC​: S_regatom (regcomp.c​:13816)
==26698== by 0x4CA069​: S_regpiece (regcomp.c​:11666)
==26698== by 0x4CA069​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA082​: S_regpiece (regcomp.c​:11677)
==26698== by 0x4CA082​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA08E​: S_regpiece (regcomp.c​:11808)
==26698== by 0x4CA08E​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA098​: S_regpiece (regcomp.c​:11808)
==26698== by 0x4CA098​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C7F​: S_reg.constprop.51 (regcomp.c​:11340)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C9A​: S_reg.constprop.51 (regcomp.c​:11363)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698==
==26698== HEAP SUMMARY​:
==26698== in use at exit​: 112,854 bytes in 525 blocks
==26698== total heap usage​: 702 allocs, 177 frees, 148,431 bytes allocated
==26698==
==26698== LEAK SUMMARY​:
==26698== definitely lost​: 0 bytes in 0 blocks
==26698== indirectly lost​: 0 bytes in 0 blocks
==26698== possibly lost​: 0 bytes in 0 blocks
==26698== still reachable​: 112,854 bytes in 525 blocks
==26698== suppressed​: 0 bytes in 0 blocks
==26698== Rerun with --leak-check=full to see details of leaked memory
==26698==
==26698== For counts of detected and suppressed errors, rerun with​: -v
==26698== Use --track-origins=yes to see where uninitialised values come from
==26698== ERROR SUMMARY​: 18 errors from 16 contexts (suppressed​: 0 from 0)

(gdb) r
Starting program​: /home/dcollins/toolchain/perldebug/perl -Ilib assert78
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, Perl_re_op_compile (patternp=<optimized out>, pat_count=<optimized out>, expr=<optimized out>, eng=0x6a80a0 <PL_core_reg_engine>, old_re=0x0, is_bare_re=<optimized out>,
  orig_rx_flags=2048, pm_flags=0) at regcomp.c​:7012
7012 assert(*RExC_end == '\0');
(gdb) bt
#0 Perl_re_op_compile (patternp=<optimized out>, pat_count=<optimized out>, expr=<optimized out>, eng=0x6a80a0 <PL_core_reg_engine>, old_re=0x0, is_bare_re=<optimized out>, orig_rx_flags=2048,
  pm_flags=0) at regcomp.c​:7012
#1 0x000000000044e20c in Perl_pmruntime (o=0x9440b0, expr=expr@​entry=0x944180, floor=0, isreg=false, repl=0x0) at op.c​:5756
#2 0x000000000044eb5f in Perl_ck_split (o=0x944138) at op.c​:11147
#3 0x0000000000436784 in Perl_op_convert_list (type=156, flags=<optimized out>, flags@​entry=0, o=0x944138) at op.c​:4782
#4 0x000000000049d068 in Perl_yyparse (gramtype=gramtype@​entry=258) at perly.y​:883
#5 0x00000000004593f0 in S_parse_body (env=env@​entry=0x0, xsinit=xsinit@​entry=0x4211c0 <xs_init>) at perl.c​:2374
#6 0x000000000045ae6d in perl_parse (my_perl=<optimized out>, xsinit=xsinit@​entry=0x4211c0 <xs_init>, argc=3, argv=<optimized out>, env=env@​entry=0x0) at perl.c​:1689
#7 0x0000000000421020 in main (argc=<optimized out>, argv=<optimized out>, env=<optimized out>) at perlmain.c​:121
(gdb) p *pRExC_state
$14 = {flags = 2304, pm_flags = 0, precomp = 0x942050 "", precomp_end = 0x942051 "\302\005\367\377\177", rx_sv = 0x20, rx = 0x74696c704426, rxi = 0xa, start = 0x942050 "",
  end = 0x942051 "\302\005\367\377\177", parse = 0x942050 "", adjusted_start = 0x942050 "", precomp_adj = 0, whilem_seen = 0, emit_start = 0x7fffffffddc8, emit_bound = 0x7fffffffddc0,
  emit = 0x7fffffffdcf8, emit_dummy = {flags = 2 '\002', type = 0 '\000', next_off = 0, arg1 = 0,
  bitmap = "\n\000\000\000\000\000\000\000@​", '\000' <repeats 15 times>, "\002\000\000\000\060\000\000", classflags = 0, invlist = 0x0}, naughty = 0, sawback = 0, seen = 0, size = 0, npar = 1,
  nestroot = 0, extralen = 0, seen_zerolen = 0, open_parens = 0x0, close_parens = 0x0, end_op = 0x0, utf8 = 536870912, orig_utf8 = 536870912, uni_semantics = 0, paren_names = 0x0, recurse = 0x0,
  recurse_count = 0, study_chunk_recursed = 0x0, study_chunk_recursed_bytes = 0, in_lookbehind = 0, contains_locale = 0, contains_i = 0, override_recoding = 0, in_multi_char_class = 0,
  code_blocks = 0x0, num_code_blocks = 0, code_index = 0, maxlen = 0, frame_head = 0x0, frame_last = 0x0, frame_count = 0, warn_text = 0x0, runtime_code_qr = 0x0,
  lastparse = 0xffffffffffffffff <error​: Cannot access memory at address 0xffffffffffffffff>, lastnum = 7, paren_name_list = 0x0, study_chunk_recursed_count = 9715808, mysv1 = 0x1, mysv2 = 0x0,
  seen_unfolded_sharp_s = false, strict = false, study_started = false}
(gdb)

--
Respectfully,
Dan Collins

@p5pRT
Copy link
Author

p5pRT commented Sep 17, 2016

From @demerphq

I looked into this a bit yesterday.

The short version is that we end up creating an SVpv of length 1 which
contains a null with no following null.

While this is documented as legal (we have a length field for a reason) the
regex engine for reasons of its own requires the trailing null.

I have a fix to the regex engine to handle this right but so far I have
been frustrated by valgrind changing behavior so the string does have the
trailing null but considers it uninitialized memory.

IOW I can fix the assert but so far not the valgrind complaints about
uninitialized memory. I will push patches later for folks to review.

Yved

On 16 Sep 2016 6​:43 p.m., "Dan Collins via RT" <perlbug-comment@​perl.org>
wrote​:

On Fri Sep 16 14​:40​:54 2016, brian.carpenter@​gmail.com wrote​:

Triggered in v5.25.5 (v5.25.4-130-g7aa7bbc) w/ AFL + ASAN.

od -tx1 assert78
0000000 73 70 6c 69 74 00 44 26 2e 35 30 30 2e 30
0000016

./perl assert78
perl​: regcomp.c​:7012​: REGEXP *Perl_re_op_compile(SV **const, int, OP
*, const regexp_engine *, REGEXP *, _Bool *, U32, U32)​: Assertion
`*(pRExC_state->end) == '\0'' failed.
Aborted

==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4DC93D​: Perl_re_op_compile (regcomp.c​:7012)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C46​: S_reg.constprop.51 (regcomp.c​:10602)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4A4B97​: S_skip_to_be_ignored_text (regcomp.c​:18242)
==26698== by 0x4C7159​: S_regatom (regcomp.c​:13418)
==26698== by 0x4CA069​: S_regpiece (regcomp.c​:11666)
==26698== by 0x4CA069​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4A4B97​: S_skip_to_be_ignored_text (regcomp.c​:18242)
==26698== by 0x4C72C5​: S_regatom (regcomp.c​:13814)
==26698== by 0x4CA069​: S_regpiece (regcomp.c​:11666)
==26698== by 0x4CA069​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA082​: S_regpiece (regcomp.c​:11677)
==26698== by 0x4CA082​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA08E​: S_regpiece (regcomp.c​:11808)
==26698== by 0x4CA08E​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA098​: S_regpiece (regcomp.c​:11808)
==26698== by 0x4CA098​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C7F​: S_reg.constprop.51 (regcomp.c​:11340)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C9A​: S_reg.constprop.51 (regcomp.c​:11363)
==26698== by 0x4DC9A0​: Perl_re_op_compile (regcomp.c​:7030)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C46​: S_reg.constprop.51 (regcomp.c​:10602)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4C72DC​: S_regatom (regcomp.c​:13816)
==26698== by 0x4CA069​: S_regpiece (regcomp.c​:11666)
==26698== by 0x4CA069​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA082​: S_regpiece (regcomp.c​:11677)
==26698== by 0x4CA082​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA08E​: S_regpiece (regcomp.c​:11808)
==26698== by 0x4CA08E​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4CA098​: S_regpiece (regcomp.c​:11808)
==26698== by 0x4CA098​: S_regbranch (regcomp.c​:11591)
==26698== by 0x4D0C66​: S_reg.constprop.51 (regcomp.c​:11329)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C7F​: S_reg.constprop.51 (regcomp.c​:11340)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698== Conditional jump or move depends on uninitialised value(s)
==26698== at 0x4D0C9A​: S_reg.constprop.51 (regcomp.c​:11363)
==26698== by 0x4DE376​: Perl_re_op_compile (regcomp.c​:7248)
==26698== by 0x44E20B​: Perl_pmruntime.constprop.44 (op.c​:5756)
==26698== by 0x44EB5E​: Perl_ck_split (op.c​:11147)
==26698== by 0x436783​: Perl_op_convert_list (op.c​:4782)
==26698== by 0x49D067​: Perl_yyparse (perly.y​:883)
==26698== by 0x4593EF​: S_parse_body (perl.c​:2374)
==26698== by 0x45AE6C​: perl_parse (perl.c​:1689)
==26698== by 0x42101F​: main (perlmain.c​:121)
==26698==
==26698==
==26698== HEAP SUMMARY​:
==26698== in use at exit​: 112,854 bytes in 525 blocks
==26698== total heap usage​: 702 allocs, 177 frees, 148,431 bytes
allocated
==26698==
==26698== LEAK SUMMARY​:
==26698== definitely lost​: 0 bytes in 0 blocks
==26698== indirectly lost​: 0 bytes in 0 blocks
==26698== possibly lost​: 0 bytes in 0 blocks
==26698== still reachable​: 112,854 bytes in 525 blocks
==26698== suppressed​: 0 bytes in 0 blocks
==26698== Rerun with --leak-check=full to see details of leaked memory
==26698==
==26698== For counts of detected and suppressed errors, rerun with​: -v
==26698== Use --track-origins=yes to see where uninitialised values come
from
==26698== ERROR SUMMARY​: 18 errors from 16 contexts (suppressed​: 0 from 0)

(gdb) r
Starting program​: /home/dcollins/toolchain/perldebug/perl -Ilib assert78
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, Perl_re_op_compile (patternp=<optimized out>,
pat_count=<optimized out>, expr=<optimized out>, eng=0x6a80a0
<PL_core_reg_engine>, old_re=0x0, is_bare_re=<optimized out>,
orig_rx_flags=2048, pm_flags=0) at regcomp.c​:7012
7012 assert(*RExC_end == '\0');
(gdb) bt
#0 Perl_re_op_compile (patternp=<optimized out>, pat_count=<optimized
out>, expr=<optimized out>, eng=0x6a80a0 <PL_core_reg_engine>, old_re=0x0,
is_bare_re=<optimized out>, orig_rx_flags=2048,
pm_flags=0) at regcomp.c​:7012
#1 0x000000000044e20c in Perl_pmruntime (o=0x9440b0, expr=expr@​entry=0x944180,
floor=0, isreg=false, repl=0x0) at op.c​:5756
#2 0x000000000044eb5f in Perl_ck_split (o=0x944138) at op.c​:11147
#3 0x0000000000436784 in Perl_op_convert_list (type=156, flags=<optimized
out>, flags@​entry=0, o=0x944138) at op.c​:4782
#4 0x000000000049d068 in Perl_yyparse (gramtype=gramtype@​entry=258) at
perly.y​:883
#5 0x00000000004593f0 in S_parse_body (env=env@​entry=0x0,
xsinit=xsinit@​entry=0x4211c0 <xs_init>) at perl.c​:2374
#6 0x000000000045ae6d in perl_parse (my_perl=<optimized out>,
xsinit=xsinit@​entry=0x4211c0 <xs_init>, argc=3, argv=<optimized out>,
env=env@​entry=0x0) at perl.c​:1689
#7 0x0000000000421020 in main (argc=<optimized out>, argv=<optimized
out>, env=<optimized out>) at perlmain.c​:121
(gdb) p *pRExC_state
$14 = {flags = 2304, pm_flags = 0, precomp = 0x942050 "", precomp_end =
0x942051 "\302\005\367\377\177", rx_sv = 0x20, rx = 0x74696c704426, rxi =
0xa, start = 0x942050 "",
end = 0x942051 "\302\005\367\377\177", parse = 0x942050 "",
adjusted_start = 0x942050 "", precomp_adj = 0, whilem_seen = 0, emit_start
= 0x7fffffffddc8, emit_bound = 0x7fffffffddc0,
emit = 0x7fffffffdcf8, emit_dummy = {flags = 2 '\002', type = 0 '\000',
next_off = 0, arg1 = 0,
bitmap = "\n\000\000\000\000\000\000\000@​", '\000' <repeats 15
times>, "\002\000\000\000\060\000\000", classflags = 0, invlist = 0x0},
naughty = 0, sawback = 0, seen = 0, size = 0, npar = 1,
nestroot = 0, extralen = 0, seen_zerolen = 0, open_parens = 0x0,
close_parens = 0x0, end_op = 0x0, utf8 = 536870912, orig_utf8 = 536870912,
uni_semantics = 0, paren_names = 0x0, recurse = 0x0,
recurse_count = 0, study_chunk_recursed = 0x0,
study_chunk_recursed_bytes = 0, in_lookbehind = 0, contains_locale = 0,
contains_i = 0, override_recoding = 0, in_multi_char_class = 0,
code_blocks = 0x0, num_code_blocks = 0, code_index = 0, maxlen = 0,
frame_head = 0x0, frame_last = 0x0, frame_count = 0, warn_text = 0x0,
runtime_code_qr = 0x0,
lastparse = 0xffffffffffffffff <error​: Cannot access memory at address
0xffffffffffffffff>, lastnum = 7, paren_name_list = 0x0,
study_chunk_recursed_count = 9715808, mysv1 = 0x1, mysv2 = 0x0,
seen_unfolded_sharp_s = false, strict = false, study_started = false}
(gdb)

--
Respectfully,
Dan Collins

@p5pRT
Copy link
Author

p5pRT commented Sep 17, 2016

From @cpansprout

On Sat Sep 17 07​:14​:00 2016, demerphq wrote​:

I looked into this a bit yesterday.

The short version is that we end up creating an SVpv of length 1 which
contains a null with no following null.

Wow. I had no idea this would happen​:

$ ./perl -Ilib -MDevel​::Peek -e 'Dump "D" & "\0\x{500}"'
SV = PV(0x7fb4d10074f0) at 0x7fb4d1030a00
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,PROTECT,pPOK,UTF8)
  PV = 0x7fb4d0c12050 "\0" [UTF8 "\x{0}"]
  CUR = 1
  LEN = 10

(PV usually ends with "\0 and not just " by itself.)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 17, 2016

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Sep 17, 2016

From @demerphq

On 17 Sep 2016 10​:39 a.m., "Father Chrysostomos via RT" <
perlbug-followup@​perl.org> wrote​:

On Sat Sep 17 07​:14​:00 2016, demerphq wrote​:

I looked into this a bit yesterday.

The short version is that we end up creating an SVpv of length 1 which
contains a null with no following null.

Wow. I had no idea this would happen​:

$ ./perl -Ilib -MDevel​::Peek -e 'Dump "D" & "\0\x{500}"'
SV = PV(0x7fb4d10074f0) at 0x7fb4d1030a00
REFCNT = 1
FLAGS = (PADTMP,POK,READONLY,PROTECT,pPOK,UTF8)
PV = 0x7fb4d0c12050 "\0" [UTF8 "\x{0}"]
CUR = 1
LEN = 10

(PV usually ends with "\0 and not just " by itself.)

Pv's from perl normally do yes. However when I check the code it looks
like it should already be adding the required null.

So we have two bugs here. The first is basically what you showed the
second is what I have fixed in the regex engine, in that the internals
should not expect or require the trailing null and if some part does, like
the regex engine it should guard against missing nulls itself.

Then there is the strange way that valgrind changes the behaviour of this
code. I bet if you run that one-liner under valgrind the string ends up
with the additional null but valgrind complains said null is uninitialized.

Yves

@p5pRT
Copy link
Author

p5pRT commented Sep 17, 2016

From @cpansprout

On Sat Sep 17 07​:46​:32 2016, demerphq wrote​:

On 17 Sep 2016 10​:39 a.m., "Father Chrysostomos via RT" <
perlbug-followup@​perl.org> wrote​:

On Sat Sep 17 07​:14​:00 2016, demerphq wrote​:

I looked into this a bit yesterday.

The short version is that we end up creating an SVpv of length 1 which
contains a null with no following null.

Wow. I had no idea this would happen​:

$ ./perl -Ilib -MDevel​::Peek -e 'Dump "D" & "\0\x{500}"'
SV = PV(0x7fb4d10074f0) at 0x7fb4d1030a00
REFCNT = 1
FLAGS = (PADTMP,POK,READONLY,PROTECT,pPOK,UTF8)
PV = 0x7fb4d0c12050 "\0" [UTF8 "\x{0}"]
CUR = 1
LEN = 10

(PV usually ends with "\0 and not just " by itself.)

Pv's from perl normally do yes. However when I check the code it looks
like it should already be adding the required null.

So we have two bugs here. The first is basically what you showed the
second is what I have fixed in the regex engine, in that the internals
should not expect or require the trailing null and if some part does, like
the regex engine it should guard against missing nulls itself.

If we fix the first, then testing the second is harder. Maybe we need an XS​::APItest​::string_without_null(...) function that can be used in yet another .t file that runs re_tests.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 17, 2016

From @demerphq

On 17 September 2016 at 19​:05, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Sat Sep 17 07​:46​:32 2016, demerphq wrote​:

On 17 Sep 2016 10​:39 a.m., "Father Chrysostomos via RT" <
perlbug-followup@​perl.org> wrote​:

On Sat Sep 17 07​:14​:00 2016, demerphq wrote​:

I looked into this a bit yesterday.

The short version is that we end up creating an SVpv of length 1 which
contains a null with no following null.

Wow. I had no idea this would happen​:

$ ./perl -Ilib -MDevel​::Peek -e 'Dump "D" & "\0\x{500}"'
SV = PV(0x7fb4d10074f0) at 0x7fb4d1030a00
REFCNT = 1
FLAGS = (PADTMP,POK,READONLY,PROTECT,pPOK,UTF8)
PV = 0x7fb4d0c12050 "\0" [UTF8 "\x{0}"]
CUR = 1
LEN = 10

(PV usually ends with "\0 and not just " by itself.)

Pv's from perl normally do yes. However when I check the code it looks
like it should already be adding the required null.

So we have two bugs here. The first is basically what you showed the
second is what I have fixed in the regex engine, in that the internals
should not expect or require the trailing null and if some part does, like
the regex engine it should guard against missing nulls itself.

If we fix the first, then testing the second is harder. Maybe we need an XS​::APItest​::string_without_null(...) function that can be used in yet another .t file that runs re_tests.

And maybe elsewhere too

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Sep 17, 2016

From @demerphq

On 17 September 2016 at 16​:46, demerphq <demerphq@​gmail.com> wrote​:

On 17 Sep 2016 10​:39 a.m., "Father Chrysostomos via RT"
<perlbug-followup@​perl.org> wrote​:

On Sat Sep 17 07​:14​:00 2016, demerphq wrote​:

I looked into this a bit yesterday.

The short version is that we end up creating an SVpv of length 1 which
contains a null with no following null.

Wow. I had no idea this would happen​:

$ ./perl -Ilib -MDevel​::Peek -e 'Dump "D" & "\0\x{500}"'
SV = PV(0x7fb4d10074f0) at 0x7fb4d1030a00
REFCNT = 1
FLAGS = (PADTMP,POK,READONLY,PROTECT,pPOK,UTF8)
PV = 0x7fb4d0c12050 "\0" [UTF8 "\x{0}"]
CUR = 1
LEN = 10

(PV usually ends with "\0 and not just " by itself.)

Pv's from perl normally do yes. However when I check the code it looks like
it should already be adding the required null.

So we have two bugs here. The first is basically what you showed the
second is what I have fixed in the regex engine, in that the internals
should not expect or require the trailing null and if some part does, like
the regex engine it should guard against missing nulls itself.

Then there is the strange way that valgrind changes the behaviour of this
code. I bet if you run that one-liner under valgrind the string ends up
with the additional null but valgrind complains said null is uninitialized.

And as I suspected it does. A bit of digging reveals that if you use
--malloc-fill=1 or something like that with valgrind then this doesn't
happen.

Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Sep 17, 2016

From @cpansprout

On Sat Sep 17 10​:05​:50 2016, sprout wrote​:

If we fix the first, then testing the second is harder. Maybe we need
an XS​::APItest​::string_without_null(...) function that can be used in
yet another .t file that runs re_tests.

I have just added such a function, along with t/re/regexp_nonull.t, which uses it.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 18, 2016

From @cpansprout

On Sat Sep 17 11​:08​:33 2016, sprout wrote​:

On Sat Sep 17 10​:05​:50 2016, sprout wrote​:

If we fix the first, then testing the second is harder. Maybe we
need
an XS​::APItest​::string_without_null(...) function that can be used in
yet another .t file that runs re_tests.

I have just added such a function, along with t/re/regexp_nonull.t,
which uses it.

(There are follow-up comments in the thread starting at <CANgJU+XSVs0c4EuHs0NmRk24rOybYj8WwjezZD6Aqrd+rdEuFQ@​mail.gmail.com>.)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 19, 2016

From @cpansprout

On Sat Sep 17 10​:45​:41 2016, demerphq wrote​:

On 17 September 2016 at 16​:46, demerphq <demerphq@​gmail.com> wrote​:

On 17 Sep 2016 10​:39 a.m., "Father Chrysostomos via RT"
<perlbug-followup@​perl.org> wrote​:

On Sat Sep 17 07​:14​:00 2016, demerphq wrote​:

I looked into this a bit yesterday.

The short version is that we end up creating an SVpv of length 1
which
contains a null with no following null.

Wow. I had no idea this would happen​:

$ ./perl -Ilib -MDevel​::Peek -e 'Dump "D" & "\0\x{500}"'
SV = PV(0x7fb4d10074f0) at 0x7fb4d1030a00
REFCNT = 1
FLAGS = (PADTMP,POK,READONLY,PROTECT,pPOK,UTF8)
PV = 0x7fb4d0c12050 "\0" [UTF8 "\x{0}"]
CUR = 1
LEN = 10

(PV usually ends with "\0 and not just " by itself.)

Pv's from perl normally do yes. However when I check the code it
looks like
it should already be adding the required null.

So we have two bugs here. The first is basically what you showed
the
second is what I have fixed in the regex engine, in that the
internals
should not expect or require the trailing null and if some part does,
like
the regex engine it should guard against missing nulls itself.

Then there is the strange way that valgrind changes the behaviour of
this
code. I bet if you run that one-liner under valgrind the string ends
up
with the additional null but valgrind complains said null is
uninitialized.

And as I suspected it does. A bit of digging reveals that if you use
--malloc-fill=1 or something like that with valgrind then this doesn't
happen.

The second byte of &’s target was not being written to an any point after allocation, so it was uninitialized. This variant writes a long string to that target before doing a utf-8 bitwise and​:

./perl -Ilib -XMfeature=​:all -e 'for (["aaa","aaa"],[substr ("a\x{100}",0,1), "a"]) { use Devel​::Peek; Dump $$_[0] &. $$_[1] }'
SV = PV(0x7feabb0073b0) at 0x7feabb037c10
  REFCNT = 1
  FLAGS = (PADTMP,POK,pPOK)
  PV = 0x7feabac18790 "aaa"\0
  CUR = 3
  LEN = 10
SV = PV(0x7feabb0073b0) at 0x7feabb037c10
  REFCNT = 1
  FLAGS = (PADTMP,POK,pPOK,UTF8)
  PV = 0x7feabac18790 "a" [UTF8 "a"]
  CUR = 1
  LEN = 10

It gives the same results under valgrind.

So I have used that approach in the test added in commit b43665f, which fixes the bug.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 19, 2016

From @demerphq

On 19 September 2016 at 05​:21, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Sat Sep 17 10​:45​:41 2016, demerphq wrote​:

On 17 September 2016 at 16​:46, demerphq <demerphq@​gmail.com> wrote​:

On 17 Sep 2016 10​:39 a.m., "Father Chrysostomos via RT"
<perlbug-followup@​perl.org> wrote​:

On Sat Sep 17 07​:14​:00 2016, demerphq wrote​:

I looked into this a bit yesterday.

The short version is that we end up creating an SVpv of length 1
which
contains a null with no following null.

Wow. I had no idea this would happen​:

$ ./perl -Ilib -MDevel​::Peek -e 'Dump "D" & "\0\x{500}"'
SV = PV(0x7fb4d10074f0) at 0x7fb4d1030a00
REFCNT = 1
FLAGS = (PADTMP,POK,READONLY,PROTECT,pPOK,UTF8)
PV = 0x7fb4d0c12050 "\0" [UTF8 "\x{0}"]
CUR = 1
LEN = 10

(PV usually ends with "\0 and not just " by itself.)

Pv's from perl normally do yes. However when I check the code it
looks like
it should already be adding the required null.

So we have two bugs here. The first is basically what you showed
the
second is what I have fixed in the regex engine, in that the
internals
should not expect or require the trailing null and if some part does,
like
the regex engine it should guard against missing nulls itself.

Then there is the strange way that valgrind changes the behaviour of
this
code. I bet if you run that one-liner under valgrind the string ends
up
with the additional null but valgrind complains said null is
uninitialized.

And as I suspected it does. A bit of digging reveals that if you use
--malloc-fill=1 or something like that with valgrind then this doesn't
happen.

The second byte of &’s target was not being written to an any point after allocation, so it was uninitialized. This variant writes a long string to that target before doing a utf-8 bitwise and​:

./perl -Ilib -XMfeature=​:all -e 'for (["aaa","aaa"],[substr ("a\x{100}",0,1), "a"]) { use Devel​::Peek; Dump $$_[0] &. $$_[1] }'
SV = PV(0x7feabb0073b0) at 0x7feabb037c10
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK)
PV = 0x7feabac18790 "aaa"\0
CUR = 3
LEN = 10
SV = PV(0x7feabb0073b0) at 0x7feabb037c10
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK,UTF8)
PV = 0x7feabac18790 "a" [UTF8 "a"]
CUR = 1
LEN = 10

It gives the same results under valgrind.

So I have used that approach in the test added in commit b43665f, which fixes the bug.

Cool. Nice work. Also I have merged your regexp_nonull.t to blead
along with my fix.

As far as I can tell we can close this ticket now.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Sep 20, 2016

From @cpansprout

On Mon Sep 19 13​:34​:28 2016, demerphq wrote​:

Cool. Nice work. Also I have merged your regexp_nonull.t to blead
along with my fix.

Thank you.

As far as I can tell we can close this ticket now.

Done.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 20, 2016

@cpansprout - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented Oct 13, 2016

From @xsawyerx

On Mon Sep 05 09​:24​:02 2016, davem wrote​:

On Tue, Aug 23, 2016 at 06​:15​:07PM -0700, Tony Cook via RT wrote​:

This issue is now public in #129058, though the security implications
haven't been mentioned.

On Mon Aug 22 19​:00​:31 2016, zefram@​fysh.org wrote​:

[...]
Unlikely, yes, but the semantics are notionally guaranteed. I
think
it
is proper to treat this as a security issue.

Ok.

Does it deserve a release, or just a published fix? (#129058 includes
the
same fix I gave, with a longer commit message.)

I think this is obscure enough to just push the fix and have a
peldelta entry.

Should we assign a CVE number to this?

@p5pRT
Copy link
Author

p5pRT commented Nov 28, 2016

From @geeknik

Any word on a CVE assignment for this flaw?

@p5pRT
Copy link
Author

p5pRT commented Nov 28, 2016

From @xsawyerx

On Mon, 28 Nov 2016 01​:48​:07 -0800, brian.carpenter@​gmail.com wrote​:

Any word on a CVE assignment for this flaw?

Considering the syntax *does* allow this (without specialized "pack"/"unpack" input), we should consider this a low-risk security issue. Does anyone object or has a different opinion?

@p5pRT
Copy link
Author

p5pRT commented Dec 22, 2016

From @xsawyerx

On Mon, 28 Nov 2016 02​:01​:59 -0800, xsawyerx@​cpan.org wrote​:

On Mon, 28 Nov 2016 01​:48​:07 -0800, brian.carpenter@​gmail.com wrote​:

Any word on a CVE assignment for this flaw?

Considering the syntax *does* allow this (without specialized
"pack"/"unpack" input), we should consider this a low-risk security
issue. Does anyone object or has a different opinion?

After review again (and well, again...), I concluded this does not require a CVE number since the likelihood is exceptionally low. If no one objects, I now believe we should just resolve it, push a fix, and reflect it in perldelta.

(Sorry for the flip flop.)

@p5pRT
Copy link
Author

p5pRT commented Jan 24, 2017

From @tonycoz

On Thu, 22 Dec 2016 06​:21​:28 -0800, xsawyerx@​cpan.org wrote​:

After review again (and well, again...), I concluded this does not
require a CVE number since the likelihood is exceptionally low. If no
one objects, I now believe we should just resolve it, push a fix, and
reflect it in perldelta.

(Sorry for the flip flop.)

This turns out to be duplicated by both 129058 *and* 129287, the latter of which was fixed in b43665f and documented in perl5255delta.pod.

I'll make this ticket public and merge both this and 129058 into 129287.

Tony

@p5pRT
Copy link
Author

p5pRT commented Jan 24, 2017

From @tonycoz

On Tue, 23 Aug 2016 09​:58​:01 -0700, dcollinsn@​gmail.com wrote​:

Hello!

While fuzzing the a quadmath build of the perl interpreter, I
encountered the following testcase​:

perl -e '0|v1000&E'

This was fixed by b43665f which was the fix for 129287.

Tony

@p5pRT
Copy link
Author

p5pRT commented Jan 24, 2017

@tonycoz - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented May 30, 2017

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.26.0, this and 210 other issues have been
resolved.

Perl 5.26.0 may be downloaded via​:
https://metacpan.org/release/XSAWYERX/perl-5.26.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT
Copy link
Author

p5pRT commented May 30, 2017

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant