Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

toke.c:3813: char *S_scan_const(char *): Assertion Failed. #15790

Closed
p5pRT opened this issue Jan 3, 2017 · 13 comments
Closed

toke.c:3813: char *S_scan_const(char *): Assertion Failed. #15790

p5pRT opened this issue Jan 3, 2017 · 13 comments

Comments

@p5pRT
Copy link

p5pRT commented Jan 3, 2017

Migrated from rt.perl.org#130495 (status was 'resolved')

Searchable as RT130495$

@p5pRT
Copy link
Author

p5pRT commented Jan 3, 2017

From @geeknik

Triggered with Perl v5.25.8-132-gc10193e while fuzzing with AFL.

od -tx1 test097
0000000 6d 00 20 00 30 30 30 6b 23 00 30 5c 30 30 78 00
0000020

./perl test097
perl​: toke.c​:3813​: char *S_scan_const(char *)​: Assertion
`(__builtin_expect(((((U8 *) send) <= ((U8 *) s)) ? (_Bool)1 :
(_Bool)0),(0)) ? 0 : (((unsigned long)(((*(U8 *) s) | 0) | 0) < 128)) ? 1 :
__builtin_expect((((((U8 *) send) - ((U8 *) s)) < PL_utf8skip[*(const
U8*)((U8 *) s)]) ? (_Bool)1 : (_Bool)0),(0)) ? 0 : __builtin_expect(((((((
(sizeof(*(U8 *) s) == 1) || !(((unsigned long)((*(U8 *) s) | 0)) & ~0xFF)))
? (void) (0) : __assert_fail ("( (sizeof(*(U8 *) s) == 1) || !(((unsigned
long)((*(U8 *) s) | 0)) & ~0xFF))", "toke.c", 3813, __PRETTY_FUNCTION__)),
((U8) (*(U8 *) s))) <= 0xF7) ? (_Bool)1 : (_Bool)0),(1)) ? ( ( 0xC2 <=
((const U8*)(U8 *) s)[0] && ((const U8*)(U8 *) s)[0] <= 0xDF ) ? (
__builtin_expect(((( ((const U8*)(U8 *) s)[1] & 0xC0 ) == 0x80) ? (_Bool)1
: (_Bool)0),(1)) ? 2 : 0 ) : ( 0xE0 == ((const U8*)(U8 *) s)[0] ) ? (
__builtin_expect(((( ( ((const U8*)(U8 *) s)[1] & 0xE0 ) == 0xA0 ) && ( (
((const U8*)(U8 *) s)[2] & 0xC0 ) == 0x80 )) ? (_Bool)1 : (_Bool)0),(1)) ?
3 : 0 )​: ( 0xE1 <= ((const U8*)(U8 *) s)[0] && ((const U8*)(U8 *) s)[0] <=
0xEF ) ? ( __builtin_expect(((( ( ((const U8*)(U8 *) s)[1] & 0xC0 ) == 0x80
) && ( ( ((const U8*)(U8 *) s)[2] & 0xC0 ) == 0x80 )) ? (_Bool)1 :
(_Bool)0),(1)) ? 3 : 0 )​: ( 0xF0 == ((const U8*)(U8 *) s)[0] ) ? (
__builtin_expect(((( ( 0x90 <= ((const U8*)(U8 *) s)[1] && ((const U8*)(U8
*) s)[1] <= 0xBF ) && ( ( ((const U8*)(U8 *) s)[2] & 0xC0 ) == 0x80 ) ) &&
( ( ((const U8*)(U8 *) s)[3] & 0xC0 ) == 0x80 )) ? (_Bool)1 :
(_Bool)0),(1)) ? 4 : 0 )​: ( ( ( ( 0xF1 <= ((const U8*)(U8 *) s)[0] &&
((const U8*)(U8 *) s)[0] <= 0xF7 ) && __builtin_expect(((( ((const U8*)(U8
*) s)[1] & 0xC0 ) == 0x80) ? (_Bool)1 : (_Bool)0),(1)) ) &&
__builtin_expect(((( ((const U8*)(U8 *) s)[2] & 0xC0 ) == 0x80) ? (_Bool)1
: (_Bool)0),(1)) ) && __builtin_expect(((( ((const U8*)(U8 *) s)[3] & 0xC0
) == 0x80) ? (_Bool)1 : (_Bool)0),(1)) ) ? 4 : 0 ) :
Perl__is_utf8_char_helper((U8 *) s, (U8 *) send, 0))' failed.
Aborted

@p5pRT
Copy link
Author

p5pRT commented Jan 3, 2017

From @geeknik

test097.gz

@p5pRT
Copy link
Author

p5pRT commented Jan 4, 2017

From @hvds

On Tue, 03 Jan 2017 10​:57​:21 -0800, brian.carpenter@​gmail.com wrote​:

Triggered with Perl v5.25.8-132-gc10193e while fuzzing with AFL.

od -tx1 test097
0000000 6d 00 20 00 30 30 30 6b 23 00 30 5c 30 30 78 00
0000020

./perl test097
perl​: toke.c​:3813​: char *S_scan_const(char *)​: Assertion
`[...snip...]' failed.
Aborted

I'm slightly confused here; I think first we see "m\x{0} \x{0}" which triggers UTF-16LE detection, so the input gets converted into "m \x{3030}\x{6b30}#\x{5c30}\x{3030}x".

Next we parse that as a pattern, with \x{3030} as the delimiter, and the //x flag.

Next we see the '#', and attempt to scan past the //x-enabled comment with this at toke.c​:3260​:
  while (s+1 < send && *s != '\n')
  *d++ = *s++;

The string it's trying to scan past at that point is "#\x{5c30}" encoded as {23 e5 b0 b0}. Since send points past the end of our pattern, this does one byte too little, leaving s pointing at the trailing \xb0. We then try to process that remaining byte, and think we've seen invalid utf8.

I'm confused though that if I try to bypass the UTF-16LE conversion I don't get the same assert​:
  perl -e 'print "m \x{3030}\x{6b30}#\x{5c30}\x{3030}x"' | perl -CI
.. so I think I'm still missing something - and indeed, if I replace 's+1' with 's' in the above check I get test failures.

I'll try to dig further, but it may need more eyes.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jan 4, 2017

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jan 4, 2017

From @hvds

On Wed, 04 Jan 2017 05​:00​:20 -0800, hv wrote​:

On Tue, 03 Jan 2017 10​:57​:21 -0800, brian.carpenter@​gmail.com wrote​:

Triggered with Perl v5.25.8-132-gc10193e while fuzzing with AFL.

od -tx1 test097
0000000 6d 00 20 00 30 30 30 6b 23 00 30 5c 30 30 78 00
0000020

./perl test097
perl​: toke.c​:3813​: char *S_scan_const(char *)​: Assertion
`[...snip...]' failed.
Aborted

I'm slightly confused here; I think first we see "m\x{0} \x{0}" which
triggers UTF-16LE detection, so the input gets converted into "m
\x{3030}\x{6b30}#\x{5c30}\x{3030}x".

Next we parse that as a pattern, with \x{3030} as the delimiter, and
the //x flag.

Next we see the '#', and attempt to scan past the //x-enabled comment
with this at toke.c​:3260​:
while (s+1 < send && *s != '\n')
*d++ = *s++;

The string it's trying to scan past at that point is "#\x{5c30}"
encoded as {23 e5 b0 b0}. Since send points past the end of our
pattern, this does one byte too little, leaving s pointing at the
trailing \xb0. We then try to process that remaining byte, and think
we've seen invalid utf8.

I'm confused though that if I try to bypass the UTF-16LE conversion I
don't get the same assert​:
perl -e 'print "m \x{3030}\x{6b30}#\x{5c30}\x{3030}x"' | perl -CI
.. so I think I'm still missing something - and indeed, if I replace
's+1' with 's' in the above check I get test failures.

I'll try to dig further, but it may need more eyes.

Hugo

Oh, I think I understand - two wrongs make a right, so failing to skip all the characters it should meant it was ok for there to be no s==send check at the end of the if/else chain.

So I think the fix should actually look like the below. I'll do some more testing.

Hugo
--- a/toke.c
+++ b/toke.c
@​@​ -3257,7 +3257,7 @​@​ S_scan_const(pTHX_ char *start)
  && !in_charclass
  && ((PMOP*)PL_lex_inpat)->op_pmflags & RXf_PMf_EXTENDED)
  {
- while (s+1 < send && *s != '\n')
+ while (s < send && *s != '\n')
  *d++ = *s++;
  }

@​@​ -3298,6 +3298,11 @​@​ S_scan_const(pTHX_ char *start)

  /* End of else if chain - OP_TRANS rejoin rest */

+ if (UNLIKELY(s >= send)) {
+ assert(s == send);
+ break;
+ }
+
  /* backslashes */
  if (*s == '\\' && s+1 < send) {
  char* e; /* Can be used for ending '}', etc. */

@p5pRT
Copy link
Author

p5pRT commented Jan 4, 2017

From @hvds

On Wed, 04 Jan 2017 05​:28​:35 -0800, hv wrote​:
[...]

So I think the fix should actually look like the below. I'll do some
more testing.

Hugo
--- a/toke.c
+++ b/toke.c
@​@​ -3257,7 +3257,7 @​@​ S_scan_const(pTHX_ char *start)
&& !in_charclass
&& ((PMOP*)PL_lex_inpat)->op_pmflags &
RXf_PMf_EXTENDED)
{
- while (s+1 < send && *s != '\n')
+ while (s < send && *s != '\n')
*d++ = *s++;
}

@​@​ -3298,6 +3298,11 @​@​ S_scan_const(pTHX_ char *start)

/* End of else if chain - OP_TRANS rejoin rest */

+ if (UNLIKELY(s >= send)) {
+ assert(s == send);
+ break;
+ }
+
/* backslashes */
if (*s == '\\' && s+1 < send) {
char* e; /* Can be used for ending '}', etc. */

Should be fixed by 8faf4f3​:
  [perl #130495] /x comment skipping stops a byte short
 
  If that byte was part of a utf-8 character, this caused inappropriate
  "malformed utf8" warnings or assertions.
 
  In principle this should also skip the newline, but failing to do so
  is safe.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jan 4, 2017

@hvds - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented Jan 6, 2017

From @craigberry

On Wed, Jan 4, 2017 at 9​:03 AM, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

On Wed, 04 Jan 2017 05​:28​:35 -0800, hv wrote​:
[...]

So I think the fix should actually look like the below. I'll do some
more testing.

Hugo
--- a/toke.c
+++ b/toke.c
@​@​ -3257,7 +3257,7 @​@​ S_scan_const(pTHX_ char *start)
&& !in_charclass
&& ((PMOP*)PL_lex_inpat)->op_pmflags &
RXf_PMf_EXTENDED)
{
- while (s+1 < send && *s != '\n')
+ while (s < send && *s != '\n')
*d++ = *s++;
}

@​@​ -3298,6 +3298,11 @​@​ S_scan_const(pTHX_ char *start)

/* End of else if chain - OP_TRANS rejoin rest */

+ if (UNLIKELY(s >= send)) {
+ assert(s == send);
+ break;
+ }
+
/* backslashes */
if (*s == '\\' && s+1 < send) {
char* e; /* Can be used for ending '}', etc. */

Should be fixed by 8faf4f3​:
[perl #130495] /x comment skipping stops a byte short

If that byte was part of a utf\-8 character\, this caused inappropriate
"malformed utf8" warnings or assertions\.

The new test in t/re/pat.t throws a warning​:

ok 827 - empty pattern in regex codeblock​: produced the right exception message
Wide character in print at ./test.pl line 982.
ok 828 - [perl \#130495] utf-8 character at end of /x comment should
not misparse

That's in fresh_perl() in t/test.pl where we write out the test
program to a file like so​:

  open TEST, '>', $tmpfile or die "Cannot open $tmpfile​: $!";
  print TEST $prog;
  close TEST or die "Cannot close $tmpfile​: $!";

Surely we have other places where there are UTF-8 characters embedded
in a test program? Opening the file with '>​:utf8' fixes this test but
breaks a lot of other things and I'm not sure it's safe to do that
within the primitive confines of test.pl anyway. Suggestions?

@p5pRT
Copy link
Author

p5pRT commented Jan 7, 2017

From @hvds

On Fri, 06 Jan 2017 14​:41​:29 -0800, craig.a.berry@​gmail.com wrote​:

The new test in t/re/pat.t throws a warning​:

ok 827 - empty pattern in regex codeblock​: produced the right
exception message
Wide character in print at ./test.pl line 982.
ok 828 - [perl \#130495] utf-8 character at end of /x comment should
not misparse

That's in fresh_perl() in t/test.pl where we write out the test
program to a file like so​:

open TEST, '>', $tmpfile or die "Cannot open $tmpfile​: $!";
print TEST $prog;
close TEST or die "Cannot close $tmpfile​: $!";

Surely we have other places where there are UTF-8 characters embedded
in a test program? Opening the file with '>​:utf8' fixes this test but
breaks a lot of other things and I'm not sure it's safe to do that
within the primitive confines of test.pl anyway. Suggestions?

I propose the patch below; I think it should be safe but would welcome a second opinion.

Hugo

commit ef6b07d107b3e73a8c13536ba41cc386f2bf075c
Author​: Hugo van der Sanden <hv@​crypt.org>
Date​: Sat Jan 7 01​:27​:50 2017 +0000

  [perl #130495] add fresh_perl() option for prog with embedded utf8
 
  Support { wide_chars => 1 } in the args hash to fresh_perl*() functions,
  and use it for the recently added test in re/pat.t

Inline Patch
diff --git a/t/re/pat.t b/t/re/pat.t
index a72989f..c5de2cd 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@@ -1864,7 +1864,7 @@ EOF_CODE
         # [perl #130495] /x comment skipping stopped a byte short, leading
         # to assertion failure or 'malformed utf-8 character" warning
         fresh_perl_is(
-            "use utf8; m{a#\x{124}}x", '', {},
+            "use utf8; m{a#\x{124}}x", '', {wide_chars => 1},
             '[perl #130495] utf-8 character at end of /x comment should not misparse',
         );
     }
diff --git a/t/test.pl b/t/test.pl
index 32d0d64..d86f633 100644
--- a/t/test.pl
+++ b/t/test.pl
@@ -979,6 +979,7 @@ sub fresh_perl {
     $runperl_args->{stderr}     = 1 unless exists $runperl_args->{stderr};
 
     open TEST, '>', $tmpfile or die "Cannot open $tmpfile: $!";
+    binmode TEST, ':utf8' if $runperl_args->{wide_chars};
     print TEST $prog;
     close TEST or die "Cannot close $tmpfile: $!";
 

@p5pRT
Copy link
Author

p5pRT commented Jan 7, 2017

From @craigberry

On Fri, Jan 6, 2017 at 7​:32 PM, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

On Fri, 06 Jan 2017 14​:41​:29 -0800, craig.a.berry@​gmail.com wrote​:

The new test in t/re/pat.t throws a warning​:

ok 827 - empty pattern in regex codeblock​: produced the right
exception message
Wide character in print at ./test.pl line 982.
ok 828 - [perl \#130495] utf-8 character at end of /x comment should
not misparse

That's in fresh_perl() in t/test.pl where we write out the test
program to a file like so​:

open TEST, '>', $tmpfile or die "Cannot open $tmpfile​: $!";
print TEST $prog;
close TEST or die "Cannot close $tmpfile​: $!";

Surely we have other places where there are UTF-8 characters embedded
in a test program? Opening the file with '>​:utf8' fixes this test but
breaks a lot of other things and I'm not sure it's safe to do that
within the primitive confines of test.pl anyway. Suggestions?

I propose the patch below; I think it should be safe but would welcome a second opinion.

Hugo

commit ef6b07d107b3e73a8c13536ba41cc386f2bf075c
Author​: Hugo van der Sanden <hv@​crypt.org>
Date​: Sat Jan 7 01​:27​:50 2017 +0000

\[perl \#130495\] add fresh\_perl\(\) option for prog with embedded utf8

Support \{ wide\_chars => 1 \} in the args hash to fresh\_perl\*\(\) functions\,
and use it for the recently added test in re/pat\.t

diff --git a/t/re/pat.t b/t/re/pat.t
index a72989f..c5de2cd 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@​@​ -1864,7 +1864,7 @​@​ EOF_CODE
# [perl #130495] /x comment skipping stopped a byte short, leading
# to assertion failure or 'malformed utf-8 character" warning
fresh_perl_is(
- "use utf8; m{a#\x{124}}x", '', {},
+ "use utf8; m{a#\x{124}}x", '', {wide_chars => 1},
'[perl #130495] utf-8 character at end of /x comment should not misparse',
);
}
diff --git a/t/test.pl b/t/test.pl
index 32d0d64..d86f633 100644
--- a/t/test.pl
+++ b/t/test.pl
@​@​ -979,6 +979,7 @​@​ sub fresh_perl {
$runperl_args->{stderr} = 1 unless exists $runperl_args->{stderr};

 open TEST\, '>'\, $tmpfile or die "Cannot open $tmpfile&#8203;: $\!";

+ binmode TEST, '​:utf8' if $runperl_args->{wide_chars};
print TEST $prog;
close TEST or die "Cannot close $tmpfile​: $!";

I like the opt-in idea so only tests that specify this get it. Looks
like a good solution to me.

@p5pRT
Copy link
Author

p5pRT commented Jan 8, 2017

From @hvds

On Sat, 07 Jan 2017 13​:37​:53 -0800, craig.a.berry@​gmail.com wrote​:

I like the opt-in idea so only tests that specify this get it. Looks
like a good solution to me.

Ok, I checked stderr of a full test run and saw no output relating to this, so now pushed as 98c155b.

Hugo

@p5pRT
Copy link
Author

p5pRT commented May 30, 2017

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.26.0, this and 210 other issues have been
resolved.

Perl 5.26.0 may be downloaded via​:
https://metacpan.org/release/XSAWYERX/perl-5.26.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT
Copy link
Author

p5pRT commented May 30, 2017

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant