Skip Menu |
Report information
Id: 130495
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: brian.carpenter [at] gmail.com
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: (no value)



Subject: toke.c:3813: char *S_scan_const(char *): Assertion Failed.
Date: Tue, 3 Jan 2017 12:56:16 -0600
To: perlbug [...] perl.org
From: "Brian 'geeknik' Carpenter" <brian.carpenter [...] gmail.com>
Triggered with Perl v5.25.8-132-gc10193e while fuzzing with AFL.

od -tx1 test097
0000000 6d 00 20 00 30 30 30 6b 23 00 30 5c 30 30 78 00
0000020

./perl test097
perl: toke.c:3813: char *S_scan_const(char *): Assertion `(__builtin_expect(((((U8 *) send) <= ((U8 *) s)) ? (_Bool)1 : (_Bool)0),(0)) ? 0 : (((unsigned long)(((*(U8 *) s) | 0) | 0) < 128)) ? 1 : __builtin_expect((((((U8 *) send) - ((U8 *) s)) < PL_utf8skip[*(const U8*)((U8 *) s)]) ? (_Bool)1 : (_Bool)0),(0)) ? 0 : __builtin_expect((((((( (sizeof(*(U8 *) s) == 1) || !(((unsigned long)((*(U8 *) s) | 0)) & ~0xFF))) ? (void) (0) : __assert_fail ("( (sizeof(*(U8 *) s) == 1) || !(((unsigned long)((*(U8 *) s) | 0)) & ~0xFF))", "toke.c", 3813, __PRETTY_FUNCTION__)), ((U8) (*(U8 *) s))) <= 0xF7) ? (_Bool)1 : (_Bool)0),(1)) ? ( ( 0xC2 <= ((const U8*)(U8 *) s)[0] && ((const U8*)(U8 *) s)[0] <= 0xDF ) ? ( __builtin_expect(((( ((const U8*)(U8 *) s)[1] & 0xC0 ) == 0x80) ? (_Bool)1 : (_Bool)0),(1)) ? 2 : 0 ) : ( 0xE0 == ((const U8*)(U8 *) s)[0] ) ? ( __builtin_expect(((( ( ((const U8*)(U8 *) s)[1] & 0xE0 ) == 0xA0 ) && ( ( ((const U8*)(U8 *) s)[2] & 0xC0 ) == 0x80 )) ? (_Bool)1 : (_Bool)0),(1)) ? 3 : 0 ): ( 0xE1 <= ((const U8*)(U8 *) s)[0] && ((const U8*)(U8 *) s)[0] <= 0xEF ) ? ( __builtin_expect(((( ( ((const U8*)(U8 *) s)[1] & 0xC0 ) == 0x80 ) && ( ( ((const U8*)(U8 *) s)[2] & 0xC0 ) == 0x80 )) ? (_Bool)1 : (_Bool)0),(1)) ? 3 : 0 ): ( 0xF0 == ((const U8*)(U8 *) s)[0] ) ? ( __builtin_expect(((( ( 0x90 <= ((const U8*)(U8 *) s)[1] && ((const U8*)(U8 *) s)[1] <= 0xBF ) && ( ( ((const U8*)(U8 *) s)[2] & 0xC0 ) == 0x80 ) ) && ( ( ((const U8*)(U8 *) s)[3] & 0xC0 ) == 0x80 )) ? (_Bool)1 : (_Bool)0),(1)) ? 4 : 0 ): ( ( ( ( 0xF1 <= ((const U8*)(U8 *) s)[0] && ((const U8*)(U8 *) s)[0] <= 0xF7 ) && __builtin_expect(((( ((const U8*)(U8 *) s)[1] & 0xC0 ) == 0x80) ? (_Bool)1 : (_Bool)0),(1)) ) && __builtin_expect(((( ((const U8*)(U8 *) s)[2] & 0xC0 ) == 0x80) ? (_Bool)1 : (_Bool)0),(1)) ) && __builtin_expect(((( ((const U8*)(U8 *) s)[3] & 0xC0 ) == 0x80) ? (_Bool)1 : (_Bool)0),(1)) ) ? 4 : 0 ) : Perl__is_utf8_char_helper((U8 *) s, (U8 *) send, 0))' failed.
Aborted
Download test097.gz
application/x-gzip 44b

Message body not shown because it is not plain text.

RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.3k
On Tue, 03 Jan 2017 10:57:21 -0800, brian.carpenter@gmail.com wrote: Show quoted text
> Triggered with Perl v5.25.8-132-gc10193e while fuzzing with AFL. > > od -tx1 test097 > 0000000 6d 00 20 00 30 30 30 6b 23 00 30 5c 30 30 78 00 > 0000020 > > ./perl test097 > perl: toke.c:3813: char *S_scan_const(char *): Assertion > `[...snip...]' failed. > Aborted
I'm slightly confused here; I think first we see "m\x{0} \x{0}" which triggers UTF-16LE detection, so the input gets converted into "m \x{3030}\x{6b30}#\x{5c30}\x{3030}x". Next we parse that as a pattern, with \x{3030} as the delimiter, and the //x flag. Next we see the '#', and attempt to scan past the //x-enabled comment with this at toke.c:3260: while (s+1 < send && *s != '\n') *d++ = *s++; The string it's trying to scan past at that point is "#\x{5c30}" encoded as {23 e5 b0 b0}. Since send points past the end of our pattern, this does one byte too little, leaving s pointing at the trailing \xb0. We then try to process that remaining byte, and think we've seen invalid utf8. I'm confused though that if I try to bypass the UTF-16LE conversion I don't get the same assert: perl -e 'print "m \x{3030}\x{6b30}#\x{5c30}\x{3030}x"' | perl -CI .. so I think I'm still missing something - and indeed, if I replace 's+1' with 's' in the above check I get test failures. I'll try to dig further, but it may need more eyes. Hugo
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 2.4k
On Wed, 04 Jan 2017 05:00:20 -0800, hv wrote: Show quoted text
> On Tue, 03 Jan 2017 10:57:21 -0800, brian.carpenter@gmail.com wrote:
> > Triggered with Perl v5.25.8-132-gc10193e while fuzzing with AFL. > > > > od -tx1 test097 > > 0000000 6d 00 20 00 30 30 30 6b 23 00 30 5c 30 30 78 00 > > 0000020 > > > > ./perl test097 > > perl: toke.c:3813: char *S_scan_const(char *): Assertion > > `[...snip...]' failed. > > Aborted
> > I'm slightly confused here; I think first we see "m\x{0} \x{0}" which > triggers UTF-16LE detection, so the input gets converted into "m > \x{3030}\x{6b30}#\x{5c30}\x{3030}x". > > Next we parse that as a pattern, with \x{3030} as the delimiter, and > the //x flag. > > Next we see the '#', and attempt to scan past the //x-enabled comment > with this at toke.c:3260: > while (s+1 < send && *s != '\n') > *d++ = *s++; > > The string it's trying to scan past at that point is "#\x{5c30}" > encoded as {23 e5 b0 b0}. Since send points past the end of our > pattern, this does one byte too little, leaving s pointing at the > trailing \xb0. We then try to process that remaining byte, and think > we've seen invalid utf8. > > I'm confused though that if I try to bypass the UTF-16LE conversion I > don't get the same assert: > perl -e 'print "m \x{3030}\x{6b30}#\x{5c30}\x{3030}x"' | perl -CI > .. so I think I'm still missing something - and indeed, if I replace > 's+1' with 's' in the above check I get test failures. > > I'll try to dig further, but it may need more eyes. > > Hugo
Oh, I think I understand - two wrongs make a right, so failing to skip all the characters it should meant it was ok for there to be no s==send check at the end of the if/else chain. So I think the fix should actually look like the below. I'll do some more testing. Hugo --- a/toke.c +++ b/toke.c @@ -3257,7 +3257,7 @@ S_scan_const(pTHX_ char *start) && !in_charclass && ((PMOP*)PL_lex_inpat)->op_pmflags & RXf_PMf_EXTENDED) { - while (s+1 < send && *s != '\n') + while (s < send && *s != '\n') *d++ = *s++; } @@ -3298,6 +3298,11 @@ S_scan_const(pTHX_ char *start) /* End of else if chain - OP_TRANS rejoin rest */ + if (UNLIKELY(s >= send)) { + assert(s == send); + break; + } + /* backslashes */ if (*s == '\\' && s+1 < send) { char* e; /* Can be used for ending '}', etc. */
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.1k
On Wed, 04 Jan 2017 05:28:35 -0800, hv wrote: [...] Show quoted text
> So I think the fix should actually look like the below. I'll do some > more testing. > > Hugo > --- a/toke.c > +++ b/toke.c > @@ -3257,7 +3257,7 @@ S_scan_const(pTHX_ char *start) > && !in_charclass > && ((PMOP*)PL_lex_inpat)->op_pmflags & > RXf_PMf_EXTENDED) > { > - while (s+1 < send && *s != '\n') > + while (s < send && *s != '\n') > *d++ = *s++; > } > > @@ -3298,6 +3298,11 @@ S_scan_const(pTHX_ char *start) > > /* End of else if chain - OP_TRANS rejoin rest */ > > + if (UNLIKELY(s >= send)) { > + assert(s == send); > + break; > + } > + > /* backslashes */ > if (*s == '\\' && s+1 < send) { > char* e; /* Can be used for ending '}', etc. */
Should be fixed by 8faf4f305a9ccdf8b6fdbfccb769dbb73e6b6d25: [perl #130495] /x comment skipping stops a byte short If that byte was part of a utf-8 character, this caused inappropriate "malformed utf8" warnings or assertions. In principle this should also skip the newline, but failing to do so is safe. Hugo
Subject: Re: [perl #130495] toke.c:3813: char *S_scan_const(char *): Assertion Failed.
CC: "Perl5 Porters (E-mail)" <perl5-porters [...] perl.org>
Date: Fri, 6 Jan 2017 16:41:00 -0600
To: Craig Berry via RT <perlbug-followup [...] perl.org>
From: "Craig A. Berry" <craig.a.berry [...] gmail.com>
Download (untitled) / with headers
text/plain 1.9k
On Wed, Jan 4, 2017 at 9:03 AM, Hugo van der Sanden via RT <perlbug-followup@perl.org> wrote: Show quoted text
> On Wed, 04 Jan 2017 05:28:35 -0800, hv wrote: > [...]
>> So I think the fix should actually look like the below. I'll do some >> more testing. >> >> Hugo >> --- a/toke.c >> +++ b/toke.c >> @@ -3257,7 +3257,7 @@ S_scan_const(pTHX_ char *start) >> && !in_charclass >> && ((PMOP*)PL_lex_inpat)->op_pmflags & >> RXf_PMf_EXTENDED) >> { >> - while (s+1 < send && *s != '\n') >> + while (s < send && *s != '\n') >> *d++ = *s++; >> } >> >> @@ -3298,6 +3298,11 @@ S_scan_const(pTHX_ char *start) >> >> /* End of else if chain - OP_TRANS rejoin rest */ >> >> + if (UNLIKELY(s >= send)) { >> + assert(s == send); >> + break; >> + } >> + >> /* backslashes */ >> if (*s == '\\' && s+1 < send) { >> char* e; /* Can be used for ending '}', etc. */
> > Should be fixed by 8faf4f305a9ccdf8b6fdbfccb769dbb73e6b6d25: > [perl #130495] /x comment skipping stops a byte short > > If that byte was part of a utf-8 character, this caused inappropriate > "malformed utf8" warnings or assertions.
The new test in t/re/pat.t throws a warning: ok 827 - empty pattern in regex codeblock: produced the right exception message Wide character in print at ./test.pl line 982. ok 828 - [perl \#130495] utf-8 character at end of /x comment should not misparse That's in fresh_perl() in t/test.pl where we write out the test program to a file like so: open TEST, '>', $tmpfile or die "Cannot open $tmpfile: $!"; print TEST $prog; close TEST or die "Cannot close $tmpfile: $!"; Surely we have other places where there are UTF-8 characters embedded in a test program? Opening the file with '>:utf8' fixes this test but breaks a lot of other things and I'm not sure it's safe to do that within the primitive confines of test.pl anyway. Suggestions?
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 2.1k
On Fri, 06 Jan 2017 14:41:29 -0800, craig.a.berry@gmail.com wrote: Show quoted text
> The new test in t/re/pat.t throws a warning: > > ok 827 - empty pattern in regex codeblock: produced the right > exception message > Wide character in print at ./test.pl line 982. > ok 828 - [perl \#130495] utf-8 character at end of /x comment should > not misparse > > That's in fresh_perl() in t/test.pl where we write out the test > program to a file like so: > > open TEST, '>', $tmpfile or die "Cannot open $tmpfile: $!"; > print TEST $prog; > close TEST or die "Cannot close $tmpfile: $!"; > > Surely we have other places where there are UTF-8 characters embedded > in a test program? Opening the file with '>:utf8' fixes this test but > breaks a lot of other things and I'm not sure it's safe to do that > within the primitive confines of test.pl anyway. Suggestions?
I propose the patch below; I think it should be safe but would welcome a second opinion. Hugo commit ef6b07d107b3e73a8c13536ba41cc386f2bf075c Author: Hugo van der Sanden <hv@crypt.org> Date: Sat Jan 7 01:27:50 2017 +0000 [perl #130495] add fresh_perl() option for prog with embedded utf8 Support { wide_chars => 1 } in the args hash to fresh_perl*() functions, and use it for the recently added test in re/pat.t diff --git a/t/re/pat.t b/t/re/pat.t index a72989f..c5de2cd 100644 --- a/t/re/pat.t +++ b/t/re/pat.t @@ -1864,7 +1864,7 @@ EOF_CODE # [perl #130495] /x comment skipping stopped a byte short, leading # to assertion failure or 'malformed utf-8 character" warning fresh_perl_is( - "use utf8; m{a#\x{124}}x", '', {}, + "use utf8; m{a#\x{124}}x", '', {wide_chars => 1}, '[perl #130495] utf-8 character at end of /x comment should not misparse', ); } diff --git a/t/test.pl b/t/test.pl index 32d0d64..d86f633 100644 --- a/t/test.pl +++ b/t/test.pl @@ -979,6 +979,7 @@ sub fresh_perl { $runperl_args->{stderr} = 1 unless exists $runperl_args->{stderr}; open TEST, '>', $tmpfile or die "Cannot open $tmpfile: $!"; + binmode TEST, ':utf8' if $runperl_args->{wide_chars}; print TEST $prog; close TEST or die "Cannot close $tmpfile: $!";
Subject: Re: [perl #130495] toke.c:3813: char *S_scan_const(char *): Assertion Failed.
CC: "Perl5 Porters (E-mail)" <perl5-porters [...] perl.org>
From: "Craig A. Berry" <craig.a.berry [...] gmail.com>
Date: Sat, 7 Jan 2017 15:37:26 -0600
To: Craig Berry via RT <perlbug-followup [...] perl.org>
Download (untitled) / with headers
text/plain 2.4k
On Fri, Jan 6, 2017 at 7:32 PM, Hugo van der Sanden via RT <perlbug-followup@perl.org> wrote: Show quoted text
> On Fri, 06 Jan 2017 14:41:29 -0800, craig.a.berry@gmail.com wrote:
>> The new test in t/re/pat.t throws a warning: >> >> ok 827 - empty pattern in regex codeblock: produced the right >> exception message >> Wide character in print at ./test.pl line 982. >> ok 828 - [perl \#130495] utf-8 character at end of /x comment should >> not misparse >> >> That's in fresh_perl() in t/test.pl where we write out the test >> program to a file like so: >> >> open TEST, '>', $tmpfile or die "Cannot open $tmpfile: $!"; >> print TEST $prog; >> close TEST or die "Cannot close $tmpfile: $!"; >> >> Surely we have other places where there are UTF-8 characters embedded >> in a test program? Opening the file with '>:utf8' fixes this test but >> breaks a lot of other things and I'm not sure it's safe to do that >> within the primitive confines of test.pl anyway. Suggestions?
> > > I propose the patch below; I think it should be safe but would welcome a second opinion. > > Hugo > > commit ef6b07d107b3e73a8c13536ba41cc386f2bf075c > Author: Hugo van der Sanden <hv@crypt.org> > Date: Sat Jan 7 01:27:50 2017 +0000 > > [perl #130495] add fresh_perl() option for prog with embedded utf8 > > Support { wide_chars => 1 } in the args hash to fresh_perl*() functions, > and use it for the recently added test in re/pat.t > > diff --git a/t/re/pat.t b/t/re/pat.t > index a72989f..c5de2cd 100644 > --- a/t/re/pat.t > +++ b/t/re/pat.t > @@ -1864,7 +1864,7 @@ EOF_CODE > # [perl #130495] /x comment skipping stopped a byte short, leading > # to assertion failure or 'malformed utf-8 character" warning > fresh_perl_is( > - "use utf8; m{a#\x{124}}x", '', {}, > + "use utf8; m{a#\x{124}}x", '', {wide_chars => 1}, > '[perl #130495] utf-8 character at end of /x comment should not misparse', > ); > } > diff --git a/t/test.pl b/t/test.pl > index 32d0d64..d86f633 100644 > --- a/t/test.pl > +++ b/t/test.pl > @@ -979,6 +979,7 @@ sub fresh_perl { > $runperl_args->{stderr} = 1 unless exists $runperl_args->{stderr}; > > open TEST, '>', $tmpfile or die "Cannot open $tmpfile: $!"; > + binmode TEST, ':utf8' if $runperl_args->{wide_chars}; > print TEST $prog; > close TEST or die "Cannot close $tmpfile: $!";
I like the opt-in idea so only tests that specify this get it. Looks like a good solution to me.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 310b
On Sat, 07 Jan 2017 13:37:53 -0800, craig.a.berry@gmail.com wrote: Show quoted text
> I like the opt-in idea so only tests that specify this get it. Looks > like a good solution to me.
Ok, I checked stderr of a full test run and saw no output relating to this, so now pushed as 98c155b53e21e5d8931df3c4340b2b3f986d3ac7. Hugo
Download (untitled) / with headers
text/plain 313b
Thank you for filing this report. You have helped make Perl better. With the release today of Perl 5.26.0, this and 210 other issues have been resolved. Perl 5.26.0 may be downloaded via: https://metacpan.org/release/XSAWYERX/perl-5.26.0 If you find that the problem persists, feel free to reopen this ticket.


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org