New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
heap-buffer-overflow Perl_utf8_hop (utf8.c:1132) #15532
Comments
From @geeknikThe attached test case triggers a heap-buffer-overflow in Perl_utf8_hop ==18525==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60600000e65f is located 1 bytes to the left of 56-byte region SUMMARY: AddressSanitizer: heap-buffer-overflow /root/perl/utf8.c:1132 |
From @geeknik |
From @tonycozOn Fri Aug 19 14:15:21 2016, brian.carpenter@gmail.com wrote:
A shorter test case attached. 00000000 42 45 47 49 4e 7b 24 5e 48 3d 30 78 38 30 30 30 |BEGIN{$^H=0x8000| Also detected by valgrind: tony@mars:.../git/perl$ valgrind ./perl ../129000.pl As this requires feeding code to the perl parser I don't think it's security It's still a bug, I suspect we're calling functions that are intended to Tony |
From @tonycoz |
The RT System itself - Status changed from 'new' to 'open' |
From @geeknikThe attached test case triggers a heap-buffer-overflow in Perl_utf8_hop od -tx1 test86 ==8516==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000ea0f is located 1 bytes to the left of 32-byte region SUMMARY: AddressSanitizer: heap-buffer-overflow /root/perl/./inline.h:497 |
From @hvdsThe test program sets HINT_UTF8 via In any case, I don't believe it represents a security issue, just a way to shoot yourself in the foot. Hugo |
The RT System itself - Status changed from 'new' to 'open' |
From @tonycozOn Mon Aug 22 18:19:40 2016, tonyc wrote:
Since this isn't a security issue, I've moved it to the public queue. I'm working on a fix. Tony |
From @tonycozOn Tue Oct 18 15:52:57 2016, tonyc wrote:
Attached. Tony |
From @tonycoz0001-perl-129000-create-a-safer-utf8_hop.patchFrom b81f1e952df79b6b643768051d2d5a394d41cf81 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Wed, 19 Oct 2016 13:48:58 +1100
Subject: (perl #129000) create a safer utf8_hop()
Unlike utf8_hop(), utf8_hop_safe() won't navigate before the
beginning or after the end of the supplied buffer.
---
embed.fnc | 1 +
embed.h | 1 +
ext/XS-APItest/APItest.xs | 13 +++++++++++++
ext/XS-APItest/t/utf8.t | 46 ++++++++++++++++++++++++++++++++++++++++++++++
inline.h | 44 ++++++++++++++++++++++++++++++++++++++++++++
proto.h | 6 ++++++
6 files changed, 111 insertions(+)
diff --git a/embed.fnc b/embed.fnc
index 46426b6..51aaa1e 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1725,6 +1725,7 @@ Ap |U8* |utf16_to_utf8_reversed|NN U8* p|NN U8 *d|I32 bytelen|NN I32 *newlen
AdpPR |STRLEN |utf8_length |NN const U8* s|NN const U8 *e
AipdPR |IV |utf8_distance |NN const U8 *a|NN const U8 *b
AipdPRn |U8* |utf8_hop |NN const U8 *s|SSize_t off
+AipdPRn |U8* |utf8_hop_safe |NN const U8 *s|SSize_t off|NN const U8 *start|NN const U8 *end
ApMd |U8* |utf8_to_bytes |NN U8 *s|NN STRLEN *len
Apd |int |bytes_cmp_utf8 |NN const U8 *b|STRLEN blen|NN const U8 *u \
|STRLEN ulen
diff --git a/embed.h b/embed.h
index 5df381c..0013ca8 100644
--- a/embed.h
+++ b/embed.h
@@ -732,6 +732,7 @@
#define utf16_to_utf8_reversed(a,b,c,d) Perl_utf16_to_utf8_reversed(aTHX_ a,b,c,d)
#define utf8_distance(a,b) Perl_utf8_distance(aTHX_ a,b)
#define utf8_hop Perl_utf8_hop
+#define utf8_hop_safe Perl_utf8_hop_safe
#define utf8_length(a,b) Perl_utf8_length(aTHX_ a,b)
#define utf8_to_bytes(a,b) Perl_utf8_to_bytes(aTHX_ a,b)
#define utf8_to_uvchr(a,b) Perl_utf8_to_uvchr(aTHX_ a,b)
diff --git a/ext/XS-APItest/APItest.xs b/ext/XS-APItest/APItest.xs
index 6b6f45f..17788ca 100644
--- a/ext/XS-APItest/APItest.xs
+++ b/ext/XS-APItest/APItest.xs
@@ -5538,6 +5538,19 @@ test_is_utf8_fixed_width_buf_loclen_flags(char *s, STRLEN len, U32 flags)
OUTPUT:
RETVAL
+IV
+test_utf8_hop_safe(SV *s_sv, STRLEN s_off, IV off)
+ PREINIT:
+ STRLEN len;
+ U8 *p;
+ U8 *r;
+ CODE:
+ p = (U8 *)SvPV(s_sv, len);
+ r = utf8_hop_safe(p + s_off, off, p, p + len);
+ RETVAL = r - p;
+ OUTPUT:
+ RETVAL
+
UV
test_toLOWER(UV ord)
CODE:
diff --git a/ext/XS-APItest/t/utf8.t b/ext/XS-APItest/t/utf8.t
index 0f2d9ee..3dd044e 100644
--- a/ext/XS-APItest/t/utf8.t
+++ b/ext/XS-APItest/t/utf8.t
@@ -2375,4 +2375,50 @@ foreach my $test (@tests) {
}
}
+SKIP:
+{
+ isASCII
+ or skip "These tests probably break on non-ASCII", 1;
+ my $simple = join "", "A" .. "J";
+ my $utf_ch = "\x{7fffffff}";
+ utf8::encode($utf_ch);
+ my $utf_ch_len = length $utf_ch;
+ note "utf_ch_len $utf_ch_len";
+ my $utf = $utf_ch x 10;
+ my $bad_start = substr($utf, 1);
+ # $bad_end ends with a start byte and a single continuation
+ my $bad_end = substr($utf, 0, length($utf)-$utf_ch_len+2);
+
+ # WARNING: all offsets are *byte* offsets
+ my @hop_tests =
+ (
+ # string s off expected name
+ [ $simple, 0, 5, 5, "simple in range, forward" ],
+ [ $simple, 10, -5, 5, "simple in range, backward" ],
+ [ $simple, 5, 10, 10, "simple out of range, forward" ],
+ [ $simple, 5, -10, 0, "simple out of range, backward" ],
+ [ $utf, $utf_ch_len * 5, 5, length($utf), "utf in range, forward" ],
+ [ $utf, $utf_ch_len * 5, -5, 0, "utf in range, backward" ],
+ [ $utf, $utf_ch_len * 5, 4, $utf_ch_len * 9, "utf in range b, forward" ],
+ [ $utf, $utf_ch_len * 5, -4, $utf_ch_len, "utf in range b, backward" ],
+ [ $utf, $utf_ch_len * 5, 6, length($utf), "utf out of range, forward" ],
+ [ $utf, $utf_ch_len * 5, -6, 0, "utf out of range, backward" ],
+ [ $bad_start, 0, 1, 1, "bad start, forward 1 from 0" ],
+ [ $bad_start, 0, $utf_ch_len-1, $utf_ch_len-1, "bad start, forward ch_len-1 from 0" ],
+ [ $bad_start, 0, $utf_ch_len, $utf_ch_len*2-1, "bad start, forward ch_len from 0" ],
+ [ $bad_start, $utf_ch_len-1, -1, 0, "bad start, back 1 from first start byte" ],
+ [ $bad_start, $utf_ch_len-2, -1, 0, "bad start, back 1 from before first start byte" ],
+ [ $bad_start, 0, -1, 0, "bad start, back 1 from 0" ],
+ [ $bad_start, length $bad_start, -10, 0, "bad start, back 10 from end" ],
+ [ $bad_end, 0, 10, length $bad_end, "bad end, forward 10 from 0" ],
+ [ $bad_end, length($bad_end)-1, 10, length $bad_end, "bad end, forward 1 from end-1" ],
+ );
+
+ for my $test (@hop_tests) {
+ my ($str, $s_off, $off, $want, $name) = @$test;
+ my $result = test_utf8_hop_safe($str, $s_off, $off);
+ is($result, $want, "utf8_hop_safe: $name");
+ }
+}
+
done_testing;
diff --git a/inline.h b/inline.h
index 66ba348..32ed0aa 100644
--- a/inline.h
+++ b/inline.h
@@ -920,6 +920,50 @@ Perl_utf8_hop(const U8 *s, SSize_t off)
}
/*
+=for apidoc utf8_hop_safe
+
+Return the UTF-8 pointer C<s> displaced by up to C<off> characters,
+either forward or backward.
+
+When moving backward it will not move before C<start>.
+
+When moving forward it will not move beyond C<end>.
+
+Will not exceed those limits even if the string is not valid "UTF-8".
+
+=cut
+*/
+
+PERL_STATIC_INLINE U8 *
+Perl_utf8_hop_safe(const U8 *s, SSize_t off, const U8 *start, const U8 *end)
+{
+ PERL_ARGS_ASSERT_UTF8_HOP_SAFE;
+
+ /* Note: cannot use UTF8_IS_...() too eagerly here since e.g
+ * the bitops (especially ~) can create illegal UTF-8.
+ * In other words: in Perl UTF-8 is not just for Unicode. */
+
+ assert(start <= s && s <= end);
+
+ if (off >= 0) {
+ while (off--) {
+ STRLEN skip = UTF8SKIP(s);
+ if ((STRLEN)(end - s) <= skip)
+ return (U8 *)end;
+ s += skip;
+ }
+ }
+ else {
+ while (off++ && s > start) {
+ s--;
+ while (UTF8_IS_CONTINUATION(*s) && s > start)
+ s--;
+ }
+ }
+ return (U8 *)s;
+}
+
+/*
=for apidoc is_utf8_valid_partial_char
diff --git a/proto.h b/proto.h
index 701dc9e..2c67126 100644
--- a/proto.h
+++ b/proto.h
@@ -3509,6 +3509,12 @@ PERL_STATIC_INLINE U8* Perl_utf8_hop(const U8 *s, SSize_t off)
#define PERL_ARGS_ASSERT_UTF8_HOP \
assert(s)
+PERL_STATIC_INLINE U8* Perl_utf8_hop_safe(const U8 *s, SSize_t off, const U8 *start, const U8 *end)
+ __attribute__warn_unused_result__
+ __attribute__pure__;
+#define PERL_ARGS_ASSERT_UTF8_HOP_SAFE \
+ assert(s); assert(start); assert(end)
+
PERL_CALLCONV STRLEN Perl_utf8_length(pTHX_ const U8* s, const U8 *e)
__attribute__warn_unused_result__
__attribute__pure__;
--
2.1.4
|
From @tonycoz0002-perl-129000-use-the-new-utf8_hop_safe.patchFrom 48b8f39309cc65e0c3445079a83e3e9633d596de Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Wed, 19 Oct 2016 14:20:53 +1100
Subject: (perl #129000) use the new utf8_hop_safe()
when reporting unrecognized characters in UTF mode.
This could start the display before the beginning of the line if the
line was badly encoded.
---
t/op/lex.t | 8 +++++++-
toke.c | 3 ++-
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/t/op/lex.t b/t/op/lex.t
index f3cb510..3d9eec7 100644
--- a/t/op/lex.t
+++ b/t/op/lex.t
@@ -7,7 +7,7 @@ use warnings;
BEGIN { chdir 't' if -d 't'; require './test.pl'; }
-plan(tests => 33);
+plan(tests => 34);
{
no warnings 'deprecated';
@@ -261,3 +261,9 @@ fresh_perl_like(
{},
'[perl #128997] - buffer read overflow'
);
+fresh_perl_like(
+ qq(BEGIN{\$^H=0x800000}\n 0m 0\xB5\xB500\xB5\0),
+ qr/Unrecognized character \\x\{0\}; marked by <-- HERE after 0m.*<-- HERE near column 12 at - line 2./,
+ {},
+ '[perl #129000] read before buffer'
+);
diff --git a/toke.c b/toke.c
index 041996f..32a2f43 100644
--- a/toke.c
+++ b/toke.c
@@ -4910,7 +4910,8 @@ Perl_yylex(pTHX)
}
len = UTF ? Perl_utf8_length(aTHX_ (U8 *) PL_linestart, (U8 *) s) : (STRLEN) (s - PL_linestart);
if (len > UNRECOGNIZED_PRECEDE_COUNT) {
- d = UTF ? (char *) utf8_hop((U8 *) s, -UNRECOGNIZED_PRECEDE_COUNT) : s - UNRECOGNIZED_PRECEDE_COUNT;
+ d = UTF ? (char *) utf8_hop_safe((U8 *) s, -UNRECOGNIZED_PRECEDE_COUNT, (U8 *)PL_linestart, (U8 *)PL_bufend)
+ : s - UNRECOGNIZED_PRECEDE_COUNT;
} else {
d = PL_linestart;
}
--
2.1.4
|
From @tonycozOn Tue Oct 18 20:51:49 2016, tonyc wrote:
Here's an alternate version. In most cases we know which direction we're going at the call Tony |
From @tonycoz0001-perl-129000-create-a-safer-utf8_hop.patchFrom d32880fb910f2d3e0d67256d4eceabaa7ac9b414 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Mon, 31 Oct 2016 14:28:34 +1100
Subject: (perl #129000) create a safer utf8_hop()
Unlike utf8_hop(), utf8_hop_safe() won't navigate before the
beginning or after the end of the supplied buffer.
The original version of this put all of the logic into
utf8_hop_safe(), but in many cases a caller specifically
needs to go forward or backward, and supplying the other limit
made the function less usable, so I split the function
into forward and backward cases.
This split may also make inlining these functions more efficient
or more likely.
---
embed.fnc | 3 ++
embed.h | 3 ++
ext/XS-APItest/APItest.pm | 2 +-
ext/XS-APItest/APItest.xs | 13 ++++++
ext/XS-APItest/t/utf8.t | 46 +++++++++++++++++++
inline.h | 111 ++++++++++++++++++++++++++++++++++++++++++++++
proto.h | 18 ++++++++
7 files changed, 195 insertions(+), 1 deletion(-)
diff --git a/embed.fnc b/embed.fnc
index 5cc73b7..f806db0 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1733,6 +1733,9 @@ Ap |U8* |utf16_to_utf8_reversed|NN U8* p|NN U8 *d|I32 bytelen|NN I32 *newlen
AdpPR |STRLEN |utf8_length |NN const U8* s|NN const U8 *e
AipdPR |IV |utf8_distance |NN const U8 *a|NN const U8 *b
AipdPRn |U8* |utf8_hop |NN const U8 *s|SSize_t off
+AipdPRn |U8* |utf8_hop_back|NN const U8 *s|SSize_t off|NN const U8 *start
+AipdPRn |U8* |utf8_hop_forward|NN const U8 *s|SSize_t off|NN const U8 *end
+AipdPRn |U8* |utf8_hop_safe |NN const U8 *s|SSize_t off|NN const U8 *start|NN const U8 *end
ApMd |U8* |utf8_to_bytes |NN U8 *s|NN STRLEN *len
Apd |int |bytes_cmp_utf8 |NN const U8 *b|STRLEN blen|NN const U8 *u \
|STRLEN ulen
diff --git a/embed.h b/embed.h
index 1af2917..2fdbff6 100644
--- a/embed.h
+++ b/embed.h
@@ -733,6 +733,9 @@
#define utf16_to_utf8_reversed(a,b,c,d) Perl_utf16_to_utf8_reversed(aTHX_ a,b,c,d)
#define utf8_distance(a,b) Perl_utf8_distance(aTHX_ a,b)
#define utf8_hop Perl_utf8_hop
+#define utf8_hop_back Perl_utf8_hop_back
+#define utf8_hop_forward Perl_utf8_hop_forward
+#define utf8_hop_safe Perl_utf8_hop_safe
#define utf8_length(a,b) Perl_utf8_length(aTHX_ a,b)
#define utf8_to_bytes(a,b) Perl_utf8_to_bytes(aTHX_ a,b)
#define utf8_to_uvchr(a,b) Perl_utf8_to_uvchr(aTHX_ a,b)
diff --git a/ext/XS-APItest/APItest.pm b/ext/XS-APItest/APItest.pm
index 64a25f1..473d4a3 100644
--- a/ext/XS-APItest/APItest.pm
+++ b/ext/XS-APItest/APItest.pm
@@ -5,7 +5,7 @@ use strict;
use warnings;
use Carp;
-our $VERSION = '0.86';
+our $VERSION = '0.87';
require XSLoader;
diff --git a/ext/XS-APItest/APItest.xs b/ext/XS-APItest/APItest.xs
index 6dbb297..a2a31ec 100644
--- a/ext/XS-APItest/APItest.xs
+++ b/ext/XS-APItest/APItest.xs
@@ -5538,6 +5538,19 @@ test_is_utf8_fixed_width_buf_loclen_flags(char *s, STRLEN len, U32 flags)
OUTPUT:
RETVAL
+IV
+test_utf8_hop_safe(SV *s_sv, STRLEN s_off, IV off)
+ PREINIT:
+ STRLEN len;
+ U8 *p;
+ U8 *r;
+ CODE:
+ p = (U8 *)SvPV(s_sv, len);
+ r = utf8_hop_safe(p + s_off, off, p, p + len);
+ RETVAL = r - p;
+ OUTPUT:
+ RETVAL
+
UV
test_toLOWER(UV ord)
CODE:
diff --git a/ext/XS-APItest/t/utf8.t b/ext/XS-APItest/t/utf8.t
index fc04dfc..9e7685e 100644
--- a/ext/XS-APItest/t/utf8.t
+++ b/ext/XS-APItest/t/utf8.t
@@ -2379,4 +2379,50 @@ foreach my $test (@tests) {
}
}
+SKIP:
+{
+ isASCII
+ or skip "These tests probably break on non-ASCII", 1;
+ my $simple = join "", "A" .. "J";
+ my $utf_ch = "\x{7fffffff}";
+ utf8::encode($utf_ch);
+ my $utf_ch_len = length $utf_ch;
+ note "utf_ch_len $utf_ch_len";
+ my $utf = $utf_ch x 10;
+ my $bad_start = substr($utf, 1);
+ # $bad_end ends with a start byte and a single continuation
+ my $bad_end = substr($utf, 0, length($utf)-$utf_ch_len+2);
+
+ # WARNING: all offsets are *byte* offsets
+ my @hop_tests =
+ (
+ # string s off expected name
+ [ $simple, 0, 5, 5, "simple in range, forward" ],
+ [ $simple, 10, -5, 5, "simple in range, backward" ],
+ [ $simple, 5, 10, 10, "simple out of range, forward" ],
+ [ $simple, 5, -10, 0, "simple out of range, backward" ],
+ [ $utf, $utf_ch_len * 5, 5, length($utf), "utf in range, forward" ],
+ [ $utf, $utf_ch_len * 5, -5, 0, "utf in range, backward" ],
+ [ $utf, $utf_ch_len * 5, 4, $utf_ch_len * 9, "utf in range b, forward" ],
+ [ $utf, $utf_ch_len * 5, -4, $utf_ch_len, "utf in range b, backward" ],
+ [ $utf, $utf_ch_len * 5, 6, length($utf), "utf out of range, forward" ],
+ [ $utf, $utf_ch_len * 5, -6, 0, "utf out of range, backward" ],
+ [ $bad_start, 0, 1, 1, "bad start, forward 1 from 0" ],
+ [ $bad_start, 0, $utf_ch_len-1, $utf_ch_len-1, "bad start, forward ch_len-1 from 0" ],
+ [ $bad_start, 0, $utf_ch_len, $utf_ch_len*2-1, "bad start, forward ch_len from 0" ],
+ [ $bad_start, $utf_ch_len-1, -1, 0, "bad start, back 1 from first start byte" ],
+ [ $bad_start, $utf_ch_len-2, -1, 0, "bad start, back 1 from before first start byte" ],
+ [ $bad_start, 0, -1, 0, "bad start, back 1 from 0" ],
+ [ $bad_start, length $bad_start, -10, 0, "bad start, back 10 from end" ],
+ [ $bad_end, 0, 10, length $bad_end, "bad end, forward 10 from 0" ],
+ [ $bad_end, length($bad_end)-1, 10, length $bad_end, "bad end, forward 1 from end-1" ],
+ );
+
+ for my $test (@hop_tests) {
+ my ($str, $s_off, $off, $want, $name) = @$test;
+ my $result = test_utf8_hop_safe($str, $s_off, $off);
+ is($result, $want, "utf8_hop_safe: $name");
+ }
+}
+
done_testing;
diff --git a/inline.h b/inline.h
index 66ba348..adcd85d 100644
--- a/inline.h
+++ b/inline.h
@@ -920,6 +920,117 @@ Perl_utf8_hop(const U8 *s, SSize_t off)
}
/*
+=for apidoc utf8_hop_forward
+
+Return the UTF-8 pointer C<s> displaced by up to C<off> characters,
+forward.
+
+C<off> must be non-negative.
+
+C<s> must be before or equal to C<end>.
+
+When moving forward it will not move beyond C<end>.
+
+Will not exceed this limit even if the string is not valid "UTF-8".
+
+=cut
+*/
+
+PERL_STATIC_INLINE U8 *
+Perl_utf8_hop_forward(const U8 *s, SSize_t off, const U8 *end)
+{
+ PERL_ARGS_ASSERT_UTF8_HOP_FORWARD;
+
+ /* Note: cannot use UTF8_IS_...() too eagerly here since e.g
+ * the bitops (especially ~) can create illegal UTF-8.
+ * In other words: in Perl UTF-8 is not just for Unicode. */
+
+ assert(s <= end);
+ assert(off >= 0);
+
+ while (off--) {
+ STRLEN skip = UTF8SKIP(s);
+ if ((STRLEN)(end - s) <= skip)
+ return (U8 *)end;
+ s += skip;
+ }
+
+ return (U8 *)s;
+}
+
+/*
+=for apidoc utf8_hop_back
+
+Return the UTF-8 pointer C<s> displaced by up to C<off> characters,
+backward.
+
+C<off> must be non-positive.
+
+C<s> must be after or equal to C<start>.
+
+When moving backward it will not move before C<start>.
+
+Will not exceed this limit even if the string is not valid "UTF-8".
+
+=cut
+*/
+
+PERL_STATIC_INLINE U8 *
+Perl_utf8_hop_back(const U8 *s, SSize_t off, const U8 *start)
+{
+ PERL_ARGS_ASSERT_UTF8_HOP_BACK;
+
+ /* Note: cannot use UTF8_IS_...() too eagerly here since e.g
+ * the bitops (especially ~) can create illegal UTF-8.
+ * In other words: in Perl UTF-8 is not just for Unicode. */
+
+ assert(start <= s);
+ assert(off <= 0);
+
+ while (off++ && s > start) {
+ s--;
+ while (UTF8_IS_CONTINUATION(*s) && s > start)
+ s--;
+ }
+
+ return (U8 *)s;
+}
+
+/*
+=for apidoc utf8_hop_safe
+
+Return the UTF-8 pointer C<s> displaced by up to C<off> characters,
+either forward or backward.
+
+When moving backward it will not move before C<start>.
+
+When moving forward it will not move beyond C<end>.
+
+Will not exceed those limits even if the string is not valid "UTF-8".
+
+=cut
+*/
+
+PERL_STATIC_INLINE U8 *
+Perl_utf8_hop_safe(const U8 *s, SSize_t off, const U8 *start, const U8 *end)
+{
+ PERL_ARGS_ASSERT_UTF8_HOP_SAFE;
+
+ /* Note: cannot use UTF8_IS_...() too eagerly here since e.g
+ * the bitops (especially ~) can create illegal UTF-8.
+ * In other words: in Perl UTF-8 is not just for Unicode. */
+
+ assert(start <= s && s <= end);
+
+ if (off >= 0) {
+ return utf8_hop_forward(s, off, end);
+ }
+ else {
+ return utf8_hop_back(s, off, start);
+ }
+}
+
+/*
=for apidoc is_utf8_valid_partial_char
diff --git a/proto.h b/proto.h
index 1d79c46..ce7a92c 100644
--- a/proto.h
+++ b/proto.h
@@ -3512,6 +3512,24 @@ PERL_STATIC_INLINE U8* Perl_utf8_hop(const U8 *s, SSize_t off)
#define PERL_ARGS_ASSERT_UTF8_HOP \
assert(s)
+PERL_STATIC_INLINE U8* Perl_utf8_hop_back(const U8 *s, SSize_t off, const U8 *start)
+ __attribute__warn_unused_result__
+ __attribute__pure__;
+#define PERL_ARGS_ASSERT_UTF8_HOP_BACK \
+ assert(s); assert(start)
+
+PERL_STATIC_INLINE U8* Perl_utf8_hop_forward(const U8 *s, SSize_t off, const U8 *end)
+ __attribute__warn_unused_result__
+ __attribute__pure__;
+#define PERL_ARGS_ASSERT_UTF8_HOP_FORWARD \
+ assert(s); assert(end)
+
+PERL_STATIC_INLINE U8* Perl_utf8_hop_safe(const U8 *s, SSize_t off, const U8 *start, const U8 *end)
+ __attribute__warn_unused_result__
+ __attribute__pure__;
+#define PERL_ARGS_ASSERT_UTF8_HOP_SAFE \
+ assert(s); assert(start); assert(end)
+
PERL_CALLCONV STRLEN Perl_utf8_length(pTHX_ const U8* s, const U8 *e)
__attribute__warn_unused_result__
__attribute__pure__;
--
2.1.4
|
From @tonycoz0002-perl-129000-use-the-new-utf8_hop_safe.patchFrom d8f9ff90f4f432b887308b04139ba6ba39887876 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Mon, 31 Oct 2016 14:49:17 +1100
Subject: (perl #129000) use the new utf8_hop_safe()
when reporting unrecognized characters in UTF mode.
---
t/op/lex.t | 8 +++++++-
toke.c | 2 +-
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/t/op/lex.t b/t/op/lex.t
index db0cf3a..a0ed5cb 100644
--- a/t/op/lex.t
+++ b/t/op/lex.t
@@ -7,7 +7,7 @@ use warnings;
BEGIN { chdir 't' if -d 't'; require './test.pl'; }
-plan(tests => 32);
+plan(tests => 33);
{
no warnings 'deprecated';
@@ -255,3 +255,9 @@ fresh_perl_is(
{},
'[perl #128996] - use of PL_op after op is freed'
);
+fresh_perl_like(
+ qq(BEGIN{\$^H=0x800000}\n 0m 0\xB5\xB500\xB5\0),
+ qr/Unrecognized character \\x\{0\}; marked by <-- HERE after 0m.*<-- HERE near column 12 at - line 2./,
+ {},
+ '[perl #129000] read before buffer'
+);
diff --git a/toke.c b/toke.c
index b2d9209..261764b 100644
--- a/toke.c
+++ b/toke.c
@@ -4908,7 +4908,7 @@ Perl_yylex(pTHX)
: Perl_form(aTHX_ "\\x%02X", (unsigned char)*s);
len = UTF ? Perl_utf8_length(aTHX_ (U8 *) PL_linestart, (U8 *) s) : (STRLEN) (s - PL_linestart);
if (len > UNRECOGNIZED_PRECEDE_COUNT) {
- d = UTF ? (char *) utf8_hop((U8 *) s, -UNRECOGNIZED_PRECEDE_COUNT) : s - UNRECOGNIZED_PRECEDE_COUNT;
+ d = UTF ? (char *) utf8_hop_back((U8 *) s, -UNRECOGNIZED_PRECEDE_COUNT, (U8 *)PL_linestart) : s - UNRECOGNIZED_PRECEDE_COUNT;
} else {
d = PL_linestart;
}
--
2.1.4
|
From @tonycozOn Sun, 30 Oct 2016 21:02:22 -0700, tonyc wrote:
Applied as 65df57a and cbe6b21. Also added a skip to skip the new test and a few others that depend on UTF-8 Tony |
@tonycoz - Status changed from 'open' to 'pending release' |
From @tonycozOn Thu, 06 Oct 2016 07:34:29 -0700, hv wrote:
This is a duplicate of 129000 which was fixed in cbe6b21. Merging into 129000. Tony |
From @khwilliamsonThank you for filing this report. You have helped make Perl better. With the release today of Perl 5.26.0, this and 210 other issues have been Perl 5.26.0 may be downloaded via: If you find that the problem persists, feel free to reopen this ticket. |
@khwilliamson - Status changed from 'pending release' to 'resolved' |
Migrated from rt.perl.org#129000 (status was 'resolved')
Searchable as RT129000$
The text was updated successfully, but these errors were encountered: