[PATCH] Coverity: regexec.c etc: off-by-one error #13757
From @jhi: the "c1 > 256" check was off-by-one; it needed to be "c1 > 255". Patch attached.
From @jhi
0005-Fix-for-Coverity-perl5-CID-29033.patch
From 2bfa09bfd0d93131aa0da7926a49d16ca54970aa Mon Sep 17 00:00:00 2001
From: Jarkko Hietaniemi <jhi@iki.fi>
Date: Mon, 21 Apr 2014 17:07:20 -0400
Subject: [PATCH 5/9] Fix for Coverity perl5 CID 29033: Out-of-bounds read
(OVERRUN) overrun-local: Overrunning array PL_fold_locale of 256 bytes at
byte offset 256 using index c1 (which evaluates to 256).
- the "c1 > 256" was off-by-one, it needed to be "c1 > 255",
it could have caused the PL_fold_locale to be accessed one past the end,
at offset 256, but we have dodged the bullet thanks to the regex engine
optimizing the bad case away before we hit it (analysis by Karl Williamson):
regexec.c
- comment fixes (pointed out by Karl Williamson): regexec.c
- add tests to nail down the behaviour of fold matching
for the last of Latin-1 (0xFF, lowercase which curiously does not have
uppercase within Latin-1) and the first pure Unicode: t/re/pat.t
---
regexec.c | 8 ++++----
t/re/pat.t | 26 +++++++++++++++++++++++++-
2 files changed, 29 insertions(+), 5 deletions(-)
diff --git a/regexec.c b/regexec.c
index 4ed2ba9..6b31aad 100644
--- a/regexec.c
+++ b/regexec.c
@@ -3693,7 +3693,7 @@ S_setup_EXACTISH_ST_c1_c2(pTHX_ const regnode * const text_node, int *c1p,
}
else { /* an EXACTFish node which doesn't begin with a multi-char fold */
c1 = is_utf8_pat ? valid_utf8_to_uvchr(pat, NULL) : *pat;
- if (c1 > 256) {
+ if (c1 > 255) {
/* Load the folds hash, if not already done */
SV** listp;
if (! PL_utf8_foldclosures) {
@@ -3746,10 +3746,10 @@ S_setup_EXACTISH_ST_c1_c2(pTHX_ const regnode * const text_node, int *c1p,
/* Folds that cross the 255/256 boundary are forbidden
* if EXACTFL (and isnt a UTF8 locale), or EXACTFA and
* one is ASCIII. Since the pattern character is above
- * 256, and its only other match is below 256, the only
+ * 255, and its only other match is below 256, the only
* legal match will be to itself. We have thrown away
* the original, so have to compute which is the one
- * above 255 */
+ * above 255. */
if ((c1 < 256) != (c2 < 256)) {
if ((OP(text_node) == EXACTFL
&& ! IN_UTF8_CTYPE_LOCALE)
@@ -3768,7 +3768,7 @@ S_setup_EXACTISH_ST_c1_c2(pTHX_ const regnode * const text_node, int *c1p,
}
}
}
- else /* Here, c1 is < 255 */
+ else /* Here, c1 is <= 255 */
if (utf8_target
&& HAS_NONLATIN1_FOLD_CLOSURE(c1)
&& ( ! (OP(text_node) == EXACTFL && ! IN_UTF8_CTYPE_LOCALE))
diff --git a/t/re/pat.t b/t/re/pat.t
index 04f8b84..81cb64b 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@@ -20,7 +20,7 @@ BEGIN {
require './test.pl';
}
-plan tests => 721; # Update this when adding/deleting tests.
+plan tests => 733; # Update this when adding/deleting tests.
run_tests() unless caller;
@@ -1580,7 +1580,31 @@ EOP
like "\x{AA}", qr/a?[\W_]/d, "\\W with /d synthetic start class works";
}
+ {
+ # Verify that the very last Latin-1 U+00FF
+ # (LATIN SMALL LETTER Y WITH DIAERESIS)
+ # and its UPPER counterpart (U+0178 which is pure Unicode),
+ # and likewise for the very first pure Unicode
+ # (LATIN CAPITAL LETTER A WITH MACRON) fold-match properly,
+ # and there are no off-by-one logic errors in the transition zone.
+
+ ok("\xFF" =~ /\xFF/i, "Y WITH DIAERESIS l =~ l");
+ ok("\xFF" =~ /\x{178}/i, "Y WITH DIAERESIS l =~ u");
+ ok("\x{178}" =~ /\xFF/i, "Y WITH DIAERESIS u =~ l");
+ ok("\x{178}" =~ /\x{178}/i, "Y WITH DIAERESIS u =~ u");
+ # U+00FF with U+05D0 (non-casing Hebrew letter).
+ ok("\xFF\x{5D0}" =~ /\xFF\x{5D0}/i, "Y WITH DIAERESIS l =~ l");
+ ok("\xFF\x{5D0}" =~ /\x{178}\x{5D0}/i, "Y WITH DIAERESIS l =~ u");
+ ok("\x{178}\x{5D0}" =~ /\xFF\x{5D0}/i, "Y WITH DIAERESIS u =~ l");
+ ok("\x{178}\x{5D0}" =~ /\x{178}\x{5D0}/i, "Y WITH DIAERESIS u =~ u");
+
+ # U+0100.
+ ok("\x{100}" =~ /\x{100}/i, "A WITH MACRON u =~ u");
+ ok("\x{100}" =~ /\x{101}/i, "A WITH MACRON u =~ l");
+ ok("\x{101}" =~ /\x{100}/i, "A WITH MACRON l =~ u");
+ ok("\x{101}" =~ /\x{101}/i, "A WITH MACRON l =~ l");
+ }
} # End of sub run_tests
--
1.8.5.2 (Apple Git-48)
From @jhi
On Tuesday-201404-22, 20:28, perlbug-followup@perl.org wrote:
Better summary line, patch attached.
From @jhi
0001-Off-by-one-in-PL_fold_locale-use.patch
From 25977d2f914c0f17a4131f85c5cf8c024ba37452 Mon Sep 17 00:00:00 2001
From: Jarkko Hietaniemi <jhi@iki.fi>
Date: Wed, 7 May 2014 09:19:00 -0400
Subject: [PATCH] Off-by-one in PL_fold_locale use.
Fix for Coverity perl5 CID 29033: Out-of-bounds read
(OVERRUN) overrun-local: Overrunning array PL_fold_locale of 256 bytes at
byte offset 256 using index c1 (which evaluates to 256).
- the "c1 > 256" was off-by-one, it needed to be "c1 > 255",
it could have caused the PL_fold_locale to be accessed one past the end,
at offset 256, but we have dodged the bullet thanks to the regex engine
optimizing the bad case away before we hit it (analysis by Karl Williamson):
regexec.c
- comment fixes (pointed out by Karl Williamson): regexec.c
- add tests to nail down the behaviour of fold matching
for the last of Latin-1 (0xFF, lowercase which curiously does not have
uppercase within Latin-1) and the first pure Unicode: t/re/pat.t
---
regexec.c | 8 ++++----
t/re/pat.t | 26 +++++++++++++++++++++++++-
2 files changed, 29 insertions(+), 5 deletions(-)
diff --git a/regexec.c b/regexec.c
index 4ed2ba9..6b31aad 100644
--- a/regexec.c
+++ b/regexec.c
@@ -3693,7 +3693,7 @@ S_setup_EXACTISH_ST_c1_c2(pTHX_ const regnode * const text_node, int *c1p,
}
else { /* an EXACTFish node which doesn't begin with a multi-char fold */
c1 = is_utf8_pat ? valid_utf8_to_uvchr(pat, NULL) : *pat;
- if (c1 > 256) {
+ if (c1 > 255) {
/* Load the folds hash, if not already done */
SV** listp;
if (! PL_utf8_foldclosures) {
@@ -3746,10 +3746,10 @@ S_setup_EXACTISH_ST_c1_c2(pTHX_ const regnode * const text_node, int *c1p,
/* Folds that cross the 255/256 boundary are forbidden
* if EXACTFL (and isnt a UTF8 locale), or EXACTFA and
* one is ASCIII. Since the pattern character is above
- * 256, and its only other match is below 256, the only
+ * 255, and its only other match is below 256, the only
* legal match will be to itself. We have thrown away
* the original, so have to compute which is the one
- * above 255 */
+ * above 255. */
if ((c1 < 256) != (c2 < 256)) {
if ((OP(text_node) == EXACTFL
&& ! IN_UTF8_CTYPE_LOCALE)
@@ -3768,7 +3768,7 @@ S_setup_EXACTISH_ST_c1_c2(pTHX_ const regnode * const text_node, int *c1p,
}
}
}
- else /* Here, c1 is < 255 */
+ else /* Here, c1 is <= 255 */
if (utf8_target
&& HAS_NONLATIN1_FOLD_CLOSURE(c1)
&& ( ! (OP(text_node) == EXACTFL && ! IN_UTF8_CTYPE_LOCALE))
diff --git a/t/re/pat.t b/t/re/pat.t
index 04f8b84..81cb64b 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@@ -20,7 +20,7 @@ BEGIN {
require './test.pl';
}
-plan tests => 721; # Update this when adding/deleting tests.
+plan tests => 733; # Update this when adding/deleting tests.
run_tests() unless caller;
@@ -1580,7 +1580,31 @@ EOP
like "\x{AA}", qr/a?[\W_]/d, "\\W with /d synthetic start class works";
}
+ {
+ # Verify that the very last Latin-1 U+00FF
+ # (LATIN SMALL LETTER Y WITH DIAERESIS)
+ # and its UPPER counterpart (U+0178 which is pure Unicode),
+ # and likewise for the very first pure Unicode
+ # (LATIN CAPITAL LETTER A WITH MACRON) fold-match properly,
+ # and there are no off-by-one logic errors in the transition zone.
+
+ ok("\xFF" =~ /\xFF/i, "Y WITH DIAERESIS l =~ l");
+ ok("\xFF" =~ /\x{178}/i, "Y WITH DIAERESIS l =~ u");
+ ok("\x{178}" =~ /\xFF/i, "Y WITH DIAERESIS u =~ l");
+ ok("\x{178}" =~ /\x{178}/i, "Y WITH DIAERESIS u =~ u");
+ # U+00FF with U+05D0 (non-casing Hebrew letter).
+ ok("\xFF\x{5D0}" =~ /\xFF\x{5D0}/i, "Y WITH DIAERESIS l =~ l");
+ ok("\xFF\x{5D0}" =~ /\x{178}\x{5D0}/i, "Y WITH DIAERESIS l =~ u");
+ ok("\x{178}\x{5D0}" =~ /\xFF\x{5D0}/i, "Y WITH DIAERESIS u =~ l");
+ ok("\x{178}\x{5D0}" =~ /\x{178}\x{5D0}/i, "Y WITH DIAERESIS u =~ u");
+
+ # U+0100.
+ ok("\x{100}" =~ /\x{100}/i, "A WITH MACRON u =~ u");
+ ok("\x{100}" =~ /\x{101}/i, "A WITH MACRON u =~ l");
+ ok("\x{101}" =~ /\x{100}/i, "A WITH MACRON l =~ u");
+ ok("\x{101}" =~ /\x{101}/i, "A WITH MACRON l =~ l");
+ }
} # End of sub run_tests
--
1.9.2
The RT System itself - Status changed from 'new' to 'open'
@tsee - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#121708 (status was 'resolved')
Searchable as RT121708