Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perl: missing / incorrect overflow check when indexing arrays #9791

Open
p5pRT opened this issue Jul 10, 2009 · 10 comments
Open

perl: missing / incorrect overflow check when indexing arrays #9791

p5pRT opened this issue Jul 10, 2009 · 10 comments

Comments

@p5pRT
Copy link

p5pRT commented Jul 10, 2009

Migrated from rt.perl.org#67424 (status was 'open')

Searchable as RT67424$

@p5pRT
Copy link
Author

p5pRT commented Jul 10, 2009

From tokkee@debian.org

Package​: perl
Version​: 5.8.8-7etch6
Severity​: normal

Hi,

Perl does not handle large array indexes correctly​:

% perl -e '$a[23] = "foo\n"; print $a[1 + ~1];'
foo

FTR​: That '1 + ~1' produces expected results when not used as an array
index​:

% perl -e 'print 1 + ~1, $/;'
4294967295

While I do not expect Perl to handle arbitrary large array indexes, it
would be very nice to get a warning in that case instead of a random
(well, it actually seems to be always the last) element of the array.

Cheers,
Sebastian

--
Sebastian "tokkee" Harl +++ GnuPG-ID​: 0x8501C7FC +++ http​://tokkee.org/

Those who would give up Essential Liberty to purchase a little Temporary
Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin

@p5pRT
Copy link
Author

p5pRT commented Jul 10, 2009

From zefram@fysh.org

Sebastian Harl wrote​:

Perl does not handle large array indexes correctly​:

% perl -e '$a[23] = "foo\n"; print $a[1 + ~1];'
foo

When done in signed (twos-complement) integer arithmetic, ~1 yields -2,
so 1+~1 yields -1. So not an entirely unexpected result. The issue here
is that perl is using as a native integer a value that is actually outside
the native integer range. You get a similar result if you use 1e100
as the array index​: it gets clamped to 2^32-1, which gets reinterpreted
as -1. Using -1e100 gives a different result​: it is clamped to -2^31,
which stays the same when interpreted as signed (because it already is),
and you index way off the beginning of the array.

Perl does this sort of thing, presumably just SvIV(), all over the place.
It's a general problem, not specific to array indexing.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Jul 10, 2009

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jul 21, 2016

From @dcollinsn

On Fri Jul 10 09​:24​:52 2009, zefram@​fysh.org wrote​:

Perl does this sort of thing, presumably just SvIV(), all over the place.
It's a general problem, not specific to array indexing.

-zefram

I tracked down the specific cases where we have this issue, and added a new macro to use when we need an IV but need not to overflow. I called it SvIV_coerce, because I'm bad at naming things, and it looks like this​:

+#define SvIV_coerce(sv) (((SvIsUV(sv) && SvUV(sv) > IV_MAX) || \
+ (SvNOKp(sv) && SvNV(sv) > IV_MAX)) ? \
+ (IV_MAX) : (SvIV(sv)))

I profiled a few testcases before and after this patch - most of them slow down, but it's the formerly "buggy" testcases that are the worst, and the cases like $a[1] and $ind = 1; $a[$ind] are roughly the same as previously. On the whole, the array indexing that always worked has slowed down by about 0-5%.

The actual patch is attached. I know that there are a number of other places that might need this treatment, and I'd by happy to respond to other RT tickets as they appear, but I don't have the bandwidth to audit all the usages of SvIV() right now.

--
Respectfully,
Dan Collins

@p5pRT
Copy link
Author

p5pRT commented Jul 21, 2016

From @dcollinsn

0001-RT-67424-Prevent-UV-NV-to-IV-overflow-in-array-index.patch
From b68cac44bf706b9c15d6d3e00919e7502a15a5dc Mon Sep 17 00:00:00 2001
From: Dan Collins <dcollinsn@gmail.com>
Date: Wed, 20 Jul 2016 15:51:20 -0400
Subject: [PATCH] [RT #67424] Prevent UV/NV to IV overflow in array index

This longstanding bug arises from the use of SvIV() to turn a
UV or NV into an IV for array indexes. (IV)UV_MAX becomes -1,
turning an obviously bad array index into a valid but
surprising one.

This patch defines a new preprocessor macro, SvIV_coerce. It
is like SvIV, but if it gets a UV or NV, it checks if the
SvUV or SvNV is greater than IV_MAX. If so, it returns IV_MAX.

I've only tried to solve the present bug. It is noted in the
ticket that many other uses of SvIV are similarly flawed, and
surely an audit of those uses - or even of other tickets in
RT - will reveal other places where we should use SvIV_coerce.
---
 op.c         |  5 +++--
 pp_hot.c     |  5 +++--
 sv.h         |  8 ++++++++
 t/op/array.t | 21 ++++++++++++++++++++-
 4 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/op.c b/op.c
index 18692e5..4970291 100644
--- a/op.c
+++ b/op.c
@@ -12852,9 +12852,10 @@ S_maybe_multideref(pTHX_ OP *start, OP *orig_o, UV orig_action, U8 hints)
                         /* it's a constant array index */
                         IV iv;
                         SV *ix_sv = cSVOPo->op_sv;
+
                         if (!SvIOK(ix_sv))
                             break;
-                        iv = SvIV(ix_sv);
+                        iv = SvIV_coerce(ix_sv);
 
                         if (   action_count == 0
                             && iv >= -128
@@ -13999,7 +14000,7 @@ Perl_rpeep(pTHX_ OP *o)
 		    pop->op_next->op_type == OP_AELEM &&
 		    !(pop->op_next->op_private &
 		      (OPpLVAL_INTRO|OPpLVAL_DEFER|OPpDEREF|OPpMAYBE_LVSUB)) &&
-		    (i = SvIV(((SVOP*)pop)->op_sv)) >= -128 && i <= 127)
+		    (i = SvIV_coerce(((SVOP*)pop)->op_sv)) >= -128 && i <= 127)
 		{
 		    GV *gv;
 		    if (cSVOPx(pop)->op_private & OPpCONST_STRICT)
diff --git a/pp_hot.c b/pp_hot.c
index 8734687..313b63e 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -2352,7 +2352,8 @@ PP(pp_multideref)
                      * tie or overload execution), its value will be
                      * meaningless apart from just here */
                     PL_multideref_pc = items;
-                    elem = SvIV(elemsv);
+                    elem = SvIV_coerce(elemsv);
+
                     break;
                 }
 
@@ -4028,7 +4029,7 @@ PP(pp_aelem)
     dSP;
     SV** svp;
     SV* const elemsv = POPs;
-    IV elem = SvIV(elemsv);
+    IV elem = SvIV_coerce(elemsv);
     AV *const av = MUTABLE_AV(POPs);
     const U32 lval = PL_op->op_flags & OPf_MOD || LVRET;
     const U32 defer = PL_op->op_private & OPpLVAL_DEFER;
diff --git a/sv.h b/sv.h
index bfda6bf..7bf108c 100644
--- a/sv.h
+++ b/sv.h
@@ -1531,6 +1531,10 @@ version which guarantees to evaluate C<sv> only once.
 =for apidoc Am|IV|SvIV_nomg|SV* sv
 Like C<SvIV> but doesn't process magic.
 
+=for apidoc Am|IV|SvIV_coerce|SV* sv
+Like C<SvIV>, but guarantees that large UVs and
+NVs will return IV_MAX instead of a negative value.
+
 =for apidoc Am|IV|SvIVx|SV* sv
 Coerces the given SV to an integer and returns it.
 Guarantees to evaluate C<sv> only once.  Only use
@@ -1648,6 +1652,10 @@ Like C<sv_utf8_upgrade>, but doesn't do magic on C<sv>.
 #define SvUV_nomg(sv) (SvIOK(sv) ? SvUVX(sv) : sv_2uv_flags(sv, 0))
 #define SvNV_nomg(sv) (SvNOK(sv) ? SvNVX(sv) : sv_2nv_flags(sv, 0))
 
+#define SvIV_coerce(sv) (((SvIsUV(sv) && SvUV(sv) > IV_MAX) || \
+                          (SvNOKp(sv) && SvNV(sv) > IV_MAX)) ? \
+                         (IV_MAX) : (SvIV(sv)))
+
 /* ----*/
 
 #define SvPV(sv, lp)         SvPV_flags(sv, lp, SV_GMAGIC)
diff --git a/t/op/array.t b/t/op/array.t
index c8513d1..fb13bdd 100644
--- a/t/op/array.t
+++ b/t/op/array.t
@@ -6,7 +6,7 @@ BEGIN {
     require './test.pl';
 }
 
-plan (173);
+plan (179);
 
 #
 # @foo, @bar, and @ary are also used from tie-stdarray after tie-ing them
@@ -558,4 +558,23 @@ is $#foo, 3, 'assigning to arylen aliased in foreach(scalar $#arylen)';
 sub { undef *_; shift }->(); # This would crash; no ok() necessary.
 sub { undef *_; pop   }->();
 
+# [RT #67424] - Array elem should not overflow from large UV or
+#               NV to negative IV
+
+{
+    my @a = (0..20);
+    my $ind;
+
+    is($x[1+~1], undef, '$x[1+~1] is not $x[-1]');
+    ok(!defined $x[1+~1+0.01], '$x[1+~1+0.01] is not $x[-1]');
+    ok(!defined $x[1+~1-0.01], '$x[1+~1-0.01] is not $x[-2]');
+
+    $ind = 1+~1;
+    is($x[$ind], undef, '$ind=1+~1; $x[$ind] is not $x[-1]');
+    $ind = 1+~1+0.01;
+    ok(!defined $x[$ind], '$ind=1+~1+0.01; $x[$ind] is not $x[-1]');
+    $ind = 1+~1-0.01;
+    ok(!defined $x[$ind], '$ind=1+~1-0.01; $x[$ind] is not $x[-2]');
+}
+
 "We're included by lib/Tie/Array/std.t so we need to return something true";
-- 
2.8.1

@p5pRT
Copy link
Author

p5pRT commented Jul 21, 2016

From zefram@fysh.org

Dan Collins via RT wrote​:

+#define SvIV_coerce(sv) (((SvIsUV(sv) && SvUV(sv) > IV_MAX) || \
+ (SvNOKp(sv) && SvNV(sv) > IV_MAX)) ? \
+ (IV_MAX) : (SvIV(sv)))

This screws up on magic. You need a more sophisticated approach to
such a complex read operation. Also, it doesn't handle NVs that are
too negative to fit into IV.

I'm dubious about clamping the value, for indexing operations. It's not
actually producing a correct index value; you're just hoping that a large
index will behave the same as an even larger one would. It seems to me
that there should be a distinct code path for out-of-range values.

+=for apidoc Am|IV|SvIV_coerce|SV* sv
+Like C<SvIV>, but guarantees that large UVs and
+NVs will return IV_MAX instead of a negative value.

That's not very clear. "Guarantees that" suggests that it's ensuring
something that could have happened anyway, which it couldn't in this case.
I think you want some "except that", and you need something more specific
than "large".

-zefram

@p5pRT
Copy link
Author

p5pRT commented Jul 21, 2016

From @dcollinsn

On Wed Jul 20 18​:14​:08 2016, zefram@​fysh.org wrote​:

Dan Collins via RT wrote​:

+#define SvIV_coerce(sv) (((SvIsUV(sv) && SvUV(sv) > IV_MAX) || \
+ (SvNOKp(sv) && SvNV(sv) > IV_MAX)) ? \
+ (IV_MAX) : (SvIV(sv)))

This screws up on magic. You need a more sophisticated approach to
such a complex read operation.

I expected that SvIV and friends would handle magic - the existing code just uses SvIV, and this is simply wrapping around that. In what case does this mishandle magic?

Also, it doesn't handle NVs that are
too negative to fit into IV.

I don't believe it has to. That NV would be clamped to -IV_MAX which is already the most reasonable place for it to be.

I'm dubious about clamping the value, for indexing operations. It's not
actually producing a correct index value; you're just hoping that a large
index will behave the same as an even larger one would. It seems to me
that there should be a distinct code path for out-of-range values.

So, you think we should die on an index that cannot be properly represented by an IV? How to handle non-integer NVs? (is 1.5 equally wrong as 2*IV_MAX+0.5?)

+=for apidoc Am|IV|SvIV_coerce|SV* sv
+Like C<SvIV>, but guarantees that large UVs and
+NVs will return IV_MAX instead of a negative value.

That's not very clear. "Guarantees that" suggests that it's ensuring
something that could have happened anyway, which it couldn't in this case.
I think you want some "except that", and you need something more specific
than "large".

Acknowledged, but deferring while we discuss what the behavior should be.

--
Respectfully,
Dan Collins

@p5pRT
Copy link
Author

p5pRT commented Jul 21, 2016

From [Unknown Contact. See original ticket]

On Wed Jul 20 18​:14​:08 2016, zefram@​fysh.org wrote​:

Dan Collins via RT wrote​:

+#define SvIV_coerce(sv) (((SvIsUV(sv) && SvUV(sv) > IV_MAX) || \
+ (SvNOKp(sv) && SvNV(sv) > IV_MAX)) ? \
+ (IV_MAX) : (SvIV(sv)))

This screws up on magic. You need a more sophisticated approach to
such a complex read operation.

I expected that SvIV and friends would handle magic - the existing code just uses SvIV, and this is simply wrapping around that. In what case does this mishandle magic?

Also, it doesn't handle NVs that are
too negative to fit into IV.

I don't believe it has to. That NV would be clamped to -IV_MAX which is already the most reasonable place for it to be.

I'm dubious about clamping the value, for indexing operations. It's not
actually producing a correct index value; you're just hoping that a large
index will behave the same as an even larger one would. It seems to me
that there should be a distinct code path for out-of-range values.

So, you think we should die on an index that cannot be properly represented by an IV? How to handle non-integer NVs? (is 1.5 equally wrong as 2*IV_MAX+0.5?)

+=for apidoc Am|IV|SvIV_coerce|SV* sv
+Like C<SvIV>, but guarantees that large UVs and
+NVs will return IV_MAX instead of a negative value.

That's not very clear. "Guarantees that" suggests that it's ensuring
something that could have happened anyway, which it couldn't in this case.
I think you want some "except that", and you need something more specific
than "large".

Acknowledged, but deferring while we discuss what the behavior should be.

--
Respectfully,
Dan Collins

@p5pRT
Copy link
Author

p5pRT commented Jul 21, 2016

From zefram@fysh.org

Dan Collins via RT wrote​:

I expected that SvIV and friends would handle magic

Yes, they do. There are two magic problems with your macro. Firstly, you
look at some flags (SvIsUV()) before calling anything that will process
magic, so you see a stale flag state. Secondly, in many paths through
the macro you invoke magic-processing macros more than once, so if the
get magic is not idempotent you see inconsistent state. You need to
process get magic exactly once, before you look at any part of the value.

So, you think we should die on an index that cannot be properly
represented by an IV?

No, there's no precedent for dying for a too-large index. The behaviour
should be consistent with what would be seen if it did fit into an IV.
The point is that this behaviour is not necessarily identical to the
behaviour you'll get for the specific index to which you clamped the
value.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Jul 21, 2016

From @tonycoz

On Wed, Jul 20, 2016 at 06​:37​:09PM -0700, Dan Collins via RT wrote​:

On Wed Jul 20 18​:14​:08 2016, zefram@​fysh.org wrote​:

Dan Collins via RT wrote​:

+#define SvIV_coerce(sv) (((SvIsUV(sv) && SvUV(sv) > IV_MAX) || \
+ (SvNOKp(sv) && SvNV(sv) > IV_MAX)) ? \
+ (IV_MAX) : (SvIV(sv)))

This screws up on magic. You need a more sophisticated approach to
such a complex read operation.

I expected that SvIV and friends would handle magic - the existing code just uses SvIV, and this is simply wrapping around that. In what case does this mishandle magic?

For a SV with get magic, none of the flags have any meaning until you
call mg_get() (often via SvGETMAGIC()).

Also, it doesn't handle NVs that are
too negative to fit into IV.

I don't believe it has to. That NV would be clamped to -IV_MAX which is already the most reasonable place for it to be.

It also doesn't check that the index is within the range of a SSize_t,
which is the type of the index passed to av_fetch() and av_store().
IV can be larger than SSize_t with -Duse64bitint on 32-bit platforms.

I'm dubious about clamping the value, for indexing operations. It's not
actually producing a correct index value; you're just hoping that a large
index will behave the same as an even larger one would. It seems to me
that there should be a distinct code path for out-of-range values.

So, you think we should die on an index that cannot be properly represented by an IV? How to handle non-integer NVs? (is 1.5 equally wrong as 2*IV_MAX+0.5?)

NVs should be clamped towards zero (as with trunc() which isn't in C89).

Out of range values should probably throw an exception (what happens
if you try to access $foo[0x7fffffff] on a 32-bit perl?)

+=for apidoc Am|IV|SvIV_coerce|SV* sv
+Like C<SvIV>, but guarantees that large UVs and
+NVs will return IV_MAX instead of a negative value.

That's not very clear. "Guarantees that" suggests that it's ensuring
something that could have happened anyway, which it couldn't in this case.
I think you want some "except that", and you need something more specific
than "large".

Acknowledged, but deferring while we discuss what the behavior should be.

My preference for an interface would be more like grok_number() which
returns zero on a non-number.

  IV iv;
  if (!SvIV_inrange(sv, &iv)) {
  fail loudly with an exception
  }
  do stuff with iv

Tony

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants