Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[META] misuse of I32 #10172

Open
p5pRT opened this issue Feb 13, 2010 · 23 comments
Open

[META] misuse of I32 #10172

p5pRT opened this issue Feb 13, 2010 · 23 comments

Comments

@p5pRT
Copy link

p5pRT commented Feb 13, 2010

Migrated from rt.perl.org#72784 (status was 'open')

Searchable as RT72784$

@p5pRT
Copy link
Author

p5pRT commented Feb 13, 2010

From @nwc10

There's a lot of inappropriate use of the I32 type throughout the core.
Likely most should be something else, one of U32, STRLEN, SSize_t, IV or UV.

The misuse of I32 causes lots of bugs (panics, SEGVs, silent data errors) if
strings go over 2GB.

This is a meta-ticket for collating tickets relating to these sorts of bugs.

1 similar comment
@p5pRT
Copy link
Author

p5pRT commented Feb 13, 2010

From @nwc10

There's a lot of inappropriate use of the I32 type throughout the core.
Likely most should be something else, one of U32, STRLEN, SSize_t, IV or UV.

The misuse of I32 causes lots of bugs (panics, SEGVs, silent data errors) if
strings go over 2GB.

This is a meta-ticket for collating tickets relating to these sorts of bugs.

@p5pRT
Copy link
Author

p5pRT commented Sep 16, 2010

From @cpansprout

This is the same as 72784.

@p5pRT
Copy link
Author

p5pRT commented Sep 16, 2010

@cpansprout - Status changed from 'new' to 'resolved'

@p5pRT
Copy link
Author

p5pRT commented Mar 14, 2012

From @tonycoz

As a general issue for testing this type of bug - we need to allocate
significant amounts of memory (2Gb or much more in some cases) to
regression test this type of bug, which would be unfriendly in the
common build-and-test cases.

Should we define (say) an environment variable to indicate that a large
amount of memory can be used, and roughly how much?

eg.
PERL_TEST_MEMORY=2 # we can use up to 2Gb of memory for large memory tests

It might be desirable to skip these tests when performing a parallel test.

Tony

@p5pRT
Copy link
Author

p5pRT commented Mar 14, 2012

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Mar 14, 2012

From @tsee

On 03/14/2012 06​:46 AM, Tony Cook via RT wrote​:

As a general issue for testing this type of bug - we need to allocate
significant amounts of memory (2Gb or much more in some cases) to
regression test this type of bug, which would be unfriendly in the
common build-and-test cases.

Should we define (say) an environment variable to indicate that a large
amount of memory can be used, and roughly how much?

eg.
PERL_TEST_MEMORY=2 # we can use up to 2Gb of memory for large memory tests

It might be desirable to skip these tests when performing a parallel test.

Or at least run that particular subset of tests in sequence.

--Steffen

@p5pRT
Copy link
Author

p5pRT commented Mar 14, 2012

From @tonycoz

On Wed, Mar 14, 2012 at 08​:05​:48AM +0100, Steffen Mueller wrote​:

On 03/14/2012 06​:46 AM, Tony Cook via RT wrote​:

As a general issue for testing this type of bug - we need to allocate
significant amounts of memory (2Gb or much more in some cases) to
regression test this type of bug, which would be unfriendly in the
common build-and-test cases.

Should we define (say) an environment variable to indicate that a large
amount of memory can be used, and roughly how much?

eg.
PERL_TEST_MEMORY=2 # we can use up to 2Gb of memory for large memory tests

It might be desirable to skip these tests when performing a parallel test.

Or at least run that particular subset of tests in sequence.

Make a t/bigmem/ directory for large memory tests, perhaps.

If I understand t/harness it should be easy to ensure those are run
sequentially.

Tony

@p5pRT
Copy link
Author

p5pRT commented Mar 24, 2012

From @tonycoz

On Wed, Mar 14, 2012 at 07​:36​:27PM +1100, Tony Cook wrote​:

On Wed, Mar 14, 2012 at 08​:05​:48AM +0100, Steffen Mueller wrote​:

On 03/14/2012 06​:46 AM, Tony Cook via RT wrote​:

As a general issue for testing this type of bug - we need to allocate
significant amounts of memory (2Gb or much more in some cases) to
regression test this type of bug, which would be unfriendly in the
common build-and-test cases.

Should we define (say) an environment variable to indicate that a large
amount of memory can be used, and roughly how much?

eg.
PERL_TEST_MEMORY=2 # we can use up to 2Gb of memory for large memory tests

It might be desirable to skip these tests when performing a parallel test.

Or at least run that particular subset of tests in sequence.

Make a t/bigmem/ directory for large memory tests, perhaps.

If I understand t/harness it should be easy to ensure those are run
sequentially.

Attached, two changes​:

0001-add-a-directory-of-tests-to-run-with-large-available.patch

modify t/TEST and t/harness to run t/bigmem/*.t if
$ENV{PERL_TEST_MEMORY} is true.

0002-rt-100514-regression-test-for-read-with-a-2Gib-offse.patch

regression test for RT #100514

Tony

@p5pRT
Copy link
Author

p5pRT commented Mar 24, 2012

From @tonycoz

On Sat, Mar 24, 2012 at 01​:56​:29PM +1100, Tony Cook wrote​:

Attached, two changes​:

0001-add-a-directory-of-tests-to-run-with-large-available.patch

modify t/TEST and t/harness to run t/bigmem/*.t if
$ENV{PERL_TEST_MEMORY} is true.

0002-rt-100514-regression-test-for-read-with-a-2Gib-offse.patch

regression test for RT #100514

Actually attaching them might help.

Tony

@p5pRT
Copy link
Author

p5pRT commented Mar 24, 2012

From @tonycoz

0001-add-a-directory-of-tests-to-run-with-large-available.patch
From b02b95df63006d5594e2d0ce800b39ffe2fdc514 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Fri, 23 Mar 2012 13:11:03 +0100
Subject: [PATCH 1/2] add a directory of tests to run with large available memory

Intended for testing 64-bit behavious
---
 pod/perlhack.pod |    7 +++++++
 t/TEST           |    1 +
 t/harness        |    1 +
 3 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/pod/perlhack.pod b/pod/perlhack.pod
index 63df5d5..64329ed 100644
--- a/pod/perlhack.pod
+++ b/pod/perlhack.pod
@@ -948,6 +948,13 @@ Setting this variable skips the vrexx.t tests for OS2::REXX.
 
 This sets a variable in op/numconvert.t.
 
+=item * PERL_TEST_MEMORY
+
+Setting this variable includes the tests in F<t/bigmem/>.  This should
+be set to the number of gigabytes of memory available for testing,
+eg. C<PERL_TEST_MEMORY=4> indicates that tests that require 4GiB of
+available memory can be run safely.
+
 =back
 
 See also the documentation for the Test and Test::Harness modules, for
diff --git a/t/TEST b/t/TEST
index c2c81e9..fdd2d53 100755
--- a/t/TEST
+++ b/t/TEST
@@ -463,6 +463,7 @@ unless (@ARGV) {
 	_find_tests('x2p');
 	_find_tests('japh') if $::torture;
 	_find_tests('t/benchmark') if $::benchmark or $ENV{PERL_BENCHMARK};
+	_find_tests('bigmem') if $ENV{PERL_TEST_MEMORY};
     }
 }
 
diff --git a/t/harness b/t/harness
index 7748c26..1a1efdb 100644
--- a/t/harness
+++ b/t/harness
@@ -135,6 +135,7 @@ if (@ARGV) {
 	push @next, 'japh' if $torture;
 	push @next, 'win32' if $^O eq 'MSWin32';
 	push @next, 'benchmark' if $ENV{PERL_BENCHMARK};
+	push @next, 'bigmem' if $ENV{PERL_TEST_MEMORY};
 	# Hopefully TAP::Parser::Scheduler will support this syntax soon.
 	# my $next = { par => '{' . join (',', @next) . '}/*.t' };
 	my $next = { par => [
-- 
1.7.2.5

@p5pRT
Copy link
Author

p5pRT commented Mar 24, 2012

From @tonycoz

0002-rt-100514-regression-test-for-read-with-a-2Gib-offse.patch
From a9392a4f703e2368f279108f18ed800b85a9300c Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Fri, 23 Mar 2012 13:11:48 +0100
Subject: [PATCH 2/2] [rt #100514] regression test for read() with a 2Gib offset

---
 MANIFEST        |    1 +
 t/bigmem/read.t |   24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+), 0 deletions(-)
 create mode 100644 t/bigmem/read.t

diff --git a/MANIFEST b/MANIFEST
index cc185d6..dc01fdb 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -4968,6 +4968,7 @@ t/base/rs.t			See if record-read works
 t/base/term.t			See if various terms work
 t/base/while.t			See if while work
 t/benchmark/rt26188-speed-up-keys-on-empty-hash.t	Benchmark if keys on empty hashes is fast enough
+t/bigmem/read.t			Check read() handles large offsets
 t/cmd/elsif.t			See if else-if works
 t/cmd/for.t			See if for loops work
 t/cmd/mod.t			See if statement modifiers work
diff --git a/t/bigmem/read.t b/t/bigmem/read.t
new file mode 100644
index 0000000..b29c097
--- /dev/null
+++ b/t/bigmem/read.t
@@ -0,0 +1,24 @@
+#!perl
+BEGIN {
+    chdir 't';
+    unshift @INC, "../lib";
+}
+
+use strict;
+require './test.pl';
+use Config qw(%Config);
+
+$ENV{PERL_TEST_MEMORY} >= 3
+    or skip_all("Need ~3Gb for this test");
+$Config{ptrsize} >= 8
+    or skip_all("Need 64-bit pointers for this test");
+
+plan(1);
+
+# RT #100514
+my $x = "";
+read(DATA, $x, 4, 0x80000000);
+is(length $x, 0x80000004, "check we read to the correct offset");
+__DATA__
+Food
+
-- 
1.7.2.5

@p5pRT
Copy link
Author

p5pRT commented May 22, 2012

From @cpansprout

On Fri Mar 23 19​:59​:21 2012, tonyc wrote​:

On Sat, Mar 24, 2012 at 01​:56​:29PM +1100, Tony Cook wrote​:

Attached, two changes​:

0001-add-a-directory-of-tests-to-run-with-large-available.patch

modify t/TEST and t/harness to run t/bigmem/*.t if
$ENV{PERL_TEST_MEMORY} is true.

0002-rt-100514-regression-test-for-read-with-a-2Gib-offse.patch

regression test for RT #100514

Actually attaching them might help.

Thank you. Applied as ff5db60 and 9fda09b.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Jul 28, 2013

From @cpansprout

On Sat Feb 13 07​:51​:05 2010, nicholas wrote​:

There's a lot of inappropriate use of the I32 type throughout the core.
Likely most should be something else, one of U32, STRLEN, SSize_t, IV
or UV.

The misuse of I32 causes lots of bugs (panics, SEGVs, silent data
errors) if
strings go over 2GB.

This is a meta-ticket for collating tickets relating to these sorts of
bugs.

What should the maximum string and array lengths be?

In various places in the perl source, we have overflow checks that use
different sizes; in other places things just overflow, but at different
thresholds.

Take sv_setpvn for example. If the length is greater than IV_MAX it
croaks. safesysmalloc croaks if the length is greater than the maximum
SSize_t can hold, but only on debugging builds. Malloc takes a size_t
as its argument.

So that means 2**63-1 is the longest string supported on 64-bit
platforms via sv_setpvn. If you use sv_grow and write to the string
buffer directly, you can go beyond that, if the malloc implementation
lets you, except under debugging builds, where the limit is still 2**63-1.

On 32-bit platforms, if -Duse64bitint is not used, the situation is the
same, but with 2**31-1 for the maximum.

On 32-bit platforms when -Duse64bitint *is* used, sv_setpvn allows
values all the way up to 2**32-1, but sv_grow will croak on anything
above 2**31-1 under debugging builds. On non-debugging builds both
methods (sv_setpvn and sv_grow+direct write to PVX) allow strings up to
2**32-1.

So nothing is consistent.

For all practical purposes, one is limited to 31-bit string lengths on
32-bit platforms, because, unless you really know perl’s internals well,
your 3GB string *will* be copied, and you will immediately run out of
memory.

I suggest we make 2**(PTRSIZE-1)-1 (*) the maximum string length and
make everything consistent with that. That is already the maximum array
length.

Regular expressions are currently limited to I32_MAX. Changing that
would break the regular expression plugin interface. I don’t think it’s
unreasonable to keep it at its current limit. Note I am talking about
regular expressions themselves, not the string they match against.

In 6174b39 I allowed pos() to record positions up to 2**PTRSIZE-2. I
think that was a mistake, but harmless, as pos() values are always
truncated to the length of the string.

* I.e., SSize_t_MAX, but we have no such macro currently.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Jul 28, 2013

From @tonycoz

On Sat, Jul 27, 2013 at 07​:33​:28PM -0700, Father Chrysostomos via RT wrote​:

On Sat Feb 13 07​:51​:05 2010, nicholas wrote​:

There's a lot of inappropriate use of the I32 type throughout the core.
Likely most should be something else, one of U32, STRLEN, SSize_t, IV
or UV.

The misuse of I32 causes lots of bugs (panics, SEGVs, silent data
errors) if
strings go over 2GB.

This is a meta-ticket for collating tickets relating to these sorts of
bugs.

What should the maximum string and array lengths be?

I think it should be Ssize_t_max.

Or more C standardly - ptrdiff_t-max.

It might be possible in theory to support longer strings on a 32-bit
platform, but with all the bits and pieces that get allocated from the
address space* I don't think it's likely that there's an over 2G hole
to allocate such an object in.

Supporting larger objects on 32-bit platforms could complicate the
code for very little benefit.

Tony

* stack, kernel space, mapped libraries, etc

@p5pRT
Copy link
Author

p5pRT commented Jul 30, 2013

From @chipdude

On 7/27/2013 7​:33 PM, Father Chrysostomos via RT wrote​:

What should the maximum string and array lengths be?

Why would the answer ever be anything other than size_t?

@p5pRT
Copy link
Author

p5pRT commented Jul 30, 2013

From @demerphq

On 28 July 2013 04​:33, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

Regular expressions are currently limited to I32_MAX. Changing that
would break the regular expression plugin interface.

IMO this is not a sound argument for rejecting a change. There are
very few consumers of that interface, and it has never really been
"blessed" as stable. This has come up a few times and I am firmly of
the opinion that if changes for the better of perl are needed then the
regex engine plug in interface should not be an obstacle.

I don’t think it’s
unreasonable to keep it at its current limit.

Maybe, maybe not. But the plug in interface shouldn't be part of the
justification for your position at all.

Cheers, and thanks for all your work on this stuff!

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jul 30, 2013

From @cpansprout

On Mon Jul 29 22​:19​:43 2013, chip wrote​:

On 7/27/2013 7​:33 PM, Father Chrysostomos via RT wrote​:

What should the maximum string and array lengths be?

Why would the answer ever be anything other than size_t?

Because AvFILLp needs to be set to -1 when the array is empty. We
*could* use size_t but reserve ~0 as a special marker, but that
complicates the code for little gain.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Jul 30, 2013

From @cpansprout

On Tue Jul 30 06​:33​:28 2013, sprout wrote​:

On Mon Jul 29 22​:19​:43 2013, chip wrote​:

On 7/27/2013 7​:33 PM, Father Chrysostomos via RT wrote​:

What should the maximum string and array lengths be?

Why would the answer ever be anything other than size_t?

Because AvFILLp needs to be set to -1 when the array is empty. We
*could* use size_t but reserve ~0 as a special marker, but that
complicates the code for little gain.

Also, the HV APIs use negative lengths for utf8.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Jul 31, 2013

From @demerphq

On 30 July 2013 17​:20, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Tue Jul 30 06​:33​:28 2013, sprout wrote​:

On Mon Jul 29 22​:19​:43 2013, chip wrote​:

On 7/27/2013 7​:33 PM, Father Chrysostomos via RT wrote​:

What should the maximum string and array lengths be?

Why would the answer ever be anything other than size_t?

Because AvFILLp needs to be set to -1 when the array is empty. We
*could* use size_t but reserve ~0 as a special marker, but that
complicates the code for little gain.

Also, the HV APIs use negative lengths for utf8.

Yes, but that API is insane. Utterly insane.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jul 31, 2013

From @cpansprout

On Sat Jul 27 20​:08​:37 2013, tonyc wrote​:

I think it should be Ssize_t_max.

Is this a correct way to define it, or am I doing something non-portable
here?

#define SSize_t_MAX (SSize_t)(~(size_t)0 >> 1)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Jul 31, 2013

From @chipdude

On 7/30/2013 6​:33 AM, Father Chrysostomos via RT wrote​:

On Mon Jul 29 22​:19​:43 2013, chip wrote​:

On 7/27/2013 7​:33 PM, Father Chrysostomos via RT wrote​:

What should the maximum string and array lengths be?
Why would the answer ever be anything other than size_t?
Because AvFILLp needs to be set to -1 when the array is empty. We
*could* use size_t but reserve ~0 as a special marker, but that
complicates the code for little gain.

Your emphasis is incorrect. Rather, size_t adds very little
complication for significant systemic gain​: Correctness and consistency.

@p5pRT
Copy link
Author

p5pRT commented Jul 31, 2013

From @chipdude

On 7/30/2013 8​:20 AM, Father Chrysostomos via RT wrote​:

On Tue Jul 30 06​:33​:28 2013, sprout wrote​:

On Mon Jul 29 22​:19​:43 2013, chip wrote​:

On 7/27/2013 7​:33 PM, Father Chrysostomos via RT wrote​:

What should the maximum string and array lengths be?
Why would the answer ever be anything other than size_t?
Because AvFILLp needs to be set to -1 when the array is empty. We
*could* use size_t but reserve ~0 as a special marker, but that
complicates the code for little gain.
Also, the HV APIs use negative lengths for utf8.

Those specific APIs can continue to use signed integers if you prefer;
but the signedness need not escape those specific parameter lists if so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants