Skip Menu |
Report information
Id: 121562
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: mike.chamberlain [at] pirum.com
Cc:
AdminCc:

Operating System: Linux
PatchStatus: (no value)
Severity: High
Type: core
Perl Version: 5.14.2
Fixed In: (no value)



From: mike.chamberlain [...] pirum.com
Subject: index segfaults on 2G strings in 64bit perl
CC: support [...] pirum.com
To: perlbug [...] perl.org
Date: Wed, 02 Apr 2014 10:17:49 +0100

Message body is not shown because it is too large.

RT-Send-CC: perl5-porters [...] perl.org
On Wed Apr 02 02:18:05 2014, mike.chamberlain@pirum.com wrote: Show quoted text
> We're attemping to parse a large file over 2G in size, and it's > segfaulting. We are scanning > the file to identify key points within it. Ultimately the simplest > implementation of the bug is: > > chambm@wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-1); $x > .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' > Segmentation fault (core dumped) > chambm@wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-2); $x > .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' > END: 2147483646
Reproduced in blead: [tonyc@dromedary-001 perl]$ ./perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' Segmentation fault [tonyc@dromedary-001 perl]$ ./perl -v This is perl 5, version 19, subversion 11 (v5.19.11 (v5.19.10-34-g6447043)) built for x86_64-linux-thread-multi It looks like pp_index suffers from the I32 bug, assuming fbm_instr() and rninstr() are safe it should be easy to fix. Tony
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 837b
On Thu Apr 03 15:44:40 2014, tonyc wrote: Show quoted text
> On Wed Apr 02 02:18:05 2014, mike.chamberlain@pirum.com wrote:
> > We're attemping to parse a large file over 2G in size, and it's > > segfaulting. We are scanning > > the file to identify key points within it. Ultimately the simplest > > implementation of the bug is: > > > > chambm@wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-1); $x > > .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' > > Segmentation fault (core dumped) > > chambm@wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-2); $x > > .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' > > END: 2147483646
> > It looks like pp_index suffers from the I32 bug, assuming fbm_instr() > and rninstr() are safe it should be easy to fix.
Fix attached, for 5.21, though perhaps it should be in 5.20. Tony
Subject: 0001-perl-121562-fix-the-I32-bug-for-index-and-rindex.patch
From eca5fe3ca0af1af58bcf4ae90fb24ede7c9d4c56 Mon Sep 17 00:00:00 2001 From: Tony Cook <tony@develop-help.com> Date: Wed, 9 Apr 2014 06:15:08 +0200 Subject: [perl #121562] fix the I32 bug for index() and rindex() --- MANIFEST | 1 + pp.c | 6 +++--- t/bigmem/index.t | 26 ++++++++++++++++++++++++++ 3 files changed, 30 insertions(+), 3 deletions(-) create mode 100644 t/bigmem/index.t diff --git a/MANIFEST b/MANIFEST index 9652cd5..190315f 100644 --- a/MANIFEST +++ b/MANIFEST @@ -4920,6 +4920,7 @@ t/base/rs.t See if record-read works t/base/term.t See if various terms work t/base/while.t See if while work t/benchmark/rt26188-speed-up-keys-on-empty-hash.t Benchmark if keys on empty hashes is fast enough +t/bigmem/index.t Check that index() handles large offsets t/bigmem/pos.t Check that pos() handles large offsets t/bigmem/read.t Check read() handles large offsets t/bigmem/regexp.t Test regular expressions with large strings diff --git a/pp.c b/pp.c index 4ec6887..071b4f0 100644 --- a/pp.c +++ b/pp.c @@ -3197,8 +3197,8 @@ PP(pp_index) SV *temp = NULL; STRLEN biglen; STRLEN llen = 0; - I32 offset; - I32 retval; + SSize_t offset; + SSize_t retval; const char *big_p; const char *little_p; bool big_utf8; @@ -3287,7 +3287,7 @@ PP(pp_index) } if (offset < 0) offset = 0; - else if (offset > (I32)biglen) + else if (offset > (SSize_t)biglen) offset = biglen; if (!(little_p = is_index ? fbm_instr((unsigned char*)big_p + offset, diff --git a/t/bigmem/index.t b/t/bigmem/index.t new file mode 100644 index 0000000..0c3658c --- /dev/null +++ b/t/bigmem/index.t @@ -0,0 +1,26 @@ +#!perl +BEGIN { + chdir 't'; + unshift @INC, "../lib"; +} + +use strict; +require './test.pl'; +use Config qw(%Config); + +# some copying means we end up using 4GB, checked with top +$ENV{PERL_TEST_MEMORY} >= 4 + or skip_all("Need ~4GB for this test"); +$Config{ptrsize} >= 8 + or skip_all("Need 64-bit pointers for this test"); + +plan(tests => 2); + +my $space = " "; # avoid constant folding from doubling memory usage +my $work = $space x 0x80000000 . "\n\n"; + +# this would SEGV +is(index($work, "\n"), 0x80000000, "test index() over 2G mark"); + +# this would simply fail +is(rindex($work, "\n"), 0x80000001, "test rindex() over 2G mark"); -- 1.7.1
From: ilmari [...] ilmari.org (Dagfinn Ilmari Mannsåker)
Subject: Re: [perl #121562] index segfaults on 2G strings in 64bit perl
To: perl5-porters [...] perl.org
Date: Thu, 10 Apr 2014 02:02:15 +0200
Download (untitled) / with headers
text/plain 2.2k
"Tony Cook via RT" <perlbug-followup@perl.org> writes: Show quoted text
> On Thu Apr 03 15:44:40 2014, tonyc wrote:
>> On Wed Apr 02 02:18:05 2014, mike.chamberlain@pirum.com wrote:
>> > We're attemping to parse a large file over 2G in size, and it's >> > segfaulting. We are scanning >> > the file to identify key points within it. Ultimately the simplest >> > implementation of the bug is: >> > >> > chambm@wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-1); $x >> > .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' >> > Segmentation fault (core dumped) >> > chambm@wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-2); $x >> > .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' >> > END: 2147483646
>> >> It looks like pp_index suffers from the I32 bug, assuming fbm_instr() >> and rninstr() are safe it should be easy to fix.
> > Fix attached, for 5.21, though perhaps it should be in 5.20.
It needs to be switched to use sv_pos_(u2b|b2u)_flags as well, to handle UTF-8 strings properly: pp.c: In function ‘Perl_pp_index’: pp.c:3284:6: warning: passing argument 2 of ‘Perl_sv_pos_u2b’ from incompatible pointer type [enabled by default] sv_pos_u2b(big, &offset, 0); ^ In file included from perl.h:5012:0, from pp.c:28: proto.h:4266:20: note: expected ‘I32 * const’ but argument is of type ‘ssize_t *’ PERL_CALLCONV void Perl_sv_pos_u2b(pTHX_ SV *const sv, I32 *const offsetp, I32 *const lenp) ^ pp.c:3301:6: warning: passing argument 2 of ‘Perl_sv_pos_b2u’ from incompatible pointer type [enabled by default] sv_pos_b2u(big, &retval); ^ In file included from perl.h:5012:0, from pp.c:28: proto.h:4256:20: note: expected ‘I32 * const’ but argument is of type ‘ssize_t *’ PERL_CALLCONV void Perl_sv_pos_b2u(pTHX_ SV *const sv, I32 *const offsetp) ^ And adding utf8::upgrade($work) to t/bigmem/index.t, gives: ilmari@nurket:~/src/perl/t$ PERL_TEST_MEMORY=4 ../perl -I../lib bigmem/index.t 1..2 panic: sv_pos_b2u: bad byte offset, blen=2147483650, byte=18446744071562067968 at bigmem/index.t line 23. # Looks like you planned 2 tests but ran 0. -- "A disappointingly low fraction of the human race is, at any given time, on fire." - Stig Sandbeck Mathisen
To: perl5-porters [...] perl.org
Date: Wed, 9 Apr 2014 17:31:37 -0700
Subject: Re: [perl #121562] index segfaults on 2G strings in 64bit perl
From: Karen Etheridge <perl [...] froods.org>
On Thu, Apr 03, 2014 at 03:44:41PM -0700, Tony Cook via RT wrote: Show quoted text
> Reproduced in blead: > > [tonyc@dromedary-001 perl]$ ./perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' > Segmentation fault > [tonyc@dromedary-001 perl]$ ./perl -v > > This is perl 5, version 19, subversion 11 (v5.19.11 (v5.19.10-34-g6447043)) built for x86_64-linux-thread-multi
FWIW, I cannot reproduce this on 64-bit darwin: $; perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' END: 2147483647 $; perl -v This is perl 5, version 19, subversion 10 (v5.19.10) built for darwin-2level $; uname -a Darwin bourbon 12.5.0 Darwin Kernel Version 12.5.0: Sun Sep 29 13:33:47 PDT 2013; root:xnu-2050.48.12~1/RELEASE_X86_64 x86_64 $; perl -V | grep 32 $; perl -V | grep 64 uname='darwin bourbon 12.5.0 darwin kernel version 12.5.0: sun sep 29 13:33:47 pdt 2013; root:xnu-2050.48.12~1release_x86_64 x86_64 ' use64bitint=define, use64bitall=define, uselongdouble=undef PERL_USE_DEVEL USE_64_BIT_ALL USE_64_BIT_INT
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 2.4k
On Wed Apr 09 17:32:07 2014, perl@froods.org wrote: Show quoted text
> On Thu, Apr 03, 2014 at 03:44:41PM -0700, Tony Cook via RT wrote:
> > Reproduced in blead: > > > > [tonyc@dromedary-001 perl]$ ./perl -e 'my $x = " " x ((2**31 )-1); $x > > .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' > > Segmentation fault > > [tonyc@dromedary-001 perl]$ ./perl -v > > > > This is perl 5, version 19, subversion 11 (v5.19.11 (v5.19.10-34- > > g6447043)) built for x86_64-linux-thread-multi
>
For what it’s worth, I can reproduce this on Mageia Linux x86-64 5/Cauldron with both the Mageia perl-5.18.2-4.mga5 ( /usr/bin/perl ) and bleadperl: [SHELL] shlomif@telaviv1:~$ /home/shlomif/apps/perl/bleadperl/bin/perl5.19.11 -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' Segmentation fault shlomif@telaviv1:~$ /home/shlomif/apps/perl/bleadperl/bin/perl5.19.11 -v This is perl 5, version 19, subversion 11 (v5.19.11 (v5.19.9-308-g935db47*)) built for x86_64-linux (with 1 registered patch, see perl -V for more detail) Copyright 1987-2014, Larry Wall Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the Perl 5 source kit. Complete documentation for Perl, including FAQ lists, should be found on this system using "man perl" or "perldoc perl". If you have access to the Internet, point your browser at http://www.perl.org/, the Perl Home Page. shlomif@telaviv1:~$ perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' Segmentation fault shlomif@telaviv1:~$ perl -v This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-thread-multi (with 1 registered patch, see perl -V for more detail) Copyright 1987-2013, Larry Wall Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the Perl 5 source kit. Complete documentation for Perl, including FAQ lists, should be found on this system using "man perl" or "perldoc perl". If you have access to the Internet, point your browser at http://www.perl.org/, the Perl Home Page. shlomif@telaviv1:~$ [/SHELL] Show quoted text
> FWIW, I cannot reproduce this on 64-bit darwin: > > $; perl -e 'my $x = " " x ((2**31 )-1); $x .="\n"; my $end = index($x, > "\n", 0); print "END: $end\n"' > END: 2147483647
Does valgrind complain about something when doing that? Regards, -- Shlomi Fish
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 2.4k
On Wed Apr 09 17:02:58 2014, ilmari wrote: Show quoted text
> "Tony Cook via RT" <perlbug-followup@perl.org> writes: >
> > On Thu Apr 03 15:44:40 2014, tonyc wrote:
> >> On Wed Apr 02 02:18:05 2014, mike.chamberlain@pirum.com wrote:
> >> > We're attemping to parse a large file over 2G in size, and it's > >> > segfaulting. We are scanning > >> > the file to identify key points within it. Ultimately the simplest > >> > implementation of the bug is: > >> > > >> > chambm@wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-1); > >> > $x > >> > .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' > >> > Segmentation fault (core dumped) > >> > chambm@wren9a ~/src/SBLREX> perl -e 'my $x = " " x ((2**31 )-2); > >> > $x > >> > .="\n"; my $end = index($x, "\n", 0); print "END: $end\n"' > >> > END: 2147483646
> >> > >> It looks like pp_index suffers from the I32 bug, assuming > >> fbm_instr() > >> and rninstr() are safe it should be easy to fix.
> > > > Fix attached, for 5.21, though perhaps it should be in 5.20.
> > It needs to be switched to use sv_pos_(u2b|b2u)_flags as well, to > handle > UTF-8 strings properly: > > pp.c: In function ‘Perl_pp_index’: > pp.c:3284:6: warning: passing argument 2 of ‘Perl_sv_pos_u2b’ from > incompatible pointer type [enabled by default] > sv_pos_u2b(big, &offset, 0); > ^ > In file included from perl.h:5012:0, > from pp.c:28: > proto.h:4266:20: note: expected ‘I32 * const’ but argument is of type > ‘ssize_t *’ > PERL_CALLCONV void Perl_sv_pos_u2b(pTHX_ SV *const sv, I32 *const > offsetp, I32 *const lenp) > ^ > pp.c:3301:6: warning: passing argument 2 of ‘Perl_sv_pos_b2u’ from > incompatible pointer type [enabled by default] > sv_pos_b2u(big, &retval); > ^ > In file included from perl.h:5012:0, > from pp.c:28: > proto.h:4256:20: note: expected ‘I32 * const’ but argument is of type > ‘ssize_t *’ > PERL_CALLCONV void Perl_sv_pos_b2u(pTHX_ SV *const sv, I32 *const > offsetp)
Thanks, I looked for warnings when I built it, but must have missed them. Show quoted text
> And adding utf8::upgrade($work) to t/bigmem/index.t, gives: > > ilmari@nurket:~/src/perl/t$ PERL_TEST_MEMORY=4 ../perl -I../lib > bigmem/index.t > 1..2 > panic: sv_pos_b2u: bad byte offset, blen=2147483650, > byte=18446744071562067968 at bigmem/index.t line 23. > # Looks like you planned 2 tests but ran 0.
I should have tested the unicode path too. Tony
From: Ricardo Signes <perl.p5p [...] rjbs.manxome.org>
Date: Mon, 14 Apr 2014 10:29:30 -0400
To: perl5-porters [...] perl.org
Subject: Re: [perl #121562] index segfaults on 2G strings in 64bit perl
Download (untitled) / with headers
text/plain 319b
* Tony Cook via RT <perlbug-followup@perl.org> [2014-04-09T00:30:36] Show quoted text
> Fix attached, for 5.21, though perhaps it should be in 5.20.
Putting aside the subsequent amendment to the patch, do we want this in 5.20? It's not a regression, but if we are quite confident in the fix, ti would be a good thing to fix! -- rjbs
Download signature.asc
application/pgp-signature 473b

Message body not shown because it is not plain text.

RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 206b
On Mon Apr 14 04:22:05 2014, tonyc wrote: ... Show quoted text
> Thanks, I looked for warnings when I built it, but must have missed them. >
... Show quoted text
> > I should have tested the unicode path too.
Here's a new patch. Tony
Subject: 0001-perl-121562-fix-the-I32-bug-for-index-and-rindex.patch
From ab05702adb71d31562278ae9c3946d757c09d534 Mon Sep 17 00:00:00 2001 From: Tony Cook <tony@develop-help.com> Date: Tue, 15 Apr 2014 03:57:57 +0200 Subject: [PATCH] [perl #121562] fix the I32 bug for index() and rindex() --- MANIFEST | 1 + pp.c | 10 +++++----- t/bigmem/index.t | 37 +++++++++++++++++++++++++++++++++++++ 3 files changed, 43 insertions(+), 5 deletions(-) create mode 100644 t/bigmem/index.t diff --git a/MANIFEST b/MANIFEST index 9652cd5..190315f 100644 --- a/MANIFEST +++ b/MANIFEST @@ -4920,6 +4920,7 @@ t/base/rs.t See if record-read works t/base/term.t See if various terms work t/base/while.t See if while work t/benchmark/rt26188-speed-up-keys-on-empty-hash.t Benchmark if keys on empty hashes is fast enough +t/bigmem/index.t Check that index() handles large offsets t/bigmem/pos.t Check that pos() handles large offsets t/bigmem/read.t Check read() handles large offsets t/bigmem/regexp.t Test regular expressions with large strings diff --git a/pp.c b/pp.c index 4ec6887..04c1f29 100644 --- a/pp.c +++ b/pp.c @@ -3197,8 +3197,8 @@ PP(pp_index) SV *temp = NULL; STRLEN biglen; STRLEN llen = 0; - I32 offset; - I32 retval; + SSize_t offset = 0; + SSize_t retval; const char *big_p; const char *little_p; bool big_utf8; @@ -3281,13 +3281,13 @@ PP(pp_index) offset = is_index ? 0 : biglen; else { if (big_utf8 && offset > 0) - sv_pos_u2b(big, &offset, 0); + offset = sv_pos_u2b_flags(big, offset, 0, SV_CONST_RETURN); if (!is_index) offset += llen; } if (offset < 0) offset = 0; - else if (offset > (I32)biglen) + else if (offset > (SSize_t)biglen) offset = biglen; if (!(little_p = is_index ? fbm_instr((unsigned char*)big_p + offset, @@ -3298,7 +3298,7 @@ PP(pp_index) else { retval = little_p - big_p; if (retval > 0 && big_utf8) - sv_pos_b2u(big, &retval); + retval = sv_pos_b2u_flags(big, retval, SV_CONST_RETURN); } SvREFCNT_dec(temp); fail: diff --git a/t/bigmem/index.t b/t/bigmem/index.t new file mode 100644 index 0000000..fdd502c --- /dev/null +++ b/t/bigmem/index.t @@ -0,0 +1,37 @@ +#!perl +BEGIN { + chdir 't'; + unshift @INC, "../lib"; +} + +use strict; +require './test.pl'; +use Config qw(%Config); + +# memory usage checked with top +$ENV{PERL_TEST_MEMORY} >= 2 + or skip_all("Need ~2GB for this test"); +$Config{ptrsize} >= 8 + or skip_all("Need 64-bit pointers for this test"); + +plan(tests => 4); + +my $space = " "; # avoid constant folding from doubling memory usage +# concatenation here increases memory usage significantly +my $work = $space x 0x80000002; +substr($work, 0x80000000) = "\n\n"; + +# this would SEGV +is(index($work, "\n"), 0x80000000, "test index() over 2G mark"); + +# this would simply fail +is(rindex($work, "\n"), 0x80000001, "test rindex() over 2G mark"); + +utf8::upgrade($work); + +# this would SEGV +is(index($work, "\n"), 0x80000000, "test index() over 2G mark (utf8-ish)"); + +# this would simply fail +is(rindex($work, "\n"), 0x80000001, "test rindex() over 2G mark (utf8-ish)"); + -- 1.7.1
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 317b
On Mon Apr 14 19:01:04 2014, tonyc wrote: Show quoted text
> On Mon Apr 14 04:22:05 2014, tonyc wrote: > ...
> > Thanks, I looked for warnings when I built it, but must have missed them. > >
> ...
> > > > I should have tested the unicode path too.
> > Here's a new patch.
Applied as b464e2b7c8addfdd9b22b9a5949a89db7c73e43c. Tony


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org