Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storable loses information on large strings #15238

Closed
p5pRT opened this issue Mar 18, 2016 · 11 comments
Closed

Storable loses information on large strings #15238

p5pRT opened this issue Mar 18, 2016 · 11 comments

Comments

@p5pRT
Copy link

p5pRT commented Mar 18, 2016

Migrated from rt.perl.org#127743 (status was 'resolved')

Searchable as RT127743$

@p5pRT
Copy link
Author

p5pRT commented Mar 18, 2016

From @jkeenan

[Originally reported in http​://www.nntp.perl.org/group/perl.perl5.porters/2016/03/msg235167.html -- jkeenan]

Hi,

My Perl version is v5.20.1 for x86_64-Linux. When I used Sortable to store
an array of large strings, the retrieved array would lost information of
strings longer than 1,699,160,188 bytes.

For example​:
$genome_string is about 1,500,000,000 bytes long.
$order is 4 times longer than $genome_string.

  foreach my $i(750615880 .. 75061589){
  my $num = vec($order,$i,32);
  my $char = substr($genome_string,$i,1);
  print "before​: $i​: $char $num\n";
  }
  my $global_index = [$genome_string,$order];
  store $global_index, "./test.s";
  my $test_index = retrieve("./test.s");
  foreach my $i(750615880 .. 75061589){
  my $num = vec($$test_index[1],$i,32);
  my $char = substr($$test_index[0],$i,1);
  print "after​: $i​: $char $num\n";
  }
The output looks like this​:
before​: 750615880​: G 1243519261
before​: 750615881​: T 26652294
before​: 750615882​: A 938865154
before​: 750615883​: T 199336229
before​: 750615884​: G 1488701234
before​: 750615885​: T 255130577
before​: 750615886​: T 600771191
before​: 750615887​: C 440286230
before​: 750615888​: T 1249264699
before​: 750615889​: G 1360065150
after​: 750615880​: G 0
after​: 750615881​: T 0
after​: 750615882​: A 0
after​: 750615883​: T 0
after​: 750615884​: G 0
after​: 750615885​: T 0
after​: 750615886​: T 0
after​: 750615887​: C 0
after​: 750615888​: T 0
after​: 750615889​: G 0

Thanks,
Zong

@p5pRT
Copy link
Author

p5pRT commented Mar 25, 2016

From @jkeenan

On Fri Mar 18 13​:05​:34 2016, jkeenan wrote​:

[Originally reported in
http​://www.nntp.perl.org/group/perl.perl5.porters/2016/03/msg235167.html
-- jkeenan]

Hi,

My Perl version is v5.20.1 for x86_64-Linux. When I used Sortable to
store
an array of large strings, the retrieved array would lost information
of
strings longer than 1,699,160,188 bytes.

For example​:
$genome_string is about 1,500,000,000 bytes long.
$order is 4 times longer than $genome_string.

foreach my $i(750615880 .. 75061589){
my $num = vec($order,$i,32);
my $char = substr($genome_string,$i,1);
print "before​: $i​: $char $num\n";
}
my $global_index = [$genome_string,$order];
store $global_index, "./test.s";
my $test_index = retrieve("./test.s");
foreach my $i(750615880 .. 75061589){
my $num = vec($$test_index[1],$i,32);
my $char = substr($$test_index[0],$i,1);
print "after​: $i​: $char $num\n";
}
The output looks like this​:
before​: 750615880​: G 1243519261
before​: 750615881​: T 26652294
before​: 750615882​: A 938865154
before​: 750615883​: T 199336229
before​: 750615884​: G 1488701234
before​: 750615885​: T 255130577
before​: 750615886​: T 600771191
before​: 750615887​: C 440286230
before​: 750615888​: T 1249264699
before​: 750615889​: G 1360065150
after​: 750615880​: G 0
after​: 750615881​: T 0
after​: 750615882​: A 0
after​: 750615883​: T 0
after​: 750615884​: G 0
after​: 750615885​: T 0
after​: 750615886​: T 0
after​: 750615887​: C 0
after​: 750615888​: T 0
after​: 750615889​: G 0

Thanks,
Zong

Does anyone on list have a machine with enough memory to explore this bug report?

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Mar 25, 2016

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Mar 25, 2016

From @andk

On Fri, 25 Mar 2016 05​:52​:56 -0700, "James E Keenan via RT" <perlbug-followup@​perl.org> said​:

  > Does anyone on list have a machine with enough memory to explore this bug report?

  > Thank you very much.

First I could not reproduce what the OP describes, I only could observe
segfaults. Then I rewrote it to be self-contained and still produces the
segfaults.

  # perl -e '
  use Storable;
  my($order);
  my $i = shift;
  vec($order,$i,32) = 1;
  my $num = vec($order,$i,32);
  print "before​: $i​: $num\n";
  store [$order], "./test.s";
  my $test_index = retrieve("./test.s");
  my $num = vec($test_index->[0],$i,32);
  print " after​: $i​: $num\n";
  unlink "./test.s" or die "Could not unlink​: $!";
  ' 536870911
  before​: 536870911​: 1
  zsh​: segmentation fault (core dumped) perl -e 536870911
  >18​:56​:49 root@​k93​:~# ls -l test.s core
  -rw------- 1 root root 2149343232 2016-03-25 18​:45​:25 core
  -rw-r--r-- 1 root root 2147573760 2016-03-25 18​:45​:24 test.s

On my machine I get no SEGV when I pass 536870910 to the script.

Reproduced with several old and new perls. This is with 5.23.9​:

  Program terminated with signal SIGSEGV, Segmentation fault.
  #0 __memcpy_sse2_unaligned ()
  at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S​:36
  36 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S​: No such file or directory.
  (gdb) bt
  #0 __memcpy_sse2_unaligned ()
  at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S​:36
  #1 0x00000000005294ae in memcpy (__len=8192, __src=<optimized out>,
  __dest=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/string3.h​:53
  #2 PerlIOBuf_write (f=0x120cfe0, vbuf=<optimized out>, count=<optimized out>)
  at perlio.c​:4138
  #3 0x00007f34a4c7ae6a in store_scalar (cxt=0x121ec20, sv=<optimized out>)
  at Storable.xs​:2250
  #4 0x00007f34a4c848a0 in store (cxt=cxt@​entry=0x121ec20, sv=0x11ff178)
  at Storable.xs​:3713
  #5 0x00007f34a4c86180 in store_array (cxt=0x121ec20, av=0x12133a8) at Storable.xs​:2310
  #6 0x00007f34a4c848a0 in store (cxt=cxt@​entry=0x121ec20, sv=sv@​entry=0x12133a8)
  at Storable.xs​:3713
  #7 0x00007f34a4c868c3 in do_store (f=<optimized out>, optype=1, optype@​entry=0,
  network_order=network_order@​entry=0, res=res@​entry=0x0, sv=<optimized out>,
  sv=<optimized out>) at Storable.xs​:3903
  #8 0x00007f34a4c86cc8 in XS_Storable_pstore (cv=<optimized out>) at Storable.xs​:6659
  #9 0x00000000004b5ea8 in Perl_pp_entersub () at pp_hot.c​:3985
  #10 0x00000000004ae8f3 in Perl_runops_standard () at run.c​:41
  #11 0x0000000000441ee6 in S_run_body (oldscope=1) at perl.c​:2479
  #12 perl_run (my_perl=<optimized out>) at perl.c​:2402
  #13 0x000000000041fa95 in main (argc=4, argv=0x7ffde7fe3158, env=0x7ffde7fe3180)
  at perlmain.c​:116

--
andreas

@p5pRT
Copy link
Author

p5pRT commented Apr 18, 2016

From @tonycoz

On Fri Mar 18 13​:05​:34 2016, jkeenan wrote​:

My Perl version is v5.20.1 for x86_64-Linux. When I used Sortable to
store
an array of large strings, the retrieved array would lost information
of
strings longer than 1,699,160,188 bytes.

Storable uses I32 and unsigned long internally for lengths, including both using it for the length of the scalar being stored and some intermediate lengths when calculating the new work buffer size for freeze()/thaw().

The first of the attached patches simply rejects over-large scalars.

The second handles scalars for which the size is too large for I32 as new Storable tags that an older Storable will reject.

Tony

@p5pRT
Copy link
Author

p5pRT commented Apr 18, 2016

From @tonycoz

0001-perl-127743-simple-stupid-prevent-storing-very-large.patch
From fe1a27444c2dc30d49f7407163a6b037f920e2be Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Wed, 13 Apr 2016 08:07:58 +0200
Subject: (perl #127743) simple stupid: prevent storing very large strings

The length of a very large string overflows the 32-bit length stored
in the output.
---
 dist/Storable/Storable.pm | 2 +-
 dist/Storable/Storable.xs | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/dist/Storable/Storable.pm b/dist/Storable/Storable.pm
index c8f6db1..e928401 100644
--- a/dist/Storable/Storable.pm
+++ b/dist/Storable/Storable.pm
@@ -22,7 +22,7 @@ package Storable; @ISA = qw(Exporter);
 
 use vars qw($canonical $forgive_me $VERSION);
 
-$VERSION = '2.56';
+$VERSION = '2.57';
 
 BEGIN {
     if (eval { local $SIG{__DIE__}; require Log::Agent; 1 }) {
diff --git a/dist/Storable/Storable.xs b/dist/Storable/Storable.xs
index 83cd001..49e425a 100644
--- a/dist/Storable/Storable.xs
+++ b/dist/Storable/Storable.xs
@@ -2242,6 +2242,11 @@ static int store_scalar(pTHX_ stcxt_t *cxt, SV *sv)
                              len, SX_VSTRING, SX_LVSTRING);
             }
 #endif
+#if Size_t_size > 4
+            if (len > I32_MAX) {
+                CROAK(("String too long for Storable"));
+            }
+#endif
 
             wlen = (I32) len; /* WLEN via STORE_SCALAR expects I32 */
             if (SvUTF8 (sv))
-- 
2.1.4

@p5pRT
Copy link
Author

p5pRT commented Apr 18, 2016

From @tonycoz

0002-perl-127743-handle-2GB-scalars-on-64-bit-platforms.patch
From f0ca707438b56331b963a0cbcd1e7d1ba053a969 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Mon, 18 Apr 2016 06:52:28 +0200
Subject: (perl #127743) handle 2GB+ scalars on 64-bit platforms

---
 MANIFEST                  |   1 +
 dist/Storable/Storable.xs | 143 +++++++++++++++++++++++++++++++++++++++++-----
 dist/Storable/t/large.t   |  68 ++++++++++++++++++++++
 dist/Storable/t/malice.t  |   4 +-
 4 files changed, 200 insertions(+), 16 deletions(-)
 create mode 100644 dist/Storable/t/large.t

diff --git a/MANIFEST b/MANIFEST
index 3da3119..589194b 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -3381,6 +3381,7 @@ dist/Storable/t/HAS_OVERLOAD.pm		For auto-requiring of mdoules for overload
 dist/Storable/t/integer.t		See if Storable works
 dist/Storable/t/interwork56.t		Test compatibility kludge for 64bit data under 5.6.x
 dist/Storable/t/just_plain_nasty.t	See if Storable works
+dist/Storable/t/large.t			See if Storable handles large scalars.
 dist/Storable/t/leaks.t			See if Storable leaks (skips in core)
 dist/Storable/t/lock.t			See if Storable works
 dist/Storable/t/make_56_interwork.pl	Make test data for interwork56.t
diff --git a/dist/Storable/Storable.xs b/dist/Storable/Storable.xs
index 49e425a..802fcf9 100644
--- a/dist/Storable/Storable.xs
+++ b/dist/Storable/Storable.xs
@@ -46,6 +46,10 @@
 #endif	/* PERLIO_IS_STDIO */
 #endif	/* USE_PERLIO */
 
+#ifndef SSize_t_MAX
+#define SSize_t_MAX (SSize_t)(~(Size_t)0 >> 1)
+#endif
+
 /*
  * Earlier versions of perl might be used, we can't assume they have the latest!
  */
@@ -157,7 +161,9 @@
 #define SX_VSTRING	C(29)	/* vstring forthcoming (small) */
 #define SX_LVSTRING	C(30)	/* vstring forthcoming (large) */
 #define SX_SVUNDEF_ELEM	C(31)	/* array element set to &PL_sv_undef */
-#define SX_ERROR	C(32)	/* Error */
+#define SX_VLSCALAR	C(32)	/* scalar longer than I32_MAX */
+#define SX_VLUTF8STR	C(33)	/* UTF-8 string longer than I32_MAX */
+#define SX_ERROR	C(34)	/* Error */
 
 /*
  * Those are only used to retrieve "old" pre-0.6 binary images.
@@ -527,11 +533,11 @@ static stcxt_t *Context_ptr = NULL;
 #define MMASK	(MGROW - 1)
 
 #define round_mgrow(x)	\
-	((unsigned long) (((unsigned long) (x) + MMASK) & ~MMASK))
+	((Size_t) (((Size_t) (x) + MMASK) & ~MMASK))
 #define trunc_int(x)	\
-	((unsigned long) ((unsigned long) (x) & ~(sizeof(int)-1)))
+	((Size_t) ((Size_t) (x) & ~(sizeof(int)-1)))
 #define int_aligned(x)	\
-	((unsigned long) (x) == trunc_int(x))
+	((Size_t) (x) == trunc_int(x))
 
 #define MBUF_INIT(x)					\
   STMT_START {							\
@@ -590,8 +596,8 @@ static stcxt_t *Context_ptr = NULL;
 
 #define MBUF_XTEND(x) 				\
   STMT_START {						\
-	int nsz = (int) round_mgrow((x)+msiz);	\
-	int offset = mptr - mbase;		\
+	STRLEN nsz = round_mgrow((x)+msiz);	\
+	STRLEN offset = mptr - mbase;		\
 	ASSERT(!cxt->membuf_ro, ("mbase is not read-only")); \
 	TRACEME(("** extending mbase from %d to %d bytes (wants %d new)", \
 		msiz, nsz, (x)));			\
@@ -1236,6 +1242,8 @@ static SV *retrieve_weakoverloaded(pTHX_ stcxt_t *cxt, const char *cname);
 static SV *retrieve_vstring(pTHX_ stcxt_t *cxt, const char *cname);
 static SV *retrieve_lvstring(pTHX_ stcxt_t *cxt, const char *cname);
 static SV *retrieve_svundef_elem(pTHX_ stcxt_t *cxt, const char *cname);
+static SV *retrieve_vlscalar(pTHX_ stcxt_t *cxt, const char *cname);
+static SV *retrieve_vlutf8str(pTHX_ stcxt_t *cxt, const char *cname);
 
 static const sv_retrieve_t sv_retrieve[] = {
 	0,			/* SX_OBJECT -- entry unused dynamically */
@@ -1270,6 +1278,8 @@ static const sv_retrieve_t sv_retrieve[] = {
 	(sv_retrieve_t)retrieve_vstring,	/* SX_VSTRING */
 	(sv_retrieve_t)retrieve_lvstring,	/* SX_LVSTRING */
 	(sv_retrieve_t)retrieve_svundef_elem,	/* SX_SVUNDEF_ELEM */
+	(sv_retrieve_t)retrieve_vlscalar,	/* SX_VLSCALAR */
+	(sv_retrieve_t)retrieve_vlutf8str,	/* SX_VLUTF8STR */
 	(sv_retrieve_t)retrieve_other,		/* SX_ERROR */
 };
 
@@ -2222,8 +2232,6 @@ static int store_scalar(pTHX_ stcxt_t *cxt, SV *sv)
 #ifdef SvVOK
 	    MAGIC *mg;
 #endif
-            I32 wlen; /* For 64-bit machines */
-
           string_readlen:
             pv = SvPV(sv, len);
 
@@ -2244,15 +2252,30 @@ static int store_scalar(pTHX_ stcxt_t *cxt, SV *sv)
 #endif
 #if Size_t_size > 4
             if (len > I32_MAX) {
-                CROAK(("String too long for Storable"));
+                STRLEN wlen = len;
+                unsigned char smark = SvUTF8(sv) ? SX_VLUTF8STR : SX_VLSCALAR;
+
+                PUTMARK(smark);
+                if (cxt->netorder && BYTEORDER == 0x87654321) {
+                    ((U32*)wlen)[0] = htonl(len >> 32);
+                    ((U32*)wlen)[1] = htonl(len & 0xffffffff);
+                }
+                else {
+                    wlen = len;
+                }
+                WRITE(&wlen, sizeof(wlen));
+                WRITE(pv, len);
             }
+            else
 #endif
 
-            wlen = (I32) len; /* WLEN via STORE_SCALAR expects I32 */
-            if (SvUTF8 (sv))
-                STORE_UTF8STR(pv, wlen);
-            else
-                STORE_SCALAR(pv, wlen);
+            {
+                I32 wlen = (I32) len; /* WLEN via STORE_SCALAR expects I32 */
+                if (SvUTF8 (sv))
+                    STORE_UTF8STR(pv, wlen);
+                else
+                    STORE_SCALAR(pv, wlen);
+            }
             TRACEME(("ok (scalar 0x%"UVxf" '%s', length = %"IVdf")",
                      PTR2UV(sv), SvPVX(sv), (IV)len));
 	} else
@@ -4990,6 +5013,69 @@ static SV *retrieve_scalar(pTHX_ stcxt_t *cxt, const char *cname)
 }
 
 /*
+ * retrieve_lscalar
+ *
+ * Retrieve defined long (string) scalar.
+ *
+ * Layout is SX_LSCALAR <length> <data>, with SX_LSCALAR already read.
+ * The scalar is "long" in that <length> is larger than LG_SCALAR so it
+ * was not stored on a single byte.
+ */
+static SV *retrieve_vlscalar(pTHX_ stcxt_t *cxt, const char *cname)
+{
+#if Size_t_size > 4
+	STRLEN len;
+	SV *sv;
+	HV *stash;
+
+        ASSERT(Size_t_size == 8, ("Size_t isn't 8 bytes, code needs re-work"));
+        if (cxt->netorder && BYTEORDER == 0x87654321) {
+            STRLEN rlen;
+            READ(&rlen, sizeof(rlen));
+            len = ((STRLEN)((U32*)&rlen)[0] << 32) | ((U32*)&rlen)[1];
+        }
+        else {
+            READ(&len, sizeof(len));
+        }
+	TRACEME(("retrieve_vlscalar (#%d), len = %"IVdf, cxt->tagnum, (IV) len));
+        if (len < I32_MAX || len > SSize_t_MAX) {
+            CROAK(("Size of very large scalar out of range"));
+        }
+
+	/*
+	 * Allocate an empty scalar of the suitable length.
+	 */
+
+	sv = NEWSV(10002, len);
+	stash = cname ? gv_stashpv(cname, GV_ADD) : 0;
+	SEEN_NN(sv, stash, 0);	/* Associate this new scalar with tag "tagnum" */
+
+	/*
+	 * WARNING: duplicates parts of sv_setpv and breaks SV data encapsulation.
+	 *
+	 * Now, for efficiency reasons, read data directly inside the SV buffer,
+	 * and perform the SV final settings directly by duplicating the final
+	 * work done by sv_setpv. Since we're going to allocate lots of scalars
+	 * this way, it's worth the hassle and risk.
+	 */
+
+	SAFEREAD(SvPVX(sv), len, sv);
+	SvCUR_set(sv, len);				/* Record C string length */
+	*SvEND(sv) = '\0';				/* Ensure it's null terminated anyway */
+	(void) SvPOK_only(sv);			/* Validate string pointer */
+	if (cxt->s_tainted)				/* Is input source tainted? */
+		SvTAINT(sv);				/* External data cannot be trusted */
+
+	TRACEME(("large scalar len %"IVdf" '%s'", (IV) len, SvPVX(sv)));
+	TRACEME(("ok (retrieve_lscalar at 0x%"UVxf")", PTR2UV(sv)));
+
+	return sv;
+#else
+        CROAK(("Reading a 2GB or larger string on a 32-bit platform"))'
+#endif
+}
+
+/*
  * retrieve_utf8str
  *
  * Like retrieve_scalar(), but tag result as utf8.
@@ -5047,6 +5133,35 @@ static SV *retrieve_lutf8str(pTHX_ stcxt_t *cxt, const char *cname)
 }
 
 /*
+ * retrieve_vlutf8str
+ *
+ * Like retrieve_vlscalar(), but tag result as utf8.
+ * If we're retrieving UTF8 data in a non-UTF8 perl, croaks.
+ */
+static SV *retrieve_vlutf8str(pTHX_ stcxt_t *cxt, const char *cname)
+{
+    SV *sv;
+
+    TRACEME(("retrieve_vlutf8str"));
+
+    sv = retrieve_vlscalar(aTHX_ cxt, cname);
+    if (sv) {
+#ifdef HAS_UTF8_SCALARS
+        SvUTF8_on(sv);
+#else
+        if (cxt->use_bytes < 0)
+            cxt->use_bytes
+                = (SvTRUE(perl_get_sv("Storable::drop_utf8", GV_ADD))
+                   ? 1 : 0);
+        if (cxt->use_bytes == 0)
+            UTF8_CROAK();
+#endif
+    }
+
+    return sv;
+}
+
+/*
  * retrieve_vstring
  *
  * Retrieve a vstring, and then retrieve the stringy scalar following it,
diff --git a/dist/Storable/t/large.t b/dist/Storable/t/large.t
new file mode 100644
index 0000000..f9b2055
--- /dev/null
+++ b/dist/Storable/t/large.t
@@ -0,0 +1,68 @@
+#!./perl -w
+
+BEGIN {
+    unshift @INC, 't';
+    unshift @INC, 't/compat' if $] < 5.006002;
+    require Config; import Config;
+    if ($ENV{PERL_CORE} and $Config{'extensions'} !~ /\bStorable\b/) {
+        print "1..0 # Skip: Storable was not built\n";
+        exit 0;
+    }
+    require 'st-dump.pl';
+}
+
+use Storable qw(freeze thaw nfreeze thaw);
+
+use Test::More;
+
+# memory usage checked with top
+$ENV{PERL_TEST_MEMORY} && $ENV{PERL_TEST_MEMORY} >= 8
+    or plan skip_all => "Need 8GB for this test";
+$Config{ptrsize} >= 8
+    or plan skip_all => "Need 64-bit pointers for this test";
+
+plan tests => 4;
+
+# we might have a lot of RAM, but maybe not so much disk space, so we
+# can only test freeze()/thaw().
+
+my $x = "x"; # avoid constant folding the large x op
+my $data = [ $x x 0x88000000 ]; # 2GB RAM (and a wee bit)
+
+{
+    my $frozen = freeze($data); # another 2GB RAM
+    my $thawed = thaw($frozen); # another 2GB RAM
+    # and add a bit more in case the following 
+    is_deeply($thawed, $data,
+              "check in and out match");
+    undef $frozen;
+    undef $thawed;
+}
+
+{
+    my $frozen = nfreeze($data);
+    my $thawed = thaw($frozen);
+    is_deeply($thawed, $data, "check in and out match (netorder)");
+    undef $frozen;
+    undef $thawed;
+}
+
+$x->[0] .= chr(0x100);
+
+{
+    my $frozen = freeze($data);
+    my $thawed = thaw($frozen);
+    is_deeply($thawed, $data,
+              "check in and out match (utf8)");
+    undef $frozen;
+    undef $thawed;
+}
+
+{
+    my $frozen = nfreeze($data);
+    my $thawed = thaw($frozen);
+    is_deeply($thawed, $data, "check in and out match (utf8,netorder)");
+    undef $frozen;
+    undef $thawed;
+}
+
diff --git a/dist/Storable/t/malice.t b/dist/Storable/t/malice.t
index 867a0d7..1358139 100644
--- a/dist/Storable/t/malice.t
+++ b/dist/Storable/t/malice.t
@@ -208,7 +208,7 @@ sub test_things {
     $where = $file_magic + $network_magic;
   }
 
-  # Just the header and a tag 255. As 31 is currently the highest tag, this
+  # Just the header and a tag 255. As 33 is currently the highest tag, this
   # is "unexpected"
   $copy = substr ($contents, 0, $where) . chr 255;
 
@@ -228,7 +228,7 @@ sub test_things {
   # local $Storable::DEBUGME = 1;
   # This is the delayed croak
   test_corrupt ($copy, $sub,
-                "/^Storable binary image v$header->{major}.$minor6 contains data of type 255. This Storable is v$header->{major}.$minor and can only handle data types up to 31/",
+                "/^Storable binary image v$header->{major}.$minor6 contains data of type 255. This Storable is v$header->{major}.$minor and can only handle data types up to 33/",
                 "bogus tag, minor plus 4");
   # And check again that this croak is not delayed:
   {
-- 
2.1.4

@p5pRT
Copy link
Author

p5pRT commented Feb 12, 2017

From @xsawyerx

On Sun, 17 Apr 2016 22​:21​:07 -0700, tonyc wrote​:

On Fri Mar 18 13​:05​:34 2016, jkeenan wrote​:

My Perl version is v5.20.1 for x86_64-Linux. When I used Sortable to
store
an array of large strings, the retrieved array would lost information
of
strings longer than 1,699,160,188 bytes.

Storable uses I32 and unsigned long internally for lengths, including
both using it for the length of the scalar being stored and some
intermediate lengths when calculating the new work buffer size for
freeze()/thaw().

The first of the attached patches simply rejects over-large scalars.

The second handles scalars for which the size is too large for I32 as
new Storable tags that an older Storable will reject.

Does anyone object merging this?

@p5pRT
Copy link
Author

p5pRT commented Feb 12, 2017

From @tonycoz

On Sun, Feb 12, 2017 at 01​:56​:09PM -0800, Sawyer X via RT wrote​:

On Sun, 17 Apr 2016 22​:21​:07 -0700, tonyc wrote​:

On Fri Mar 18 13​:05​:34 2016, jkeenan wrote​:

My Perl version is v5.20.1 for x86_64-Linux. When I used Sortable to
store
an array of large strings, the retrieved array would lost information
of
strings longer than 1,699,160,188 bytes.

Storable uses I32 and unsigned long internally for lengths, including
both using it for the length of the scalar being stored and some
intermediate lengths when calculating the new work buffer size for
freeze()/thaw().

The first of the attached patches simply rejects over-large scalars.

The second handles scalars for which the size is too large for I32 as
new Storable tags that an older Storable will reject.

Does anyone object merging this?

At the time I was put off merging this because of​:

http​://blogs.perl.org/users/rurban/2016/04/storable-security-problems-and-overlarge-data.html

(I don't recall him sending me the fixes.)

The second patch uses an op code the cperl changed skipped for some
reason, and uses it only for 64-bit PV storage.[2]

cperl allocates a different op as a prefix op to mark the next op as
taking a 64-bit size/count instead 32-bits for PVs, arrays, hashes.

So the issue for the second patch is file compatibility with cperl - are
we worried about it?

Tony

[1] the format of the storable binary data

[2] cperl seems to treat SX_ERROR as a specific error code rather than
the *first* erroneous code

@p5pRT
Copy link
Author

p5pRT commented Aug 6, 2019

From @tonycoz

On Sun, 12 Feb 2017 15​:44​:19 -0800, tonyc wrote​:

On Sun, Feb 12, 2017 at 01​:56​:09PM -0800, Sawyer X via RT wrote​:

On Sun, 17 Apr 2016 22​:21​:07 -0700, tonyc wrote​:

On Fri Mar 18 13​:05​:34 2016, jkeenan wrote​:

My Perl version is v5.20.1 for x86_64-Linux. When I used Sortable
to
store
an array of large strings, the retrieved array would lost
information
of
strings longer than 1,699,160,188 bytes.

Storable uses I32 and unsigned long internally for lengths,
including
both using it for the length of the scalar being stored and some
intermediate lengths when calculating the new work buffer size for
freeze()/thaw().

The first of the attached patches simply rejects over-large
scalars.

The second handles scalars for which the size is too large for I32
as
new Storable tags that an older Storable will reject.

Does anyone object merging this?

At the time I was put off merging this because of​:

http​://blogs.perl.org/users/rurban/2016/04/storable-security-problems-
and-overlarge-data.html

(I don't recall him sending me the fixes.)

The second patch uses an op code the cperl changed skipped for some
reason, and uses it only for 64-bit PV storage.[2]

cperl allocates a different op as a prefix op to mark the next op as
taking a 64-bit size/count instead 32-bits for PVs, arrays, hashes.

So the issue for the second patch is file compatibility with cperl -
are
we worried about it?

I ended up merging much of the cperl Storable changes to blead, which was released as part of perl 5.28.0 and to CPAN as Storable 3.08.

Along with several other patches that fixed this issue and several others.

Closing.

Tony

@p5pRT
Copy link
Author

p5pRT commented Aug 6, 2019

@tonycoz - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant