Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shared references (threads::shared) disappear on sub return #13140

Closed
p5pRT opened this issue Jul 31, 2013 · 21 comments
Closed

Shared references (threads::shared) disappear on sub return #13140

p5pRT opened this issue Jul 31, 2013 · 21 comments

Comments

@p5pRT
Copy link

p5pRT commented Jul 31, 2013

Migrated from rt.perl.org#119089 (status was 'open')

Searchable as RT119089$

@p5pRT
Copy link
Author

p5pRT commented Jul 31, 2013

From @snaury

Created by @snaury

This is a bug report for perl from snaury@​gmail.com,
generated with the help of perlbug 1.39 running under perl 5.14.2.

-----------------------------------------------------------------
When using threads​::shared on Perl 5.14 using many kinds of
expressions in sub's return becomes very dangerous, as return
values just become undef undef certain circumstances. For example
this program would die in Perl 5.14, but would work normally in
Perl 5.12 or Perl 5.16​:

  use strict;
  use threads ();
  use threads​::shared;

  sub test_clone {
  my $ref = shared_clone([{a => 1, b => 2}]);
  return $ref->[0];
  }

  die "not defined" unless defined(test_clone);

This causes threaded perl programs on Debian 7.0 or Ubuntu 12.04
to produce really weird failures.

Perl Info

Flags:
    category=library
    severity=high
    module=threads::shared

Site configuration information for perl 5.14.2:

Configured by Debian Project at Mon Mar 18 19:16:26 UTC 2013.

Summary of my perl5 (revision 5 version 14 subversion 2) configuration:

  Platform:
    osname=linux, osvers=2.6.42-37-generic,
archname=x86_64-linux-gnu-thread-multi
    uname='linux batsu 2.6.42-37-generic #58-ubuntu smp thu jan 24 15:28:10
utc 2013 x86_64 x86_64 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN
-Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr
-Dprivlib=/usr/share/perl/5.14 -Darchlib=/usr/lib/perl/5.14
-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5
-Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local
-Dsitelib=/usr/local/share/perl/5.14.2
-Dsitearch=/usr/local/lib/perl/5.14.2 -Dman1dir=/usr/share/man/man1
-Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1
-Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl
-Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm
-Ui_libutil -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib
-Dlibperl=libperl.so.5.14.2 -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing
-pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.6.3', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib
/usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=, so=so, useshrplib=true, libperl=libperl.so.5.14.2
    gnulibc_version='2.15'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib
-fstack-protector'

Locally applied patches:



@INC for perl 5.14.2:
    /etc/perl
    /usr/local/lib/perl/5.14.2
    /usr/local/share/perl/5.14.2
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.14
    /usr/share/perl/5.14
    /usr/local/lib/site_perl
    .


Environment for perl 5.14.2:
    HOME=/home/dragonfox
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_ADDRESS=en_GB.UTF-8
    LC_IDENTIFICATION=en_GB.UTF-8
    LC_MEASUREMENT=en_GB.UTF-8
    LC_MONETARY=en_GB.UTF-8
    LC_NAME=en_GB.UTF-8
    LC_NUMERIC=en_GB.UTF-8
    LC_PAPER=en_GB.UTF-8
    LC_TELEPHONE=en_GB.UTF-8
    LC_TIME=en_GB.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)

PATH=/home/dragonfox/perl5/perlbrew/bin:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
    PERLBREW_BASHRC_VERSION=0.64
    PERLBREW_HOME=/home/dragonfox/.perlbrew
    PERLBREW_MANPATH=
    PERLBREW_PATH=/home/dragonfox/perl5/perlbrew/bin
    PERLBREW_ROOT=/home/dragonfox/perl5/perlbrew
    PERLBREW_VERSION=0.64
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Jul 31, 2013

From @iabyn

On Tue, Jul 30, 2013 at 10​:58​:36PM -0700, Alexey Borzenkov wrote​:

# New Ticket Created by Alexey Borzenkov
# Please include the string​: [perl #119089]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=119089 >

This is a bug report for perl from snaury@​gmail.com,
generated with the help of perlbug 1.39 running under perl 5.14.2.

-----------------------------------------------------------------
When using threads​::shared on Perl 5.14 using many kinds of
expressions in sub's return becomes very dangerous, as return
values just become undef undef certain circumstances. For example
this program would die in Perl 5.14, but would work normally in
Perl 5.12 or Perl 5.16​:

use strict;
use threads \(\);
use threads&#8203;::shared;

sub test\_clone \{
    my $ref = shared\_clone\(\[\{a => 1\, b => 2\}\]\);
    return $ref\->\[0\];
\}

die "not defined" unless defined\(test\_clone\);

This causes threaded perl programs on Debian 7.0 or Ubuntu 12.04
to produce really weird failures.

This was fixed in perl 5.15.7 with commit
  6f48390,
which wont be back-ported to 5.14.x as its out of its maintenance period.

You can work around the issue on older perls by assigning the value to
a temporary var before returning it; i.e.

  my $ret = $ref->[0];
  return $ret;

--
Technology is dominated by two types of people​: those who understand what
they do not manage, and those who manage what they do not understand.

@p5pRT
Copy link
Author

p5pRT commented Jul 31, 2013

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jul 31, 2013

@cpansprout - Status changed from 'open' to 'resolved'

@p5pRT
Copy link
Author

p5pRT commented Aug 5, 2013

From @nwc10

On Wed, Jul 31, 2013 at 01​:10​:33PM +0100, Dave Mitchell wrote​:

On Tue, Jul 30, 2013 at 10​:58​:36PM -0700, Alexey Borzenkov wrote​:

When using threads​::shared on Perl 5.14 using many kinds of
expressions in sub's return becomes very dangerous, as return
values just become undef undef certain circumstances. For example
this program would die in Perl 5.14, but would work normally in
Perl 5.12 or Perl 5.16​:

use strict;
use threads \(\);
use threads&#8203;::shared;

sub test\_clone \{
    my $ref = shared\_clone\(\[\{a => 1\, b => 2\}\]\);
    return $ref\->\[0\];
\}

die "not defined" unless defined\(test\_clone\);

This can be simplified to​:

#!./perl
use strict;
use threads ();
use threads​::shared;

sub test_clone {
  my @​a :shared;
  $a[0] = 6;
  $a[0];
}

die "not defined" unless defined test_clone;
__END__

This causes threaded perl programs on Debian 7.0 or Ubuntu 12.04
to produce really weird failures.

This was fixed in perl 5.15.7 with commit
6f48390,
which wont be back-ported to 5.14.x as its out of its maintenance period.

You can work around the issue on older perls by assigning the value to
a temporary var before returning it; i.e.

my $ret = $ref\->\[0\];
return $ret;

Totally true.

If assertions are enabled the test case triggers an assertion failure on
5.14.0. Given this, I was suspicious as to what introduced the assertion
failure, it turns out that there's a heck of a lot more to this than first
meets the eye.

So a git bisect (with assertions enabled) shows that commit 6f48390
(as mentioned above) makes the test case work. It *also* makes the test case
work if assertions are disabled.

commit 6f48390
Author​: Father Chrysostomos <sprout@​cpan.org>
Date​: Wed Jan 4 23​:28​:54 2012 -0800

  [perl #95548] Returned magical temps are not copied
 
  return and leavesub, for speed, were not copying temp variables with a
  refcount of 1, which is fine as long as the fact that it was not cop-
  ied is not observable.
 
  With magical variables, that *can* be observed, so we have to forego
  the optimisation and copy the variable if's magical.
 
  This obviously applies only to rvalue subs.

So, what caused the assertions to start failing between v5.12 and v5.14?
A simple bisect suggests that it's this commit​:

commit f2338a2
Author​: David Mitchell <davem@​iabyn.com>
Date​: Thu Apr 15 10​:20​:50 2010 +0100

  use cBOOL for bool casts
 
  bool b = (bool)some_int
 
  doesn't necessarily do what you think. In some builds, bool is defined as
  char, and that cast's behaviour is thus undefined. So this line in mg.c​:
 
  const bool was_temp = (bool)SvTEMP(sv);
 
  was actually setting was_temp to false even when the SVs_TEMP flag was set.
  Fix this by replacing all the (bool) casts with a new cBOOL() cast macro
  that (hopefully) does the right thing.

which has nothing directly to do with threads shared, or (not?) copying
values on subroutine return. It turns out after a bit of digging that it's
this section of that change that matters​:

Inline Patch
diff --git a/mg.c b/mg.c
index 3fb8ec4..4a8d767 100644
--- a/mg.c
+++ b/mg.c
@@ -193,7 +193,7 @@ Perl_mg_get(pTHX_ SV *sv)
 {
     dVAR;
     const I32 mgs_ix = SSNEW(sizeof(MGS));
-    const bool was_temp = (bool)SvTEMP(sv);
+    const bool was_temp = cBOOL(SvTEMP(sv));
     bool have_new = 0;
     MAGIC *newmg, *head, *cur, *mg;
     /* guard against sv having being freed midway by holding a private



It's not obvious from the code above, but SvTEMP(sv) returns either 0 or 0x00080000\, and in 2010\, C\ was typedef'd as C\\, so the above code always set was\_temp to 0\, due to truncation\. The cast was added to avoid a warning about the truncation\. The \(buggy\) truncation itself was added when the type of was\_temp was changed from a suitably large integer to a bool as part of this commit​:

commit 35a4481
Author​: Andy Lester <andy@​petdance.com>
Date​: Sun Mar 13 08​:20​:05 2005 -0600

  Adding const qualifiers
  Message-ID​: <20050313202005.GA23535@​petdance.com>
 
  p4raw-id​: //depot/perl@​24037

(note that the commit message doesn't even note that some variables' types
were changed).

The variable in question is used to enable Perl_mg_get() to retain the
SVs_TEMP bit set on values it is processing. This is needed because
Perl_sv_2mortal() conflates "temporary" with "mortal" and automatically sets
the SVs_TEMP flag bit on everything. Hence this code seeks to remove
SVs_TEMP if Perl_sv_2mortal() set it. The upshot of the bug was that
SVs_TEMP was always being removed, so code in subroutine entry that does
*not* copy TEMPs is skipped because the value is becoming unmarked. Hence
the values are always copied.

So, all that conceals what actually going on. When that truncation bug was
first introduced, it didn't matter that TEMPs weren't copied. When that bug
was fixed, it mattered a lot that TEMPs weren't copied. What changed in the
middle?

I set out to bisect between the two commits, patching *that* the bug to fix
it. The code doesn't change much, and test-building on the change points
reveals that at the revision with the last change to that part of the code
(changing a line adjacent to C<was_temp>), fixing *that* bug doesn't break
the test case.

So I made a patch to fix the bug, and bisected with that​:

Porting/bisect.pl -Dusethreads --start f9c6fee --end db63319 --late-fixup ../119089-W -Dnoextensions=Encode ../perl/119089

That reveals that the true trigger of the behaviour regression was this
change​:

commit fd69380
Author​: David Mitchell <davem@​iabyn.com>
Date​: Tue Mar 23 12​:11​:43 2010 +0000

  Fix assorted bugs related to magic (such as pos) not "sticking" to
  magical array and hash elements; e.g. the following looped infinitely​:
 
  $h{tainted_element} =~ /..../g
 
  There are two side-effects of this fix.
  First, MGf_GSKIP has been extended to work on tied array
  elements as well as hash elements. This is the mechanism that skips all
  but the first tied element magic gets until after the next set.
  Second, rvalue hash/array element access where the element has get magic,
  now directly returns the element rather than a mortal copy.
 
  The root cause of the bug was code similar to the following in pp_alem,
  pp_aelemfast, pp_helem and pp_rv2av​:
 
  if (!lval && SvGMAGICAL(sv)) /* see note in pp_helem() */
  sv = sv_mortalcopy(sv);
 
  According to the note, this was added in 1998 to make this work​:
 
  local $tied{foo} = $tied{foo}
 
  Since it returns a copy rather than the element, this make //g fail.
  My first attempt, a few years ago, to fix this, took the approach that
  the LHS of the bind should be made an lvalue in the presence of //g, since
  it now modifies its LHS; i.e.
 
  expr =~ // expr is rvalue
  expr =~ s/// expr is lvalue
  expr =~ //g expr was rvalue, I proposed to change it to lvalue
 
  Unfortunately this fix broke too much stuff (stuff that was arguably
  already broken, but it upset people). For example, f() = s////
  correctly gives the error
 
  Can't modify non-lvalue subroutine call
 
  My fix extended f() =
//g to give the same error. Which is reasonable,
  because the g isn't doing what you want. But plenty of people had code that
  only needed to match once and the g had just been cargo-culted. So it
  broke their working code. So lets not do this.
 
  My new approach has been to remove the sv_mortalcopy(). It turns out
  that this is no longer needed to fix the local $tied{foo} issue.
  Presumably that went away as a side-effect of my container/value magic
  localisation rationalisation of a few years ago, although I haven't
  analysed it - just noted that the tests still pass (!). However, an issue
  with removing it is that mg_get() no longer gets called. So a plain
 
  $tied_hash{elem};
 
  in void context no longer calls FETCH(). Which broke some tests and might
  break some code. Also, there's an issue with the delayed calling of magic
  in @​+[n] and %+{foo}; by the time the get magic is called, the original
  pattern may have gone out of scope.
 
  The solution is to simply replace the original
 
  sv = sv_mortalcopy(sv);
 
  with
 
  mg_get(sv);
 
  This then caused problems with tied array FETCH() getting called too much.
  I fixed this by extending the MGf_GSKIP mechanism to tied arrays as well
  as hashes. I don't understand why tied arrays have always been treated
  differently than tied hashes, but unifying them didn't seem to break
  anything (except for a Storable test, whose comment indicated that the
  test's author thought FETCH() was being called to often anyway).

The key part is this​:

Inline Patch
diff --git a/pp_hot.c b/pp_hot.c
index 3371e88..8f8af53 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -658,7 +658,7 @@ PP(pp_aelemfast)
     SV *sv = (svp ? *svp : &PL_sv_undef);
     EXTEND(SP, 1);
     if (!lval && SvGMAGICAL(sv))	/* see note in pp_helem() */
-	sv = sv_mortalcopy(sv);
+	mg_get(sv);
     PUSHs(sv);
     RETURN;
 }



ie it *gets rid of a copy*. So this removes one place that copied the return value\, and the cast fix removes the other place\, hence why both are needed to trigger the problem\. And why the identified patch fixes it\, by re\-instating a copy\.

The story is not over.

So, I assumed that taking blead, and reverting the patch that fixes the test
case, would cause the test case to fail again. It doesn't. More bisecting
revealed that subsequently this change also has an influence​:

commit 4bac9ae
Author​: Chip Salzenberg <chip@​pobox.com>
Date​: Fri Jun 22 15​:18​:18 2012 -0700

  Magic flags harmonization.
 
  In restore_magic(), which is called after any magic processing, all of
  the public OK flags have been shifted into the private OK flags. Thus
  the lack of an appropriate public OK flags was used to trigger both get
  magic and required conversions. This scheme did not cover ROK, however,
  so all properly written code had to make sure mg_get was called the right
  number of times anyway. Meanwhile the private OK flags gained a second
  purpose of marking converted but non-authoritative values (e.g. the IV
  conversion of an NV), and the inadequate flag shift mechanic broke this
  in some cases.
 
  This patch removes the shift mechanic for magic flags, thus exposing (and
  fixing) some improper usage of magic SVs in which mg_get() was not called
  correctly. It also has the side effect of making magic get functions
  specifically set their SVs to undef if that is desired, as the new behavior
  of empty get functions is to leave the value unchanged. This is a feature,
  as now get magic that does not modify its value, e.g. tainting, does not
  have to be special cased.
 
  The changes to cpan/ here are only temporary, for development only, to
  keep blead working until upstream applies them (or something like them).
 
  Thanks to Rik and Father C for review input.

because part of it removes the entire was_temp (micro?) optimisation logic​:

@​@​ -3302,12 +3285,8 @​@​ S_restore_magic(pTHX_ const void *p)
  So artificially keep it alive a bit longer.
  We avoid turning on the TEMP flag, which can cause the SV's
  buffer to get stolen (and maybe other stuff). */
- int was_temp = SvTEMP(sv);
  sv_2mortal(sv);
- if (!was_temp) {
- SvTEMP_off(sv);
- }
- SvOK_off(sv);
+ SvTEMP_off(sv);
  }
  else
  SvREFCNT_dec(sv); /* undo the inc in S_save_magic() */

meaning that values will always be copied.

(Given the nature of the first commit identified as a bug fix, this seems
reasonable. magic values shouldn't be being short circuited.)

So that explains the given test case. But the story is still not over.

Whilst temporary values should be copied (by default), LVALUEs returned from
:lvalue subs should not. This will fail assertions, SEGV, or worse even on
blead​:

#!./perl
use strict;
use threads ();
use threads​::shared;
use Devel​::Peek;

sub test_clone :lvalue {
  my @​a :shared;
  $a[0];
}

#Dump(test_clone);
test_clone() = 0;
__END__

I believe that the problem is because threads​::shared wrongly assumes that
the proxy for a shared aggregate will always outlive any LVALUE proxies for
its elements. This is generally true, but not for element proxies returned
from :lvalue subroutines for aggregate proxies which go out of scope with
the subroutine.

I believe that the fix is to change threads​::shared to deal with cleanup
differently, so that the last magic struct which frees up the relevant
mg_obj is responsible for cleaning up shared space. This means adding "free"
magic to the element proxies​:

Inline Patch
diff --git a/dist/threads-shared/shared.xs b/dist/threads-shared/shared.xs
index d3e859d..07379cc 100644
--- a/dist/threads-shared/shared.xs
+++ b/dist/threads-shared/shared.xs
@@ -1027,6 +1027,27 @@ sharedsv_elem_mg_DELETE(pTHX_ SV *sv, MAGIC *mg)
     return (0);
 }
 
+int
+sharedsv_elem_mg_free(pTHX_ SV *sv, MAGIC *mg)
+{
+    dTHXc;
+    PERL_UNUSED_ARG(sv);
+    ENTER_LOCK;
+    if (mg->mg_obj) {
+        if (!PL_dirty) {
+            assert(SvROK(mg->mg_obj));
+        }
+        if (SvREFCNT(mg->mg_obj) == 1) {
+            /* If the element has the last pointer to the shared aggregate, then
+               it has to free the shared aggregate.  mg->mg_obj itself is freed
+               by Perl_mg_free()  */
+            S_sharedsv_dec(aTHX_ SHAREDSV_FROM_OBJ(mg->mg_obj));
+        }
+    }
+    LEAVE_LOCK;
+    return (0);
+}
+
 /* Called during cloning of PERL_MAGIC_tiedelem(p) magic in new
  * thread */
 
@@ -1044,7 +1065,7 @@ MGVTBL sharedsv_elem_vtbl = {
     sharedsv_elem_mg_STORE,     /* set */
     0,                          /* len */
     sharedsv_elem_mg_DELETE,    /* clear */
-    0,                          /* free */
+    sharedsv_elem_mg_free,      /* free */
     0,                          /* copy */
     sharedsv_elem_mg_dup,       /* dup */
 #ifdef MGf_LOCAL
@@ -1114,7 +1135,21 @@ int
 sharedsv_array_mg_free(pTHX_ SV *sv, MAGIC *mg)
 {
     PERL_UNUSED_ARG(sv);
-    S_sharedsv_dec(aTHX_ (SV*)mg->mg_ptr);
+    if (!PL_dirty) {
+        assert(mg->mg_obj);
+        assert(SvROK(mg->mg_obj));
+        assert(SvUV(SvRV(mg->mg_obj)) == PTR2UV(mg->mg_ptr));
+    }
+    if (mg->mg_obj) {
+        if (SvREFCNT(mg->mg_obj) == 1) {
+            S_sharedsv_dec(aTHX_ (SV*)mg->mg_ptr);
+        } else {
+            /* An element of this aggregate still has PERL_MAGIC_tied(p)
+               pointing to this shared aggregate.  It will take responsibility
+               for freeing the shared aggregate.  Perl_mg_free() drops the
+               reference count on mg->mg_obj.  */
+        }
+    }
     return (0);
 }


This change has been stewing on the branch smoke-me/nicholas/RT119089-variant and doesn't seem to have any problems\. \(The commit message is now not correct \- 9b9aaeae0f62 fixed the problem on 5\.12\.0\. Global destruction\*\)

I think that it doesn't introduce any leaks, but I'd appreciate a second pair
of eyes.

This fix to threads​::shared has the side effect of also fixing the original
reported bug when running on v5.14

Nicholas Clark

* Global destruction. threads​::shared does not clean up "shared space", even
  when PL_destruct_level is 2. There is no C<perl_destruct> corresponding to
  the C<perl_alloc> in Perl_sharedsv_init(). I'm about to report that as a
  separate bug.

@p5pRT
Copy link
Author

p5pRT commented Aug 5, 2013

From @cpansprout

On Mon Aug 05 07​:20​:37 2013, nicholas wrote​:

I think that it doesn't introduce any leaks, but I'd appreciate a
second pair
of eyes.

It looks correct to me, but​:
• I haven’t looked at threads​::shared for a long time
• I didn’t get enough sleep last night (guess why? :-)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Aug 12, 2013

From @iabyn

On Mon, Aug 05, 2013 at 03​:19​:44PM +0100, Nicholas Clark wrote​:

use strict;
use threads ();
use threads​::shared;
use Devel​::Peek;

sub test_clone :lvalue {
my @​a :shared;
$a[0];
}

#Dump(test_clone);
test_clone() = 0;
__END__

I believe that the problem is because threads​::shared wrongly assumes that
the proxy for a shared aggregate will always outlive any LVALUE proxies for
its elements. This is generally true, but not for element proxies returned
from :lvalue subroutines for aggregate proxies which go out of scope with
the subroutine.

I believe that the fix is to change threads​::shared to deal with cleanup
differently, so that the last magic struct which frees up the relevant
mg_obj is responsible for cleaning up shared space. This means adding "free"
magic to the element proxies​:

Your fix looks plausible to me.

I had a quick play with trying to break blead in other ways; I though this
should do it, but it doesn't, which confuses me​:

  use threads ();
  use threads​::shared;
  #use Devel​::Peek;

  my $r;
  {
  my @​a :shared;
  $r = \$a[0];
  #Dump $r;
  }
  $r = 1;

So I feel less confident that I understand the issue fully.

+int
+sharedsv_elem_mg_free(pTHX_ SV *sv, MAGIC *mg)
+{
+ dTHXc;
+ PERL_UNUSED_ARG(sv);
+ ENTER_LOCK;
+ if (mg->mg_obj) {
+ if (!PL_dirty) {
+ assert(SvROK(mg->mg_obj));
+ }
+ if (SvREFCNT(mg->mg_obj) == 1) {
+ /* If the element has the last pointer to the shared aggregate, then
+ it has to free the shared aggregate. mg->mg_obj itself is freed
+ by Perl_mg_free() */
+ S_sharedsv_dec(aTHX_ SHAREDSV_FROM_OBJ(mg->mg_obj));
+ }
+ }
+ LEAVE_LOCK;
+ return (0);
+}

I think the ENTER_LOCK and LEAVE_LOCK are superfluous in this function;
its all done in private space apart from S_sharedsv_dec(), which does its
own locking; and there's no savestack manipulation which would require the
ENTER.

sharedsv_array_mg_free(pTHX_ SV *sv, MAGIC *mg)
{
PERL_UNUSED_ARG(sv);
- S_sharedsv_dec(aTHX_ (SV*)mg->mg_ptr);
+ if (!PL_dirty) {
+ assert(mg->mg_obj);
+ assert(SvROK(mg->mg_obj));
+ assert(SvUV(SvRV(mg->mg_obj)) == PTR2UV(mg->mg_ptr));
+ }
+ if (mg->mg_obj) {
+ if (SvREFCNT(mg->mg_obj) == 1) {
+ S_sharedsv_dec(aTHX_ (SV*)mg->mg_ptr);
+ } else {
+ /* An element of this aggregate still has PERL_MAGIC_tied(p)
+ pointing to this shared aggregate. It will take responsibility
+ for freeing the shared aggregate. Perl_mg_free() drops the
+ reference count on mg->mg_obj. */
+ }
+ }
return (0);
}

IIUC, sharedsv_elem_mg_free() and sharedsv_array_mg_free() should be
essentially the same code. They are both dealing with with an mg that has
mg_obj => RV => MG whose IV => shared thing. I'm not sure whether the
asserts in the two subs should be different (apart from the mg_ptr bit);
in particular, whether sharedsv_elem_mg_free() should be skipping the
assert(mg->mg_obj) in the non-dirty case.

--
You never really learn to swear until you learn to drive.

@p5pRT
Copy link
Author

p5pRT commented Sep 7, 2013

From @iabyn

re-opening this ticket as it has an unapplied patch

@p5pRT
Copy link
Author

p5pRT commented Sep 7, 2013

@iabyn - Status changed from 'resolved' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Sep 7, 2013

From @jmdh

On Mon Aug 05 08​:32​:28 2013, sprout wrote​:

On Mon Aug 05 07​:20​:37 2013, nicholas wrote​:

I think that it doesn't introduce any leaks, but I'd appreciate a
second pair
of eyes.

It looks correct to me, but​:
• I haven’t looked at threads​::shared for a long time
• I didn’t get enough sleep last night (guess why? :-)

I understand that there won't be another 5.14 release for this, but in
Debian this is still potentially an issue for us (see
http​://bugs.debian.org/cgi-bin/bugreport.cgi?bug=718438). Is there any
reason we shouldn't look at applying Nicholas's patch in Debian?

Thanks,
Dominic.

@p5pRT
Copy link
Author

p5pRT commented Sep 7, 2013

From @jmdh

On Sat, Sep 07, 2013 at 10​:22​:20AM -0700, Dominic Hargreaves via RT wrote​:

On Mon Aug 05 08​:32​:28 2013, sprout wrote​:

On Mon Aug 05 07​:20​:37 2013, nicholas wrote​:

I think that it doesn't introduce any leaks, but I'd appreciate a
second pair
of eyes.

It looks correct to me, but​:
• I haven’t looked at threads​::shared for a long time
• I didn’t get enough sleep last night (guess why? :-)

I understand that there won't be another 5.14 release for this, but in
Debian this is still potentially an issue for us (see
http​://bugs.debian.org/cgi-bin/bugreport.cgi?bug=718438). Is there any
reason we shouldn't look at applying Nicholas's patch in Debian?

FWIW, I tried applying it to the Debian 5.14 and got​:

dist/threads-shared/t/av_refs..................................../../t/perl​: symbol lookup error​: ../../lib/auto/threads/shared/shared.so​: undefined symbol​: SHAREDSV_FROM_OBJ
FAILED--expected 16 tests, saw 15

dist/threads-shared/t/clone....................................../../t/perl​: symbol lookup error​: ../../lib/auto/threads/shared/shared.so​: undefined symbol​: SHAREDSV_FROM_OBJ
FAILED--expected 34 tests, saw 20

dist/threads-shared/t/hv_refs..................................../../t/perl​: symbol lookup error​: ../../lib/auto/threads/shared/shared.so​: undefined symbol​: SHAREDSV_FROM_OBJ
FAILED--expected 22 tests, saw 12

dist/threads-shared/t/shared_attr................................/../t/perl​: symbol lookup error​: ../../lib/auto/threads/shared/shared.so​: undefined symbol​: SHAREDSV_FROM_OBJ
FAILED--non-zero wait status​: 32512

Dominic.

@p5pRT
Copy link
Author

p5pRT commented Sep 13, 2013

From @nwc10

On Sat, Sep 07, 2013 at 07​:25​:39PM +0100, Dominic Hargreaves wrote​:

On Sat, Sep 07, 2013 at 10​:22​:20AM -0700, Dominic Hargreaves via RT wrote​:

I understand that there won't be another 5.14 release for this, but in
Debian this is still potentially an issue for us (see
http​://bugs.debian.org/cgi-bin/bugreport.cgi?bug=718438). Is there any
reason we shouldn't look at applying Nicholas's patch in Debian?

FWIW, I tried applying it to the Debian 5.14 and got​:

dist/threads-shared/t/av_refs..................................../../t/perl​: symbol lookup error​: ../../lib/auto/threads/shared/shared.so​: undefined symbol​: SHAREDSV_FROM_OBJ
FAILED--expected 16 tests, saw 15

dist/threads-shared/t/clone....................................../../t/perl​: symbol lookup error​: ../../lib/auto/threads/shared/shared.so​: undefined symbol​: SHAREDSV_FROM_OBJ
FAILED--expected 34 tests, saw 20

dist/threads-shared/t/hv_refs..................................../../t/perl​: symbol lookup error​: ../../lib/auto/threads/shared/shared.so​: undefined symbol​: SHAREDSV_FROM_OBJ
FAILED--expected 22 tests, saw 12

dist/threads-shared/t/shared_attr................................/../t/perl​: symbol lookup error​: ../../lib/auto/threads/shared/shared.so​: undefined symbol​: SHAREDSV_FROM_OBJ
FAILED--non-zero wait status​: 32512

You applied the patch to Debian? Or you built the current threads-shared
for Debian?

Because that macro was added to threads​::shared in version 1.42, which in
blead was this commit​:

commit 28399f5
Author​: Jerry D. Hedden <jdhedden@​cpan.org>
Date​: Tue Oct 2 18​:58​:32 2012 -0400

  Upgrade to threads​::shared 1.42

That commit has this​:

-STATIC SV *
-S_sharedsv_from_obj(pTHX_ SV *sv)
-{
- return ((SvROK(sv)) ? INT2PTR(SV *, SvIV(SvRV(sv))) : NULL);
-}
+#define SHAREDSV_FROM_OBJ(sv) ((SvROK(sv)) ? INT2PTR(SV *, SvIV(SvRV(sv))) : NULL)

So you might be OK changing all SHAREDSV_FROM_OBJ(sv) to
S_sharedsv_from_obj(aTHX_ sv)

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Sep 14, 2013

From @jmdh

On Fri, Sep 13, 2013 at 02​:26​:57PM +0100, Nicholas Clark wrote​:

On Sat, Sep 07, 2013 at 07​:25​:39PM +0100, Dominic Hargreaves wrote​:

On Sat, Sep 07, 2013 at 10​:22​:20AM -0700, Dominic Hargreaves via RT wrote​:

I understand that there won't be another 5.14 release for this, but in
Debian this is still potentially an issue for us (see
http​://bugs.debian.org/cgi-bin/bugreport.cgi?bug=718438). Is there any
reason we shouldn't look at applying Nicholas's patch in Debian?

FWIW, I tried applying it to the Debian 5.14 and got​:

dist/threads-shared/t/av_refs..................................../../t/perl​: symbol lookup error​: ../../lib/auto/threads/shared/shared.so​: undefined symbol​: SHAREDSV_FROM_OBJ
FAILED--expected 16 tests, saw 15

dist/threads-shared/t/clone....................................../../t/perl​: symbol lookup error​: ../../lib/auto/threads/shared/shared.so​: undefined symbol​: SHAREDSV_FROM_OBJ
FAILED--expected 34 tests, saw 20

dist/threads-shared/t/hv_refs..................................../../t/perl​: symbol lookup error​: ../../lib/auto/threads/shared/shared.so​: undefined symbol​: SHAREDSV_FROM_OBJ
FAILED--expected 22 tests, saw 12

dist/threads-shared/t/shared_attr................................/../t/perl​: symbol lookup error​: ../../lib/auto/threads/shared/shared.so​: undefined symbol​: SHAREDSV_FROM_OBJ
FAILED--non-zero wait status​: 32512

You applied the patch to Debian? Or you built the current threads-shared
for Debian?

The former.

Because that macro was added to threads​::shared in version 1.42, which in
blead was this commit​:

commit 28399f5
Author​: Jerry D. Hedden <jdhedden@​cpan.org>
Date​: Tue Oct 2 18​:58​:32 2012 -0400

Upgrade to threads&#8203;::shared 1\.42

That commit has this​:

-STATIC SV *
-S_sharedsv_from_obj(pTHX_ SV *sv)
-{
- return ((SvROK(sv)) ? INT2PTR(SV *, SvIV(SvRV(sv))) : NULL);
-}
+#define SHAREDSV_FROM_OBJ(sv) ((SvROK(sv)) ? INT2PTR(SV *, SvIV(SvRV(sv))) : NULL)

So you might be OK changing all SHAREDSV_FROM_OBJ(sv) to
S_sharedsv_from_obj(aTHX_ sv)

Aha. Thanks for the hint! I've updated the patch and the this now
builds and tests okay against 5.14.2-21​:

http​://anonscm.debian.org/gitweb/?p=perl/perl.git;a=shortlog;h=refs/heads/bug-718438

And the original test case from the reporter on the Debian bug now
succeeds.

Cheers,
Dominic.

@p5pRT
Copy link
Author

p5pRT commented Sep 30, 2013

From @jdhedden

On Sat Sep 14 16​:09​:05 2013, dom wrote​:

Aha. Thanks for the hint! I've updated the patch and the this now
builds and tests okay against 5.14.2-21​:

http​://anonscm.debian.org/gitweb/?
p=perl/perl.git;a=shortlog;h=refs/heads/bug-
718438

And the original test case from the reporter on the Debian bug now
succeeds.
It appears that this bug can be resolved.

@p5pRT
Copy link
Author

p5pRT commented Oct 1, 2013

From @jkeenan

On Mon Sep 30 08​:43​:03 2013, jdhedden@​gmail.com wrote​:

On Sat Sep 14 16​:09​:05 2013, dom wrote​:

Aha. Thanks for the hint! I've updated the patch and the this now
builds and tests okay against 5.14.2-21​:

http​://anonscm.debian.org/gitweb/?
p=perl/perl.git;a=shortlog;h=refs/heads/bug-
718438

And the original test case from the reporter on the Debian bug now
succeeds.
It appears that this bug can be resolved.

Doing so now. Thanks for the feedback.

@p5pRT
Copy link
Author

p5pRT commented Oct 1, 2013

@jkeenan - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Oct 1, 2013
@p5pRT
Copy link
Author

p5pRT commented Oct 1, 2013

@nwc10 - Status changed from 'resolved' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Oct 1, 2013

From @nwc10

On Mon, Sep 30, 2013 at 08​:43​:04AM -0700, Jerry D. Hedden via RT wrote​:

On Sat Sep 14 16​:09​:05 2013, dom wrote​:

Aha. Thanks for the hint! I've updated the patch and the this now
builds and tests okay against 5.14.2-21​:

http​://anonscm.debian.org/gitweb/?
p=perl/perl.git;a=shortlog;h=refs/heads/bug-
718438

And the original test case from the reporter on the Debian bug now
succeeds.
It appears that this bug can be resolved.

Actually it can't yet, because the patch isn't yet in blead. No-one
who looked at it to review it was confident enough of it to merge it
immediately, and Dave raised a further question which I've not yet had time
to think about.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Oct 4, 2013

From @jmdh

On Tue Oct 01 06​:30​:56 2013, nicholas wrote​:

On Mon, Sep 30, 2013 at 08​:43​:04AM -0700, Jerry D. Hedden via RT
wrote​:

On Sat Sep 14 16​:09​:05 2013, dom wrote​:

Aha. Thanks for the hint! I've updated the patch and the this now
builds and tests okay against 5.14.2-21​:

http​://anonscm.debian.org/gitweb/?
p=perl/perl.git;a=shortlog;h=refs/heads/bug-
718438

And the original test case from the reporter on the Debian bug now
succeeds.
It appears that this bug can be resolved.

Actually it can't yet, because the patch isn't yet in blead. No-one
who looked at it to review it was confident enough of it to merge it
immediately, and Dave raised a further question which I've not yet had
time
to think about.

Hmm, I can't see the subsequent question from Dave on this ticket. I'd
be interested in knowing about it even if there isn't an answer for it
yet, since this patch is queued for a Debian point release update in a
bid to fix the original regression. If there are sufficient remaining
uncertainties about the patch, it may be better to revert it before the
point release.

Cheers,
Dominic.

@p5pRT
Copy link
Author

p5pRT commented Nov 15, 2013

From @tonycoz

On Fri Oct 04 06​:10​:26 2013, dom wrote​:

Hmm, I can't see the subsequent question from Dave on this ticket. I'd
be interested in knowing about it even if there isn't an answer for it
yet, since this patch is queued for a Debian point release update in a
bid to fix the original regression. If there are sufficient remaining
uncertainties about the patch, it may be better to revert it before the
point release.

I think he meant​:

https://rt-archive.perl.org/perl5/Ticket/Display.html?id=119089#txn-1243579

I'm not sure about the correctness of the patch, but I think Dave's "why doesn't
this break" code example isn't correct, it should be​:

  use threads ();
  use threads​::shared;
  #use Devel​::Peek;
  my $r;
  {
  my @​a :shared;
  $r = \$a[0];
  #Dump $r;
  }
 
  $$r = 1; # changed this line

which does break.

Tony

@p5pRT
Copy link
Author

p5pRT commented Nov 15, 2013

From @iabyn

On Thu, Nov 14, 2013 at 04​:14​:37PM -0800, Tony Cook via RT wrote​:

On Fri Oct 04 06​:10​:26 2013, dom wrote​:

Hmm, I can't see the subsequent question from Dave on this ticket. I'd
be interested in knowing about it even if there isn't an answer for it
yet, since this patch is queued for a Debian point release update in a
bid to fix the original regression. If there are sufficient remaining
uncertainties about the patch, it may be better to revert it before the
point release.

I think he meant​:

https://rt-archive.perl.org/perl5/Ticket/Display.html?id=119089#txn-1243579

There was a followup from Nicholas which doesn't seem to have made it into
the RT ticket​:

Message-ID​: <20131007123705.GE66035@​plum.flirble.org>

--
Red sky at night - gerroff my land!
Red sky at morning - gerroff my land!
  -- old farmers' sayings #14

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant