Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

close and dup hang in threads #11805

Closed
p5pRT opened this issue Dec 14, 2011 · 21 comments
Closed

close and dup hang in threads #11805

p5pRT opened this issue Dec 14, 2011 · 21 comments

Comments

@p5pRT
Copy link

p5pRT commented Dec 14, 2011

Migrated from rt.perl.org#106212 (status was 'resolved')

Searchable as RT106212$

@p5pRT
Copy link
Author

p5pRT commented Dec 14, 2011

From patrik.h.hagglund@ericsson.com

Created by patrik.h.hagglund@ericsson.com

This is a bug report for perl from patrik.h.hagglund@​ericsson.com,
generated with the help of perlbug 1.39 running under perl 5.14.2.

-----------------------------------------------------------------

Filehandle duplication 'open(STDOUT, '>&', $tmph)', and 'close($tmph)' sometimes
hang in child (forked) processes, when threads are used in the top-level process.
A test program is included below (a skeleton of a program used as a test execution
driver). Run the program, and see the hangs using 'watch pstree'.

I don't get this hang on a single-core machine. Only with 2 or more cores. This
is SLES 11.0.

When line 31-39 below is replaced with

  # test close => hangs sometimes
  my $p1 = exec_debug('true');
  close($tmph) || die "closing $tmpname failed​: $!";
  #POSIX​::close(fileno($tmph)) # no hangs!
  waitpid($p1, 0);

I also see hangs on 'close', but not on 'POSIX​::close'.

/Patrik Hägglund

use v5.8.9;
use threads;

use strict;
use warnings;

use File​::Temp 'tempfile', 'unlink0';

# for debug, pass trival programs that can be spotted using 'watch pstree'
sub exec_debug {
  my $pid = fork();
  exec(@​_) if ($pid == 0);
  return $pid;
}

# execute a test command and check the output
sub exec_and_check () {

  my ($tmph, $tmpname) = tempfile();

  my $pid = fork();

  if ($pid == 0) {
  # in child

  # test fork/exec => no hangs
  my $p0 = exec_debug('test');
  waitpid($p0, 0);

  # test dup => hangs sometimes
  my $p1 = exec_debug('true');
  open(STDOUT, '>&', $tmph) || die "duplicating STDOUT to $tmpname failed​: $!";
  waitpid($p1, 0);

  # test dup => no?
  my $p2 = exec_debug('false');
  open(STDERR, '>&', \*STDOUT) || die "duplicating STDOUT to STDERR failed​: $!";
  waitpid($p2, 0);

  ### excecute (here a dummy cmd)
  exec('sleep', 0);
  }

  # in parent

  waitpid($pid, 0);

  ### check output
  open(my $fh, $tmpname);
  while (my $line = <$fh>) {
  ; # check (here empty)
  }
  unlink0($fh, $tmpname)
  or die "Error unlinking file $tmpname safely";

}

# run a test in a number of variants
sub runtest_vary ($) {
  for (my $i = 0; $i < 100; ++$i) {
  exec_and_check();
  }
  }

# wait for thread completion
sub threads_wait () {
  for (; threads->list() > 0;) {
  for (;;sleep(1)) {
  my @​thrs = threads->list(threads​::joinable);
  if (@​thrs > 0) {
  $thrs[0]->join();
  last;
  }
  }
  }
}

# run all tests
for (my $i = 0; $i < 100; ++$i) {
  threads->create(\&runtest_vary);
}
threads_wait();

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.14.2:

Configured by uabpath at Thu Oct 27 12:50:11 CEST 2011.

Summary of my perl5 (revision 5 version 14 subversion 2) configuration:

  Platform:
    osname=linux, osvers=2.6.27.42-0.1-pae, archname=i686-linux-thread-multi
    uname='linux esekits1059 2.6.27.42-0.1-pae #1 smp 2010-01-06 16:07:25 +0100 i686 i686 i386 gnulinux '
    config_args='-des -Dusethreads -Dprefix=/proj/flex_asic/LMWP3-32/perl-5.14.2'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.3.2 [gcc-4_3-branch revision 141291]', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.9.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.9'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

Locally applied patches:



@INC for perl 5.14.2:
    /proj/flex_asic/LMWP3-32/perl-5.14.2/lib/site_perl/5.14.2/i686-linux-thread-multi
    /proj/flex_asic/LMWP3-32/perl-5.14.2/lib/site_perl/5.14.2
    /proj/flex_asic/LMWP3-32/perl-5.14.2/lib/5.14.2/i686-linux-thread-multi
    /proj/flex_asic/LMWP3-32/perl-5.14.2/lib/5.14.2
    .


Environment for perl 5.14.2:
    HOME=/home/uabpath
    LANG=en_US
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/app/graphviz/2.26.3/LMWP2/lib
    LOGDIR (unset)
    PATH=/app/graphviz/2.26.3/LMWP2/bin:/proj/flexasic/releases/linux/dbg/flex_dbg_5.10-rc1/bin:/proj/flex_asic/llvm/fader2_sdk_1.8.1/bin:/proj/flex_asic/llvm/fader2_sdk_1.8.1/compiler/tools/bin:/app/kdiff3/0.9.96/LMWP3:/proj/flex_asic/LMWP3-64/git-1.7.7/bin:/usr/atria/bin:/app/gdb/7.3.1/LMWP3/bin:/proj/flex_asic/app/gkrellm-2.3.5/SLES11.0-64/bin:/app/emacs/23.2/LMWP3/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/quest/bin:/usr/local/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/openwin/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/app/gridengine/6.2u5/bin/lx24-amd64
    PERL_BADLANG (unset)
    SHELL=/bin/tcsh


@p5pRT
Copy link
Author

p5pRT commented Jan 16, 2012

From @cpansprout

On Wed Dec 14 07​:35​:12 2011, patrik.h.hagglund@​ericsson.com wrote​:

Filehandle duplication 'open(STDOUT, '>&', $tmph)', and 'close($tmph)'
sometimes
hang in child (forked) processes, when threads are used in the top-
level process.
A test program is included below (a skeleton of a program used as a
test execution
driver). Run the program, and see the hangs using 'watch pstree'.

I can reproduce the hang, but when I run it under gdb it never gets that
far. I get this instead​:

Program received signal EXC_SOFTWARE, Software generated exception.
[Switching to process 86000]
0x98e6b1d6 in __wait4 ()

I have no idea what that means.

The backtrace is​:

(gdb) bt
#0 0x98e6b1d6 in __wait4 ()
#1 0x98e6b1c9 in waitpid$UNIX2003 ()
#2 0x000d8016 in Perl_wait4pid (my_perl=0x92b200, pid=89663,
statusp=0xb0102d0c, flags=0) at util.c​:3326
#3 0x001a1b07 in Perl_pp_waitpid (my_perl=0x92b200) at pp_sys.c​:4090
#4 0x000c9de9 in Perl_runops_debug (my_perl=0x92b200) at dump.c​:2119
#5 0x0002c8a2 in Perl_call_sv (my_perl=0x92b200, sv=0x18c1770, flags=9)
at perl.c​:2695
#6 0x002d458b in S_ithread_run (arg=0x55af50) at threads.xs​:517
#7 0x98e3785d in _pthread_start ()
#8 0x98e376e2 in thread_start ()

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Jan 16, 2012

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jan 16, 2012

From @cpansprout

On Mon Jan 16 08​:49​:12 2012, sprout wrote​:

On Wed Dec 14 07​:35​:12 2011, patrik.h.hagglund@​ericsson.com wrote​:

Filehandle duplication 'open(STDOUT, '>&', $tmph)', and 'close($tmph)'
sometimes
hang in child (forked) processes, when threads are used in the top-
level process.
A test program is included below (a skeleton of a program used as a
test execution
driver). Run the program, and see the hangs using 'watch pstree'.

I can reproduce the hang, but when I run it under gdb it never gets that
far. I get this instead​:

Program received signal EXC_SOFTWARE, Software generated exception.
[Switching to process 86000]
0x98e6b1d6 in __wait4 ()

I have no idea what that means.

The backtrace is​:

(gdb) bt
#0 0x98e6b1d6 in __wait4 ()
#1 0x98e6b1c9 in waitpid$UNIX2003 ()
#2 0x000d8016 in Perl_wait4pid (my_perl=0x92b200, pid=89663,
statusp=0xb0102d0c, flags=0) at util.c​:3326
#3 0x001a1b07 in Perl_pp_waitpid (my_perl=0x92b200) at pp_sys.c​:4090
#4 0x000c9de9 in Perl_runops_debug (my_perl=0x92b200) at dump.c​:2119
#5 0x0002c8a2 in Perl_call_sv (my_perl=0x92b200, sv=0x18c1770, flags=9)
at perl.c​:2695
#6 0x002d458b in S_ithread_run (arg=0x55af50) at threads.xs​:517
#7 0x98e3785d in _pthread_start ()
#8 0x98e376e2 in thread_start ()

And it seems that this script has *really* confused things, because when
I tried to quit, I got this​:

(gdb) q
The program is running. Exit anyway? (y or n) y
error while killing target (killing anyway)​: assertion failure on line
300 of
"/SourceCache/gdb/gdb-1469/src/gdb/macosx/macosx-nat-inferior-util.c" in
function "macosx_inferior_suspend_ptrace"​: status.value.sig ==
TARGET_SIGNAL_STOP

warning​: error on line 2179 of
"/SourceCache/gdb/gdb-1469/src/gdb/macosx/macosx-nat-inferior.c" in
function "macosx_kill_inferior_safe"​: (os/kern) failure (0x5x)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Feb 11, 2013

From patrik.h.hagglund@ericsson.com

I have now tried to fix this myself. See patch and updated test-case
below (or attached files).

However, I'm not sure if I fulfill the requriement in atfork_lock()​:

/* locks must be held in locking order (if any) */

(Also, I find the comment in atfork_unlock strange, saying that the same
order should be used, rather than the reverse.)

/Patrik Hägglund

From fb0fe87ffa55b8121f0610b44a9c87fef519a2f0 Mon Sep 17 00​:00​:00 2001
From​: Patrik Hagglund <patrik.h.hagglund@​ericsson.com>
Date​: Sat, 2 Feb 2013 20​:21​:05 +0100
Subject​: [PATCH] Add PL_perlio_mutex to atfork_lock/unlock, to fix bug
#106212.
MIME-Version​: 1.0
Content-Type​: multipart/mixed; boundary="------------1.8.1.3"

This is a multi-part message in MIME format.
--------------1.8.1.3
Content-Type​: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding​: 8bit

Using threads + fork() on Linux, and IO operations in the threads, the
PL_perlio_mutex may be left in a locked state at the call of fork(),
potentially leading to deadlock in the child process at subsequent IO
operations. (Threads are pre-empted and not continued in the child
process after the fork.)

Therefore, ensure that the PL_perlio_mutex is unlocked in the child
process, right after fork(), by using atfork_lock/unlock.


util.c | 2 ++
1 file changed, 2 insertions(+)

--------------1.8.1.3
Content-Type​: text/x-patch; name="0001-Add-PL_perlio_mutex-to-
atfork_lock-unlock-to-fix-bug.patch"
Content-Transfer-Encoding​: 8bit
Content-Disposition​: attachment; filename="0001-Add-PL_perlio_mutex-to-
atfork_lock-unlock-to-fix-bug.patch"

Inline Patch
diff --git a/util.c b/util.c
index a3fbd3c..4c25a44 100644
--- a/util.c
+++ b/util.c
@@ -2798,6 +2798,7 @@ Perl_atfork_lock(void)
    dVAR;
 #if defined(USE_ITHREADS)
     /* locks must be held in locking order (if any) */
+    MUTEX_LOCK(&PL_perlio_mutex);
 #  ifdef MYMALLOC
     MUTEX_LOCK(&PL_malloc_mutex);
 #  endif
@@ -2812,6 +2813,7 @@ Perl_atfork_unlock(void)
     dVAR;
 #if defined(USE_ITHREADS)
     /* locks must be released in same order as in atfork_lock() */
+    MUTEX_UNLOCK(&PL_perlio_mutex);
 #  ifdef MYMALLOC
     MUTEX_UNLOCK(&PL_malloc_mutex);
 #  endif

--------------1.8.1.3--

### Reproduce a hang on IO ops, while using threads + fork on Linux.

# Skeleton version of a test driver program utilizing Perl threads.

use threads;

# execute a "test"
sub exec_and_check ($) {
  my ($t) = @​_;
  open(my $tmph, '>', "/tmp/foo$t");
  my $pid = fork();
  if ($pid == 0) {
  # in child
  # test an IO op, in this case 'close' => hangs sometimes
  close($tmph);
  ### excecute (here a dummy cmd)
  exec('true');
  }
  # in parent
  waitpid($pid, 0);
}

# run a number of "tests" sequentially
sub runtest_vary ($) {
  my ($t) = @​_;
  for (my $i = 0; $i < 80; ++$i) {
  exec_and_check($t);
  }
}

# wait for thread completion
sub threads_wait () {
  for (; threads->list() > 0;) {
  for (;;sleep(1)) {
  my @​thrs = threads->list(threads​::joinable);
  if (@​thrs > 0) {
  my $tid = $thrs[0]->tid();
  print "thread $tid finished\n";
  $thrs[0]->join();
  last;
  } else {
  # for debug​: print "hangs"
  my @​t = threads->list();
  print "@​t\n";
  }
  }
  }
}

# run all
for (my $t = 0; $t < 10; ++$t) {
  threads->create(\&runtest_vary, $t);
}
threads_wait();

@p5pRT
Copy link
Author

p5pRT commented Feb 11, 2013

@p5pRT
Copy link
Author

p5pRT commented Feb 11, 2013

From patrik.h.hagglund@ericsson.com

0001-Add-PL_perlio_mutex-to-atfork_lock-unlock-to-fix-bug.patch
From fb0fe87ffa55b8121f0610b44a9c87fef519a2f0 Mon Sep 17 00:00:00 2001
From: Patrik Hagglund <patrik.h.hagglund@ericsson.com>
Date: Sat, 2 Feb 2013 20:21:05 +0100
Subject: [PATCH] Add PL_perlio_mutex to atfork_lock/unlock, to fix bug
 #106212.
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------1.8.1.3"

This is a multi-part message in MIME format.
--------------1.8.1.3
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


Using threads + fork() on Linux, and IO operations in the threads, the
PL_perlio_mutex may be left in a locked state at the call of fork(),
potentially leading to deadlock in the child process at subsequent IO
operations. (Threads are pre-empted and not continued in the child
process after the fork.)

Therefore, ensure that the PL_perlio_mutex is unlocked in the child
process, right after fork(), by using atfork_lock/unlock.
---
 util.c | 2 ++
 1 file changed, 2 insertions(+)


--------------1.8.1.3
Content-Type: text/x-patch; name="0001-Add-PL_perlio_mutex-to-atfork_lock-unlock-to-fix-bug.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="0001-Add-PL_perlio_mutex-to-atfork_lock-unlock-to-fix-bug.patch"

diff --git a/util.c b/util.c
index a3fbd3c..4c25a44 100644
--- a/util.c
+++ b/util.c
@@ -2798,6 +2798,7 @@ Perl_atfork_lock(void)
    dVAR;
 #if defined(USE_ITHREADS)
     /* locks must be held in locking order (if any) */
+    MUTEX_LOCK(&PL_perlio_mutex);
 #  ifdef MYMALLOC
     MUTEX_LOCK(&PL_malloc_mutex);
 #  endif
@@ -2812,6 +2813,7 @@ Perl_atfork_unlock(void)
     dVAR;
 #if defined(USE_ITHREADS)
     /* locks must be released in same order as in atfork_lock() */
+    MUTEX_UNLOCK(&PL_perlio_mutex);
 #  ifdef MYMALLOC
     MUTEX_UNLOCK(&PL_malloc_mutex);
 #  endif

--------------1.8.1.3--


@p5pRT
Copy link
Author

p5pRT commented Feb 11, 2013

From [Unknown Contact. See original ticket]

I have now tried to fix this myself. See patch and updated test-case
below (or attached files).

However, I'm not sure if I fulfill the requriement in atfork_lock()​:

/* locks must be held in locking order (if any) */

(Also, I find the comment in atfork_unlock strange, saying that the same
order should be used, rather than the reverse.)

/Patrik Hägglund

From fb0fe87ffa55b8121f0610b44a9c87fef519a2f0 Mon Sep 17 00​:00​:00 2001
From​: Patrik Hagglund <patrik.h.hagglund@​ericsson.com>
Date​: Sat, 2 Feb 2013 20​:21​:05 +0100
Subject​: [PATCH] Add PL_perlio_mutex to atfork_lock/unlock, to fix bug
#106212.
MIME-Version​: 1.0
Content-Type​: multipart/mixed; boundary="------------1.8.1.3"

This is a multi-part message in MIME format.
--------------1.8.1.3
Content-Type​: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding​: 8bit

Using threads + fork() on Linux, and IO operations in the threads, the
PL_perlio_mutex may be left in a locked state at the call of fork(),
potentially leading to deadlock in the child process at subsequent IO
operations. (Threads are pre-empted and not continued in the child
process after the fork.)

Therefore, ensure that the PL_perlio_mutex is unlocked in the child
process, right after fork(), by using atfork_lock/unlock.


util.c | 2 ++
1 file changed, 2 insertions(+)

--------------1.8.1.3
Content-Type​: text/x-patch; name="0001-Add-PL_perlio_mutex-to-
atfork_lock-unlock-to-fix-bug.patch"
Content-Transfer-Encoding​: 8bit
Content-Disposition​: attachment; filename="0001-Add-PL_perlio_mutex-to-
atfork_lock-unlock-to-fix-bug.patch"

Inline Patch
diff --git a/util.c b/util.c
index a3fbd3c..4c25a44 100644
--- a/util.c
+++ b/util.c
@@ -2798,6 +2798,7 @@ Perl_atfork_lock(void)
    dVAR;
 #if defined(USE_ITHREADS)
     /* locks must be held in locking order (if any) */
+    MUTEX_LOCK(&PL_perlio_mutex);
 #  ifdef MYMALLOC
     MUTEX_LOCK(&PL_malloc_mutex);
 #  endif
@@ -2812,6 +2813,7 @@ Perl_atfork_unlock(void)
     dVAR;
 #if defined(USE_ITHREADS)
     /* locks must be released in same order as in atfork_lock() */
+    MUTEX_UNLOCK(&PL_perlio_mutex);
 #  ifdef MYMALLOC
     MUTEX_UNLOCK(&PL_malloc_mutex);
 #  endif

--------------1.8.1.3--

### Reproduce a hang on IO ops, while using threads + fork on Linux.

# Skeleton version of a test driver program utilizing Perl threads.

use threads;

# execute a "test"
sub exec_and_check ($) {
  my ($t) = @​_;
  open(my $tmph, '>', "/tmp/foo$t");
  my $pid = fork();
  if ($pid == 0) {
  # in child
  # test an IO op, in this case 'close' => hangs sometimes
  close($tmph);
  ### excecute (here a dummy cmd)
  exec('true');
  }
  # in parent
  waitpid($pid, 0);
}

# run a number of "tests" sequentially
sub runtest_vary ($) {
  my ($t) = @​_;
  for (my $i = 0; $i < 80; ++$i) {
  exec_and_check($t);
  }
}

# wait for thread completion
sub threads_wait () {
  for (; threads->list() > 0;) {
  for (;;sleep(1)) {
  my @​thrs = threads->list(threads​::joinable);
  if (@​thrs > 0) {
  my $tid = $thrs[0]->tid();
  print "thread $tid finished\n";
  $thrs[0]->join();
  last;
  } else {
  # for debug​: print "hangs"
  my @​t = threads->list();
  print "@​t\n";
  }
  }
  }
}

# run all
for (my $t = 0; $t < 10; ++$t) {
  threads->create(\&runtest_vary, $t);
}
threads_wait();

@p5pRT
Copy link
Author

p5pRT commented Feb 12, 2013

From @Leont

On Mon, Feb 11, 2013 at 4​:56 PM, Patrik Hagglund via RT
<perlbug-comment@​perl.org> wrote​:

I have now tried to fix this myself. See patch and updated test-case
below (or attached files).

The fix makes perfect sense to me, this class of issue is exactly why
at_fork was invented.

However, I'm not sure if I fulfill the requriement in atfork_lock()​:

/* locks must be held in locking order (if any) */

Me neither, it would require a deeper look at the code. What is
important is that *any* code that uses at least two of those three
locks (memory, io, ops) does it in the same order (not just in core
but also on cpan). As long as that invariant is held, the specific
order doesn't matter.

(Also, I find the comment in atfork_unlock strange, saying that the same
order should be used, rather than the reverse.)

I would find reverse more intuitive too, but I don't think the unlock
order semantically matters much since it will never block, it's the
lock order that does.

Leon

@p5pRT
Copy link
Author

p5pRT commented Feb 12, 2013

From @Leont

On Mon, Feb 11, 2013 at 4​:56 PM, Patrik Hagglund via RT
<perlbug-comment@​perl.org> wrote​:

I have now tried to fix this myself. See patch and updated test-case
below (or attached files).

One small issue with the patch is that it wouldn't compile on a
non-perlio perl, though I think we disabled that in Configure in 5.16
anyway.

Leon

@p5pRT
Copy link
Author

p5pRT commented Feb 12, 2013

From patrik.h.hagglund@ericsson.com

Me neither, it would require a deeper look at the code.

I took a quick look, and couldn't find any interdependencies between the three mutex locks in atfork_lock(). However, that may be due to my lack of understanding of the perl internals.

Here are updated patches. The first one now just adds #ifdef USE_PERLIO, and the second one alters the order of locks (potentially dangerous!), and updates the comments.

/Patrik Hägglund

-----Original Message-----
From​: Leon Timmermans [mailto​:fawaka@​gmail.com]
Sent​: den 12 februari 2013 08​:37
To​: perlbug-comment@​perl.org
Cc​: perl5-porters@​perl.org
Subject​: Re​: [perl #106212] close and dup hang in threads

On Mon, Feb 11, 2013 at 4​:56 PM, Patrik Hagglund via RT <perlbug-comment@​perl.org> wrote​:

I have now tried to fix this myself. See patch and updated test-case
below (or attached files).

The fix makes perfect sense to me, this class of issue is exactly why at_fork was invented.

However, I'm not sure if I fulfill the requriement in atfork_lock()​:

/* locks must be held in locking order (if any) */

Me neither, it would require a deeper look at the code. What is important is that *any* code that uses at least two of those three locks (memory, io, ops) does it in the same order (not just in core but also on cpan). As long as that invariant is held, the specific order doesn't matter.

(Also, I find the comment in atfork_unlock strange, saying that the
same order should be used, rather than the reverse.)

I would find reverse more intuitive too, but I don't think the unlock order semantically matters much since it will never block, it's the lock order that does.

Leon

@p5pRT
Copy link
Author

p5pRT commented Feb 12, 2013

From patrik.h.hagglund@ericsson.com

0001-Add-PL_perlio_mutex-to-atfork_lock-unlock-to-fix-bug.patch
From 57ad65b03bdd5ab163534b39a6cbd22fc6e7d74c Mon Sep 17 00:00:00 2001
From: Patrik Hagglund <patrik.h.hagglund@ericsson.com>
Date: Sat, 2 Feb 2013 20:21:05 +0100
Subject: [PATCH 1/2] Add PL_perlio_mutex to atfork_lock/unlock, to fix bug
 #106212.
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------1.8.1.3"

This is a multi-part message in MIME format.
--------------1.8.1.3
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


Using threads + fork() on Linux, and IO operations in the threads, the
PL_perlio_mutex may be left in a locked state at the call of fork(),
potentially leading to deadlock in the child process at subsequent IO
operations. (Threads are pre-empted and not continued in the child
process after the fork.)

Therefore, ensure that the PL_perlio_mutex is unlocked in the child
process, right after fork(), by using atfork_lock/unlock.
---
 util.c | 6 ++++++
 1 file changed, 6 insertions(+)


--------------1.8.1.3
Content-Type: text/x-patch; name="0001-Add-PL_perlio_mutex-to-atfork_lock-unlock-to-fix-bug.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="0001-Add-PL_perlio_mutex-to-atfork_lock-unlock-to-fix-bug.patch"

diff --git a/util.c b/util.c
index a3fbd3c..bb35d97 100644
--- a/util.c
+++ b/util.c
@@ -2798,6 +2798,9 @@ Perl_atfork_lock(void)
    dVAR;
 #if defined(USE_ITHREADS)
     /* locks must be held in locking order (if any) */
+#  ifdef USE_PERLIO
+    MUTEX_LOCK(&PL_perlio_mutex);
+#  endif
 #  ifdef MYMALLOC
     MUTEX_LOCK(&PL_malloc_mutex);
 #  endif
@@ -2812,6 +2815,9 @@ Perl_atfork_unlock(void)
     dVAR;
 #if defined(USE_ITHREADS)
     /* locks must be released in same order as in atfork_lock() */
+#  ifdef USE_PERLIO
+    MUTEX_UNLOCK(&PL_perlio_mutex);
+#  endif
 #  ifdef MYMALLOC
     MUTEX_UNLOCK(&PL_malloc_mutex);
 #  endif

--------------1.8.1.3--


@p5pRT
Copy link
Author

p5pRT commented Feb 12, 2013

From patrik.h.hagglund@ericsson.com

0002-Update-ordering-of-locks-in-atfork_lock-unlock-and-c.patch
From fcac73e4e40e5c4770628dda6512155ba7e36716 Mon Sep 17 00:00:00 2001
From: Patrik Hagglund <patrik.h.hagglund@ericsson.com>
Date: Tue, 12 Feb 2013 11:18:53 +0100
Subject: [PATCH 2/2] Update ordering of locks in atfork_lock/unlock, and
 corresponding commments.
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------1.8.1.3"

This is a multi-part message in MIME format.
--------------1.8.1.3
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit

---
 util.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)


--------------1.8.1.3
Content-Type: text/x-patch; name="0002-Update-ordering-of-locks-in-atfork_lock-unlock-and-c.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="0002-Update-ordering-of-locks-in-atfork_lock-unlock-and-c.patch"

diff --git a/util.c b/util.c
index bb35d97..47c5e2d 100644
--- a/util.c
+++ b/util.c
@@ -2797,14 +2797,17 @@ Perl_atfork_lock(void)
 {
    dVAR;
 #if defined(USE_ITHREADS)
-    /* locks must be held in locking order (if any) */
+    /* Locks must be held in locking order (if any), to avoid
+       deadlock. (It is not clear to me if there are any lock
+       interdependencies. I can't find any. Anyway, use an approximate
+       high-level to low-level ordering.) */
+    OP_REFCNT_LOCK;
 #  ifdef USE_PERLIO
     MUTEX_LOCK(&PL_perlio_mutex);
 #  endif
 #  ifdef MYMALLOC
     MUTEX_LOCK(&PL_malloc_mutex);
 #  endif
-    OP_REFCNT_LOCK;
 #endif
 }
 
@@ -2814,13 +2817,15 @@ Perl_atfork_unlock(void)
 {
     dVAR;
 #if defined(USE_ITHREADS)
-    /* locks must be released in same order as in atfork_lock() */
-#  ifdef USE_PERLIO
-    MUTEX_UNLOCK(&PL_perlio_mutex);
-#  endif
+    /* As long as locking is absent, ordering is not needed to avoid
+       deadlock. However, for clarity, use the reverse order of
+       atfork_lock() */
 #  ifdef MYMALLOC
     MUTEX_UNLOCK(&PL_malloc_mutex);
 #  endif
+#  ifdef USE_PERLIO
+    MUTEX_UNLOCK(&PL_perlio_mutex);
+#  endif
     OP_REFCNT_UNLOCK;
 #endif
 }

--------------1.8.1.3--


@p5pRT
Copy link
Author

p5pRT commented Feb 12, 2013

From @Leont

On Tue, Feb 12, 2013 at 8​:36 AM, Leon Timmermans <fawaka@​gmail.com> wrote​:

The fix makes perfect sense to me, this class of issue is exactly why
at_fork was invented.

To explain what's going on​:

When a fork is performed in a multi-threaded application, only the
calling thread is copied into the new one. So when one thread holds a
mutex and another forks, the mutex is never unlocked in the new
thread.

One solution to this is to lock all mutexes right before the lock and
unlock them right after. This way you can be sure no other thread
holds those mutexes. Interestingly this means sacrificing fork's
signal safety for thread safety, as useful fork handlers are pretty
much never signal safe.

Do note in particular that any shared variable may still show this
issue, unless guarded by POSIX​::AtFork. Also note you can't blindly
lock all shared variables, as this may produce deadlocks. It's not
impossible to do this correctly, but you're likely to get it wrong.

The sane solution is to just never combine fork and threads, unless
maybe when followed immediately by an exec. Anything in between the
two should be handled mostly as if you're writing an (unsafe) signal
handler, unless guarded by an at_fork handler (which malloc
fortunately is on any sane platform). And even then there are some
signaling complications you should take into account (for example, the
SIGCHLD may end up being handled by another thread). There's a reason
POSIX considers system() not to be thread-safe.

Leon

@p5pRT
Copy link
Author

p5pRT commented Feb 12, 2013

From patrik.h.hagglund@ericsson.com

FYI​: There is a discussion in POSIX, pthread_atfork, section RATIONALE​:

The problem arises especially in multi-threaded I/O libraries, which are almost sure to be invoked between the fork() and exec calls to effect I/O redirection.

I originally used the following code, copied from perlfaq8, to capture a program's STDOUT but discard its STDERR, in a thread​:

  use IPC​::Open3;
  use File​::Spec;
  use Symbol qw(gensym);
  open(NULL, ">", File​::Spec->devnull);
  my $pid = open3(gensym, \*PH, ">&NULL", "cmd");
  while( <PH> ) { }
  waitpid($pid, 0);

Interestingly this means sacrificing fork's signal safety for thread safety, as useful fork handlers are pretty much never signal safe

I'm not sure I follow you completely. However, this is what the RATIONALE says​:

This is why the POSIX.1 standard since the 1996 release requires that the child process after fork() in a multi-threaded process only calls async-signal-safe interfaces.

/Patrik Hägglund

-----Original Message-----
From​: Leon Timmermans [mailto​:fawaka@​gmail.com]
Sent​: den 12 februari 2013 12​:06
To​: perlbug-comment@​perl.org
Cc​: perl5-porters@​perl.org
Subject​: Re​: [perl #106212] close and dup hang in threads

On Tue, Feb 12, 2013 at 8​:36 AM, Leon Timmermans <fawaka@​gmail.com> wrote​:

The fix makes perfect sense to me, this class of issue is exactly why
at_fork was invented.

To explain what's going on​:

When a fork is performed in a multi-threaded application, only the calling thread is copied into the new one. So when one thread holds a mutex and another forks, the mutex is never unlocked in the new thread.

One solution to this is to lock all mutexes right before the lock and unlock them right after. This way you can be sure no other thread holds those mutexes. Interestingly this means sacrificing fork's signal safety for thread safety, as useful fork handlers are pretty much never signal safe.

Do note in particular that any shared variable may still show this issue, unless guarded by POSIX​::AtFork. Also note you can't blindly lock all shared variables, as this may produce deadlocks. It's not impossible to do this correctly, but you're likely to get it wrong.

The sane solution is to just never combine fork and threads, unless maybe when followed immediately by an exec. Anything in between the two should be handled mostly as if you're writing an (unsafe) signal handler, unless guarded by an at_fork handler (which malloc fortunately is on any sane platform). And even then there are some signaling complications you should take into account (for example, the SIGCHLD may end up being handled by another thread). There's a reason POSIX considers system() not to be thread-safe.

Leon

@p5pRT
Copy link
Author

p5pRT commented Feb 13, 2013

From @Leont

On Tue, Feb 12, 2013 at 1​:38 PM, Patrik Hägglund H
<patrik.h.hagglund@​ericsson.com> wrote​:

FYI​: There is a discussion in POSIX, pthread_atfork, section RATIONALE​:

The problem arises especially in multi-threaded I/O libraries, which are almost sure to be invoked between the fork() and exec calls to effect I/O redirection.

Sounds like the kind of problem you just reported ;-)

Interestingly this means sacrificing fork's signal safety for thread safety, as useful fork handlers are pretty much never signal safe

I'm not sure I follow you completely. However, this is what the RATIONALE says​:

This is why the POSIX.1 standard since the 1996 release requires that the child process after fork() in a multi-threaded process only calls async-signal-safe interfaces.

That's two different things.

The signal safety issue is this​: fork used to be async-signal-safe (it
was a pure system call), but now now it also runs the at_fork
handlers. Those handlers will not be async-signal-safe, so the entire
fork will not be. This wasn't fully realized when pthread_at_fork was
added though. See also
http​://standards.ieee.org/findstds/interps/1003-1c-95_int/pasc-1003.1c-37.html
and http​://sourceware.org/bugzilla/show_bug.cgi?id=4737

The second thing boils down to this​:

1) Mutexes are not multi-threaded fork safe, unless properly secured
using at_fork handlers.
2) Any function may use unguarded mutexes internally, unless
guaranteed otherwise.
3) Mutexes are not async-signal safe, so any function that is
async-signal safe is guaranteed not use mutexes

1 + 2 + 3 → So async-signal safe is also multi-threaded-fork safe,
this provides a well-defined lower-bound of sane things to do between
fork and exec. The actual list is likely to be larger (in particular
it should include fork itself), but you can't rely on that.

Leon

@p5pRT
Copy link
Author

p5pRT commented Feb 13, 2013

From @druud62

On 2013-02-12 08​:36, Leon Timmermans wrote​:

What is
important is that *any* code that uses at least two of those three
locks (memory, io, ops) does it in the same order (not just in core
but also on cpan). As long as that invariant is held, the specific
order doesn't matter.

So we should all do them in (for example) alphabetical order,
or have an (overridable) multilock( LIST ) that does it for us?

--
Ruud

@p5pRT
Copy link
Author

p5pRT commented Feb 14, 2013

From @Leont

On Wed, Feb 13, 2013 at 8​:46 PM, Dr.Ruud <rvtol+usenet@​isolution.nl> wrote​:

So we should all do them in (for example) alphabetical order,
or have an (overridable) multilock( LIST ) that does it for us?

To make matters more complex, there is also​:

* PL_hints_mutex /* Mutex for refcounted he refcounting */
* PL_check_mutex /* Mutex for PL_check */
* PL_my_ctx_mutex
* PL_dollarzero_mutex /* Modifying $0 */

We don't have three but seven mutexes to take into account. Doing this
properly may be a bit of a puzzle. Not that that should refrain us
from fixing the PerlIO mutex.

Leon

@p5pRT
Copy link
Author

p5pRT commented Mar 21, 2013

From patrik.h.hagglund@ericsson.com

(I have been busy with other duties, and is now returning to this.)

Not that that should refrain us from fixing the PerlIO mutex.

Fine. So is my 0001 patch attached to the ticket [1] good enough? If yes, can you apply it? If no, what is the next step to get this fixed?

/Patrik Hägglund

1. https://rt-archive.perl.org/perl5/Ticket/Attachment/1190997/613241/0001-Add-PL_perlio_mutex-to-atfork_lock-unlock-to-fix-bug.patch

-----Original Message-----
From​: Leon Timmermans via RT [mailto​:perlbug-followup@​perl.org]
Sent​: den 14 februari 2013 13​:29
To​: Patrik Hägglund H
Subject​: Re​: [perl #106212] close and dup hang in threads

On Wed, Feb 13, 2013 at 8​:46 PM, Dr.Ruud <rvtol+usenet@​isolution.nl> wrote​:

So we should all do them in (for example) alphabetical order,
or have an (overridable) multilock( LIST ) that does it for us?

To make matters more complex, there is also​:

* PL_hints_mutex /* Mutex for refcounted he refcounting */
* PL_check_mutex /* Mutex for PL_check */
* PL_my_ctx_mutex
* PL_dollarzero_mutex /* Modifying $0 */

We don't have three but seven mutexes to take into account. Doing this
properly may be a bit of a puzzle. Not that that should refrain us
from fixing the PerlIO mutex.

Leon

@p5pRT
Copy link
Author

p5pRT commented Mar 31, 2013

From @khwilliamson

Thanks, applied as commit
4da8095

which made it into 5.17.10, and hence 5.18
--
Karl Williamson

@p5pRT
Copy link
Author

p5pRT commented Mar 31, 2013

@khwilliamson - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant