threads not joinable issue #12133

Open
p5pRT opened this issue May 24, 2012 · 3 comments
Comments


p5pRT commented May 24, 2012

Migrated from rt.perl.org#113070 (status was 'open')

Searchable as RT113070$


p5pRT commented May 24, 2012

From alexs@ecoscentric.com

Created by alexs@ecoscentric.com

I have an old perl application (pre perl 5.005) which emulated
thread support using fork that I moved over to use threads but
have encountered what I believe to be a major bug. Basically
the issue can be summed up as:

  foreach my $thr (threads->list(threads::joinable)) {
      $thr->join(); # Linux strace shows hangs here in waitpid4
  }

The thread really believes it is joinable as well:
  foreach my $child (threads->list(threads::joinable)) {
      printf "reap_children: Waiting for thread %d to join\n", $child->tid();
      if ($child->is_joinable()) {
          printf "reap_children: I think %d is joinable\n", $child->tid();
          $child->join();
          printf "reap_children: Thread %d joined\n", $child->tid();
      } else {
          printf "reap_children: Thread %d was not joinable\n", $child->tid();
      }
  }

Gives:
  ...
  reap_children: Waiting for thread 3 to join
  reap_children: I think 3 is joinable
  [hangs]

  foreach my $child (threads->list(threads::joinable)) {
      log_debug(1, 'reap_children: Waiting for thread '.$child->tid().' to join');
      if ($child->is_joinable()) {
          $child->join();
          log_debug(1, 'reap_children: Thread '.$child->tid().' joined');
      } else {
          log_debug(0, 'reap_children: Child '.$child->tid().' was joinable but now is not');
      }
  }

The trigger to this appears to be a pipe process, a bash script, started by my
main application. I have verified the script has finished ("echo FINISH > /tmp/foo"
as the last line of the script) but my main application is locked in a select()
which includes the pipe. The main application select() does not return, and
neither does join(), yet both the thread and script have terminated.

The proof of my pudding is that if I kill the bash script, I get:
  ...
  reap_children: I think 3 is joinable
  reap_children: Thread 3 joined
and the select() call also returns. However, when the pipe process is
closed, $? gives -1 and $! returns 'No child processes'.
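
One way a -1 status with 'No child processes' can arise, offered here only as an illustration and not as a diagnosis of this application, is when something else has already collected the pipe child before close() gets to wait for it. The sketch below assumes a catch-all SIGCHLD handler (hypothetical, not taken from the application) that calls waitpid(-1, ...), and uses `echo` as a stand-in for the bash script:

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG ECHILD);

my $reaped = 0;

# Hypothetical catch-all reaper: collects ANY child, pipe children included.
$SIG{CHLD} = sub {
    local ($!, $?);
    while (waitpid(-1, WNOHANG) > 0) { $reaped++ }
};

# Stand-in for the piped bash script.
open(my $fh, '-|', 'echo', 'FINISH') or die "cannot start child: $!";
my @lines = <$fh>;    # read to EOF; the child has finished writing

# Give the handler a moment to run before close() tries to wait.
for (1 .. 50) {
    last if $reaped;
    select(undef, undef, undef, 0.1);
}

# close() on a piped handle waits for the child, which is already gone.
my $ok = close($fh);
my ($status, $errno, $errstr) = ($?, $! + 0, "$!");
printf "close ok=%d status=%d error=%s\n", ($ok ? 1 : 0), $status, $errstr;
```

If the handler wins the race, close() has nothing left to wait for, which would match the $? and $! values reported above. Whether anything in the real application (or in perl itself) reaps the child this way is not established by this sketch.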

It seems to me that perl's thread support is getting in a muddle handling
the SIGCHLD signals resulting from the termination of the pipe and the
termination of the thread (though I thought perl 5.10 did not use fork()).
A Linux strace clearly shows perl in waitpid4() waiting on the shell
process ID, which I would have thought it should receive as my script
clearly has terminated (/tmp/foo exists and contains "FINISH" - see above).
I am using POSIX as well, which may be contributing to this confusion.

Unfortunately I have not been able to reproduce this issue with a simple
case. My application is a pretty complex automated build and test system
which runs tests on remote hardware and logs results in a MySQL database,
and this problem occurs every 4-8 hours after successfully running
several hundred thousand tests and many builds. This application is
also well tested and has been in place for almost 10 years.

I have unfortunately had to revert back to my own thread emulation
system where I have a "fork_and_call" function which forks and calls
a given function with the pointer to the function and its arguments
passed to fork_and_call, pushing PIDs on a stack, and a signal handler to
reap SIGCHLD signals and verify when certain "threads" have finished
(ignoring SIGCHLD from terminating pipe processes).
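
A minimal sketch of that kind of fork-based emulation follows. Only the name `fork_and_call` comes from the description above; the `%worker` and `%finished` tables, the `reap_children` body and the exit-code convention are assumptions made for illustration, not the application's actual code. The reaper waits only on PIDs it started itself, so pipe children are left for close() to collect:

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my %worker;      # PID => label, for children started by fork_and_call()
my %finished;    # PID => exit code, filled in by reap_children()

# Fork, run $code->(@args) in the child, and remember the child's PID.
# Assumption: the callback returns a small integer used as the exit code.
sub fork_and_call {
    my ($label, $code, @args) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        my $rc = eval { $code->(@args) };
        POSIX::_exit(defined $rc ? $rc : 255);    # skip END blocks in the child
    }
    $worker{$pid} = $label;
    return $pid;
}

# Collect only the children we started; never waitpid(-1), so pipe
# children started elsewhere are not reaped by mistake.
sub reap_children {
    for my $pid (keys %worker) {
        next unless waitpid($pid, WNOHANG) == $pid;
        $finished{$pid} = $? >> 8;
        delete $worker{$pid};
    }
}

$SIG{CHLD} = sub { local ($!, $?); reap_children() };

my $pid = fork_and_call('demo', sub { return $_[0] + 1 }, 2);

# Also poll explicitly: the signal may arrive before the PID is recorded.
for (1 .. 100) {
    reap_children();
    last if exists $finished{$pid};
    select(undef, undef, undef, 0.05);
}
print "worker 'demo' exited with code $finished{$pid}\n" if exists $finished{$pid};
```

Waiting on specific PIDs instead of on any child is the design choice that keeps the "threads" and the pipe processes from stealing each other's exit status.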

FAOD, I am no NOOB and have been writing lightweight threaded
applications for almost 20 years, and have written over a dozen
multi-threaded perl apps. By design, though, they have all used
detached threads, with threads::shared, Thread::Queue and
Thread::Semaphore to communicate and handle start/stop synchronisation.
This is, however, the first time I have attempted to use thread->join(),
with disappointing results.

As I have a workaround, I am not in any rush for a fix, especially since
I am unable to provide a small test case. You may wish to revisit the
thread->join() support though and check for any possibility of what I
have described. There could still be an issue with my app, but things
do appear to point to SIGCHLD getting misappropriated somewhere.

Perl Info

Flags:
    category=library
    severity=high
    module=threads

This perlbug was built using Perl 5.10.1 in the Fedora build system.
It is being executed now by Perl 5.10.1 - Sun Nov  6 00:37:43 GMT 2011.

Site configuration information for perl 5.10.1:

Configured by Red Hat, Inc. at Sun Nov  6 00:37:43 GMT 2011.

Summary of my perl5 (revision 5 version 10 subversion 1) configuration:
   
  Platform:
    osname=linux, osvers=2.6.32-44.2.el6.x86_64, archname=x86_64-linux-thread-multi
    uname='linux c6b5.bsys.dev.centos.org 2.6.32-44.2.el6.x86_64 #1 smp wed jul 21 12:48:32 edt 2010 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -DDEBUGGING=-g -Dversion=5.10.1 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Darchlib=/usr/lib64/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Dvendorarch=/usr/lib64/perl5/vendor_perl -Dinc_version_list=5.10.0 -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.4.5 20110214 (Red Hat 4.4.5-6)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -fstack-protector'
    libpth=/usr/local/lib64 /lib64 /usr/lib64
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.12'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib64/perl5/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic'

Locally applied patches:
    


@INC for perl 5.10.1:
    /usr/local/lib64/perl5
    /usr/local/share/perl5
    /usr/lib64/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib64/perl5
    /usr/share/perl5
    .


Environment for perl 5.10.1:
    HOME=/home/farm
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/farm/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash


p5pRT commented Jul 6, 2012

From @Leont

On Thu, May 24, 2012 at 2:12 PM, alexs@ecoscentric.com
<perlbug-followup@perl.org> wrote:

> The trigger to this appears to be a pipe process, a bash script, started by my
> main application. I have verified the script has finished ("echo FINISH > /tmp/foo"
> as the last line of the script) but my main application is locked in a select()
> which includes the pipe.  The main application select() does not return, and
> neither does join(), yet both the thread and script have terminated.

In general, system(3) is not particularly thread safe (because of both
signal handling and async-safety issues), though for most
purposes it's ok. I'm not sure that's the real issue here, but it's
worth pointing out.

> Unfortunately I have not been able to reproduce this issue with a simple
> case. My application is a pretty complex automated build and test system
> which runs tests on remote hardware and logs results in a MySQL database,
> and this problem occurs every 4-8 hours after successfully running
> several hundred thousand tests and many builds.  This application is
> also well tested and has been in place for almost 10 years.

That sounds like some weird race condition. Judging by the code of
threads.pm, I can imagine how this could happen. A thread is
marked joinable right before it is actually destroyed, so if the
create/destroy mutex is corrupted (someone locked it but didn't unlock
it), the thread's death will hang. I'm not quite sure what causes
this though.

> I have unfortunately had to revert back to my own thread emulation
> system where I have a "fork_and_call" function which forks and calls
> a given function with the pointer to the function and its arguments
> passed to fork_and_call, pushing PIDs on a stack, and a signal handler to
> reap SIGCHLD signals and verify when certain "threads" have finished
> (ignoring SIGCHLD from terminating pipe processes).

Have you tried upgrading your version of threads.pm? It may or may
not fix this issue but it's worth a try.
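
To see which threads.pm a given perl is actually loading before and after an upgrade, something along these lines should do. It is a small sketch that assumes nothing beyond the core Config and threads modules:

```perl
use strict;
use warnings;
use Config;

# Was this perl built with ithreads at all?
my $has_ithreads = $Config{useithreads} ? 1 : 0;

# Load threads.pm only if it can work, and report the version found in @INC.
my $version = 'not available';
if ($has_ithreads && eval { require threads; 1 }) {
    $version = threads->VERSION;
}

print "ithreads built in: $has_ithreads, threads.pm version: $version\n";
```

Running it with the same perl binary and @INC as the application matters, since a newer threads.pm installed from CPAN only helps if it is found ahead of the copy shipped with the distribution.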

Leon


p5pRT commented Jul 6, 2012

The RT System itself - Status changed from 'new' to 'open'
