many threads => various crashes #8203

Open
p5pRT opened this issue Nov 10, 2005 · 8 comments

Comments

@p5pRT

p5pRT commented Nov 10, 2005

Migrated from rt.perl.org#37652 (status was 'open')

Searchable as RT37652$

@p5pRT

p5pRT commented Nov 10, 2005

From zefram@fysh.org

Created by zefram@fysh.org

I have a multithreaded server, using the "threads" interface, that fires
off a new thread to handle each client. It typically has between zero
and two clients connected at once, and gets a new client connection
every couple of seconds. I've found that after running for a few hours
(time is variable) it crashes, in a variety of ways. I have seen it
terminate, and I have also seen it lock up completely, apparently during
thread creation: the parent thread uses 99% CPU and never gets past the
thread creation instruction, while the child thread is created at the
Unix level but I think never does its work.

I have a test program that simulates this thread usage pattern at a much
faster speed:

### t0 start ###
#!/usr/bin/perl
use warnings;
use strict;
use threads;
use threads::shared;
my $n : shared = 0;
while(1) {
  async {
    threads->self->detach;
    threads->yield;
    lock $n;
    threads->yield;
    print "thread $n\n";
    threads->yield;
    $n++;
    threads->yield;
  };
  threads->yield;
}
### t0 end ###

When running this program, I've seen it abort in various ways:

$ ./t0
thread 0
[...]
thread 26117
*** glibc detected *** corrupted double-linked list: 0x401cd858 ***
zsh: abort ./t0
$ ./t0
thread 0
[...]
thread 1770
Attempt to free non-existent shared string '_<th', Perl interpreter: 0x410090c0 during global destruction.
Unbalanced string table refcount: (1) for "_<th" during global destruction.
*** glibc detected *** corrupted double-linked list: 0x082a0960 ***
zsh: abort ./t0
$

I've also seen it abort with these error messages from glibc:

*** glibc detected *** free(): invalid next size (fast): 0x0819cef8 ***
*** glibc detected *** double free or corruption (fasttop): 0x0819cef8 ***
*** glibc detected *** free(): invalid pointer: 0x40dbca64 ***

If the read and incrementation of $n are commented out, I see it crash in
the same kinds of ways, and also sometimes with a segmentation fault.
I've never seen it lock up as my real program does.

I've run this test program with Perl 5.8.6 on a Gentoo Linux machine.
The original program was run with Perl 5.8.4 on a Debian (stable) Linux
machine, with an older version of the Linux kernel. The Debian machine
is a commercially sensitive server, so I can't run test code on it.

Perl Info

Flags:
    category=core
    severity=high

Site configuration information for perl v5.8.6:

Configured by Gentoo at Wed Sep 28 16:05:01 BST 2005.

Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
  Platform:
    osname=linux, osvers=2.6.12-gentoo-r4, archname=i686-linux-thread-multi
    uname='linux localhost 2.6.12-gentoo-r4 #3 smp thu jul 21 11:11:50 bst 2005 i686 intel(r) pentium(r) 4 cpu 3.20ghz genuineintel gnulinux '
    config_args='-des -Darchname=i686-linux-thread -Dcccdlflags=-fPIC -Dccdlflags=-rdynamic -Dcc=i686-pc-linux-gnu-gcc -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr -Dlocincpth=  -Doptimize=-O3 -march=pentium4 -mtune=pentium4 --force-addr -momit-leaf-frame-pointer -fomit-frame-pointer -ftracer -pipe -Duselargefiles -Dd_semctl_semun -Dscriptdir=/usr/bin -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dinstallman1dir=/usr/share/man/man1 -Dinstallman3dir=/var/tmp/portage/perl-5.8.6-r6/image//usr/share/man/man3 -Dman1ext=1 -Dman3ext=3pm -Dinc_version_list=5.8.0 5.8.0/i686-linux-thread-multi 5.8.2 5.8.2/i686-linux-thread-multi 5.8.4 5.8.4/i686-linux-thread-multi 5.8.5 5.8.5/i686-linux-thread-multi  -Dcf_by=Gentoo -Ud_csh -Dusethreads -Di_ndbm -Di_gdbm -Di_db'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='i686-pc-linux-gnu-gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O3 -march=pentium4 -mtune=pentium4 --force-addr -momit-leaf-frame-pointer -fomit-frame-pointer -ftracer -pipe',
    cppflags='-DPERL5 -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -pipe'
    ccversion='', gccversion='3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='i686-pc-linux-gnu-gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lpthread -lnsl -lndbm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.5.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.5'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.6:
    /etc/perl
    /usr/lib/perl5/site_perl/5.8.6/i686-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.6
    /usr/lib/perl5/site_perl/5.8.5
    /usr/lib/perl5/site_perl/5.8.5/i686-linux-thread-multi
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.6/i686-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.6
    /usr/lib/perl5/vendor_perl/5.8.5
    /usr/lib/perl5/vendor_perl/5.8.5/i686-linux-thread-multi
    /usr/lib/perl5/vendor_perl
    /usr/lib/perl5/5.8.6/i686-linux-thread-multi
    /usr/lib/perl5/5.8.6
    /usr/local/lib/site_perl
    .


Environment for perl v5.8.6:
    HOME=/home/zefram
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/zefram/pub/i686-pc-linux-gnu/bin:/home/zefram/pub/common/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/local/bin:/usr/games:/opt/exim/bin:/opt/opera/bin:/opt/sun-jdk-1.4.2.09/bin:/opt/tomcat5/bin
    PERL_BADLANG (unset)
    SHELL=/home/zefram/pub/i686-pc-linux-gnu/bin/zsh

@p5pRT

p5pRT commented Nov 11, 2005

From zefram@fysh.org

Additional information: I've had a chance to analyse the program I started
with in its locked-up state (the crash type that the test program didn't
exhibit). The purpose of the program is to maintain high-value encryption
keys in RAM, so unfortunately taking a core dump was out of the question.

The program locked up with exactly one child thread in existence at
the Unix level. From log entries it now appears that this last thread
created *did* perform its intended work before locking up. Attaching to
the child with gdb, I got this stack trace:

#0 0x40048604 in __pthread_sigsuspend () from /lib/libpthread.so.0
#1 0x400483c8 in __pthread_wait_for_restart_signal ()
  from /lib/libpthread.so.0
#2 0x40049d99 in __pthread_alt_lock () from /lib/libpthread.so.0
#3 0x40046ba5 in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0x40215d90 in Perl_ithread_run ()
  from /usr/lib/perl/5.8/auto/threads/threads.so
#5 0x40045e51 in pthread_start_thread () from /lib/libpthread.so.0
#6 0x4016892a in clone () from /lib/libc.so.6

The child was not using any CPU time. The parent was using 99% CPU time,
and strace showed it making no system calls. I attached to it twice
with gdb, and got these stack traces:

#0 0x40049cc7 in wait_node_free () from /lib/libpthread.so.0
#1 0x40046d7a in pthread_mutex_unlock () from /lib/libpthread.so.0
#2 0x4021555d in Perl_ithread_destruct ()
  from /usr/lib/perl/5.8/auto/threads/threads.so
#3 0x402159f9 in ithread_mg_free ()
  from /usr/lib/perl/5.8/auto/threads/threads.so
#4 0x080cb285 in Perl_sv_unmagic ()
#5 0x40216995 in Perl_ithread_DESTROY ()
  from /usr/lib/perl/5.8/auto/threads/threads.so
#6 0x40217483 in XS_threads_DESTROY ()
  from /usr/lib/perl/5.8/auto/threads/threads.so
#7 0x080c32d6 in Perl_pp_entersub ()
#8 0x080640ec in Perl_call_sv ()
#9 0x08063ec1 in Perl_call_sv ()
#10 0x080cbad5 in Perl_sv_clear ()
#11 0x080cc2a0 in Perl_sv_free ()
#12 0x080cbc99 in Perl_sv_clear ()
#13 0x080cc2a0 in Perl_sv_free ()
#14 0x080e63a1 in Perl_free_tmps ()
#15 0x080bc32a in Perl_pp_unstack ()
#16 0x080bbdc9 in Perl_runops_standard ()
#17 0x080635e8 in perl_run ()
#18 0x080633f5 in perl_run ()
#19 0x0805fb9f in main ()

#0 0x4004a155 in __pthread_acquire () from /lib/libpthread.so.0
#1 0x40049ca8 in wait_node_free () from /lib/libpthread.so.0
#2 0x4004a017 in __pthread_alt_unlock () from /lib/libpthread.so.0
#3 0x40046d7a in pthread_mutex_unlock () from /lib/libpthread.so.0
#4 0x4021555d in Perl_ithread_destruct ()
  from /usr/lib/perl/5.8/auto/threads/threads.so
#5 0x402159f9 in ithread_mg_free ()
  from /usr/lib/perl/5.8/auto/threads/threads.so
#6 0x080cb285 in Perl_sv_unmagic ()
#7 0x40216995 in Perl_ithread_DESTROY ()
  from /usr/lib/perl/5.8/auto/threads/threads.so
#8 0x40217483 in XS_threads_DESTROY ()
  from /usr/lib/perl/5.8/auto/threads/threads.so
#9 0x080c32d6 in Perl_pp_entersub ()
#10 0x080640ec in Perl_call_sv ()
#11 0x08063ec1 in Perl_call_sv ()
#12 0x080cbad5 in Perl_sv_clear ()
#13 0x080cc2a0 in Perl_sv_free ()
#14 0x080cbc99 in Perl_sv_clear ()
#15 0x080cc2a0 in Perl_sv_free ()
#16 0x080e63a1 in Perl_free_tmps ()
#17 0x080bc32a in Perl_pp_unstack ()
#18 0x080bbdc9 in Perl_runops_standard ()
#19 0x080635e8 in perl_run ()
#20 0x080633f5 in perl_run ()
#21 0x0805fb9f in main ()

This is on a Debian (stable) Linux machine. The output of `perl -V`:

Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
  osname=linux, osvers=2.4.27-ti1211, archname=i386-linux-thread-multi
  uname='linux kosh 2.4.27-ti1211 #1 sun sep 19 18:17:45 est 2004 i686 gnulinux '
  config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.4 -Dsitearch=/usr/local/lib/perl/5.8.4 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.4 -Dd_dosuid -des'
  hint=recommended, useposix=true, d_sigaction=define
  usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
  useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
  use64bitint=undef use64bitall=undef uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler:
  cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-O2',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include'
  ccversion='', gccversion='3.3.5 (Debian 1:3.3.5-9)', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=4, prototype=define
  Linker and Libraries:
  ld='cc', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /lib /usr/lib
  libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
  perllibs=-ldl -lm -lpthread -lc -lcrypt
  libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so.5.8.4
  gnulibc_version='2.3.2'
  Dynamic Linking:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
  Built under linux
  Compiled at Mar 8 2005 19:51:48
  @INC:
  /etc/perl
  /usr/local/lib/perl/5.8.4
  /usr/local/share/perl/5.8.4
  /usr/lib/perl5
  /usr/share/perl5
  /usr/lib/perl/5.8
  /usr/share/perl/5.8
  /usr/local/lib/site_perl
  .

-zefram

@p5pRT

p5pRT commented Nov 14, 2005

From @iabyn

On Thu, Nov 10, 2005 at 08:49:05AM -0800, Zefram wrote:

I have a test program that simulates this thread usage pattern at a much
faster speed:

### t0 start ###
#!/usr/bin/perl
use warnings;
use strict;
use threads;
use threads::shared;
my $n : shared = 0;
while(1) {
  async {
    threads->self->detach;
    threads->yield;
    lock $n;
    threads->yield;
    print "thread $n\n";
    threads->yield;
    $n++;
    threads->yield;
  };
  threads->yield;
}

The only way I can get that code to crash is when it runs out of memory on
32-bit systems (which on Linux seems to have a 128Mb limit). This
happened simply when the supply of new threads exceeded the destructions
of old threads and so memory use grew. Thus, I'm not sure it's replicating
the problem you're seeing with your real code.

--
"I do not resent critisism, even when, for the sake of emphasis,
it parts for the time with reality".
  -- Winston Churchill, House of Commons, 22nd Jan 1941.

@p5pRT

p5pRT commented Nov 14, 2005

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Nov 14, 2005

From @iabyn

On Fri, Nov 11, 2005 at 10:48:22AM +0000, Zefram wrote:

The child was not using any CPU time. The parent was using 99% CPU time,
and strace showed it making no system calls. I attached to it twice
with gdb, and got these stack traces:

#0 0x40049cc7 in wait_node_free () from /lib/libpthread.so.0
#1 0x40046d7a in pthread_mutex_unlock () from /lib/libpthread.so.0
#2 0x4021555d in Perl_ithread_destruct ()
from /usr/lib/perl/5.8/auto/threads/threads.so
#3 0x402159f9 in ithread_mg_free ()
from /usr/lib/perl/5.8/auto/threads/threads.so
#4 0x080cb285 in Perl_sv_unmagic ()
#5 0x40216995 in Perl_ithread_DESTROY ()
from /usr/lib/perl/5.8/auto/threads/threads.so
#6 0x40217483 in XS_threads_DESTROY ()
from /usr/lib/perl/5.8/auto/threads/threads.so
#7 0x080c32d6 in Perl_pp_entersub ()
#8 0x080640ec in Perl_call_sv ()
#9 0x08063ec1 in Perl_call_sv ()
#10 0x080cbad5 in Perl_sv_clear ()
#11 0x080cc2a0 in Perl_sv_free ()
#12 0x080cbc99 in Perl_sv_clear ()
#13 0x080cc2a0 in Perl_sv_free ()
#14 0x080e63a1 in Perl_free_tmps ()
#15 0x080bc32a in Perl_pp_unstack ()
#16 0x080bbdc9 in Perl_runops_standard ()
#17 0x080635e8 in perl_run ()
#18 0x080633f5 in perl_run ()
#19 0x0805fb9f in main ()

It would be interesting to see the results of continuing to execute
using 'finish' a few times, to see at what level in the call stack it is
looping.

--
Nothing ventured, nothing lost.

@p5pRT

p5pRT commented Nov 14, 2005

From zefram@fysh.org

Dave Mitchell via RT wrote:

The only way I can get that code to crash is when it runs out of memory on
32-bit systems (which on Linux seems to have a 128Mb limit). This
happened simply when the supply of new threads exceeded the destructions
of old threads and so memory use grew. Thus, I'm not sure it's replicating
the problem you're seeing with your real code.

That's definitely not what I was seeing with any of my code. I never
saw the test program with more than four Linux-level subthreads, and its
memory usage stayed small enough to not be a problem. Obviously it was
keeping up with thread terminations.

-zefram

@p5pRT

p5pRT commented Nov 14, 2005

From zefram@fysh.org

Dave Mitchell via RT wrote:

It would be interesting to see the results of continuing to execute
using 'finish' a few times, to see at what level in the call stack it is
looping.

Unfortunately that's not an option. I don't have a proper test
environment for the code that does this.

-zefram

@p5pRT

p5pRT commented May 19, 2006

From guest@guest.guest.xxxxxxxx

http://guest:guest@rt.perl.org/rt3/Ticket/Display.html?id=37652

Dave Mitchell wrote:

The only way I can get that code to crash is when it runs
out of memory on 32-bit systems (which on Linux seems to
have a 128Mb limit). This happened simply when the supply
of new threads exceeded the destructions of old threads and
so memory use grew. Thus, I'm not sure it's replicating the
problem you're seeing with your real code.

zefram wrote:

That's definitely not what I was seeing with any of my
code. I never saw the test program with more than four
Linux-level subthreads, and its memory usage stayed small
enough to not be a problem. Obviously it was keeping up
with thread terminations.

I would be inclined to dispute this. The test program, as
written, generates threads in a fast loop such that garbage
collection of terminated threads is, at best, hindered.
Trying it under Cygwin Perl 5.8.8 and threads 1.29, it
failed after 2400 threads. Watching it in the Task Manager,
memory grows steadily until it fails after 350 MB. Similarly,
all the failures reported by the poster can be interpreted
as resulting from memory allocation errors (which usually
lead to various memory corruption failures).
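
Whether thread creation really outpaces thread termination on a given machine can be measured rather than inferred. The following is only a minimal sketch and is not part of the original test program: it assumes a hypothetical shared counter `$live` that each thread decrements as its last action, and it records the highest value seen in `$peak`. The threads are joined rather than detached so the sketch exits cleanly, and `$total` is an arbitrary number chosen for illustration. A counter is used because `threads->list` does not report detached threads.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use threads;
use threads::shared;

my $live : shared = 0;   # hypothetical counter: threads started but not yet finished
my $peak  = 0;           # highest value of $live seen by the parent
my $total = 50;          # arbitrary number of threads for this sketch

my @thr;
for (1 .. $total) {
    {
        # Count the thread as live before it is created.
        lock $live;
        $live++;
        $peak = $live if $live > $peak;
    }
    push @thr, async {
        threads->yield;
        # Last action of the thread body: mark this thread as finished.
        lock $live;
        $live--;
    };
}

# Join everything so the process exits without running threads.
$_->join for @thr;

print "peak live threads: $peak of $total created, still live: $live\n";
```

If `$peak` stays small, termination is keeping up with creation; if it approaches `$total`, creation is running ahead.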

I then modified the test program to add a sleep after each
1000 threads:

#!/usr/bin/perl
use warnings;
use strict;
use threads;
use threads::shared;
my $n : shared = 0;
while(1) {
  for (1..1000) {
    async {
      threads->self->detach;
      threads->yield;
      lock $n;
      threads->yield;
      print "thread $n\n";
      threads->yield;
      $n++;
      threads->yield;
    };
    threads->yield;
  }
  sleep(10);
}

(On my machine, it takes about 10 sec. to create and destroy
1000 threads. If I try 5 sec., for instance, Perl doesn't
have enough time to clean up before the next loop, and
eventually the program fails, as before, on memory. Of
course, faster machines can get away with a shorter pause.)
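
Because the right pause length depends on the machine, an explicit cap on live threads may be easier to tune than a fixed sleep. The following is a minimal sketch of that idea and not a tested fix for this ticket: it assumes the threads are joined rather than detached, and `$max_live` and `$total` are hypothetical values chosen for illustration (the original loop is endless).

```perl
#!/usr/bin/perl
use warnings;
use strict;
use threads;
use threads::shared;

my $n : shared = 0;
my $max_live = 4;    # hypothetical cap on threads that exist at once
my $total    = 20;   # finite run for this sketch

my @live;            # thread objects not yet joined, oldest first
for my $i (1 .. $total) {
    # Before creating another thread, wait for the oldest one to finish,
    # so at most $max_live threads exist at any moment.
    if (@live >= $max_live) {
        my $oldest = shift @live;
        $oldest->join;
    }
    push @live, async {
        lock $n;
        print "thread $n\n";
        $n++;
    };
}

# Reap the remaining threads before exiting.
$_->join for @live;

print "handled $n threads\n";
```

Joining the oldest thread makes the parent block exactly as long as cleanup needs, so no per-machine pause has to be guessed.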

This version runs virtually indefinitely (> 2,000,000
threads). (It dies on memory, too, at that point due to a
memory leak in Windows whereby each thread consumes about
500 bytes that are never returned.)

Additionally, older versions of 'threads' did have issues
with memory leaks which would have compounded the poster's
problem. However, the latest version does not suffer in
that regard. (Again, the leak I mentioned above is due to
Windows, not Perl/'threads'.)

Jerry D. Hedden <jdhedden AT cpan DOT org>
