fork/system bug on FreeBSD #6115

Closed
p5pRT opened this issue Dec 3, 2002 · 18 comments

@p5pRT

p5pRT commented Dec 3, 2002

Migrated from rt.perl.org#18849 (status was 'resolved')

Searchable as RT18849$

@p5pRT

p5pRT commented Dec 3, 2002

From alan@pair.com

Created by alan@pair.com

To​: perlbug@​perl.com
Subject​: fork/system bug on FreeBSD
Reply-To​: alan@​pair.com

This is a bug report for perl from alan@​pair.com,
generated with the help of perlbug 1.26 running under perl 5.00503.

-----------------------------------------------------------------

Organization

pair Networks, Inc.

Environment

System​: FreeBSD pair.com 4.5-STABLE FreeBSD 4.5-STABLE #8​: Mon Apr 15 10​:23​:48 EDT 2002 root@​pair.com​:/usr/src/sys/compile/PAIRk i386

This bug is known to happen consistently on the following FreeBSD / Perl
version combinations​:

FreeBSD 4.5-STABLE / Perl 5.8.0 (vfork enabled)
FreeBSD 4.5-STABLE / Perl 5.8.0 (vfork disabled)
FreeBSD 4.6-STABLE / Perl 5.6.0
FreeBSD 4.6-STABLE / Perl 5.6.1
FreeBSD 4.6-STABLE / Perl 5.005_03

The bug does not occur on the following FreeBSD / Perl combination​:

FreeBSD 2.2.7-STABLE / Perl 5.005_02
FreeBSD 2.2.7-STABLE / Perl 5.6.1

Description

  While using Perl in some newer versions of FreeBSD, if the SIGCHLD handler
is set to 'IGNORE' and a child process is forked, then all subsequent system()
calls made in the parent process will hang until the previously forked child
has exited.

How-To-Repeat

perl -e '$SIG{CHLD} = "IGNORE"; fork or sleep 0xbeef; system "echo chamber";'

Why it happens

I believe the problem lies in the use of the SA_NOCLDWAIT flag when calling
sigaction() to set the SIG_IGN handler for the SIGCHLD signal. In the 5.8.0
source, this seems to happen at least in util.c in Perl_rsignal_save​:

#ifdef SA_NOCLDWAIT
    if (signo == SIGCHLD && handler == (Sighandler_t)SIG_IGN)
        act.sa_flags |= SA_NOCLDWAIT;
#endif

The problem is, when you set SA_NOCLDWAIT, subsequent calls to wait() (or
wait4()) wait for All child processes to exit, not just the process ID
specified. Since Perl's system() calls wait4() on its recently forked child,
the system() call doesn't return until All of the perl process's children
exit. This is documented in FreeBSD's sigaction man page​:

  SA_NOCLDWAIT If this bit is set when calling sigaction() for the
  SIGCHLD signal, the system will not create zombie
  processes when children of the calling process
  exit. If the calling process subsequently issues a
  wait(2) (or equivalent), it blocks until all of the
  calling process's child processes terminate, and
  then returns a value of -1 with errno set to
  ECHILD.

The quick fix is to stop using SA_NOCLDWAIT when you ignore SIGCHLD. This may
create unwanted zombie processes, though. The better fix is probably not to
wait() at all if SIGCHLD is currently being ignored with SA_NOCLDWAIT.

Below is C code which compiles and runs on FreeBSD using gcc, and which
demonstrates the difference in behavior when SA_NOCLDWAIT is used and is
not used.

#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

int main (void) {
    pid_t chld;
    int stat;
    struct sigaction act;

    /* Case 1: ignore SIGCHLD without SA_NOCLDWAIT. */
    act.sa_handler = SIG_IGN;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    sigaction(SIGCHLD, &act, 0);

    printf("fork 1: no one will wait for me.\n");
    if (!fork()) {
        sleep(120);
        printf(" Fork 1 exiting\n");
        exit(0);
    }

    printf("fork 2\n");
    if ((chld = fork())) {
        printf(" Parent: waiting for fork 2 to exit\n");
        /* Retry the wait if a signal interrupts it. */
        while (wait4(chld, &stat, 0, 0) == -1 && errno == EINTR)
            ;
    } else { /* child process */
        sleep(1);
        printf(" fork 2 exiting\n");
        exit(0);
    }
    printf(" Parent done with fork 2.\n");

    /* Case 2: same handler, now with SA_NOCLDWAIT. */
    act.sa_handler = SIG_IGN;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_NOCLDWAIT;
    sigaction(SIGCHLD, &act, 0);

    printf("fork 3\n");
    if ((chld = fork())) {
        printf(" Parent: waiting for fork 3 to exit\n");
        /* On FreeBSD this does not return until fork 1 has exited too. */
        while (wait4(chld, &stat, 0, 0) == -1 && errno == EINTR)
            ;
    } else { /* child process */
        sleep(1);
        printf(" Fork 3 exiting\n");
        exit(0);
    }
    printf(" Parent done with fork 3.\n");

    printf("done\n");

    return 0;
}

Perl Info


Site configuration information for perl 5.00503:

Configured by markm at Sun Mar  5 13:39:27 SAST 2000.

Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
  Platform:
    osname=freebsd, osvers=4.0-current, archname=i386-freebsd
    uname='FreeBSD freefall.FreeBSD.org 4.0-current FreeBSD 4.0-current #0: $Date$'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef useperlio=undef d_sfio=undef
  Compiler:
    cc='cc', optimize='undef', gccversion=2.95.2 19991024 (release)
    cppflags=''
    ccflags =''
    stdchar='char', d_stdstdio=undef, usevfork=true
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-E -lperl -lm '
    libpth=/usr/lib
    libs=-lm -lc -lcrypt
    libc=, so=so, useshrplib=true, libperl=libperl.so.3
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' -Wl,-R/usr/lib'
    cccdlflags='-DPIC -fpic', lddlflags='-Wl,-E -shared -lperl -lm '

Locally applied patches:



@INC for perl 5.00503:
    /usr/libdata/perl/5.00503/mach
    /usr/libdata/perl/5.00503
    /usr/local/lib/perl5/site_perl/5.005/i386-freebsd
    /usr/local/lib/perl5/site_perl/5.005
    .


Environment for perl 5.00503:
    HOME=/usr/home/staff/alan
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/krb5/bin/:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/X11R6/bin:/usr/home/staff/alan/bin
    PERL_BADLANG (unset)
    SHELL=/usr/local/bin/zsh


@p5pRT

p5pRT commented Jun 11, 2003

From alan@pair.com

(Disclaimer: This is my first patch submission. If there's a better
way for me to patch FreeBSD-dependent bugs in Perl than testing for
the __FreeBSD__ macro, please tell me what it is and I'll fix my
patches.)

As I described in bug [perl #18849] "fork/system bug on FreeBSD," code
like the following does the wrong thing on FreeBSD​:

perl -e '$SIG{CHLD} = "IGNORE"; fork or sleep 0xbeef; system "echo chamber";'

Instead of returning immediately, system()'s implicit wait4pid waits
for the forked child to exit before returning. I confirmed with the
FreeBSD folks that FreeBSD's behavior is correct (though possibly
unexpected), and that Perl is using SA_NOCLDWAIT incorrectly in
FreeBSD.

I have a few possible solutions for this problem.

The first, which I prefer if no one can find any problems with it, is
to stop using SA_NOCLDWAIT when perl sets SIGCHLD's handler to IGNORE.
This works because in FreeBSD, SIG_IGN is enough to prevent zombies
without the SA_NOCLDWAIT flag.

Here is the patch, against Perl 5.8.0's source​:

*** util.c.orig Wed Jun 11 12​:52​:49 2003
--- util.c Wed Jun 11 12​:59​:58 2003
***************
*** 2363,2369 ****
  act.sa_flags |= SA_RESTART; /* SVR4, 4.3+BSD */
  #endif
  #endif
! #ifdef SA_NOCLDWAIT
  if (signo == SIGCHLD && handler == (Sighandler_t)SIG_IGN)
  act.sa_flags |= SA_NOCLDWAIT;
  #endif
--- 2363,2369 ----
  act.sa_flags |= SA_RESTART; /* SVR4, 4.3+BSD */
  #endif
  #endif
! #if defined(SA_NOCLDWAIT) && !defined(__FreeBSD__)
  if (signo == SIGCHLD && handler == (Sighandler_t)SIG_IGN)
  act.sa_flags |= SA_NOCLDWAIT;
  #endif
***************
*** 2403,2409 ****
  act.sa_flags |= SA_RESTART; /* SVR4, 4.3+BSD */
  #endif
  #endif
! #ifdef SA_NOCLDWAIT
  if (signo == SIGCHLD && handler == (Sighandler_t)SIG_IGN)
  act.sa_flags |= SA_NOCLDWAIT;
  #endif
--- 2403,2409 ----
  act.sa_flags |= SA_RESTART; /* SVR4, 4.3+BSD */
  #endif
  #endif
! #if defined(SA_NOCLDWAIT) && !defined(__FreeBSD__)
  if (signo == SIGCHLD && handler == (Sighandler_t)SIG_IGN)
  act.sa_flags |= SA_NOCLDWAIT;
  #endif

The other option is to block SIGCHLD whenever Perl does a wait()
implicitly: for example, in pp_sys.c's PP(pp_system), in Perl's
close() for pipes, and so on. This patch would need to be applied to
potentially widely disparate parts of Perl, wherever it implicitly
calls wait().

Furthermore, it would not fix the case where a Perl script sets
$SIG{CHLD} to IGNORE, and then calls wait() explicitly. In that case,
the behavior would be as FreeBSD's man page describes​: wait() will
wait until All child processes exit, and then return an error.

An example of this patch, applied to 5.8.0's pp_sys.c in PP(pp_system)
only​:

*** pp_sys.c.orig Wed Jun 11 11​:54​:00 2003
--- pp_sys.c.new Wed Jun 11 14​:02​:32 2003
***************
*** 4089,4106 ****
  sleep(5);
  }
  if (childpid > 0) {
! Sigsave_t ihand,qhand; /* place to save signals during system() */
  int status;

  if (did_pipes)
  PerlLIO_close(pp[1]);
  #ifndef PERL_MICRO
  rsignal_save(SIGINT, SIG_IGN, &ihand);
  rsignal_save(SIGQUIT, SIG_IGN, &qhand);
  #endif
  do {
  result = wait4pid(childpid, &status, 0);
  } while (result == -1 && errno == EINTR);
  #ifndef PERL_MICRO
  (void)rsignal_restore(SIGINT, &ihand);
  (void)rsignal_restore(SIGQUIT, &qhand);
--- 4089,4123 ----
  sleep(5);
  }
  if (childpid > 0) {
! Sigsave_t ihand,qhand,chand; /* place to save signals during system() */
  int status;

+ #if defined(SA_NOCLDWAIT) && defined(__FreeBSD__)
+ sigset_t block, oblock;
+ #endif
+
  if (did_pipes)
  PerlLIO_close(pp[1]);
  #ifndef PERL_MICRO
  rsignal_save(SIGINT, SIG_IGN, &ihand);
  rsignal_save(SIGQUIT, SIG_IGN, &qhand);
  #endif
+ #if defined(SA_NOCLDWAIT) && defined(__FreeBSD__)
+ /* Block SIGCHLD so we won't wait for All children */
+ sigemptyset(&block);
+ sigaddset(&block, SIGCHLD);
+ sigprocmask(SIG_BLOCK, &block, &oblock);
+ rsignal_save(SIGCHLD, SIG_DFL, &chand);
+ #endif
  do {
  result = wait4pid(childpid, &status, 0);
  } while (result == -1 && errno == EINTR);
+
+ #if defined(SA_NOCLDWAIT) && defined(__FreeBSD__)
+ (void)rsignal_restore(SIGCHLD, &chand);
+ sigprocmask(SIG_SETMASK, &oblock, 0); /* restore the saved mask */
+ #endif
+
  #ifndef PERL_MICRO
  (void)rsignal_restore(SIGINT, &ihand);
  (void)rsignal_restore(SIGQUIT, &qhand);

Both of these patches fix the minimal failing test case I described
above, but I prefer the first one because it fixes a larger set of
potential failing cases with one patch instead of many patches.

Thanks for listening. I appreciate constructive comments on these
patches.

Alan Ferrency
alan@​pair.com
pair Networks Inc.

@p5pRT

p5pRT commented Jun 11, 2003

From @rgs

alan wrote:

> As I described in bug [perl #18849] "fork/system bug on FreeBSD," code
> like the following does the wrong thing on FreeBSD:
>
> perl -e '$SIG{CHLD} = "IGNORE"; fork or sleep 0xbeef; system "echo chamber";'
>
> Instead of returning immediately, system()'s implicit wait4pid waits
> for the forked child to exit before returning. I confirmed with the
> FreeBSD folks that FreeBSD's behavior is correct (though possibly
> unexpected), and that Perl is using SA_NOCLDWAIT incorrectly in
> FreeBSD.
>
> I have a few possible solutions for this problem.
>
> The first, which I prefer if no one can find any problems with it, is
> to stop using SA_NOCLDWAIT when perl sets SIGCHLD's handler to IGNORE.
> This works because in FreeBSD, SIG_IGN is enough to prevent zombies
> without the SA_NOCLDWAIT flag.

I personally prefer this first patch.
What about the other BSDs?

@p5pRT

p5pRT commented Jun 12, 2003

From alan@pair.com

On Wed, 11 Jun 2003, Rafael Garcia-Suarez wrote:

> alan wrote:
>
>> As I described in bug [perl #18849] "fork/system bug on FreeBSD," code
>> like the following does the wrong thing on FreeBSD:
>>
>> perl -e '$SIG{CHLD} = "IGNORE"; fork or sleep 0xbeef; system "echo chamber";'
>>
>> Instead of returning immediately, system()'s implicit wait4pid waits
>> for the forked child to exit before returning. I confirmed with the
>> FreeBSD folks that FreeBSD's behavior is correct (though possibly
>> unexpected), and that Perl is using SA_NOCLDWAIT incorrectly in
>> FreeBSD.
>>
>> I have a few possible solutions for this problem.
>>
>> The first, which I prefer if no one can find any problems with it, is
>> to stop using SA_NOCLDWAIT when perl sets SIGCHLD's handler to IGNORE.
>> This works because in FreeBSD, SIG_IGN is enough to prevent zombies
>> without the SA_NOCLDWAIT flag.
>
> I personally prefer this first patch.
> What about the other BSDs?

I don't have easy access to machines running other OS's which have
SA_NOCLDWAIT. If SA_NOCLDWAIT works the same way on all other OS's,
then a better patch would obviously be to stop using SA_NOCLDWAIT
completely instead of conditionally compiling it out of only FreeBSD.

If you have another BSD, try this test​:
perl -e '$SIG{CHLD} = "IGNORE"; fork or sleep 0xbeef; system "echo chamber";'

If it returns immediately, your system isn't vulnerable to the bug.
If it waits a Long Time before returning, most likely your
SA_NOCLDWAIT behaves like FreeBSD's. (I guess you could also just
read the man page... :)

Alan Ferrency

@p5pRT

p5pRT commented Jun 12, 2003

From @jhi

> I don't have easy access to machines running other OS's which have
> SA_NOCLDWAIT. If SA_NOCLDWAIT works the same way on all other OS's,

Both Linux and Solaris seem to have SA_NOCLDWAIT but I get no long
sleep in there (100 repetitions of your test).

> then a better patch would obviously be to stop using SA_NOCLDWAIT
> completely instead of conditionally compiling it out of only FreeBSD.
>
> If you have another BSD, try this test:
> perl -e '$SIG{CHLD} = "IGNORE"; fork or sleep 0xbeef; system "echo chamber";'
>
> If it returns immediately, your system isn't vulnerable to the bug.
> If it waits a Long Time before returning, most likely your
> SA_NOCLDWAIT behaves like FreeBSD's. (I guess you could also just

In Mac OS X / Darwin I get a long sleep in about every 5th-10th attempt.
So maybe it's a BSD thing.

> read the man page... :)

--
Jarkko Hietaniemi <jhi@​iki.fi> http​://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen

@p5pRT

p5pRT commented Jun 12, 2003

From alan@pair.com

>> I don't have easy access to machines running other OS's which have
>> SA_NOCLDWAIT. If SA_NOCLDWAIT works the same way on all other OS's,
>
> Both Linux and Solaris seem to have SA_NOCLDWAIT but I get no long
> sleep in there (100 repetitions of your test).

Thanks. I didn't know which systems used SA_NOCLDWAIT and which didn't.

>> then a better patch would obviously be to stop using SA_NOCLDWAIT
>> completely instead of conditionally compiling it out of only FreeBSD.
>>
>> If you have another BSD, try this test:
>> perl -e '$SIG{CHLD} = "IGNORE"; fork or sleep 0xbeef; system "echo chamber";'
>>
>> If it returns immediately, your system isn't vulnerable to the bug.
>> If it waits a Long Time before returning, most likely your
>> SA_NOCLDWAIT behaves like FreeBSD's. (I guess you could also just
>
> In Mac OS X / Darwin I get a long sleep in about every 5th-10th attempt.
> So maybe it's a BSD thing.

I had more consistently failing results when I tried this on Mac OS X.
I had forgotten I tried that, I should go repeat that test.

Is there a handy macro to detect "BSD" without enumerating the
specific failing OS's?

Thanks for the test points.

Alan

@p5pRT

p5pRT commented Jun 12, 2003

From alan@pair.com

Can you see if this is in your Solaris 'man sigaction' man page?

  SA_NOCLDWAIT
  If set and sig equals SIGCHLD, the system will not create zombie
  processes when children of the calling process exit. If the call-
  ing process subsequently issues a wait(2), it blocks until all of
  the calling process's child processes terminate, and then returns
  -1 with errno set to ECHILD.

I found it in a SunOS 5.9 man page online. This is in line with what
FreeBSD does, but doesn't match the behavior of the test script. I
wonder if I checked an outdated man page, or if the man page is
inaccurate, or if there has already been a patch in Perl for some OS's
with this behavior.

(the key here is "...until ALL the calling process's child processes
terminate ...")

Thanks,

Alan


@p5pRT

p5pRT commented Jun 12, 2003

From enache@rdslink.ro

On Thu, Jun 12, 2003 at 06:01:02PM +0300, Jarkko Hietaniemi wrote:

>> I don't have easy access to machines running other OS's which have
>> SA_NOCLDWAIT. If SA_NOCLDWAIT works the same way on all other OS's,
>
> Both Linux and Solaris seem to have SA_NOCLDWAIT but I get no long
> sleep in there (100 repetitions of your test).

$ grep -ri SA_NOCLDWAIT /usr/include
/usr/include/asm/signal.h: * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
/usr/include/asm/signal.h:#define SA_NOCLDWAIT 0x00000002 /* not supported yet */
/usr/include/bits/sigaction.h:#define SA_NOCLDWAIT 2 /* Don't create zombie on child death. */

Notice the "not supported yet".
That's on a RH8 box (Linux 2.4.18).

>> If it returns immediately, your system isn't vulnerable to the bug.
>> If it waits a Long Time before returning, most likely your
>> SA_NOCLDWAIT behaves like FreeBSD's. (I guess you could also just
>
> In Mac OS X / Darwin I get a long sleep in about every 5th-10th attempt.
> So maybe it's a BSD thing.

From SusV3​:

  SA_NOCLDWAIT
  [XSI] [Option Start] If set, and sig equals SIGCHLD, child
  processes of the calling processes shall not be transformed
  into zombie processes when they terminate. If the calling
  process subsequently waits for its children, and the process
  has no unwaited-for children that were transformed into zombie
  processes, it shall block until all of its children terminate,
  and wait(), waitid(), and waitpid() shall fail and set errno to
  [ECHILD]. Otherwise, terminating child processes shall be
  transformed into zombie processes, unless SIGCHLD is set to
  SIG_IGN. [Option End]

[functions/sigaction.html]

Regards,
Adi

@p5pRT

p5pRT commented Jun 12, 2003

From @jhi

On Thu, Jun 12, 2003 at 10:37:28PM +0300, Enache Adrian wrote:

> ...
> /usr/include/asm/signal.h:#define SA_NOCLDWAIT 0x00000002 /* not supported yet */
> /usr/include/bits/sigaction.h:#define SA_NOCLDWAIT 2 /* Don't create zombie on child death. */
>
> notice the "not supported yet" -
> That's on a RH8 box (linux 2.4.18)

If the SA_NOCLDWAIT is not supported, setting it cannot screw things
up, right? :-)

--
Jarkko Hietaniemi <jhi@​iki.fi> http​://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen

@p5pRT

p5pRT commented Jun 12, 2003

From @jhi

Solaris 8 sigaction(2) says​:

  SA_NOCLDWAIT
  If set and sig equals SIGCHLD, the system will not
  create zombie processes when children of the calling
  process exit. If the calling process subsequently
  issues a wait(2), it blocks until all of the calling
  process's child processes terminate, and then returns
  -1 with errno set to ECHILD.

Tru64 (5.1B) also has SA_NOCLDWAIT and​:

  SA_NOCLDWAIT
  [XSH4.2] If this bit is set and the signal parameter is equal
  to SIGCHLD, zombie processes are not created by the system when
  a child process of the calling process exits. If a wait(),
  waitid(), waitpid(), or wait3() call is subsequently issued by
  the calling process, it blocks until all of its child processes
  terminate. The call then returns a value of -1 and errno is
  set to [ECHILD] to indicate the error. Note​: when this flag is
  set, exiting child processes do not send SIGCHLD signals to the
  parent.

and no long sleeps with 100 repeats.

Open Group's Single UNIX Specification
(http​://www.opengroup.org/onlinepubs/007908799/xsh/sigaction.html)
says​:

SA_NOCLDWAIT
If set, and sig equals SIGCHLD, child processes of the calling
processes will not be transformed into zombie processes when they
terminate. If the calling process subsequently waits for its
children, and the process has no unwaited for children that were
transformed into zombie processes, it will block until all of its
children terminate, and wait(), wait3(), waitid() and waitpid() will
fail and set errno to [ECHILD]. Otherwise, terminating child processes
will be transformed into zombie processes, unless SIGCHLD is set to
SIG_IGN.

P.S. It seems that now in all of Linux, Solaris, and Tru64 I do have
about one hundred sleeping Perls :-)

--
Jarkko Hietaniemi <jhi@​iki.fi> http​://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen

@p5pRT

p5pRT commented Jun 12, 2003

From alan@pair.com

Thanks. Now I'm even more confused :)

All of these man pages say the same thing that FreeBSD's man page
says​: if SA_NOCLDWAIT is in force, and you wait(), it'll wait for All
children to exit. I wonder why FreeBSD seems to be the only OS which
exhibits this behavior in Perl?

In the original bug report, I sent a short C program which, in
FreeBSD, duplicates the behavior I'm seeing in Perl. Without
SA_NOCLDWAIT it exits quickly; with SA_NOCLDWAIT it exits in 120
seconds, instead.

If you have Even More Time To Kill, can you see if this compiles on
something other than FreeBSD, and if so, what it does for you?

Thanks again,

Alan Ferrency


@p5pRT

p5pRT commented Jun 12, 2003

From @jhi

On Thu, Jun 12, 2003 at 12:57:42PM -0400, alan wrote:

> Thanks. Now I'm even more confused :)

That's what I'm here for :-)

> All of these man pages say the same thing that FreeBSD's man page
> says: if SA_NOCLDWAIT is in force, and you wait(), it'll wait for All
> children to exit. I wonder why FreeBSD seems to be the only OS which
> exhibits this behavior in Perl?
>
> In the original bug report, I sent a short C program which, in
> FreeBSD, duplicates the behavior I'm seeing in Perl. Without
> SA_NOCLDWAIT it exits quickly; with SA_NOCLDWAIT it exits in 120
> seconds, instead.
>
> If you have Even More Time To Kill, can you see if this compiles on
> something other than FreeBSD, and if so, what it does for you?

Compiled and tested in​:

Tru64/alpha 5.1B​: runs quickly to completion
Solaris/sparc 8​: runs quickly to completion
Debian/x86 3.0​: runs quickly to completion
AIX/ppc 4.3.1.0​: runs quickly to completion (1)
IRIX/mips 5.1​: runs quickly to completion (2) (3)
MacOSX/10.2.6​: the 2-minute wait

(1) "If SA_NOCLDWAIT is set, and sig equals SIGCHLD, child processes of
the calling processes will not be transformed into zombie processes
when they terminate. If the calling process subsequently waits for its
children, and the process has no unwaited for children that were
transformed into zombie processes, it will block until all of its
children terminate, and wait, wait3, waitid and waitpid will fail and
set errno to ECHILD. Otherwise, terminating child processes will be
transformed into zombie processes, unless SIGCHLD is set to SIG_IGN."

(2) "SA_NOCLDWAIT If set and sig equals SIGCHLD, the system will not
  create zombie processes when children of the calling
  process exit. If the calling process subsequently
  issues a wait(2), it blocks until all of the calling
  process's child processes terminate, and then returns a
  value of -1 with errno set to ECHILD."

(3) No wait4() in IRIX, waitpid() used instead.

--
Jarkko Hietaniemi <jhi@​iki.fi> http​://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen

@p5pRT

p5pRT commented Jun 13, 2003

From abe@ztreet.demon.nl

On a sunny spring day (Thursday 12 June 2003 15:45), alan wrote:

> On Wed, 11 Jun 2003, Rafael Garcia-Suarez wrote:
> [snip]
>
>> What about the other BSDs?
>
> I don't have easy access to machines running other OS's which have
> SA_NOCLDWAIT. If SA_NOCLDWAIT works the same way on all other OS's,
> then a better patch would obviously be to stop using SA_NOCLDWAIT
> completely instead of conditionally compiling it out of only FreeBSD.
>
> If you have another BSD, try this test:
> perl -e '$SIG{CHLD} = "IGNORE"; fork or sleep 0xbeef; system "echo chamber";'
>
> If it returns immediately, your system isn't vulnerable to the bug.
> If it waits a Long Time before returning, most likely your
> SA_NOCLDWAIT behaves like FreeBSD's. (I guess you could also just
> read the man page... :)

It looks like it's a *BSD thing (both on i386).

This is OpenBSD 3.2​:
  abeltje@​ayla (Fri Jun 13 12​:15​:50)
  ~$ bleadperl -e '$SIG{CHLD} = "IGNORE"; fork or sleep 120; system "echo chamber";'
  chamber
  chamber
  abeltje@​ayla (Fri Jun 13 12​:17​:57)

And this is NetBSD 1.5​:
abeltje@​W (Fri Jun 13 12​:29​:08)
/usr/bleadperl/perl-current$ ./perl -e '$SIG{CHLD} = "IGNORE"; fork or sleep 120; system "echo chamber";'
chamber
chamber
abeltje@​W (Fri Jun 13 12​:31​:19)

Good luck,

Abe
--
Schwern> Anything else you'd like? Side order of fries? Clean your stables?
Schwern> Get you an apple? Part the Red Sea?

I guess I otherwise would sense some sarcasm in your voice but
unfortunately my sarcasm-o-meter burned out years ago from prolonged
exposure to myself.
  -- Jarkko Hietaniemi on p5p @​ 2002-02-03

@p5pRT

p5pRT commented Jun 13, 2003

From alan@pair.com

On 12 Jun 2003, Jarkko Hietaniemi wrote:

> On Thu, Jun 12, 2003 at 12:57:42PM -0400, alan wrote:
>
>> Thanks. Now I'm even more confused :)
>
> That's what I'm here for :-)

As far as I can tell from reading all the man page snippets you've
sent, the expected behavior is the same everywhere (except possibly
for the specific varieties of wait() call which will wait for all
children). I don't have any good ideas on why versions other than BSD
don't fail (I can't believe every OS has left it unimplemented as RH
did).

I'm hesitant to suggest removing SA_NOCLDWAIT completely, for fear
of creating zombies on random platforms. So, if there's a convenient
way to identify the affected BSD systems with macros, my patch
suggestion is still to tweak util.c to stop using SA_NOCLDWAIT on
platforms where it has been identified as a problem.

Thanks,

Alan Ferrency


@p5pRT

p5pRT commented Sep 28, 2012

From @jkeenan

On Fri Jun 13 07​:04​:22 2003, alan wrote​:

On 12 Jun 2003, Jarkko Hietaniemi wrote​:

On Thu, Jun 12, 2003 at 12​:57​:42PM -0400, alan wrote​:

Thanks. Now I'm even more confused :)

That's what I'm here for :-)

As far as I can tell from reading all the man page snippets you've
sent, the expected behavior is the same everywhere (except possibly
for the specific varieties of wait() call which will wait for all
children). I don't have any good ideas on why versions other than BSD
don't fail (I can't believe every OS has left it unimplemented as RH
did).

I'm hesitant to suggest removing SA_NOCLDWAIT completely, for fear
of creating zombies on random platforms. So, if there's a convenient
way to identify the affected BSD systems with macros, my patch
suggestion is still to tweak util.c to stop using SA_NOCLDWAIT on
platforms where it has been identified as a problem.
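
A rough sketch of what such a guard could look like, in standalone C rather than as an actual util.c patch. The macro name DEMO_USE_NOCLDWAIT, the function name demo_ignore_sigchld, and the choice of __FreeBSD__ and __APPLE__ as the excluded platforms are all assumptions for illustration, taken only from the test results reported in this thread.

```c
#define _XOPEN_SOURCE 700   /* ask for the POSIX/XSI declarations */

#include <signal.h>
#include <string.h>

/* Hypothetical switch: use SA_NOCLDWAIT only where it exists and where
 * the platform has not been reported as problematic in this ticket.
 * The platform list is an assumption, not a vetted set. */
#if defined(SA_NOCLDWAIT) && !defined(__FreeBSD__) && !defined(__APPLE__)
#  define DEMO_USE_NOCLDWAIT 1
#else
#  define DEMO_USE_NOCLDWAIT 0
#endif

/* Ignore SIGCHLD, adding SA_NOCLDWAIT only on platforms that passed the
 * guard above. Returns 0 on success, -1 on failure (as sigaction does). */
int demo_ignore_sigchld(void)
{
    struct sigaction act;

    memset(&act, 0, sizeof act);
    act.sa_handler = SIG_IGN;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
#if DEMO_USE_NOCLDWAIT
    act.sa_flags |= SA_NOCLDWAIT;
#endif
    return sigaction(SIGCHLD, &act, NULL);
}
```

On the excluded platforms this falls back to plain SIG_IGN, so whether zombies are still avoided there depends on how that system treats an ignored SIGCHLD.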

Thanks,

Alan Ferrency

This RT has been collecting dust for more than nine years. Are there
any BSD experts out there who could review the discussion and make a
recommendation?

Thank you very much.
Jim Keenan

p5pRT commented Jan 15, 2013

From @jkeenan

Another call for BSD-knowledgeable people to look at this ticket!

p5pRT commented Jan 15, 2013

From @jkeenan

As reported in another ticket​:

#####
This version of FreeBSD was EOL'd at least five years ago:
http://www.freebsd.org/security/#unsup. One of the most significant changes
between the 4.x branch and current versions of FreeBSD is the new
scheduler, which makes reproducing this bug quite unlikely.
Indeed, I cannot reproduce it with perl 5.14.2 from ports.

--
Chris Nehren
Shadowcat Alumnus
#####

On that basis, I am closing this ticket.

p5pRT commented Jan 15, 2013

@jkeenan - Status changed from 'open' to 'resolved'
