vfork should be used for spawning external processes #15355

Open
p5pRT opened this issue May 23, 2016 · 15 comments · May be fixed by #19339

Comments

@p5pRT

p5pRT commented May 23, 2016

Migrated from rt.perl.org#128227 (status was 'open')

Searchable as RT128227$

@p5pRT

p5pRT commented May 23, 2016

From e@80x24.org

This is a bug report for perl from e@80x24.org,
generated with the help of perlbug 1.40 running under perl 5.24.0.


Perl currently uses fork + exec for spawning processes with
system(), pipe open(), and `backtick` operators.

Despite the implementation of copy-on-write (CoW) under Linux,
fork performance degrades as process size grows. vfork avoids
the issue by pausing the parent thread and sharing the heap
until execve() or _exit() is called.

In my attached example (vfork.perl) using Inline::C, it only takes
around 3 seconds to vfork+execve+waitpid "/bin/true" 10000 times
on my system. With the Perl CORE::system() function, it takes
11-12 seconds(!).

This is with a 10MB string in memory. Increasing the size of the
$mem string in the attached script degrades performance further.

Keep in mind vfork is tricky to use and some of the caveats
are documented at: https://ewontfix.com/7/

The mainline Ruby implementation has been using vfork since
Ruby 2.2 (released December 2014) and AFAIK we have not had
major problems related to it; even in multi-threaded (pthreads)
environments.

Disclaimer: I'm a member of the ruby-core team, but I probably
use Perl more than Ruby :)

Unfortunately my knowledge of Perl internals is weak at this
point, but I will try my best to help answer questions related
to implementing vfork.

Thank you for the many years of Perl 5!



Flags:
  category=core
  severity=wishlist


Site configuration information for perl 5.24.0:

Configured by ew at Mon May 23 22:06:43 UTC 2016.

Summary of my perl5 (revision 5 version 24 subversion 0) configuration:
  Commit id: be2c0c6
  Platform:
  osname=linux, osvers=4.5.3-x86_64-linode67, archname=x86_64-linux-64int
  uname='linux dcvr 4.5.3-x86_64-linode67 #3 smp tue may 10 10:22:44 edt 2016 x86_64 gnulinux '
  config_args='-des -Dprefix=/home/eee/p -Duse64bitint'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=undef, usemultiplicity=undef
  use64bitint=define, use64bitall=undef, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler:
  cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2',
  optimize='-O2',
  cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
  ccversion='', gccversion='4.9.2', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678, doublekind=3
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12, longdblkind=3
  ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=4, prototype=define
  Linker and Libraries:
  ld='cc', ldflags =' -fstack-protector-strong -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib/gcc/i586-linux-gnu/4.9/include-fixed /usr/include/i386-linux-gnu /usr/lib /lib/i386-linux-gnu /lib/../lib /usr/lib/i386-linux-gnu /usr/lib/../lib /lib
  libs=-lpthread -lnsl -lgdbm -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
  perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
  libc=libc-2.19.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.19'
  Dynamic Linking:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'


@INC for perl 5.24.0:
  /home/eee/p/lib/perl5/site_perl/5.24.0/x86_64-linux-64int
  /home/eee/p/lib/perl5/site_perl/5.24.0
  /home/eee/p/lib/perl5/5.24.0/x86_64-linux-64int
  /home/eee/p/lib/perl5/5.24.0
  .


Environment for perl 5.24.0:
  HOME=/home/eee
  LANG (unset)
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/home/eee/p/bin:/usr/bin:/bin
  PERL_BADLANG (unset)
  SHELL=/bin/bash

@p5pRT

p5pRT commented May 23, 2016

From e@80x24.org

vfork.perl
# Copyright 2016 Eric Wong <e@80x24.org>
# licensed under the same terms as Perl itself

# The following example shows the advantage of vfork over fork for
# spawning processes under Linux.  Keep in mind vfork is tricky
# and some of the caveats are documented at: https://ewontfix.com/7/

use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
use Inline C => <<'VFORK_SPAWN';
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <alloca.h>
#include <signal.h>
#include <assert.h>

/* allocate max + 1 slots: av2c_copy() writes a terminating NULL at dst[max] */
#define AV_ALLOCA(av, max) alloca(((max = (av_len((av)) + 1)) + 1) * sizeof(char *))

static void av2c_copy(char **dst, AV *src, I32 max)
{
	I32 i;

	for (i = 0; i < max; i++) {
		SV **sv = av_fetch(src, i, 0);
		dst[i] = sv ? SvPV_nolen(*sv) : 0;
	}
	dst[max] = 0;
}

static void *deconst(const char *s)
{
	union { const char *in; void *out; } u;
	u.in = s;
	return u.out;
}

/* needs to be safe inside a vfork'ed process */
static void xerr(const char *msg)
{
	struct iovec iov[3];
	const char *err = strerror(errno); /* should be safe in practice */

	iov[0].iov_base = deconst(msg);
	iov[0].iov_len = strlen(msg);
	iov[1].iov_base = deconst(err);
	iov[1].iov_len = strlen(err);
	iov[2].iov_base = deconst("\n");
	iov[2].iov_len = 1;
	writev(2, iov, 3);
	_exit(1);
}

#define REDIR(var,fd) do { \
	if (var != fd && dup2(var, fd) < 0) \
		xerr("error redirecting std"#var ": "); \
} while (0)

/* this does not support arbitrary redirects, yet, just std{in,out,err} */
int vfork_spawn(int in, int out, int err, SV *file, SV *cmdref, SV *envref)
{
	AV *cmd = (AV *)SvRV(cmdref);
	AV *env = (AV *)SvRV(envref);
	const char *filename = SvPV_nolen(file);
	pid_t pid;
	char **argv, **envp;
	I32 max;
	sigset_t set, old;
	int ret;

	argv = AV_ALLOCA(cmd, max);
	av2c_copy(argv, cmd, max);

	envp = AV_ALLOCA(env, max);
	av2c_copy(envp, env, max);

	ret = sigfillset(&set);
	assert(ret == 0 && "BUG calling sigfillset");

	/*
	 * XXX: not thread-safe, use pthread_sigmask instead of sigprocmask
	 * if using pthreads
	 */
	ret = sigprocmask(SIG_SETMASK, &set, &old);
	assert(ret == 0 && "BUG calling sigprocmask to block");
	pid = vfork();
	if (pid == 0) {
		int sig;

		REDIR(in, 0);
		REDIR(out, 1);
		REDIR(err, 2);
		for (sig = 1; sig < NSIG; sig++)
			signal(sig, SIG_DFL); /* ignore errors on signals */
		ret = sigprocmask(SIG_SETMASK, &old, NULL);
		if (ret != 0)
			 xerr("sigprocmask failed in vfork child");
		execve(filename, argv, envp);
		xerr("execve failed");
	}
	ret = sigprocmask(SIG_SETMASK, &old, NULL);
	assert(ret == 0 && "BUG calling sigprocmask to restore");

	return (int)pid;
}
VFORK_SPAWN

# The above C code expects env to be an array for execve(2)
my @env = map { "$_=$ENV{$_}" } keys %ENV;
my $nr = 10000; # iterations

# Under the Linux kernel, vfork performance remains stable
# as parent process size grows:
my $mem = 'x' x (1024 * 1024 * 10);
my $t0;

$t0 = [gettimeofday];
foreach (1..$nr) {
	my $pid = vfork_spawn(0, 1, 2, '/bin/true', [ 'true' ], \@env);
	waitpid($pid, 0);
}
printf "vfork: %0.6f\n", tv_interval($t0, [gettimeofday]);

$t0 = [gettimeofday];
foreach (1..$nr) {
	system('/bin/true');
}
printf "system: %0.6f\n", tv_interval($t0, [gettimeofday]);

@p5pRT

p5pRT commented May 24, 2016

From zefram@fysh.org

via RT wrote:

> In my attached example (vfork.perl) using Inline::C, it only takes
> around 3 seconds to vfork+execve+waitpid "/bin/true" 10000 times
> on my system. With the Perl CORE::system() function, it takes
> 11-12 seconds(!).

This is not a fair comparison. You should compare standard CORE​::system()
against an equivalent implementation of the system op that uses vfork
instead of fork, because that's the change that you're proposing we
should make.

-zefram

@p5pRT

p5pRT commented May 24, 2016

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented May 24, 2016

From e@80x24.org

Zefram via RT <perlbug-followup@perl.org> wrote:

> via RT wrote:
>
> > In my attached example (vfork.perl) using Inline::C, it only takes
> > around 3 seconds to vfork+execve+waitpid "/bin/true" 10000 times
> > on my system. With the Perl CORE::system() function, it takes
> > 11-12 seconds(!).
>
> This is not a fair comparison. You should compare standard CORE::system()
> against an equivalent implementation of the system op that uses vfork
> instead of fork, because that's the change that you're proposing we
> should make.

Oops, missed the pipe creation and probably a few other things
system() does.

I tried dumbly swapping out fork() for vfork() in the perl
sources but of course that doesn't work, yet​: the child process
is modifying its heap in several places and probably doing other
vfork-incompatible things.

I still haven't had time to digest and learn much about the
perl internals; but I figured I'd start with a wishlist
report to get things started.

Anyways, attached is a tiny standalone C program with most
error-checking omitted. I've only tested it on Debian GNU/Linux
but hopefully it runs on most POSIX-like systems with vfork.

$ gcc -o vfork-test -Wall -O2 vfork-test.c

$ time ./vfork-test : normal fork()

real 0m6.444s
user 0m0.023s
sys 0m1.713s

$ time ./vfork-test vfork

real 0m2.725s
user 0m0.013s
sys 0m0.243s

That's only with 10M malloc-ed, increasing the malloc-ed size
will show vfork performance remains stable as process size
increases.

Anyways, I hope the above numbers are convincing enough to have
somebody more familiar than I to change Perl to use vfork
instead of fork whenever possible.

Thank you.

@p5pRT

p5pRT commented May 24, 2016

From e@80x24.org

/* gcc -o vfork-test -Wall -O2 vfork-test.c */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[], char *envp[])
{
	int i;
	int do_vfork = argc > 1 && !strcmp(argv[1], "vfork");
	char * const cmd[] = { "/bin/true", 0 };
	size_t n = 1024 * 1024 * 10;
	char *mem = malloc(n);

	memset(mem, 'a', n); /* make sure it's really allocated */

	for (i = 0; i < 10000; i++) {
		pid_t pid = do_vfork ? vfork() : fork();

		if (pid == 0) {
			execve(cmd[0], cmd, envp);
			write(2, "exec error\n", 11);
			_exit(1);
		}
		waitpid(pid, 0, 0);
	}
	return 0;
}

@p5pRT

p5pRT commented May 25, 2016

From @Leont

On Tue, May 24, 2016 at 1:54 AM, via RT <perlbug-followup@perl.org> wrote:

> Perl currently uses fork + exec for spawning processes with
> system(), pipe open(), and `backtick` operators.
>
> Despite the implementation of copy-on-write (CoW) under Linux,
> fork performance degrades as process size grows. vfork avoids
> the issue by pausing the parent thread and sharing the heap
> until execve() or _exit() is called.
>
> In my attached example (vfork.perl) using Inline::C, it only takes
> around 3 seconds to vfork+execve+waitpid "/bin/true" 10000 times
> on my system. With the Perl CORE::system() function, it takes
> 11-12 seconds(!).
>
> This is with a 10MB string in memory. Increasing the size of the
> $mem string in the attached script degrades performance further.
>
> Keep in mind vfork is tricky to use and some of the caveats
> are documented at: https://ewontfix.com/7/
>
> The mainline Ruby implementation has been using vfork since
> Ruby 2.2 (released December 2014) and AFAIK we have not had
> major problems related to it; even in multi-threaded (pthreads)
> environments.
>
> Disclaimer: I'm a member of the ruby-core team, but I probably
> use Perl more than Ruby :)
>
> Unfortunately my knowledge of Perl internals is weak at this
> point, but I will try my best to help answer questions related
> to implementing vfork.
>
> Thank you for the many years of Perl 5!

It appears we removed vfork support 15 years ago in 52e18b1. I think the
concept of it is out-of-date, and using it correctly is fickle. I don't
think it's a good idea.

It may actually be sane to implement system on top of posix_spawn though
(which can use vfork internally). That at least can be reasoned about, and
seems to generally do what we want (I've never actually used it, there may
be pitfalls that aren't obvious yet).

Not sure either option is really worth the development effort though, any
program where fork is a serious bottleneck is likely to be a mess anyway
unless you're writing a shell or make.

Leon
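
For readers following the posix_spawn suggestion above, here is a minimal sketch of what a list-form system() could look like on top of it. This is not Perl's implementation: `spawn_and_wait` is a hypothetical name, the child simply inherits the parent's environment and file descriptors, and no file actions or spawn attributes are passed.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Hypothetical helper (an illustration, not Perl's code): run argv[0]
 * via posix_spawn() and wait for it, roughly like list-form system().
 * Returns the raw wait status, or -1 if spawning or waiting failed.
 */
int spawn_and_wait(char *const argv[])
{
	pid_t pid, r;
	int status;
	int err;

	assert(argv != NULL && argv[0] != NULL);

	/* posix_spawn() returns an error number directly instead of -1 */
	err = posix_spawn(&pid, argv[0], NULL, NULL, argv, environ);
	if (err != 0)
		return -1;

	/* retry if a signal interrupts the wait */
	do {
		r = waitpid(pid, &status, 0);
	} while (r < 0 && errno == EINTR);

	return r < 0 ? -1 : status;
}
```

A caller would decode the result with `WIFEXITED()` and `WEXITSTATUS()`, much as Perl code inspects `$?` after system().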

@p5pRT

p5pRT commented May 25, 2016

From vano@mail.mipt.ru

Configure has a question about whether to use vfork(), warning that
"perl can only use vfork() that doesn't suffer from severe limitations".

If you're going down this road, you should start your research by
exploring this lead.

Besides, posix_spawn() is the current POSIX standard interface intended to
replace vfork(), which apparently didn't do well enough in practice.

@p5pRT

p5pRT commented May 26, 2016

From e@80x24.org

Leon Timmermans <fawaka@gmail.com> wrote:

> It appears we removed vfork support 15 years ago in 52e18b1. I think the
> concept of it is out-of-date, and using it correctly is fickle. I don't
> think it's a good idea.

Sadly, in the past 15 years process sizes have gotten bigger
and make it more expensive to fork.

> It may actually be sane to implement system on top of posix_spawn though
> (which can use vfork internally). That at least can be reasoned about, and
> seems to generally do what we want (I've never actually used it, there may
> be pitfalls that aren't obvious yet).

Good point, I'll give posix_spawn a shot if nobody beats me to
it. It could be a few months or even years for me, though.

Ruby doesn't use posix_spawn since it needs to support
chdir/rlimits/umask and a bunch of other things in its generic
Process.spawn API (which system() shares code with);
but Perl doesn't do those things, so I suppose it's fine for
Perl.

> Not sure either option is really worth the development effort though, any
> program where fork is a serious bottleneck is likely to be a mess anyway
> unless you're writing a shell or make.

Who knows what I'll be working on :>

But yeah, I plan to look at getting the dash shell to support
posix_spawn or vfork, too; and maybe bash/GNU make, too
(but FSF copyright assignment is a pain).
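
To make the point about posix_spawn covering Perl's needs more concrete: the redirection that a backtick or pipe open() requires can be expressed with posix_spawn file actions. The following is only a sketch under that assumption; `spawn_capture` is a hypothetical name, it assumes the parent's stdout is open (so the pipe does not land on fd 1), and it reads a bounded amount of output rather than growing a buffer.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: run argv[0] and capture up to buflen - 1 bytes of
 * its stdout into buf (NUL-terminated). Returns the number of bytes
 * captured, or -1 on error.
 */
ssize_t spawn_capture(char *const argv[], char *buf, size_t buflen)
{
	posix_spawn_file_actions_t fa;
	int pfd[2];
	pid_t pid;
	ssize_t n;
	size_t total = 0;
	int err;

	assert(argv != NULL && argv[0] != NULL && buf != NULL);
	if (buflen == 0 || pipe(pfd) < 0)
		return -1;
	if (posix_spawn_file_actions_init(&fa) != 0) {
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}

	/* in the child: stdout becomes the pipe's write end, then both
	 * original pipe descriptors are closed */
	posix_spawn_file_actions_adddup2(&fa, pfd[1], STDOUT_FILENO);
	posix_spawn_file_actions_addclose(&fa, pfd[0]);
	posix_spawn_file_actions_addclose(&fa, pfd[1]);

	err = posix_spawn(&pid, argv[0], &fa, NULL, argv, environ);
	posix_spawn_file_actions_destroy(&fa);
	close(pfd[1]); /* the parent must drop the write end to ever see EOF */
	if (err != 0) {
		close(pfd[0]);
		return -1;
	}

	/* read until EOF, a read error, or the buffer is full */
	while (total < buflen - 1 &&
	       (n = read(pfd[0], buf + total, buflen - 1 - total)) > 0)
		total += (size_t)n;
	buf[total] = '\0';

	close(pfd[0]);
	waitpid(pid, NULL, 0);
	return (ssize_t)total;
}
```

Things like chdir, rlimits and umask have no counterpart among the standard file actions used here, which matches the reason given above for Ruby not using posix_spawn.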

@p5pRT

p5pRT commented Jan 28, 2019

From e@80x24.org

https://rt-archive.perl.org/perl5/Ticket/Display.html?id=128227

Another note​:

I hit ENOMEM when attempting to spawn one small subprocess from
a giant Perl process. I had enough RAM for the small subprocess
(but not enough for a full fork of the giant Perl process).

The kernel could not know the fork() was for an execve() shortly
afterwards when preparing for CoW, so it bailed with ENOMEM on
fork().

Using posix_spawn (where glibc uses CLONE_VFORK behind-the-scenes)
or vfork for spawning subprocesses would've avoided this problem.

@p5pRT

p5pRT commented Jan 28, 2019

From Mark@Overmeer.net

* Eric Wong (e@80x24.org) [190128 09:23]:

> I hit ENOMEM when attempting to spawn one small subprocess from
> a giant Perl process. I had enough RAM for the small subprocess
> (but not enough for a full fork of the giant Perl process).

Well, reading fork(2) tells me

  ENOMEM fork() failed to allocate the necessary kernel structures
  because memory is tight.

It's not directly about the normal RAM used by the giant process, but
about the RAM used for kernel administration, which may have restrictions.

You may be able to find an error or warning in dmesg about which resource
is really the limitation the kernel bounces into. That might show that
your processor is too small for the task (number of page tables, DMA
range, ...)
--
greetz,
  MarkOv


  Mark Overmeer MSc MARKOV Solutions
  Mark@Overmeer.net solutions@overmeer.net
http://Mark.Overmeer.net http://solutions.overmeer.net

@p5pRT

p5pRT commented Jan 29, 2019

From @eserte

On Mon, 28 Jan 2019 02:55:53 -0800, Mark@Overmeer.net wrote:

> * Eric Wong (e@80x24.org) [190128 09:23]:
>
> > I hit ENOMEM when attempting to spawn one small subprocess from
> > a giant Perl process. I had enough RAM for the small subprocess
> > (but not enough for a full fork of the giant Perl process).
>
> Well, reading fork(2) tells me
>
>   ENOMEM fork() failed to allocate the necessary kernel structures
>   because memory is tight.
>
> It's not directly about the normal RAM used by the giant process, but
> about the RAM used for kernel administration, which may have restrictions.
>
> You may be able to find an error or warning in dmesg about which resource
> is really the limitation the kernel bounces into. That might show that
> your processor is too small for the task (number of page tables, DMA
> range, ...)

I doubt it's about kernel memory. I just tried an experiment on a recent Linux system (debian/stretch) with 32GB RAM + 1GB swap (memory currently used by other processes: < 1GB). The following script just allocates memory and then tries a simple system() call:

#!/usr/bin/perl

use strict;

my $alloc_gb = shift || 4;

my $mb = " " x 1024**2;
my $buf;
for (1..$alloc_gb*1024) {
  $buf .= $mb;
}

system 'echo', 'system() was successful';
print "Return values of system() call: \$?=$?, \$!=$!\n";

__END__

Running script allocating 16GB of RAM:

  $ perl /tmp/m.pl 16
  Return values of system() call: $?=0, $!=Cannot allocate memory

No logging happens in any of the files in /var/log at this time.

Running the script with 15GB is successful.

Now another script using spawn() provided by POSIX::RT::Spawn (which unfortunately does not build anymore for perl >= 5.28.0):

#!/usr/bin/perl

use strict;
use POSIX::RT::Spawn;

my $alloc_gb = shift || 4;

my $mb = " " x 1024**2;
my $buf;
for (1..$alloc_gb*1024) {
  $buf .= $mb;
}

my $pid = spawn 'echo', 'spawn() was successful'
  or die "failed to spawn: $!";
waitpid $pid, 0;
die "command failed with status: ", $? >> 8 if $?;

__END__

No problem running this script with 28GB.

Regards,
  Slaven

@crrodriguez

I'll update this request: perl5 should use the posix_spawn interface instead of (vfork|fork)/exec on

  • glibc >= 2.26; it is not recommended to use it in earlier versions due to bugs.
  • macOS, where fork/exec are extremely slow, even more so in recent releases.
  • Any OS where posix_spawn is a syscall (FreeBSD, Solaris(?))
  • else { fork/exec }

@jkeenan
Contributor

jkeenan commented Jan 9, 2022

> I'll update this request: perl5 should use the posix_spawn interface instead of (vfork|fork)/exec on
>
> * glibc >= 2.26; it is not recommended to use it in earlier versions due to bugs.
>
> * macOS, where fork/exec are extremely slow, even more so in recent releases.
>
> * Any OS where posix_spawn is a syscall (FreeBSD, Solaris(?))
>
> * else { fork/exec }

Should we assume an OR between your bullet points?

@crrodriguez

> Should we assume an OR between your bullet points?

Yes, OR. Thanks for pointing this out.

@Leont Leont linked a pull request Jan 10, 2022 that will close this issue
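
The platform list discussed above (with OR between the bullet points) could be sketched as a compile-time switch. This is an illustration only and not what the linked pull request does: `HYPO_USE_POSIX_SPAWN` and `spawn_child` are hypothetical names, the platform macros simply follow the list as posted (including the unconfirmed Solaris entry), and a real build would more likely rely on a Configure probe than on compiler macros.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h> /* on glibc this also pulls in <features.h> */

/* glibc >= 2.26 OR macOS OR FreeBSD/Solaris; everything else falls back */
#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
# if __GLIBC_PREREQ(2, 26)
#  define HYPO_USE_POSIX_SPAWN 1
# endif
#elif defined(__APPLE__) || defined(__FreeBSD__) || defined(__sun)
# define HYPO_USE_POSIX_SPAWN 1
#endif
#ifndef HYPO_USE_POSIX_SPAWN
# define HYPO_USE_POSIX_SPAWN 0
#endif

#if HYPO_USE_POSIX_SPAWN
# include <spawn.h>
#endif

extern char **environ;

/*
 * Hypothetical helper: start argv[0] without waiting for it.
 * Returns 0 and stores the child's pid on success, or an errno value.
 */
int spawn_child(char *const argv[], pid_t *pid)
{
	assert(argv != NULL && argv[0] != NULL && pid != NULL);
#if HYPO_USE_POSIX_SPAWN
	return posix_spawn(pid, argv[0], NULL, NULL, argv, environ);
#else
	/* else { fork/exec } */
	*pid = fork();
	if (*pid < 0)
		return errno;
	if (*pid == 0) {
		execve(argv[0], argv, environ);
		_exit(127); /* only reached if execve() failed */
	}
	return 0;
#endif
}
```

Either branch leaves the caller with a pid to pass to waitpid(), so the choice of mechanism stays hidden behind one function.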