t/op/getppid.t fails under docker/travis #15727

Closed
p5pRT opened this issue Nov 21, 2016 · 11 comments
Comments

@p5pRT

p5pRT commented Nov 21, 2016

Migrated from rt.perl.org#130143 (status was 'resolved')

Searchable as RT130143$

@p5pRT

p5pRT commented Nov 21, 2016

From @kentfredric

Created by @kentfredric

Recently, all my Travis targets for Perl tests started failing t/op/getppid.t.

Even test suites that had previously passed without fault started failing
when re-executed.

t/op/getppid ... 1..8
# Failed test 2 - New parent of orphaned first grandchild at op/getppid.t line 38
# got "0"
# expected >= "1"
# grandchild waited until 'kill'
ok 1 - Parent of first grandchild
not ok 2 - New parent of orphaned first grandchild
# Failed test 5 - New parent of orphaned second grandchild at op/getppid.t line 38
# got "0"
# expected >= "1"
# grandchild waited until 'kill'
FAILED at test 2
Failed 1 test out of 1, 0.00% okay.
  op/getppid.t

Initial diagnosis suggests that Docker is handling those orphaned processes
so that they get reaped outside the container, or otherwise leak out of it.

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.25.7:

Configured by root at Nov 21 2016 13:45:58

Summary of my perl5 (revision 5 version 25 subversion 7) configuration:
   
  Platform:
    osname=linux
    osvers=4.8.7-040807-generic
    archname=x86_64-linux
    uname='linux testing-docker-2bff6298-a515-4ec6-9489-113816dfd551 4.8.7-040807-generic #201611101131 smp thu nov 10 16:33:40 utc 2016 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Duseshrplib -Dcc=gcc -Dd_semctl_semun -Dusedevel -Ud_csh -Dsh=/bin/sh -Dtargetsh=/bin/sh -Uusenm -Dnoextensions=ODBM_File -Dmyhostname=localhost -Dperladmin=root@localhost -Dinstallusrbinperl=n'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    bincompat5005=undef
  Compiler:
    cc='gcc'
    ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-O2'
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion=''
    gccversion='4.6.3'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='gcc'
    ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.6/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lpthread -lnsl -lgdbm -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.15.so
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version='2.15'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E -Wl,-rpath,/usr/local/lib/perl5/5.25.7/x86_64-linux/CORE'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'


Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_TIMES
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_DEVEL
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
  Built under linux
  Compiled at Nov 21 2016 13:45:58
  @INC:
    lib
    /usr/local/lib/perl5/site_perl/5.25.7/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.25.7
    /usr/local/lib/perl5/5.25.7/x86_64-linux
    /usr/local/lib/perl5/5.25.7
    .


@p5pRT

p5pRT commented Nov 22, 2016

From @tonycoz

On Mon, 21 Nov 2016 06:21:00 -0800, kentfredric@gmail.com wrote:

Recently all my travis targets for Perl Tests started failing
t/op/getppid.t

Even tests suites that had previously passed without fault started
failing
when re-executed.

t/op/getppid ... 1..8
# Failed test 2 - New parent of orphaned first grandchild at
op/getppid.t line 38
# got "0"
# expected >= "1"
# grandchild waited until 'kill'
ok 1 - Parent of first grandchild
not ok 2 - New parent of orphaned first grandchild
# Failed test 5 - New parent of orphaned second grandchild at
op/getppid.t line 38
# got "0"
# expected >= "1"
# grandchild waited until 'kill'
FAILED at test 2
Failed 1 test out of 1, 0.00% okay.
op/getppid.t

Initial diagnosis suggests docker is doing something where those
processes
get reaped/leak outside the container or something.

If you use something like:

https://github.com/phusion/baseimage-docker/blob/rel-0.9.16/image/bin/my_init

as the init process for your container, do the tests pass?

Possibly related:

https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/

http://blog.dscpl.com.au/2015/12/issues-with-running-as-pid-1-in-docker.html

Tony

@p5pRT

p5pRT commented Nov 22, 2016

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Nov 22, 2016

From @kentfredric

On Mon, 21 Nov 2016 16:24:20 -0800, tonyc wrote:

If you use something like:

https://github.com/phusion/baseimage-docker/blob/rel-0.9.16/image/bin/my_init

as the init process for your container, do the tests pass?

Not able to comment on that as I don't believe I can usurp
the init implementation under travis, and I don't presently have
a plain docker configuration outside those provided by travis.

@p5pRT

p5pRT commented Mar 25, 2017

From @eserte

On Mon, 21 Nov 2016 17:10:30 -0800, kentfredric wrote:

On Mon, 21 Nov 2016 16:24:20 -0800, tonyc wrote:

If you use something like:

https://github.com/phusion/baseimage-docker/blob/rel-0.9.16/image/bin/my_init

as the init process for your container, do the tests pass?

Not able to comment on that as I don't believe I can usurp
the init implementation under travis, and I don't presently have
a plain docker configuration outside those provided by travis.

The problem is that the Travis build/test script does not run under the init process at all. The process tree in a Travis CI container while running the failing getppid.t test looks roughly like this:

    PID  PGID   SID TTY          TIME CMD
   1560    24    24 ?        00:00:00 perl
   1563    24    24 ?        00:00:00 sh
   1564    24    24 ?        00:00:00 ps
     24    24    24 ?        00:00:00 bash
   1465    24    24 ?        00:00:00 make
   1545    24    24 ?        00:00:00 sh
   1546    24    24 ?        00:00:00 perl
   1555    24    24 ?        00:00:00 perl <defunct>
   1556    24    24 ?        00:00:00 perl
   1557    24    24 ?        00:00:00 perl
   1558    24    24 ?        00:00:00 perl
      1     1     1 ?        00:00:00 init
     48    48    48 ?        00:00:00 mysqld
     50    50    50 ?        00:00:00 cron
     62    62    62 ?        00:00:00 sshd
    162   162   162 ?        00:00:00 rsyslogd
   1045  1039  1039 ?        00:00:00 postgres
   1051  1051  1051 ?        00:00:00 postgres
   1052  1052  1052 ?        00:00:00 postgres
   1053  1053  1053 ?        00:00:00 postgres
   1054  1054  1054 ?        00:00:00 postgres
   1055  1055  1055 ?        00:00:00 postgres
   1101  1099  1099 ?        00:00:00 memcached

Unlike on a traditional Unix system, there is not one process tree but three: a traditional init process with some standard daemons; the Travis CI build/test script, with bash as the root process; and the forked, orphaned perl process that calls getppid() (recognizable by the additional "ps" process, which I added to the getppid.t test).

The problem seems to be that only orphaned processes started within the init tree are adopted by the container's init process. Processes orphaned in the other trees are adopted by a process outside the Linux container instead. Maybe we can call this an "incognito adoption".

So what does getppid() return when the new parent process is outside the container? http://man7.org/linux/man-pages/man7/pid_namespaces.7.html says: "Calls to getppid(2) for such processes return 0."
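To watch the reparenting described above in isolation, here is a minimal sketch that is separate from the actual getppid.t code. The function name orphan_ppids and the three-pipe handshake are assumptions made for illustration only. It forks a child and a grandchild, lets the child exit and be reaped, and reports the grandchild's getppid() before and after.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();

# Hypothetical helper (not part of getppid.t): returns the middle child's PID
# plus the grandchild's getppid() before and after the child has gone away.
sub orphan_ppids {
    pipe(my $res_r,   my $res_w)   or die "pipe: $!";  # grandchild -> caller: results
    pipe(my $ready_r, my $ready_w) or die "pipe: $!";  # grandchild -> child: first ppid taken
    pipe(my $go_r,    my $go_w)    or die "pipe: $!";  # caller -> grandchild: child was reaped
    $_->autoflush(1) for $res_w, $ready_w, $go_w;

    my $child = fork;
    die "fork: $!" unless defined $child;

    if ($child == 0) {
        my $grandchild = fork;
        POSIX::_exit(1) unless defined $grandchild;
        if ($grandchild == 0) {
            my $first = getppid();          # still the middle child
            print {$ready_w} "ready\n";
            my $go = <$go_r>;               # block until the child has been reaped
            my $second = getppid();         # the adopting process, or 0
            print {$res_w} "$first,$second\n";
            POSIX::_exit(0);
        }
        my $ready = <$ready_r>;             # wait for the grandchild's first getppid()
        POSIX::_exit(0);                    # exiting here orphans the grandchild
    }

    waitpid($child, 0);                     # once this returns, the child is gone
    print {$go_w} "go\n";
    close $res_w;
    my $line = <$res_r>;
    die "no result from grandchild" unless defined $line;
    chomp $line;
    my ($first, $second) = split /,/, $line;
    return ($child, $first, $second);
}

my ($child, $first, $second) = orphan_ppids();
print "child pid:             $child\n";
print "getppid() before exit: $first\n";
print "getppid() after exit:  $second\n";
```

On an ordinary host the last value is typically 1 or the PID of a subreaper process. Going by the man page quoted above, it would be 0 when the adopting process lives in a different PID namespace.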

The consequence is that, in such an environment, the test should also accept 0 as the new parent:

  cmp_ok ($second, '>=', 0, "New parent of orphaned $which grandchild")

Maybe 0 should be accepted only on Linux systems, and only if a container was detected. A possible check (adapted from http://stackoverflow.com/a/20012536/2332415) could look like this:

  sub is_container {
      my $container = 0;
      # Inspect the cgroup membership of PID 1.
      if (open my $fh, '<', "/proc/1/cgroup") {
          while (<$fh>) {
              # Any pids cgroup path other than /init.scope is taken to mean a container.
              if (m{^\d+:pids:(.*)} && $1 ne '/init.scope') {
                  $container = 1;
                  last;
              }
          }
      }
      $container;
  }

For non-container systems the check could still be $second >= 1.
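As a rough sketch of how the two thresholds could be chosen, the snippet below applies the same regex to text passed in as a string, so the decision can be tried out without reading /proc. The helper names and the sample cgroup lines are hypothetical and chosen only for illustration. One assumption worth stating: on systems using the unified cgroup v2 hierarchy the file contains lines of the form "0::/path", which this regex never matches, so such systems would be treated as non-containers.

```perl
use strict;
use warnings;

# Hypothetical helper: decide from the text of /proc/1/cgroup whether PID 1
# appears to live in a container. Uses the same regex as the check above.
sub cgroup_text_looks_like_container {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        return 1 if $line =~ m{^\d+:pids:(.*)} && $1 ne '/init.scope';
    }
    return 0;
}

# Hypothetical helper: smallest getppid() value the test should accept.
sub min_acceptable_ppid {
    my ($os, $cgroup_text) = @_;
    return 0 if $os eq 'linux' && cgroup_text_looks_like_container($cgroup_text);
    return 1;
}

# Illustrative inputs only; real files contain one line per hierarchy.
my $host_like      = "11:pids:/init.scope\n";
my $container_like = "11:pids:/docker/0123abcd\n";

printf "host-like:      min ppid %d\n", min_acceptable_ppid('linux', $host_like);
printf "container-like: min ppid %d\n", min_acceptable_ppid('linux', $container_like);
```

Keeping the parsing separate from the file access is a design choice made for this sketch; the check above reads the file directly.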

BTW, to reproduce the issue in a Docker system, just make getppid.t use Test::More, copy it to the Docker container and run

  docker exec -it name_of_container perl getppid.t

In contrast, running the docker container normally with "docker run -it ... bash" and "perl getppid.t" inside would give a successful test run.

@p5pRT

p5pRT commented Apr 16, 2017

From @eserte

Attached a possible patch for this issue.

@p5pRT

p5pRT commented Apr 16, 2017

From @eserte

0001-fix-t-op-getppid.t-in-linux-containers-RT-130143.patch
From 544cccf5d9ad9ad99fa8750f0d912740e61b944d Mon Sep 17 00:00:00 2001
From: Slaven Rezic <slaven@rezic.de>
Date: Sat, 25 Mar 2017 21:39:15 +0100
Subject: [PATCH] fix t/op/getppid.t in linux containers (RT #130143)

Allow getppid() to return 0 if the test is running within a linux
container. From "man 2 getppid":

    If the caller's parent is in a different PID namespace (see
    pid_namespaces(7)), getppid() returns 0.
---
 t/op/getppid.t | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/t/op/getppid.t b/t/op/getppid.t
index 11e0f64..3565525 100644
--- a/t/op/getppid.t
+++ b/t/op/getppid.t
@@ -35,7 +35,8 @@ sub fork_and_retrieve {
 	    unless my ($how, $first, $second) = /^([a-z]+),(\d+),(\d+)\z/;
 	cmp_ok ($first, '>=', 1, "Parent of $which grandchild");
 	my $message = "grandchild waited until '$how'";
-	cmp_ok ($second, '>=', 1, "New parent of orphaned $which grandchild")
+        my $min_getppid_result = is_linux_container() ? 0 : 1;
+	cmp_ok ($second, '>=', $min_getppid_result, "New parent of orphaned $which grandchild")
 	    ? note ($message) : diag ($message);
 
 	SKIP: {
@@ -104,6 +105,19 @@ sub fork_and_retrieve {
     }
 }
 
+sub is_linux_container {
+    my $is_linux_container = 0;
+    if ($^O eq 'linux' && open my $fh, '<', '/proc/1/cgroup') {
+	while(<$fh>) {
+	    if (m{^\d+:pids:(.*)} && $1 ne '/init.scope') {
+		$is_linux_container = 1;
+		last;
+	    }
+	}
+    }
+    $is_linux_container;
+}
+
 my $first = fork_and_retrieve("first");
 my $second = fork_and_retrieve("second");
 SKIP: {
-- 
2.1.4

@p5pRT

p5pRT commented Nov 10, 2017

From @toddr

Pushed to blead with commit 73cfe8d

I added some notes in the test so it would be clear in the future why we had to do this.

@p5pRT

p5pRT commented Nov 10, 2017

@atoomic - Status changed from 'open' to 'pending release'

@p5pRT

p5pRT commented Jun 23, 2018

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release yesterday of Perl 5.28.0, this and 185 other issues have been
resolved.

Perl 5.28.0 may be downloaded via:
https://metacpan.org/release/XSAWYERX/perl-5.28.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT

p5pRT commented Jun 23, 2018

@khwilliamson - Status changed from 'pending release' to 'resolved'
