Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread race in dist/IO/IO.xs #14816

Closed
p5pRT opened this issue Jul 24, 2015 · 34 comments
Closed

Thread race in dist/IO/IO.xs #14816

p5pRT opened this issue Jul 24, 2015 · 34 comments
Labels
Closable? We might be able to close this ticket, but we need to check with the reporter dist-IO issues in the dual-life blead-first IO distribution distro-openbsd type-library

Comments

@p5pRT
Copy link

p5pRT commented Jul 24, 2015

Migrated from rt.perl.org#125685 (status was 'open')

Searchable as RT125685$

@p5pRT
Copy link
Author

p5pRT commented Jul 24, 2015

From william@25thandClement.com

Created by william@25thandClement.com

This is a bug report for perl from william@​25thandClement.com,
generated with the help of perlbug 1.40 running under perl 5.20.1.

-----------------------------------------------------------------
The _create_getline_subs routine in dist/IO/IO.xs manipulates PL_check in a
thread-unsafe manner. It's called when loading IO​::Handle, at line 437 of
dist/IO/lib/IO/Handle.pm.

In our case this manifested as Perl interpreters in other threads randomly
throwing warnings about undefined constants. This appears to be a regression
starting with 5.16 and still exists in 5.22 and 5.23.1. Apache mod_perl
seems to be immune because they clone their interpreters, and presumably
IO​::Handle is always loaded into the original.

Our workaround was to preload IO​::Handle when initializing interpreters
while holding a global writer lock, and use a reader lock when executing
Perl code.

I'm not sure what a proper fix is. Simply using wrap_op_checker, or
otherwise taking a mutex from _create_getline_subs, is insufficient because
the interposition is intended to be temporary. I don't understand what the
function is trying to achieve, either.

Perl Info

Flags:
    category=library
    severity=high
    module=IO::Handle

Site configuration information for perl 5.20.1:

Configured by root at Thu Jan  1  0:00:00 UTC 1970.

Summary of my perl5 (revision 5 version 20 subversion 1) configuration:
   
  Platform:
    osname=openbsd, osvers=5.7, archname=amd64-openbsd
    uname='openbsd'
    config_args='-dsE -Dopenbsd_distribution=defined -Dccflags=-DNO_LOCALE_NUMERIC -DNO_LOCALE_COLLATE -Dmksymlinks'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-DNO_LOCALE_NUMERIC -DNO_LOCALE_COLLATE -fno-strict-aliasing -fno-delete-null-pointer-checks -pipe -fstack-protector -I/usr/local/include',
    optimize='-O2',
    cppflags='-DNO_LOCALE_NUMERIC -DNO_LOCALE_COLLATE -fno-strict-aliasing -fno-delete-null-pointer-checks -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.2.1 20070719 ', gccosandvers='openbsd5.7'
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-E  -fstack-protector -L/usr/local/lib'
    libpth=/usr/lib /usr/lib
    libs=-lm -lutil -lc
    perllibs=-lm -lutil -lc
    libc=/usr/lib/libc.a, so=so, useshrplib=true, libperl=libperl.so.17.0
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-R/usr/libdata/perl5/amd64-openbsd/5.20.1/CORE'
    cccdlflags='-DPIC -fPIC ', lddlflags='-shared -fPIC  -fstack-protector -L/usr/local/lib'



@INC for perl 5.20.1:
    /usr/local/libdata/perl5/site_perl/amd64-openbsd
    /usr/libdata/perl5/site_perl/amd64-openbsd
    /usr/local/libdata/perl5/site_perl
    /usr/libdata/perl5/site_perl
    /usr/libdata/perl5/amd64-openbsd/5.20.1
    /usr/local/libdata/perl5/amd64-openbsd/5.20.1
    /usr/libdata/perl5
    /usr/local/libdata/perl5
    .


Environment for perl 5.20.1:
    HOME=/home/william
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/games:.
    PERL_BADLANG (unset)
    SHELL=/bin/ksh

@p5pRT
Copy link
Author

p5pRT commented Jan 5, 2017

From @jkeenan

On Fri, 24 Jul 2015 23​:11​:32 GMT, william@​25thandClement.com wrote​:

This is a bug report for perl from william@​25thandClement.com,
generated with the help of perlbug 1.40 running under perl 5.20.1.

-----------------------------------------------------------------
The _create_getline_subs routine in dist/IO/IO.xs manipulates PL_check
in a
thread-unsafe manner. It's called when loading IO​::Handle, at line 437
of
dist/IO/lib/IO/Handle.pm.

In our case this manifested as Perl interpreters in other threads
randomly
throwing warnings about undefined constants. This appears to be a
regression
starting with 5.16 and still exists in 5.22 and 5.23.1. Apache
mod_perl
seems to be immune because they clone their interpreters, and
presumably
IO​::Handle is always loaded into the original.

Our workaround was to preload IO​::Handle when initializing
interpreters
while holding a global writer lock, and use a reader lock when
executing
Perl code.

I'm not sure what a proper fix is. Simply using wrap_op_checker, or
otherwise taking a mutex from _create_getline_subs, is insufficient
because
the interposition is intended to be temporary. I don't understand what
the
function is trying to achieve, either.

1. Would it be possible to supply a small program demonstrating this problem?

2. Given the line you're pointing to in the source code and the point at which the problem first emerged, I suspect the problem began with​:

#####
commit 986a805
Author​: Father Chrysostomos <sprout@​cpan.org>
Date​: Sat Sep 17 12​:09​:22 2011 -0700

  Make IO​::Handle​::getline(s) respect the open pragma
...
#####

But we'd need a reproducible failure to investigate further.

3. Can you supply the output of 'perl -V' from the system where you observed this problem? (The one supplied with the original post came from a non-threaded perl, which I suspect was not where the problem occured.)

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jan 5, 2017

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jan 5, 2017

From @dur-randir

On Thu, 05 Jan 2017 12​:24​:26 -0800, jkeenan wrote​:

1. Would it be possible to supply a small program demonstrating this
problem?
use threads;

while (1) {
  push @​foo, threads->create(sub {
  require IO​::Handle;
  });

  $_->detach for(splice @​foo);
}

This should output nothing, but is really noisy instead.

@p5pRT
Copy link
Author

p5pRT commented Jan 5, 2017

From @jkeenan

On Thu, 05 Jan 2017 21​:23​:50 GMT, randir wrote​:

On Thu, 05 Jan 2017 12​:24​:26 -0800, jkeenan wrote​:

1. Would it be possible to supply a small program demonstrating this
problem?
use threads;

while (1) {
push @​foo, threads->create(sub {
require IO​::Handle;
});

$\_\->detach for\(splice @&#8203;foo\);

}

This should output nothing, but is really noisy instead.

Thanks. I can confirm that 986a805 is the problematic commit.

#####
$ ./perl -v

This is perl 5, version 15, subversion 2 (v5.15.2-464-g7d167fe) built for x86_64-linux-thread-multi

$ ./perl -Ilib -c ~/learn/perl/p5p/125685-threads.pl
/home/jkeenan/learn/perl/p5p/125685-threads.pl syntax OK

$ ./perl -Ilib ~/learn/perl/p5p/125685-threads.pl
[# no output]
^C

#####
$ ./perl -v

This is perl 5, version 15, subversion 2 (v5.15.2-465-g986a805) built for x86_64-linux-thread-multi

$ ./perl -Ilib -c ~/learn/perl/p5p/125685-threads.pl
/home/jkeenan/learn/perl/p5p/125685-threads.pl syntax OK

$ ./perl -Ilib ~/learn/perl/p5p/125685-threads.pl
[# very noisy output]
^C
#####

Father C​: any thoughts?

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jan 26, 2017

From @hvds

On Thu, 05 Jan 2017 14​:22​:30 -0800, jkeenan wrote​:

I can confirm that 986a805
is the problematic commit.

I took a quick look through, and the manipulation of PL_check[] looked thread-unsafe; checking a bit further I find in perlvars.h​:
  For thread safety, modules should not write directly to this array.
  Instead, use the function L</wrap_op_checker>.

I don't see mention of a function to remove the inserted checker again though, I guess that's rather harder to do safely.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Sep 20, 2019

From @niner

Hi,
When embedding perl in a multi threaded program, loading IO​::Handle
concurrently in multiple threads (each with their own interpreter) will cause
compilation errors in other threads and in unrelated modules. The errors all
look like missing symbols that should be there, like undefined subroutines or
methods or imported subs not there causing errors like this​:
String found where operator expected at /usr/lib/perl5/5.28.1/SelectSaver.pm
line 42, near "croak 'usage​: SelectSaver->new( [FILEHANDLE] )'" (Do you need
to predeclare croak?) syntax error at /usr/lib/perl5/5.28.1/SelectSaver.pm
line 42, near "croak 'usage​: SelectSaver->new( [FILEHANDLE] )'"

The errors can be reproduced using the attached C program which I compiled
with​: gcc -Wall testthreads.c -Wl,-E -Wl,-rpath,/usr/lib/perl5/5.28.1/x86_64-
linux-thread-multi/CORE -L/usr/local/lib64 -fstack-protector-strong -L/
usr/lib/perl5/5.28.1/x86_64-linux-thread-multi/CORE -lperl -lm -ldl -lcrypt -
lpthread -D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -fwrapv -fno-
strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -
D_FILE_OFFSET_BITS=64 -I/usr/lib/perl5/5.28.1/x86_64-linux-thread-multi/CORE
-o testthreads -fPIC -g testthreads.c
--
Stefan

@p5pRT
Copy link
Author

p5pRT commented Sep 20, 2019

From @niner

#include <EXTERN.h>
#include <perl.h>
#include <XSUB.h>
#include <pthread.h>
#include <unistd.h>

static void xs_init (pTHX);
EXTERN_C void boot_DynaLoader (pTHX_ CV* cv);
EXTERN_C void boot_Socket (pTHX_ CV* cv);

EXTERN_C void xs_init(pTHX) {
    char *file = __FILE__;
    newXS("DynaLoader::boot_DynaLoader", boot_DynaLoader, file);
}

PerlInterpreter *p5_init_perl(int argc, char *argv[]) {
    PerlInterpreter *my_perl = perl_alloc();
    PERL_SET_CONTEXT(my_perl);
    PL_perl_destruct_level = 1;
    perl_construct( my_perl );
    perl_parse(my_perl, xs_init, argc, argv, NULL);
    PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
    perl_run(my_perl);

    return my_perl;
}

void p5_destruct_perl(PerlInterpreter *my_perl) {
    PERL_SET_CONTEXT(my_perl);
    PL_perl_destruct_level = 1;

    POPSTACK_TO(PL_mainstack);
    dounwind(-1);
    LEAVE_SCOPE(0);

    perl_destruct(my_perl);
    perl_free(my_perl);
}

void p5_eval_pv(PerlInterpreter *my_perl, const char* p, I32 croak_on_error) {
    PERL_SET_CONTEXT(my_perl);
    {
        dSP;
        ENTER;
        SAVETMPS;
        PUSHMARK(SP);

        eval_pv(p, croak_on_error);

        SPAGAIN;
        PUTBACK;
        FREETMPS;
        LEAVE;
    }
}

typedef struct {
    int argc;
    char **argv;
} args;

void *thread_code(void *data) {
    PerlInterpreter *my_perl = p5_init_perl(((args*)data)->argc, ((args*)data)->argv);
    p5_eval_pv(my_perl, "use IO::Handle; 1", 1);
    p5_destruct_perl(my_perl);
    return NULL;
}

#define thread_count 60

int main(int argc, char *argv[]) {
    args a = {argc, argv};
    PERL_SYS_INIT(&argc, &argv);

    pthread_t thread_id[thread_count];

    for(int i = 0; i < thread_count; i++) {
        int err = pthread_create(&thread_id[i], NULL, thread_code, &a);
        if (err != 0) abort();
    }

    for(int i = 0; i < thread_count; i++) {
        void *result;
        int err = pthread_join(thread_id[i], &result);
        if (err != 0) abort();
    }

    PERL_SYS_TERM();

    return 0;
}

@p5pRT
Copy link
Author

p5pRT commented Sep 21, 2019

From @niner

Removing this code from IO/Handle.pm makes the issue go away​:

# Special XS wrapper to make them inherit lexical hints from the caller.
_create_getline_subs( <<'END' ) or die $@​;
sub getline {
  @​_ == 1 or croak 'usage​: $io->getline()';
  my $this = shift;
  return scalar <$this>;
}

sub getlines {
  @​_ == 1 or croak 'usage​: $io->getlines()';
  wantarray or
  croak 'Can\'t call $io->getlines in a scalar context, use $io->getline';
  my $this = shift;
  return <$this>;
}
1; # return true for error checking
END

*gets = \&getline; # deprecated

@p5pRT
Copy link
Author

p5pRT commented Sep 21, 2019

From [Unknown Contact. See original ticket]

Removing this code from IO/Handle.pm makes the issue go away​:

# Special XS wrapper to make them inherit lexical hints from the caller.
_create_getline_subs( <<'END' ) or die $@​;
sub getline {
  @​_ == 1 or croak 'usage​: $io->getline()';
  my $this = shift;
  return scalar <$this>;
}

sub getlines {
  @​_ == 1 or croak 'usage​: $io->getlines()';
  wantarray or
  croak 'Can\'t call $io->getlines in a scalar context, use $io->getline';
  my $this = shift;
  return <$this>;
}
1; # return true for error checking
END

*gets = \&getline; # deprecated

@p5pRT
Copy link
Author

p5pRT commented Sep 21, 2019

From @niner

The issue is actually a regression caused by https://perl5.git.perl.org/perl.git/commit/986a805c4b2

_create_getline_subs is not thread safe as it assigns to PL_check directly​:
PL_check[OP_LINESEQ] = io_ck_lineseq;
https://perl5.git.perl.org/perl.git/blob/HEAD:/dist/IO/IO.xs#l565

According to perlapi(1) "PL_check" is global to an entire process and must not be written to directly. Instead "wrap_op_checker" should be used.

But still, since PL_check is global that would affect concurrent compilation and I don't think io_ck_lineseq is prepared for that. It looks like it's really tailored for compilation of getline and getlines as it does not check in any way if it should apply its modifications to the OP tree.

@p5pRT
Copy link
Author

p5pRT commented Sep 21, 2019

From [Unknown Contact. See original ticket]

The issue is actually a regression caused by https://perl5.git.perl.org/perl.git/commit/986a805c4b2

_create_getline_subs is not thread safe as it assigns to PL_check directly​:
PL_check[OP_LINESEQ] = io_ck_lineseq;
https://perl5.git.perl.org/perl.git/blob/HEAD:/dist/IO/IO.xs#l565

According to perlapi(1) "PL_check" is global to an entire process and must not be written to directly. Instead "wrap_op_checker" should be used.

But still, since PL_check is global that would affect concurrent compilation and I don't think io_ck_lineseq is prepared for that. It looks like it's really tailored for compilation of getline and getlines as it does not check in any way if it should apply its modifications to the OP tree.

@p5pRT
Copy link
Author

p5pRT commented Sep 22, 2019

From @jkeenan

On Fri, 20 Sep 2019 16​:09​:16 GMT, ss@​atikon.com wrote​:

Hi,
When embedding perl in a multi threaded program, loading IO​::Handle
concurrently in multiple threads (each with their own interpreter)
will cause
compilation errors in other threads and in unrelated modules. The
errors all
look like missing symbols that should be there, like undefined
subroutines or
methods or imported subs not there causing errors like this​:
String found where operator expected at
/usr/lib/perl5/5.28.1/SelectSaver.pm
line 42, near "croak 'usage​: SelectSaver->new( [FILEHANDLE] )'" (Do
you need
to predeclare croak?) syntax error at
/usr/lib/perl5/5.28.1/SelectSaver.pm
line 42, near "croak 'usage​: SelectSaver->new( [FILEHANDLE] )'"

The errors can be reproduced using the attached C program which I
compiled
with​: gcc -Wall testthreads.c -Wl,-E -Wl,-
rpath,/usr/lib/perl5/5.28.1/x86_64-
linux-thread-multi/CORE -L/usr/local/lib64 -fstack-protector-strong
-L/
usr/lib/perl5/5.28.1/x86_64-linux-thread-multi/CORE -lperl -lm -ldl
-lcrypt -
lpthread -D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -fwrapv
-fno-
strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -
D_FILE_OFFSET_BITS=64 -I/usr/lib/perl5/5.28.1/x86_64-linux-thread-
multi/CORE
-o testthreads -fPIC -g testthreads.c

I built a threaded perl at blead and then tried to compile your test program. The compilation failed.

#####
$ gcc -Wall 134444-testthreads.c -Wl,-E -Wl,-rpath,/home/jkeenan/testing/threaded_blead/lib/perl5/5.31.5/x86_64-linux-thread-multi/CORE \

-L/usr/local/lib64 -fstack-protector-strong \
-L/home/jkeenan/testing/threaded_blead/lib/perl5/5.31.5/x86_64-linux-thread-multi/CORE -lperl -lm -ldl \
-lcrypt -lpthread -D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -fwrapv -fno-strict-aliasing \
-pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 \
-I/home/jkeenan/testing/threaded_blead/lib/perl5/5.31.5/x86_64-linux-thread-multi/CORE \
-o testthreads -fPIC -g 134444-testthreads.c
/tmp/ccCfdXuT.o​: In function `p5_init_perl'​:
/home/jkeenan/learn/perl/p5p/134444-testthreads.c​:16​: multiple definition of `p5_init_perl'
/tmp/ccQrNdU4.o​:/home/jkeenan/learn/perl/p5p/134444-testthreads.c​:16​: first defined here
/tmp/ccCfdXuT.o​: In function `p5_destruct_perl'​:
/home/jkeenan/learn/perl/p5p/134444-testthreads.c​:28​: multiple definition of `p5_destruct_perl'
/tmp/ccQrNdU4.o​:/home/jkeenan/learn/perl/p5p/134444-testthreads.c​:28​: first defined here
/tmp/ccCfdXuT.o​: In function `p5_eval_pv'​:
/home/jkeenan/learn/perl/p5p/134444-testthreads.c​:40​: multiple definition of `p5_eval_pv'
/tmp/ccQrNdU4.o​:/home/jkeenan/learn/perl/p5p/134444-testthreads.c​:40​: first defined here
/tmp/ccCfdXuT.o​: In function `thread_code'​:
/home/jkeenan/learn/perl/p5p/134444-testthreads.c​:62​: multiple definition of `thread_code'
/tmp/ccQrNdU4.o​:/home/jkeenan/learn/perl/p5p/134444-testthreads.c​:62​: first defined here
/tmp/ccCfdXuT.o​: In function `main'​:
/home/jkeenan/learn/perl/p5p/134444-testthreads.c​:71​: multiple definition of `main'
/tmp/ccQrNdU4.o​:/home/jkeenan/learn/perl/p5p/134444-testthreads.c​:71​: first defined here
collect2​: error​: ld returned 1 exit status
#####

Can you spot the problem?

Thank you very much.
--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Sep 22, 2019

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Sep 22, 2019

From @niner

On Sun, 22 Sep 2019 08​:42​:00 -0700, jkeenan wrote​:

$ gcc -Wall 134444-testthreads.c -Wl,-E -Wl,-
...
-o testthreads -fPIC -g 134444-testthreads.c

Can you spot the problem?

Indeed! You gave it the name of the source file twice.

I have since written a patch that seems to fix my problem. It moves PL_check to the interpreter struct which should avoid any threading related issues altogether. It's attached.

@p5pRT
Copy link
Author

p5pRT commented Sep 22, 2019

From @niner

PL_check-to-interp.diff
commit ab49dea882469093cd1bce8aa4ddadd76f99f15d
Author: Stefan Seifert <nine@detonation.org>
Date:   Sun Sep 22 11:09:29 2019 +0200

    Move PL_check to the interp vars to fix threading issues

diff --git a/embedvar.h b/embedvar.h
index 35cf8f2191..33a3fbd72c 100644
--- a/embedvar.h
+++ b/embedvar.h
@@ -71,6 +71,7 @@
 #define PL_body_roots		(vTHX->Ibody_roots)
 #define PL_bodytarget		(vTHX->Ibodytarget)
 #define PL_breakable_sub_gen	(vTHX->Ibreakable_sub_gen)
+#define PL_check		(vTHX->Icheck)
 #define PL_checkav		(vTHX->Icheckav)
 #define PL_checkav_save		(vTHX->Icheckav_save)
 #define PL_chopset		(vTHX->Ichopset)
@@ -385,8 +386,6 @@
 #define PL_GXPosix_ptrs		(my_vars->GXPosix_ptrs)
 #define PL_appctx		(my_vars->Gappctx)
 #define PL_Gappctx		(my_vars->Gappctx)
-#define PL_check		(my_vars->Gcheck)
-#define PL_Gcheck		(my_vars->Gcheck)
 #define PL_check_mutex		(my_vars->Gcheck_mutex)
 #define PL_Gcheck_mutex		(my_vars->Gcheck_mutex)
 #define PL_csighandlerp		(my_vars->Gcsighandlerp)
diff --git a/globvar.sym b/globvar.sym
index dcc65f2e29..1642c88060 100644
--- a/globvar.sym
+++ b/globvar.sym
@@ -10,7 +10,6 @@ PL_bitcount
 PL_block_type
 PL_c9_utf8_dfa_tab
 PL_charclass
-PL_check
 PL_core_reg_engine
 PL_extended_utf8_dfa_tab
 PL_fold
diff --git a/intrpvar.h b/intrpvar.h
index 47383ae818..11790b7a34 100644
--- a/intrpvar.h
+++ b/intrpvar.h
@@ -496,6 +496,7 @@ PERLVAR(I, endav,	AV *)		/* names of END subroutines */
 PERLVAR(I, unitcheckav,	AV *)		/* names of UNITCHECK subroutines */
 PERLVAR(I, checkav,	AV *)		/* names of CHECK subroutines */
 PERLVAR(I, initav,	AV *)		/* names of INIT subroutines */
+PERLVAR(I, check,	Perl_check_t *) /* functions to call during CHECK phase */
 
 /* subprocess state */
 PERLVAR(I, fdpid,	AV *)		/* keep fd-to-pid mappings for my_popen */
diff --git a/opcode.h b/opcode.h
index ba3bd9e668..23ac225f5f 100644
--- a/opcode.h
+++ b/opcode.h
@@ -1371,15 +1371,8 @@ EXT Perl_ppaddr_t PL_ppaddr[] /* or perlvars.h */
 ;
 #endif
 
-#ifdef PERL_GLOBAL_STRUCT_INIT
-#  define PERL_CHECK_INITED
+#ifdef PERL_IN_PERL_C
 static const Perl_check_t Gcheck[]
-#elif !defined(PERL_GLOBAL_STRUCT)
-#  define PERL_CHECK_INITED
-EXT Perl_check_t PL_check[] /* or perlvars.h */
-#endif
-#if (defined(DOINIT) && !defined(PERL_GLOBAL_STRUCT)) || defined(PERL_GLOBAL_STRUCT_INIT)
-#  define PERL_CHECK_INITED
 = {
 	Perl_ck_null,		/* null */
 	Perl_ck_null,		/* stub */
@@ -1778,11 +1771,8 @@ EXT Perl_check_t PL_check[] /* or perlvars.h */
 	Perl_ck_null,		/* lvrefslice */
 	Perl_ck_null,		/* lvavref */
 	Perl_ck_null,		/* anonconst */
-}
+};
 #endif
-#ifdef PERL_CHECK_INITED
-;
-#endif /* #ifdef PERL_CHECK_INITED */
 
 #ifndef PERL_GLOBAL_STRUCT_INIT
 
diff --git a/perl.c b/perl.c
index 5176da0db1..38f6cea24b 100644
--- a/perl.c
+++ b/perl.c
@@ -452,6 +452,15 @@ perl_construct(pTHXx)
 #ifdef USE_POSIX_2008_LOCALE
     PL_C_locale_obj = newlocale(LC_ALL_MASK, "C", NULL);
 #endif
+    {
+        const IV ncheck  = C_ARRAY_LENGTH(Gcheck);
+        PL_check  =
+            (Perl_check_t*)
+            PerlMem_malloc(ncheck  * sizeof(Perl_check_t));
+        if (!PL_check)
+            exit(1);
+        Copy(Gcheck,  PL_check,  ncheck,  Perl_check_t);
+    }
 
     ENTER;
     init_i18nl10n(1);
diff --git a/perlapi.h b/perlapi.h
index 4cfbafecdf..aee6468896 100644
--- a/perlapi.h
+++ b/perlapi.h
@@ -137,8 +137,6 @@ END_EXTERN_C
 #define PL_XPosix_ptrs		(*Perl_GXPosix_ptrs_ptr(NULL))
 #undef  PL_appctx
 #define PL_appctx		(*Perl_Gappctx_ptr(NULL))
-#undef  PL_check
-#define PL_check		(*Perl_Gcheck_ptr(NULL))
 #undef  PL_check_mutex
 #define PL_check_mutex		(*Perl_Gcheck_mutex_ptr(NULL))
 #undef  PL_csighandlerp
diff --git a/perlvars.h b/perlvars.h
index df5b2f8c64..bc16f2967d 100644
--- a/perlvars.h
+++ b/perlvars.h
@@ -150,7 +150,6 @@ PERLVAR(G, check_mutex,	perl_mutex)	/* Mutex for PL_check */
 #endif
 #ifdef PERL_GLOBAL_STRUCT 
 PERLVAR(G, ppaddr,	Perl_ppaddr_t *) /* or opcode.h */
-PERLVAR(G, check,	Perl_check_t *) /* or opcode.h */
 PERLVARA(G, fold_locale, 256, unsigned char) /* or perl.h */
 #endif
 
diff --git a/regen/opcode.pl b/regen/opcode.pl
index 672f55c368..44541a742d 100755
--- a/regen/opcode.pl
+++ b/regen/opcode.pl
@@ -1061,15 +1061,8 @@ print $oc <<'END';
 ;
 #endif
 
-#ifdef PERL_GLOBAL_STRUCT_INIT
-#  define PERL_CHECK_INITED
+#ifdef PERL_IN_PERL_C
 static const Perl_check_t Gcheck[]
-#elif !defined(PERL_GLOBAL_STRUCT)
-#  define PERL_CHECK_INITED
-EXT Perl_check_t PL_check[] /* or perlvars.h */
-#endif
-#if (defined(DOINIT) && !defined(PERL_GLOBAL_STRUCT)) || defined(PERL_GLOBAL_STRUCT_INIT)
-#  define PERL_CHECK_INITED
 = {
 END
 
@@ -1078,11 +1071,8 @@ for (@ops) {
 }
 
 print $oc <<'END';
-}
+};
 #endif
-#ifdef PERL_CHECK_INITED
-;
-#endif /* #ifdef PERL_CHECK_INITED */
 
 #ifndef PERL_GLOBAL_STRUCT_INIT
 
diff --git a/sv.c b/sv.c
index e088e5c419..8fc24fd7d3 100644
--- a/sv.c
+++ b/sv.c
@@ -15580,6 +15580,14 @@ perl_clone_using(PerlInterpreter *proto_perl, UV flags,
     PL_globalstash	= hv_dup(proto_perl->Iglobalstash, param);
     PL_curstname	= sv_dup_inc(proto_perl->Icurstname, param);
 
+    /* Add PL_check here */
+    PL_check  =
+        (Perl_check_t*)
+        PerlMem_malloc(PL_maxo  * sizeof(Perl_check_t));
+    if (!PL_check)
+        exit(1);
+    Copy(proto_perl->Icheck,  PL_check,  PL_maxo,  Perl_check_t);
+
     PL_beginav		= av_dup_inc(proto_perl->Ibeginav, param);
     PL_beginav_save	= av_dup_inc(proto_perl->Ibeginav_save, param);
     PL_checkav_save	= av_dup_inc(proto_perl->Icheckav_save, param);
diff --git a/util.c b/util.c
index 376cc8ab0f..0f0edcee97 100644
--- a/util.c
+++ b/util.c
@@ -4630,7 +4630,6 @@ Perl_init_global_struct(pTHX)
     struct perl_vars *plvarsp = NULL;
 # ifdef PERL_GLOBAL_STRUCT
     const IV nppaddr = C_ARRAY_LENGTH(Gppaddr);
-    const IV ncheck  = C_ARRAY_LENGTH(Gcheck);
     PERL_UNUSED_CONTEXT;
 #  ifdef PERL_GLOBAL_STRUCT_PRIVATE
     /* PerlMem_malloc() because can't use even safesysmalloc() this early. */
@@ -4659,13 +4658,7 @@ Perl_init_global_struct(pTHX)
 	PerlMem_malloc(nppaddr * sizeof(Perl_ppaddr_t));
     if (!plvarsp->Gppaddr)
         exit(1);
-    plvarsp->Gcheck  =
-	(Perl_check_t*)
-	PerlMem_malloc(ncheck  * sizeof(Perl_check_t));
-    if (!plvarsp->Gcheck)
-        exit(1);
     Copy(Gppaddr, plvarsp->Gppaddr, nppaddr, Perl_ppaddr_t); 
-    Copy(Gcheck,  plvarsp->Gcheck,  ncheck,  Perl_check_t); 
 #  endif
 #  ifdef PERL_SET_VARS
     PERL_SET_VARS(plvarsp);

@p5pRT
Copy link
Author

p5pRT commented Sep 23, 2019

From @nwc10

On Sun, 22 Sep 2019 10​:24​:48 -0700, nine@​detonation.org wrote​:

I have since written a patch that seems to fix my problem. It moves
PL_check to the interpreter struct which should avoid any threading
related issues altogether. It's attached.

As best I can work out from staring at history (for a while), PL_check was created in 2000 around the time all the non-static variables were renamed to start PL_*, and it seems to have been accident/oversight that it ended up as "global", not "per-interpreter".

Logically, yes, I think that it does need to be per interpreter.

But, what's somewhat perturbing me about your patch are two implications of this​:

Inline Patch
diff --git a/perl.c b/perl.c
index 5176da0db1..38f6cea24b 100644
--- a/perl.c
+++ b/perl.c
@@ -452,6 +452,15 @@ perl_construct(pTHXx)
 #ifdef USE_POSIX_2008_LOCALE
     PL_C_locale_obj = newlocale(LC_ALL_MASK, "C", NULL);
 #endif
+    {
+        const IV ncheck  = C_ARRAY_LENGTH(Gcheck);
+        PL_check  =
+            (Perl_check_t*)
+            PerlMem_malloc(ncheck  * sizeof(Perl_check_t));
+        if (!PL_check)
+            exit(1);
+        Copy( ,  PL_check,  ncheck,  Perl_check_t);
+    }
 
     ENTER;
     init_i18nl10n(1);


First - I think that this as implemented is effectively a leak - there needs to be a corresponding free in perl_destruct

Second - should we really need to unconditionally allocate a block of memory hanging off the interpreter struct which is always present? Surely it should simply *be* part of the interpreter struct

At which point it becomes obvious that we're about to add 397 pointers to the structure, which I think is 3K on a 64-bit system. I'm not *sure* if this is a problem. It only really hurts things *using* threads (er, MULTIPLICITY), on the assumption that the copy source (Gcheck, if I read it correctly) is part of the read-only constant section of the executable (on any sane OS), so it will cause 3K more read/write memory to be allocated, but once startup is done, the CPU cache will now simply be looking in a different place for the values, and the original Gcheck will go cold.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Sep 24, 2019

From @dur-randir

Btw, this is a duplicate of https://rt-archive.perl.org/perl5/Ticket/Display.html?id=125685. I'm glad that it's got picked up.

@p5pRT
Copy link
Author

p5pRT commented Sep 24, 2019

From @jkeenan

Merging into existing ticket per analysis by Sergey.
--
James E Keenan (jkeenan@​cpan.org)

@niner
Copy link
Contributor

niner commented Oct 30, 2019

Thanks for the review!

From @nwc10

Second - should we really need to unconditionally allocate a block of memory hanging off the interpreter struct which is always present? Surely it should simply be part of the interpreter struct

That's....so obvious now that you say it. This fixes the memory leak and also simplifies the code quite a bit.
I've created a PR with the updated patch:
#17240

From @nwc10

At which point it becomes obvious that we're about to add 397 pointers to the structure, which I think is 3K on a 64-bit system. I'm not sure if this is a problem. It only really hurts things using threads (er, MULTIPLICITY), on the assumption that the copy source (Gcheck, if I read it correctly) is part of the read-only constant section of the executable (on any sane OS), so it will cause 3K more read/write memory to be allocated, but once startup is done, the CPU cache will now simply be looking in a different place for the values, and the original Gcheck will go cold.

I dare say correctness trumps speed. The issue has been reported twice at least, so it's clear that it's hurting real users. 3K memory is not nothing but also not huge. If it turns out to be an actual performance issue, we can try to find a better solution.

@jkeenan
Copy link
Contributor

jkeenan commented Oct 30, 2019

Thanks for the review!

From @nwc10

Second - should we really need to unconditionally allocate a block of memory hanging off the interpreter struct which is always present? Surely it should simply be part of the interpreter struct

That's....so obvious now that you say it. This fixes the memory leak and also simplifies the code quite a bit.
I've created a PR with the updated patch:
#17240

Shouldn't the pull request contain some new tests to demonstrate that the problem raised by the original poster has been resolved?

From @nwc10

At which point it becomes obvious that we're about to add 397 pointers to the structure, which I think is 3K on a 64-bit system. I'm not sure if this is a problem. It only really hurts things using threads (er, MULTIPLICITY), on the assumption that the copy source (Gcheck, if I read it correctly) is part of the read-only constant section of the executable (on any sane OS), so it will cause 3K more read/write memory to be allocated, but once startup is done, the CPU cache will now simply be looking in a different place for the values, and the original Gcheck will go cold.

I dare say correctness trumps speed. The issue has been reported twice at least, so it's clear that it's hurting real users. 3K memory is not nothing but also not huge. If it turns out to be an actual performance issue, we can try to find a better solution.

Thank you very much.
Jim Keenan

niner added a commit to niner/perl5 that referenced this issue Oct 30, 2019
@niner
Copy link
Contributor

niner commented Oct 30, 2019 via email

@jkeenan
Copy link
Contributor

jkeenan commented Oct 30, 2019

On Mittwoch, 30. Oktober 2019 20:11:57 CET James E Keenan wrote: Shouldn't the pull request contain some new tests to demonstrate that the problem raised by the original poster has been resolved?
Good point! I updated the pull request to contain a test file based on the code example posted by the original reporter of this issue. Since the bug is a race condition, the test does not fail 100 % reliably on an unpatched perl, but it does fail often enough so a regression could not go unnoticed for long.

I created a smoke-me branch from your patch:
smoke-me/jkeenan/niner/gh-14816/niner-blead

However, I am getting test failures in io/handle.t.
See: https://travis-ci.org/Perl/perl5/jobs/605164560#L3370.

Also, the MANIFEST will have to be updated.

Thank you very much.
Jim Keenan

@jkeenan
Copy link
Contributor

jkeenan commented Oct 30, 2019

On Mittwoch, 30. Oktober 2019 20:11:57 CET James E Keenan wrote: Shouldn't the pull request contain some new tests to demonstrate that the problem raised by the original poster has been resolved?
Good point! I updated the pull request to contain a test file based on the code example posted by the original reporter of this issue. Since the bug is a race condition, the test does not fail 100 % reliably on an unpatched perl, but it does fail often enough so a regression could not go unnoticed for long.

I created a smoke-me branch from your patch:
smoke-me/jkeenan/niner/gh-14816/niner-blead

However, I am getting test failures in io/handle.t.
See: https://travis-ci.org/Perl/perl5/jobs/605164560#L3370.

Also, the MANIFEST will have to be updated.

Thank you very much.
Jim Keenan

The test failures in t/io/handle.t were due to the absence of a skip_all for unthreaded builds. In the smoke-me/jkeenan/niner/gh-14816/niner-blead branch I have corrected that and updated MANIFEST.

Thank you very much.
Jim Keenan

jkeenan pushed a commit that referenced this issue Nov 1, 2019
nwc10 added a commit that referenced this issue Nov 6, 2019
…4816.

Re-implement getline() and getlines() as XS code.

The underlying problem that we're trying to solve here is making
getline() and getlines() in IO::Handle respect the open pragma.

That bug was first addressed in Sept 2011 by commit 986a805:
    Make IO::Handle::getline(s) respect the open pragma

However, that fix introduced a more subtle bug, hence this reworking.
Including the entirety of the rest of that commit message because it
explains both the bug the previous approach:

    See <https://rt.cpan.org/Ticket/Display.html?id=66474>.  Also, this
    came up in <https://rt.perl.org/rt3/Ticket/Display.html?id=92728>.

    The <> operator, when reading from the magic ARGV handle, automatic-
    ally opens the next file.  Layers set by the lexical open pragma are
    applied, if they are in scope at the point where <> is used.

    This works almost all the time, because the common convention is:

        use open ":utf8";

        while(<>) {
            ...
        }

    IO::Handle’s getline and getlines methods are Perl subroutines
    that call <> themselves.  But that happens within the scope of
    IO/Handle.pm, so the caller’s I/O layer settings are ignored.  That
    means that these two expressions are not equivalent within in a
    ‘use open’ scope:

        <>
        *ARGV->getline

    The latter will open the next file with no layers applied.

    This commit solves that by putting PL_check hooks in place in
    IO::Handle before compiling the getline and getlines subroutines.
    Those hooks cause every state op (nextstate, or dbstate under the
    debugger) to have a custom pp function that saves the previous value
    of PL_curcop, calls the default pp function, and then restores
    PL_curcop.

    That means that getline and getlines run with the caller’s compile-
    time hints.  Another way to see it is that getline and getlines’s own
    lexical hints are never activated.

    (A state op carries all the lexical pragmata.  Every statement
    has one.  When any op executes, it’s ‘pp’ function is called.
    pp_nextstate and pp_dbstate both set PL_curcop to the op itself.  Any
    code that checks hints looks at PL_curcop, which contains the current
    run-time hints.)

The problem with this approach is that the (current) design and implementation
of PL_check hooks is actually not threadsafe. There's one array (as a global),
which is used by all interpreters in the process. But as the code added to
IO.xs demonstrates, realistically it needs to be possible to change the hook
just for this interpreter.

GH #14816 has a fix for that bug for blead. However, it will be tricky (to
impossible) to backport to earlier perl versions.

Hence it's also worthwhile to change IO.xs to use a different approach to
solve the original bug. As described above, the bug is fixed by having the
readline OP (that implements getline() and getlines()) see the caller's
lexical state, not their "own". Unlike Perl subroutines, XS subroutines don't
have any lexical hints of their own. getline() and getlines() are very
simple, mostly parameter checking, ending with a one line that maps to
a single core OP, whose values are directly returned.

Hence "all" we need to do re-implement the Perl code as XS. This might look
easy, but turns out to be trickier than expected. There isn't any API to be
called for the OP in question, pp_readline(). The body of the OP inspects
interpreter state, it directly calls pp_rv2gv() which also inspects state,
and then it tail calls Perl_do_readline(), which inspects state.

The easiest approach seems to be to set up enough state, and then call
pp_readline() directly. This leaves us very tightly coupled to the
internals, but so do all other approaches to try to tackle this bug.

The current implementation of PL_check (and possibly other arrays) still
needs to be addressed.
tonycoz pushed a commit to tonycoz/perl5 that referenced this issue Dec 4, 2019
…rl#14816.

Re-implement getline() and getlines() as XS code.

The underlying problem that we're trying to solve here is making
getline() and getlines() in IO::Handle respect the open pragma.

That bug was first addressed in Sept 2011 by commit 986a805:
    Make IO::Handle::getline(s) respect the open pragma

However, that fix introduced a more subtle bug, hence this reworking.
Including the entirety of the rest of that commit message because it
explains both the bug the previous approach:

    See <https://rt.cpan.org/Ticket/Display.html?id=66474>.  Also, this
    came up in <https://rt.perl.org/rt3/Ticket/Display.html?id=92728>.

    The <> operator, when reading from the magic ARGV handle, automatic-
    ally opens the next file.  Layers set by the lexical open pragma are
    applied, if they are in scope at the point where <> is used.

    This works almost all the time, because the common convention is:

        use open ":utf8";

        while(<>) {
            ...
        }

    IO::Handle’s getline and getlines methods are Perl subroutines
    that call <> themselves.  But that happens within the scope of
    IO/Handle.pm, so the caller’s I/O layer settings are ignored.  That
    means that these two expressions are not equivalent within in a
    ‘use open’ scope:

        <>
        *ARGV->getline

    The latter will open the next file with no layers applied.

    This commit solves that by putting PL_check hooks in place in
    IO::Handle before compiling the getline and getlines subroutines.
    Those hooks cause every state op (nextstate, or dbstate under the
    debugger) to have a custom pp function that saves the previous value
    of PL_curcop, calls the default pp function, and then restores
    PL_curcop.

    That means that getline and getlines run with the caller’s compile-
    time hints.  Another way to see it is that getline and getlines’s own
    lexical hints are never activated.

    (A state op carries all the lexical pragmata.  Every statement
    has one.  When any op executes, it’s ‘pp’ function is called.
    pp_nextstate and pp_dbstate both set PL_curcop to the op itself.  Any
    code that checks hints looks at PL_curcop, which contains the current
    run-time hints.)

The problem with this approach is that the (current) design and implementation
of PL_check hooks is actually not threadsafe. There's one array (as a global),
which is used by all interpreters in the process. But as the code added to
IO.xs demonstrates, realistically it needs to be possible to change the hook
just for this interpreter.

GH Perl#14816 has a fix for that bug for blead. However, it will be tricky (to
impossible) to backport to earlier perl versions.

Hence it's also worthwhile to change IO.xs to use a different approach to
solve the original bug. As described above, the bug is fixed by having the
readline OP (that implements getline() and getlines()) see the caller's
lexical state, not their "own". Unlike Perl subroutines, XS subroutines don't
have any lexical hints of their own. getline() and getlines() are very
simple, mostly parameter checking, ending with a one line that maps to
a single core OP, whose values are directly returned.

Hence "all" we need to do re-implement the Perl code as XS. This might look
easy, but turns out to be trickier than expected. There isn't any API to be
called for the OP in question, pp_readline(). The body of the OP inspects
interpreter state, it directly calls pp_rv2gv() which also inspects state,
and then it tail calls Perl_do_readline(), which inspects state.

The easiest approach seems to be to set up enough state, and then call
pp_readline() directly. This leaves us very tightly coupled to the
internals, but so do all other approaches to try to tackle this bug.

The current implementation of PL_check (and possibly other arrays) still
needs to be addressed.
tonycoz pushed a commit to tonycoz/perl5 that referenced this issue Dec 9, 2019
tonycoz pushed a commit to tonycoz/perl5 that referenced this issue Dec 11, 2019
@tonycoz
Copy link
Contributor

tonycoz commented Dec 12, 2019

should be fixed by #17351

@nwc10
Copy link
Contributor

nwc10 commented Dec 12, 2019 via email

@tonycoz
Copy link
Contributor

tonycoz commented Dec 12, 2019

On Wed, Dec 11, 2019 at 04:41:03PM -0800, Tony Cook wrote: should be fixed by #17351
What was have now is

  1. first (i)thread to call wrap_op_checker() changes that interpreter's PL_check and the static variable (via *old_checker_p)
  2. second (i)thread to call wrap_op_checker() does nothing, because *old_checker_p is not-NULL and hence the second ithread doesn't see PL_check changed. And I can't see how to "fix" this API in any way that actually works with the existing XS code.

I hadn't considered the API, I guess this issue is a consequence of having modules manage the hook chaining rather than having perl manage it itself.

If we implemented our own chaining maybe wrap_op_checker() could work like:

  1. wrap_op_checker() scans the current list of functions for this op, and if it isn't present adds it at the end.
  2. wrap_op_checker() always sets *old_checker_p to point to a stub that indicates the next item in the chain should be called

The chain handling code may need to distinguish between checkers added with wrap_op_checker() and otherwise, we might use an interpreter variable to distinguish between call-next and don't-call-next for wrap_op_checker() hooks, but a return value might be more natural and require less management of the interpreter variable for newer style hooks.

An advantage of maintaining the list of checkers in core is we could have an API to remove a checker too, so the code that produced this report could add the checker and remove it more safely.[1]

But that's probably all too complicated.

Given Christmas/NY and the contentious changes freeze in January it might be difficult to get this into core for 5.32.

So I can see us reverting the PL_check change and just including your IO changes (which aren't merged just yet.)

[1] it isn't an issue here, but what if that code compiled by _create_getline_subs() had installed another checker

rjbs pushed a commit that referenced this issue Dec 16, 2019
…4816.

Re-implement getline() and getlines() as XS code.

The underlying problem that we're trying to solve here is making
getline() and getlines() in IO::Handle respect the open pragma.

That bug was first addressed in Sept 2011 by commit 986a805:
    Make IO::Handle::getline(s) respect the open pragma

However, that fix introduced a more subtle bug, hence this reworking.
Including the entirety of the rest of that commit message because it
explains both the bug the previous approach:

    See <https://rt.cpan.org/Ticket/Display.html?id=66474>.  Also, this
    came up in <https://rt.perl.org/rt3/Ticket/Display.html?id=92728>.

    The <> operator, when reading from the magic ARGV handle, automatic-
    ally opens the next file.  Layers set by the lexical open pragma are
    applied, if they are in scope at the point where <> is used.

    This works almost all the time, because the common convention is:

        use open ":utf8";

        while(<>) {
            ...
        }

    IO::Handle’s getline and getlines methods are Perl subroutines
    that call <> themselves.  But that happens within the scope of
    IO/Handle.pm, so the caller’s I/O layer settings are ignored.  That
    means that these two expressions are not equivalent within in a
    ‘use open’ scope:

        <>
        *ARGV->getline

    The latter will open the next file with no layers applied.

    This commit solves that by putting PL_check hooks in place in
    IO::Handle before compiling the getline and getlines subroutines.
    Those hooks cause every state op (nextstate, or dbstate under the
    debugger) to have a custom pp function that saves the previous value
    of PL_curcop, calls the default pp function, and then restores
    PL_curcop.

    That means that getline and getlines run with the caller’s compile-
    time hints.  Another way to see it is that getline and getlines’s own
    lexical hints are never activated.

    (A state op carries all the lexical pragmata.  Every statement
    has one.  When any op executes, it’s ‘pp’ function is called.
    pp_nextstate and pp_dbstate both set PL_curcop to the op itself.  Any
    code that checks hints looks at PL_curcop, which contains the current
    run-time hints.)

The problem with this approach is that the (current) design and implementation
of PL_check hooks is actually not threadsafe. There's one array (as a global),
which is used by all interpreters in the process. But as the code added to
IO.xs demonstrates, realistically it needs to be possible to change the hook
just for this interpreter.

GH #14816 has a fix for that bug for blead. However, it will be tricky (to
impossible) to backport to earlier perl versions.

Hence it's also worthwhile to change IO.xs to use a different approach to
solve the original bug. As described above, the bug is fixed by having the
readline OP (that implements getline() and getlines()) see the caller's
lexical state, not their "own". Unlike Perl subroutines, XS subroutines don't
have any lexical hints of their own. getline() and getlines() are very
simple, mostly parameter checking, ending with a one line that maps to
a single core OP, whose values are directly returned.

Hence "all" we need to do re-implement the Perl code as XS. This might look
easy, but turns out to be trickier than expected. There isn't any API to be
called for the OP in question, pp_readline(). The body of the OP inspects
interpreter state, it directly calls pp_rv2gv() which also inspects state,
and then it tail calls Perl_do_readline(), which inspects state.

The easiest approach seems to be to set up enough state, and then call
pp_readline() directly. This leaves us very tightly coupled to the
internals, but so do all other approaches to try to tackle this bug.

The current implementation of PL_check (and possibly other arrays) still
needs to be addressed.
nwc10 added a commit that referenced this issue Jan 19, 2020
…4816.

Re-implement getline() and getlines() as XS code.

The underlying problem that we're trying to solve here is making
getline() and getlines() in IO::Handle respect the open pragma.

That bug was first addressed in Sept 2011 by commit 986a805:
    Make IO::Handle::getline(s) respect the open pragma

However, that fix introduced a more subtle bug, hence this reworking.
Including the entirety of the rest of that commit message because it
explains both the bug the previous approach:

    See <https://rt.cpan.org/Ticket/Display.html?id=66474>.  Also, this
    came up in <https://rt.perl.org/rt3/Ticket/Display.html?id=92728>.

    The <> operator, when reading from the magic ARGV handle, automatic-
    ally opens the next file.  Layers set by the lexical open pragma are
    applied, if they are in scope at the point where <> is used.

    This works almost all the time, because the common convention is:

        use open ":utf8";

        while(<>) {
            ...
        }

    IO::Handle’s getline and getlines methods are Perl subroutines
    that call <> themselves.  But that happens within the scope of
    IO/Handle.pm, so the caller’s I/O layer settings are ignored.  That
    means that these two expressions are not equivalent within in a
    ‘use open’ scope:

        <>
        *ARGV->getline

    The latter will open the next file with no layers applied.

    This commit solves that by putting PL_check hooks in place in
    IO::Handle before compiling the getline and getlines subroutines.
    Those hooks cause every state op (nextstate, or dbstate under the
    debugger) to have a custom pp function that saves the previous value
    of PL_curcop, calls the default pp function, and then restores
    PL_curcop.

    That means that getline and getlines run with the caller’s compile-
    time hints.  Another way to see it is that getline and getlines’s own
    lexical hints are never activated.

    (A state op carries all the lexical pragmata.  Every statement
    has one.  When any op executes, it’s ‘pp’ function is called.
    pp_nextstate and pp_dbstate both set PL_curcop to the op itself.  Any
    code that checks hints looks at PL_curcop, which contains the current
    run-time hints.)

The problem with this approach is that the (current) design and implementation
of PL_check hooks is actually not threadsafe. There's one array (as a global),
which is used by all interpreters in the process. But as the code added to
IO.xs demonstrates, realistically it needs to be possible to change the hook
just for this interpreter.

GH #14816 has a fix for that bug for blead. However, it will be tricky (to
impossible) to backport to earlier perl versions.

Hence it's also worthwhile to change IO.xs to use a different approach to
solve the original bug. As described above, the bug is fixed by having the
readline OP (that implements getline() and getlines()) see the caller's
lexical state, not their "own". Unlike Perl subroutines, XS subroutines don't
have any lexical hints of their own. getline() and getlines() are very
simple, mostly parameter checking, ending with a one line that maps to
a single core OP, whose values are directly returned.

Hence "all" we need to do re-implement the Perl code as XS. This might look
easy, but turns out to be trickier than expected. There isn't any API to be
called for the OP in question, pp_readline(). The body of the OP inspects
interpreter state, it directly calls pp_rv2gv() which also inspects state,
and then it tail calls Perl_do_readline(), which inspects state.

The easiest approach seems to be to set up enough state, and then call
pp_readline() directly. This leaves us very tightly coupled to the
internals, but so do all other approaches to try to tackle this bug.

The current implementation of PL_check (and possibly other arrays) still
needs to be addressed.
@jkeenan jkeenan added the dist-IO issues in the dual-life blead-first IO distribution label Feb 3, 2020
@tonycoz
Copy link
Contributor

tonycoz commented May 7, 2020

I believe this was fixed by f172023.

@tonycoz tonycoz added the Closable? We might be able to close this ticket, but we need to check with the reporter label May 7, 2020
@jkeenan
Copy link
Contributor

jkeenan commented May 7, 2020

I believe this was fixed by f172023.

So when I build a threaded perl at blead and re-run the program submitted back in 2017 by @dur-randir,

while (1) {
  push @​foo, threads->create(sub {
  require IO​::Handle;
  });

  $_->detach for(splice @​foo);
}

I now get a silent infinite loop. Is that what I should be expecting?

If so, then the original complaint in this ticket has been addressed and the ticket can be closed.

Thank you very much.
Jim Keenan

@dur-randir
Copy link
Member

The original issue here is module loading, and it looks good now. wrap_op_checker is a different concern that needs it's own ticket.

@jkeenan
Copy link
Contributor

jkeenan commented May 7, 2020 via email

@dur-randir
Copy link
Member

Looks like i've skipped some bits here, and the merged solution doesn't involve changing PL_check (and hence avoid wrap_op_check problems). But nevertheless that commit says "The current implementation of PL_check (and possibly other arrays) still needs to be addressed.".

But i'm not sure what was implied - reinstantiate the original change early enough to have time to change api + modules? Different approach? It'd be better to ask Nicholas.

@jkeenan
Copy link
Contributor

jkeenan commented May 10, 2020

Looks like i've skipped some bits here, and the merged solution doesn't involve changing PL_check (and hence avoid wrap_op_check problems). But nevertheless that commit says "The current implementation of PL_check (and possibly other arrays) still needs to be addressed.".

But i'm not sure what was implied - reinstantiate the original change early enough to have time to change api + modules? Different approach? It'd be better to ask Nicholas.

@nwc10, can you comment on the above? Thanks.

@tonycoz
Copy link
Contributor

tonycoz commented May 26, 2020

Open #17806 to cover the remaining issue here.

@tonycoz tonycoz closed this as completed May 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Closable? We might be able to close this ticket, but we need to check with the reporter dist-IO issues in the dual-life blead-first IO distribution distro-openbsd type-library
Projects
None yet
Development

No branches or pull requests

6 participants