New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thread race in dist/IO/IO.xs #14816
Comments
From william@25thandClement.comCreated by william@25thandClement.comThis is a bug report for perl from william@25thandClement.com, ----------------------------------------------------------------- In our case this manifested as Perl interpreters in other threads randomly Our workaround was to preload IO::Handle when initializing interpreters I'm not sure what a proper fix is. Simply using wrap_op_checker, or Perl Info
|
From @jkeenanOn Fri, 24 Jul 2015 23:11:32 GMT, william@25thandClement.com wrote:
1. Would it be possible to supply a small program demonstrating this problem? 2. Given the line you're pointing to in the source code and the point at which the problem first emerged, I suspect the problem began with: ##### Make IO::Handle::getline(s) respect the open pragma But we'd need a reproducible failure to investigate further. 3. Can you supply the output of 'perl -V' from the system where you observed this problem? (The one supplied with the original post came from a non-threaded perl, which I suspect was not where the problem occured.) Thank you very much. -- |
The RT System itself - Status changed from 'new' to 'open' |
From @dur-randirOn Thu, 05 Jan 2017 12:24:26 -0800, jkeenan wrote:
while (1) { $_->detach for(splice @foo); This should output nothing, but is really noisy instead. |
From @jkeenanOn Thu, 05 Jan 2017 21:23:50 GMT, randir wrote:
Thanks. I can confirm that 986a805 is the problematic commit. ##### This is perl 5, version 15, subversion 2 (v5.15.2-464-g7d167fe) built for x86_64-linux-thread-multi $ ./perl -Ilib -c ~/learn/perl/p5p/125685-threads.pl $ ./perl -Ilib ~/learn/perl/p5p/125685-threads.pl ##### This is perl 5, version 15, subversion 2 (v5.15.2-465-g986a805) built for x86_64-linux-thread-multi $ ./perl -Ilib -c ~/learn/perl/p5p/125685-threads.pl $ ./perl -Ilib ~/learn/perl/p5p/125685-threads.pl Father C: any thoughts? -- |
From @hvdsOn Thu, 05 Jan 2017 14:22:30 -0800, jkeenan wrote:
I took a quick look through, and the manipulation of PL_check[] looked thread-unsafe; checking a bit further I find in perlvars.h: I don't see mention of a function to remove the inserted checker again though, I guess that's rather harder to do safely. Hugo |
From @ninerHi, The errors can be reproduced using the attached C program which I compiled |
From @niner#include <EXTERN.h>
#include <perl.h>
#include <XSUB.h>
#include <pthread.h>
#include <unistd.h>
static void xs_init (pTHX);
EXTERN_C void boot_DynaLoader (pTHX_ CV* cv);
EXTERN_C void boot_Socket (pTHX_ CV* cv);
EXTERN_C void xs_init(pTHX) {
char *file = __FILE__;
newXS("DynaLoader::boot_DynaLoader", boot_DynaLoader, file);
}
PerlInterpreter *p5_init_perl(int argc, char *argv[]) {
PerlInterpreter *my_perl = perl_alloc();
PERL_SET_CONTEXT(my_perl);
PL_perl_destruct_level = 1;
perl_construct( my_perl );
perl_parse(my_perl, xs_init, argc, argv, NULL);
PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
perl_run(my_perl);
return my_perl;
}
void p5_destruct_perl(PerlInterpreter *my_perl) {
PERL_SET_CONTEXT(my_perl);
PL_perl_destruct_level = 1;
POPSTACK_TO(PL_mainstack);
dounwind(-1);
LEAVE_SCOPE(0);
perl_destruct(my_perl);
perl_free(my_perl);
}
void p5_eval_pv(PerlInterpreter *my_perl, const char* p, I32 croak_on_error) {
PERL_SET_CONTEXT(my_perl);
{
dSP;
ENTER;
SAVETMPS;
PUSHMARK(SP);
eval_pv(p, croak_on_error);
SPAGAIN;
PUTBACK;
FREETMPS;
LEAVE;
}
}
typedef struct {
int argc;
char **argv;
} args;
void *thread_code(void *data) {
PerlInterpreter *my_perl = p5_init_perl(((args*)data)->argc, ((args*)data)->argv);
p5_eval_pv(my_perl, "use IO::Handle; 1", 1);
p5_destruct_perl(my_perl);
return NULL;
}
#define thread_count 60
int main(int argc, char *argv[]) {
args a = {argc, argv};
PERL_SYS_INIT(&argc, &argv);
pthread_t thread_id[thread_count];
for(int i = 0; i < thread_count; i++) {
int err = pthread_create(&thread_id[i], NULL, thread_code, &a);
if (err != 0) abort();
}
for(int i = 0; i < thread_count; i++) {
void *result;
int err = pthread_join(thread_id[i], &result);
if (err != 0) abort();
}
PERL_SYS_TERM();
return 0;
} |
From @ninerRemoving this code from IO/Handle.pm makes the issue go away: # Special XS wrapper to make them inherit lexical hints from the caller. sub getlines { *gets = \&getline; # deprecated |
From [Unknown Contact. See original ticket]Removing this code from IO/Handle.pm makes the issue go away: # Special XS wrapper to make them inherit lexical hints from the caller. sub getlines { *gets = \&getline; # deprecated |
From @ninerThe issue is actually a regression caused by https://perl5.git.perl.org/perl.git/commit/986a805c4b2 _create_getline_subs is not thread safe as it assigns to PL_check directly: According to perlapi(1) "PL_check" is global to an entire process and must not be written to directly. Instead "wrap_op_checker" should be used. But still, since PL_check is global that would affect concurrent compilation and I don't think io_ck_lineseq is prepared for that. It looks like it's really tailored for compilation of getline and getlines as it does not check in any way if it should apply its modifications to the OP tree. |
From [Unknown Contact. See original ticket]The issue is actually a regression caused by https://perl5.git.perl.org/perl.git/commit/986a805c4b2 _create_getline_subs is not thread safe as it assigns to PL_check directly: According to perlapi(1) "PL_check" is global to an entire process and must not be written to directly. Instead "wrap_op_checker" should be used. But still, since PL_check is global that would affect concurrent compilation and I don't think io_ck_lineseq is prepared for that. It looks like it's really tailored for compilation of getline and getlines as it does not check in any way if it should apply its modifications to the OP tree. |
From @jkeenanOn Fri, 20 Sep 2019 16:09:16 GMT, ss@atikon.com wrote:
I built a threaded perl at blead and then tried to compile your test program. The compilation failed. #####
Can you spot the problem? Thank you very much. |
The RT System itself - Status changed from 'new' to 'open' |
From @ninerOn Sun, 22 Sep 2019 08:42:00 -0700, jkeenan wrote:
Indeed! You gave it the name of the source file twice. I have since written a patch that seems to fix my problem. It moves PL_check to the interpreter struct which should avoid any threading related issues altogether. It's attached. |
From @ninerPL_check-to-interp.diffcommit ab49dea882469093cd1bce8aa4ddadd76f99f15d
Author: Stefan Seifert <nine@detonation.org>
Date: Sun Sep 22 11:09:29 2019 +0200
Move PL_check to the interp vars to fix threading issues
diff --git a/embedvar.h b/embedvar.h
index 35cf8f2191..33a3fbd72c 100644
--- a/embedvar.h
+++ b/embedvar.h
@@ -71,6 +71,7 @@
#define PL_body_roots (vTHX->Ibody_roots)
#define PL_bodytarget (vTHX->Ibodytarget)
#define PL_breakable_sub_gen (vTHX->Ibreakable_sub_gen)
+#define PL_check (vTHX->Icheck)
#define PL_checkav (vTHX->Icheckav)
#define PL_checkav_save (vTHX->Icheckav_save)
#define PL_chopset (vTHX->Ichopset)
@@ -385,8 +386,6 @@
#define PL_GXPosix_ptrs (my_vars->GXPosix_ptrs)
#define PL_appctx (my_vars->Gappctx)
#define PL_Gappctx (my_vars->Gappctx)
-#define PL_check (my_vars->Gcheck)
-#define PL_Gcheck (my_vars->Gcheck)
#define PL_check_mutex (my_vars->Gcheck_mutex)
#define PL_Gcheck_mutex (my_vars->Gcheck_mutex)
#define PL_csighandlerp (my_vars->Gcsighandlerp)
diff --git a/globvar.sym b/globvar.sym
index dcc65f2e29..1642c88060 100644
--- a/globvar.sym
+++ b/globvar.sym
@@ -10,7 +10,6 @@ PL_bitcount
PL_block_type
PL_c9_utf8_dfa_tab
PL_charclass
-PL_check
PL_core_reg_engine
PL_extended_utf8_dfa_tab
PL_fold
diff --git a/intrpvar.h b/intrpvar.h
index 47383ae818..11790b7a34 100644
--- a/intrpvar.h
+++ b/intrpvar.h
@@ -496,6 +496,7 @@ PERLVAR(I, endav, AV *) /* names of END subroutines */
PERLVAR(I, unitcheckav, AV *) /* names of UNITCHECK subroutines */
PERLVAR(I, checkav, AV *) /* names of CHECK subroutines */
PERLVAR(I, initav, AV *) /* names of INIT subroutines */
+PERLVAR(I, check, Perl_check_t *) /* functions to call during CHECK phase */
/* subprocess state */
PERLVAR(I, fdpid, AV *) /* keep fd-to-pid mappings for my_popen */
diff --git a/opcode.h b/opcode.h
index ba3bd9e668..23ac225f5f 100644
--- a/opcode.h
+++ b/opcode.h
@@ -1371,15 +1371,8 @@ EXT Perl_ppaddr_t PL_ppaddr[] /* or perlvars.h */
;
#endif
-#ifdef PERL_GLOBAL_STRUCT_INIT
-# define PERL_CHECK_INITED
+#ifdef PERL_IN_PERL_C
static const Perl_check_t Gcheck[]
-#elif !defined(PERL_GLOBAL_STRUCT)
-# define PERL_CHECK_INITED
-EXT Perl_check_t PL_check[] /* or perlvars.h */
-#endif
-#if (defined(DOINIT) && !defined(PERL_GLOBAL_STRUCT)) || defined(PERL_GLOBAL_STRUCT_INIT)
-# define PERL_CHECK_INITED
= {
Perl_ck_null, /* null */
Perl_ck_null, /* stub */
@@ -1778,11 +1771,8 @@ EXT Perl_check_t PL_check[] /* or perlvars.h */
Perl_ck_null, /* lvrefslice */
Perl_ck_null, /* lvavref */
Perl_ck_null, /* anonconst */
-}
+};
#endif
-#ifdef PERL_CHECK_INITED
-;
-#endif /* #ifdef PERL_CHECK_INITED */
#ifndef PERL_GLOBAL_STRUCT_INIT
diff --git a/perl.c b/perl.c
index 5176da0db1..38f6cea24b 100644
--- a/perl.c
+++ b/perl.c
@@ -452,6 +452,15 @@ perl_construct(pTHXx)
#ifdef USE_POSIX_2008_LOCALE
PL_C_locale_obj = newlocale(LC_ALL_MASK, "C", NULL);
#endif
+ {
+ const IV ncheck = C_ARRAY_LENGTH(Gcheck);
+ PL_check =
+ (Perl_check_t*)
+ PerlMem_malloc(ncheck * sizeof(Perl_check_t));
+ if (!PL_check)
+ exit(1);
+ Copy(Gcheck, PL_check, ncheck, Perl_check_t);
+ }
ENTER;
init_i18nl10n(1);
diff --git a/perlapi.h b/perlapi.h
index 4cfbafecdf..aee6468896 100644
--- a/perlapi.h
+++ b/perlapi.h
@@ -137,8 +137,6 @@ END_EXTERN_C
#define PL_XPosix_ptrs (*Perl_GXPosix_ptrs_ptr(NULL))
#undef PL_appctx
#define PL_appctx (*Perl_Gappctx_ptr(NULL))
-#undef PL_check
-#define PL_check (*Perl_Gcheck_ptr(NULL))
#undef PL_check_mutex
#define PL_check_mutex (*Perl_Gcheck_mutex_ptr(NULL))
#undef PL_csighandlerp
diff --git a/perlvars.h b/perlvars.h
index df5b2f8c64..bc16f2967d 100644
--- a/perlvars.h
+++ b/perlvars.h
@@ -150,7 +150,6 @@ PERLVAR(G, check_mutex, perl_mutex) /* Mutex for PL_check */
#endif
#ifdef PERL_GLOBAL_STRUCT
PERLVAR(G, ppaddr, Perl_ppaddr_t *) /* or opcode.h */
-PERLVAR(G, check, Perl_check_t *) /* or opcode.h */
PERLVARA(G, fold_locale, 256, unsigned char) /* or perl.h */
#endif
diff --git a/regen/opcode.pl b/regen/opcode.pl
index 672f55c368..44541a742d 100755
--- a/regen/opcode.pl
+++ b/regen/opcode.pl
@@ -1061,15 +1061,8 @@ print $oc <<'END';
;
#endif
-#ifdef PERL_GLOBAL_STRUCT_INIT
-# define PERL_CHECK_INITED
+#ifdef PERL_IN_PERL_C
static const Perl_check_t Gcheck[]
-#elif !defined(PERL_GLOBAL_STRUCT)
-# define PERL_CHECK_INITED
-EXT Perl_check_t PL_check[] /* or perlvars.h */
-#endif
-#if (defined(DOINIT) && !defined(PERL_GLOBAL_STRUCT)) || defined(PERL_GLOBAL_STRUCT_INIT)
-# define PERL_CHECK_INITED
= {
END
@@ -1078,11 +1071,8 @@ for (@ops) {
}
print $oc <<'END';
-}
+};
#endif
-#ifdef PERL_CHECK_INITED
-;
-#endif /* #ifdef PERL_CHECK_INITED */
#ifndef PERL_GLOBAL_STRUCT_INIT
diff --git a/sv.c b/sv.c
index e088e5c419..8fc24fd7d3 100644
--- a/sv.c
+++ b/sv.c
@@ -15580,6 +15580,14 @@ perl_clone_using(PerlInterpreter *proto_perl, UV flags,
PL_globalstash = hv_dup(proto_perl->Iglobalstash, param);
PL_curstname = sv_dup_inc(proto_perl->Icurstname, param);
+ /* Add PL_check here */
+ PL_check =
+ (Perl_check_t*)
+ PerlMem_malloc(PL_maxo * sizeof(Perl_check_t));
+ if (!PL_check)
+ exit(1);
+ Copy(proto_perl->Icheck, PL_check, PL_maxo, Perl_check_t);
+
PL_beginav = av_dup_inc(proto_perl->Ibeginav, param);
PL_beginav_save = av_dup_inc(proto_perl->Ibeginav_save, param);
PL_checkav_save = av_dup_inc(proto_perl->Icheckav_save, param);
diff --git a/util.c b/util.c
index 376cc8ab0f..0f0edcee97 100644
--- a/util.c
+++ b/util.c
@@ -4630,7 +4630,6 @@ Perl_init_global_struct(pTHX)
struct perl_vars *plvarsp = NULL;
# ifdef PERL_GLOBAL_STRUCT
const IV nppaddr = C_ARRAY_LENGTH(Gppaddr);
- const IV ncheck = C_ARRAY_LENGTH(Gcheck);
PERL_UNUSED_CONTEXT;
# ifdef PERL_GLOBAL_STRUCT_PRIVATE
/* PerlMem_malloc() because can't use even safesysmalloc() this early. */
@@ -4659,13 +4658,7 @@ Perl_init_global_struct(pTHX)
PerlMem_malloc(nppaddr * sizeof(Perl_ppaddr_t));
if (!plvarsp->Gppaddr)
exit(1);
- plvarsp->Gcheck =
- (Perl_check_t*)
- PerlMem_malloc(ncheck * sizeof(Perl_check_t));
- if (!plvarsp->Gcheck)
- exit(1);
Copy(Gppaddr, plvarsp->Gppaddr, nppaddr, Perl_ppaddr_t);
- Copy(Gcheck, plvarsp->Gcheck, ncheck, Perl_check_t);
# endif
# ifdef PERL_SET_VARS
PERL_SET_VARS(plvarsp);
|
From @nwc10On Sun, 22 Sep 2019 10:24:48 -0700, nine@detonation.org wrote:
As best I can work out from staring at history (for a while), PL_check was created in 2000 around the time all the non-static variables were renamed to start PL_*, and it seems to have been accident/oversight that it ended up as "global", not "per-interpreter". Logically, yes, I think that it does need to be per interpreter. But, what's somewhat perturbing me about your patch are two implications of this: Inline Patchdiff --git a/perl.c b/perl.c
index 5176da0db1..38f6cea24b 100644
--- a/perl.c
+++ b/perl.c
@@ -452,6 +452,15 @@ perl_construct(pTHXx)
#ifdef USE_POSIX_2008_LOCALE
PL_C_locale_obj = newlocale(LC_ALL_MASK, "C", NULL);
#endif
+ {
+ const IV ncheck = C_ARRAY_LENGTH(Gcheck);
+ PL_check =
+ (Perl_check_t*)
+ PerlMem_malloc(ncheck * sizeof(Perl_check_t));
+ if (!PL_check)
+ exit(1);
+ Copy( , PL_check, ncheck, Perl_check_t);
+ }
ENTER;
init_i18nl10n(1);
Second - should we really need to unconditionally allocate a block of memory hanging off the interpreter struct which is always present? Surely it should simply *be* part of the interpreter struct At which point it becomes obvious that we're about to add 397 pointers to the structure, which I think is 3K on a 64-bit system. I'm not *sure* if this is a problem. It only really hurts things *using* threads (er, MULTIPLICITY), on the assumption that the copy source (Gcheck, if I read it correctly) is part of the read-only constant section of the executable (on any sane OS), so it will cause 3K more read/write memory to be allocated, but once startup is done, the CPU cache will now simply be looking in a different place for the values, and the original Gcheck will go cold. Nicholas Clark |
From @dur-randirBtw, this is a duplicate of https://rt-archive.perl.org/perl5/Ticket/Display.html?id=125685. I'm glad that it's got picked up. |
From @jkeenanMerging into existing ticket per analysis by Sergey. |
Thanks for the review!
That's....so obvious now that you say it. This fixes the memory leak and also simplifies the code quite a bit.
I dare say correctness trumps speed. The issue has been reported twice at least, so it's clear that it's hurting real users. 3K memory is not nothing but also not huge. If it turns out to be an actual performance issue, we can try to find a better solution. |
Shouldn't the pull request contain some new tests to demonstrate that the problem raised by the original poster has been resolved?
Thank you very much. |
On Mittwoch, 30. Oktober 2019 20:11:57 CET James E Keenan wrote:
Shouldn't the pull request contain some new tests to demonstrate that the
problem raised by the original poster has been resolved?
Good point! I updated the pull request to contain a test file based on the
code example posted by the original reporter of this issue. Since the bug is a
race condition, the test does not fail 100 % reliably on an unpatched perl,
but it does fail often enough so a regression could not go unnoticed for long.
|
I created a smoke-me branch from your patch: However, I am getting test failures in Also, the MANIFEST will have to be updated. Thank you very much. |
The test failures in t/io/handle.t were due to the absence of a Thank you very much. |
…4816. Re-implement getline() and getlines() as XS code. The underlying problem that we're trying to solve here is making getline() and getlines() in IO::Handle respect the open pragma. That bug was first addressed in Sept 2011 by commit 986a805: Make IO::Handle::getline(s) respect the open pragma However, that fix introduced a more subtle bug, hence this reworking. Including the entirety of the rest of that commit message because it explains both the bug the previous approach: See <https://rt.cpan.org/Ticket/Display.html?id=66474>. Also, this came up in <https://rt.perl.org/rt3/Ticket/Display.html?id=92728>. The <> operator, when reading from the magic ARGV handle, automatic- ally opens the next file. Layers set by the lexical open pragma are applied, if they are in scope at the point where <> is used. This works almost all the time, because the common convention is: use open ":utf8"; while(<>) { ... } IO::Handle’s getline and getlines methods are Perl subroutines that call <> themselves. But that happens within the scope of IO/Handle.pm, so the caller’s I/O layer settings are ignored. That means that these two expressions are not equivalent within in a ‘use open’ scope: <> *ARGV->getline The latter will open the next file with no layers applied. This commit solves that by putting PL_check hooks in place in IO::Handle before compiling the getline and getlines subroutines. Those hooks cause every state op (nextstate, or dbstate under the debugger) to have a custom pp function that saves the previous value of PL_curcop, calls the default pp function, and then restores PL_curcop. That means that getline and getlines run with the caller’s compile- time hints. Another way to see it is that getline and getlines’s own lexical hints are never activated. (A state op carries all the lexical pragmata. Every statement has one. When any op executes, it’s ‘pp’ function is called. pp_nextstate and pp_dbstate both set PL_curcop to the op itself. Any code that checks hints looks at PL_curcop, which contains the current run-time hints.) The problem with this approach is that the (current) design and implementation of PL_check hooks is actually not threadsafe. There's one array (as a global), which is used by all interpreters in the process. But as the code added to IO.xs demonstrates, realistically it needs to be possible to change the hook just for this interpreter. GH #14816 has a fix for that bug for blead. However, it will be tricky (to impossible) to backport to earlier perl versions. Hence it's also worthwhile to change IO.xs to use a different approach to solve the original bug. As described above, the bug is fixed by having the readline OP (that implements getline() and getlines()) see the caller's lexical state, not their "own". Unlike Perl subroutines, XS subroutines don't have any lexical hints of their own. getline() and getlines() are very simple, mostly parameter checking, ending with a one line that maps to a single core OP, whose values are directly returned. Hence "all" we need to do re-implement the Perl code as XS. This might look easy, but turns out to be trickier than expected. There isn't any API to be called for the OP in question, pp_readline(). The body of the OP inspects interpreter state, it directly calls pp_rv2gv() which also inspects state, and then it tail calls Perl_do_readline(), which inspects state. The easiest approach seems to be to set up enough state, and then call pp_readline() directly. This leaves us very tightly coupled to the internals, but so do all other approaches to try to tackle this bug. The current implementation of PL_check (and possibly other arrays) still needs to be addressed.
…rl#14816. Re-implement getline() and getlines() as XS code. The underlying problem that we're trying to solve here is making getline() and getlines() in IO::Handle respect the open pragma. That bug was first addressed in Sept 2011 by commit 986a805: Make IO::Handle::getline(s) respect the open pragma However, that fix introduced a more subtle bug, hence this reworking. Including the entirety of the rest of that commit message because it explains both the bug the previous approach: See <https://rt.cpan.org/Ticket/Display.html?id=66474>. Also, this came up in <https://rt.perl.org/rt3/Ticket/Display.html?id=92728>. The <> operator, when reading from the magic ARGV handle, automatic- ally opens the next file. Layers set by the lexical open pragma are applied, if they are in scope at the point where <> is used. This works almost all the time, because the common convention is: use open ":utf8"; while(<>) { ... } IO::Handle’s getline and getlines methods are Perl subroutines that call <> themselves. But that happens within the scope of IO/Handle.pm, so the caller’s I/O layer settings are ignored. That means that these two expressions are not equivalent within in a ‘use open’ scope: <> *ARGV->getline The latter will open the next file with no layers applied. This commit solves that by putting PL_check hooks in place in IO::Handle before compiling the getline and getlines subroutines. Those hooks cause every state op (nextstate, or dbstate under the debugger) to have a custom pp function that saves the previous value of PL_curcop, calls the default pp function, and then restores PL_curcop. That means that getline and getlines run with the caller’s compile- time hints. Another way to see it is that getline and getlines’s own lexical hints are never activated. (A state op carries all the lexical pragmata. Every statement has one. When any op executes, it’s ‘pp’ function is called. pp_nextstate and pp_dbstate both set PL_curcop to the op itself. Any code that checks hints looks at PL_curcop, which contains the current run-time hints.) The problem with this approach is that the (current) design and implementation of PL_check hooks is actually not threadsafe. There's one array (as a global), which is used by all interpreters in the process. But as the code added to IO.xs demonstrates, realistically it needs to be possible to change the hook just for this interpreter. GH Perl#14816 has a fix for that bug for blead. However, it will be tricky (to impossible) to backport to earlier perl versions. Hence it's also worthwhile to change IO.xs to use a different approach to solve the original bug. As described above, the bug is fixed by having the readline OP (that implements getline() and getlines()) see the caller's lexical state, not their "own". Unlike Perl subroutines, XS subroutines don't have any lexical hints of their own. getline() and getlines() are very simple, mostly parameter checking, ending with a one line that maps to a single core OP, whose values are directly returned. Hence "all" we need to do re-implement the Perl code as XS. This might look easy, but turns out to be trickier than expected. There isn't any API to be called for the OP in question, pp_readline(). The body of the OP inspects interpreter state, it directly calls pp_rv2gv() which also inspects state, and then it tail calls Perl_do_readline(), which inspects state. The easiest approach seems to be to set up enough state, and then call pp_readline() directly. This leaves us very tightly coupled to the internals, but so do all other approaches to try to tackle this bug. The current implementation of PL_check (and possibly other arrays) still needs to be addressed.
should be fixed by #17351 |
On Wed, Dec 11, 2019 at 04:41:03PM -0800, Tony Cook wrote:
should be fixed by #17351
(OK let's see how well the "reply to this e-mail" thing works...
Not sure *where* the discussion on "this" really should be. I'd not tried to
write "this" message that follows, as I didn't yet have a clear view on what
the right solution is...
"So this patch is still needed to fix the thread-safety issue."
agree.
But what I'd also realised was that, in turn, fixing the whole design
mistake of PL_check being "global" *then* breaks everything that uses
wrap_op_checker().
To quote the full wrap_op_checker docs:
=for apidoc wrap_op_checker
Puts a C function into the chain of check functions for a specified op
type. This is the preferred way to manipulate the L</PL_check> array.
C<opcode> specifies which type of op is to be affected. C<new_checker>
is a pointer to the C function that is to be added to that opcode's
check chain, and C<old_checker_p> points to the storage location where a
pointer to the next function in the chain will be stored. The value of
C<new_checker> is written into the L</PL_check> array, while the value
previously stored there is written to C<*old_checker_p>.
L</PL_check> is global to an entire process, and a module wishing to
hook op checking may find itself invoked more than once per process,
typically in different threads. To handle that situation, this function
is idempotent. The location C<*old_checker_p> must initially (once
per process) contain a null pointer. A C variable of static duration
(declared at file scope, typically also marked C<static> to give
it internal linkage) will be implicitly initialised appropriately,
if it does not have an explicit initialiser. This function will only
actually modify the check chain if it finds C<*old_checker_p> to be null.
This function is also thread safe on the small scale. It uses appropriate
locking to avoid race conditions in accessing L</PL_check>.
When this function is called, the function referenced by C<new_checker>
must be ready to be called, except for C<*old_checker_p> being unfilled.
In a threading situation, C<new_checker> may be called immediately,
even before this function has returned. C<*old_checker_p> will always
be appropriately set before C<new_checker> is called. If C<new_checker>
decides not to do anything special with an op that it is given (which
is the usual case for most uses of op check hooking), it must chain the
check function referenced by C<*old_checker_p>.
Taken all together, XS code to hook an op checker should typically look
something like this:
static Perl_check_t nxck_frob;
static OP *myck_frob(pTHX_ OP *op) {
...
op = nxck_frob(aTHX_ op);
...
return op;
}
BOOT:
wrap_op_checker(OP_FROB, myck_frob, &nxck_frob);
If you want to influence compilation of calls to a specific subroutine,
then use L</cv_set_call_checker_flags> rather than hooking checking of
all C<entersub> ops.
=cut
So, as-was (for an actually process global PL_check) that design works,
even if the XS module is loaded independently in two child threads
(the shared object is loaded once, and the static storage exists once)
1) first (i)thread to call wrap_op_checker() changes PL_check and the static
variable (via *old_checker_p)
2) second (i)thread to call wrap_op_checker() does nothing, because
*old_checker_p is not-NULL
and both ithreads see PL_check changed to the value that they "want".
What was have now is
1) first (i)thread to call wrap_op_checker() changes *that* interpreter's
PL_check and the static variable (via *old_checker_p)
2) second (i)thread to call wrap_op_checker() does nothing, because
*old_checker_p is not-NULL
and hence the second ithread *doesn't* see PL_check changed.
And I can't see how to "fix" this API in any way that actually works with
the existing XS code.
1) I *think* that it needs to die.
2) I'm not sure what should replace it
(maybe no API is needed. There's no *thread* race here now...)
So sadly it looks like we need to fix all of CPAN...
http://grep.cpan.me/?q=wrap_op_checker+-file%3Apport.h
14 references, 2 which are false positives, leaving code from
ZEFRAM, ETHER, VPIT, RANDIR, ATOOMIC.
Hence a flag-day change is possible here. I'm just not sure what design
we want to put on our flag.
Nicholas Clark
|
I hadn't considered the API, I guess this issue is a consequence of having modules manage the hook chaining rather than having perl manage it itself. If we implemented our own chaining maybe wrap_op_checker() could work like:
The chain handling code may need to distinguish between checkers added with wrap_op_checker() and otherwise, we might use an interpreter variable to distinguish between call-next and don't-call-next for wrap_op_checker() hooks, but a return value might be more natural and require less management of the interpreter variable for newer style hooks. An advantage of maintaining the list of checkers in core is we could have an API to remove a checker too, so the code that produced this report could add the checker and remove it more safely.[1] But that's probably all too complicated. Given Christmas/NY and the contentious changes freeze in January it might be difficult to get this into core for 5.32. So I can see us reverting the PL_check change and just including your IO changes (which aren't merged just yet.) [1] it isn't an issue here, but what if that code compiled by _create_getline_subs() had installed another checker |
…4816. Re-implement getline() and getlines() as XS code. The underlying problem that we're trying to solve here is making getline() and getlines() in IO::Handle respect the open pragma. That bug was first addressed in Sept 2011 by commit 986a805: Make IO::Handle::getline(s) respect the open pragma However, that fix introduced a more subtle bug, hence this reworking. Including the entirety of the rest of that commit message because it explains both the bug the previous approach: See <https://rt.cpan.org/Ticket/Display.html?id=66474>. Also, this came up in <https://rt.perl.org/rt3/Ticket/Display.html?id=92728>. The <> operator, when reading from the magic ARGV handle, automatic- ally opens the next file. Layers set by the lexical open pragma are applied, if they are in scope at the point where <> is used. This works almost all the time, because the common convention is: use open ":utf8"; while(<>) { ... } IO::Handle’s getline and getlines methods are Perl subroutines that call <> themselves. But that happens within the scope of IO/Handle.pm, so the caller’s I/O layer settings are ignored. That means that these two expressions are not equivalent within in a ‘use open’ scope: <> *ARGV->getline The latter will open the next file with no layers applied. This commit solves that by putting PL_check hooks in place in IO::Handle before compiling the getline and getlines subroutines. Those hooks cause every state op (nextstate, or dbstate under the debugger) to have a custom pp function that saves the previous value of PL_curcop, calls the default pp function, and then restores PL_curcop. That means that getline and getlines run with the caller’s compile- time hints. Another way to see it is that getline and getlines’s own lexical hints are never activated. (A state op carries all the lexical pragmata. Every statement has one. When any op executes, it’s ‘pp’ function is called. pp_nextstate and pp_dbstate both set PL_curcop to the op itself. Any code that checks hints looks at PL_curcop, which contains the current run-time hints.) The problem with this approach is that the (current) design and implementation of PL_check hooks is actually not threadsafe. There's one array (as a global), which is used by all interpreters in the process. But as the code added to IO.xs demonstrates, realistically it needs to be possible to change the hook just for this interpreter. GH #14816 has a fix for that bug for blead. However, it will be tricky (to impossible) to backport to earlier perl versions. Hence it's also worthwhile to change IO.xs to use a different approach to solve the original bug. As described above, the bug is fixed by having the readline OP (that implements getline() and getlines()) see the caller's lexical state, not their "own". Unlike Perl subroutines, XS subroutines don't have any lexical hints of their own. getline() and getlines() are very simple, mostly parameter checking, ending with a one line that maps to a single core OP, whose values are directly returned. Hence "all" we need to do re-implement the Perl code as XS. This might look easy, but turns out to be trickier than expected. There isn't any API to be called for the OP in question, pp_readline(). The body of the OP inspects interpreter state, it directly calls pp_rv2gv() which also inspects state, and then it tail calls Perl_do_readline(), which inspects state. The easiest approach seems to be to set up enough state, and then call pp_readline() directly. This leaves us very tightly coupled to the internals, but so do all other approaches to try to tackle this bug. The current implementation of PL_check (and possibly other arrays) still needs to be addressed.
…4816. Re-implement getline() and getlines() as XS code. The underlying problem that we're trying to solve here is making getline() and getlines() in IO::Handle respect the open pragma. That bug was first addressed in Sept 2011 by commit 986a805: Make IO::Handle::getline(s) respect the open pragma However, that fix introduced a more subtle bug, hence this reworking. Including the entirety of the rest of that commit message because it explains both the bug the previous approach: See <https://rt.cpan.org/Ticket/Display.html?id=66474>. Also, this came up in <https://rt.perl.org/rt3/Ticket/Display.html?id=92728>. The <> operator, when reading from the magic ARGV handle, automatic- ally opens the next file. Layers set by the lexical open pragma are applied, if they are in scope at the point where <> is used. This works almost all the time, because the common convention is: use open ":utf8"; while(<>) { ... } IO::Handle’s getline and getlines methods are Perl subroutines that call <> themselves. But that happens within the scope of IO/Handle.pm, so the caller’s I/O layer settings are ignored. That means that these two expressions are not equivalent within in a ‘use open’ scope: <> *ARGV->getline The latter will open the next file with no layers applied. This commit solves that by putting PL_check hooks in place in IO::Handle before compiling the getline and getlines subroutines. Those hooks cause every state op (nextstate, or dbstate under the debugger) to have a custom pp function that saves the previous value of PL_curcop, calls the default pp function, and then restores PL_curcop. That means that getline and getlines run with the caller’s compile- time hints. Another way to see it is that getline and getlines’s own lexical hints are never activated. (A state op carries all the lexical pragmata. Every statement has one. When any op executes, it’s ‘pp’ function is called. pp_nextstate and pp_dbstate both set PL_curcop to the op itself. Any code that checks hints looks at PL_curcop, which contains the current run-time hints.) The problem with this approach is that the (current) design and implementation of PL_check hooks is actually not threadsafe. There's one array (as a global), which is used by all interpreters in the process. But as the code added to IO.xs demonstrates, realistically it needs to be possible to change the hook just for this interpreter. GH #14816 has a fix for that bug for blead. However, it will be tricky (to impossible) to backport to earlier perl versions. Hence it's also worthwhile to change IO.xs to use a different approach to solve the original bug. As described above, the bug is fixed by having the readline OP (that implements getline() and getlines()) see the caller's lexical state, not their "own". Unlike Perl subroutines, XS subroutines don't have any lexical hints of their own. getline() and getlines() are very simple, mostly parameter checking, ending with a one line that maps to a single core OP, whose values are directly returned. Hence "all" we need to do re-implement the Perl code as XS. This might look easy, but turns out to be trickier than expected. There isn't any API to be called for the OP in question, pp_readline(). The body of the OP inspects interpreter state, it directly calls pp_rv2gv() which also inspects state, and then it tail calls Perl_do_readline(), which inspects state. The easiest approach seems to be to set up enough state, and then call pp_readline() directly. This leaves us very tightly coupled to the internals, but so do all other approaches to try to tackle this bug. The current implementation of PL_check (and possibly other arrays) still needs to be addressed.
I believe this was fixed by f172023. |
So when I build a threaded perl at blead and re-run the program submitted back in 2017 by @dur-randir,
I now get a silent infinite loop. Is that what I should be expecting? If so, then the original complaint in this ticket has been addressed and the ticket can be closed. Thank you very much. |
The original issue here is module loading, and it looks good now. wrap_op_checker is a different concern that needs it's own ticket. |
On 5/7/20 6:47 PM, Sergey Aleynikov wrote:
The original issue here is module loading, and it looks good now.
wrap_op_checker is a different concern that needs it's own ticket.
Would you be able to open such a new ticket?
Thanks.
jimk
|
Looks like i've skipped some bits here, and the merged solution doesn't involve changing PL_check (and hence avoid wrap_op_check problems). But nevertheless that commit says "The current implementation of PL_check (and possibly other arrays) still needs to be addressed.". But i'm not sure what was implied - reinstantiate the original change early enough to have time to change api + modules? Different approach? It'd be better to ask Nicholas. |
@nwc10, can you comment on the above? Thanks. |
Open #17806 to cover the remaining issue here. |
Migrated from rt.perl.org#125685 (status was 'open')
Searchable as RT125685$
The text was updated successfully, but these errors were encountered: