New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PATCH] create dPERL_ASYNC_CHECK, move PERL_ASYNC_CHECKs around in pp_* #12612
Comments
From @bulk88See attached patch. I did *NOT* test this on a unix OS. My use of |
From @bulk880001-create-dPERL_ASYNC_CHECK-move-PERL_ASYNC_CHECKs-arou.patchFrom 7112b4d56ad596bf753ec037e227603c762572ac Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Fri, 23 Nov 2012 12:16:43 -0500
Subject: [PATCH] create dPERL_ASYNC_CHECK, move PERL_ASYNC_CHECKs around in
pp_*
The purpose is to not read any interp members and set any autos before
the PERL_ASYNC_CHECK. This should reduce use of non-vol regs, or decrease
the stack frame of the opcode. On Win32 PERL_ASYNC_CHECK reads 2 interp
vars, Isys_intern.poll_count at 0x680/1664, Isig_pending at 0x27C/636
(these are bad positions for caching), but these 2 are never read again or
needed for any reason in the rest of the opcode. In the old code, dSP was
calculated and had to be saved across the potential win32_async_check
call. This now is not done.
Rather than open new scopes and whitespace changes, or separate out the
initializations from the declarations, a dPERL_ASYNC_CHECK is introduced.
On Visual C as an inline static, half the static dPERL_ASYNC_CHECKs
refused to inline so a different implementation was tried in win32.h.
Visual C has a __forceinline and a __inline. Win32 Perl currently uses
__inline. I choose to work around that and not touch or investigate
that issue.
The return 0 of S_async_check_get is meaningless. Some non x86 platforms
make it easier/faster to return a 0 than any other constant. U8 is
picked in case an actual 0 const has to be specified on a particular
CPU. U8 may or may not be zero extended to a machine word depending on ABI.
In any case, in theory all this should optimize away with the inlining
on a decent compiler. xPeRlAc was chosen from other examples of macros
that create private autos using almost random mixed case identifiers.
Ac=async check.
A potential idea is to convert the non Win32 static func
S_async_check_get version to a pure expression like the Win32 version.
In pp_nextstate a line was moved to merge 2 reads of PL_op into 1 read
since FREETMPS and PERL_ASYNC_CHECK make calls.
PERL_ASYNC_CHECKs that were already at the end of a non opcode func, or
seemed to be called around things like IO that could raise signals, were
not changed. This patch was not tested on a Unix Perl, so bugs or less than
ideal behaviour or optimizations might occur.
Some size changes I observed on VC 2003 x86 32 bit. Some funcs had no
change in size since a nonvol reg is still saved and used later on
somewhere in the funcs body. Even with no size change, it should still be
a microscopic bit faster.
Perl_pp_dbstate no change
Perl_pp_leaveeval no change
Perl_pp_leavetry no change
Perl_pp_substcont 0x5BF to 0x5B2
Perl_pp_nextstate 0x6E to 0x6B
Perl_pp_or no change
Perl_pp_subst 0xB89 to 0xB88
---
inline.h | 11 +++++++++++
perl.h | 10 ++++++++++
pp_ctl.c | 16 ++++++++--------
pp_hot.c | 13 +++++++------
win32/win32.h | 4 ++++
5 files changed, 40 insertions(+), 14 deletions(-)
diff --git a/inline.h b/inline.h
index 0d53860..1a8858a 100644
--- a/inline.h
+++ b/inline.h
@@ -122,3 +122,14 @@ S_croak_memory_wrap(void)
{
Perl_croak_nocontext("%s",PL_memory_wrap);
}
+
+/* ------------------------------- perl.h ------------------------------- */
+
+/* part of dPERL_ASYNC_CHECK, return value is meaningless and should optimize away */
+PERL_STATIC_INLINE U8
+S_async_check_get(pTHX)
+{
+ PERL_ASYNC_CHECK();
+ return 0;
+}
+
diff --git a/perl.h b/perl.h
index cab4eba..2fb53a7 100644
--- a/perl.h
+++ b/perl.h
@@ -5471,6 +5471,16 @@ typedef struct am_table_short AMTS;
# define PERL_ASYNC_CHECK() NOOP
#endif
+#ifndef PERL_MICRO
+# ifndef dPERL_ASYNC_CHECK
+# define dPERL_ASYNC_CHECK U8 xPeRlAc PERL_UNUSED_DECL = S_async_check_get(aTHX)
+# endif
+#else
+# ifndef dPERL_ASYNC_CHECK
+# define dPERL_ASYNC_CHECK dNOOP
+# endif
+#endif
+
/*
* On some operating systems, a memory allocation may succeed,
* but put the process too close to the system's comfort limit.
diff --git a/pp_ctl.c b/pp_ctl.c
index c9e4ac4..06c4fe1 100644
--- a/pp_ctl.c
+++ b/pp_ctl.c
@@ -176,6 +176,7 @@ PP(pp_regcomp)
PP(pp_substcont)
{
dVAR;
+ dPERL_ASYNC_CHECK;
dSP;
PERL_CONTEXT *cx = &cxstack[cxstack_ix];
PMOP * const pm = (PMOP*) cLOGOP->op_other;
@@ -187,8 +188,6 @@ PP(pp_substcont)
SV *nsv = NULL;
REGEXP *old = PM_GETRE(pm);
- PERL_ASYNC_CHECK();
-
if(old != rx) {
if(old)
ReREFCNT_dec(old);
@@ -1952,13 +1951,12 @@ PP(pp_reset)
PP(pp_dbstate)
{
dVAR;
+ PERL_ASYNC_CHECK();
PL_curcop = (COP*)PL_op;
TAINT_NOT; /* Each statement is presumed innocent */
PL_stack_sp = PL_stack_base + cxstack[cxstack_ix].blk_oldsp;
FREETMPS;
- PERL_ASYNC_CHECK();
-
if (PL_op->op_flags & OPf_SPECIAL /* breakpoint */
|| SvIV(PL_DBsingle) || SvIV(PL_DBsignal) || SvIV(PL_DBtrace))
{
@@ -4170,7 +4168,9 @@ PP(pp_entereval)
PP(pp_leaveeval)
{
- dVAR; dSP;
+ dVAR;
+ dPERL_ASYNC_CHECK;
+ dSP;
SV **newsp;
PMOP *newpm;
I32 gimme;
@@ -4181,7 +4181,6 @@ PP(pp_leaveeval)
SV *namesv;
CV *evalcv;
- PERL_ASYNC_CHECK();
POPBLOCK(cx,newpm);
POPEVAL(cx);
namesv = cx->blk_eval.old_namesv;
@@ -4275,14 +4274,15 @@ PP(pp_entertry)
PP(pp_leavetry)
{
- dVAR; dSP;
+ dVAR;
+ dPERL_ASYNC_CHECK;
+ dSP;
SV **newsp;
PMOP *newpm;
I32 gimme;
PERL_CONTEXT *cx;
I32 optype;
- PERL_ASYNC_CHECK();
POPBLOCK(cx,newpm);
POPEVAL(cx);
PERL_UNUSED_VAR(optype);
diff --git a/pp_hot.c b/pp_hot.c
index 977e22f..07db5d1 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -48,11 +48,11 @@ PP(pp_const)
PP(pp_nextstate)
{
dVAR;
- PL_curcop = (COP*)PL_op;
TAINT_NOT; /* Each statement is presumed innocent */
PL_stack_sp = PL_stack_base + cxstack[cxstack_ix].blk_oldsp;
FREETMPS;
PERL_ASYNC_CHECK();
+ PL_curcop = (COP*)PL_op;
return NORMAL;
}
@@ -478,8 +478,9 @@ PP(pp_preinc)
PP(pp_or)
{
- dVAR; dSP;
- PERL_ASYNC_CHECK();
+ dVAR;
+ dPERL_ASYNC_CHECK;
+ dSP;
if (SvTRUE(TOPs))
RETURN;
else {
@@ -2128,7 +2129,9 @@ pp_match is just a simpler version of the above.
PP(pp_subst)
{
- dVAR; dSP; dTARG;
+ dVAR;
+ dPERL_ASYNC_CHECK;
+ dSP; dTARG;
PMOP *pm = cPMOP;
PMOP *rpm = pm;
char *s;
@@ -2158,8 +2161,6 @@ PP(pp_subst)
/* known replacement string? */
SV *dstr = (pm->op_pmflags & PMf_CONST) ? POPs : NULL;
- PERL_ASYNC_CHECK();
-
if (PL_op->op_flags & OPf_STACKED)
TARG = POPs;
else if (PL_op->op_private & OPpTARGET_MY)
diff --git a/win32/win32.h b/win32/win32.h
index 3065867..16423bd 100644
--- a/win32/win32.h
+++ b/win32/win32.h
@@ -465,6 +465,10 @@ DllExport int win32_async_check(pTHX);
#define WIN32_POLL_INTERVAL 32768
#define PERL_ASYNC_CHECK() if (w32_do_async || PL_sig_pending) win32_async_check(aTHX)
+/* PERL_STATIC_INLINE (__inline) on Visual C does not actually inline reliably
+ enough, so use an expression instead */
+#define dPERL_ASYNC_CHECK U8 xPeRlAc PERL_UNUSED_DECL = \
+ (((w32_do_async || PL_sig_pending) ? win32_async_check(aTHX): 0), 0)
#define w32_perlshell_tokens (PL_sys_intern.perlshell_tokens)
#define w32_perlshell_vec (PL_sys_intern.perlshell_vec)
--
1.7.9.msysgit.0
|
From @LeontOn Fri, Nov 23, 2012 at 6:26 PM, bulk88 <perlbug-followup@perl.org> wrote:
This sounds like a premature optimization to me. I'm not so fond of
But what are the other effects of doing this? I'm not entirely sure Leon |
The RT System itself - Status changed from 'new' to 'open' |
From @bulk88On Fri Nov 23 10:43:21 2012, LeonT wrote:
I am researching the effects of the curcop move in nextstate on caller() -- |
From @bulk88On Fri Nov 23 14:41:50 2012, bulk88 wrote:
Bump. -- |
From @iabynOn Thu, Nov 29, 2012 at 12:37:16AM -0800, bulk88 via RT wrote:
Your question is fairly incomprehensible. What 'release'? What do you mean Anyway, taking a guess at what you're asking: Perl line-number reporting is currently very poor and misleading, since it PL_curcop isn't only there to get the current line number; it also does eval '$x; 1'; # succeeds -- |
From @bulk88On Thu Nov 29 08:37:46 2012, davem wrote:
major (5.x.0) or maintenance (5.0.x) release.
The line numbers that the Perl caller function produces for a given
What is the ideal line number system then? What is "getting worse"? -- |
From @demerphqOn 29 November 2012 18:21, bulk88 via RT <perlbug-followup@perl.org> wrote:
Currently when an error or warning occurs in code we occasionally Worse would be getting it incorrect more often. Ideal would be always getting it right - It is very frustrating, cheers, -- |
From @iabynOn Thu, Nov 29, 2012 at 07:24:38PM +0100, demerphq wrote:
For example: #!/usr/bin/perl gives Use of uninitialized value $a in numeric gt (>) at /tmp/p line 7. because when the conditional is executed for the second time, the last cop -- |
From @bulk88On Fri Nov 23 10:43:21 2012, LeonT wrote:
Created a script (Win32 specific) that shows a difference on the -- |
From @bulk88Press any key to continue . . . |
From @bulk88Press any key to continue . . . |
From @bulk88On Wed Dec 05 12:51:05 2012, bulk88 wrote: Bumping. Will this be committed for 5.18 or not? The 2013-01-20 code freeze is -- |
From @LeontOn Wed, Jan 16, 2013 at 3:10 AM, bulk88 via RT
I'd say no. This really doesn't feel solid to me. Leon |
From @bulk88Patch remade. Removed the controversial PL_curcop change in nextstate. |
From @bulk880001-create-dPERL_ASYNC_CHECK-move-PERL_ASYNC_CHECKs-arou.patchFrom 82e5000bbdd709ef9615d77845ce12fd3a77d2eb Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Wed, 23 Jan 2013 13:17:14 -0500
Subject: [PATCH] create dPERL_ASYNC_CHECK, move PERL_ASYNC_CHECKs around in
pp_*
The purpose is to not read any interp members and set any autos before
the PERL_ASYNC_CHECK. This should reduce use of non-vol regs, or decrease
the stack frame of the opcode. On Win32 PERL_ASYNC_CHECK reads 2 interp
vars, Isys_intern.poll_count at 0x680/1664, Isig_pending at 0x27C/636
(these are bad positions for caching), but these 2 are never read again or
needed for any reason in the rest of the opcode. In the old code, dSP was
calculated and had to be saved across the potential win32_async_check
call. This now is not done.
Rather than open new scopes and whitespace changes, or separate out the
initializations from the declarations, a dPERL_ASYNC_CHECK is introduced.
On Visual C as an inline static, half the static dPERL_ASYNC_CHECKs
refused to inline so a different implementation was tried in win32.h.
Visual C has a __forceinline and a __inline. Win32 Perl currently uses
__inline. I choose to work around that and not touch or investigate
that issue.
The return 0 of S_async_check_get is meaningless. Some non x86 platforms
make it easier/faster to return a 0 than any other constant. U8 is
picked in case an actual 0 const has to be specified on a particular
CPU. U8 may or may not be zero extended to a machine word depending on ABI.
In any case, in theory all this should optimize away with the inlining
on a decent compiler. xPeRlAc was chosen from other examples of macros
that create private autos using almost random mixed case identifiers.
Ac=async check.
A potential idea is to convert the non Win32 static func
S_async_check_get version to a pure expression like the Win32 version.
PERL_ASYNC_CHECKs that were already at the end of a non opcode func, or
seemed to be called around things like IO that could raise signals, were
not changed. This patch was not tested on a Unix Perl, so bugs or less than
ideal behaviour or optimizations might occur.
Some size changes I observed on VC 2003 x86 32 bit. Some funcs had no
change in size since a nonvol reg is still saved and used later on
somewhere in the funcs body. Even with no size change, it should still be
a microscopic bit faster.
Perl_pp_dbstate no change
Perl_pp_leaveeval no change
Perl_pp_leavetry no change
Perl_pp_substcont 0x5BF to 0x5B2
Perl_pp_or no change
Perl_pp_subst 0xB89 to 0xB88
---
inline.h | 11 +++++++++++
perl.h | 10 ++++++++++
pp_ctl.c | 16 ++++++++--------
pp_hot.c | 11 ++++++-----
win32/win32.h | 4 ++++
5 files changed, 39 insertions(+), 13 deletions(-)
diff --git a/inline.h b/inline.h
index 85bdc74..f5bf557 100644
--- a/inline.h
+++ b/inline.h
@@ -139,6 +139,17 @@ S_croak_memory_wrap(void)
#pragma clang diagnostic pop
#endif
+/* ------------------------------- perl.h ------------------------------- */
+
+/* part of dPERL_ASYNC_CHECK, return value is meaningless and should
+ * optimize away */
+PERL_STATIC_INLINE U8
+S_async_check_get(pTHX)
+{
+ PERL_ASYNC_CHECK();
+ return 0;
+}
+
/* ------------------------------- utf8.h ------------------------------- */
/* These exist only to replace the macros they formerly were so that their use
diff --git a/perl.h b/perl.h
index 29c4425..c5f919c 100644
--- a/perl.h
+++ b/perl.h
@@ -5497,6 +5497,16 @@ typedef struct am_table_short AMTS;
# define PERL_ASYNC_CHECK() NOOP
#endif
+#ifndef PERL_MICRO
+# ifndef dPERL_ASYNC_CHECK
+# define dPERL_ASYNC_CHECK U8 xPeRlAc PERL_UNUSED_DECL = S_async_check_get(aTHX)
+# endif
+#else
+# ifndef dPERL_ASYNC_CHECK
+# define dPERL_ASYNC_CHECK dNOOP
+# endif
+#endif
+
/*
* On some operating systems, a memory allocation may succeed,
* but put the process too close to the system's comfort limit.
diff --git a/pp_ctl.c b/pp_ctl.c
index d5028c8..1501614 100644
--- a/pp_ctl.c
+++ b/pp_ctl.c
@@ -176,6 +176,7 @@ PP(pp_regcomp)
PP(pp_substcont)
{
dVAR;
+ dPERL_ASYNC_CHECK;
dSP;
PERL_CONTEXT *cx = &cxstack[cxstack_ix];
PMOP * const pm = (PMOP*) cLOGOP->op_other;
@@ -187,8 +188,6 @@ PP(pp_substcont)
SV *nsv = NULL;
REGEXP *old = PM_GETRE(pm);
- PERL_ASYNC_CHECK();
-
if(old != rx) {
if(old)
ReREFCNT_dec(old);
@@ -1952,13 +1951,12 @@ PP(pp_reset)
PP(pp_dbstate)
{
dVAR;
+ PERL_ASYNC_CHECK();
PL_curcop = (COP*)PL_op;
TAINT_NOT; /* Each statement is presumed innocent */
PL_stack_sp = PL_stack_base + cxstack[cxstack_ix].blk_oldsp;
FREETMPS;
- PERL_ASYNC_CHECK();
-
if (PL_op->op_flags & OPf_SPECIAL /* breakpoint */
|| SvIV(PL_DBsingle) || SvIV(PL_DBsignal) || SvIV(PL_DBtrace))
{
@@ -4174,7 +4172,9 @@ PP(pp_entereval)
PP(pp_leaveeval)
{
- dVAR; dSP;
+ dVAR;
+ dPERL_ASYNC_CHECK;
+ dSP;
SV **newsp;
PMOP *newpm;
I32 gimme;
@@ -4185,7 +4185,6 @@ PP(pp_leaveeval)
SV *namesv;
CV *evalcv;
- PERL_ASYNC_CHECK();
POPBLOCK(cx,newpm);
POPEVAL(cx);
namesv = cx->blk_eval.old_namesv;
@@ -4279,14 +4278,15 @@ PP(pp_entertry)
PP(pp_leavetry)
{
- dVAR; dSP;
+ dVAR;
+ dPERL_ASYNC_CHECK;
+ dSP;
SV **newsp;
PMOP *newpm;
I32 gimme;
PERL_CONTEXT *cx;
I32 optype;
- PERL_ASYNC_CHECK();
POPBLOCK(cx,newpm);
POPEVAL(cx);
PERL_UNUSED_VAR(optype);
diff --git a/pp_hot.c b/pp_hot.c
index ad99c42..198168d 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -481,8 +481,9 @@ PP(pp_preinc)
PP(pp_or)
{
- dVAR; dSP;
- PERL_ASYNC_CHECK();
+ dVAR;
+ dPERL_ASYNC_CHECK;
+ dSP;
if (SvTRUE(TOPs))
RETURN;
else {
@@ -2140,7 +2141,9 @@ pp_match is just a simpler version of the above.
PP(pp_subst)
{
- dVAR; dSP; dTARG;
+ dVAR;
+ dPERL_ASYNC_CHECK;
+ dSP; dTARG;
PMOP *pm = cPMOP;
PMOP *rpm = pm;
char *s;
@@ -2170,8 +2173,6 @@ PP(pp_subst)
/* known replacement string? */
SV *dstr = (pm->op_pmflags & PMf_CONST) ? POPs : NULL;
- PERL_ASYNC_CHECK();
-
if (PL_op->op_flags & OPf_STACKED)
TARG = POPs;
else if (PL_op->op_private & OPpTARGET_MY)
diff --git a/win32/win32.h b/win32/win32.h
index 3065867..16423bd 100644
--- a/win32/win32.h
+++ b/win32/win32.h
@@ -465,6 +465,10 @@ DllExport int win32_async_check(pTHX);
#define WIN32_POLL_INTERVAL 32768
#define PERL_ASYNC_CHECK() if (w32_do_async || PL_sig_pending) win32_async_check(aTHX)
+/* PERL_STATIC_INLINE (__inline) on Visual C does not actually inline reliably
+ enough, so use an expression instead */
+#define dPERL_ASYNC_CHECK U8 xPeRlAc PERL_UNUSED_DECL = \
+ (((w32_do_async || PL_sig_pending) ? win32_async_check(aTHX): 0), 0)
#define w32_perlshell_tokens (PL_sys_intern.perlshell_tokens)
#define w32_perlshell_vec (PL_sys_intern.perlshell_vec)
--
1.7.9.msysgit.0
|
From @iabynOn Wed, Jan 23, 2013 at 03:36:00PM -0800, bulk88 via RT wrote:
I'm sorry to say this again, but I'm really not keen on this patch. It's another nano-optimisation, which doesn't seem to offer much in the Perl_pp_substcont 0x5BF to 0x5B2 Both these are heavyweight functions that call out to the regex engine, Set against this are the rather hacky techniques needed to achieve this There's also the issue that the patch incorporates functional changes, So, my overall feelings are: 1) this patch is not a good idea, because the additional code complexity 2) If the patch (or something similar) were to be applied, then it should 3) Ditto if it were applied, I would want the patch to have better code /* dPERL_ASYNC_CHECK allows you to perform a PERL_ASYNC_CHECK within Lastly, note that the patch as-is doesn't compile on linux: cc -fstack-protector -L/usr/local/lib -o miniperl \ At this point I should note that I really, *really* don't want to discourage -- |
From @bulk88On Mon Jan 28 08:12:45 2013, davem wrote:
On other non-x86 CPUs or LLVM/clang (next gen compilers) or clang to JVM
Comments were added to explain the hacks. I think its better than
Safe signals are randomly fired. Previously async_check was called much ______________________________________________________
______________________________________________________________ I don't think this is user visible.
Good point. I agree. I forgot. Someone could argue to use git blame, but
Hmmm, I'll have to get a POSIX box running to reproduce/fix that. Will
They aren't random. My formula is to look at random machine code, find a
I also try to invest time into things which I think nobody else will I've tried C profilers in the past but they were too noisy (the overhead I see Perl as more of a DB engine than a multimedia/math engine. I dont Alot of my tweaks aren't specific to Perl (RE, optree), but any C code. -- |
From @demerphqOn 28 January 2013 19:42, bulk88 via RT <perlbug-followup@perl.org> wrote:
FWIW, I think the regex-engine-pig could be taught to fly. If you Yves -- |
From @bulk88On Mon Jan 28 10:42:43 2013, bulk88 wrote:
TLDR: In some cases, code size in this patch, will only decrease, if Long version: Another possibility there was no change in instruction |
From @tonycozOn Mon Jan 28 10:42:43 2013, bulk88 wrote:
This works on Win32 but fails elsewhere because win32/win32.h defines One thing I noticed, you've performed this optimization for pp_or, did Since the patch, as it stands, breaks the build and provides little Tony |
From @bulk88On Wed Jul 31 23:40:26 2013, tonyc wrote:
If I had time to do the pp_or thing to pp_and, it would be in a -- |
1 similar comment
From @bulk88On Wed Jul 31 23:40:26 2013, tonyc wrote:
If I had time to do the pp_or thing to pp_and, it would be in a -- |
From @tonycozOn Wed Jul 31 23:40:26 2013, tonyc wrote:
And rejecting the ticket too. Tony |
@tonycoz - Status changed from 'open' to 'rejected' |
Migrated from rt.perl.org#115894 (status was 'rejected')
Searchable as RT115894$
The text was updated successfully, but these errors were encountered: