Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PATCH] create dPERL_ASYNC_CHECK, move PERL_ASYNC_CHECKs around in pp_* #12612

Closed
p5pRT opened this issue Nov 23, 2012 · 27 comments
Closed

[PATCH] create dPERL_ASYNC_CHECK, move PERL_ASYNC_CHECKs around in pp_* #12612

p5pRT opened this issue Nov 23, 2012 · 27 comments
Labels

Comments

@p5pRT
Copy link

p5pRT commented Nov 23, 2012

Migrated from rt.perl.org#115894 (status was 'rejected')

Searchable as RT115894$

@p5pRT
Copy link
Author

p5pRT commented Nov 23, 2012

From @bulk88

See attached patch. I did *NOT* test this on a unix OS. My use of
PERL_UNUSED_DECL may also be wrong since I did not compile this with
GCC. Creating a dPERL_ASYNC_CHECK was the lesser of the other evils as
mentioned in the commit message. There are some suggestions and ideas in
the commit message which I cant implement but someone else might be
interested in implementing.

@p5pRT
Copy link
Author

p5pRT commented Nov 23, 2012

From @bulk88

0001-create-dPERL_ASYNC_CHECK-move-PERL_ASYNC_CHECKs-arou.patch
From 7112b4d56ad596bf753ec037e227603c762572ac Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Fri, 23 Nov 2012 12:16:43 -0500
Subject: [PATCH] create dPERL_ASYNC_CHECK, move PERL_ASYNC_CHECKs around in
 pp_*

The purpose is to not read any interp members and set any autos before
the PERL_ASYNC_CHECK. This should reduce use of non-vol regs, or decrease
the stack frame of the opcode. On Win32 PERL_ASYNC_CHECK reads 2 interp
vars, Isys_intern.poll_count at 0x680/1664, Isig_pending at 0x27C/636
(these are bad positions for caching), but these 2 are never read again or
needed for any reason in the rest of the opcode. In the old code, dSP was
calculated and had to be saved across the potential win32_async_check
call. This now is not done.

Rather than open new scopes and whitespace changes, or separate out the
initializations from the declarations, a dPERL_ASYNC_CHECK is introduced.

On Visual C as an inline static, half the static dPERL_ASYNC_CHECKs
refused to inline so a different implementation was tried in win32.h.
Visual C has a __forceinline and a __inline. Win32 Perl currently uses
__inline. I choose to work around that and not touch or investigate
that issue.

The return 0 of S_async_check_get is meaningless. Some non x86 platforms
make it easier/faster to return a 0 than any other constant. U8 is
picked in case an actual 0 const has to be specified on a particular
CPU. U8 may or may not be zero extended to a machine word depending on ABI.
In any case, in theory all this should optimize away with the inlining
on a decent compiler. xPeRlAc was chosen from other examples of macros
that create private autos using almost random mixed case identifiers.
Ac=async check.

A potential idea is to convert the non Win32 static func
S_async_check_get version to a pure expression like the Win32 version.

In pp_nextstate a line was moved to merge 2 reads of PL_op into 1 read
since FREETMPS and PERL_ASYNC_CHECK make calls.

PERL_ASYNC_CHECKs that were already at the end of a non opcode func, or
seemed to be called around things like IO that could raise signals, were
not changed. This patch was not tested on a Unix Perl, so bugs or less than
ideal behaviour or optimizations might occur.

Some size changes I observed on VC 2003 x86 32 bit. Some funcs had no
change in size since a nonvol reg is still saved and used later on
somewhere in the funcs body. Even with no size change, it should still be
a microscopic bit faster.

Perl_pp_dbstate no change
Perl_pp_leaveeval no change
Perl_pp_leavetry no change
Perl_pp_substcont 0x5BF to 0x5B2
Perl_pp_nextstate 0x6E to 0x6B
Perl_pp_or no change
Perl_pp_subst 0xB89 to 0xB88
---
 inline.h      |   11 +++++++++++
 perl.h        |   10 ++++++++++
 pp_ctl.c      |   16 ++++++++--------
 pp_hot.c      |   13 +++++++------
 win32/win32.h |    4 ++++
 5 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/inline.h b/inline.h
index 0d53860..1a8858a 100644
--- a/inline.h
+++ b/inline.h
@@ -122,3 +122,14 @@ S_croak_memory_wrap(void)
 {
     Perl_croak_nocontext("%s",PL_memory_wrap);
 }
+
+/* ------------------------------- perl.h ------------------------------- */
+
+/* part of dPERL_ASYNC_CHECK, return value is meaningless and should optimize away */
+PERL_STATIC_INLINE U8
+S_async_check_get(pTHX)
+{
+    PERL_ASYNC_CHECK();
+    return 0;
+}
+
diff --git a/perl.h b/perl.h
index cab4eba..2fb53a7 100644
--- a/perl.h
+++ b/perl.h
@@ -5471,6 +5471,16 @@ typedef struct am_table_short AMTS;
 #   define PERL_ASYNC_CHECK()  NOOP
 #endif
 
+#ifndef PERL_MICRO
+#  ifndef dPERL_ASYNC_CHECK
+#    define dPERL_ASYNC_CHECK U8 xPeRlAc PERL_UNUSED_DECL = S_async_check_get(aTHX)
+#  endif
+#else
+#  ifndef dPERL_ASYNC_CHECK
+#    define dPERL_ASYNC_CHECK dNOOP
+#  endif
+#endif
+
 /*
  * On some operating systems, a memory allocation may succeed,
  * but put the process too close to the system's comfort limit.
diff --git a/pp_ctl.c b/pp_ctl.c
index c9e4ac4..06c4fe1 100644
--- a/pp_ctl.c
+++ b/pp_ctl.c
@@ -176,6 +176,7 @@ PP(pp_regcomp)
 PP(pp_substcont)
 {
     dVAR;
+    dPERL_ASYNC_CHECK;
     dSP;
     PERL_CONTEXT *cx = &cxstack[cxstack_ix];
     PMOP * const pm = (PMOP*) cLOGOP->op_other;
@@ -187,8 +188,6 @@ PP(pp_substcont)
     SV *nsv = NULL;
     REGEXP *old = PM_GETRE(pm);
 
-    PERL_ASYNC_CHECK();
-
     if(old != rx) {
 	if(old)
 	    ReREFCNT_dec(old);
@@ -1952,13 +1951,12 @@ PP(pp_reset)
 PP(pp_dbstate)
 {
     dVAR;
+    PERL_ASYNC_CHECK();
     PL_curcop = (COP*)PL_op;
     TAINT_NOT;		/* Each statement is presumed innocent */
     PL_stack_sp = PL_stack_base + cxstack[cxstack_ix].blk_oldsp;
     FREETMPS;
 
-    PERL_ASYNC_CHECK();
-
     if (PL_op->op_flags & OPf_SPECIAL /* breakpoint */
 	    || SvIV(PL_DBsingle) || SvIV(PL_DBsignal) || SvIV(PL_DBtrace))
     {
@@ -4170,7 +4168,9 @@ PP(pp_entereval)
 
 PP(pp_leaveeval)
 {
-    dVAR; dSP;
+    dVAR;
+    dPERL_ASYNC_CHECK;
+    dSP;
     SV **newsp;
     PMOP *newpm;
     I32 gimme;
@@ -4181,7 +4181,6 @@ PP(pp_leaveeval)
     SV *namesv;
     CV *evalcv;
 
-    PERL_ASYNC_CHECK();
     POPBLOCK(cx,newpm);
     POPEVAL(cx);
     namesv = cx->blk_eval.old_namesv;
@@ -4275,14 +4274,15 @@ PP(pp_entertry)
 
 PP(pp_leavetry)
 {
-    dVAR; dSP;
+    dVAR;
+    dPERL_ASYNC_CHECK;
+    dSP;
     SV **newsp;
     PMOP *newpm;
     I32 gimme;
     PERL_CONTEXT *cx;
     I32 optype;
 
-    PERL_ASYNC_CHECK();
     POPBLOCK(cx,newpm);
     POPEVAL(cx);
     PERL_UNUSED_VAR(optype);
diff --git a/pp_hot.c b/pp_hot.c
index 977e22f..07db5d1 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -48,11 +48,11 @@ PP(pp_const)
 PP(pp_nextstate)
 {
     dVAR;
-    PL_curcop = (COP*)PL_op;
     TAINT_NOT;		/* Each statement is presumed innocent */
     PL_stack_sp = PL_stack_base + cxstack[cxstack_ix].blk_oldsp;
     FREETMPS;
     PERL_ASYNC_CHECK();
+    PL_curcop = (COP*)PL_op;
     return NORMAL;
 }
 
@@ -478,8 +478,9 @@ PP(pp_preinc)
 
 PP(pp_or)
 {
-    dVAR; dSP;
-    PERL_ASYNC_CHECK();
+    dVAR;
+    dPERL_ASYNC_CHECK;
+    dSP;
     if (SvTRUE(TOPs))
 	RETURN;
     else {
@@ -2128,7 +2129,9 @@ pp_match is just a simpler version of the above.
 
 PP(pp_subst)
 {
-    dVAR; dSP; dTARG;
+    dVAR;
+    dPERL_ASYNC_CHECK;
+    dSP; dTARG;
     PMOP *pm = cPMOP;
     PMOP *rpm = pm;
     char *s;
@@ -2158,8 +2161,6 @@ PP(pp_subst)
     /* known replacement string? */
     SV *dstr = (pm->op_pmflags & PMf_CONST) ? POPs : NULL;
 
-    PERL_ASYNC_CHECK();
-
     if (PL_op->op_flags & OPf_STACKED)
 	TARG = POPs;
     else if (PL_op->op_private & OPpTARGET_MY)
diff --git a/win32/win32.h b/win32/win32.h
index 3065867..16423bd 100644
--- a/win32/win32.h
+++ b/win32/win32.h
@@ -465,6 +465,10 @@ DllExport int win32_async_check(pTHX);
 
 #define WIN32_POLL_INTERVAL 32768
 #define PERL_ASYNC_CHECK() if (w32_do_async || PL_sig_pending) win32_async_check(aTHX)
+/* PERL_STATIC_INLINE (__inline) on Visual C does not actually inline reliably
+   enough, so use an expression instead */
+#define dPERL_ASYNC_CHECK U8 xPeRlAc PERL_UNUSED_DECL =  \
+  (((w32_do_async || PL_sig_pending) ? win32_async_check(aTHX): 0), 0)
 
 #define w32_perlshell_tokens	(PL_sys_intern.perlshell_tokens)
 #define w32_perlshell_vec	(PL_sys_intern.perlshell_vec)
-- 
1.7.9.msysgit.0

@p5pRT
Copy link
Author

p5pRT commented Nov 23, 2012

From @Leont

On Fri, Nov 23, 2012 at 6​:26 PM, bulk88 <perlbug-followup@​perl.org> wrote​:

The purpose is to not read any interp members and set any autos before
the PERL_ASYNC_CHECK. This should reduce use of non-vol regs, or decrease
the stack frame of the opcode. On Win32 PERL_ASYNC_CHECK reads 2 interp
vars, Isys_intern.poll_count at 0x680/1664, Isig_pending at 0x27C/636
(these are bad positions for caching), but these 2 are never read again or
needed for any reason in the rest of the opcode. In the old code, dSP was
calculated and had to be saved across the potential win32_async_check
call. This now is not done.

This sounds like a premature optimization to me. I'm not so fond of
playing a human peephole optimizer, specially when it makes the
understanding of the code harder.

A potential idea is to convert the non Win32 static func
S_async_check_get version to a pure expression like the Win32 version.

In pp_nextstate a line was moved to merge 2 reads of PL_op into 1 read
since FREETMPS and PERL_ASYNC_CHECK make calls.

PERL_ASYNC_CHECKs that were already at the end of a non opcode func, or
seemed to be called around things like IO that could raise signals, were
not changed. This patch was not tested on a Unix Perl, so bugs or less than
ideal behaviour or optimizations might occur.

But what are the other effects of doing this? I'm not entirely sure
your moving around PL_curcop assignments in particular are necessarily
invisible, I'm worried about the effects at a distance.

Leon

@p5pRT
Copy link
Author

p5pRT commented Nov 23, 2012

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Nov 23, 2012

From @bulk88

On Fri Nov 23 10​:43​:21 2012, LeonT wrote​:

On Fri, Nov 23, 2012 at 6​:26 PM, bulk88 <perlbug-followup@​perl.org>
This sounds like a premature optimization to me. I'm not so fond of
playing a human peephole optimizer, specially when it makes the
understanding of the code harder.

A potential idea is to convert the non Win32 static func
S_async_check_get version to a pure expression like the Win32
version.

In pp_nextstate a line was moved to merge 2 reads of PL_op into 1
read
since FREETMPS and PERL_ASYNC_CHECK make calls.

PERL_ASYNC_CHECKs that were already at the end of a non opcode func,
or
seemed to be called around things like IO that could raise signals,
were
not changed. This patch was not tested on a Unix Perl, so bugs or
less than
ideal behaviour or optimizations might occur.

But what are the other effects of doing this? I'm not entirely sure
your moving around PL_curcop assignments in particular are necessarily
invisible, I'm worried about the effects at a distance.

Leon

I am researching the effects of the curcop move in nextstate on caller()
from tied and blessed scalars. Is there a policy on the release to
release caller line numbers, or accuracy of caller() line numbers in Perl?

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Nov 29, 2012

From @bulk88

On Fri Nov 23 14​:41​:50 2012, bulk88 wrote​:

I am researching the effects of the curcop move in nextstate on caller()
from tied and blessed scalars. Is there a policy on the release to
release caller line numbers, or accuracy of caller() line numbers in Perl?

Bump.

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Nov 29, 2012

From @iabyn

On Thu, Nov 29, 2012 at 12​:37​:16AM -0800, bulk88 via RT wrote​:

On Fri Nov 23 14​:41​:50 2012, bulk88 wrote​:

I am researching the effects of the curcop move in nextstate on caller()
from tied and blessed scalars. Is there a policy on the release to
release caller line numbers, or accuracy of caller() line numbers in Perl?

Bump.

Your question is fairly incomprehensible. What 'release'? What do you mean
by 'release caller line numbers'?

Anyway, taking a guess at what you're asking​:

Perl line-number reporting is currently very poor and misleading, since it
only records line numbers for COPs. Ideally this should be fixed at some
point, but certainly we we wouldn't encourage it getting any worse.

PL_curcop isn't only there to get the current line number; it also does
things like hold the current hints state so that things like this work as
expected​:

  eval '$x; 1'; # succeeds
  use strict;
  eval '$y; 1'; # fails

--
"Emacs isn't a bad OS once you get used to it.
It just lacks a decent editor."

@p5pRT
Copy link
Author

p5pRT commented Nov 29, 2012

From @bulk88

On Thu Nov 29 08​:37​:46 2012, davem wrote​:

On Thu, Nov 29, 2012 at 12​:37​:16AM -0800, bulk88 via RT wrote​:

On Fri Nov 23 14​:41​:50 2012, bulk88 wrote​:

I am researching the effects of the curcop move in nextstate on
caller()
from tied and blessed scalars. Is there a policy on the release to
release caller line numbers, or accuracy of caller() line numbers
in Perl?

Bump.

Your question is fairly incomprehensible. What 'release'?

major (5.x.0) or maintenance (5.0.x) release.

What do you
mean
by 'release caller line numbers'?

The line numbers that the Perl caller function produces for a given
script at various points throughout the script for a particular release
of Perl.

Anyway, taking a guess at what you're asking​:

Perl line-number reporting is currently very poor and misleading,
since it
only records line numbers for COPs. Ideally this should be fixed at
some
point, but certainly we we wouldn't encourage it getting any worse.

What is the ideal line number system then? What is "getting worse"?

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Nov 29, 2012

From @demerphq

On 29 November 2012 18​:21, bulk88 via RT <perlbug-followup@​perl.org> wrote​:

On Thu Nov 29 08​:37​:46 2012, davem wrote​:

On Thu, Nov 29, 2012 at 12​:37​:16AM -0800, bulk88 via RT wrote​:

On Fri Nov 23 14​:41​:50 2012, bulk88 wrote​:

I am researching the effects of the curcop move in nextstate on
caller()
from tied and blessed scalars. Is there a policy on the release to
release caller line numbers, or accuracy of caller() line numbers
in Perl?

Bump.

Your question is fairly incomprehensible. What 'release'?

major (5.x.0) or maintenance (5.0.x) release.

What do you
mean
by 'release caller line numbers'?

The line numbers that the Perl caller function produces for a given
script at various points throughout the script for a particular release
of Perl.

Anyway, taking a guess at what you're asking​:

Perl line-number reporting is currently very poor and misleading,
since it
only records line numbers for COPs. Ideally this should be fixed at
some
point, but certainly we we wouldn't encourage it getting any worse.

What is the ideal line number system then? What is "getting worse"?

Currently when an error or warning occurs in code we occasionally
incorrectly report the line the error is from. I don't remember the
exact cases but I recall it has to do with compound statements or code
that has been optimized away.

Worse would be getting it incorrect more often.

Ideal would be always getting it right - It is very frustrating,
especially for beginners, to see an error message that refers to a
variable on a line that does not mention the variable.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Nov 30, 2012

From @iabyn

On Thu, Nov 29, 2012 at 07​:24​:38PM +0100, demerphq wrote​:

Currently when an error or warning occurs in code we occasionally
incorrectly report the line the error is from. I don't remember the
exact cases but I recall it has to do with compound statements or code
that has been optimized away.

For example​:

  #!/usr/bin/perl
  use warnings;
  $a = 1;
  while ($a > 0) { # line 4
  my $foo;
  my $bar;
  undef $a; # line 7
  }

gives

  Use of uninitialized value $a in numeric gt (>) at /tmp/p line 7.

because when the conditional is executed for the second time, the last cop
that was executed was the nextstate prefixing the 'undef'.

--
The Enterprise's efficient long-range scanners detect a temporal vortex
distortion in good time, allowing it to be safely avoided via a minor
course correction.
  -- Things That Never Happen in "Star Trek" #21

@p5pRT
Copy link
Author

p5pRT commented Dec 5, 2012

From @bulk88

On Fri Nov 23 10​:43​:21 2012, LeonT wrote​:

But what are the other effects of doing this? I'm not entirely sure
your moving around PL_curcop assignments in particular are necessarily
invisible, I'm worried about the effects at a distance.

Leon

Created a script (Win32 specific) that shows a difference on the
PL_curcop assignment move in pp_nextstate at script level. After the
PL_curcop move, during the freetmps between statements, the line number
is at the line where the mortal was created/mortalized, not at the line
after the line where the mortal was created/mortalized. So what is the
consensus on the line number change shown in the before/after files?

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Dec 5, 2012

From @bulk88

mg.pl

@p5pRT
Copy link
Author

p5pRT commented Dec 5, 2012

From @bulk88

Press any key to continue . . .
in destroy genid = 2
$VAR1 = [
  [
  "main",
  "mg.pl",
  20,
  "tiedclass​::DESTROY",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  20,
  "(eval)",
  0,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  26,
  "main​::run",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ]
  ];
in destroy genid = 1
$VAR1 = [
  [
  "main",
  "mg.pl",
  20,
  "tiedclass​::DESTROY",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  20,
  "(eval)",
  0,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  26,
  "main​::run",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ]
  ];
Press any key to continue . . .
in destroy genid = 3
$VAR1 = [
  [
  "main",
  "mg.pl",
  22,
  "tiedclass​::DESTROY",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  22,
  "(eval)",
  0,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  26,
  "main​::run",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ]
  ];
Press any key to continue . . .
in destroy genid = 4
$VAR1 = [
  [
  "main",
  "mg.pl",
  26,
  "tiedclass​::DESTROY",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  26,
  "(eval)",
  0,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  26,
  "main​::run",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ]
  ];
5.017007

@p5pRT
Copy link
Author

p5pRT commented Dec 5, 2012

From @bulk88

Press any key to continue . . .
in destroy genid = 2
$VAR1 = [
  [
  "main",
  "mg.pl",
  21,
  "tiedclass​::DESTROY",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  21,
  "(eval)",
  0,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  26,
  "main​::run",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ]
  ];
in destroy genid = 1
$VAR1 = [
  [
  "main",
  "mg.pl",
  21,
  "tiedclass​::DESTROY",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  21,
  "(eval)",
  0,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  26,
  "main​::run",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ]
  ];
Press any key to continue . . .
in destroy genid = 3
$VAR1 = [
  [
  "main",
  "mg.pl",
  23,
  "tiedclass​::DESTROY",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  23,
  "(eval)",
  0,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  26,
  "main​::run",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ]
  ];
Press any key to continue . . .
in destroy genid = 4
$VAR1 = [
  [
  "main",
  "mg.pl",
  26,
  "tiedclass​::DESTROY",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  26,
  "(eval)",
  0,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ],
  [
  "main",
  "mg.pl",
  26,
  "main​::run",
  1,
  undef,
  undef,
  undef,
  2018,
  "UUUUUUUUUUUUUU",
  undef
  ]
  ];
5.017007

@p5pRT
Copy link
Author

p5pRT commented Jan 16, 2013

From @bulk88

On Wed Dec 05 12​:51​:05 2012, bulk88 wrote​:

Bumping.

Will this be committed for 5.18 or not? The 2013-01-20 code freeze is
coming soon.

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Jan 16, 2013

From @Leont

On Wed, Jan 16, 2013 at 3​:10 AM, bulk88 via RT
<perlbug-followup@​perl.org> wrote​:

Bumping.

Will this be committed for 5.18 or not? The 2013-01-20 code freeze is
coming soon.

I'd say no. This really doesn't feel solid to me.

Leon

@p5pRT
Copy link
Author

p5pRT commented Jan 23, 2013

From @bulk88

Patch remade. Removed the controversial PL_curcop change in nextstate.
Please review and apply.
--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Jan 23, 2013

From @bulk88

0001-create-dPERL_ASYNC_CHECK-move-PERL_ASYNC_CHECKs-arou.patch
From 82e5000bbdd709ef9615d77845ce12fd3a77d2eb Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Wed, 23 Jan 2013 13:17:14 -0500
Subject: [PATCH] create dPERL_ASYNC_CHECK, move PERL_ASYNC_CHECKs around in
 pp_*

The purpose is to not read any interp members and set any autos before
the PERL_ASYNC_CHECK. This should reduce use of non-vol regs, or decrease
the stack frame of the opcode. On Win32 PERL_ASYNC_CHECK reads 2 interp
vars, Isys_intern.poll_count at 0x680/1664, Isig_pending at 0x27C/636
(these are bad positions for caching), but these 2 are never read again or
needed for any reason in the rest of the opcode. In the old code, dSP was
calculated and had to be saved across the potential win32_async_check
call. This now is not done.

Rather than open new scopes and whitespace changes, or separate out the
initializations from the declarations, a dPERL_ASYNC_CHECK is introduced.

On Visual C as an inline static, half the static dPERL_ASYNC_CHECKs
refused to inline so a different implementation was tried in win32.h.
Visual C has a __forceinline and a __inline. Win32 Perl currently uses
__inline. I choose to work around that and not touch or investigate
that issue.

The return 0 of S_async_check_get is meaningless. Some non x86 platforms
make it easier/faster to return a 0 than any other constant. U8 is
picked in case an actual 0 const has to be specified on a particular
CPU. U8 may or may not be zero extended to a machine word depending on ABI.
In any case, in theory all this should optimize away with the inlining
on a decent compiler. xPeRlAc was chosen from other examples of macros
that create private autos using almost random mixed case identifiers.
Ac=async check.

A potential idea is to convert the non Win32 static func
S_async_check_get version to a pure expression like the Win32 version.

PERL_ASYNC_CHECKs that were already at the end of a non opcode func, or
seemed to be called around things like IO that could raise signals, were
not changed. This patch was not tested on a Unix Perl, so bugs or less than
ideal behaviour or optimizations might occur.

Some size changes I observed on VC 2003 x86 32 bit. Some funcs had no
change in size since a nonvol reg is still saved and used later on
somewhere in the funcs body. Even with no size change, it should still be
a microscopic bit faster.

Perl_pp_dbstate no change
Perl_pp_leaveeval no change
Perl_pp_leavetry no change
Perl_pp_substcont 0x5BF to 0x5B2
Perl_pp_or no change
Perl_pp_subst 0xB89 to 0xB88
---
 inline.h      |   11 +++++++++++
 perl.h        |   10 ++++++++++
 pp_ctl.c      |   16 ++++++++--------
 pp_hot.c      |   11 ++++++-----
 win32/win32.h |    4 ++++
 5 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/inline.h b/inline.h
index 85bdc74..f5bf557 100644
--- a/inline.h
+++ b/inline.h
@@ -139,6 +139,17 @@ S_croak_memory_wrap(void)
 #pragma clang diagnostic pop
 #endif
 
+/* ------------------------------- perl.h ------------------------------- */
+
+/* part of dPERL_ASYNC_CHECK, return value is meaningless and should
+ * optimize away */
+PERL_STATIC_INLINE U8
+S_async_check_get(pTHX)
+{
+    PERL_ASYNC_CHECK();
+    return 0;
+}
+
 /* ------------------------------- utf8.h ------------------------------- */
 
 /* These exist only to replace the macros they formerly were so that their use
diff --git a/perl.h b/perl.h
index 29c4425..c5f919c 100644
--- a/perl.h
+++ b/perl.h
@@ -5497,6 +5497,16 @@ typedef struct am_table_short AMTS;
 #   define PERL_ASYNC_CHECK()  NOOP
 #endif
 
+#ifndef PERL_MICRO
+#  ifndef dPERL_ASYNC_CHECK
+#    define dPERL_ASYNC_CHECK U8 xPeRlAc PERL_UNUSED_DECL = S_async_check_get(aTHX)
+#  endif
+#else
+#  ifndef dPERL_ASYNC_CHECK
+#    define dPERL_ASYNC_CHECK dNOOP
+#  endif
+#endif
+
 /*
  * On some operating systems, a memory allocation may succeed,
  * but put the process too close to the system's comfort limit.
diff --git a/pp_ctl.c b/pp_ctl.c
index d5028c8..1501614 100644
--- a/pp_ctl.c
+++ b/pp_ctl.c
@@ -176,6 +176,7 @@ PP(pp_regcomp)
 PP(pp_substcont)
 {
     dVAR;
+    dPERL_ASYNC_CHECK;
     dSP;
     PERL_CONTEXT *cx = &cxstack[cxstack_ix];
     PMOP * const pm = (PMOP*) cLOGOP->op_other;
@@ -187,8 +188,6 @@ PP(pp_substcont)
     SV *nsv = NULL;
     REGEXP *old = PM_GETRE(pm);
 
-    PERL_ASYNC_CHECK();
-
     if(old != rx) {
 	if(old)
 	    ReREFCNT_dec(old);
@@ -1952,13 +1951,12 @@ PP(pp_reset)
 PP(pp_dbstate)
 {
     dVAR;
+    PERL_ASYNC_CHECK();
     PL_curcop = (COP*)PL_op;
     TAINT_NOT;		/* Each statement is presumed innocent */
     PL_stack_sp = PL_stack_base + cxstack[cxstack_ix].blk_oldsp;
     FREETMPS;
 
-    PERL_ASYNC_CHECK();
-
     if (PL_op->op_flags & OPf_SPECIAL /* breakpoint */
 	    || SvIV(PL_DBsingle) || SvIV(PL_DBsignal) || SvIV(PL_DBtrace))
     {
@@ -4174,7 +4172,9 @@ PP(pp_entereval)
 
 PP(pp_leaveeval)
 {
-    dVAR; dSP;
+    dVAR;
+    dPERL_ASYNC_CHECK;
+    dSP;
     SV **newsp;
     PMOP *newpm;
     I32 gimme;
@@ -4185,7 +4185,6 @@ PP(pp_leaveeval)
     SV *namesv;
     CV *evalcv;
 
-    PERL_ASYNC_CHECK();
     POPBLOCK(cx,newpm);
     POPEVAL(cx);
     namesv = cx->blk_eval.old_namesv;
@@ -4279,14 +4278,15 @@ PP(pp_entertry)
 
 PP(pp_leavetry)
 {
-    dVAR; dSP;
+    dVAR;
+    dPERL_ASYNC_CHECK;
+    dSP;
     SV **newsp;
     PMOP *newpm;
     I32 gimme;
     PERL_CONTEXT *cx;
     I32 optype;
 
-    PERL_ASYNC_CHECK();
     POPBLOCK(cx,newpm);
     POPEVAL(cx);
     PERL_UNUSED_VAR(optype);
diff --git a/pp_hot.c b/pp_hot.c
index ad99c42..198168d 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -481,8 +481,9 @@ PP(pp_preinc)
 
 PP(pp_or)
 {
-    dVAR; dSP;
-    PERL_ASYNC_CHECK();
+    dVAR;
+    dPERL_ASYNC_CHECK;
+    dSP;
     if (SvTRUE(TOPs))
 	RETURN;
     else {
@@ -2140,7 +2141,9 @@ pp_match is just a simpler version of the above.
 
 PP(pp_subst)
 {
-    dVAR; dSP; dTARG;
+    dVAR;
+    dPERL_ASYNC_CHECK;
+    dSP; dTARG;
     PMOP *pm = cPMOP;
     PMOP *rpm = pm;
     char *s;
@@ -2170,8 +2173,6 @@ PP(pp_subst)
     /* known replacement string? */
     SV *dstr = (pm->op_pmflags & PMf_CONST) ? POPs : NULL;
 
-    PERL_ASYNC_CHECK();
-
     if (PL_op->op_flags & OPf_STACKED)
 	TARG = POPs;
     else if (PL_op->op_private & OPpTARGET_MY)
diff --git a/win32/win32.h b/win32/win32.h
index 3065867..16423bd 100644
--- a/win32/win32.h
+++ b/win32/win32.h
@@ -465,6 +465,10 @@ DllExport int win32_async_check(pTHX);
 
 #define WIN32_POLL_INTERVAL 32768
 #define PERL_ASYNC_CHECK() if (w32_do_async || PL_sig_pending) win32_async_check(aTHX)
+/* PERL_STATIC_INLINE (__inline) on Visual C does not actually inline reliably
+   enough, so use an expression instead */
+#define dPERL_ASYNC_CHECK U8 xPeRlAc PERL_UNUSED_DECL =  \
+  (((w32_do_async || PL_sig_pending) ? win32_async_check(aTHX): 0), 0)
 
 #define w32_perlshell_tokens	(PL_sys_intern.perlshell_tokens)
 #define w32_perlshell_vec	(PL_sys_intern.perlshell_vec)
-- 
1.7.9.msysgit.0

@p5pRT
Copy link
Author

p5pRT commented Jan 28, 2013

From @iabyn

On Wed, Jan 23, 2013 at 03​:36​:00PM -0800, bulk88 via RT wrote​:

Patch remade. Removed the controversial PL_curcop change in nextstate.
Please review and apply.
[snip]
The purpose is to not read any interp members and set any autos before
the PERL_ASYNC_CHECK.

I'm sorry to say this again, but I'm really not keen on this patch.

It's another nano-optimisation, which doesn't seem to offer much in the
way of genuine performance improvement. You list the only two functions
that actually change in code size​:

  Perl_pp_substcont 0x5BF to 0x5B2
  Perl_pp_subst 0xB89 to 0xB88

Both these are heavyweight functions that call out to the regex engine,
and in the case of substcont, call out to perl code. Any performance gain
by reducing the code footprint of these functions (in terms of
not-normally-called bits of code) and thus the instruction cache misses
that might be avoided, is a drop in the ocean against the sheer amount of
instructions and data that these functions would consume.

Set against this are the rather hacky techniques needed to achieve this
modest goal​: a dFOO that doesn't actually declare any variables, and an
inline sub that returns a value that isn't used. Things that make the core
harder to understand are in general not a good idea, unless there's a
strong payback.

There's also the issue that the patch incorporates functional changes,
by changing the place where the async check is performed. The change in
position is only slight, and the chances are that it hasn't affected
anything; but anything that affects the behaviour of signal handlers tends
to be subtle and hard to debug.

So, my overall feelings are​:

1) this patch is not a good idea, because the additional code complexity
it introduces far outweighs any theoretical performance gains.

2) If the patch (or something similar) were to be applied, then it should
be applied after 5.18 has been released, to give it maximum time for
bedding in in development releases.

3) Ditto if it were applied, I would want the patch to have better code
comments​: for example a comment in perl.h to say *what* dPERL_ASYNC_CHECK
is for, e.g.

  /* dPERL_ASYNC_CHECK allows you to perform a PERL_ASYNC_CHECK within
  * the declaration part of function; under some circumstances this can
  * result in smaller and faster code */

Lastly, note that the patch as-is doesn't compile on linux​:

  cc -fstack-protector -L/usr/local/lib -o miniperl \
  perlmini.o opmini.o miniperlmain.o gv.o toke.o perly.o pad.o regcomp.o dump.o util.o mg.o reentr.o mro.o keywords.o hv.o av.o run.o pp_hot.o sv.o pp.o scope.o pp_ctl.o pp_sys.o doop.o doio.o regexec.o utf8.o taint.o deb.o universal.o globals.o perlio.o perlapi.o numeric.o mathoms.o locale.o pp_pack.o pp_sort.o -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
  pp_hot.o​: In function `S_async_check_get'​:
  /home/davem/perl5/git/bleed/inline.h​:149​: undefined reference to `PERL_ASYNC_CHECK'
  pp_ctl.o​: In function `S_async_check_get'​:
  /home/davem/perl5/git/bleed/inline.h​:149​: undefined reference to `PERL_ASYNC_CHECK'

At this point I should note that I really, *really* don't want to discourage
you from doing further such work​: you clearly have a good knowledge of the
low-level details of machine code and compiler output, and how to optimise
things; I just worry that too often these talents are mis-aimed at random
parts of the perl core that don't really warrant the attention.

--
Thank God I'm an atheist.....

@p5pRT
Copy link
Author

p5pRT commented Jan 28, 2013

From @bulk88

On Mon Jan 28 08​:12​:45 2013, davem wrote​:

On Wed, Jan 23, 2013 at 03​:36​:00PM -0800, bulk88 via RT wrote​:

Patch remade. Removed the controversial PL_curcop change in
nextstate.
Please review and apply.
[snip]
The purpose is to not read any interp members and set any autos
before
the PERL_ASYNC_CHECK.

I'm sorry to say this again, but I'm really not keen on this patch.

It's another nano-optimisation, which doesn't seem to offer much in
the
way of genuine performance improvement. You list the only two
functions
that actually change in code size​:

Perl\_pp\_substcont 0x5BF to 0x5B2
Perl\_pp\_subst     0xB89 to 0xB88

Both these are heavyweight functions that call out to the regex
engine,
and in the case of substcont, call out to perl code. Any performance
gain
by reducing the code footprint of these functions (in terms of
not-normally-called bits of code) and thus the instruction cache
misses
that might be avoided, is a drop in the ocean against the sheer amount
of
instructions and data that these functions would consume.

On other non-x86 CPUs or LLVM/clang (next gen compilers) or clang to JVM
or LLVM to JS to node.js there might be improvements for some reason or
another.

Set against this are the rather hacky techniques needed to achieve
this
modest goal​: a dFOO that doesn't actually declare any variables, and
an
inline sub that returns a value that isn't used. Things that make the
core
harder to understand are in general not a good idea, unless there's a
strong payback.

Comments were added to explain the hacks. I think its better than
whitespace changes. C89, etc.

There's also the issue that the patch incorporates functional changes,
by changing the place where the async check is performed. The change
in
position is only slight, and the chances are that it hasn't affected
anything; but anything that affects the behaviour of signal handlers
tends
to be subtle and hard to debug.

Safe signals are randomly fired. Previously async_check was called much
more often until
http​://perl5.git.perl.org/perl.git/commit/f410a2119920dd04690025a349e79575cfb9c972
. There were a couple places where it looks like async_check was being
called before/after an IO or IPC call on purpose. I didn't move those
for that reason.

______________________________________________________
PERL_ASYNC_CHECKs that were already at the end of a non opcode func, or
seemed to be called around things like IO that could raise signals, were
not changed.
______________________________________________________

So, my overall feelings are​:

1) this patch is not a good idea, because the additional code
complexity
it introduces far outweighs any theoretical performance gains.

2) If the patch (or something similar) were to be applied, then it
should
be applied after 5.18 has been released, to give it maximum time for
bedding in in development releases.

______________________________________________________________
  2013-01-20 5.17.8 Contentious changes freeze
  2013-02-20 5.17.9 User-visible changes freeze
  2013-03-20 5.17.10 Full code freeze
______________________________________________________________

I don't think this is user visible.

3) Ditto if it were applied, I would want the patch to have better
code
comments​: for example a comment in perl.h to say *what*
dPERL_ASYNC_CHECK
is for, e.g.

/\* dPERL\_ASYNC\_CHECK allows you to perform a PERL\_ASYNC\_CHECK

within
* the declaration part of function; under some circumstances this
can
* result in smaller and faster code */

Good point. I agree. I forgot. Someone could argue to use git blame, but
your right. That will have to be fixed by me.

Lastly, note that the patch as-is doesn't compile on linux​:

cc \-fstack\-protector \-L/usr/local/lib \-o miniperl \\
perlmini\.o opmini\.o miniperlmain\.o   gv\.o toke\.o perly\.o pad\.o

regcomp.o dump.o util.o mg.o reentr.o mro.o keywords.o hv.o av.o
run.o pp_hot.o sv.o pp.o scope.o pp_ctl.o pp_sys.o doop.o doio.o
regexec.o utf8.o taint.o deb.o universal.o globals.o perlio.o
perlapi.o numeric.o mathoms.o locale.o pp_pack.o pp_sort.o -lnsl
-ldl -lm -lcrypt -lutil -lpthread -lc
pp_hot.o​: In function `S_async_check_get'​:
/home/davem/perl5/git/bleed/inline.h​:149​: undefined reference to
`PERL_ASYNC_CHECK'
pp_ctl.o​: In function `S_async_check_get'​:
/home/davem/perl5/git/bleed/inline.h​:149​: undefined reference to
`PERL_ASYNC_CHECK'

Hmmm, I'll have to get a POSIX box running to reproduce/fix that. Will
take some time.

At this point I should note that I really, *really* don't want to
discourage
you from doing further such work​: you clearly have a good knowledge of
the
low-level details of machine code and compiler output, and how to
optimise
things;
I just worry that too often these talents are mis-aimed at
random
parts of the perl core that don't really warrant the attention.

They aren't random. My formula is to look at random machine code, find a
problem, then find all other instances of the "problem". The problem is
usually on a C or asm level. I don't venture into parts of the interp I
don't understand, so that is one reason my tweaks are small and not
reengineering the VM as a whole, or changing Perl 5 the lang (something
about swine, and aerodynamic lift).

I just worry that too often these talents are mis-aimed at
random
parts of the perl core that don't really warrant the attention.

I also try to invest time into things which I think nobody else will
ever do. I guess I am sucessful in doing things nobody else will ever do
(not strictly correct, Ive seen FC and Nicholas C do alot of small
random tweaks in the past (last 10 years)).

I've tried C profilers in the past but they were too noisy (the overhead
of 1 func call to the profiler per line of C makes timing for the whole
func useless)/I didn't configure them correctly. On my todo list is to
make a CPAN runloop replacer which times and categorizes each pp_* call.
Min, max, call count, and avg. Custom ops all grouped together. Timing
is probably rdtsc or OS equivalent.

I see Perl as more of a DB engine than a multimedia/math engine. I dont
think you can put SSE vector ops anywhere in the interp. If you can't
remove branches/logic (because you would loose the feature/api as a
whole), you might as well remove instructions (me) without removing API.

Alot of my tweaks aren't specific to Perl (RE, optree), but any C code.
I don't think I've ever added or removed a struct member to the interp
(tweak vs reengineering concept) for that reason (tweak vs reengineering
concept).

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Jan 28, 2013

From @demerphq

On 28 January 2013 19​:42, bulk88 via RT <perlbug-followup@​perl.org> wrote​:

On Mon Jan 28 08​:12​:45 2013, davem wrote​:

On Wed, Jan 23, 2013 at 03​:36​:00PM -0800, bulk88 via RT wrote​:

Patch remade. Removed the controversial PL_curcop change in
nextstate.
Please review and apply.
[snip]
The purpose is to not read any interp members and set any autos
before
the PERL_ASYNC_CHECK.

I'm sorry to say this again, but I'm really not keen on this patch.

It's another nano-optimisation, which doesn't seem to offer much in
the
way of genuine performance improvement. You list the only two
functions
that actually change in code size​:

Perl\_pp\_substcont 0x5BF to 0x5B2
Perl\_pp\_subst     0xB89 to 0xB88

Both these are heavyweight functions that call out to the regex
engine,
and in the case of substcont, call out to perl code. Any performance
gain
by reducing the code footprint of these functions (in terms of
not-normally-called bits of code) and thus the instruction cache
misses
that might be avoided, is a drop in the ocean against the sheer amount
of
instructions and data that these functions would consume.

On other non-x86 CPUs or LLVM/clang (next gen compilers) or clang to JVM
or LLVM to JS to node.js there might be improvements for some reason or
another.

Set against this are the rather hacky techniques needed to achieve
this
modest goal​: a dFOO that doesn't actually declare any variables, and
an
inline sub that returns a value that isn't used. Things that make the
core
harder to understand are in general not a good idea, unless there's a
strong payback.

Comments were added to explain the hacks. I think its better than
whitespace changes. C89, etc.

There's also the issue that the patch incorporates functional changes,
by changing the place where the async check is performed. The change
in
position is only slight, and the chances are that it hasn't affected
anything; but anything that affects the behaviour of signal handlers
tends
to be subtle and hard to debug.

Safe signals are randomly fired. Previously async_check was called much
more often until
http​://perl5.git.perl.org/perl.git/commit/f410a2119920dd04690025a349e79575cfb9c972
. There were a couple places where it looks like async_check was being
called before/after an IO or IPC call on purpose. I didn't move those
for that reason.

______________________________________________________
PERL_ASYNC_CHECKs that were already at the end of a non opcode func, or
seemed to be called around things like IO that could raise signals, were
not changed.
______________________________________________________

So, my overall feelings are​:

1) this patch is not a good idea, because the additional code
complexity
it introduces far outweighs any theoretical performance gains.

2) If the patch (or something similar) were to be applied, then it
should
be applied after 5.18 has been released, to give it maximum time for
bedding in in development releases.

______________________________________________________________
2013-01-20 5.17.8 Contentious changes freeze
2013-02-20 5.17.9 User-visible changes freeze
2013-03-20 5.17.10 Full code freeze
______________________________________________________________

I don't think this is user visible.

3) Ditto if it were applied, I would want the patch to have better
code
comments​: for example a comment in perl.h to say *what*
dPERL_ASYNC_CHECK
is for, e.g.

/\* dPERL\_ASYNC\_CHECK allows you to perform a PERL\_ASYNC\_CHECK

within
* the declaration part of function; under some circumstances this
can
* result in smaller and faster code */

Good point. I agree. I forgot. Someone could argue to use git blame, but
your right. That will have to be fixed by me.

Lastly, note that the patch as-is doesn't compile on linux​:

cc \-fstack\-protector \-L/usr/local/lib \-o miniperl \\
  perlmini\.o opmini\.o miniperlmain\.o   gv\.o toke\.o perly\.o pad\.o

regcomp.o dump.o util.o mg.o reentr.o mro.o keywords.o hv.o av.o
run.o pp_hot.o sv.o pp.o scope.o pp_ctl.o pp_sys.o doop.o doio.o
regexec.o utf8.o taint.o deb.o universal.o globals.o perlio.o
perlapi.o numeric.o mathoms.o locale.o pp_pack.o pp_sort.o -lnsl
-ldl -lm -lcrypt -lutil -lpthread -lc
pp_hot.o​: In function `S_async_check_get'​:
/home/davem/perl5/git/bleed/inline.h​:149​: undefined reference to
`PERL_ASYNC_CHECK'
pp_ctl.o​: In function `S_async_check_get'​:
/home/davem/perl5/git/bleed/inline.h​:149​: undefined reference to
`PERL_ASYNC_CHECK'

Hmmm, I'll have to get a POSIX box running to reproduce/fix that. Will
take some time.

At this point I should note that I really, *really* don't want to
discourage
you from doing further such work​: you clearly have a good knowledge of
the
low-level details of machine code and compiler output, and how to
optimise
things;
I just worry that too often these talents are mis-aimed at
random
parts of the perl core that don't really warrant the attention.

They aren't random. My formula is to look at random machine code, find a
problem, then find all other instances of the "problem". The problem is
usually on a C or asm level. I don't venture into parts of the interp I
don't understand, so that is one reason my tweaks are small and not
reengineering the VM as a whole, or changing Perl 5 the lang (something
about swine, and aerodynamic lift).

FWIW, I think the regex-engine-pig could be taught to fly. If you
might be interested in doing so I would be happy to explain what I
think is needed.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jan 28, 2013

From @bulk88

On Mon Jan 28 10​:42​:43 2013, bulk88 wrote​:

On Mon Jan 28 08​:12​:45 2013, davem wrote​:

I'm sorry to say this again, but I'm really not keen on this patch.

It's another nano-optimisation, which doesn't seem to offer much in
the
way of genuine performance improvement. You list the only two
functions
that actually change in code size​:

Perl\_pp\_substcont 0x5BF to 0x5B2
Perl\_pp\_subst     0xB89 to 0xB88

Both these are heavyweight functions that call out to the regex
engine,
and in the case of substcont, call out to perl code. Any performance
gain
by reducing the code footprint of these functions (in terms of
not-normally-called bits of code) and thus the instruction cache
misses
that might be avoided, is a drop in the ocean against the sheer
amount
of
instructions and data that these functions would consume.

On other non-x86 CPUs or LLVM/clang (next gen compilers) or clang to
JVM
or LLVM to JS to node.js there might be improvements for some reason
or
another.

TLDR​: In some cases, code size in this patch, will only decrease, if
other, currently unknown/planned parts of the function where
PERL_ASYNC_CHECK was moved, get slimmed down, at some unknown point in
the future.

Long version​: Another possibility there was no change in instruction
size from the async check move was, there was a peak in count of live
vars in the middle or towards the end of the function. Or moving the dSP
didn't prevent local SP from being written to C stack since there was
another function after the potential async check call that local SP had
to be saved across. Non-vol regs, atleast in VC, I think are saved once
at the beginning of the function, as many as are needed throughout the
entire function, (no alloca style stuff behind the scenes when you
declare 256 byte char array inside an if block, that char array was
allocated at function entry, not block entry). The compiler won't put
extra save x86 non-vols instructions every time you declare a new var in
a block or the simultaneous liveness increases. Also, once you overflow
your vol and non-vol regs, then C stack starts being used. There is no
instruction length difference between a function that uses 4 bytes of c
stack for autos and 128 bytes of autos. Its still a signed char in the
machine instruction. Past 128 bytes gets complicated. Some compilers
will use signed ebp/esp offsets, and position ebp/esp at +128 from
return address (actually -128 since C stack grows down), then use for
example +16 or -20 offsets (GCC/Intel C-ish compiler) to access autos.
Other compilers after 128 bytes of C autos switch to 32 bit offsets (old
fashioned MS VC) to keep ebp at the start of the stack frame infront of
the return address (stack grow up words) at all cost (usually to enable
symbol-less callstacks to look somewhat normal, also see frame pointer
omission optimization's pros and cons). Note, a compiler has a choice
whether to switch to 32 bit offsets when C autos > 128, C autos+ret
addr+saved non vol regs+caller's args > 128, C autos > 256, C autos+ret
addr+saved non vol regs+caller's args > 256. Since GCC saves non-vols
with pop/push, yet uses mov to create child call args instead of
pop/push, it doesn't necessarily need the non-vols to be addressable
with a char offset and mov, since they are must preserve garbage, in
practice.
--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Aug 1, 2013

From @tonycoz

On Mon Jan 28 10​:42​:43 2013, bulk88 wrote​:

pp\_hot\.o&#8203;: In function \`S\_async\_check\_get'&#8203;:
/home/davem/perl5/git/bleed/inline\.h&#8203;:149&#8203;: undefined reference to

`PERL_ASYNC_CHECK'
pp_ctl.o​: In function `S_async_check_get'​:
/home/davem/perl5/git/bleed/inline.h​:149​: undefined reference to
`PERL_ASYNC_CHECK'

Hmmm, I'll have to get a POSIX box running to reproduce/fix that. Will
take some time.

This works on Win32 but fails elsewhere because win32/win32.h defines
its own PERL_ASYNC_CHECK(), *and* the perl.h PERL_ASYNC_CHECK definition
is after the #include for inline.h, so it's not defined while the inline
functions are being compiled.

One thing I noticed, you've performed this optimization for pp_or, did
you consider the same optimization you used in
4cc783e instead?

Since the patch, as it stands, breaks the build and provides little
benefit, I'm rejecting it.

Tony

@p5pRT
Copy link
Author

p5pRT commented Aug 1, 2013

From @bulk88

On Wed Jul 31 23​:40​:26 2013, tonyc wrote​:

One thing I noticed, you've performed this optimization for pp_or, did
you consider the same optimization you used in
4cc783e instead?

If I had time to do the pp_or thing to pp_and, it would be in a
different patch. This one was just about moving ASYNC_CHECKs to minimize
variable liveness.

--
bulk88 ~ bulk88 at hotmail.com

1 similar comment
@p5pRT
Copy link
Author

p5pRT commented Aug 1, 2013

From @bulk88

On Wed Jul 31 23​:40​:26 2013, tonyc wrote​:

One thing I noticed, you've performed this optimization for pp_or, did
you consider the same optimization you used in
4cc783e instead?

If I had time to do the pp_or thing to pp_and, it would be in a
different patch. This one was just about moving ASYNC_CHECKs to minimize
variable liveness.

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Aug 22, 2013

From @tonycoz

On Wed Jul 31 23​:40​:26 2013, tonyc wrote​:

Since the patch, as it stands, breaks the build and provides little
benefit, I'm rejecting it.

And rejecting the ticket too.

Tony

@p5pRT
Copy link
Author

p5pRT commented Aug 22, 2013

@tonycoz - Status changed from 'open' to 'rejected'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant