[PATCH] rearrange initializations in S_intuit_more for better code gen #12710

Closed
p5pRT opened this issue Jan 17, 2013 · 9 comments
p5pRT commented Jan 17, 2013

Migrated from rt.perl.org#116443 (status was 'resolved')

Searchable as RT116443$


p5pRT commented Jan 17, 2013

From @bulk88

See attached patch.


p5pRT commented Jan 17, 2013

From @bulk88

0001-rearrange-initializations-in-S_intuit_more-for-bette.patch
From 7eff62ccd04b51539468d56f9d76f9753c42470c Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Thu, 17 Jan 2013 18:38:07 -0500
Subject: [PATCH] rearrange initializations in S_intuit_more for better code
 gen

MS VC 2003, in my experience, does not reorder variable initializations
with constants to reduce their liveness. This commit attempts to defer
each initialization until right before the variable is first used. I can't
explain exactly why fewer instructions or shorter addressing resulted,
since I didn't record what the asm looked like before. With VC 2003 -O1
-GL, S_intuit_more was previously 0x4B5 bytes of 32-bit machine code;
after this change it is 0x4A3 bytes. These changes should have no
user-visible effect.

The scope of the vars was not reduced to avoid large indentation changes
which would be required under C89 and Perl code formatting policy.
---
 toke.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/toke.c b/toke.c
index 987a68d..411fb42 100644
--- a/toke.c
+++ b/toke.c
@@ -3831,16 +3831,16 @@ S_intuit_more(pTHX_ char *s)
 	return FALSE;
     else {
         /* this is terrifying, and it works */
-	int weight = 2;		/* let's weigh the evidence */
+	int weight;
 	char seen[256];
-	unsigned char un_char = 255, last_un_char;
 	const char * const send = strchr(s,']');
+	unsigned char un_char, last_un_char;
 	char tmpbuf[sizeof PL_tokenbuf * 4];
 
 	if (!send)		/* has to be an expression */
 	    return TRUE;
+	weight = 2;		/* let's weigh the evidence */
 
-	Zero(seen,256,char);
 	if (*s == '$')
 	    weight -= 3;
 	else if (isDIGIT(*s)) {
@@ -3851,6 +3851,8 @@ S_intuit_more(pTHX_ char *s)
 	    else
 		weight -= 100;
 	}
+	Zero(seen,256,char);
+	un_char = 255;
 	for (; s < send; s++) {
 	    last_un_char = un_char;
 	    un_char = (unsigned char)*s;
-- 
1.7.9.msysgit.0


p5pRT commented Jan 18, 2013

From @iabyn

On Thu, Jan 17, 2013 at 03:40:55PM -0800, bulk88 wrote:

Subject: [PATCH] rearrange initializations in S_intuit_more for better code

I really don't understand the point of nano-optimisations such as this.

You've removed (for a particular compiler), 19 bytes of code from a
function within an object file that is 340kb. This function is only used
when parsing double-quotish strings in perl, and indeed the part of the
function you've modified is only used in those relatively rare cases where
perl has to disambiguate within a regex, whether in /$a[.../ the open
bracket is an array subscript or the start of a character class. So it's
about as un-hot a function as you could imagine.

I would estimate that any performance improvements are indistinguishable
from zero.

The change itself is so subtle that the next change to this function
is likely to undo that byte reduction, without anyone being aware that
they have done so.

--
"You may not work around any technical limitations in the software"
  -- Windows Vista license


p5pRT commented Jan 18, 2013

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented Jan 18, 2013

From @doughera88

On Thu, 17 Jan 2013, bulk88 wrote:

# New Ticket Created by bulk88
# Please include the string: [perl #116443]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=116443 >

Thanks, applied as commit 99f2bdb.

With both gcc and Sun's C, this patch makes no difference for optimized
builds (i.e. the object file is exactly the same length). I did debate
whether the mild code churn was worth the 18 bytes it saved on VC 2003,
but did decide to apply it in the end.

Thanks,

--
  Andy Dougherty doughera@lafayette.edu


p5pRT commented Jan 18, 2013

From @doughera88

On Fri, 18 Jan 2013, Dave Mitchell wrote:

On Thu, Jan 17, 2013 at 03:40:55PM -0800, bulk88 wrote:

Subject: [PATCH] rearrange initializations in S_intuit_more for better code

I really don't understand the point of nano-optimisations such as this.

You've removed (for a particular compiler), 19 bytes of code from a
function within an object file that is 340kb. This function is only used
when parsing double-quotish strings in perl, and indeed the part of the
function you've modified is only used in those relatively rare cases where
perl has to disambiguate within a regex, whether in /$a[.../ the open
bracket is an array subscript or the start of a character class. So it's
about as un-hot a function as you could imagine.

I would estimate that any performance improvements are indistinguishable
from zero.

[Sorry I didn't see this message before I committed the patch.]

I agree. I'd also point out that there are hidden costs to such
nano-optimisations. There is the developer time spent inspecting,
applying, and testing the patch. The code churn also may cost future
developers time as they try to navigate back through git blame output to
figure out why things are the way they are.

In this case, the function in question has barely changed since 1993, so I
didn't think the code churn was a big issue. The patch also did move
initializations closer to where the variables are actually used, which may
help clarity a little as well, so I went ahead and applied it.

In the future, though, I think I will probably look for more evidence of a
significant benefit before investing my time.

--
  Andy Dougherty doughera@lafayette.edu


p5pRT commented Jan 18, 2013

@doughera88 - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Jan 18, 2013

p5pRT commented Jan 18, 2013

From @bulk88

On Fri Jan 18 07:02:01 2013, davem wrote:

I really don't understand the point of nano-optimisations such as
this.

You've removed (for a particular compiler), 19 bytes of code from a
function within an object file that is 340kb. This function is only
used
when parsing double-quotish strings in perl, and indeed the part of
the
function you've modified is only used in those relatively rare cases
where
perl has to disambiguate within a regex, whether in /$a[.../ the open
bracket is an array subscript or the start of a character class. So
it's
about as un-hot a function as you could imagine.

It was picked because it calls memset; I was following an idea in Perl
RT #36078. Hotness was not a consideration.

I would estimate that any performance improvements are
indistinguishable
from zero.

This change is unmeasurably small, but every little bit counts.
Progressively combing through all the C code that makes up the interp
will produce a better engine in the end.

The change itself is so subtle that the next change to this function
is likely to undo that byte reduction, without anyone being aware that
they have done so.

I agree the chance of the optimization undoing itself is high. But the
likelihood of someone touching that code is low, so I think it will
be around for a while (months/years). The optimization took a couple
of seconds for me to recognize and a couple of minutes for the rest.
Running a limited harness run took 75% of the time. Writing this
response has taken more time than the patch.

On Fri Jan 18 07:34:41 2013, doughera wrote:

Thanks, applied as commit 99f2bdb.

With both gcc and Sun's C, this patch makes no difference for optimized
builds (i.e. the object file is exactly the same length). I did debate
whether the mild code churn was worth the 18 bytes it saved on VC 2003,
but did decide to apply it in the end.

Thanks,

The object or executable image file might be exactly the same size if
the compiler aligns functions and instructions. My VC 2003 at -O1
almost never places nop instructions (I've only seen nops in front of
jump tables with my VC). In my Strawberry Perl binaries (Strawberry
Perl 5.12.3.0, whose default is -O2), all function entry points are
aligned to 16 bytes, and occasionally locations inside functions are
too. If a compiler's peephole optimizer is better than VC's, or the
compiler heavily pads with alignment instructions, there will be no
change in object file or executable image size; the removed
instructions would probably just be filled in with nops. On disk,
Win32 DLLs (PE files) are aligned to 4096 bytes, so unless a 4096-byte
boundary is crossed I don't see a difference in the on-disk size of
the DLL (Visual C versions later than 2003, which dropped support for
Win 98, align to 512 bytes on disk:
http://msdn.microsoft.com/en-us/library/bxwfs976%28v=vs.80%29.aspx ). I
do see a difference in "Virtual Size" down to a couple of bytes with
each tweak. Late in 2012, Perl blead fell from 956KB to 952KB on disk
on my machine, so very slowly the interp is getting smaller. Things like
http://perl5.git.perl.org/perl.git/commit/4a9a56a75c57646b348598f90875415e994d2f65
also help.

If the function were smaller than a page/screen I would have reindented
it, but in this case too much history (2-3 pages of code changes) would
be lost if I did that, so I didn't; this is mentioned in the commit
message. Does anyone think my balance between wiping history and the
risk of uninitialized vars in the future should shift toward one side
or the other, or is it fine?

On Fri Jan 18 08:03:30 2013, doughera wrote:

In the future, though, I think I will probably look for more evidence
of a
significant benefit before investing my time.

Some porters have the opinion that my commits are too long and too
general, and should be much smaller for bisecting. Should I group more
changes into larger commits in the future?

--
bulk88 ~ bulk88 at hotmail.com


p5pRT commented Jan 18, 2013

From @doughera88

On Fri, 18 Jan 2013, bulk88 via RT wrote:

On Fri Jan 18 07:02:01 2013, davem wrote:

I would estimate that any performance improvements are
indistinguishable
from zero.

This change is unmeasurably small, but every little bit counts.
Progressively combing through all the C code that makes up the interp
will produce a better engine in the end.

But every little bit also has a cost, and those add up too.

The change itself is so subtle that the next change to this function
is likely to undo that byte reduction, without anyone being aware that
they have done so.

I agree the chance of the optimization undoing itself is high. But the
likelihood of someone touching that code is low, so I think it will
be around for a while (months/years). The optimization took a couple
of seconds for me to recognize and a couple of minutes for the rest.
Running a limited harness run took 75% of the time. Writing this
response has taken more time than the patch.

You also have to include all the time for someone else to review the
patch, apply it, run all the tests, push it to the repository, and then
monitor the smoke tests to make sure nothing broke.

On Fri Jan 18 07:34:41 2013, doughera wrote:

Thanks, applied as commit 99f2bdb.

With both gcc and Sun's C, this patch makes no difference for optimized
builds (i.e. the object file is exactly the same length). I did debate
whether the mild code churn was worth the 18 bytes it saved on VC 2003,
but did decide to apply it in the end.

Thanks,

The object or executable image file might be exactly the same size if
the compiler aligns functions and instructions.
My VC 2003 at -O1 almost never places nop instructions (I've only seen
nops in front of jump tables with my VC).

It also might be exactly the same size if the compiler optimizer
rearranges things to ultimately generate the same set of underlying
instructions. In addition to measuring that it "makes no difference"
in the size of the object files, I actually also looked at the generated
assembly files. The instructions were moved around, and the labels
changed, but there seemed to be no net reduction. On SPARC, which does
like to align all sorts of things, there were exactly as many 'nop'
instructions before the patch as after the patch. VC 2003 may well
act differently, of course.

On Fri Jan 18 08:03:30 2013, doughera wrote:

In the future, though, I think I will probably look for more evidence
of a
significant benefit before investing my time.

Some porters have the opinion that my commits are too long and too
general, and should be much smaller for bisecting. Should I group more
changes into larger commits in the future?

No. Grouping multiple independent nano-optimizations together wouldn't
help. They would still need to be individually reviewed, and they should
still be applied as independent logical commits. I suppose if they are
all queued up and applied sequentially, the final test/push/smoke cycle
could be run just once, which would save some time if all goes well.

My bigger point was that I don't think it's worth investing the time in
these nano-optimizations in parts of the code that are rarely called.
If you want someone to look at a proposed optimization, you'll have to do
a better job explaining why it's worthwhile to do so.

--
  Andy Dougherty doughera@lafayette.edu
