Slow map - patch + discussion #1890

Closed
p5pRT opened this issue Apr 25, 2000 · 15 comments


p5pRT commented Apr 25, 2000

Migrated from rt.perl.org#3158 (status was 'resolved')

Searchable as RT3158$


p5pRT commented Apr 25, 2000

From @btilly

This is an update on my earlier bug report. After reading the "This Week on
P5P" output by Mark-Jason Dominus I felt sheepish for offering complaints
and no patches. So here is a full response to everything that appears in:

http://www.perl.com/pub/2000/04/p5pdigest/THISWEEK-20000423.html#Pseudohash_Field_Names_Hash_Performance_and_map_Performance

So, an update on my complaints from last week, and a patch.

First of all, I was exactly right about what was going wrong with map() and I
am attaching a patch at the end of this email. My patch is not ideal, since
how much you resize the stack should really depend upon how many times you
have had to resize it already. But it is a significant improvement, and it
perfectly solves the common array2hash 2-for-1 map.
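
(For concreteness, the "array2hash 2-for-1 map" here is presumably the usual
idiom along the lines of

  my %hash = map { $_ => lookup($_) } @keys;

with lookup() as a stand-in for whatever computes the value: each argument
consumed produces two result items, so the result list ends up twice as long
as the argument list.)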

Secondly there is a patch in the Changes for 5.6 that solves the missing
field-names in pseudo-hash warnings. If there is ever a 5.005_04 I really
want that change in there. (As well as the map() change that I am
including.)

Thirdly, I had an off-hand comment about making the hashing algorithm
fail gracefully if the hashing function was not working well. Well, I did a
lot of looking at Perl's hashing algorithm in hv.c and hv.h, and I am not
sure it is feasible. That is, enough of the API is exposed in the HE and
HEK structs that there would be a huge overhead in making a bucket anything
other than a plain linked list. Oh, you could do it. For instance, if
(he)->hent_hek is NULL then follow (he)->hent_next to a node in a balanced
binary tree. But it would be a lot of overhead, and I for one don't feel
comfortable suggesting a change like that. (Does everyone consistently use
the convenience macros?)

BTW, something random I noticed: wouldn't the hash value be better in HE
than in HEK? Too late to change it now, but since compilers like to allocate
things in powers of 2, that should save a couple of pointers per hash
entry, shouldn't it?

Cheers,
Ben

PS The patch​:

Inline Patch
--- pp_ctl.c.bak        Tue Mar 21 00:42:26 2000
+++ pp_ctl.c    Tue Apr 25 04:03:36 2000
@@ -736,6 +736,8 @@
        if (diff > PL_markstack_ptr[-1] - PL_markstack_ptr[-2]) {
            shift = diff - (PL_markstack_ptr[-1] - PL_markstack_ptr[-2]);
            count = (SP - PL_stack_base) - PL_markstack_ptr[-1] + 2;
+           if (shift < count)
+               shift = count; /* Avoid shifting too often */

            EXTEND(SP,shift);
            src = SP;



p5pRT commented Apr 25, 2000

From [Unknown Contact. See original ticket]

Someone please give this man a medal​: for persistence, well-spokenness,
and now results! This has got to be one of the most useful (and short!)
patches I've seen here in a while. Would someone with more wits and
tuits about them like to comment on a perhaps better solution, if there
is one, to the slow map problem?

Thanks,

John.


p5pRT commented May 31, 2000

From @gsar

On Tue, 25 Apr 2000 15​:49​:05 EDT, "Ben Tilly" wrote​:

First of all I was exactly right about what was going wrong with map() and I
am attaching a patch at the end of this emaill. My patch is not ideal since
how much you resize the stack should depend upon how many times you have had
to resize it already. But it is a significant improvement and perfectly
solves the common array2hash 2-for-1 map.
[...]
PS The patch​:

--- pp_ctl.c.bak        Tue Mar 21 00:42:26 2000
+++ pp_ctl.c    Tue Apr 25 04:03:36 2000
@@ -736,6 +736,8 @@
        if (diff > PL_markstack_ptr[-1] - PL_markstack_ptr[-2]) {
            shift = diff - (PL_markstack_ptr[-1] - PL_markstack_ptr[-2]);
            count = (SP - PL_stack_base) - PL_markstack_ptr[-1] + 2;
+           if (shift < count)
+               shift = count; /* Avoid shifting too often */

            EXTEND(SP,shift);
            src = SP;

I took a quick look at this issue earlier, and it seems that better
approaches are possible. But I haven't had time to do anything about
it. I'm going to put down my thoughts in the hope that someone will
be able to work on a better patch for this.

On entry into pp_mapwhile(), the stack looks like this​:

  x|y|z|-|p|q|r|1|2|3

"x".."z" are the accumulated result of the map() so far. "-" is the
source item during the previous iteration (i.e. the erstwhile $_).
"p".."r" are what's left of the argument list. 1..3 is the list that
the map block returned in the previous iteration (which needs to be
added to the end of "x".."z" before the pp_mapwhile() returns).

So on exit from pp_mapwhile(), the stack needs to look like this​:

  x|y|z|1|2|3|p|q|r

Now, the current implementation gets there by doing this (assume there are
3 result items)​:

  start​: x|y|z|-|p|q|r|1|2|3
  extend stack by 2​: x|y|z|-|p|q|r|1|2|3|#|#
  move src+result down by 2​: x|y|z|-|#|#|p|q|r|1|2|3
  copy results up, popping stack​: x|y|z|-|#|3|p|q|r|1|2
  x|y|z|-|2|3|p|q|r|1
  end​: x|y|z|1|2|3|p|q|r

Your modification does an (almost arbitrary) overextension of the stack
based on the number of items that are being moved down in step 3 above,
so that steps 2 and 3 don't happen very often during subsequent
iterations​:

  start​: x|y|z|-|p|q|r|1|2|3
  extend stack by 6​: x|y|z|-|p|q|r|1|2|3|#|#|#|#|#|#
  move src+result down by 6​: x|y|z|-|#|#|#|#|#|#|p|q|r|1|2|3
  copy results up, popping stack​: x|y|z|-|#|3|#|#|#|#|p|q|r|1|2
  x|y|z|-|2|3|#|#|#|#|p|q|r|1
  end​: x|y|z|1|2|3|#|#|#|#|p|q|r

That makes the next pp_mapwhile() look like this​:

  start​: x|y|z|1|2|3|#|#|#|#|-|q|r|4|5|6
  copy results up, popping stack​: x|y|z|1|2|3|#|#|6|#|-|q|r|4|5
  x|y|z|1|2|3|#|5|6|#|-|q|r|4
  end​: x|y|z|1|2|3|4|5|6|#|-|q|r

So in the ideal case where the stack extension is only done once, and
is sufficient for the entire result list, performance is
O(cumulative size of result list), which is pretty good. The problem
of course is in predicting the ideal stack extension.

Assuming that it is difficult to predict the ideal amount of stack
overextension, an acceptable approach may be to not extend the stack
at all. When there are more elements in the argument list than
in the result list, we can do something like this​:

  start​: x|y|z|-|p|q|r|1|2|3
  special swap for $_​: x|y|z|1|p|q|r|-|2|3
  (now we move -,r,q,p to the back (in that order) with swaps)
  swap 1​: x|y|z|1|p|q|r|3|2|-
  swap 2​: x|y|z|1|p|q|2|3|r|-
  swap 3​: x|y|z|1|p|3|2|q|r|-
  swap 4​: x|y|z|1|2|3|p|q|r|-
  pop and end​: x|y|z|1|2|3|p|q|r

If there are more elements in the result list than in the argument list,
the swaps will need to go the other way​:

  start​: x|y|z|-|p|1|2|3|4|5
  special swap for $_​: x|y|z|1|p|-|2|3|4|5
  (now we move 2,3,4,5 to the front (in that order) with swaps)
  swap 1​: x|y|z|1|2|-|p|3|4|5
  swap 2​: x|y|z|1|2|3|p|-|4|5
  swap 3​: x|y|z|1|2|3|4|-|p|5
  swap 4​: x|y|z|1|2|3|4|5|p|-
  pop and end​: x|y|z|1|2|3|4|5|p

The number of swaps needed is obviously linear per iteration--it is
either the number of elements in the argument list, or the number of
elements in the result list, whichever is greater. This will in theory
have worse performance than the predict-the-ideal-stack-extension
approach when the number of arguments ("p".."r") far exceeds the size
of the result list (1..3), but it could potentially win in practice
because no stack extension is ever needed.

A different approach would be to have better prediction for the amount
of stack overextension needed. The more or less obvious​:

                  number_of_args_remaining * size_of_results_list_so_far
  overextension = ------------------------------------------------------
                               number_of_args_processed

may be the best we can do here.

A third approach is to make map() use a distinct array/stack for the
argument list ("p".."r") rather than putting it all on the stack where
it keeps getting in the way. I suspect this may turn out to be the
best solution in the end.

Hope all this helps someone explore the best solution for this.

Sarathy
gsar@​ActiveState.com


p5pRT commented May 31, 2000

From [Unknown Contact. See original ticket]

Gurusamy Sarathy writes​:

On entry into pp_mapwhile(), the stack looks like this​:

  x|y|z|-|p|q|r|1|2|3

"x".."z" are the accumulated result of the map() so far. "-" is the
source item during the previous iteration (i.e. the erstwhile $_).
"p".."r" are what's left of the argument list. 1..3 is the list that
the map block returned in the previous iteration (which needs to be
added to the end of "x".."z" before the pp_mapwhile() returns).

So on exit from pp_mapwhile(), the stack needs to look like this​:

x|y|z|1|2|3|p|q|r

Why all this? Why not ride on the standard stack-extension logic?

On entry have

  o|_|p|q|x|y|z|

on exit have

  o|_|p|q|x|y|z|1|2|3|

Then move all these guys down at the end of map. Pre-extending the
stack could make it almost as efficient as the current method. This
may require twice as much stack as the current case, but to compensate one
can make an optimization and not move @arr to the stack at all for

  map CODE, @arr;

Ilya


p5pRT commented May 31, 2000

From @gsar

On Wed, 31 May 2000 17​:31​:27 EDT, Ilya Zakharevich wrote​:

Why all this? Why not ride on the standard stack-extension logic?

On entry have

o|_|p|q|x|y|z|

on exit have

o|_|p|q|x|y|z|1|2|3|

Then move all these guys down at the end of map. Pre-extending the
stack could make it almost as efficient as the current method. This
may require twice as much stack as the current case, but to compensate one
can make an optimization and not move @arr to the stack at all for

  map CODE, @arr;

That was more or less my third suggestion. Even when passed a list
rather than an array, we should be able to SWITCHSTACK() on a temporary
array to receive the incoming arguments so that the results for every
iteration simply accumulate at the end of the overall result list.

Sarathy
gsar@​ActiveState.com


p5pRT commented May 31, 2000

From @vanstyn

In <200005312103.OAA11023@​molotok.activestate.com>, Gurusamy Sarathy writes​:
:A different approach would be to have better prediction for the amount
:of stack overextension needed. The more or less obvious​:
:
: number_of_args_remaining * size_of_results_list_so_far
: overextension = ------------------------------------------------------
: number_of_args_processed
:
:may be the best we can do here.

I think this is too aggressive. It works well in many common cases, but
when the map results are heavily skewed toward the front of the list
it gets very unreasonable​: if the first element of n maps to (n+1)
results, we immediately allow for (n+1)^2 total.

Pulling some numbers out of the air, I would suggest​:

if number_of_args_processed < 10
  overextension = number_of_args_remaining + size_of_results_list_so_far
else
                  number_of_args_remaining * size_of_results_list_so_far
  overextension = ------------------------------------------------------
                               number_of_args_processed

.. ie double the total size allocated whenever extension is needed, until
we've processed enough arguments to make the aggressive option safer.
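
A rough C sketch of that heuristic, using made-up variable names rather than
the real pp_mapwhile() ones:

    /* Hypothetical sketch only; these names do not correspond to pp_ctl.c. */
    long
    guess_overextension(long args_processed, long args_remaining,
                        long results_so_far)
    {
        if (args_processed < 10)
            /* early on, roughly double the space consumed so far */
            return args_remaining + results_so_far;
        /* later, extrapolate from the expansion ratio observed so far */
        return (args_remaining * results_so_far) / args_processed;
    }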

Hugo


p5pRT commented May 31, 2000

From [Unknown Contact. See original ticket]

On Wed, May 31, 2000 at 03​:15​:58PM -0700, Gurusamy Sarathy wrote​:

On entry have

o|_|p|q|x|y|z|

on exit have

o|_|p|q|x|y|z|1|2|3|

Then move all these guys down at the end of map.

That was more or less my third suggestion. Even when passed a list
rather than an array, we should be able to SWITCHSTACK() on a temporary
array to receive the incoming arguments so that the results for every
iteration simply accumulate at the end of the overall result list.

This is an extra malloc/free which may be significant for short lists.

Another idea: do as now (overwrite "used" input arguments) while the
output is not longer than the input. The moment the output becomes
longer, switch to the above strategy, where results accumulate at the
end.

Looks optimal in all cases...

Ilya


p5pRT commented May 31, 2000

From [Unknown Contact. See original ticket]

Ilya Zakharevich <ilya@​math.ohio-state.edu> writes​:
|| Another idea​: do as now (overwrite "used" input arguments) while the
|| output is not longer than input. The moment the output becomes
|| longer, switch to the above strategy, when results accumulate at the
|| end.
||
|| Looks optimal in all the cases...

After such a switch (to storing results elsewhere, whether it is at
the end of the stack or a separate area) there is the possibility of
switching back to filling the "used" slots. Whenever there is a need
to grow the alternate destination area, check to see whether the
current chunk of "used" slots is bigger than the current accumulation
of results. If so, copy them over (that's work that was going to
have to be done eventually anyhow, and since the entire collection
can be dealt with now there won't be any wasted copies going
somewhere other than their final destination). Don't free the
alternate destination at this time; if the "used" slots run out
again, it will still be around. It can be released at the end of the
entire map operation.


p5pRT commented May 31, 2000

From @btilly

Gurusamy Sarathy wrote​:

On Tue, 25 Apr 2000 15​:49​:05 EDT, "Ben Tilly" wrote​:

First of all, I was exactly right about what was going wrong with map() and I
am attaching a patch at the end of this email. My patch is not ideal, since
how much you resize the stack should really depend upon how many times you
have had to resize it already. But it is a significant improvement, and it
perfectly solves the common array2hash 2-for-1 map.
[...]

After posting that I realized that it actually *is* ideal. :-)

PS The patch​:

--- pp_ctl.c.bak        Tue Mar 21 00:42:26 2000
+++ pp_ctl.c    Tue Apr 25 04:03:36 2000
@@ -736,6 +736,8 @@
        if (diff > PL_markstack_ptr[-1] - PL_markstack_ptr[-2]) {
            shift = diff - (PL_markstack_ptr[-1] - PL_markstack_ptr[-2]);
            count = (SP - PL_stack_base) - PL_markstack_ptr[-1] + 2;
+           if (shift < count)
+               shift = count; /* Avoid shifting too often */

            EXTEND(SP,shift);
            src = SP;

I took a quick look at this issue earlier, and it seems that better
approaches are possible. But I haven't had time to do anything about
it. I'm going to put down my thoughts in the hope that someone will
be able to work on a better patch for this.

That is what I thought, but AFAICS there is no solution that is so
much better that it is worth looking at. I thought about this for
a while and it looked like there should be one, but I really think
that there isn't.

First of all, the most common case in my code where the size
of the map increases is 2 for 1, constructing a hash with map.
Certainly there is no faster way for this case, assuming that this
allocation (which is smaller than the current stack) is acceptable.

Leaving that aside, attempting to be smarter means needing to
keep track of at least some extra information. This involves
passing (or processing) additional arguments. Passing arguments
adds a cost that is O(number of starting elements) and adds it to
all cases, including ones that are fine under the current version.
And what kind of win do you get? In terms of recopying inside the
mapwhile loop, it is O(number of elements returned). (Something
may be copied into and out of the location once before the final
element gets copied there.)

So if you generate large numbers of extra elements, the cost of
moving stuff is lost in the cost of generating those elements so
this case is not worth considering. :-)

This is, of course, assuming that extending the stack many times
is itself handled efficiently (e.g. by working with powers of 2,
which again results in a bounded, fixed, and fairly small cost per
element finally allocated in the list).

If so then even in the worst case the overhead of copying is lost
in noise.

In short, the currently efficient case is untouched.

My common complaint case is optimized.

In the drastically larger list case, the method has overhead below
the cost of creating that list.

The only potential significant loss is if the stack extend caused
the stack to be copied and it was extended too much. Well you
cannot be perfect. :-)

And trying to be smart means tracking more data which itself is an
additional cost on par with the cost of the algorithm!

[...]

Assuming that it is difficult to predict the ideal amount of stack
overextension, an acceptable approach may be to not extend the stack
at all. When there are more elements in the argument list than
in the result list, we can do something like this​:

  start:                          x|y|z|-|p|q|r|1|2|3
  special swap for $_:            x|y|z|1|p|q|r|-|2|3
  (now we move -,r,q,p to the back (in that order) with swaps)
  swap 1:                         x|y|z|1|p|q|r|3|2|-
  swap 2:                         x|y|z|1|p|q|2|3|r|-
  swap 3:                         x|y|z|1|p|3|2|q|r|-
  swap 4:                         x|y|z|1|2|3|p|q|r|-
  pop and end:                    x|y|z|1|2|3|p|q|r

If there are more elements in the result list than in the argument list,
the swaps will need to go the other way​:

  start:                          x|y|z|-|p|1|2|3|4|5
  special swap for $_:            x|y|z|1|p|-|2|3|4|5
  (now we move 2,3,4,5 to the front (in that order) with swaps)
  swap 1:                         x|y|z|1|2|-|p|3|4|5
  swap 2:                         x|y|z|1|2|3|p|-|4|5
  swap 3:                         x|y|z|1|2|3|4|-|p|5
  swap 4:                         x|y|z|1|2|3|4|5|p|-
  pop and end:                    x|y|z|1|2|3|4|5|p

The number of swaps needed is obviously linear per iteration--it is
either the number of elements in the argument list, or the number of
elements in the result list, whichever is greater. This will in theory
have worse performance than the predict-the-ideal-stack-extension
approach when the number of arguments ("p".."r") far exceeds the size
of the result list (1..3), but it could potentially win in practice
because no stack extension is ever needed.

In the cases that I see, if you ever need more space you probably
need at least double. YMMV, but that is what I experience.

I also like answers that have good practical performance and no
theoretical gotchas. :-)

A different approach would be to have better prediction for the amount
of stack overextension needed. The more or less obvious​:

                number_of_args_remaining * size_of_results_list_so_far
overextension = ------------------------------------------------------
                             number_of_args_processed

may be the best we can do here.

I thought about this one as well.

What happens if your stack has 1000 elements and your first element
returns 1000 elements but the rest did not? Potential disaster!

A third approach is to make map() use a distinct array/stack for the
argument list ("p".."r") rather than putting it all on the stack where
it keeps getting in the way. I suspect this may turn out to be the
best solution in the end.

I thought about this as well. No matter what you do here the amount
of copying in putting it into that stack and then copying that stack
to Perl's stack is greater than the amount of copying in my patch.

The only case where this is a win is if my patch grew Perl's stack
enough to force it to relocate. I consider that an acceptable
risk - particularly since when I trigger this in my code I almost
certainly will actually need that much space.

Hope all this helps someone explore the best solution for this.

I think that my patch really is optimal. I did not think so at
first, but every way I analyzed it I could not see a better
solution. Even if it can be improved on, the worst-case cost of
my approach is lost in the noise of actually creating entries to
put on the stack, so the improvement will not be noticeable.

Cheers,
Ben


p5pRT commented May 31, 2000

From [Unknown Contact. See original ticket]

On Wed, May 31, 2000 at 07​:59​:50PM -0400, John Macdonald wrote​:

After such a switch (to storing results elsewhere, whether it is at
the end of the stack or a separate area) there is the possibility of
switching back to filling the "used" slots. Whenever there is a need
to grow the alternate destination area, check to see whether the
current chunk of "used" slots is bigger than the current accumulation
of results. If so, copy them over (that's work that was going to
have to be done eventually anyhow, and since the entire collection
can be dealt with now there won't be any wasted copies going
somewhere other than their final destination).

Yes, I thought about this too. Two objections​:

  a) stack growth logic is outside of our reach, so we cannot
  synchronize checks for "enough free space" with rare events;

  b) Checking that there is enough space on each iteration of
  mapwhile (sp?) may be a pessimization.

Don't free the
alternate destination at this time, in case the "used" slots run out
again it will still be around. It can be released at the end of the
entire map operation.

I do not follow this. AFAIK, stacks never shrink.

Ilya


p5pRT commented May 31, 2000

From @gsar

On Wed, 31 May 2000 22​:08​:07 EDT, "Ben Tilly" wrote​:

After posting that I realized that it actually *is* ideal. :-)

That's not enough to convince me. :-)

A third approach is to make map() use a distinct array/stack for the
argument list ("p".."r") rather than putting it all on the stack where
it keeps getting in the way. I suspect this may turn out to be the
best solution in the end.

I thought about this as well. No matter what you do here the amount
of copying in putting it into that stack and then copying that stack
to Perl's stack is greater than the amount of copying in my patch.

There is no copying of the argument list involved here. (Think
of it as map() using a separate array to iterate over, like
foreach (LIST) {...} does.)

The only case where this is a win is if my patch grew Perl's stack
enough to force it to relocate. I consider that an acceptable
risk - particularly since when I trigger this in my code I almost
certainly will actually need that much space.

When using a separate array​:

  * there is no per-iteration checking/copying of result elements on
  the stack (because they already accumulate in the spot they need
  to be in at the end)
  * there is no per-iteration stack extension

I would imagine each of those to be a substantial win. The only
potential hit is in allocating the array to hold the argument list,
but I think that will be lost in the noise because it is only done
once per map at run time, or perhaps even once at compile time (as
a scratchpad entry).

In contrast, the current approach with your modification *always*
involves per-iteration checking+copying of items on the stack,
even assuming that the stack is resized ideally.

Anyway, I think the best way to settle the question is by implementing
it. Anyone up for it?

Sarathy
gsar@​ActiveState.com


p5pRT commented Jun 1, 2000

From [Unknown Contact. See original ticket]

In article <200006010526.WAA30143@​molotok.activestate.com>,
  Gurusamy Sarathy <gsar@​ActiveState.com> writes​:

On Wed, 31 May 2000 22​:08​:07 EDT, "Ben Tilly" wrote​:

After posting that I realized that it actually *is* ideal. :-)

That's not enough to convince me. :-)

A third approach is to make map() use a distinct array/stack for the
argument list ("p".."r") rather than putting it all on the stack where
it keeps getting in the way. I suspect this may turn out to be the
best solution in the end.

I thought about this as well. No matter what you do here the amount
of copying in putting it into that stack and then copying that stack
to Perl's stack is greater than the amount of copying in my patch.

The only way you can avoid having the copying work go O(n*n) in
pathological cases is to have the work space grow at least exponentially
(which is not as bad as it may sound: in the end you allocate at most
base times too much stack, so just don't choose the base too high).
In this particular case you can choose e.g.:

new_length = max(old_length * 3/2, input_length*current_expansion_ratio)

which will guess correctly the common case and have moderate copying cost
even in the worst case.
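
In C that rule might look something like this (hypothetical names; the
expansion ratio is results_so_far / args_processed, folded into the second
term to keep the arithmetic in integers):

    /* Hypothetical sketch of the growth rule above; not actual pp_ctl.c code. */
    long
    next_length(long old_length, long input_length,
                long results_so_far, long args_processed)
    {
        long geometric = old_length + old_length / 2;    /* old_length * 3/2 */
        long predicted = args_processed
            ? (input_length * results_so_far) / args_processed
            : input_length;
        return geometric > predicted ? geometric : predicted;
    }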


p5pRT commented Jun 1, 2000

From @btilly

Ton Hospel wrote​:

I thought about this as well. No matter what you do here the amount
of copying in putting it into that stack and then copying that stack
to Perl's stack is greater than the amount of copying in my patch.

The only way you can avoid having the copying work go O(n*n) in
pathological cases is to have the work space grow at least exponentially
(which is not as bad as it may sound: in the end you allocate at most
base times too much stack, so just don't choose the base too high).
In this particular case you can choose e.g.:

new_length = max(old_length * 3/2, input_length*current_expansion_ratio)

which will guess correctly the common case and have moderate copying cost
even in the worst case.

The logic for that should not be here.

My theoretical analysis, and the observed behaviour with my patch,
does not have the calls to EXTEND grow exponentially and yet the
recopying work is O(n).

The actual Perl *stack* has to test on an EXTEND call whether this
will force it to relocate, that is another issue. But mapwhile is
not the right place for that logic.

Cheers,
Ben



p5pRT commented Jun 1, 2000

From [Unknown Contact. See original ticket]

In article <20000601182925.12509.qmail@​hotmail.com>,
  "Ben Tilly" <ben_tilly@​hotmail.com> writes​:

Ton Hospel wrote​:

I thought about this as well. No matter what you do here the amount
of copying in putting it into that stack and then copying that stack
to Perl's stack is greater than the amount of copying in my patch.

The only way you can avoid having the copying work go O(n*n) in
pathological cases is to have the work space grow at least exponentially
(which is not as bad as it may sound: in the end you allocate at most
base times too much stack, so just don't choose the base too high).
In this particular case you can choose e.g.:

new_length = max(old_length * 3/2, input_length*current_expansion_ratio)

which will guess correctly the common case and have moderate copying cost
even in the worst case.

The logic for that should not be here.

My theoretical analysis, and the observed behaviour with my patch,
does not have the calls to EXTEND grow exponentially and yet the
recopying work is O(n).

The whole point of growing the *size* exponentially is that the number
of EXTEND calls then grows only logarithmically instead of linearly; a
linear number of (constant extra size) EXTENDs can mean quadratic
copying work.
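
To spell the arithmetic out (a generic illustration, not specific to the map
code): if a buffer that ends up holding n items is grown in fixed steps of c
items, it gets reallocated about n/c times, and those reallocations copy
roughly

    c + 2c + 3c + ... + n  ~=  n^2 / (2c)

items in total. Growing it geometrically (say, doubling) needs only about
log2(n/c) reallocations and at most about 2n copied items in total.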

The actual Perl *stack* has to test on an EXTEND call whether this
will force it to relocate, that is another issue. But mapwhile is
not the right place for that logic.

In a way that would be cleaner, yes. But I don't see how to do this since
EXTEND does not have a memory and does not understand when it's called in
a logically related set of extensions. So I in fact DO think mapwhile is
the place for that logic.


p5pRT commented Jun 2, 2000

From @btilly

In article <20000601182925.12509.qmail@​hotmail.com>,
"Ben Tilly" <ben_tilly@​hotmail.com> writes​:

Ton Hospel wrote​:
[...]
The logic for that should not be here.

My theoretical analysis, and the observed behaviour with my patch,
does not have the calls to EXTEND grow exponentially and yet the
recopying work is O(n).

The whole point of growing the *size* exponentially is that the number
of EXTEND calls then grows only logarithmically instead of linearly; a
linear number of (constant extra size) EXTENDs can mean quadratic
copying work.

I have a patch out there to fix the current quadratic behaviour that
map shows. I claim that both in theory and in observation, this
results in linear copying work.

Here is the patch​:


Inline Patch
--- pp_ctl.c.bak        Tue Mar 21 00:42:26 2000
+++ pp_ctl.c    Tue Apr 25 04:03:36 2000
@@ -736,6 +736,8 @@
        if (diff > PL_markstack_ptr[-1] - PL_markstack_ptr[-2]) {
            shift = diff - (PL_markstack_ptr[-1] - PL_markstack_ptr[-2]);
            count = (SP - PL_stack_base) - PL_markstack_ptr[-1] + 2;
+           if (shift < count)
+               shift = count; /* Avoid shifting too often */

            EXTEND(SP,shift);
            src = SP;
-----------------------------------------------------------------------

Try it before you say it doesn't work. Theory and practice...

The actual Perl *stack* has to test on an EXTEND call whether this
will force it to relocate, that is another issue. But mapwhile is
not the right place for that logic.

In a way that would be cleaner, yes. But I don't see how to do this since
EXTEND does not have a memory and does not understand when it's called in
a logically related set of extensions. So I in fact DO think mapwhile is
the place for that logic.

It looks like it already does this.

As for how? Ask Sarathy. How I would do it would be the following​:

Keep 3 pointers. One to the beginning of the stack. One to the end
of the stack, and one to the end of the allocated space. When EXTEND
is called check whether that will move the end of the stack past the
end of the allocated space. If it will then allocate double the
space somewhere, recopy the stack, and move all 3 pointers.
Otherwise just change the position of the end of the stack.

You see the exponential growth pattern makes so much sense that it is
simply smart to do this universally. You do not care whether EXTEND
is being called by related routines or the same one. You only care
that over the lifetime of Perl you do not waste too much time on
copying the stack around.
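
A minimal, generic C sketch of that three-pointer, double-on-overflow scheme
(an illustration of the idea, not the actual Perl stack code):

    #include <stdlib.h>

    typedef struct {
        int *base;   /* start of the allocated space      */
        int *sp;     /* one past the last element in use  */
        int *max;    /* end of the allocated space        */
    } stack;

    /* Make room for at least n more elements.  The allocation is doubled
     * whenever it would overflow, so the total copying done over the life
     * of the stack stays proportional to its final size. */
    static void
    stack_extend(stack *s, size_t n)
    {
        size_t used = (size_t)(s->sp - s->base);
        size_t cap  = (size_t)(s->max - s->base);

        if (used + n > cap) {
            size_t newcap = cap ? cap : 16;
            int *p;
            while (used + n > newcap)
                newcap *= 2;
            p = realloc(s->base, newcap * sizeof *p);
            if (!p)
                abort();               /* out of memory */
            s->base = p;
            s->sp   = p + used;
            s->max  = p + newcap;
        }
    }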

Based on observed performance it looks like this classic (and
intelligent) behaviour is widely used in Perl. If you want a painful
comparison, attempt to build a long string in JavaScript by appending
incrementally to it. You will hit serious performance problems in
both major browsers (as well as tons of stack overflow problems).
Then you can go to building an array and joining it once. This works
somewhat better - until you cause the garbage collection algorithm to
go crazy.

Then try it in Perl and curse and swear because you see that there is
no good reason for the problem.

JavaScript​:
for (var i = 0; i < 100000; ++i) {
  str = str + "Hello again\n"; // WHY don't they have +=?
}

Perl​:
foreach (1..100000) {
  $str .= "Hello again\n";
}

Cheers,
Ben

