New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slow map - patch + discussion #1890
Comments
From @btillyThis is an update on my earlier bug report. After reading the "This Week on http://www.perl.com/pub/2000/04/p5pdigest/THISWEEK-20000423.html#Pseudohash_Field_Names_Hash_Performance_and_map_Performance So an update on my complaints from last week and a patch. First of all I was exactly right about what was going wrong with map() and I Secondly there is a patch in the Changes for 5.6 that solves the missing Thirdly I had an off-hand hash comment about making the hashing algorithm BTW something random I noticed. Wouldn't the hash value be better in HE Cheers, PS The patch: Inline Patch--- pp_ctl.c.bak Tue Mar 21 00:42:26 2000
+++ pp_ctl.c Tue Apr 25 04:03:36 2000
@@ -736,6 +736,8 @@
if (diff > PL_markstack_ptr[-1] - PL_markstack_ptr[-2]) {
shift = diff - (PL_markstack_ptr[-1] - PL_markstack_ptr[-2]);
count = (SP - PL_stack_base) - PL_markstack_ptr[-1] + 2;
+ if (shift < count)
+ shift = count; /* Avoid shifting too often */
EXTEND(SP,shift);
src = SP;
________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com |
From [Unknown Contact. See original ticket]Someone please give this man a medal: for persistence, well-spokenness, Thanks, John. |
From @gsarOn Tue, 25 Apr 2000 15:49:05 EDT, "Ben Tilly" wrote:
I took a quick look at this issue earlier, and it seems that better On entry into pp_mapwhile(), the stack looks like this: x|y|z|-|p|q|r|1|2|3 "x".."z" are the accumulated result of the map() so far. "-" is the So on exit from pp_mapwhile(), the stack needs to look like this: x|y|z|1|2|3|p|q|r Now, the current implementation gets there by doing this (assume there are start: x|y|z|-|p|q|r|1|2|3 Your modification does an (almost arbitrary) overextension of the stack start: x|y|z|-|p|q|r|1|2|3 That makes the next pp_mapwhile() look like this: start: x|y|z|1|2|3|#|#|#|#|-|q|r|4|5|6 So in the ideal case where the stack extension is only done once, and Assuming that it is difficult to predict the ideal amount of stack start: x|y|z|-|p|q|r|1|2|3 If there are more elements in the result list than in the argument list, start: x|y|z|-|p|1|2|3|4|5 The number of swaps needed is obviously linear per iteration--it is A different approach would be to have better prediction for the amount number_of_args_remaining * size_of_results_list_so_far may be the best we can do here. A third approach is to make map() use a distinct array/stack for the Hope all this helps someone explore the best solution for this. Sarathy |
From [Unknown Contact. See original ticket]Gurusamy Sarathy writes:
Why all this? Why not ride on the standart stack-extension logic? On entry have o|_|p|q|x|y|z| on exit have o|_|p|q|x|y|z|1|2|3| Then move all these guys down at the end of map. Pre-extending the map CODE, @arr; Ilya |
From @gsarOn Wed, 31 May 2000 17:31:27 EDT, Ilya Zakharevich wrote:
That was more or less my third suggestion. Even when passed a list Sarathy |
From @vanstynIn <200005312103.OAA11023@molotok.activestate.com>, Gurusamy Sarathy writes: I think this is too aggressive. It works well in many common cases, but Pulling some numbers out of the air, I would suggest: if number_of_args_processed < 10 .. ie double the total size allocated whenever extension is needed, until Hugo |
From [Unknown Contact. See original ticket]On Wed, May 31, 2000 at 03:15:58PM -0700, Gurusamy Sarathy wrote:
This is an extra malloc/free which may be significant for short lists. Another idea: do as now (overwrite "used" input arguments) while the Looks optimal in all the cases... Ilya |
From [Unknown Contact. See original ticket]Ilya Zakharevich <ilya@math.ohio-state.edu> writes: After such a switch (to storing results elsewhere, whether it is at |
From @btillyGurusamy Sarathy wrote:
After posting that I realized that it actually *is* ideal. :-)
That is what I thought, but AFAICS there is no solution that is so First of all the most common case in my code where the size Leaving that aside, attempting to be smarter means needing to So if you generate large numbers of extra elements, the cost of This is, of course, assuming that extending the stack many times If so then even in the worst case the overhead of copying is lost In short the currently effient case is untouched. My common complaint case is optimized. In the drastically larger list case, the method has overhead below The only potential significant loss is if the stack extend caused And trying to be smart means tracking more data which itself is an [...]
In the cases that I see, if you ever need more space you probably I also like answers that have good practical performance and no
I thought about this one as well. What happens if your stack has 1000 elements and your first element
I thought about this as well. No matter what you do here the amount The only case where this is a win is if my patch grew Perl's stack
I think that my patch really is optimal. I did not think so at Cheers, |
From [Unknown Contact. See original ticket]On Wed, May 31, 2000 at 07:59:50PM -0400, John Macdonald wrote:
Yes, I thought about this too. Two objections: a) stack growth logic is outsize of our reach, so we cannot b) Checking that there is enough space on each iterations of
Do not follow this. AFAIK, stacks never shrink. Ilya |
From @gsarOn Wed, 31 May 2000 22:08:07 EDT, "Ben Tilly" wrote:
That's not enough to convince me. :-)
There is no copying of the argument list involved here. (Think
When using a separate array: * there is no per-iteration checking/copying of result elements on I would imagine each of those to be a substantial win. The only In contrast, the current approach with your modification *always* Anyway, I think the best way to settle the question is by implementing Sarathy |
From [Unknown Contact. See original ticket]In article <200006010526.WAA30143@molotok.activestate.com>,
The only way you can avoid having the copying work not go O(n*n) in new_length = max(old_length * 3/2, input_length*current_expansion_ratio) which will guess correctly the common case and have moderate copying cost |
From @btillyTon Hospel wrote:
The logic for that should not be here. My theoretical analysis, and the observed behaviour with my patch, The actual Perl *stack* has to test on an EXTEND call whether this Cheers, ________________________________________________________________________ |
From [Unknown Contact. See original ticket]In article <20000601182925.12509.qmail@hotmail.com>,
The whole point of growing the *size* exponentially is that the number
In a way that would be cleaner, yes. But I don't see how to do this since |
From @btilly
I have a patch out there to fix the current quadratic behaviour that Here is the patch: Inline Patch--- pp_ctl.c.bak Tue Mar 21 00:42:26 2000
+++ pp_ctl.c Tue Apr 25 04:03:36 2000
@@ -736,6 +736,8 @@
if (diff > PL_markstack_ptr[-1] - PL_markstack_ptr[-2]) {
shift = diff - (PL_markstack_ptr[-1] - PL_markstack_ptr[-2]);
count = (SP - PL_stack_base) - PL_markstack_ptr[-1] + 2;
+ if (shift < count)
+ shift = count; /* Avoid shifting too often */
EXTEND(SP,shift);
src = SP;
-----------------------------------------------------------------------
It looks like it already does this. As for how? Ask Sarathy. How I would do it would be the following: Keep 3 pointers. One to the beginning of the stack. One to the end You see the exponential growth pattern makes so much sense that it is Based on observed performance it looks like this classic (and Then try it in Perl and curse and swear because you see that there is JavaScript: Perl: Cheers, ________________________________________________________________________ |
Migrated from rt.perl.org#3158 (status was 'resolved')
Searchable as RT3158$
The text was updated successfully, but these errors were encountered: