Skip Menu |
Report information
Id: 119635
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: nicholas <nick [at] ccl4.org>
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: 5.27.6



Subject: deprecate and remove qsort?
Date: Fri, 6 Sep 2013 13:41:40 +0100
To: perlbug [...] perl.org
From: Nicholas Clark <nick [...] ccl4.org>
Download (untitled) / with headers
text/plain 2.5k
tl;dr: Nothing current uses C<use sort 'quicksort';> - should we remove it? The sort pragma was added by this commit in 5.7.x times: commit 84d4ea48280f6b54fdc70fe4c8b9494e3331071e Author: Jarkko Hietaniemi <jhi@iki.fi> Date: Wed Nov 21 22:25:14 2001 +0000 Implement the sort pragma. Split sort code from pp_ctl.c to pp_sort.c. Includes the quicksort stabilizing layer from John P. Linderman. -Msort=qsort or -Msort=fast is faster than without (or with -Msort=mergesort or -Msort=safe) for short random inputs, but for some reason not quite as fast as 5.6.1 qsort. More benchmarking, profiling, tuning, and optimizing definitely needed. p4raw-id: //depot/perl@13179 Part of the reason, I think, was that v5.6 and earlier's sort was a quicksort, and hence not a stable sort. By the time v5.8.0 shipped, the only options offered were 'stable', '_mergesort', '_quicksort' and the alias '_qsort', with the default as mergesort (which is stable). qw(_qsort) gets you a quicksort, qw(_qsort stable) gets you a quicksort with a fallback comparison to guarantee stability. (Slower, but stable) Of the 28 distributions that mention C<use sort>, most are in documentation: http://grep.cpan.me/?q=%5Cbuse%5Cs%2Bsort.*%28stable%7Cqsort%7Cquicksort%7Cmergesort%7Csafe%7Cfast%29 All the code is C<use sort 'stable'> with 3 exceptions. 1) This commented-out code in a regression test: # uncomment to enable compatibility with perl versions prior to 5.8 #use sort '_quicksort'; 2) 3 examples in Module-Pragma-0.02/example/sort.t (which is a copy of the core sort pragma from v5.10.0) 3) 2 uses in Class::Std::Containers, which fails tests with v5.10.0 and later (and hasn't been updated to fix this) So, 1) nothing will break if we removed _qsort. 2) sort.pm's documentation, from the start, has stated this: You can force the choice of algorithm with this pragma, but this feels heavy-handed, so the subpragmas beginning with a C<_> may not persist beyond Perl 5.8. 3) We shave off some source code: embed.fnc | 3 +- embed.h | 3 +- pp_sort.c | 866 ------------------------------------------------------------- proto.h | 7 +- 4 files changed, 6 insertions(+), 873 deletions(-) [plus a bit in lib/sort.pm, which I didn't touch for this prototype] 4) We reduce the size of pp_sort.o by about 3K (or about 0.25% of perl) but it's not exactly high maintenence code. So, should we put _qsort through a deprecation cycle, and drop it in v5.23? I'm not actually sure if it's worth it. Do the small gains outweigh the small churn? Nicholas Clark
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Fri, 6 Sep 2013 16:42:15 +0300
To: perl5-porters [...] perl.org
From: Shlomi Fish <shlomif [...] shlomifish.org>
Download (untitled) / with headers
text/plain 716b
Hi Nicholas and all, On Fri, 06 Sep 2013 05:42:18 -0700 Nicholas Clark (via RT) <perlbug-followup@perl.org> wrote: Show quoted text
> > but it's not exactly high maintenence code. > > So, should we put _qsort through a deprecation cycle, and drop it in v5.23? >
I'm OK with either way. I never used the "use sort" pragma. Regards, Shlomi Fish -- ----------------------------------------------------------------- Shlomi Fish http://www.shlomifish.org/ Freecell Solver - http://fc-solve.shlomifish.org/ <petn-randall> Writing your own nirvana may be easier than writing a good blog engine ;) — http://xrl.us/botw86 Please reply to list if it's a mailing list post - http://shlom.in/reply .
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Fri, 06 Sep 2013 18:54:12 +0200
To: pp <perl5-porters [...] perl.org>
From: Steffen Mueller <smueller [...] cpan.org>
Download (untitled) / with headers
text/plain 648b
On 09/06/2013 02:42 PM, Nicholas Clark (via RT) wrote: Show quoted text
> 1) nothing will break if we removed _qsort.
... on CPAN. Show quoted text
> but it's not exactly high maintenence code.
It's certainly not, nut pp_sort.c as a whole is moderately complex. Reducing the amount of code there can only be good. Show quoted text
> So, should we put _qsort through a deprecation cycle, and drop it in v5.23? > > > I'm not actually sure if it's worth it. Do the small gains outweigh the small > churn?
+1 to see it go. I'm trying not to grandly suggest that somebody (not me, likely not Nicholas) should try implementing timsort for us now instead of the current merge-alike. :) --Steffen
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Fri, 6 Sep 2013 14:59:44 -0400
To: perl5-porters [...] perl.org
From: "John P. Linderman" <jpl.jpl [...] gmail.com>
> Part of the reason, I think, was that v5.6 and earlier's sort was a
> quicksort, and hence not a stable sort.

Stability was definitely a feature, but my best argument to Jarkko was that far too many common sequences (organ pipe, sawtooth, etc.) went quadratic, and Peter Mcilroy's merge sort did a particularly nice job on sequences with pre-existing order (linear time on sorted input).

The only sequences (that I could think of, anyway) where it was clear that qsort would outperform merge sort were those containing many repetitions of a small number (k) of keys.  Qsort's "fat pivot" guaranteed that only k passes would be required, where merge sort might need log N.

My email address in pp_sort.c is shown as jpl@research.att.com.  That will stop working in a year or so.  Better would be jpl.jpl@gmail.com.  Peter is no longer at Lucent, so his email address, pmcilroy@lucent.com, probably stopped working about a decade ago.  I don't have a current email address for him.

I have no strong opinion about removing qsort.  Seems like a good idea to me.


CC: Perl5 Porters <perl5-porters [...] perl.org>
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Fri, 6 Sep 2013 23:34:47 +0400
To: "John P. Linderman" <jpl.jpl [...] gmail.com>
From: Victor Efimov <victor [...] vsespb.ru>
Download (untitled) / with headers
text/plain 1.3k

2013/9/6 John P. Linderman <jpl.jpl@gmail.com>
Show quoted text
The only sequences (that I could think of, anyway) where it was clear that qsort would outperform merge sort were those containing many repetitions of a small number (k) of keys.  Qsort's "fat pivot" guaranteed that only k passes would be required, where merge sort might need log N.

Seems memory usage can be better with qsort than with mergesort (not sure about perl implementation).

http://en.wikipedia.org/wiki/Merge_sort
===
Merge sort's most common implementation does not sort in place; therefore, the memory size of the input must be allocated for the sorted output to be stored in (see below for versions that need only n/2 extra spaces).
===

http://en.wikipedia.org/wiki/Quicksort
===
The disadvantage of the simple version above is that it requires O(n) extra storage space, which is as bad as merge sort. The additional memory allocations required can also drastically impact speed and cache performance in practical implementations. There is a more complex version which uses an in-place partition algorithm and can achieve the complete sort using O(log n) space (not counting the input) on average (for the call stack). We start with a partition function:
===

CC: Perl5 Porters <perl5-porters [...] perl.org>
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Fri, 6 Sep 2013 18:30:29 -0400
To: Victor Efimov <victor [...] vsespb.ru>
From: "John P. Linderman" <jpl.jpl [...] gmail.com>
Download (untitled) / with headers
text/plain 3.1k
All true, all valid observations.  The merge sort implementation uses N additional *pointers* (not records), the qsort implementation sorts in place (unless you want stability, in which case it, too, uses another set of pointers, same as the merge sort implementation).  I started programming back in the 60's, when a few kilobytes of memory was all that there was, and you'd do all sorts of things that would now be considered inexcusable to save a few bytes (like using only the last two digits of the year, leading to the Y2K "crisis"). I often have to remind myself that memory is now darn near free, but my time is more expensive every year (at least until AT&T "surplussed" me).  If you need to worry about double the number of pointers, perhaps perl is not the implementation language you should be using.  Qsort, with its divide and conquer approach, automatically ends up doing a lot of processing in the fastest (i.e. smallest) cache, which is quite ironic, since quicksort was invented when memory was monolithic, and caches didn't exist.  But I attempted to fix that flaw in merge sort (at the expense of some additional stack space... it took some discipline to ignore that in the environment I now program in rather than one where I learned to program :-).  My attitude now (since you didn't ask :-) is that space matters a whole lot less than time, and cache hits matter, because they have a major impact on time.  And maintainability matters a lot too, because it relates to correctness, which really matters more than just about everything, which is why I wouldn't much mind qsort disappearing.  I hope to live long enough to experience another major paradigm shift (like exploiting multiple processors in parallel).


On Fri, Sep 6, 2013 at 3:34 PM, Victor Efimov <victor@vsespb.ru> wrote:
Show quoted text

2013/9/6 John P. Linderman <jpl.jpl@gmail.com>

The only sequences (that I could think of, anyway) where it was clear that qsort would outperform merge sort were those containing many repetitions of a small number (k) of keys.  Qsort's "fat pivot" guaranteed that only k passes would be required, where merge sort might need log N.

Seems memory usage can be better with qsort than with mergesort (not sure about perl implementation).

http://en.wikipedia.org/wiki/Merge_sort
===
Merge sort's most common implementation does not sort in place; therefore, the memory size of the input must be allocated for the sorted output to be stored in (see below for versions that need only n/2 extra spaces).
===

http://en.wikipedia.org/wiki/Quicksort
===
The disadvantage of the simple version above is that it requires O(n) extra storage space, which is as bad as merge sort. The additional memory allocations required can also drastically impact speed and cache performance in practical implementations. There is a more complex version which uses an in-place partition algorithm and can achieve the complete sort using O(log n) space (not counting the input) on average (for the call stack). We start with a partition function:
===


Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Sat, 7 Sep 2013 01:26:02 +0200
To: perl5-porters [...] perl.org
From: Aristotle Pagaltzis <pagaltzis [...] gmx.de>
Download (untitled) / with headers
text/plain 399b
* John P. Linderman <jpl.jpl@gmail.com> [2013-09-07 00:35]: Show quoted text
> I often have to remind myself that memory is now darn near free, > but my time is more expensive every year
Computer’s Perspective on Moore’s Law: Humans are getting more expensive at an exponential rate. —Mark S. Miller -- Aristotle Pagaltzis // <http://plasmasturm.org/>
CC: Perl5 Porters <perl5-porters [...] perl.org>
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Sat, 7 Sep 2013 12:17:59 +0400
To: "John P. Linderman" <jpl.jpl [...] gmail.com>
From: Victor Efimov <victor [...] vsespb.ru>
I was able to reproduce case when qsort do 3 times less comparison than mergesort.

However I was unable to find case when qsort faster when comparsion function is simple, obvious this can be the case for complex comparison functions. One also can/should use http://en.wikipedia.org/wiki/Schwartzian_transform anyway if comparison function is always slow, an this limits benefits of qsort.

==
use strict;
use warnings;

my @a = (1..3) x 100_000;
our $i;

sub mysub
{
++$i;
$a <=> $b;
}


my ($q, $m);
{
 use sort '_quicksort';
 local $i=0;
 my @b = sort mysub @a;
 $q = $i;
 print "quicksort: $i\n";
};
{
 local $i=0;
 my @b = sort mysub @a;
 $m = $i;
 print "mergesort: $i\n";
}

print $q/$m, "\n";
__END_
quicksort: 599997
mergesort: 1746356
0.343570841225958
==



2013/9/7 John P. Linderman <jpl.jpl@gmail.com>
Show quoted text
All true, all valid observations.  The merge sort implementation uses N additional *pointers* (not records), the qsort implementation sorts in place (unless you want stability, in which case it, too, uses another set of pointers, same as the merge sort implementation).  I started programming back in the 60's, when a few kilobytes of memory was all that there was, and you'd do all sorts of things that would now be considered inexcusable to save a few bytes (like using only the last two digits of the year, leading to the Y2K "crisis"). I often have to remind myself that memory is now darn near free, but my time is more expensive every year (at least until AT&T "surplussed" me).  If you need to worry about double the number of pointers, perhaps perl is not the implementation language you should be using.  Qsort, with its divide and conquer approach, automatically ends up doing a lot of processing in the fastest (i.e. smallest) cache, which is quite ironic, since quicksort was invented when memory was monolithic, and caches didn't exist.  But I attempted to fix that flaw in merge sort (at the expense of some additional stack space... it took some discipline to ignore that in the environment I now program in rather than one where I learned to program :-).  My attitude now (since you didn't ask :-) is that space matters a whole lot less than time, and cache hits matter, because they have a major impact on time.  And maintainability matters a lot too, because it relates to correctness, which really matters more than just about everything, which is why I wouldn't much mind qsort disappearing.  I hope to live long enough to experience another major paradigm shift (like exploiting multiple processors in parallel).


On Fri, Sep 6, 2013 at 3:34 PM, Victor Efimov <victor@vsespb.ru> wrote:

2013/9/6 John P. Linderman <jpl.jpl@gmail.com>

The only sequences (that I could think of, anyway) where it was clear that qsort would outperform merge sort were those containing many repetitions of a small number (k) of keys.  Qsort's "fat pivot" guaranteed that only k passes would be required, where merge sort might need log N.

Seems memory usage can be better with qsort than with mergesort (not sure about perl implementation).

http://en.wikipedia.org/wiki/Merge_sort
===
Merge sort's most common implementation does not sort in place; therefore, the memory size of the input must be allocated for the sorted output to be stored in (see below for versions that need only n/2 extra spaces).
===

http://en.wikipedia.org/wiki/Quicksort
===
The disadvantage of the simple version above is that it requires O(n) extra storage space, which is as bad as merge sort. The additional memory allocations required can also drastically impact speed and cache performance in practical implementations. There is a more complex version which uses an in-place partition algorithm and can achieve the complete sort using O(log n) space (not counting the input) on average (for the call stack). We start with a partition function:
===



CC: smueller [...] cpan.org
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Mon, 9 Sep 2013 10:55:33 -0400
To: Perl5 Porters <perl5-porters [...] perl.org>
From: "John P. Linderman" <jpl.jpl [...] gmail.com>

Message body is not shown because it is too large.

Download (untitled) / with headers
text/html 19.6k

Message body is not shown because it is too large.

CC: smueller [...] cpan.org
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Mon, 9 Sep 2013 11:31:14 -0400
To: Perl5 Porters <perl5-porters [...] perl.org>
From: "John P. Linderman" <jpl.jpl [...] gmail.com>
Download timsort
application/octet-stream 17k

Message body not shown because it is not plain text.

Download (untitled) / with headers
text/plain 816b
On Fri, 06 Sep 2013 18:54:12, Steffen Mueller <smueller@cpan.org> wrote:

>> +1 to see it go. I'm trying not to grandly suggest
>> that somebody (not me, likely not Nicholas) should try
>> implementing timsort for us now instead of the current
>> merge-alike. :)

My recent attempt to copy and paste a commentary on timsort appears to have ended up where the sun don't shine.  I'll attach it, knowing that attachments don't always show up well, either.

Executive summary: I see no reason to believe that our sort, based on code written (and analyzed) by Peter Mcilroy, is not better than timsort.  The (long) attachment explains how I came to that conclusion.  If someone thinks otherwise, the world at large would profit from a careful comparison.  It won't be me, either, doing it (beyond the attached analysis). 
CC: perl5-porters [...] perl.org, smueller [...] cpan.org
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Thu, 12 Sep 2013 13:41:17 +0100
To: "John P. Linderman" <jpl.jpl [...] gmail.com>
From: Nicholas Clark <nick [...] ccl4.org>
Download (untitled) / with headers
text/plain 1.5k
On Fri, Sep 06, 2013 at 02:59:44PM -0400, John P. Linderman wrote: Show quoted text
> My email address in pp_sort.c is shown as jpl@research.att.com. That will > stop working in a year or so. Better would be jpl.jpl@gmail.com. Peter is
On Mon, Sep 09, 2013 at 10:55:33AM -0400, John P. Linderman wrote: Show quoted text
> Here's a more accurate citation for Peter's sort, prior to our working > together. Thanks, Tim! Much more useful than mine. > > "Optimistic Sorting and Information Theoretic Complexity" > Peter McIlroy > SODA (Fourth Annual ACM-SIAM Symposium on Discrete Algorithms), pp > 467-474, Austin, Texas, 25-27 January 1993.
So this is an appropriate change to make to pp_sort.c? diff --git a/pp_sort.c b/pp_sort.c index b65e9eb..ab4a6fd 100644 --- a/pp_sort.c +++ b/pp_sort.c @@ -53,9 +53,15 @@ * The original code was written in conjunction with BSD Computer Software * Research Group at University of California, Berkeley. * - * See also: "Optimistic Merge Sort" (SODA '92) + * See also: "Optimistic Sorting and Information Theoretic Complexity" + * Peter McIlroy + * SODA (Fourth Annual ACM-SIAM Symposium on Discrete Algorithms), + * pp 467-474, Austin, Texas, 25-27 January 1993. * - * The integration to Perl is by John P. Linderman <jpl@research.att.com>. + * The integration to Perl is by John P. Linderman <jpl.jpl@gmail.com>. + * + * John described some more details of the implementation here: + * http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2013-09/msg00345.html * * The code can be distributed under the same terms as Perl itself. * Nicholas Clark
CC: pp <perl5-porters [...] perl.org>
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Thu, 12 Sep 2013 13:50:38 +0100
To: Steffen Mueller <smueller [...] cpan.org>
From: Nicholas Clark <nick [...] ccl4.org>
Download (untitled) / with headers
text/plain 615b
On Fri, Sep 06, 2013 at 06:54:12PM +0200, Steffen Mueller wrote: Show quoted text
> On 09/06/2013 02:42 PM, Nicholas Clark (via RT) wrote:
> > 1) nothing will break if we removed _qsort.
> > ... on CPAN.
Good point. Nothing on CPAN will break, and probably nothing else. Show quoted text
> > but it's not exactly high maintenence code.
> > It's certainly not, nut pp_sort.c as a whole is moderately complex. > Reducing the amount of code there can only be good.
That seems to be the principal benefit. And the cost is either zero, or pretty close. (This is one of those points where Google Code Search would have been useful) Nicholas Clark
CC: perl5-porters [...] perl.org
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Thu, 12 Sep 2013 13:53:46 +0100
To: "John P. Linderman" <jpl.jpl [...] gmail.com>
From: Nicholas Clark <nick [...] ccl4.org>
Download (untitled) / with headers
text/plain 1.1k
On Fri, Sep 06, 2013 at 02:59:44PM -0400, John P. Linderman wrote: Show quoted text
> > Part of the reason, I think, was that v5.6 and earlier's sort was a > > quicksort, and hence not a stable sort.
> > Stability was definitely a feature, but my best argument to Jarkko was that > far too many common sequences (organ pipe, sawtooth, etc.) went quadratic, > and Peter Mcilroy's merge sort did a particularly nice job on sequences > with pre-existing order (linear time on sorted input). > > The only sequences (that I could think of, anyway) where it was clear that > qsort would outperform merge sort were those containing many repetitions of > a small number (k) of keys. Qsort's "fat pivot" guaranteed that only k > passes would be required, where merge sort might need log N.
The bit that keeps bugging me is "how many people know that their data is usually going to be in a particular form that qsort favours"? In that, the amount of use cases where it's correct to specify qsort instead of mergesort are small, the amount of effort needed to determine this is high, and the gain itself is small? In which case, seems that choosing a specific sort algorithm is not the first place to look at optimising. Nicholas Clark
CC: Perl5 Porters <perl5-porters [...] perl.org>, Steffen Mueller <smueller [...] cpan.org>
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Thu, 12 Sep 2013 09:58:28 -0400
To: Nicholas Clark <nick [...] ccl4.org>
From: "John P. Linderman" <jpl.jpl [...] gmail.com>
Download (untitled) / with headers
text/plain 2.2k
If you don't mind, I'd prefer to leave off that final reference to my timsort analysis.  You can't possibly get "more details of the implementation" than pp_sort.c itself, and, particularly if I do something about making instability an option, you might get directed to something out of date.  Which might also be true if Tim changes timsort.  demerphq is nudging me to create a wikipedia page about perl's sort.  *If* I do that, it would make a better reference, since it could be kept up to date with current implementations of both.


On Thu, Sep 12, 2013 at 8:41 AM, Nicholas Clark <nick@ccl4.org> wrote:
Show quoted text
On Fri, Sep 06, 2013 at 02:59:44PM -0400, John P. Linderman wrote:

> My email address in pp_sort.c is shown as jpl@research.att.com.  That will
> stop working in a year or so.  Better would be jpl.jpl@gmail.com.  Peter is

On Mon, Sep 09, 2013 at 10:55:33AM -0400, John P. Linderman wrote:

> Here's a more accurate citation for Peter's sort, prior to our working
> together.  Thanks, Tim!  Much more useful than mine.
>
>    "Optimistic Sorting and Information Theoretic Complexity"
>    Peter McIlroy
>    SODA (Fourth Annual ACM-SIAM Symposium on Discrete Algorithms), pp
>    467-474, Austin, Texas, 25-27 January 1993.

So this is an appropriate change to make to pp_sort.c?

diff --git a/pp_sort.c b/pp_sort.c
index b65e9eb..ab4a6fd 100644
--- a/pp_sort.c
+++ b/pp_sort.c
@@ -53,9 +53,15 @@
  * The original code was written in conjunction with BSD Computer Software
  * Research Group at University of California, Berkeley.
  *
- * See also: "Optimistic Merge Sort" (SODA '92)
+ * See also: "Optimistic Sorting and Information Theoretic Complexity"
+ *           Peter McIlroy
+ *           SODA (Fourth Annual ACM-SIAM Symposium on Discrete Algorithms),
+ *           pp 467-474, Austin, Texas, 25-27 January 1993.
  *
- * The integration to Perl is by John P. Linderman <jpl@research.att.com>.
+ * The integration to Perl is by John P. Linderman <jpl.jpl@gmail.com>.
+ *
+ * John described some more details of the implementation here:
+ * http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2013-09/msg00345.html
  *
  * The code can be distributed under the same terms as Perl itself.
  *


Nicholas Clark

CC: Perl5 Porters <perl5-porters [...] perl.org>, Steffen Mueller <smueller [...] cpan.org>
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Thu, 12 Sep 2013 15:31:30 +0100
To: "John P. Linderman" <jpl.jpl [...] gmail.com>
From: Nicholas Clark <nick [...] ccl4.org>
Download (untitled) / with headers
text/plain 899b
On Thu, Sep 12, 2013 at 09:58:28AM -0400, John P. Linderman wrote: Show quoted text
> If you don't mind, I'd prefer to leave off that final reference to my > timsort analysis. You can't possibly get "more details of the > implementation" than pp_sort.c itself, and, particularly if I do something > about making instability an option, you might get directed to something out > of date. Which might also be true if Tim changes timsort. demerphq is > nudging me to create a wikipedia page about perl's sort. *If* I do that, > it would make a better reference, since it could be kept up to date with > current implementations of both.
I just culled both lines. If a wikipedia page appears, we can update pp_sort.c to point to it. [And then the wikipedia talk folks can debate whether it's notable, and if they so choose create a 404 error. Until we update the link to point to their history :-)] Nicholas Clark
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.4k
On Thu Sep 12 05:54:17 2013, nicholas wrote: Show quoted text
> On Fri, Sep 06, 2013 at 02:59:44PM -0400, John P. Linderman wrote:
> > > Part of the reason, I think, was that v5.6 and earlier's sort was
> a
> > > quicksort, and hence not a stable sort.
> > > > Stability was definitely a feature, but my best argument to Jarkko
> was that
> > far too many common sequences (organ pipe, sawtooth, etc.) went
> quadratic,
> > and Peter Mcilroy's merge sort did a particularly nice job on
> sequences
> > with pre-existing order (linear time on sorted input). > > > > The only sequences (that I could think of, anyway) where it was
> clear that
> > qsort would outperform merge sort were those containing many
> repetitions of
> > a small number (k) of keys. Qsort's "fat pivot" guaranteed that
> only k
> > passes would be required, where merge sort might need log N.
> > The bit that keeps bugging me is "how many people know that their data > is > usually going to be in a particular form that qsort favours"? > > In that, the amount of use cases where it's correct to specify qsort > instead > of mergesort are small, the amount of effort needed to determine this > is > high, and the gain itself is small? > > In which case, seems that choosing a specific sort algorithm is not > the > first place to look at optimising.
But would it be possible for perl to detect datasets that work better with qsort? Or (most likely) would the time the detection takes outweigh the benefits? -- Father Chrysostomos
CC: Perl5 Porters <perl5-porters [...] perl.org>
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Thu, 12 Sep 2013 11:46:33 -0400
To: Nicholas Clark <nick [...] ccl4.org>
From: "John P. Linderman" <jpl.jpl [...] gmail.com>
Download (untitled) / with headers
text/plain 4.6k
I'm actually in complete agreement.  I just wanted to acknowledge that perl's mergesort was not always better than perl's quicksort on a fairly extensive set of inputs.  But if you know you are dealing with an input of the class I mentioned, you'd be better off using a hash than with either sort.  In fact, I am so much in agreement about choosing algorithms being less meaningful than choosing that I gave the algorithm-choosing subpragmas funny names, starting with an underscore, and warned that they might not persist beyond 5.8.  They had better longevity than I would have predicted.  Obviously, if qsort is deprecated, so, too, must be the funny subpragmas.

A line in Peter's SODA paper has always intrigued me

Versus the fastest readily available qsort, [a citation here to Jon Bentley's and Doug Mcilroy's "Engineering a Sort Function" paper, which had been submitted, but not yet accepted, for publication] the hybrid [Peter's mergesort] is 25% slower on random integers; for random strings, time differences are negligible.

Qsort is constrained to elements of fixed size, and the elements themselves are moved about.  As partitions get smaller and smaller, the elements are brought closer and closer together, "finding" faster and faster cache memory, if it exists.  Mergesort sorts pointers to elements; the elements themselves are never moved.  Although the pointers will be packed together, enjoying some of the benefits of whatever cache memory may exist, the elements remain scattered, probably leading to a cache miss when referenced.  And the qsort comparison routine for comparing integers is very short and efficient, so saving compares may not be so important.  I figured maybe what Peter reported was a consequence of cache memory and simple comparisons.  But if that were true for integers, why wasn't it also true for strings?  String comparison is more complicated than integer comparison.  Saving comparisons becomes more important.  Or maybe because integers are short, and easy to move about, and strings are (probably, it's not stated) longer, the extra time spent moving data counterbalances the advantages of cache memory.  In any event, the observed behavior had plausible explanations.

But now I wonder if there may not have been another contributing cause.  Qsort, unconstrained by stability, is free to ignore the position of equal integers.  Stable mergesort is not.  As I mentioned elsewhere, this can handicap mergesort when the input has many repeated items.  If the "random integers" were selected from a pool significantly smaller than the total number of elements (we don't know), qsort would have another advantage.  But if (lots of "ifs" piling up here) that factors into the performance difference Peter mentioned, and the performance difference alluded to in pp_sort.c

The original merge sort, in use since 5.7, was as fast as, or faster than, qsort on many platforms, but slower than qsort, conspicuously so, on others.

then the ability to ignore stability will take away that advantage of quicksort, another reason why quicksort might be deprecated without much pain.  Since I got lots of +1s for taking a look at ignoring stability (even though most of them came from one person :-), I'll do that.  It looks like the changes to the merge phase will be quite simple, but changes to the setup of initial runs will be anything but.


On Thu, Sep 12, 2013 at 8:53 AM, Nicholas Clark <nick@ccl4.org> wrote:
Show quoted text
On Fri, Sep 06, 2013 at 02:59:44PM -0400, John P. Linderman wrote:
> > Part of the reason, I think, was that v5.6 and earlier's sort was a
> > quicksort, and hence not a stable sort.
>
> Stability was definitely a feature, but my best argument to Jarkko was that
> far too many common sequences (organ pipe, sawtooth, etc.) went quadratic,
> and Peter Mcilroy's merge sort did a particularly nice job on sequences
> with pre-existing order (linear time on sorted input).
>
> The only sequences (that I could think of, anyway) where it was clear that
> qsort would outperform merge sort were those containing many repetitions of
> a small number (k) of keys.  Qsort's "fat pivot" guaranteed that only k
> passes would be required, where merge sort might need log N.

The bit that keeps bugging me is "how many people know that their data is
usually going to be in a particular form that qsort favours"?

In that, the amount of use cases where it's correct to specify qsort instead
of mergesort are small, the amount of effort needed to determine this is
high, and the gain itself is small?

In which case, seems that choosing a specific sort algorithm is not the
first place to look at optimising.

Nicholas Clark

From: Zefram <zefram [...] fysh.org>
To: perl5-porters [...] perl.org
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Mon, 13 Nov 2017 02:44:28 +0000
Download (untitled) / with headers
text/plain 564b
Discussion on this petered out, having mostly got sidetracked into timsort. A couple of people were in favour of removing qsort and the algorithm-specific pragmata, and no one opposed. Adding my opinion, I too think we should remove these things. Pragmata to specify whether stability is required make sense; choosing specific algorithms does not. I also think, given the underscore spelling and the documented status of thesse subpragmata, that we do not need a deprecation cycle to remove them. If no one objects, I intend to go ahead with removal. -zefram
From: "John P. Linderman" <jpl.jpl [...] gmail.com>
To: Perl5 Porters <perl5-porters [...] perl.org>
Subject: Re: [perl #119635] deprecate and remove qsort?
Date: Mon, 13 Nov 2017 08:32:21 -0500
Download (untitled) / with headers
text/plain 1.4k
Zefram wrote:
Show quoted text
Date: Mon, 13 Nov 2017 02:44:28 +0000
Subject: Re: [perl #119635] deprecate and remove qsort?
Discussion on this petered out, having mostly got sidetracked into
timsort.  A couple of people were in favour of removing qsort and the
algorithm-specific pragmata, and no one opposed.  Adding my opinion,
I too think we should remove these things.  Pragmata to specify whether
stability is required make sense; choosing specific algorithms does not.
I also think, given the underscore spelling and the documented status
of thesse subpragmata, that we do not need a deprecation cycle to
remove them.

If no one objects, I intend to go ahead with removal.

I've been among those proposing that qsort may be taking more code-space than it's worth (and it was I who put the underscores in the subpragmata). I'll just point out that there are classes of inputs for which qsort may well outperform mergesort. If there are a small number of different sort keys, like two-letter state abbreviations in the US, in a large collection of records, qsort will need something like log(#-of-different-keys) rather than log(#-of-records) passes. And it sorts in-place, reducing storage requirements. Both of these advantages disappear if you demand both qsort and stability. So that combination almost surely should die. I was planning to replace the existing qsort code with the Bentley/McIlroy algorithm. But I'm equally happy to scrap qsort altogether.
Date: Wed, 15 Nov 2017 09:16:34 -0500
Subject: Re: [perl #119635] deprecate and remove qsort?
To: Perl5 Porters <perl5-porters [...] perl.org>
From: "John P. Linderman" <jpl.jpl [...] gmail.com>
Download (untitled) / with headers
text/plain 696b
A couple days ago, I sent a reply to Zefram's proposal to remove qsort. I haven't seen my reply in the porter's digest. Maybe I just missed it, so I won't repeat the entire post. The important part was that there are classes of inputs for which qsort will outperform mergesort. If the number of distinct keys, k, is much smaller than the number of records, r, qsort will only need log(k) passes to mergesort's log(r). For example, there are only 50-some states and territories in the US, so sorting by state/territory will benefit from qsort, 6 or 7 passes regardless of the number of records. Perhaps what is needed is a better subpragmata, maybe something like "few_keys" rather than "_qsort"?
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 642b
On Sun, 12 Nov 2017 18:44:44 -0800, zefram@fysh.org wrote: Show quoted text
> Discussion on this petered out, having mostly got sidetracked into > timsort. A couple of people were in favour of removing qsort and the > algorithm-specific pragmata, and no one opposed. Adding my opinion, > I too think we should remove these things. Pragmata to specify whether > stability is required make sense; choosing specific algorithms does not. > I also think, given the underscore spelling and the documented status > of thesse subpragmata, that we do not need a deprecation cycle to > remove them. > > If no one objects, I intend to go ahead with removal.
Agreed.
To: perl5-porters [...] perl.org
From: Zefram <zefram [...] fysh.org>
Date: Fri, 17 Nov 2017 05:37:00 +0000
Subject: Re: [perl #119635] deprecate and remove qsort?
Done in commit e2091bb6ea87111c32936c9170405a44995be338. -zefram
Subject: Re: [perl #119635] deprecate and remove qsort?
CC: perl5-porters [...] perl.org, John P. Linderman <jpl.jpl [...] gmail.com>
To: perlbug-followup [...] perl.org
Date: Wed, 06 Dec 2017 15:23:22 +0100
From: Ævar Arnfjörð Bjarmason <avarab [...] gmail.com>
Download (untitled) / with headers
text/plain 5.2k
On Thu, Nov 16 2017, Sawyer X. via jotted: Show quoted text
> On Sun, 12 Nov 2017 18:44:44 -0800, zefram@fysh.org wrote:
>> Discussion on this petered out, having mostly got sidetracked into >> timsort. A couple of people were in favour of removing qsort and the >> algorithm-specific pragmata, and no one opposed. Adding my opinion, >> I too think we should remove these things. Pragmata to specify whether >> stability is required make sense; choosing specific algorithms does not. >> I also think, given the underscore spelling and the documented status >> of thesse subpragmata, that we do not need a deprecation cycle to >> remove them. >> >> If no one objects, I intend to go ahead with removal.
> > Agreed. >
All in all I don't mind that this got removed like this, but I thought I'd point out that in the general case I think this sort of a removal is a bit cavalier given our current deprecation policy. I.e.: 1. Obscure feature is discovered 2. There's grep of the CPAN discovering no uses for it 3. It's removed in one cycle, and existing uses for it turn into an error if people upgrade Perl. There's a big DarkPAN out there. E.g. Sawyer and I both work for a well-known perl company with a big codebase whose main money-making app won't compile with 5.28 because of this, and that's some new code (C<use sort '_quicksort'>) added in April of this year. (Now, in that particular case I have no idea why _quicksort was used, and it can probably just be removed, I've contacted the author). I wonder if we should at least do this for now: diff --git a/lib/sort.pm b/lib/sort.pm index 659f3e4f4d..c48e41cf90 100644 --- a/lib/sort.pm +++ b/lib/sort.pm @@ -24,6 +24,8 @@ sub import { $^H{sort} &= ~$sort::unstable_bit; } elsif ($_ eq 'defaults') { $^H{sort} = 0; + } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') { + warn "sort: the $_ subpragma is a no-op since 5.28!"; } else { require Carp; Carp::croak("sort: unknown subpragma '$_'"); @@ -43,6 +45,8 @@ sub unimport { if ($_ eq 'stable') { $^H{sort} &= ~$sort::stable_bit; $^H{sort} |= $sort::unstable_bit; + } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') { + warn "sort: the $_ subpragma is a no-op since 5.28!"; } else { require Carp; Carp::croak("sort: unknown subpragma '$_'"); Or maybe it really is better if this dies at compile-time. FWIW I initially ran into this because I was hacking some pod docs that had to do with references to very old version (5.[68]), hadn't rebased in a while, and had this commit ready to push before I ran into a conflict with Zefram: commit 1bb8507c8f Author: Ævar Arnfjörð Bjarmason <avar@cpan.org> Date: Wed Dec 6 13:26:32 2017 +0000 perlfunc: shorten overly verbose & out of date mention of sort.pm This paragraph saying a 5.8 feature "may persist" in future versions is obviously wildly out of date as we have it now almost 16 years later. All of this is covered in the sort.pm documentation, it's enough to just briefly mention that perl uses a stable mergesort, and point to the other docs in case someone wants to use quicksort instead. --- pod/perlfunc.pod | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-) diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod index b7e001fcc8..1d117994c1 100644 --- a/pod/perlfunc.pod +++ b/pod/perlfunc.pod @@ -7307,19 +7307,11 @@ L<C<grep>|/grep BLOCK LIST>) actually modifies the element in the original list. This is usually something to be avoided when writing clear code. -Perl 5.6 and earlier used a quicksort algorithm to implement sort. -That algorithm was not stable and I<could> go quadratic. (A I<stable> sort -preserves the input order of elements that compare equal. Although -quicksort's run time is O(NlogN) when averaged over all arrays of -length N, the time can be O(N**2), I<quadratic> behavior, for some -inputs.) In 5.7, the quicksort implementation was replaced with -a stable mergesort algorithm whose worst-case behavior is O(NlogN). -But benchmarks indicated that for some inputs, on some platforms, -the original quicksort was faster. 5.8 has a L<sort> pragma for -limited control of the sort. Its rather blunt control of the -underlying algorithm may not persist into future Perls, but the -ability to characterize the input or output in implementation -independent ways quite probably will. +Perl uses a I<stable> mergesort algorithm whose worst-case behavior is +O(NlogN). Benchmarks indicate that for some inputs, on some platforms, +quicksort can be faster, although it can have much worse performance +for pathological inputs. See the L<sort> module for how the sort +algorithm can be tweaked with L<a pragma|perlpragma>. I.e. my reading of these docs was that the 5.8 reference was simply some now-obsolete way of saying "we have this new feature now", not that the docs had serious about deprecating this, after RT #119635 to remove it didn't even get filed until 5.18 had been released!
Subject: Re: [perl #119635] deprecate and remove qsort?
From: Zefram <zefram [...] fysh.org>
Date: Wed, 6 Dec 2017 14:51:04 +0000
To: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 888b
Avar Arnfjorth Bjarmason wrote: Show quoted text
>I wonder if we should at least do this for now:
... Show quoted text
> + } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') { > + warn "sort: the $_ subpragma is a no-op since 5.28!";
There's no real need for that: the error that it'll produce leads one straight to the problem, and the change required to get it running again is very straightforward. But if we were to make these subpragmata legal again, it shouldn't just be in the open-ended way that you suggest. We don't want to have that baggage around forever. We should deprecate them, with a defined deletion date, and on the deletion date they should become illegal (as they are now in blead). The immediate implication is that, rather than an unconditional warning of the type you show, they should elicit a categorised deprecation warning that advertises the deletion date. -zefram
From: "John P. Linderman" <jpl.jpl [...] gmail.com>
Subject: Re: [perl #119635] deprecate and remove qsort?
To: Ævar Arnfjörð Bjarmason <avarab [...] gmail.com>
CC: perlbug-followup [...] perl.org, Perl5 Porters <perl5-porters [...] perl.org>, Zefram <zefram [...] fysh.org>
Date: Wed, 6 Dec 2017 10:37:09 -0500
Download (untitled) / with headers
text/plain 7.3k
I'll admit that I was a bit disappointed that my response to Zefram's proposal to pull the plug on qsort somehow never appeared in the digest, and that subsequent comments concerning qsort appeared in the digest, but the full text of those comments didn't appear in my version of the digest, so I couldn't reply to them either. It doesn't matter much now, but I wonder if other porters see similar problems.

I spoke with Jon Bentley last week. He confirmed my observation that if N records have only K distinct keys, qsort can be made to behave like NlogK rather than NlogN. If you are sorting on something like month (K==12) or states in the US (K==50), that can be a significant improvement when N is large. After a bit of thought, I realized that qsort can be stabilized by allocating another copy of the array pointers without treating all keys as being distinct (as the current stabilized implementation, of which I am guilty, does, losing the logK vs logN advantage).

It would be a shame to deny users the performance gains that qsort makes possible in certain circumstances. But the subpragmata for achieving those gains (guilty again) were butt-ugly. Users shouldn't be mentioning algorithms, the details of which are subject to change anyway. On the other hand, it is entirely reasonable that users be able to express behavior, like stability, which may or may not be important to them. We already have subpragmata for stability. Users should be able to describe properties of the data of which they may be aware (and which the sort code cannot determine), like estimates of the number of distinct keys. Something logarithmic like 10KEYS, 100KEYS, 1000KEYS etc could express estimates. If we want to do the best we can for the users, the discussion should be about good, subpragmatic, ways to express qualities of the data, and let the sort code figure out the best available algorithms.

On Wed, Dec 6, 2017 at 9:23 AM, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
Show quoted text

On Thu, Nov 16 2017, Sawyer X. via jotted:

> On Sun, 12 Nov 2017 18:44:44 -0800, zefram@fysh.org wrote:
>> Discussion on this petered out, having mostly got sidetracked into
>> timsort.  A couple of people were in favour of removing qsort and the
>> algorithm-specific pragmata, and no one opposed.  Adding my opinion,
>> I too think we should remove these things.  Pragmata to specify whether
>> stability is required make sense; choosing specific algorithms does not.
>> I also think, given the underscore spelling and the documented status
>> of thesse subpragmata, that we do not need a deprecation cycle to
>> remove them.
>>
>> If no one objects, I intend to go ahead with removal.
>
> Agreed.
>

All in all I don't mind that this got removed like this, but I thought
I'd point out that in the general case I think this sort of a removal is
a bit cavalier given our current deprecation policy.

I.e.:

 1. Obscure feature is discovered
 2. There's grep of the CPAN discovering no uses for it
 3. It's removed in one cycle, and existing uses for it turn into an
    error if people upgrade Perl.

There's a big DarkPAN out there. E.g. Sawyer and I both work for a
well-known perl company with a big codebase whose main money-making app
won't compile with 5.28 because of this, and that's some new code (C<use
sort '_quicksort'>) added in April of this year.

(Now, in that particular case I have no idea why _quicksort was used,
and it can probably just be removed, I've contacted the author).

I wonder if we should at least do this for now:

    diff --git a/lib/sort.pm b/lib/sort.pm
    index 659f3e4f4d..c48e41cf90 100644
    --- a/lib/sort.pm
    +++ b/lib/sort.pm
    @@ -24,6 +24,8 @@ sub import {
                $^H{sort} &= ~$sort::unstable_bit;
            } elsif ($_ eq 'defaults') {
                $^H{sort} =   0;
    +       } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') {
    +           warn "sort: the $_ subpragma is a no-op since 5.28!";
            } else {
                require Carp;
                Carp::croak("sort: unknown subpragma '$_'");
    @@ -43,6 +45,8 @@ sub unimport {
            if ($_ eq 'stable') {
                $^H{sort} &= ~$sort::stable_bit;
                $^H{sort} |=  $sort::unstable_bit;
    +       } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') {
    +           warn "sort: the $_ subpragma is a no-op since 5.28!";
            } else {
                require Carp;
                Carp::croak("sort: unknown subpragma '$_'");

Or maybe it really is better if this dies at compile-time.

FWIW I initially ran into this because I was hacking some pod docs that
had to do with references to very old version (5.[68]), hadn't rebased
in a while, and had this commit ready to push before I ran into a
conflict with Zefram:

    commit 1bb8507c8f
    Author: Ævar Arnfjörð Bjarmason <avar@cpan.org>
    Date:   Wed Dec 6 13:26:32 2017 +0000

        perlfunc: shorten overly verbose & out of date mention of sort.pm

        This paragraph saying a 5.8 feature "may persist" in future versions
        is obviously wildly out of date as we have it now almost 16 years
        later. All of this is covered in the sort.pm documentation, it's
        enough to just briefly mention that perl uses a stable mergesort, and
        point to the other docs in case someone wants to use quicksort
        instead.
    ---
     pod/perlfunc.pod | 18 +++++-------------
     1 file changed, 5 insertions(+), 13 deletions(-)

    diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
    index b7e001fcc8..1d117994c1 100644
    --- a/pod/perlfunc.pod
    +++ b/pod/perlfunc.pod
    @@ -7307,19 +7307,11 @@ L<C<grep>|/grep BLOCK LIST>)
     actually modifies the element in the original list.  This is usually
     something to be avoided when writing clear code.

    -Perl 5.6 and earlier used a quicksort algorithm to implement sort.
    -That algorithm was not stable and I<could> go quadratic.  (A I<stable> sort
    -preserves the input order of elements that compare equal.  Although
    -quicksort's run time is O(NlogN) when averaged over all arrays of
    -length N, the time can be O(N**2), I<quadratic> behavior, for some
    -inputs.)  In 5.7, the quicksort implementation was replaced with
    -a stable mergesort algorithm whose worst-case behavior is O(NlogN).
    -But benchmarks indicated that for some inputs, on some platforms,
    -the original quicksort was faster.  5.8 has a L<sort> pragma for
    -limited control of the sort.  Its rather blunt control of the
    -underlying algorithm may not persist into future Perls, but the
    -ability to characterize the input or output in implementation
    -independent ways quite probably will.
    +Perl uses a I<stable> mergesort algorithm whose worst-case behavior is
    +O(NlogN). Benchmarks indicate that for some inputs, on some platforms,
    +quicksort can be faster, although it can have much worse performance
    +for pathological inputs. See the L<sort> module for how the sort
    +algorithm can be tweaked with L<a pragma|perlpragma>.

I.e. my reading of these docs was that the 5.8 reference was simply some
now-obsolete way of saying "we have this new feature now", not that the
docs had serious about deprecating this, after RT #119635 to remove it
didn't even get filed until 5.18 had been released!

CC: perlbug-followup [...] perl.org, Perl5 Porters <perl5-porters [...] perl.org>, Zefram <zefram [...] fysh.org>
To: Ævar Arnfjörð Bjarmason <avarab [...] gmail.com>
Date: Wed, 6 Dec 2017 10:44:18 -0500
Subject: Re: [perl #119635] deprecate and remove qsort?
From: "John P. Linderman" <jpl.jpl [...] gmail.com>
Download (untitled) / with headers
text/plain 8.2k
ARRGH! My response again disappeared from the digest I just received. Here's how my copy ends, omitting my reply, whether of not I click on the View entire message link. Massively annoying!

Show quoted text
I.e. my reading of these docs was that the 5.8 reference was simply some
now-obsolete way of saying "we have this new feature now", not that the
docs had serious about deprecating this, after RT #119635 to remove it
didn't even get filed until 5.18 had been released!

...

[Message clipped]  View entire message

On Wed, Dec 6, 2017 at 10:37 AM, John P. Linderman <jpl.jpl@gmail.com> wrote:
Show quoted text
I'll admit that I was a bit disappointed that my response to Zefram's proposal to pull the plug on qsort somehow never appeared in the digest, and that subsequent comments concerning qsort appeared in the digest, but the full text of those comments didn't appear in my version of the digest, so I couldn't reply to them either. It doesn't matter much now, but I wonder if other porters see similar problems.

I spoke with Jon Bentley last week. He confirmed my observation that if N records have only K distinct keys, qsort can be made to behave like NlogK rather than NlogN. If you are sorting on something like month (K==12) or states in the US (K==50), that can be a significant improvement when N is large. After a bit of thought, I realized that qsort can be stabilized by allocating another copy of the array pointers without treating all keys as being distinct (as the current stabilized implementation, of which I am guilty, does, losing the logK vs logN advantage).

It would be a shame to deny users the performance gains that qsort makes possible in certain circumstances. But the subpragmata for achieving those gains (guilty again) were butt-ugly. Users shouldn't be mentioning algorithms, the details of which are subject to change anyway. On the other hand, it is entirely reasonable that users be able to express behavior, like stability, which may or may not be important to them. We already have subpragmata for stability. Users should be able to describe properties of the data of which they may be aware (and which the sort code cannot determine), like estimates of the number of distinct keys. Something logarithmic like 10KEYS, 100KEYS, 1000KEYS etc could express estimates. If we want to do the best we can for the users, the discussion should be about good, subpragmatic, ways to express qualities of the data, and let the sort code figure out the best available algorithms.

On Wed, Dec 6, 2017 at 9:23 AM, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

On Thu, Nov 16 2017, Sawyer X. via jotted:

> On Sun, 12 Nov 2017 18:44:44 -0800, zefram@fysh.org wrote:
>> Discussion on this petered out, having mostly got sidetracked into
>> timsort.  A couple of people were in favour of removing qsort and the
>> algorithm-specific pragmata, and no one opposed.  Adding my opinion,
>> I too think we should remove these things.  Pragmata to specify whether
>> stability is required make sense; choosing specific algorithms does not.
>> I also think, given the underscore spelling and the documented status
>> of thesse subpragmata, that we do not need a deprecation cycle to
>> remove them.
>>
>> If no one objects, I intend to go ahead with removal.
>
> Agreed.
>

All in all I don't mind that this got removed like this, but I thought
I'd point out that in the general case I think this sort of a removal is
a bit cavalier given our current deprecation policy.

I.e.:

 1. Obscure feature is discovered
 2. There's grep of the CPAN discovering no uses for it
 3. It's removed in one cycle, and existing uses for it turn into an
    error if people upgrade Perl.

There's a big DarkPAN out there. E.g. Sawyer and I both work for a
well-known perl company with a big codebase whose main money-making app
won't compile with 5.28 because of this, and that's some new code (C<use
sort '_quicksort'>) added in April of this year.

(Now, in that particular case I have no idea why _quicksort was used,
and it can probably just be removed, I've contacted the author).

I wonder if we should at least do this for now:

    diff --git a/lib/sort.pm b/lib/sort.pm
    index 659f3e4f4d..c48e41cf90 100644
    --- a/lib/sort.pm
    +++ b/lib/sort.pm
    @@ -24,6 +24,8 @@ sub import {
                $^H{sort} &= ~$sort::unstable_bit;
            } elsif ($_ eq 'defaults') {
                $^H{sort} =   0;
    +       } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') {
    +           warn "sort: the $_ subpragma is a no-op since 5.28!";
            } else {
                require Carp;
                Carp::croak("sort: unknown subpragma '$_'");
    @@ -43,6 +45,8 @@ sub unimport {
            if ($_ eq 'stable') {
                $^H{sort} &= ~$sort::stable_bit;
                $^H{sort} |=  $sort::unstable_bit;
    +       } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') {
    +           warn "sort: the $_ subpragma is a no-op since 5.28!";
            } else {
                require Carp;
                Carp::croak("sort: unknown subpragma '$_'");

Or maybe it really is better if this dies at compile-time.

FWIW I initially ran into this because I was hacking some pod docs that
had to do with references to very old version (5.[68]), hadn't rebased
in a while, and had this commit ready to push before I ran into a
conflict with Zefram:

    commit 1bb8507c8f
    Author: Ævar Arnfjörð Bjarmason <avar@cpan.org>
    Date:   Wed Dec 6 13:26:32 2017 +0000

        perlfunc: shorten overly verbose & out of date mention of sort.pm

        This paragraph saying a 5.8 feature "may persist" in future versions
        is obviously wildly out of date as we have it now almost 16 years
        later. All of this is covered in the sort.pm documentation, it's
        enough to just briefly mention that perl uses a stable mergesort, and
        point to the other docs in case someone wants to use quicksort
        instead.
    ---
     pod/perlfunc.pod | 18 +++++-------------
     1 file changed, 5 insertions(+), 13 deletions(-)

    diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
    index b7e001fcc8..1d117994c1 100644
    --- a/pod/perlfunc.pod
    +++ b/pod/perlfunc.pod
    @@ -7307,19 +7307,11 @@ L<C<grep>|/grep BLOCK LIST>)
     actually modifies the element in the original list.  This is usually
     something to be avoided when writing clear code.

    -Perl 5.6 and earlier used a quicksort algorithm to implement sort.
    -That algorithm was not stable and I<could> go quadratic.  (A I<stable> sort
    -preserves the input order of elements that compare equal.  Although
    -quicksort's run time is O(NlogN) when averaged over all arrays of
    -length N, the time can be O(N**2), I<quadratic> behavior, for some
    -inputs.)  In 5.7, the quicksort implementation was replaced with
    -a stable mergesort algorithm whose worst-case behavior is O(NlogN).
    -But benchmarks indicated that for some inputs, on some platforms,
    -the original quicksort was faster.  5.8 has a L<sort> pragma for
    -limited control of the sort.  Its rather blunt control of the
    -underlying algorithm may not persist into future Perls, but the
    -ability to characterize the input or output in implementation
    -independent ways quite probably will.
    +Perl uses a I<stable> mergesort algorithm whose worst-case behavior is
    +O(NlogN). Benchmarks indicate that for some inputs, on some platforms,
    +quicksort can be faster, although it can have much worse performance
    +for pathological inputs. See the L<sort> module for how the sort
    +algorithm can be tweaked with L<a pragma|perlpragma>.

I.e. my reading of these docs was that the 5.8 reference was simply some
now-obsolete way of saying "we have this new feature now", not that the
docs had serious about deprecating this, after RT #119635 to remove it
didn't even get filed until 5.18 had been released!


Subject: Re: [perl #119635] deprecate and remove qsort?
To: perl5-porters [...] perl.org
Date: Mon, 18 Dec 2017 12:44:36 +0200
From: Sawyer X <xsawyerx [...] gmail.com>
On 12/06/2017 04:51 PM, Zefram wrote: Show quoted text
> Avar Arnfjorth Bjarmason wrote:
>> I wonder if we should at least do this for now:
> ...
>> + } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') { >> + warn "sort: the $_ subpragma is a no-op since 5.28!";
> There's no real need for that: the error that it'll produce leads one > straight to the problem, and the change required to get it running again > is very straightforward. But if we were to make these subpragmata legal > again, it shouldn't just be in the open-ended way that you suggest. > We don't want to have that baggage around forever. We should deprecate > them, with a defined deletion date, and on the deletion date they should > become illegal (as they are now in blead). The immediate implication > is that, rather than an unconditional warning of the type you show, > they should elicit a categorised deprecation warning that advertises > the deletion date.
Is this a price too high to pay for deprecating this over time? In other words, why avoid deprecating this cleanly as we usually do?
Subject: Re: [perl #119635] deprecate and remove qsort?
From: Zefram <zefram [...] fysh.org>
Date: Mon, 18 Dec 2017 11:18:01 +0000
To: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 326b
Sawyer X wrote: Show quoted text
>In other words, why avoid deprecating this cleanly as we usually do?
Because it's never been a fully supported feature. It has never incurred the compatibility obligations that stable features do. With that constraint not applying, we get a maintainability win from being rid of the code quickly. -zefram
Date: Mon, 18 Dec 2017 14:54:56 +0200
To: perl5-porters [...] perl.org
Subject: Re: [perl #119635] deprecate and remove qsort?
From: Sawyer X <xsawyerx [...] gmail.com>
Download (untitled) / with headers
text/plain 277b
On 12/18/2017 01:18 PM, Zefram wrote: Show quoted text
> Sawyer X wrote:
>> In other words, why avoid deprecating this cleanly as we usually do?
> Because it's never been a fully supported feature. It has never incurred > the compatibility obligations that stable features do.
Good reason.
To: Perl5 Porters <perl5-porters [...] perl.org>
Date: Tue, 19 Dec 2017 09:31:28 -0500
Subject: Subject: Re: [perl #119635] deprecate and remove qsort?
From: "John P. Linderman" <jpl.jpl [...] gmail.com>
​>​
On 12/18/2017 01:18 PM, Zefram wrote:

​>​
> Sawyer X wrote:

​>​
>> In other words, why avoid deprecating this cleanly as we usually do?

​>​
> Because it's never been a fully supported feature.  It has never incurred

>> the compatibility obligations that stable features do.

Long term, the _qsort subpragma surely deserves to die. It asks the users to understand the innards of pp_sort.c, which they almost certainly won't, and which are, in any event, subject to change. But I have long been puzzled by the observation in perldoc -f sort that

Show quoted text
... benchmarks indicated that for some inputs, on some platforms, the original quicksort was faster.

​I'm now convinced that this had much less to do with platform than with the inputs, and that

Show quoted text
...the ability to characterize the input or output in implementation independent ways
​is a worthy goal for the sort pragma. Many of us would consider a 10% speedup in something as common as sorting ​as worth some maintainability loss, and I believe there will be inputs for which quicksort will be multiple times faster than mergesort. I'm trying to instrument a version of pp_sort.c where quicksort is still present to replace "I believe" with "I can show". In a similar vein, "I believe" that the Bentley/McIlroy quicksort


with its sophisticated selection of pivot elements, will do an even better job on such inputs. My current goal is to instrument a version of pp_sort.c with both quicksort implementations, and with mergesort implementations with and without stability enforced, and then beat the daylights out all options to prove correctness and get an objective measure of performance. I'll then strip out the losing implementations, and try to integrate it with the evolving pp_sort.c. If it's decided to discard whichever quicksort algorithm wins out, that should be easy. If it looks like it's worth keeping, we can discuss how "characterize the input" in ways that allow pp_sort.c to select the appropriate algorithm.

Download (untitled) / with headers
text/plain 317b
Thank you for filing this report. You have helped make Perl better. With the release yesterday of Perl 5.28.0, this and 185 other issues have been resolved. Perl 5.28.0 may be downloaded via: https://metacpan.org/release/XSAWYERX/perl-5.28.0 If you find that the problem persists, feel free to reopen this ticket.


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org