Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug when testing array returned by sort operator #5557

Open
p5pRT opened this issue Jun 11, 2002 · 28 comments
Open

Bug when testing array returned by sort operator #5557

p5pRT opened this issue Jun 11, 2002 · 28 comments

Comments

@p5pRT
Copy link

p5pRT commented Jun 11, 2002

Migrated from rt.perl.org#9528 (status was 'stalled')

Searchable as RT9528$

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2002

From david@whistlingcat.com

Created by dmellor@whistlingcat.com

I believe I have discovered a bug in Perl 5.6.1, which I first noticed when
upgrading Net​::DNS from version 0.14 to version 0.22. The problem does not
appear to be in the Net​::DNS module itself, but rather in the Perl core,
as I have been able to develop a standalone test case. The bug seems to be
associated with the sort operator when sorting an array which has been
returned from another operator and which has not been assigned to another
variable. The test case follows​:

#!/usr/bin/perl

use strict;

{
  my @​foo = (
  { type => 'MX', preference => 100 },
  { type => 'MX', preference => 5 },
  { type => 'MX', preference => 10 }
  );

  if (Sub1(\@​foo)) {
  print "Test 1 OK\n";
  }
  else {
  print "Test 1 FAILED\n";
  }

  my @​tmp = Sub1(\@​foo);
  if (@​tmp) {
  print "Test 2 OK\n";
  }

  if (Sub2(\@​foo)) {
  print "Test 3 OK\n";
  }

  @​tmp = Sub2(\@​foo);
  if (@​tmp) {
  print "Test 4 OK\n";
  }
}

sub Sub1
{
  my $array = shift;

  return sort { $a->{preference} <=> $b->{preference} }
  grep { $_->{type} eq 'MX' } @​$array;
}

sub Sub2
{
  my $array = shift;

  my @​items = grep { $_->{type} eq 'MX' } @​$array;
  my @​sortedItems = sort { $a->{preference} <=> $b->{preference} } @​items;
  return @​sortedItems;
}

Both of the subroutines in this script take an arrayref and extract some items
from the underlying array into another array, and then return a sorted version
of the intermediate array. The four tests should print OK, as each invocation
of either Sub1 or Sub2 should return a three-element array. However, on running
this script, the following output is obtained​:

Test 1 FAILED
Test 2 OK
Test 3 OK
Test 4 OK

Perl Info

Flags:
    category=core
    severity=high

Site configuration information for perl v5.6.1:

Configured by bhcompile at Wed Feb 20 15:00:12 EST 2002.

Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration:
  Platform:
    osname=linux, osvers=2.4.17-0.13smp, archname=i386-linux
    uname='linux daffy.perf.redhat.com 2.4.17-0.13smp #1 smp fri feb 1 10:30:48 est 2002 i686 unknown '
    config_args='-des -Doptimize=-O2 -march=i386 -mcpu=i686 -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dcccdlflags=-fPIC -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Uusethreads -Uuseithreads -Uuselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Di_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Dlocincpth='
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=undef usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
  Compiler:
    cc='gcc', ccflags ='-fno-strict-aliasing -I/usr/local/include',
    optimize='-O2 -march=i386 -mcpu=i686',
    cppflags='-fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='2.96 20000731 (Red Hat Linux 7.1 2.96-98)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lc -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil
    libc=/lib/libc-2.2.4.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.6.1:
    /usr/lib/perl5/5.6.1/i386-linux
    /usr/lib/perl5/5.6.1
    /usr/lib/perl5/site_perl/5.6.1/i386-linux
    /usr/lib/perl5/site_perl/5.6.1
    /usr/lib/perl5/site_perl/5.6.0/i386-linux
    /usr/lib/perl5/site_perl/5.6.0
    /usr/lib/perl5/site_perl
    .


Environment for perl v5.6.1:
    HOME=/home/david
    LANG=en_US
    LANGUAGE (unset)
    LC_COLLATE=C
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/bin:/usr/bin:/usr/bin/X11:/usr/local/bin:/usr/games:/opt/gnome/bin:/usr/X11R6/bin:/var/qmail/bin:/usr/sbin:/home/david/bin:/home/david/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash


@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2002

From @Abigail

Both of the subroutines in this script take an arrayref and extract some items
from the underlying array into another array, and then return a sorted version
of the intermediate array. The four tests should print OK, as each invocation
of either Sub1 or Sub2 should return a three-element array.

No, it should not. Expressions in scalar context don't return lists,
they return scalars. 'sort' in scalar context doesn't actually do
anything, and returns undefined.

Your first invokation of Sub1 is in scalar context, hence the sort is
in scalar context.

Abigail

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2002

From @schwern

On Tue, Jun 11, 2002 at 02​:03​:46AM -0700, Abigail wrote​:

Both of the subroutines in this script take an arrayref and extract some items
from the underlying array into another array, and then return a sorted version
of the intermediate array. The four tests should print OK, as each invocation
of either Sub1 or Sub2 should return a three-element array.

No, it should not. Expressions in scalar context don't return lists,
they return scalars. 'sort' in scalar context doesn't actually do
anything, and returns undefined.

If this is good behavior, which I don't think it is, it isn't documented.

The more logical thing for sort in scalar context to do is skip the sort and
just return the number of elements it got passed in, just like you'd have
gotten if the sort wasn't there at all. This is much less surprising.

So if this returns 3​:

$ perl -wle '@​a = qw(c b a); print scalar @​a'
3

why not this​:

$ perl -wle '@​a = qw(c b a); print scalar sort @​a'
Use of uninitialized value in print at -e line 1.

More importantly, the user can't tell the difference if the sort is returned
from a subroutine, which is what bit David. You have to be careful about
how you use sort() in a subroutine

  # dangerous
  sub foo { my @​a = qw(c b a); return sort @​a; }

  # safe
  sub foo { my @​a = qw(c b a); @​a = sort @​a; return @​a; }

and there's really no reason it should be so.

The fix should be simple. pp_sort.c does this​:

  if (gimme != G_ARRAY) {
  SP = MARK;
  RETPUSHUNDEF;
  }

and it's a small matter of changing that RETURNPUSHUNDEF with something that
returns the number of elements in the given list. I'll poke at it, but it
would make me happy if someone could beat me to it. :)

This is very old functionality, going back to at least 5.004 and probably
back further. However, since it is undocumented, not useful and actively
confusing and counter-intuitive, it would be best to change it to something
which at least DWIM.

--
This sig file temporarily out of order.

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2002

From [Unknown Contact. See original ticket]

On tisdag, juni 11, 2002, at 09​:20 , Michael G Schwern wrote​:

From​: Michael G Schwern <schwern@​pobox.com>
Date​: tis jun 11, 2002 09​:20​:30 Europe/Stockholm
To​: Abigail <abigail@​foad.org>
Cc​: david@​whistlingcat.com, perl5-porters@​perl.org
Subject​: Change sort in scalar context behavior. (was Re​: [ID
20020611.002] Bug when testing array returned by sort operator)

On Tue, Jun 11, 2002 at 02​:03​:46AM -0700, Abigail wrote​:

Both of the subroutines in this script take an arrayref and extract
some items
from the underlying array into another array, and then return a
sorted version
of the intermediate array. The four tests should print OK, as each
invocation
of either Sub1 or Sub2 should return a three-element array.

No, it should not. Expressions in scalar context don't return lists,
they return scalars. 'sort' in scalar context doesn't actually do
anything, and returns undefined.

If this is good behavior, which I don't think it is, it isn't
documented.

The more logical thing for sort in scalar context to do is skip the
sort and
just return the number of elements it got passed in, just like you'd
have
gotten if the sort wasn't there at all. This is much less surprising.

So if this returns 3​:

$ perl -wle '@​a = qw(c b a); print scalar @​a'
3

why not this​:

$ perl -wle '@​a = qw(c b a); print scalar sort @​a'
Use of uninitialized value in print at -e line 1.

More importantly, the user can't tell the difference if the sort is
returned
from a subroutine, which is what bit David. You have to be careful
about
how you use sort() in a subroutine

\# dangerous
sub foo \{ my @&#8203;a = qw\(c b a\);  return sort @&#8203;a; \}

\# safe
sub foo \{ my @&#8203;a = qw\(c b a\);  @&#8203;a = sort @&#8203;a;  return @&#8203;a; \}

and there's really no reason it should be so.

The fix should be simple. pp_sort.c does this​:

if \(gimme \!= G\_ARRAY\) \{
SP = MARK;
RETPUSHUNDEF;
\}

and it's a small matter of changing that RETURNPUSHUNDEF with something
that
returns the number of elements in the given list. I'll poke at it, but
it
would make me happy if someone could beat me to it. :)

This is very old functionality, going back to at least 5.004 and
probably
back further. However, since it is undocumented, not useful and
actively
confusing and counter-intuitive, it would be best to change it to
something
which at least DWIM.

--
This sig file temporarily out of order.

If so, one could make the case that it should sort aswell.

my @​foo = qw(1 2 3);
sub foo { sort @​foo}
scalar foo();

?

Arthur

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2002

From @tamias

On Tue, Jun 11, 2002 at 03​:20​:30PM -0400, Michael G Schwern wrote​:

On Tue, Jun 11, 2002 at 02​:03​:46AM -0700, Abigail wrote​:

Both of the subroutines in this script take an arrayref and extract some items
from the underlying array into another array, and then return a sorted version
of the intermediate array. The four tests should print OK, as each invocation
of either Sub1 or Sub2 should return a three-element array.

No, it should not. Expressions in scalar context don't return lists,
they return scalars. 'sort' in scalar context doesn't actually do
anything, and returns undefined.

If this is good behavior, which I don't think it is, it isn't documented.

The more logical thing for sort in scalar context to do is skip the sort and
just return the number of elements it got passed in, just like you'd have
gotten if the sort wasn't there at all. This is much less surprising.

I disagree with this suggestion. First, returning the number of elements
is not necessarily what would happen if the sort weren't there at all​:

  @​x = sort('dog', 'cat', 'fish');
  $x = ('dog', 'cat', 'fish');

Second, if one wants the behavior that occurs without the sort, then I
think one should omit the sort explicitly.

It would be just as logical, I think, for sort in scalar context to sort
and return the first element, or sort and return the last element, or sort
in place, or sort and return the number of comparisons made...

In this case, given all these possibilities, it's difficult to choose which
is actually the best option. Therefore, I think the appropriate solution
is to do nothing (except document this behavior and maybe add a warning
when sort is called in scalar context).

So if this returns 3​:

$ perl -wle '@​a = qw(c b a); print scalar @​a'
3

why not this​:

$ perl -wle '@​a = qw(c b a); print scalar sort @​a'
Use of uninitialized value in print at -e line 1.

More importantly, the user can't tell the difference if the sort is returned
from a subroutine, which is what bit David. You have to be careful about
how you use sort() in a subroutine

\# dangerous
sub foo \{ my @&#8203;a = qw\(c b a\);  return sort @&#8203;a; \}

\# safe
sub foo \{ my @&#8203;a = qw\(c b a\);  @&#8203;a = sort @&#8203;a;  return @&#8203;a; \}

and there's really no reason it should be so.

I don't think there's any reason it shouldn't be so. You have to be
careful about how you use any context-sensitive expression in a subroutine,
such as split(), map(), grep(), and subroutine calls.

We do have wantarray(), after all.

The fix should be simple. pp_sort.c does this​:

if \(gimme \!= G\_ARRAY\) \{
SP = MARK;
RETPUSHUNDEF;
\}

and it's a small matter of changing that RETURNPUSHUNDEF with something that
returns the number of elements in the given list. I'll poke at it, but it
would make me happy if someone could beat me to it. :)

This is very old functionality, going back to at least 5.004 and probably
back further. However, since it is undocumented, not useful and actively
confusing and counter-intuitive, it would be best to change it to something
which at least DWIM.

I'm just not sure returning the number of elements is useful or what most
people mean.

Ronald

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2002

From @rgarcia

On 2002.06.11 22​:31 Ronald J Kimball wrote​:

I disagree with this suggestion. First, returning the number of elements
is not necessarily what would happen if the sort weren't there at all​:

@​x = sort('dog', 'cat', 'fish');
$x = ('dog', 'cat', 'fish');

Note also that sort in scalar context warns with 5.8.0.

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2002

From @schwern

On Tue, Jun 11, 2002 at 04​:31​:32PM -0400, Ronald J Kimball wrote​:

I disagree with this suggestion. First, returning the number of elements
is not necessarily what would happen if the sort weren't there at all​:

@​x = sort('dog', 'cat', 'fish');
$x = ('dog', 'cat', 'fish');

Why would you sort a constant list? Is there any reasonable way to get this
effect?

Second, if one wants the behavior that occurs without the sort, then I
think one should omit the sort explicitly.

You have a subroutine which returns a sorted list. A user wants to know how
many elements are in that list. The obvious thing to do is​:

  sub foo { ... return sort @​list }
 
  # and elsewhere
  print scalar foo();

and that doesn't DWIM. This is a problem. Removing the sort just causes
another problem​: sure you can get the number of elements just fine, but now
foo() doesn't sort the list!

It would be just as logical, I think, for sort in scalar context to sort
and return the first element, or sort and return the last element, or sort
in place, or sort and return the number of comparisons made...

I find these options to be dubious. Having the ability to do any of the
above somehow is nice, but making the leap that sort in scalar context
should do any of them... I don't see it.

sort() returning undef solves no problems, it's just there because we can't
agree. It in fact creates a surprise. Changing sort to return the number of
elements removes that surprise. Changing it to any of the above just adds
another neat, unexpected feature.

More importantly, the user can't tell the difference if the sort is returned
from a subroutine, which is what bit David. You have to be careful about
how you use sort() in a subroutine

\# dangerous
sub foo \{ my @&#8203;a = qw\(c b a\);  return sort @&#8203;a; \}

\# safe
sub foo \{ my @&#8203;a = qw\(c b a\);  @&#8203;a = sort @&#8203;a;  return @&#8203;a; \}

and there's really no reason it should be so.

I don't think there's any reason it shouldn't be so. You have to be
careful about how you use any context-sensitive expression in a subroutine,
such as split(), map(), grep(), and subroutine calls.

What suprise am I missing here?

$ perl -wle 'sub foo { return map { $_ } ("dog", "cat", "fish") } $x = foo(); print $x'
3

$ perl -wle 'sub foo { return grep { $_ } ("dog", "cat", "fish") } $x = foo(); print $x'
3

$ perl -wle 'sub foo { return split / /, "dog cat fish" } $x = foo(); print $x'
3

$ perl -wle 'sub foo { return sort("dog", "cat", "fish") } $x = foo(); print $x'
Use of uninitialized value in print at -e line 1.

o/` One of these things is not like the other o/` :)

--
This sig file temporarily out of order.

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2002

From @schwern

On Tue, Jun 11, 2002 at 10​:56​:53PM +0200, Rafael Garcia-Suarez wrote​:

On 2002.06.11 22​:31 Ronald J Kimball wrote​:

I disagree with this suggestion. First, returning the number of elements
is not necessarily what would happen if the sort weren't there at all​:

@​x = sort('dog', 'cat', 'fish');
$x = ('dog', 'cat', 'fish');

Note also that sort in scalar context warns with 5.8.0.

It does for these simple cases​:

$ perl5.8.0 -wle 'my $x = sort("dog", "cat", "fish")'
Useless use of sort in scalar context at -e line 1.

$ perl5.8.0 -wle 'my $x = 42 if sort("dog", "cat", "fish")'
Useless use of sort in scalar context at -e line 1.

but not for the problem case​:

$ perl5.8.0 -wle 'sub foo { return sort("dog", "cat", "fish") } $x = foo(); print $x'
Use of uninitialized value in print at -e line 1.

--
This sig file temporarily out of order.

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2002

From @tamias

On Tue, Jun 11, 2002 at 06​:46​:38PM -0400, Michael G Schwern wrote​:

On Tue, Jun 11, 2002 at 04​:31​:32PM -0400, Ronald J Kimball wrote​:

I disagree with this suggestion. First, returning the number of elements
is not necessarily what would happen if the sort weren't there at all​:

@​x = sort('dog', 'cat', 'fish');
$x = ('dog', 'cat', 'fish');

Why would you sort a constant list? Is there any reasonable way to get this
effect?

Why would you call sort in a scalar context?

How about this, then​:

@​x = sort($m, $n, $o);
$x = ($m, $n, $o);

Second, if one wants the behavior that occurs without the sort, then I
think one should omit the sort explicitly.

You have a subroutine which returns a sorted list. A user wants to know how
many elements are in that list. The obvious thing to do is​:

  sub foo \{  \.\.\.  return sort @&#8203;list \}
  
  \# and elsewhere
  print scalar foo\(\);

and that doesn't DWIM. This is a problem. Removing the sort just causes
another problem​: sure you can get the number of elements just fine, but now
foo() doesn't sort the list!

This problem is properly solved by rewriting the subroutine to use
wantarray(), not by changing the behavior of sort. e.g.

sub foo { return wantarray ? sort @​list : scalar(@​list) }
# the use scalar() is redundant but useful for clarity

It would be just as logical, I think, for sort in scalar context to sort
and return the first element, or sort and return the last element, or sort
in place, or sort and return the number of comparisons made...

I find these options to be dubious. Having the ability to do any of the
above somehow is nice, but making the leap that sort in scalar context
should do any of them... I don't see it.

Returning the first element seems like an obvious behavior, and other
people have, in fact, suggested it before. It would certainly be much more
useful than returning the number of elements in the list.

sort() returning undef solves no problems, it's just there because we can't
agree. It in fact creates a surprise. Changing sort to return the number of
elements removes that surprise. Changing it to any of the above just adds
another neat, unexpected feature.

We can't agree, and therefore changing the sort to return the number of
elements only removes the surprise for people who agree with your position.
:)

I don't think that having sort() return the number of elements solves a
problem. The problem is that calling sort in scalar context is a mistake.
The solution is to notify the user that a mistake has been made so that
they can correct it.

More importantly, the user can't tell the difference if the sort is returned
from a subroutine, which is what bit David. You have to be careful about
how you use sort() in a subroutine

\# dangerous
sub foo \{ my @&#8203;a = qw\(c b a\);  return sort @&#8203;a; \}

\# safe
sub foo \{ my @&#8203;a = qw\(c b a\);  @&#8203;a = sort @&#8203;a;  return @&#8203;a; \}

and there's really no reason it should be so.

I don't think there's any reason it shouldn't be so. You have to be
careful about how you use any context-sensitive expression in a subroutine,
such as split(), map(), grep(), and subroutine calls.

What suprise am I missing here?

$ perl -wle 'sub foo { return map { $_ } ("dog", "cat", "fish") } $x = foo(); print $x'
3

$ perl -wle 'sub foo { return grep { $_ } ("dog", "cat", "fish") } $x = foo(); print $x'
3

$ perl -wle 'sub foo { return split / /, "dog cat fish" } $x = foo(); print $x'
3

$ perl -wle 'sub foo { return sort("dog", "cat", "fish") } $x = foo(); print $x'
Use of uninitialized value in print at -e line 1.

o/` One of these things is not like the other o/` :)

The behavior changes based on the context of the expression. If you don't
pay attention to the context, you will not get the behavior you expect.

Also note that the number returned grep, map, and split in scalar context
is often not the same as the number of elements in the argument list, as
opposed to your suggestion for sort.

Ronald

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2002

From @schwern

I was going to do another point-by-point reply, but I think I've got the
crux of the problem here.

There's three distict issues, and I'm only worried about #3.

Issue #1​: sort in boolean context.

  if( sort @​array ) { ... }

this is pretty straight-forward. It's useless, and 5.8.0 warns you as such.
Good.

Issue #2​: sort in simple scalar context.

  my $x = sort @​array;

this could potentially do many things. Return undef, return the number of
elements, return the first element, the last element, etc... Currently, it
returns undef and warns you. That's an area ripe for discussion, but not
now.

Issue #3​: sort in "variable" context.

  sub foo { ... return sort @​array }

The current behavior of this subroutine is surprising. Unlike similar
operators (map, grep, split, etc...), this will not work when foo() is used
in scalar context.

  sub foo { ... return @​array }
  sub foo { ... return @​array[1..4] }
  sub foo { ... return map { ... } @​array }
  sub foo { ... return grep { ... } @​array }
  sub foo { ... return split / /, $var }

Those all will. And that's the surprise for the author and caller of foo().
No similar Perl function requires that you manually do the wantarray()
logic, why would you expect to have to do it with sort?

Now, is this a common problem? Here's a little straw poll of all the
modules I have installed which contain /return sort/.

CPAN
Apache​::MP3​::Sorted
Apache​::MP3
ExtUtils​::Installed
Debian​::Defoma​::Id
Net​::LDAP​::ASN
DBI
Xmms
Bio​::DB​::GFF​::Feature
Bio​::SeqFeature​::Gene​::Transcript
Bio​::Tree​::Node

$ perl -MDBI -le 'printf "You have %d DBI drivers\n", scalar DBI->available_drivers'
You have 0 DBI drivers

That's funny. I could have sworn I'd installed some DBD modules.

$ perl -MExtUtils​::Installed -le '$i = ExtUtils​::Installed->new; printf "I have %d modules installed\n", scalar $i->modules'
I have 0 modules installed

AHHH! Where'd all my modules go!!!! ;)

If Alan Burlison, Lincoln Stein, Andreas Koenig, Graham Barr and Tim Bunce
all get tripped by this, I think it's a problem.

--
This sig file temporarily out of order.

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2002

From @gbarr

On Wed, Jun 12, 2002 at 12​:13​:03AM -0400, Michael G Schwern wrote​:

I was going to do another point-by-point reply, but I think I've got the
crux of the problem here.

If you ask a 3 year old "whats 1 + 1" and they reply 3, you say no its "2"
they are surprised. Does that mean that 2 is the wrong answer ? No, it
just means they need to learn.

There's three distict issues, and I'm only worried about #3.

Issue #1​: sort in boolean context.

  if\( sort @&#8203;array \) \{ \.\.\. \}

Why would you do that ?

this is pretty straight-forward. It's useless, and 5.8.0 warns you as such.
Good.

Issue #2​: sort in simple scalar context.

  my $x = sort @&#8203;array;

this could potentially do many things. Return undef, return the number of
elements, return the first element, the last element, etc... Currently, it
returns undef and warns you. That's an area ripe for discussion, but not
now.

Issue #3​: sort in "variable" context.

  sub foo \{ \.\.\.  return sort @&#8203;array \}

The current behavior of this subroutine is surprising.

I fail to see why this should be surprising if the return value of foo()
is documented properly.

Unlike similar
operators (map, grep, split, etc...), this will not work when foo() is used
in scalar context.

Define "work", what you really mean is that it will not do what YOU want
it to do.

   sub foo \{ \.\.\.  return @&#8203;array \}
   sub foo \{ \.\.\.  return @&#8203;array\[1\.\.4\] \}
   sub foo \{ \.\.\.  return map   \{ \.\.\. \} @&#8203;array \}
   sub foo \{ \.\.\.  return grep  \{ \.\.\. \} @&#8203;array \}
   sub foo \{ \.\.\.  return split / /\, $var \}

Those all will. And that's the surprise for the author and caller of foo().

No, thats YOUR surprise. If you read the docs properly you will know what it would
return.

No similar Perl function requires that you manually do the wantarray()
logic, why would you expect to have to do it with sort?

Now, is this a common problem? Here's a little straw poll of all the
modules I have installed which contain /return sort/.

CPAN
Apache​::MP3​::Sorted
Apache​::MP3
ExtUtils​::Installed
Debian​::Defoma​::Id
Net​::LDAP​::ASN

Your pattern needs fixing as Net​::LDAP​::ASN does not call sort at all.

DBI
Xmms
Bio​::DB​::GFF​::Feature
Bio​::SeqFeature​::Gene​::Transcript
Bio​::Tree​::Node

$ perl -MDBI -le 'printf "You have %d DBI drivers\n", scalar DBI->available_drivers'
You have 0 DBI drivers

One could just as easily say that that was a bug in available_drivers, but then if it is
not documented to work in a scalar context, why should it.

That's funny. I could have sworn I'd installed some DBD modules.

$ perl -MExtUtils​::Installed -le '$i = ExtUtils​::Installed->new; printf "I have %d modules installed\n", scalar $i->modules'
I have 0 modules installed

AHHH! Where'd all my modules go!!!! ;)

If Alan Burlison, Lincoln Stein, Andreas Koenig, Graham Barr and Tim Bunce
all get tripped by this, I think it's a problem.

Not me ? So please don't use my name as justifaction for a bad premise

Graham.

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2002

From @Tux

On Wed 12 Jun 2002 11​:09, Graham Barr <gbarr@​pobox.com> wrote​:

On Wed, Jun 12, 2002 at 12​:13​:03AM -0400, Michael G Schwern wrote​:

I was going to do another point-by-point reply, but I think I've got the
crux of the problem here.

If you ask a 3 year old "whats 1 + 1" and they reply 3, you say no its "2"
they are surprised. Does that mean that 2 is the wrong answer ? No, it
just means they need to learn.

There's three distict issues, and I'm only worried about #3.

Issue #1​: sort in boolean context.

  if\( sort @&#8203;array \) \{ \.\.\. \}

Why would you do that ?

Undeliberately, as shown in Schwerns previous example

unless (DBI->available_drivers) die "No DBI drivers available";

this is pretty
straight-forward. It's useless, and 5.8.0 warns you as such.
Good.

--
H.Merijn Brand Amsterdam Perl Mongers (http​://amsterdam.pm.org/)
using perl-5.6.1, 5.8.0 & 632 on HP-UX 10.20 & 11.00, AIX 4.2, AIX 4.3,
  WinNT 4, Win2K pro & WinCE 2.11. Smoking perl CORE​: smokers@​perl.org
http​://archives.develooper.com/daily-build@​perl.org/ perl-qa@​perl.org
send smoke reports to​: smokers-reports@​perl.org, QA​: http​://qa.perl.org

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2002

From [Unknown Contact. See original ticket]

Michael G Schwern <schwern@​pobox.com> writes​:

On Tue, Jun 11, 2002 at 02​:03​:46AM -0700, Abigail wrote​:

Both of the subroutines in this script take an arrayref and extract some items
from the underlying array into another array, and then return a sorted version
of the intermediate array. The four tests should print OK, as each invocation
of either Sub1 or Sub2 should return a three-element array.

No, it should not. Expressions in scalar context don't return lists,
they return scalars. 'sort' in scalar context doesn't actually do
anything, and returns undefined.

If this is good behavior, which I don't think it is, it isn't documented.

The more logical thing for sort in scalar context to do is skip the sort and
just return the number of elements it got passed in, just like you'd have
gotten if the sort wasn't there at all. This is much less surprising.

It is supposed to be surprising. Most people expect

$foo = sort ...;

to return the largest/smallest entry so returning undef surprises
everyone ... a warning "sort in scalar context" would be fine too.

This is very old functionality, going back to at least 5.004 and probably
back further. However, since it is undocumented, not useful and actively
confusing and counter-intuitive, it would be best to change it to something
which at least DWIM.

There was considerable heat on "What It Meant" last time around.
Just returning the length seems like least interesting returning
one of the extreams would give it value but needs real code behind
it...

--
Nick Ing-Simmons
http​://www.ni-s.u-net.com/

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2002

From @schwern

On Wed, Jun 12, 2002 at 10​:09​:41AM +0100, Graham Barr wrote​:

Issue #1​: sort in boolean context.

  if\( sort @&#8203;array \) \{ \.\.\. \}

Why would you do that ?

You wouldn't, that's why we warn about it. Like I said, not what I'm
worried about, just trying to clarify the issue by eliminating this case
from discussion. Perl's current behavior is ok.

this is pretty straight-forward. It's useless, and 5.8.0 warns you as such.
Good.

<snip>

Issue #3​: sort in "variable" context.

  sub foo \{ \.\.\.  return sort @&#8203;array \}

The current behavior of this subroutine is surprising.

I fail to see why this should be surprising if the return value of foo()
is documented properly.

Because the author of foo() likely doesn't even know this behavior exists.
There's no warning, it's not documented and there's nothing else in Perl
that behaves like sort().

I'm quite certain both DBI->available_drivers and
ExtUtils​::Installed->modules weren't intended to return undef in scalar
context. I've already hacked around it inside ExtUtils​::Installed->modules.

Net​::LDAP​::ASN

Your pattern needs fixing as Net​::LDAP​::ASN does not call sort at all.

You're right, I should have looked deeper into it.

$ perl -MDBI -le 'printf "You have %d DBI drivers\n", scalar DBI->available_drivers'
You have 0 DBI drivers

One could just as easily say that that was a bug in available_drivers, but
then if it is not documented to work in a scalar context, why should it.

Explicitly documenting in every module that each function which returns an
array also returns the number of elements in that array in scalar context is
redundant information, like documenting "This function returns a number
which can also be used as a string". That's expected, it's how Perl works
and how we expect well-behaved functions to work. You document the
exceptions.

Tim didn't sit down one day and think, "Hmm, I really should prevent anyone
from using DBI->available_drivers in scalar context."

--
This sig file temporarily out of order.

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2002

From @schwern

On Wed, Jun 12, 2002 at 05​:03​:11PM +0100, Nick Ing-Simmons wrote​:

The more logical thing for sort in scalar context to do is skip the sort and
just return the number of elements it got passed in, just like you'd have
gotten if the sort wasn't there at all. This is much less surprising.

It is supposed to be surprising. Most people expect

$foo = sort ...;

to return the largest/smallest entry so returning undef surprises
everyone ... a warning "sort in scalar context" would be fine too.

We already have such a warning, but it only works for simple cases.

$ perl5.8.0 -wle 'my $foo = sort qw(1 2 3); print $foo'
Useless use of sort in scalar context at -e line 1.
Use of uninitialized value in print at -e line 1.

$ perl5.8.0 -wle 'sub foo { return sort @​_ } my $foo = foo(qw(1 2 3)); print $foo'
Use of uninitialized value in print at -e line 1.

It's the second case which is the surprise. The problem being that when you
write "$foo = sort @​array" it's obviously suspicious. But when you write
"return sort @​array" it's so natural most people wouldn't think twice.

You certainly wouldn't expect to have to write it the "right" way

  sub foo { return wantarray ? sort @​_ : @​_ }

would you?

So that's why I think having sort return the number of elements in scalar
context, even if it is the least interesting option. It makes returning a
sorted list more natural.

You can even leave the compile-time warning on for the simple case.

--
This sig file temporarily out of order.

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2002

From @TimToady

On Wed, 12 Jun 2002, Nick Ing-Simmons wrote​:

There was considerable heat on "What It Meant" last time around.
Just returning the length seems like least interesting returning
one of the extreams would give it value but needs real code behind
it...

I think returning an extreme ought to be explicit. Perl 6 breaks
the C comma operator, and never returns the last element of a list
implicitly. You have to use [-1] for that.

Larry

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2002

From [Unknown Contact. See original ticket]

Larry said​:

On Wed, 12 Jun 2002, Nick Ing-Simmons wrote​:

There was considerable heat on "What It Meant" last time around.
Just returning the length seems like least interesting returning
one of the extreams would give it value but needs real code behind
it...

I think returning an extreme ought to be explicit. Perl 6 breaks
the C comma operator, and never returns the last element of a list
implicitly. You have to use [-1] for that.

Two votes (at least) for explicitness. Quicksort and merge sort
(and Abigail's favorite, heapsort) can save a lot of work if they
know that only the N largest or smallest elements are of interest.
So, tell them. -- jpl

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2002

From [Unknown Contact. See original ticket]

Michael G Schwern <schwern@​pobox.com> writes​:

Because the author of foo() likely doesn't even know this behavior exists.
There's no warning, it's not documented and there's nothing else in Perl
that behaves like sort().

Presumably we could warn if sort was child of a 'return' or last statement
of a sub ?

--
Nick Ing-Simmons
http​://www.ni-s.u-net.com/

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2002

From [Unknown Contact. See original ticket]

John P. Linderman <jpl@​research.att.com> writes​:

Larry said​:

On Wed, 12 Jun 2002, Nick Ing-Simmons wrote​:

There was considerable heat on "What It Meant" last time around.
Just returning the length seems like least interesting returning
one of the extreams would give it value but needs real code behind
it...

I think returning an extreme ought to be explicit. Perl 6 breaks
the C comma operator, and never returns the last element of a list
implicitly. You have to use [-1] for that.

Two votes (at least) for explicitness.

But Larry's vote carries near-infinite weight...

Quicksort and merge sort
(and Abigail's favorite, heapsort) can save a lot of work if they
know that only the N largest or smallest elements are of interest.
So, tell them. -- jpl

But spotting

  my ($n) = sort ...
or
  my ($p,q) = (sort ..)[-2,-1];

Is much harder to do than the scalar/list context.

But consider me convinced by the explicit argument - we
presumably should invent max() and min() if that is what we mean.

--
Nick Ing-Simmons
http​://www.ni-s.u-net.com/

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2002

From @schwern

On Wed, Jun 12, 2002 at 07​:57​:24PM +0100, Nick Ing-Simmons wrote​:

But consider me convinced by the explicit argument - we
presumably should invent max() and min() if that is what we mean.

use List​::Util qw(max min maxstr minstr);

Though it is unfortunate that max/min aren't generic. ie.

  my $max = max { your sort criteria } @​array;

but I think my nit-picking account is overdrawn. :)

--
This sig file temporarily out of order.

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2002

From [Unknown Contact. See original ticket]

Nick Ing-Simmons wrote​:

John P. Linderman <jpl@​research.att.com> writes​:

Larry said​:

On Wed, 12 Jun 2002, Nick Ing-Simmons wrote​:

There was considerable heat on "What It Meant" last time around.
Just returning the length seems like least interesting returning
one of the extreams would give it value but needs real code behind
it...

I think returning an extreme ought to be explicit. Perl 6 breaks
the C comma operator, and never returns the last element of a list
implicitly. You have to use [-1] for that.

Two votes (at least) for explicitness.

But Larry's vote carries near-infinite weight...

Quicksort and merge sort
(and Abigail's favorite, heapsort) can save a lot of work if they
know that only the N largest or smallest elements are of interest.
So, tell them. -- jpl

But spotting

my ($n) = sort ...
or
my ($p,q) = (sort ..)[-2,-1];

Is much harder to do than the scalar/list context.

Isn't Want​::want() able to do this? Or at least some of it?

But consider me convinced by the explicit argument - we
presumably should invent max() and min() if that is what we mean.

Surely you mean, invent max_n and min_n, which would return the n
largest or n smallest elements.

my ($p, $q) = max_n { $a <=> $b } 2, @​list;

--
tr/`4/ /d, print "@​{[map --$| ? ucfirst lc : lc, split]},\n" for
pack 'u', pack 'H*', 'ab5cf4021bafd28972030972b00a218eb9720000';

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2002

From [Unknown Contact. See original ticket]

Benjamin Goldberg chimed in​:

Nick Ing-Simmons wrote​:

John P. Linderman <jpl@​research.att.com> writes​:

Larry said​:

On Wed, 12 Jun 2002, Nick Ing-Simmons wrote​:

There was considerable heat on "What It Meant" last time around.
Just returning the length seems like least interesting returning
one of the extreams would give it value but needs real code behind
it...

I think returning an extreme ought to be explicit. Perl 6 breaks
the C comma operator, and never returns the last element of a list
implicitly. You have to use [-1] for that.

Two votes (at least) for explicitness.

But Larry's vote carries near-infinite weight...

Quicksort and merge sort
(and Abigail's favorite, heapsort) can save a lot of work if they
know that only the N largest or smallest elements are of interest.
So, tell them. -- jpl

But spotting

my ($n) = sort ...
or
my ($p,q) = (sort ..)[-2,-1];

Is much harder to do than the scalar/list context.

Isn't Want​::want() able to do this? Or at least some of it?

But consider me convinced by the explicit argument - we
presumably should invent max() and min() if that is what we mean.

Surely you mean, invent max_n and min_n, which would return the n
largest or n smallest elements.

my ($p, $q) = max_n { $a <=> $b } 2, @​list;

That's what I had in mind. But we want to be careful about the interface.
One the one hand, there are worthwhile savings to be had if we don't
guarantee the order in which the elements are returned. For example,
in a qsort-like implementation, having completed the initial partitioning
with l low elements, e equal elements, and g greater elements, if we are
after the m minimal elements, if l <= m, there's no need to continue
sorting the low elements. One the other hand, if the user *does*
want the elements sorted, it would be wasteful not to continue
with the sorting on the spot, and a Very Bad Thing to force the user
to restate the comparison algorithm to impose the order.

(A qsort-like implementation runs the risk of the same quadratic
behavior seen in sorting, and two minutes' thought hasn't turned
up a way to get any advantage from *not* ordering the elements
using mergesort, so maybe the order ought to be guaranteed.
But, like sort(), stability may or may not matter to the end user,
so I favor letting the users explicitly state what matters,
rather than ham-stringing implementations. Pragma?) -- jpl

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2002

From [Unknown Contact. See original ticket]

Benjamin Goldberg <goldbb2@​earthlink.net> writes​:

But consider me convinced by the explicit argument - we
presumably should invent max() and min() if that is what we mean.

Surely you mean, invent max_n and min_n, which would return the n
largest or n smallest elements.

my ($p, $q) = max_n { $a <=> $b } 2, @​list;

I don't want that - it is too messy.

Let perl6 solve the general problem.

--
Nick Ing-Simmons
http​://www.ni-s.u-net.com/

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2002

From @gbarr

On Wed, Jun 12, 2002 at 03​:25​:01PM -0400, Michael G Schwern wrote​:

On Wed, Jun 12, 2002 at 07​:57​:24PM +0100, Nick Ing-Simmons wrote​:

But consider me convinced by the explicit argument - we
presumably should invent max() and min() if that is what we mean.

use List​::Util qw(max min maxstr minstr);

Though it is unfortunate that max/min aren't generic. ie.

my $max = max \{ your sort criteria \} @&#8203;array;

but I think my nit-picking account is overdrawn. :)

I agree its unfortunate, but the reason is because prototypes
don't allow me to make the code block optional and I did not
want the code block for the case of min/max numeric

Graham.

@p5pRT
Copy link
Author

p5pRT commented Aug 5, 2004

From @davidnicol

here's a
patch that adds a runtime sort-in-scalar-context warning, which, had it
existed in June of 2002, would have reported the situation.

Before​:
./perl -wle 'sub foo { return sort("dog", "cat", "fish") } $x = foo();\
print $x'
Use of uninitialized value $x in print at -e line 1.

After​:
./perl -wle 'sub foo { return sort("dog", "cat", "fish") } $x = foo();\
print $x'
sort called in dynamic non-array context, try ()=sort ... at -e line 1.
Use of uninitialized value $x in print at -e line 1.

Inline Patch
--- perl-current/pp_sort.c      Sat Jul 31 11:44:58 2004
+++ dynamic-sort-in-scalar-warning/pp_sort.c    Mon Aug  2 23:01:55 2004
@@ -1508,8 +1508,13 @@
     void (*sortsvp)(pTHX_ SV **array, size_t nmemb, SVCOMPARE_t cmp)
       = Perl_sortsv;
 
     if (gimme != G_ARRAY) {
+        Perl_warner(
+          aTHX_ packWARN3(WARN_VOID,WARN_AMBIGUOUS,WARN_SYNTAX),
+           "sort called in dynamic non-array context,"
+          " try ()=sort ..."
+       );
        SP = MARK;
        RETPUSHUNDEF;
     }
 

@p5pRT
Copy link
Author

p5pRT commented May 26, 2012

From @Hugmeir

On Wed Aug 04 21​:03​:53 2004, davidnicol wrote​:

here's a
patch that adds a runtime sort-in-scalar-context warning, which, had it
existed in June of 2002, would have reported the situation.

Before​:
./perl -wle 'sub foo { return sort("dog", "cat", "fish") } $x = foo();\
print $x'
Use of uninitialized value $x in print at -e line 1.

After​:
./perl -wle 'sub foo { return sort("dog", "cat", "fish") } $x = foo();\
print $x'
sort called in dynamic non-array context, try ()=sort ... at -e line 1.
Use of uninitialized value $x in print at -e line 1.

--- perl-current/pp_sort.c Sat Jul 31 11​:44​:58 2004
+++ dynamic-sort-in-scalar-warning/pp_sort.c Mon Aug 2 23​:01​:55 2004
@​@​ -1508,8 +1508,13 @​@​
void (*sortsvp)(pTHX_ SV **array, size_t nmemb, SVCOMPARE_t cmp)
= Perl_sortsv;

 if \(gimme \!= G\_ARRAY\) \{

+ Perl_warner(
+ aTHX_ packWARN3(WARN_VOID,WARN_AMBIGUOUS,WARN_SYNTAX),
+ "sort called in dynamic non-array context,"
+ " try ()=sort ..."
+ );
SP = MARK;
RETPUSHUNDEF;
}

Some bikeshedding here​: If the warning the applied, I think that it
should be consistent and merely say "Useless use of sort in scalar
context". "dynamic non-array" context is a mouthful, and nothing in the
docs ever refers to something like that; It would only cause more confusion.

--hugmeir

@p5pRT
Copy link
Author

p5pRT commented May 26, 2012

From @cpansprout

On Fri May 25 17​:37​:27 2012, Hugmeir wrote​:

On Wed Aug 04 21​:03​:53 2004, davidnicol wrote​:

here's a
patch that adds a runtime sort-in-scalar-context warning, which, had it
existed in June of 2002, would have reported the situation.

Before​:
./perl -wle 'sub foo { return sort("dog", "cat", "fish") } $x = foo();\
print $x'
Use of uninitialized value $x in print at -e line 1.

After​:
./perl -wle 'sub foo { return sort("dog", "cat", "fish") } $x = foo();\
print $x'
sort called in dynamic non-array context, try ()=sort ... at -e line 1.
Use of uninitialized value $x in print at -e line 1.

--- perl-current/pp_sort.c Sat Jul 31 11​:44​:58 2004
+++ dynamic-sort-in-scalar-warning/pp_sort.c Mon Aug 2 23​:01​:55 2004
@​@​ -1508,8 +1508,13 @​@​
void (*sortsvp)(pTHX_ SV **array, size_t nmemb, SVCOMPARE_t cmp)
= Perl_sortsv;

 if \(gimme \!= G\_ARRAY\) \{

+ Perl_warner(
+ aTHX_ packWARN3(WARN_VOID,WARN_AMBIGUOUS,WARN_SYNTAX),
+ "sort called in dynamic non-array context,"
+ " try ()=sort ..."
+ );
SP = MARK;
RETPUSHUNDEF;
}

Some bikeshedding here​: If the warning the applied, I think that it
should be consistent and merely say "Useless use of sort in scalar
context". "dynamic non-array" context is a mouthful, and nothing in the
docs ever refers to something like that; It would only cause more
confusion.

I’m not sure this warning is such a good idea.

If a module provides a sub to be called only in list context, that ends
with sort(), then the warning will point the finger at the module, not
at the user thereof. If the warning occurs from the point of view of
the caller, that might be too confusing.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Jul 5, 2016

@dcollinsn - Status changed from 'open' to 'stalled'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants