Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort in scalar context is undefined #12803

Open
p5pRT opened this issue Feb 20, 2013 · 31 comments
Open

sort in scalar context is undefined #12803

p5pRT opened this issue Feb 20, 2013 · 31 comments

Comments

@p5pRT
Copy link

p5pRT commented Feb 20, 2013

Migrated from rt.perl.org#116877 (status was 'open')

Searchable as RT116877$

@p5pRT
Copy link
Author

p5pRT commented Feb 20, 2013

From @demerphq

sort is documented as having undefined behavior in scalar context.

This is really annoying as code like​:

  return keys %hash;

cannot be safely changed to

  return sort keys %hash;

as if it is used in scalar context it wont return the number of items
in the list. One must instead do

  my @​temporary= sort keys %hash;
  return @​temporary;

IMO this is silly. Why shouldnt sort() become a no-op in this case and
return the number of items in the list? Undefined behavior in this
case seems like the worst of all the possible choices. Is there a
reason for this?

This bug affects all modern perls as far as I know, including perl
5.17.9. I am including a -V to appease the perlbug gods, but it is not
version specific.

Cheers,
Yves

Summary of my perl5 (revision 5 version 17 subversion 9) configuration​:
  Derived from​: e2183dd1583911a709201025e7d0020fde8b1862
  Platform​:
  osname=linux, osvers=3.0.0-12-generic, archname=x86_64-linux
  uname='linux yam 3.0.0-12-generic #20-ubuntu smp fri oct 7
14​:56​:25 utc 2011 x86_64 x86_64 x86_64 gnulinux '
  config_args='-Doptimize=-g -d -Dusedevel -Dcc=ccache gcc -Dld=gcc
-DDEBUGGING'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=undef, usemultiplicity=undef
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='ccache gcc', ccflags ='-DDEBUGGING -fno-strict-aliasing -pipe
-fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64',
  optimize='-g',
  cppflags='-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include'
  ccversion='', gccversion='4.6.1', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='gcc', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib
/usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
  libs=-lnsl -ldl -lm -lcrypt -lutil -lc
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
  libc=, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.13'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -g -L/usr/local/lib
-fstack-protector'

Characteristics of this binary (from libperl)​:
  Compile-time options​: DEBUGGING HAS_TIMES PERLIO_LAYERS
  PERL_DONT_CREATE_GVSV PERL_MALLOC_WRAP
  PERL_NEW_COPY_ON_WRITE PERL_PRESERVE_IVUV
  PERL_USE_DEVEL USE_64_BIT_ALL USE_64_BIT_INT
  USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE
  USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_PERLIO
  USE_PERL_ATOF
  Locally applied patches​:
  uncommitted-changes
  Built under linux
  Compiled at Feb 20 2013 06​:10​:44
  @​INC​:
  lib
  /usr/local/lib/perl5/site_perl/5.17.9/x86_64-linux
  /usr/local/lib/perl5/site_perl/5.17.9
  /usr/local/lib/perl5/5.17.9/x86_64-linux
  /usr/local/lib/perl5/5.17.9
  .

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Feb 20, 2013

From @tsee

On 02/20/2013 07​:06 AM, yves orton (via RT) wrote​:

# Please include the string​: [perl #116877]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=116877 >

sort is documented as having undefined behavior in scalar context.

This is really annoying as code like​:

return keys %hash;

cannot be safely changed to

return sort keys %hash;

as if it is used in scalar context it wont return the number of items
in the list. One must instead do

my @&#8203;temporary= sort keys %hash;
return @&#8203;temporary;

IMO this is silly. Why shouldnt sort() become a no-op in this case and
return the number of items in the list? Undefined behavior in this
case seems like the worst of all the possible choices. Is there a
reason for this?

This bug affects all modern perls as far as I know, including perl
5.17.9. I am including a -V to appease the perlbug gods, but it is not
version specific.

But this is just a documentation bug, right?

Using some random blead from after 5.17.7 and before 5.17.8​:

perl -e 'use warnings; use strict; sub foo {my %h = 1..10; return sort
keys %h} foo()'

This does not warn. But this does​:

perl -e 'use warnings; use strict; sort qw(1..10); my $x = sort qw(1..10)'
Useless use of sort in scalar context at -e line 1.
Useless use of sort in void context at -e line 1.

Is the bug report here specifically about documenting (and implementing)
that sort becomes a noop in void context? Since the example you gave
doesn't warn, can you come up with any other valid cases where that
behaviour makes sense?

If so, I guess you could modify the docs and then do an op_null() in the
respective OP_SORT case in Perl_scalarvoid in op.c? Dito for
Perl_scalar, same file.

--Steffen

PS​: Yes, sort{} blocks could have side effects. That's already undefined
behaviour AFAIK.

@p5pRT
Copy link
Author

p5pRT commented Feb 20, 2013

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Feb 20, 2013

From @demerphq

On 20 February 2013 07​:49, Steffen Mueller <smueller@​cpan.org> wrote​:

On 02/20/2013 07​:06 AM, yves orton (via RT) wrote​:

# Please include the string​: [perl #116877]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=116877 >

sort is documented as having undefined behavior in scalar context.

This is really annoying as code like​:

return keys %hash;

cannot be safely changed to

return sort keys %hash;

as if it is used in scalar context it wont return the number of items
in the list. One must instead do

my @&#8203;temporary= sort keys %hash;
return @&#8203;temporary;

IMO this is silly. Why shouldnt sort() become a no-op in this case and
return the number of items in the list? Undefined behavior in this
case seems like the worst of all the possible choices. Is there a
reason for this?

This bug affects all modern perls as far as I know, including perl
5.17.9. I am including a -V to appease the perlbug gods, but it is not
version specific.

But this is just a documentation bug, right?

No. I think that sort having undefined behavior in *scalar* context is a bug.

It should return the number of items in the list just like keys, or
values, or map, or grep, or well, pretty much every list operator does
in scalar context.

Why should sort be different?

perl -e 'use warnings; use strict; sub foo {my %h = 1..10; return sort keys
%h} foo()'

My point is the following is stupid​:

$ ./perl -Ilib -le'use warnings; use strict; sub foo {my %h = 1..10;
return sort keys %h} print foo(); print scalar foo();'
13579
Use of uninitialized value in print at -e line 1.

Why should it return the most useless possible result of undef? Why
cant it return 5 like any other list op?

This does not warn. But this does​:

perl -e 'use warnings; use strict; sort qw(1..10); my $x = sort qw(1..10)'
Useless use of sort in scalar context at -e line 1.
Useless use of sort in void context at -e line 1.

Is the bug report here specifically about documenting (and implementing)
that sort becomes a noop in void context? Since the example you gave doesn't
warn, can you come up with any other valid cases where that behaviour makes
sense?

Please drink some coffee and reread my OP :-).

I said *scalar* context. The behavior of sort in void context is an
entirely separate discussion, which I have minimal opinion on.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Feb 20, 2013

From @Smylers

yves orton writes​:

This is really annoying as code like​:

return keys %hash;

cannot be safely changed to

return sort keys %hash;

as if it is used in scalar context it wont return the number of items
in the list.

Why shouldnt sort() become a no-op in this case and return the number
of items in the list?

That's a reasonable argument for sort in scalar context being a no-op.

However, returning the number of items in the list only serves as a
no-op in this case because that's what keys does in scalar context; it
wouldn't be a no-op for arguments that themselves do something else in
scalar context. As a silly example, it wouldn't be a no-op in scalar
context turning​:

  return localtime;

into​:

  return sort localtime;

So maybe a better no-op behaviour would be for sort in scalar context to
propagate that scalar context to its input?

In your example that would put keys in scalar context, which would do
the counting, and sort would simply pass through that count.

Cheers

Smylers
--
http​://twitter.com/Smylers2

@p5pRT
Copy link
Author

p5pRT commented Feb 20, 2013

From @demerphq

On 20 February 2013 12​:16, Smylers <Smylers@​stripey.com> wrote​:

yves orton writes​:

This is really annoying as code like​:

return keys %hash;

cannot be safely changed to

return sort keys %hash;

as if it is used in scalar context it wont return the number of items
in the list.

Why shouldnt sort() become a no-op in this case and return the number
of items in the list?

That's a reasonable argument for sort in scalar context being a no-op.

However, returning the number of items in the list only serves as a
no-op in this case because that's what keys does in scalar context; it
wouldn't be a no-op for arguments that themselves do something else in
scalar context. As a silly example, it wouldn't be a no-op in scalar
context turning​:

return localtime;

into​:

return sort localtime;

So maybe a better no-op behaviour would be for sort in scalar context to
propagate that scalar context to its input?

In your example that would put keys in scalar context, which would do
the counting, and sort would simply pass through that count.

I like that idea. If its doable then I think its worth doing.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Feb 20, 2013

From @tamias

On Wed, Feb 20, 2013 at 11​:16​:19AM +0000, Smylers wrote​:

So maybe a better no-op behaviour would be for sort in scalar context to
propagate that scalar context to its input?

In your example that would put keys in scalar context, which would do
the counting, and sort would simply pass through that count.

Note that under that implementation, C<< scalar sort(4, 3, 2, 1) >> would
return 1.

Ronald

@p5pRT
Copy link
Author

p5pRT commented Feb 20, 2013

From Eirik-Berg.Hanssen@allverden.no

On Wed, Feb 20, 2013 at 10​:17 PM, Ronald J Kimball <rjk@​tamias.net> wrote​:

On Wed, Feb 20, 2013 at 11​:16​:19AM +0000, Smylers wrote​:

So maybe a better no-op behaviour would be for sort in scalar context to
propagate that scalar context to its input?

In your example that would put keys in scalar context, which would do
the counting, and sort would simply pass through that count.

Note that under that implementation, C<< scalar sort(4, 3, 2, 1) >> would
return 1.

  ... which is exactly what it would return without the sort, and no more
useless than the undef it is currently returning, so I'm not too worried
about that.

  More generally​:

  For the case when a C<< return EXPR >> is amended to a C<< return sort
EXPR >>, this proposed change would prevent unintended change of the return
value in scalar context.

  That sounds like a good thing.

  The current undef seems remarkably useless – even as a usage error
indicator, it seems flimsy and half-hearted ("you're not supposed to call
me in scalar context, but instead of complaining, I'll just return an
undef") – so I don't see that we lose much either.

  But I guess it's better to keep the current warning (if not necessarily
semantics) when the sort can be determined to be in scalar context at
compile time. SWYM, right? :)

Eirik

@p5pRT
Copy link
Author

p5pRT commented Feb 20, 2013

From @xdg

I would prefer it to act exactly like map would, since it's a list
transformation. Possibly, in scalar context, it should probably just
get optimized to an empty map op.

Having it force scalar context on the input is a bad idea​:

  return sort $obj->method();

What if method uses wantarray to determine if a list or single value
is wanted? You still want it to provide the list, sort should pass it
through like an empty map, and the return value will be the count of
items returned from the method.

While I'm fine with it not actually bothering to sort in scalar
context, that should be clearly documented in case someone crazily
wants to put side effects into their sort block and then is surprised
if they don't run.

David

@p5pRT
Copy link
Author

p5pRT commented Feb 21, 2013

From @demerphq

On 21 February 2013 00​:00, David Golden <xdg@​xdg.me> wrote​:

While I'm fine with it not actually bothering to sort in scalar
context, that should be clearly documented in case someone crazily
wants to put side effects into their sort block and then is surprised
if they don't run.

I think we could warn if sort is called in scalar context in block form.

Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Feb 21, 2013

From @xdg

On Thu, Feb 21, 2013 at 2​:47 AM, demerphq <demerphq@​gmail.com> wrote​:

On 21 February 2013 00​:00, David Golden <xdg@​xdg.me> wrote​:

While I'm fine with it not actually bothering to sort in scalar
context, that should be clearly documented in case someone crazily
wants to put side effects into their sort block and then is surprised
if they don't run.

I think we could warn if sort is called in scalar context in block form.

Why? If I'm offering a list of values in response to a function call,
why do I care what my caller uses it for? And why should their change
in context throw warnings in my code?

E.g. I have a module that does "return sort { ... } stuff()" -- having
that line warn at runtime when someone calls my sub in scalar context
seems perverse, since 99% of the time, the block will have no side
effects and the warning would be a false positive.

If we're really concerned about it, we shouldn't optimize away the
sort and let people do that themselves with wantarray.

But I don't think we should be that concerned about it. A sort block
with side effects seems bizarre, even if theoretically possible.

David

--
David Golden <xdg@​xdg.me>
Take back your inbox! → http​://www.bunchmail.com/
Twitter/IRC​: @​xdg

@p5pRT
Copy link
Author

p5pRT commented Feb 21, 2013

From @b2gills

On Thu, Feb 21, 2013 at 10​:47 AM, David Golden <xdg@​xdg.me> wrote​:

On Thu, Feb 21, 2013 at 2​:47 AM, demerphq <demerphq@​gmail.com> wrote​:

On 21 February 2013 00​:00, David Golden <xdg@​xdg.me> wrote​:

While I'm fine with it not actually bothering to sort in scalar
context, that should be clearly documented in case someone crazily
wants to put side effects into their sort block and then is surprised
if they don't run.

I think we could warn if sort is called in scalar context in block form.

Why? If I'm offering a list of values in response to a function call,
why do I care what my caller uses it for? And why should their change
in context throw warnings in my code?

E.g. I have a module that does "return sort { ... } stuff()" -- having
that line warn at runtime when someone calls my sub in scalar context
seems perverse, since 99% of the time, the block will have no side
effects and the warning would be a false positive.

If we're really concerned about it, we shouldn't optimize away the
sort and let people do that themselves with wantarray.

But I don't think we should be that concerned about it. A sort block
with side effects seems bizarre, even if theoretically possible.

David

I totally agree with David.

I don't see why these aren't (largely) equivalent

  $x = sort @​y;
  $x =()= sort @​y;

I think the first one should be optimized to just​:

  $x = @​y;

Then by extension these should work the same​:

  sub x{
  return sort @​_;
  }
  sub x{
  return scalar @​_ unless wantarray;
  return sort @​_;
  }

If someone needs the side effects of calling sort they should
have to ask for it. Since that is not what you normally want.

  sub x{
  return my @​sideeffect = sort @​_;
  }

  # Or

  sub x{ return sort @​_ }

  # my $x = x();
  my $x =()= x();

  # say scalar( x() );
  say my $workaround =()= x();

(Oddly enough these are also work-arounds for the current situation)

@p5pRT
Copy link
Author

p5pRT commented Feb 21, 2013

From @bulk88

Brad Gilbert wrote​:

I don't see why these aren't (largely) equivalent

$x = sort @&#8203;y;
$x =\(\)= sort @&#8203;y;

I think the first one should be optimized to just​:

$x = @&#8203;y;

Then by extension these should work the same​:

sub x\{
  return sort @&#8203;\_;
\}
sub x\{
  return scalar @&#8203;\_ unless wantarray;
  return sort @&#8203;\_;
\}

If someone needs the side effects of calling sort they should
have to ask for it. Since that is not what you normally want.

sub x\{
  return my @&#8203;sideeffect = sort @&#8203;\_;
\}

\# Or

sub x\{ return sort @&#8203;\_ \}

\# my $x = x\(\);
my $x =\(\)= x\(\);

\# say scalar\( x\(\) \);
say my $workaround =\(\)= x\(\);

(Oddly enough these are also work-arounds for the current situation)

I think this optimization is useful. Especially for less than perfectly
written code. It would speed up "old" Perl code. I wonder if someone
hooked pp_sort to crash to C debugger and ran harness, would it catch
any sort in scalar context usage.

@p5pRT
Copy link
Author

p5pRT commented Feb 21, 2013

From @b2gills

On Thu, Feb 21, 2013 at 12​:51 PM, bulk88 <bulk88@​hotmail.com> wrote​:

Brad Gilbert wrote​:

I don't see why these aren't (largely) equivalent

$x = sort @&#8203;y;
$x =\(\)= sort @&#8203;y;

I think the first one should be optimized to just​:

$x = @&#8203;y;

Then by extension these should work the same​:

sub x\{
  return sort @&#8203;\_;
\}
sub x\{
  return scalar @&#8203;\_ unless wantarray;
  return sort @&#8203;\_;
\}

If someone needs the side effects of calling sort they should
have to ask for it. Since that is not what you normally want.

sub x\{
  return my @&#8203;sideeffect = sort @&#8203;\_;
\}

\# Or

sub x\{ return sort @&#8203;\_ \}

\# my $x = x\(\);
my $x =\(\)= x\(\);

\# say scalar\( x\(\) \);
say my $workaround =\(\)= x\(\);

(Oddly enough these are also work-arounds for the current situation)

I think this optimization is useful. Especially for less than perfectly
written code. It would speed up "old" Perl code. I wonder if someone hooked
pp_sort to crash to C debugger and ran harness, would it catch any sort in
scalar context usage.

Actually the way it currently works is more like

  sub sort(...){
  return undef unless wantarray;
  ...
  }

Which should be slightly faster than what I am vying for​:

  sub sort(...){
  return scalar @​_ unless wantarray;
  ...
  }

So it is not actually speeding up old code.
It is making old code (arguably) more correct.

Any code that is successfully working around the current behaviour
will continue to work just as fast, or slow as it currently does.

@p5pRT
Copy link
Author

p5pRT commented Feb 21, 2013

From @demerphq

On 20 February 2013 07​:06, yves orton <perlbug-followup@​perl.org> wrote​:

sort is documented as having undefined behavior in scalar context.

This is really annoying

So far the options proposed seem to be​:

  1) return the number of items
  2) become "invisible" and return whatever the input to sort would
return if it were called in scalar context.
  3) return the minimum

For me​:

Either of the 1) or 2) make sense to me. I lean a little towards 2) as
being the least likely to /add/ confusion to already an confusing set
of rules of understanding what "return LISTYTHING" does. Regardless
IMO it is important that​:

  return @​array;
  return sort @​array;

and

  return keys %hash;
  return keys sort %hash;

and

  return grep { /pat/ } (LIST);
  return sort grep { /pat/ } (LIST);

and

  return map { "x" . $_ } (LIST);
  return sort map { "x" . $_ } (LIST);

all return the same thing in list and scalar context. I am less concerned that

  return $x, $y, $z;
  return sort $x, $y, $z

behave the same in list and scalar context, so I am fine with going
with 1, but I am inclined to say that 2 is marginally preferable as it
doesn't add any new special cases.

The third option is the most surprising, and IMO the least useful as
we can already do it with list slice and be clearer as to what it
means, and IMO much worse it requires we execute the sort, at least
when there is a comparison block and in practical terms probably
always. I personally think that any effort in this direction would be
better spent in making Perl optimize slices on sort, so that the
optimisation covered (sort @​list)[0..2] as well as (sort @​list)[0].

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Feb 21, 2013

From @b2gills

On Thu, Feb 21, 2013 at 2​:20 PM, demerphq <demerphq@​gmail.com> wrote​:

On 20 February 2013 07​:06, yves orton <perlbug-followup@​perl.org> wrote​:

sort is documented as having undefined behavior in scalar context.

This is really annoying

So far the options proposed seem to be​:

1) return the number of items
2) become "invisible" and return whatever the input to sort would
return if it were called in scalar context.
3) return the minimum

For me​:

Either of the 1) or 2) make sense to me. I lean a little towards 2) as
being the least likely to /add/ confusion to already an confusing set
of rules of understanding what "return LISTYTHING" does. Regardless
IMO it is important that​:

return @​array;
return sort @​array;

and

return keys %hash;
return keys sort %hash;

and

return grep { /pat/ } (LIST);
return sort grep { /pat/ } (LIST);

and

return map { "x" . $_ } (LIST);
return sort map { "x" . $_ } (LIST);

all return the same thing in list and scalar context. I am less concerned that

return $x, $y, $z;
return sort $x, $y, $z

behave the same in list and scalar context, so I am fine with going
with 1, but I am inclined to say that 2 is marginally preferable as it
doesn't add any new special cases.

The third option is the most surprising, and IMO the least useful as
we can already do it with list slice and be clearer as to what it
means, and IMO much worse it requires we execute the sort, at least
when there is a comparison block and in practical terms probably
always. I personally think that any effort in this direction would be
better spent in making Perl optimize slices on sort, so that the
optimisation covered (sort @​list)[0..2] as well as (sort @​list)[0].

Yves

My recommendation is 1)


I think it would be better if it were 1) since it
is more self consistent.

  $x = sort qw" d a b c ";
  $y = sub{sort @​_}->(qw" d a b c ");
  $z = sub{ @​a = sort @​_; @​a }->(qw" d a b c " );

1)

  $x == 4
  $y == 4
  $z == 4

2)

  $x eq 'c' # <= WHAT??!!
  $y == 4
  $z == 4

3)

  $x eq 'a'
  $y eq 'a'
  $z == 4 # can be worked around

With 2) it would be impossible to implement a
subroutine that works anything like `sort()`
without some level of C or XS.

Since the return value depends on whether it is given a list,
or an array. ( Normal subroutines only ever get an array )

I can see where 2) would be easier to optimize
in some cases (which IMHO were badly written).
It isn't anywhere as consistent as 1) or even 3).

I do see some appeal to implementing 3)
so that `$x=sort...` does something useful.
(You could have written it as `$x=(sort...)[0]`
if that is what you meant)
It is inconsistent with other keywords though

  $ perl -E'say scalar keys %​::'
  45

So then I can only recommend 1)


I can also see this generating a warning.

  $x = sort qw' d a b c ';

Since you could write it more specifically.

1)
  $x =()= qw' d a b c ';
2)
  $x = qw' d a b c ';
3)
  $x = (sort qw' d a b c ')[0];


These should work without a warning​:

  $x = sub{ sort qw' d a b c ' }->();
  $x = sub{ sort @​_ }->(qw' d a b c ');

I also think they should return the same thing 1) or 3)


My recommendation is 1) return the number of items
It's the most consistent, with it-self and with the rest
of the language.

Far second choice is 3) return minimum.
( I am recommending against it though as it's
not consistent with the rest of the language )

I am completely against 2), as it's not even self-consistent.
( It changes when it is returned from a sub )

@p5pRT
Copy link
Author

p5pRT commented Feb 22, 2013

From @demerphq

On 22 February 2013 01​:07, Gisle Aas <gisle@​activestate.com> wrote​:

On Thu, Feb 21, 2013 at 9​:20 PM, demerphq <demerphq@​gmail.com> wrote​:

On 20 February 2013 07​:06, yves orton <perlbug-followup@​perl.org> wrote​:

sort is documented as having undefined behavior in scalar context.

This is really annoying

So far the options proposed seem to be​:

1) return the number of items
2) become "invisible" and return whatever the input to sort would
return if it were called in scalar context.
3) return the minimum

I checked how the listy functions in perl works in scalar context and found
this​:

splice returns the last value
delete returns the last value
qw returns the last value
reverse concatenates the arguments as a single string and reverses that
times returns the first value
unpack returns the first value
localtime returns a string
gmtime returns a string
stat returns a boolean (success)
m returns a boolean
sort returns undef
split length of list
grep length of list
map length of list
keys length of list
values length of list
eval propagate context

Option 1) above will make sort behave like split, grep, and keys. Option 2)
makes it behave like eval. Option 3) makes it behave like times, and
unpack.

For me​:

Either of the 1) or 2) make sense to me. I lean a little towards 2) as
being the least likely to /add/ confusion to already an confusing set
of rules of understanding what "return LISTYTHING" does.

You basically need to read the docs if you care about the scalar value of
LISTYTHING. My logic is to expect Perl to try do the most useful thing, not
try to be consistent. I still find 3) the most useful option for sort.

The third option is the most surprising, and IMO the least useful as
we can already do it with list slice and be clearer as to what it
means, and IMO much worse it requires we execute the sort, at least
when there is a comparison block and in practical terms probably
always.

sort can detect the scalar context and find the smallest element in linear
time. No need to do a full sort just to find the first element. In void
context it can skip the sort completely.

But then we have the optimization for exactly one case, instead of a
solution based around optimizing slices which could optimize many.

Also consider the complexity involved in making this work​:

my $min= sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] } @​array;

We cant simply scan the list to find the minimum, we will have to
apply the sort function and then extract item 0, which is exactly
what​:

my $min= (sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] } @​array)[0];

would do. So IMO this is not the right choice. It is not huffman
encoded properly, and is dangerous as it means you cant stick a sort
into a return statement without changing the semantics of the function
in scalar context.

To me, making sort in scalar context do something that can already be
done, and making sort dangerous at the same time, is a bad call no
matter how convenient it might be for a small set of use cases.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Feb 22, 2013

From @demerphq

On 21 February 2013 21​:20, demerphq <demerphq@​gmail.com> wrote​:

On 20 February 2013 07​:06, yves orton <perlbug-followup@​perl.org> wrote​:

sort is documented as having undefined behavior in scalar context.

This is really annoying

So far the options proposed seem to be​:

1) return the number of items

I have pushed a patch to implement this on the yves/scalar_sort topic
branch. I have requested it be smoked as smoke-me/yves-scalar_sort.

The commit currently is​:

commit 59cf203
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Fri Feb 22 14​:03​:22 2013 +0100

  make sort in scalar context return the number of elements

  This allows one to add a "sort" to a routine that returned an array,
  or list of keys, and preserve the original meaning in scalar context,
  but cause the returned values to be in sorted order.

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Feb 22, 2013

From @xdg

On Fri, Feb 22, 2013 at 8​:13 AM, demerphq <demerphq@​gmail.com> wrote​:

On 21 February 2013 21​:20, demerphq <demerphq@​gmail.com> wrote​:

On 20 February 2013 07​:06, yves orton <perlbug-followup@​perl.org> wrote​:

sort is documented as having undefined behavior in scalar context.

This is really annoying

So far the options proposed seem to be​:

1) return the number of items

I have pushed a patch to implement this on the yves/scalar_sort topic
branch. I have requested it be smoked as smoke-me/yves-scalar_sort.

+1

I think sort is most similar to map and grep in that they are all list
transformers. Therefore, they should all work the same in scalar
context.

--
David Golden <xdg@​xdg.me>
Take back your inbox! → http​://www.bunchmail.com/
Twitter/IRC​: @​xdg

@p5pRT
Copy link
Author

p5pRT commented Feb 22, 2013

From @tamias

On Fri, Feb 22, 2013 at 08​:36​:41AM +0100, demerphq wrote​:

Also consider the complexity involved in making this work​:

my $min= sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] } @​array;

We cant simply scan the list to find the minimum, we will have to
apply the sort function and then extract item 0, which is exactly
what​:

my $min= (sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] } @​array)[0];

would do.

Of course we can simply scan the list to find the minimum.

my $min = $array[0];

for my $item (@​array[1 .. $#array]) {
  if (($item->[0] cmp $min->[0] || $min->[1] <=> $item->[1]) > 0) {
  $min = $item;
  }
}

Ronald

@p5pRT
Copy link
Author

p5pRT commented Feb 22, 2013

From @demerphq

On 22 February 2013 17​:29, Ronald J Kimball <rjk@​tamias.net> wrote​:

On Fri, Feb 22, 2013 at 08​:36​:41AM +0100, demerphq wrote​:

Also consider the complexity involved in making this work​:

my $min= sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] } @​array;

We cant simply scan the list to find the minimum, we will have to
apply the sort function and then extract item 0, which is exactly
what​:

my $min= (sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] } @​array)[0];

would do.

Of course we can simply scan the list to find the minimum.

my $min = $array[0];

for my $item (@​array[1 .. $#array]) {
if (($item->[0] cmp $min->[0] || $min->[1] <=> $item->[1]) > 0) {
$min = $item;
}
}

Yeah, true, still, its not "simply scanning for the minimum" anymore is it?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Feb 22, 2013

From j.imrie1@virginmedia.com

On 22/02/2013 16​:43, demerphq wrote​:

On 22 February 2013 17​:29, Ronald J Kimball <rjk@​tamias.net> wrote​:

On Fri, Feb 22, 2013 at 08​:36​:41AM +0100, demerphq wrote​:

Also consider the complexity involved in making this work​:

my $min= sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] } @​array;

We cant simply scan the list to find the minimum, we will have to
apply the sort function and then extract item 0, which is exactly
what​:

my $min= (sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] } @​array)[0];

would do.
Of course we can simply scan the list to find the minimum.

my $min = $array[0];

for my $item (@​array[1 .. $#array]) {
if (($item->[0] cmp $min->[0] || $min->[1] <=> $item->[1]) > 0) {
$min = $item;
}
}
Yeah, true, still, its not "simply scanning for the minimum" anymore is it?

Yves

use List​::Util qw(min);
$min = min @​list;

@p5pRT
Copy link
Author

p5pRT commented Feb 22, 2013

From @rjbs

* demerphq <demerphq@​gmail.com> [2013-02-22T08​:13​:36]

On 21 February 2013 21​:20, demerphq <demerphq@​gmail.com> wrote​:

On 20 February 2013 07​:06, yves orton <perlbug-followup@​perl.org> wrote​:

sort is documented as having undefined behavior in scalar context.

This is really annoying

So far the options proposed seem to be​:

1) return the number of items

I have pushed a patch to implement this on the yves/scalar_sort topic
branch. I have requested it be smoked as smoke-me/yves-scalar_sort.

I am at least tentatively in favor of this, if not just plain ol' in favor.

...but I'd like it to wait for 5.19.0 to open. We've already got a number of
non-trivial things likely to happen between now and 5.18.0, and I'd like to
keep the "things changed in short period right before release" list as small as
possible.

Thanks for working on this!

--
rjbs

@p5pRT
Copy link
Author

p5pRT commented Feb 24, 2013

From @druud62

On 2013-02-22 23​:34, John Imrie wrote​:

On 22/02/2013 16​:43, demerphq wrote​:

On 22 February 2013 17​:29, Ronald J Kimball <rjk@​tamias.net> wrote​:

On Fri, Feb 22, 2013 at 08​:36​:41AM +0100, demerphq wrote​:

Also consider the complexity involved in making this work​:
my $min= sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] } @​array;

We cant simply scan the list to find the minimum, we will have to
apply the sort function and then extract item 0, which is exactly
what​:

my $min= (sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] }
@​array)[0];

would do.

Of course we can simply scan the list to find the minimum.

my $min = $array[0];

for my $item (@​array[1 .. $#array]) {
if (($item->[0] cmp $min->[0] || $min->[1] <=> $item->[1]) > 0) {
$min = $item;
}
}

Yeah, true, still, its not "simply scanning for the minimum" anymore
is it?

use List​::Util qw(min);
$min = min @​list;

John, I think that you missed the overriding of 'min'.

List​::Util​::min returns the lowest numerical value.

--
Ruud

$ perl -MList​::Util=min,reduce -wle '
  my @​data = ( [ a => 0 ], [ x => 42 ], [ x => 2 ] );
  my @​sorted = sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] } @​data;
  my $min = reduce { ( ( $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] ) <
0 ) ? $a : $b } @​data;
  print join " => ", @​$_ for $sorted[0], min(@​data), $min;
'
x => 2
a => 0
x => 2

@p5pRT
Copy link
Author

p5pRT commented Feb 24, 2013

From j.imrie1@virginmedia.com

On 24/02/2013 12​:11, Dr.Ruud wrote​:

On 2013-02-22 23​:34, John Imrie wrote​:

On 22/02/2013 16​:43, demerphq wrote​:

On 22 February 2013 17​:29, Ronald J Kimball <rjk@​tamias.net> wrote​:

On Fri, Feb 22, 2013 at 08​:36​:41AM +0100, demerphq wrote​:

Also consider the complexity involved in making this work​:
my $min= sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] } @​array;

We cant simply scan the list to find the minimum, we will have to
apply the sort function and then extract item 0, which is exactly
what​:

my $min= (sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] }
@​array)[0];

would do.

Of course we can simply scan the list to find the minimum.

my $min = $array[0];

for my $item (@​array[1 .. $#array]) {
if (($item->[0] cmp $min->[0] || $min->[1] <=> $item->[1]) > 0) {
$min = $item;
}
}

Yeah, true, still, its not "simply scanning for the minimum" anymore
is it?

use List​::Util qw(min);
$min = min @​list;

John, I think that you missed the overriding of 'min'.

List​::Util​::min returns the lowest numerical value.

Argh!!

Yes you are right. Sorry

We need a string version of min then.

John

@p5pRT
Copy link
Author

p5pRT commented Feb 24, 2013

From @Smylers

John Imrie writes​:

On 24/02/2013 12​:11, Dr.Ruud wrote​:

List​::Util​::min returns the lowest numerical value.

Yes you are right. We need a string version of min then.

List​::Util​::minstr

Smylers
--
http​://twitter.com/Smylers2

@p5pRT
Copy link
Author

p5pRT commented Feb 24, 2013

From @druud62

On 2013-02-24 14​:22, John Imrie wrote​:

On 24/02/2013 12​:11, Dr.Ruud wrote​:

On 2013-02-22 23​:34, John Imrie wrote​:

On 22/02/2013 16​:43, demerphq wrote​:

On 22 February 2013 17​:29, Ronald J Kimball <rjk@​tamias.net> wrote​:

On Fri, Feb 22, 2013 at 08​:36​:41AM +0100, demerphq wrote​:

Also consider the complexity involved in making this work​:
my $min= sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] } @​array;

We cant simply scan the list to find the minimum, we will have to
apply the sort function and then extract item 0, which is exactly
what​:

my $min= (sort { $b->[0] cmp $a->[0] || $a->[1] <=> $b->[1] }
@​array)[0];

would do.

Of course we can simply scan the list to find the minimum.

my $min = $array[0];

for my $item (@​array[1 .. $#array]) {
if (($item->[0] cmp $min->[0] || $min->[1] <=> $item->[1]) > 0) {
$min = $item;
}
}

Yeah, true, still, its not "simply scanning for the minimum" anymore
is it?

use List​::Util qw(min);
$min = min @​list;

John, I think that you missed the overriding of 'min'.

List​::Util​::min returns the lowest numerical value.

Argh!!

Yes you are right. Sorry

We need a string version of min then.

Its 'minstr' can not cut this cake either.

Check the reduce-example in my previous message.
(in the sig-area)

--
Ruud

@p5pRT
Copy link
Author

p5pRT commented Feb 25, 2013

From @jandubois

On Thu, Feb 21, 2013 at 11​:36 PM, demerphq <demerphq@​gmail.com> wrote​:

To me, making sort in scalar context do something that can already be
done, and making sort dangerous at the same time, is a bad call no
matter how convenient it might be for a small set of use cases.

I don't understand the "dangerous" part of your comment, unless you mean​:
"does not do what *I* want it to do".

The only dangerous part is assuming that a function normally returning a
list will in scalar context return the number of elements in that list.

Beyond that, both the "return the number" and "return the first" is already
trivially implementable, so to be this is really a case of "six of one,
half a dozen of the other".

One thing slightly more interesting about the first element case is that we
already optimize "reverse sort", so this would also give an optimized​:

  my $max = reverse sort @​list;

that could work in linear time.

Cheers,
-Jan

@p5pRT
Copy link
Author

p5pRT commented Feb 25, 2013

From @demerphq

On 25 February 2013 19​:57, Jan Dubois <jand@​activestate.com> wrote​:

On Thu, Feb 21, 2013 at 11​:36 PM, demerphq <demerphq@​gmail.com> wrote​:

To me, making sort in scalar context do something that can already be
done, and making sort dangerous at the same time, is a bad call no
matter how convenient it might be for a small set of use cases.

I don't understand the "dangerous" part of your comment, unless you mean​:
"does not do what *I* want it to do".

Well I suppose this is sufficiently a matter of taste that probably I
am not going to convince you, but I think its dangerous because it
will not do what I think most beginners would expect. I think most
beginners would expect that when starting with​:

sub list_of_keys {
  return keys %hash;
}

if (list_of_keys()) {
  ...
}

that adding "sort" to the return of list_of_keys()​:

sub list_of_keys {
  return sort keys %hash;
}

wouldn't break the code that does if (list_of_keys()) { ... }.

Returning the max/min means that every place that I want to do​:

  return sort thingee();

I have to do

  return wantarray ? sort thingee() : thingee();

which is just PITA and poorly huffman encoded.

The only dangerous part is assuming that a function normally returning a
list will in scalar context return the number of elements in that list.

That's like saying the only dangerous part of a lawnmower is the
blades. Perl is supposed to be linguistic, intuitive, and properly
huffman encoded. From that point alone I've needed to sort the return
of a sub gazillions of time.

Whereas I only rarely want to do something like find the minimum of an
arbitrary list of values without already having iterated over them.
Most time I need that kind of min I have to iterate over the list
anyway, as such I rarely have to reach for Scalar​::Util​::min.

Beyond that, both the "return the number" and "return the first" is already
trivially implementable, so to be this is really a case of "six of one, half
a dozen of the other".

Return the first meaning return the min/max? Or return the first item
in the list?

One thing slightly more interesting about the first element case is that we
already optimize "reverse sort", so this would also give an optimized​:

my $max = reverse sort @&#8203;list;

that could work in linear time.

Except that would break the documented interface of reverse, so no we
can't make that work.

I think a much much more interesting approach for this is to make list
slices and list assignment on sort "smart".

IOW I should be able to get an optized solution to something like this​:

my ($x,$y,$z)= sort { ... } @​big_array;

and have perl do something clever so that it doesnt sort the full list
just find the first N items.

Which btw would allow this​:

my ($x,$y,$z)= reverse sort { ... } @​big_array;

Which does what you want and doesn't require we change scalar behavior
of reverse, and has a built in interface for doing more interesting
things than "find the min". :-)

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Feb 25, 2013

From @jandubois

On Mon, Feb 25, 2013 at 11​:19 AM, demerphq <demerphq@​gmail.com> wrote​:

On 25 February 2013 19​:57, Jan Dubois <jand@​activestate.com> wrote​:

On Thu, Feb 21, 2013 at 11​:36 PM, demerphq <demerphq@​gmail.com> wrote​:

To me, making sort in scalar context do something that can already be
done, and making sort dangerous at the same time, is a bad call no
matter how convenient it might be for a small set of use cases.

I don't understand the "dangerous" part of your comment, unless you mean​:
"does not do what *I* want it to do".

Well I suppose this is sufficiently a matter of taste that probably I
am not going to convince you, but I think its dangerous because it
will not do what I think most beginners would expect. I think most
beginners would expect that when starting with​:

sub list_of_keys {
return keys %hash;
}

if (list_of_keys()) {
...
}

that adding "sort" to the return of list_of_keys()​:

sub list_of_keys {
return sort keys %hash;
}

wouldn't break the code that does if (list_of_keys()) { ... }.

No, I see what you mean. Except that it doesn't work that way right now.
So changing sort() from returning undef to returning anything else doesn't
make it more "dangerous", it just doesn't make it DWIM that is optimized
for return() statements.

Which is one of my concerns with this​: returning the number of elements in
the list is pretty useless for the normal cases like

  my $count = sort keys %hash;
  my $count = sort @​list;

The "sort" is essentially a no-op, except for the special case inside the
return() statement, and I generally hate wasting "syntactic space" for
something that doesn't provide any value, except in a narrow special case.

Returning the max/min means that every place that I want to do​:

return sort thingee();

I have to do

return wantarray ? sort thingee() : thingee();

which is just PITA and poorly huffman encoded.

I thought you could also use

  return my @​list = sort thingee();

but I now realize that this will not optimize the sort() when called in
scalar context.

The only dangerous part is assuming that a function normally returning a

list will in scalar context return the number of elements in that list.

That's like saying the only dangerous part of a lawnmower is the
blades. Perl is supposed to be linguistic, intuitive, and properly
huffman encoded. From that point alone I've needed to sort the return
of a sub gazillions of time.

Whereas I only rarely want to do something like find the minimum of an
arbitrary list of values without already having iterated over them.
Most time I need that kind of min I have to iterate over the list
anyway, as such I rarely have to reach for Scalar​::Util​::min.

For me it is a wash, I think I find both situations equally likely.

Beyond that, both the "return the number" and "return the first" is
already
trivially implementable, so to be this is really a case of "six of one,
half
a dozen of the other".

Return the first meaning return the min/max? Or return the first item
in the list?

Both

One thing slightly more interesting about the first element case is that
we
already optimize "reverse sort", so this would also give an optimized​:

my $max = reverse sort @&#8203;list;

that could work in linear time.

Except that would break the documented interface of reverse, so no we
can't make that work.

Darn. Now this makes the "return the first" option less appealing...

I think a much much more interesting approach for this is to make list
slices and list assignment on sort "smart".

IOW I should be able to get an optized solution to something like this​:

my ($x,$y,$z)= sort { ... } @​big_array;

and have perl do something clever so that it doesnt sort the full list
just find the first N items.

Yes. I suspect doing this just for N=1 covers the vast majority of use
cases, and is a lot easier to implement than any larger N, where you could
no longer have linear performance.

Which btw would allow this​:

my ($x,$y,$z)= reverse sort { ... } @​big_array;

Which does what you want and doesn't require we change scalar behavior
of reverse, and has a built in interface for doing more interesting
things than "find the min". :-)

I definitely wouldn't want to change the defined reverse() behavior; I was
just not thinking about it enough.

So I think I agree with your proposal. :)

Cheers,
-Jan

@p5pRT
Copy link
Author

p5pRT commented Feb 25, 2013

From @davidnicol

On Sun, Feb 24, 2013 at 7​:38 AM, Smylers <Smylers@​stripey.com> wrote​:

John Imrie writes​:

On 24/02/2013 12​:11, Dr.Ruud wrote​:

List​::Util​::min returns the lowest numerical value.

Then there's an an alternative to list utils, which will give you the n
first without
sorting the whole thing

http​://search.cpan.org/~davidnico/Tie-Quicksort-Lazy-0.04/lib/Tie/Quicksort/Lazy.pm

  for my $item (@​array[1 .. $#array]) {
  if (($item->[0] cmp $min->[0] || $min->[1] <=> $item->[1]) > 0) {
  $min = $item;
  }
  $min

would become

  use Tie​::Quicksort​::Lazy;
  tie my @​tmp, 'Tie​::Quicksort​::Lazy' , sub { $_[0] cmp $_[1] }, @​array;
  shift @​tmp

Anyway. Everytime there's a new cat we figure out a new way to skin it.

Having once, on something of a dare, submitted a patch to CORE​::sort that
made sort in scalar return a "sortedness ratio," I'd
prefer that sort in scalar context continue returning undef, but start
issuing a warning. If voting on the the three options presented
is what we're up to (and if my vote had any weight, which is dubious) I'm
in favor of option #1, to return the size of the unsorted input
array without doing anything else.

Peace.

--
"You are now leaving the emergency airlock. Thank you for observing all
safety precautions."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants