Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perl regression bug since 5.13.11 (masked by CoW since 5.19.1) #13921

Closed
p5pRT opened this issue Jun 13, 2014 · 8 comments
Closed

Perl regression bug since 5.13.11 (masked by CoW since 5.19.1) #13921

p5pRT opened this issue Jun 13, 2014 · 8 comments

Comments

@p5pRT
Copy link

p5pRT commented Jun 13, 2014

Migrated from rt.perl.org#122099 (status was 'resolved')

Searchable as RT122099$

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2014

From @deven

The following test case demonstrates a Perl regression bug, evidently
introduced in 5.13.11 and masked by Copy-on-Write since 5.19.1​:

This is perl 5, version 18, subversion 2 (v5.18.2) built for
x86_64-linux-thread-multi
(with 19 registered patches, see perl -V for more detail)

$ cat testcase
#!/usr/bin/perl

use List​::MoreUtils qw[minmax];

my @​numbers = (20140613);
my @​strings = ('20140613');
my $check = "20140613 20140613";

my @​numbers_minmax = minmax @​numbers;
my @​numbers_minmax_map = minmax map { $_; } @​numbers;
my @​strings_minmax = minmax @​strings;
my @​strings_minmax_map = minmax map { $_; } @​strings;

print "@​numbers_minmax" eq $check ? "ok 1\n" : "not ok 1\n";
print "@​numbers_minmax_map" eq $check ? "ok 2\n" : "not ok 2\n";
print "@​strings_minmax" eq $check ? "ok 3\n" : "not ok 3\n";
print "@​strings_minmax_map" eq $check ? "ok 4\n" : "not ok 4\n";
$ ./testcase
ok 1
ok 2
ok 3
not ok 4
$ LIST_MOREUTILS_PP=1 ./testcase
ok 1
ok 2
ok 3
ok 4

After I isolated this test case under Perl 5.18.2, George Greer (my
co-worker) went to work tracking down the commit that introduced the bug,
which appears to be this commit, first included in Perl 5.13.11​:

commit acdea6f
Author​: David Mitchell <davem@​iabyn.com>
Date​: Sat Mar 12 22​:01​:26 2011 +0000

  [perl #82111] de-pessimise some my @​array = ...

  Due to obscure closure and goto tricks, it's sometimes possible for the
  array or hash in the LHS of 'my @​a = ...' and 'my %h = ...' to be
  non-empty. At compile-time, these conditions are detected and the assign
  op is compiled with the OPpASSIGN_COMMON, making the assignment slower.

  This commit speeds it up again by adding a run-time check to pp_aassign
  to only do the OPpASSIGN_COMMON code-branch if the LHS isn't an empty
  array or hash.

  See also #70171.

After further research, he determined that the bug appears to be masked
(but not fixed) by the following commit, first included in Perl 5.19.1​:

commit 13b0f67
Author​: David Mitchell <davem@​iabyn.com>
Date​: Wed May 22 16​:38​:29 2013 +0100

  re-enable Copy-on-Write by default.

  COW was first introduced (and enabled by default) in 5.17.7.
  It was disabled by default in 5.17.10, because it was though to have too
  many rough edges for the 5.18.0 release.

  By re-enabling it now, early in the 5.19.x release cycle, hopefully it
  will be ready for production use by 5.20.

  This commit mainly reverts 9f351b4 and e1fd413 (with
modifications),
  then updates perldelta.

George confirmed that the bug still exists by building blead (v5.21.1
(v5.21.0-385-gc9fcb67)) with -Accflags="-DPERL_NO_COW" to disable CoW.
This version of Perl still exhibits the bug.

He also mentioned that a 5.18.3 release is planned; is there any chance a
fix for this regression could be included in 5.18.3 and the 5.20.1 release
as well? (Is it worth backporting to 5.14 or 5.16?)

Deven

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2014

From @iabyn

On Fri, Jun 13, 2014 at 12​:41​:58PM -0700, Deven T. Corzine wrote​:

$ cat testcase
#!/usr/bin/perl

use List​::MoreUtils qw[minmax];

my @​numbers = (20140613);
my @​strings = ('20140613');
my $check = "20140613 20140613";

my @​numbers_minmax = minmax @​numbers;
my @​numbers_minmax_map = minmax map { $_; } @​numbers;
my @​strings_minmax = minmax @​strings;
my @​strings_minmax_map = minmax map { $_; } @​strings;

print "@​numbers_minmax" eq $check ? "ok 1\n" : "not ok 1\n";
print "@​numbers_minmax_map" eq $check ? "ok 2\n" : "not ok 2\n";
print "@​strings_minmax" eq $check ? "ok 3\n" : "not ok 3\n";
print "@​strings_minmax_map" eq $check ? "ok 4\n" : "not ok 4\n";
$ ./testcase
ok 1
ok 2
ok 3
not ok 4
$ LIST_MOREUTILS_PP=1 ./testcase
ok 1
ok 2
ok 3
ok 4

After I isolated this test case under Perl 5.18.2, George Greer (my
co-worker) went to work tracking down the commit that introduced the bug,
which appears to be this commit, first included in Perl 5.13.11​:

commit acdea6f
Author​: David Mitchell <davem@​iabyn.com>
Date​: Sat Mar 12 22​:01​:26 2011 +0000

\[perl \#82111\] de\-pessimise some my @&#8203;array = \.\.\.

Due to obscure closure and goto tricks\, it's sometimes possible for the
array or hash in the LHS of 'my @&#8203;a = \.\.\.' and 'my %h = \.\.\.' to be
non\-empty\. At compile\-time\, these conditions are detected and the assign
op is compiled with the OPpASSIGN\_COMMON\, making the assignment slower\.

This commit speeds it up again by adding a run\-time check to pp\_aassign
to only do the OPpASSIGN\_COMMON code\-branch if the LHS isn't an empty
array or hash\.

See also \#70171\.

After further research, he determined that the bug appears to be masked
(but not fixed) by the following commit, first included in Perl 5.19.1​:

commit 13b0f67
Author​: David Mitchell <davem@​iabyn.com>
Date​: Wed May 22 16​:38​:29 2013 +0100

re\-enable Copy\-on\-Write by default\.

COW was first introduced \(and enabled by default\) in 5\.17\.7\.
It was disabled by default in 5\.17\.10\, because it was though to have too
many rough edges for the 5\.18\.0 release\.

By re\-enabling it now\, early in the 5\.19\.x release cycle\, hopefully it
will be ready for production use by 5\.20\.

This commit mainly reverts 9f351b45f4 and e1fd41328c \(with

modifications),
then updates perldelta.

George confirmed that the bug still exists by building blead (v5.21.1
(v5.21.0-385-gc9fcb67)) with -Accflags="-DPERL_NO_COW" to disable CoW.
This version of Perl still exhibits the bug.

He also mentioned that a 5.18.3 release is planned; is there any chance a
fix for this regression could be included in 5.18.3 and the 5.20.1 release
as well? (Is it worth backporting to 5.14 or 5.16?)

It's a more general problem, and assignment to an array is only one aspect of
it. It can be reduced to the following test case​:

  my $a = "foo";
  my ($s, $t) =(map { $_; } $a)[0,0];
  print "$^V​: [$s;$t]\n";

which outputs under various perls​:

  v5.12.5​: [foo;foo]
  v5.14.3​: [foo;]
  v5.16.3​: [foo;]
  v5.18.2​: [foo;]
  v5.20.0​: [foo;foo]
  v5.21.1​: [foo;] # blead with -DPERL_NO_COW
  v5.21.1​: [foo;foo] # blead

the map creates a TEMP copy of $a; the [0,0] list slice pushes that copy
twice onto the stack; the list assignment copies the first string on the
stack to $s​: because its marked TEMP, perl knows that the string buffer
can be 'swiped', so $s steals the string buffer of the first stack SV,
leaving it undef. When the second stack element is copied, that's
actually an alias of the first element, so its now undef.

In 5.20.0 and later with COW enabled (the default), when map makes a TEMP copy
of $a, it uses COW, so $a and the copy share the string buffer. When that
same value is copied to $s it again COWed, so no buffer swiping takes
place.

It may be possible to construct a scenario that shows a similar issue in
5.20.0 or later, but I can't think of one right now.

Fixing this for maint releases would be hard. Simply reverting acdea6f
would have fairly significant performance issues (and it wouldn't fix my
simpler test case above). The whole OPpASSIGN_COMMON
pessimisation/optimisation thing has been batted backwards and forwards in
various releases, it proving surprisingly tricky to determine all the
cases where there are common elements in a list assignment, and we've at
various times broken performance by being overly cautious (5.10.0?) and/or
broken the perl-level logic by not being cautious enough.

My acdea6f commit was a final iteration of that. We had realised
that something as simple as

  my @​a = ($x, $y);

couldn't be guaranteed (at compile time) not to have common elements, much
to everyone's surprise, because for example​:

  f();
  my @​a = ($x, $y);
  sub f { @​a = ($y, $x) }

(that may not be the exact thing that reproduces it, but it was something
similar. It was something that managed to get @​a populated with a common element *before* the 'my @​a' assignment.)

so 'my @​a = ...' was pessmimsed. My commit allowed that to be
unpessimised at run time, avoiding a big slowdown on simple array
assignments.

--
The Enterprise successfully ferries an alien VIP from one place to another
without serious incident.
  -- Things That Never Happen in "Star Trek" #7

@p5pRT
Copy link
Author

p5pRT commented Jun 13, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jun 17, 2014

From @deven

On Fri, Jun 13, 2014 at 5​:31 PM, Dave Mitchell <davem@​iabyn.com> wrote​:

Fixing this for maint releases would be hard. Simply reverting
acdea6f
would have fairly significant performance issues (and it wouldn't fix my
simpler test case above). The whole OPpASSIGN_COMMON
pessimisation/optimisation thing has been batted backwards and forwards in
various releases, it proving surprisingly tricky to determine all the
cases where there are common elements in a list assignment, and we've at
various times broken performance by being overly cautious (5.10.0?) and/or
broken the perl-level logic by not being cautious enough.

So what's the solution? Prioritizing performance over correctness doesn't
sound like the right thing to do. Can the bug be fixed without undue
performance impact?

Deven

@p5pRT
Copy link
Author

p5pRT commented Jun 18, 2014

From @Leont

On Fri, Jun 13, 2014 at 11​:31 PM, Dave Mitchell <davem@​iabyn.com> wrote​:

the map creates a TEMP copy of $a; the [0,0] list slice pushes that copy
twice onto the stack; the list assignment copies the first string on the
stack to $s​: because its marked TEMP, perl knows that the string buffer
can be 'swiped', so $s steals the string buffer of the first stack SV,
leaving it undef. When the second stack element is copied, that's
actually an alias of the first element, so its now undef.

Yet another bug in the category "because the perl stack isnt refcounted" :-(

Leon

@p5pRT
Copy link
Author

p5pRT commented Jun 18, 2014

From @iabyn

On Tue, Jun 17, 2014 at 10​:17​:51AM -0400, Deven T. Corzine wrote​:

On Fri, Jun 13, 2014 at 5​:31 PM, Dave Mitchell <davem@​iabyn.com> wrote​:

Fixing this for maint releases would be hard. Simply reverting
acdea6f
would have fairly significant performance issues (and it wouldn't fix my
simpler test case above). The whole OPpASSIGN_COMMON
pessimisation/optimisation thing has been batted backwards and forwards in
various releases, it proving surprisingly tricky to determine all the
cases where there are common elements in a list assignment, and we've at
various times broken performance by being overly cautious (5.10.0?) and/or
broken the perl-level logic by not being cautious enough.

So what's the solution? Prioritizing performance over correctness doesn't
sound like the right thing to do. Can the bug be fixed without undue
performance impact?

I can't think of an easy solution. The string swiping code has been there
for 20+ years, and is a pretty significant optimisation. Without it,
something like

  sub f {
  my $s = .... some very long string ...;
  return $s;
  }

  my $t = f()

would involve copying the long string multiple times. Its an optimisation
that's ok 99.9999% of the time.

The other part of the problem is that perl's argument stack isn't
reference counted, which is a well-known long-standing bug that is deeply
non-trivial to fix.

The only easy way to fix this for your use-case would be for
List​::MoreUtils​::minmax to to make a copy of its second return value if
both return values map to the same SV. That would be List​::MoreUtils
working round a perl bug, but it's the only simple fix I can think of.

--
You live and learn (although usually you just live).

@p5pRT p5pRT closed this as completed Mar 29, 2017
@p5pRT
Copy link
Author

p5pRT commented Mar 29, 2017

@iabyn - Status changed from 'open' to 'resolved'

@p5pRT
Copy link
Author

p5pRT commented Mar 29, 2017

From @iabyn

On Wed, Jun 18, 2014 at 12​:45​:17PM +0100, Dave Mitchell wrote​:

On Tue, Jun 17, 2014 at 10​:17​:51AM -0400, Deven T. Corzine wrote​:

On Fri, Jun 13, 2014 at 5​:31 PM, Dave Mitchell <davem@​iabyn.com> wrote​:

Fixing this for maint releases would be hard. Simply reverting
acdea6f
would have fairly significant performance issues (and it wouldn't fix my
simpler test case above). The whole OPpASSIGN_COMMON
pessimisation/optimisation thing has been batted backwards and forwards in
various releases, it proving surprisingly tricky to determine all the
cases where there are common elements in a list assignment, and we've at
various times broken performance by being overly cautious (5.10.0?) and/or
broken the perl-level logic by not being cautious enough.

So what's the solution? Prioritizing performance over correctness doesn't
sound like the right thing to do. Can the bug be fixed without undue
performance impact?

I can't think of an easy solution. The string swiping code has been there
for 20+ years, and is a pretty significant optimisation. Without it,
something like

sub f \{
    my $s = \.\.\.\. some very long string \.\.\.;
    return $s;
\}

my $t = f\(\)

would involve copying the long string multiple times. Its an optimisation
that's ok 99.9999% of the time.

The other part of the problem is that perl's argument stack isn't
reference counted, which is a well-known long-standing bug that is deeply
non-trivial to fix.

The only easy way to fix this for your use-case would be for
List​::MoreUtils​::minmax to to make a copy of its second return value if
both return values map to the same SV. That would be List​::MoreUtils
working round a perl bug, but it's the only simple fix I can think of.

Since then I've heavily reworked list assignment and I think all the
issues described above have been sorted. So I'm closing this ticket.

--
In my day, we used to edit the inodes by hand. With magnets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant