Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segfault in Perl_mg_magical (mg.c:144) #15759

Open
p5pRT opened this issue Dec 11, 2016 · 17 comments
Open

segfault in Perl_mg_magical (mg.c:144) #15759

p5pRT opened this issue Dec 11, 2016 · 17 comments

Comments

@p5pRT
Copy link

p5pRT commented Dec 11, 2016

Migrated from rt.perl.org#130318 (status was 'open')

Searchable as RT130318$

@p5pRT
Copy link
Author

p5pRT commented Dec 11, 2016

From @geeknik

Triggered with Perl v5.25.7-98-gdf13534 while fuzzing with AFL.

od -tx1 test332
0000000 6d 61 70 25 5e 48 20 3d 0c 44 2e 2e 00 20 46 54
0000020 2c 25 5f 3d 44 2e 2e 46 54 2c 25 5f 3d 44 2e 2e
0000040 00 20 2b 54 2c 6d 61 70 20 5c 2d 41 74 76 2c 2d
0000060 4f 5e 4d 2c 6d 61 70 20 5c 2d 41 1a 53 42 55 c0
0000100 63 4d 4f 73 59 74 8a 2c 2d 4f 04 22 2c 90 21 00
0000120 00 00 6e 21 0a
0000125

ASAN​:SIGSEGV

==32602==ERROR​: AddressSanitizer​: SEGV on unknown address 0x00205fff8001
(pc 0x0000008233b8 bp 0x0c42000037e4 sp 0x7ffc5bfd7060 T0)
  #0 0x8233b7 in Perl_mg_magical /root/perl/mg.c​:144​:6
  #1 0x9476f8 in Perl_sv_magicext /root/perl/sv.c​:5767​:5
  #2 0x8fbf03 in Perl_sv_magic /root/perl/sv.c​:5856​:10
  #3 0x827b18 in Perl_mg_copy /root/perl/mg.c​:471​:3
  #4 0x87b54d in Perl_hv_common /root/perl/hv.c​:542​:7
  #5 0x8c030a in Perl_pp_aassign /root/perl/pp_hot.c​:1646​:25
  #6 0x7f81fb in Perl_runops_debug /root/perl/dump.c​:2260​:23
  #7 0x5a0ab3 in S_run_body /root/perl/perl.c​:2526​:2
  #8 0x5a0ab3 in perl_run /root/perl/perl.c​:2449
  #9 0x4de6dd in main /root/perl/perlmain.c​:123​:9
  #10 0x7f3dbd687b44 in __libc_start_main
/build/glibc-daoqzt/glibc-2.19/csu/libc-start.c​:287
  #11 0x4de34c in _start (/root/perl/perl+0x4de34c)

AddressSanitizer can not provide additional info.
SUMMARY​: AddressSanitizer​: SEGV /root/perl/mg.c​:144 Perl_mg_magical
==32602==ABORTING

Perl 5.20.2 fails with many lines of this​:
Attempt to free unreferenced scalar​: SV 0xe1f088, Perl interpreter​:
0xe1c010 at test332 line 1.

@p5pRT
Copy link
Author

p5pRT commented Dec 11, 2016

From @geeknik

test332.gz

@p5pRT
Copy link
Author

p5pRT commented Dec 12, 2016

From @iabyn

On Sun, Dec 11, 2016 at 01​:40​:31PM -0800, Brian Carpenter wrote​:

od -tx1 test332
0000000 6d 61 70 25 5e 48 20 3d 0c 44 2e 2e 00 20 46 54
0000020 2c 25 5f 3d 44 2e 2e 46 54 2c 25 5f 3d 44 2e 2e
0000040 00 20 2b 54 2c 6d 61 70 20 5c 2d 41 74 76 2c 2d
0000060 4f 5e 4d 2c 6d 61 70 20 5c 2d 41 1a 53 42 55 c0
0000100 63 4d 4f 73 59 74 8a 2c 2d 4f 04 22 2c 90 21 00
0000120 00 00 6e 21 0a
0000125

ASAN​:SIGSEGV

==32602==ERROR​: AddressSanitizer​: SEGV on unknown address 0x00205fff8001
(pc 0x0000008233b8 bp 0x0c42000037e4 sp 0x7ffc5bfd7060 T0)
#0 0x8233b7 in Perl_mg_magical /root/perl/mg.c​:144​:6
#1 0x9476f8 in Perl_sv_magicext /root/perl/sv.c​:5767​:5
#2 0x8fbf03 in Perl_sv_magic /root/perl/sv.c​:5856​:10
#3 0x827b18 in Perl_mg_copy /root/perl/mg.c​:471​:3
#4 0x87b54d in Perl_hv_common /root/perl/hv.c​:542​:7
#5 0x8c030a in Perl_pp_aassign /root/perl/pp_hot.c​:1646​:25
#6 0x7f81fb in Perl_runops_debug /root/perl/dump.c​:2260​:23
#7 0x5a0ab3 in S_run_body /root/perl/perl.c​:2526​:2
#8 0x5a0ab3 in perl_run /root/perl/perl.c​:2449
#9 0x4de6dd in main /root/perl/perlmain.c​:123​:9
#10 0x7f3dbd687b44 in __libc_start_main
/build/glibc-daoqzt/glibc-2.19/csu/libc-start.c​:287
#11 0x4de34c in _start (/root/perl/perl+0x4de34c)

Looks like another stack-not-refcounted issue. Reduces to​:

map(
  ((%^H) = ('D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
  'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'AA',
  'AB', 'AC', 'AD', 'AE', 'AF', 'AG', 'AH', 'AI', 'AJ', 'AK', 'AL',
  'AM', 'AN', 'AO', 'AP', 'AQ', 'AR', 'AS', 'AT', 'AU', 'AV', 'AW',
  'AX', 'AY', 'AZ', 'BA', 'BB', 'BC', 'BD', 'BE', 'BF', 'BG', 'BH',
  'BI', 'BJ', 'BK', 'BL', 'BM', 'BN', 'BO', 'BP', 'BQ', 'BR', 'BS',
  'BT', 'BU', 'BV', 'BW', 'BX', 'BY', 'BZ', 'CA', 'CB', 'CC', 'CD',
  'CE', 'CF', 'CG', 'CH', 'CI', 'CJ', 'CK', 'CL', 'CM', 'CN', 'CO',
  'CP', 'CQ', 'CR', 'CS', 'CT', 'CU', 'CV', 'CW', 'CX', 'CY', 'CZ',
  'DA', 'DB', 'DC', 'DD', 'DE', 'DF', 'DG', 'DH', 'DI', 'DJ', 'DK',
  'DL', 'DM', 'DN', 'DO', 'DP', 'DQ', 'DR', 'DS', 'DT', 'DU', 'DV',
  'DW', 'DX', 'DY', 'DZ', 'EA', 'EB', 'EC', 'ED', 'EE', 'EF', 'EG',
  'EH', 'EI', 'EJ', 'EK', 'EL', 'EM', 'EN', 'EO', 'EP', 'EQ', 'ER',
  'ES', 'ET', 'EU', 'EV', 'EW', 'EX', 'EY', 'EZ', 'FA', 'FB', 'FC',
  'FD', 'FE', 'FF', 'FG', 'FH', 'FI', 'FJ', 'FK', 'FL', 'FM', 'FN',
  'FO', 'FP', 'FQ', 'FR', 'FS', 'FT')),
 
  (%_) = ('D', 'E', 'F', 'G'),
  (%_) = (),

);

--
But Pity stayed his hand. "It's a pity I've run out of bullets",
he thought. -- "Bored of the Rings"

@p5pRT
Copy link
Author

p5pRT commented Dec 12, 2016

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Dec 13, 2016

From @Leont

On Mon, Dec 12, 2016 at 12​:18 PM, Dave Mitchell <davem@​iabyn.com> wrote​:

Looks like another stack-not-refcounted issue. Reduces to​:

My number one regret of the hackathon must be that we didn't sit down and
brainstorm how to actually fix the stack-not-refcounted issue. I've been
increasingly feeling like this is an unavoidable, but painful, path
forwards.

Leon

@p5pRT
Copy link
Author

p5pRT commented Dec 15, 2016

From @xsawyerx

On 12/13/2016 10​:02 PM, Leon Timmermans wrote​:

On Mon, Dec 12, 2016 at 12​:18 PM, Dave Mitchell <davem@​iabyn.com
<mailto​:davem@​iabyn.com>> wrote​:

Looks like another stack\-not\-refcounted issue\. Reduces to&#8203;:

My number one regret of the hackathon must be that we didn't sit down
and brainstorm how to actually fix the stack-not-refcounted issue.
I've been increasingly feeling like this is an unavoidable, but
painful, path forwards.

I think we raised that originally as a conversation topic, but I guess
it fell through the cracks with the various other discussions going
around[1].

I wonder if there is value is having a meeting over IRC (or some other
live communication) to discuss this.

[1] Hackathon report still pending.

@p5pRT
Copy link
Author

p5pRT commented Dec 25, 2016

From @iabyn

On Thu, Dec 15, 2016 at 01​:15​:59PM +0100, Sawyer X wrote​:

On 12/13/2016 10​:02 PM, Leon Timmermans wrote​:

My number one regret of the hackathon must be that we didn't sit down
and brainstorm how to actually fix the stack-not-refcounted issue.
I've been increasingly feeling like this is an unavoidable, but
painful, path forwards.

I wonder if there is value is having a meeting over IRC (or some other
live communication) to discuss this.

I think FC started some work on this (origin/sprout/rstack), but I don't
know how it was intended to work.

I don't think there's much point in a discussion until someone has some
initial ideas about how this could be achieved (I don't).

--
Modern art​:
  "That's easy, I could have done that!"
  "Ah, but you didn't!"

@p5pRT
Copy link
Author

p5pRT commented Dec 26, 2016

From @xsawyerx

On 12/25/2016 11​:19 AM, Dave Mitchell wrote​:

On Thu, Dec 15, 2016 at 01​:15​:59PM +0100, Sawyer X wrote​:

On 12/13/2016 10​:02 PM, Leon Timmermans wrote​:

My number one regret of the hackathon must be that we didn't sit down
and brainstorm how to actually fix the stack-not-refcounted issue.
I've been increasingly feeling like this is an unavoidable, but
painful, path forwards.
I wonder if there is value is having a meeting over IRC (or some other
live communication) to discuss this.

I think FC started some work on this (origin/sprout/rstack), but I don't
know how it was intended to work.

I don't think there's much point in a discussion until someone has some
initial ideas about how this could be achieved (I don't).

Makes sense to me. Without a plan, little discussion can be had, unless
someone can add insight on why these happen and what kind of approach
can be taken.

@p5pRT
Copy link
Author

p5pRT commented Dec 26, 2016

From @demerphq

On 26 December 2016 at 17​:47, Sawyer X <xsawyerx@​gmail.com> wrote​:

On 12/25/2016 11​:19 AM, Dave Mitchell wrote​:

On Thu, Dec 15, 2016 at 01​:15​:59PM +0100, Sawyer X wrote​:

On 12/13/2016 10​:02 PM, Leon Timmermans wrote​:

My number one regret of the hackathon must be that we didn't sit down
and brainstorm how to actually fix the stack-not-refcounted issue.
I've been increasingly feeling like this is an unavoidable, but
painful, path forwards.
I wonder if there is value is having a meeting over IRC (or some other
live communication) to discuss this.

I think FC started some work on this (origin/sprout/rstack), but I don't
know how it was intended to work.

I don't think there's much point in a discussion until someone has some
initial ideas about how this could be achieved (I don't).

Makes sense to me. Without a plan, little discussion can be had, unless
someone can add insight on why these happen and what kind of approach
can be taken.

My understanding is that we have a good idea why these happen, but
less of an idea what to do about it and stay performant.

Here is my very superficial understanding of the matter​:

When we place something on the stack we do not increment its refcount
and when we remove something from the stack we do not decrement its
refcount. This means that under some situations we can end up freeing
a value that is still on the stack.

An obvious solution to this would be to refcount the stack. However
even if we assume this is possible (I am doubtful), it would probably
be slow. I seem to recall that MS did some studies with VB and
refcount manipulation of stack arguments can account for 25% of run
time, which is why dotNet is garbage collected.

You can see this in a reduced/modified version of the bug report in this thread​:

$ cat t_refcount.pl
use Devel​::Peek;
map(
  {
  Dump($_);
  #((%^H) = ('D', 'E',)),
  }
  (%_) = (
  'D', 'E',
  'F', 'G'
  ),
  (%_) = (),

);

# this hash seed is for One-At-A-time, my tests are from Perl v5.18.4
$ PERL_HASH_SEED=0xc717eb9fe97a9788 perl t_refcount.pl
SV = PV(0x1305d80) at 0x1304fc8
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0x1314d20 "D"\0
  CUR = 1
  LEN = 16
SV = UNKNOWN(0xff) (0x131b2f8) at 0x1305178
  REFCNT = 1
  FLAGS = ()
Attempt to free unreferenced scalar​: SV 0x1305178 at t_refcount.pl line 4.
SV = PV(0x1305e80) at 0x131b2e0
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0x132b0a0 "F"\0
  CUR = 1
  LEN = 16
SV = UNKNOWN(0xff) (0x131b328) at 0x131b2f8
  REFCNT = 1
  FLAGS = ()
Attempt to free unreferenced scalar​: SV 0x131b2f8 at t_refcount.pl line 4.

If you uncomment the modification of $^H you can see a different
issue. And if you then comment out the call to Dump() you get the
segfault.

At least part of this bug is due to the following rather insane piece of code​:

map { whatever() }
(
  (%_) = (
  'D', 'E',
  'F', 'G'
  ),
  (%_) = ()
);

that is, we populate %_ with 'D', 'E','F','G', which then puts the
keys and values on the stack, with the keys being copies, and the
values being actually on the stack. We then empty the hash, which
frees the values, we then enter the map with the values pointing to
freed scalars.

It is hard to say what we can do about insane stuff like this without
making all sane code pay a penalty. Perhaps it is something we just
want to address via education, and leave it as "won't fix". My
understanding is it is rare to unheard of for real code to tickle
stack-not-refcounted bugs, it only happens when people do crazy stuff
that nobody would do in real code anyway.

As for this particular bug I think it warrants further attention if
only because I think there is more weirdness afoot here than can be
accounted for simply via stack-not-refcounted, in particular the
assignment to the hints hash %^H is required for the segfault, and I
do not see why. I suspect there is some other deeper issue with our
stack manipulations. Compare

use Devel​::Peek;
map { Dump($_) }
(
  (%_) = (
  'D', 'E',
  'F', 'G'
  ),
  (%_) = ()
);

SV = PV(0x187dd80) at 0x187cfc8
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0x188cd20 "D"\0
  CUR = 1
  LEN = 16
SV = UNKNOWN(0xff) (0x18932f8) at 0x187d178
  REFCNT = 1
  FLAGS = ()
Attempt to free unreferenced scalar​: SV 0x187d178 at t_refcount.pl line 4.
SV = PV(0x187de80) at 0x18932e0
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0x18a3130 "F"\0
  CUR = 1
  LEN = 16
SV = UNKNOWN(0xff) (0x1893328) at 0x18932f8
  REFCNT = 1
  FLAGS = ()
Attempt to free unreferenced scalar​: SV 0x18932f8 at t_refcount.pl line 4.

to

use Devel​::Peek;
map { Dump($_) }
(
  (%_) = (
  'D', 'E',
  'F', 'G'
  ),
  (%_) = ('P','Q')
);

SV = PV(0xb13d80) at 0xb12fc8
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0xb22d20 "D"\0
  CUR = 1
  LEN = 16
SV = PV(0xb13e10) at 0xb13178
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0xb35200 "P"\0
  CUR = 1
  LEN = 16
SV = PV(0xb13e80) at 0xb292e0
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0xb50620 "F"\0
  CUR = 1
  LEN = 16
SV = PV(0xb13e90) at 0xb292f8
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0xb40500 "Q"\0
  CUR = 1
  LEN = 16
SV = PV(0xb13e10) at 0xb13178
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0xb35200 "P"\0
  CUR = 1
  LEN = 16
SV = PV(0xb13e90) at 0xb292f8
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0xb40500 "Q"\0
  CUR = 1
  LEN = 16

I can understand the

SV = UNKNOWN(0xff) (0x1893328) at 0x18932f8

in the first dump, but why do we see 'D', 'P', 'F', 'Q', 'P', 'Q'. I
would expect 'D', UNKNOWN, 'F', UNKNOWN, 'P', 'Q'.

Also why is it when we change it to

use Devel​::Peek;
map {
  Dump($_);
(%^H=('M','N'));
}
(
  (%_) = (
  'D', 'E',
  'F', 'G'
  ),
  (%_) = ()
);

Why do we get​:

SV = PV(0x233dd80) at 0x233cfc8
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0x234cd20 "D"\0
  CUR = 1
  LEN = 16
SV = PVMG(0x23951e0) at 0x233d178
  REFCNT = 2
  FLAGS = (SMG,RMG,POK,pPOK)
  IV = 0
  NV = 0
  PV = 0x235f200 "N"\0
  CUR = 1
  LEN = 16
  MAGIC = 0x2363900
  MG_VIRTUAL = &PL_vtbl_hintselem
  MG_TYPE = PERL_MAGIC_hintselem(h)
  MG_LEN = -2
  MG_PTR = 0x23532f8 => HEf_SVKEY
  SV = PV(0x233de90) at 0x23532f8
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x236a500 "M"\0
  CUR = 1
  LEN = 16
SV = PV(0x233de80) at 0x23532e0
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0x2357ed0 "F"\0
  CUR = 1
  LEN = 16
SV = UNKNOWN(0xff) (0x23533b8) at 0x23532f8
  REFCNT = 1
  FLAGS = ()
Attempt to free unreferenced scalar​: SV 0x23532f8 at t_refcount.pl line 4.

And why is it that when I remove the Dump we end up with a segfault
which we dont see with the Dump call?

use Devel​::Peek;
map {
(%^H=('M','N'));
}
(
  (%_) = (
  'D', 'E',
  'F', 'G'
  ),
  (%_) = ()
);

PERL_HASH_SEED=0xc717eb9fe97a9788 perl t_refcount.pl
Segmentation fault

Some of this is easily explained by stack-not-refcounted, but some of
it looks really strange.

How does the return of the the map end up polluting the stack that we
are processing? Why does it only segfault when we remove the call to
Dump()?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Dec 26, 2016

From @iabyn

On Mon, Dec 26, 2016 at 10​:18​:05PM +0100, demerphq wrote​:

An obvious solution to this would be to refcount the stack. However
even if we assume this is possible (I am doubtful), it would probably
be slow. I seem to recall that MS did some studies with VB and
refcount manipulation of stack arguments can account for 25% of run
time, which is why dotNet is garbage collected.

But in practice large chunks of pp code mortalise the values they push on
the stack to make them last long enough to be usable for the next op, but
without leaking. So we normally encounter the cost of manipulating the
ref count anyway, with the added costs of checking/extending the mortals
stack as well as the args stack, and pushing the SVs onto both stacks.

So making the args stack ref counted would quite likely be a net
performance gain (assuming we got rid of the now superfluous
mortalisation), in addition to fixing all the bugs.

The insanely difficult bit is that we currently have nearly 30K lines of
code in the pp functions which all assume they can just do SP-- or SP -= n
without worrying about what's on the stack. Or to croak with rubbish still
in the stack, We would either have to review and modifiy *all* that code,
or come up with a scheme whereby individual pp functions can be marked as
"refcount ready"

My
understanding is it is rare to unheard of for real code to tickle
stack-not-refcounted bugs, it only happens when people do crazy stuff
that nobody would do in real code anyway.

It's quite easy to tickle stack-not-refcounted issues. E.g.

  @​a = (1,2,3);
  for (@​a, 4) {
  @​a = ();
  print $_;
  }

which croaks with

  Use of freed value in iteration at ...

or

  sub f {
  @​a = ();
  print $_[0];
  }

  @​a = (1,2,3);
  f(@​a);

which prints nothing, but can easily print garbage if the freed SV happens
to get reallocated.

--
Any [programming] language that doesn't occasionally surprise the
novice will pay for it by continually surprising the expert.
  -- Larry Wall

@p5pRT
Copy link
Author

p5pRT commented Dec 26, 2016

From @demerphq

On 26 December 2016 at 22​:44, Dave Mitchell <davem@​iabyn.com> wrote​:

On Mon, Dec 26, 2016 at 10​:18​:05PM +0100, demerphq wrote​:

An obvious solution to this would be to refcount the stack. However
even if we assume this is possible (I am doubtful), it would probably
be slow. I seem to recall that MS did some studies with VB and
refcount manipulation of stack arguments can account for 25% of run
time, which is why dotNet is garbage collected.

But in practice large chunks of pp code mortalise the values they push on
the stack to make them last long enough to be usable for the next op, but
without leaking. So we normally encounter the cost of manipulating the
ref count anyway, with the added costs of checking/extending the mortals
stack as well as the args stack, and pushing the SVs onto both stacks.

So making the args stack ref counted would quite likely be a net
performance gain (assuming we got rid of the now superfluous
mortalisation), in addition to fixing all the bugs.

The insanely difficult bit is that we currently have nearly 30K lines of
code in the pp functions which all assume they can just do SP-- or SP -= n
without worrying about what's on the stack. Or to croak with rubbish still
in the stack, We would either have to review and modifiy *all* that code,
or come up with a scheme whereby individual pp functions can be marked as
"refcount ready"

My
understanding is it is rare to unheard of for real code to tickle
stack-not-refcounted bugs, it only happens when people do crazy stuff
that nobody would do in real code anyway.

It's quite easy to tickle stack-not-refcounted issues. E.g.

@&#8203;a = \(1\,2\,3\);
for \(@&#8203;a\, 4\) \{
    @&#8203;a = \(\);
    print $\_;
\}

which croaks with

Use of freed value in iteration at \.\.\.

or

sub f \{
    @&#8203;a = \(\);
    print $\_\[0\];
\}

@&#8203;a = \(1\,2\,3\);
f\(@&#8203;a\);

which prints nothing, but can easily print garbage if the freed SV happens
to get reallocated.

Ok, color me convinced. 30k lines of pp code sounds less like a
problem than all the XS code that does the same.

Do you have any insight on the other issues I mentioned?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Dec 26, 2016

From @Leont

On Mon, Dec 26, 2016 at 10​:44 PM, Dave Mitchell <davem@​iabyn.com> wrote​:

On Mon, Dec 26, 2016 at 10​:18​:05PM +0100, demerphq wrote​:

An obvious solution to this would be to refcount the stack. However
even if we assume this is possible (I am doubtful), it would probably
be slow. I seem to recall that MS did some studies with VB and
refcount manipulation of stack arguments can account for 25% of run
time, which is why dotNet is garbage collected.

But in practice large chunks of pp code mortalise the values they push on
the stack to make them last long enough to be usable for the next op, but
without leaking. So we normally encounter the cost of manipulating the
ref count anyway, with the added costs of checking/extending the mortals
stack as well as the args stack, and pushing the SVs onto both stacks.

So making the args stack ref counted would quite likely be a net
performance gain (assuming we got rid of the now superfluous
mortalisation), in addition to fixing all the bugs.

I also tend to think less mortalization will help performance, but wouldn't
dare making predictions on gains and losses until we can measure this.

The insanely difficult bit is that we currently have nearly 30K lines of
code in the pp functions which all assume they can just do SP-- or SP -= n
without worrying about what's on the stack. Or to croak with rubbish still
in the stack, We would either have to review and modifiy *all* that code,
or come up with a scheme whereby individual pp functions can be marked as
"refcount ready"

I imagined we'd have a bunch of macros that communicate intent, and those
get translated to the right thing depending on whether or not refcounted
stack is enabled.

Leon

@p5pRT
Copy link
Author

p5pRT commented Dec 27, 2016

From zefram@fysh.org

demerphq wrote​:

                                                       My

understanding is it is rare to unheard of for real code to tickle
stack-not-refcounted bugs, it only happens when people do crazy stuff
that nobody would do in real code anyway.

On the contrary, stack refcounting bugs have been encountered in real,
organic code on several occasions. Look at the tickets in RT under the
stack refcounting meta ticket.

              but why do we see 'D'\, 'P'\, 'F'\, 'Q'\, 'P'\, 'Q'\. I

would expect 'D', UNKNOWN, 'F', UNKNOWN, 'P', 'Q'.

The memory used by the prematurely-freed 'E' and 'G' was reused to store
the 'P' and 'Q', as you can see from the Dump output. They had been
UNKNOWN for a brief intermediate period, from the premature free to
the reuse. This failure mode is normal and expected​: it's an inherent
possibility for free-too-soon bugs.

Also why is it when we change it to
...
Why do we get​:

More memory reuse, transforming the prematurely-freed 'E' into $^H{M},
and by chance not reusing the prematurely-freed 'G'. Unsurprising.

And why is it that when I remove the Dump we end up with a segfault
which we dont see with the Dump call?

The segv arises from sv_clear() getting called on an UNKNOWN, which
should never happen. An assertion fails in a debugging build, causing
an abort. In a non-debugging build (not checking the assertions),
the SV type range checks have the effect that sv_clear() believes the
SV must have an SvMAGIC() field. Attempting to read that yields a wild
pointer, because the field was never allocated, so the code walking the
magic chain quickly gets to unmapped memory.

This too is an unsurprising failure mode. Generally, premature freeing
has effects that amount to memory corruption, and all the resulting
failure modes are in operation. There's no effective way to constrain
the process once corruption has set in.

As for why removing the Dump() specifically invokes this failure mode,
I don't care to investigate in such detail. It suffices to chalk it up
to chance​: once the memory is corrupted, different op sequences result
in different erroneous outcomes, in a manner that is difficult and
unrewarding to predict.

How does the return of the the map end up polluting the stack that we
are processing?

It doesn't. The corruption initially results from the premature freeing
of SV structures referenced by the stack, while the stack content is being
established before the map iteration is invoked. It's really early on,
and from that point onwards the perl runtime is doomed. The exact effects
depend on what operations are performed on the corrupted SVs and what the
SV structures have been overwritten with by the time of those operations.
Both of these are influenced by the code within the map block​: that code
performs operations on whatever's referenced by the stack, and it can
by chance reallocate the freed memory, with the effect of overwriting
the corrupted SV structures repeatedly.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Dec 27, 2016

From @iabyn

On Mon, Dec 26, 2016 at 11​:41​:52PM +0100, demerphq wrote​:

Ok, color me convinced. 30k lines of pp code sounds less like a
problem than all the XS code that does the same.

Dealing with XS in the short term is fairly trivial. Have a flag on each
XS CV indicating that it knows about a reference-counted stack. In
pp_entersub(), when calling an XS sub that hasn't got that flag (which
initially is all of them), Copy() the sub's args, so that there are two
copies of all the sub's args on the stack - a reference-counted set which
the XS sub doesn't know about, and a second set which aren't
reference-counted, and the XS sub sees. On return, bump the reference
count of the returned args, decrement the reference count of the passed
args, then shuffle the returned args down the stack.

Do you have any insight on the other issues I mentioned?

I haven't looked at it yet. Anything involving %^H makes my eyes glaze
over.

--
Modern art​:
  "That's easy, I could have done that!"
  "Ah, but you didn't!"

@p5pRT
Copy link
Author

p5pRT commented Dec 27, 2016

From @iabyn

On Tue, Dec 27, 2016 at 12​:20​:41AM +0100, Leon Timmermans wrote​:

I imagined we'd have a bunch of macros that communicate intent, and those
get translated to the right thing depending on whether or not refcounted
stack is enabled.

But the tricky bit is seeing whether pp functions converted so far have
been converted correctly, since it's not possible to build and test
a refcounted-count-enabled perl until *all* functions have been fixed.

--
Please note that ash-trays are provided for the use of smokers,
whereas the floor is provided for the use of all patrons.
  -- Bill Royston

@p5pRT
Copy link
Author

p5pRT commented Dec 27, 2016

From @demerphq

On 27 December 2016 at 12​:24, Zefram <zefram@​fysh.org> wrote​:

demerphq wrote​:

                                                       My

understanding is it is rare to unheard of for real code to tickle
stack-not-refcounted bugs, it only happens when people do crazy stuff
that nobody would do in real code anyway.

On the contrary, stack refcounting bugs have been encountered in real,
organic code on several occasions. Look at the tickets in RT under the
stack refcounting meta ticket.

Well, thanks to you and Dave I know understand this a lot better.

Cheers,
Yves

@p5pRT
Copy link
Author

p5pRT commented Jan 1, 2017

From @Leont

On Tue, Dec 27, 2016 at 12​:24 PM, Dave Mitchell <davem@​iabyn.com> wrote​:

Dealing with XS in the short term is fairly trivial. Have a flag on each
XS CV indicating that it knows about a reference-counted stack. In
pp_entersub(), when calling an XS sub that hasn't got that flag (which
initially is all of them), Copy() the sub's args, so that there are two
copies of all the sub's args on the stack - a reference-counted set which
the XS sub doesn't know about, and a second set which aren't
reference-counted, and the XS sub sees. On return, bump the reference
count of the returned args, decrement the reference count of the passed
args, then shuffle the returned args down the stack.

It helps that the most common case (CODE instead of PPCODE, and fixed
length argument list) can easily be handled in ExtUtils​::ParseXS.

Leon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants