Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For loop fails to CATCH exception on 208th iteration in Rakudo #3759

Closed
p6rt opened this issue Mar 27, 2015 · 6 comments
Closed

For loop fails to CATCH exception on 208th iteration in Rakudo #3759

p6rt opened this issue Mar 27, 2015 · 6 comments
Labels

Comments

@p6rt
Copy link

p6rt commented Mar 27, 2015

Migrated from rt.perl.org#124191 (status was 'resolved')

Searchable as RT124191$

@p6rt
Copy link
Author

p6rt commented Mar 27, 2015

From @masak

<[Tux]> Something recent broke Text​::CSV :(
<jnthn> [Tux]​: How does it fail?
* [Tux] is investigating
<[Tux]> if I isolate the test, the test passes
<[Tux]> If anyone feels like digging​:
https://github.com/Tux/CSV/blob/master/t/55_combi.t#L46 <= is the line
that fails
<[Tux]> https://gist.github.com/Tux/7d23f5478dc962699f8f <= REPRODUCED!
<[Tux]> please fix :)
<[Tux]> memory leak or something?
<[Tux]> jnthn, ^^
<jnthn> Doubt it's a leak, could be corruption
<jnthn> Either way, a good find
<[Tux]> tests++
<jnthn> I'm working on $other-hard-problem at the moment
<jnthn> But will put it on my todo list :)
<masak> surely that can be golf'd a bit?
* masak tries
<masak> reproduced locally.
<masak> and at exactly the same point in the loop.
<masak> seems very stable that way.
<masak> [Tux]​: I managed to golf it quite a bit.
<masak> m​: class C is Exception { method message { "OH NOES" } }; for
^208 { { die C.new; CATCH { default {} } }; print "." }
<camelia> rakudo-moar fb991b​:
OUTPUT«...............................................................................................................................................................................................................OH
NOES␤ in block [...]
<masak> m​: my $dots = 0; class C is Exception { method message { "That
is $dots dots" } }; for ^208 { { die C.new; CATCH { default {} } };
$dots++ }
<camelia> rakudo-moar fb991b​: OUTPUT«That is 207 dots␤ in block [...]
<masak> so it *works* for 207 iterations, and then -- GC triggers?
JIT? optimizer? -- it fails on the 208th.
* masak submits rakudobug
<arnsholt> Weird!
<masak> Tux says something recent caused the above failure. I'd really
appreciate it if someone wanted bisect (maybe a month back or so).
<[Tux]> last PASS was yesterday or wednesday
<[Tux]> I rebuild rakudo daily
<jnthn> There was a Moar version bump yesterday, for the first time
since the 2015.03 release.
<jnthn> -2015.03
<jnthn> +2015.03-16-ga752064
<masak> oh, that's a prime candidate, then.
<jnthn> Curiously, the only spesh patches I see in the list change the
thresholds for when dynamic opt kicks in.
<jnthn> Meaning the problem in theory woulda been noticable *sooner* before...

@p6rt
Copy link
Author

p6rt commented Mar 29, 2015

From @masak

masak (>)​:

<masak> Tux says something recent caused the above failure. I'd really
appreciate it if someone wanted bisect (maybe a month back or so).

I did a bisect, which revealed a number of things. All of the findings below can be reproduced by the following commands​:

  $ git clone git@​github.com​:rakudo/rakudo
  $ cd rakudo
  $ git checkout <SHA-1 to check out>
  $ perl Configure.pl --gen-moar --gen-nqp --backends=moar
  $ make
  $ make install
  $ install/bin/perl6 -e 'my $dots = 0; class C is Exception { method message { "$dots" } }; for ^20_000 { { die C.new; CATCH { default {} } }; print "."; $dots++ }'

First off, here's the offending commit. Its parent runs all the way undisturbed, but this commit dies after 28 iterations​:

commit a398910b4a6a1bdd42cd5a3cc33d8dfad0e3501f
Author​: jnthn <jnthn@​jnthn.net>
Date​: Thu Jul 31 00​:32​:56 2014 +0200

  Better handling of sink.
 
  On MoarVM, this leads to a smaller QAST trees and much better code
  generation. On JVM and Parrot, we get the first advantage, and we
  could improve their code-gen in the future too. Note the Parrot bit
  of this is untested due to being unable to build NQP Parrot on Win32
  at present; the JVM patch worked right off, though, and there's very
  little reason fro the Parrot one not to. Testing welcome.

src/Perl6/Actions.nqp | 29 ++---------------------------
src/vm/jvm/Perl6/Ops.nqp | 27 +++++++++++++++++++++++++++
src/vm/moar/Perl6/Ops.nqp | 38 ++++++++++++++++++++++++++++++++++++++
src/vm/parrot/Perl6/Ops.nqp | 27 +++++++++++++++++++++++++++
4 files changed, 94 insertions(+), 27 deletions(-)

What's interesting is that the commit is not an NQP/Moar bump, it's just the addition of a new op. So chances are the commit uncovered an error that was already present.

I tried reverting this commit on Rakudo HEAD (c86f75), and the resulting commit doesn't have the bug. I pushed the result to the branch 'revert-bho-sink' on Github, in case someone wants to merge this into 'nom'. (Which is my recommendation.) I haven't spectested the new commit.

Next up, the following commit changes the number of iterations from 28 to 32​:

commit a1a236067b805961e742e1b51fa2ffbc274f90c4
Author​: jnthn <jnthn@​jnthn.net>
Date​: Sat Aug 2 01​:29​:16 2014 +0200

  Get latest NQP and MoarVM improvements.

tools/build/NQP_REVISION | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

This commit pulls in the following NQP commits (reverse chronological order)​:

183e611 Use new MoarVM static block lexicals support.
9268ba7 Bump to latest MoarVM, with many improvements.
c3f29b6 Bring test in line with behavior change.
aa8a615 Small optimizations to .symbol.

And that 9267ba7 bump commit pulls in the following Moar commits​:

f25affb MAST nodes can be identified by exact type.
e64c5eb Read in static lexicals.
c0984eb Write static lex values; read but don't apply them
ac33547 Update MAST​::Frame to hold static lex values.
9ba5d15 No longer need to support Parrot cross-compiler.
c65b2a6 Spec static lexical values table in bytecode.
cdda218 Switch on lazy frame deserialization.
0098c0c Preparations for lazy frame deserialization.
0043778 Split out part of frame deserialization.
985cd8b Bump minimum bytecode version to 2.
49f19ca Tweak spesh log run count.
8df127a Merge remote-tracking branch 'origin/split_get_use_facts'
b57061e Ensure OSR-triggered optimize is used next invoke.
f55e682 De-virtualize serialization write functions.
6da5b90 De-virtualize read_var_int.
9a3a96d Bump minimum serialization format version.
e92aa36 De-virtualize most reader functions.
b6a9cad Fix an uninitialized variable bug.
be8cfdf fix teh build
9d377a3 split get_facts and use_facts from get_and_use_facts.

Beyond noting that there are several spesh-related changes in there, I don't find anything particular in that list that stands out as a suspect.

Later on, another bump commit changes the number of iterations from 32 to 207​:

commit dd131050268bc63044868689df487fcc47e841de
Author​: jnthn <jnthn@​jnthn.net>
Date​: Thu Mar 26 22​:59​:35 2015 +0100

  Bump NQP_REVISION to get latest MoarVM.
 
  This brings both lazy deserialization and tuning of the spesh/JIT
  thresholds, both improving startup time.

The commit pulls in only one NQP commit, which in turn pulls in these Moar commits​:

a752064 Bump OSR theshold also.
e2e908b Tweak dynamic optimization thresholds.
b5d57ae mark existspos as pure
5ef6456 Merge pull request #​192 from fperrad/ar
bba0d8c Configure​: add an option --ar
375f647 Merge pull request #​191 from fperrad/patch-2
7955cee fix for ARMv5
01be8e3 First steps to handling bit-packed arrays.
74c1982 Fix a possible buffer-overrun in existspos.
0c95c4b Add type codes for bit-packed array sizes.
aa450df Re-enable lazy deserialization.
b26111b fix thinko
a938e5d Add a README explaining how to macport
cc1dc8b more macport updates;
226db54 Whoops. use checksums from correct version!
6d8f3f1 updates for new release.

This one is easier to explain. One of the optimization thresholds is changed from 25 to 200 in e2e908b. 25+7 == 32 and 200+7 == 207. So a decent guess is that an optimization kicks in after 207 iterations, and creates code that fails to catch the exception.

There, that's a bunch of grunt work done. :) Hopefully minds sharper that mine can look at the a39891 commit and go "ah! that's what's wrong!".

@p6rt
Copy link
Author

p6rt commented Mar 29, 2015

From @masak

<masak> so, I just removed https://github.com/rakudo/rakudo/blob/nom/src/vm/moar/Perl6/Ops.nqp#L460 locally; RT #​124191 is still present afterwards :/
<synopsebot> Link​: https://rt-archive.perl.org/perl6//Public/Bug/Display.html?id=124191
<masak> so it's something deeper, then.
<dalek> rakudo/nom​: ee7a375 | jnthn++ | src/vm/moar/Perl6/Ops.nqp​:
<dalek> rakudo/nom​: Remove dupe register free; masak++.
<dalek> rakudo/nom​: review​: rakudo/rakudo@ee7a3752d9
<masak> oh, jnthn++ beat me to it.
<masak> was still spectesting.
<masak> jnthn​: RT #​124191 is still at large, even after that remove.
<masak> jnthn​: so there is *something* wrong with that code.
<jnthn> masak​: That may fix *something*, but given the generated code works before dynamic optimization kicks in, it's fairly clear we're looking for a problem there.
<masak> jnthn​: if you can't see it immediately, I'd recommend rolling back the original commit that created that op.
<jnthn> masak​: Nope, wrong approach.
<masak> if you say so.
<masak> we have a bug in the nom branch right now, and a way to make it go away until we learn what's going on.
<jnthn> masak​: It's pretty obvious from what we know that the commit in question generates valid code.
<masak> jnthn​: ah, so you're saying we should look at how it's optimized/speshed instead?
<masak> I agree with that, I just don't see why we wouldn't remove the source of the bug in the meantime. creating a more stable nom branch.
<jnthn> masak​: Yes; presumably setting MVM_SPESH_DISABLE=1 makes it go away
* masak tries
<masak> jnthn​: yes, setting MVM_SPESH_DISABLE=1 makes it go away.
<jnthn> masak​: Right, so that narrows down where we're looking quite significantly.
<masak> jnthn​: not gonna argue with you about reverting the commit -- if you prefer that it stay, it stays. but how do we find the optimize/spesh bug?
<jnthn> masak​: Well, looking at the spesh log for one
<jnthn> masak​: Though we might try disabling other bits to narrow it further
<masak> jnthn​: I don't have the skills to do that at present. hoping that someone else finds the tuits to do that.
<jnthn> masak​: There's more env vars
<jnthn> masak​: If you moar --help they're there
<jnthn> masak​: Seems that they do help too...
<jnthn> masak​: Setting MVM_SPESH_INLINE_DISABLE=1 also makes it go away
<jnthn> masak​: Meaning that we can blame it on inlining busting...something.
<jnthn> masak​: Further, you don't need a custom exception type for it
<masak> wow.
<masak> so I could've golfed it further?
<masak> fancy that :)
<jnthn> masak​: Yeah, Exception.new
<masak> m​: for ^207 { die Exception.new(); CATCH { default {} } }
<camelia> rakudo-moar d24f30​: ( no output )
<masak> m​: for ^208 { die Exception.new(); CATCH { default {} } }
<camelia> rakudo-moar d24f30​: OUTPUT«␤ in block at /tmp/ZLP2WH1C3n​:1␤␤»
<masak> jnthn++
<jnthn> masak​: I've a feeling it is a much more general issue.
* masak adds the conversation so far to the ticket

@p6rt
Copy link
Author

p6rt commented Jun 9, 2015

From @jnthn

On Sun Mar 29 12​:52​:28 2015, masak wrote​:

<masak> m​: for ^207 { die Exception.new(); CATCH { default {} } }
<camelia> rakudo-moar d24f30​: ( no output )
<masak> m​: for ^208 { die Exception.new(); CATCH { default {} } }
<camelia> rakudo-moar d24f30​: OUTPUT«␤ in block at
/tmp/ZLP2WH1C3n​:1␤␤»
<masak> jnthn++
<jnthn> masak​: I've a feeling it is a much more general issue.
* masak adds the conversation so far to the ticket

It was a MoarVM dynamic optimization bug involving a wrongly handled interaction between inlining and exception handlers. Fixed it in MoarVM, and added a test in S04-exception-handlers/catch.t.

@p6rt
Copy link
Author

p6rt commented Jun 9, 2015

The RT System itself - Status changed from 'new' to 'open'

@p6rt p6rt closed this as completed Jun 9, 2015
@p6rt
Copy link
Author

p6rt commented Jun 9, 2015

@jnthn - Status changed from 'open' to 'resolved'

@p6rt p6rt added the Bug label Jan 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant