SIG_PENDING_DIE_COUNT kills program in new situation #14233
From d.white@imperial.ac.uk
Created by d.white@imperial.ac.uk
This is a bug report for perl from d.white@imperial.ac.uk,
-----------------------------------------------------------------
This code has worked reliably across multiple Perl versions on
"Maximal count of pending signals (120) exceeded" inside Thread::Queue line 70. This is in the dequeue()
We have rebuilt our own Perl 5.18.2 experimental version with
# define SIG_PENDING_DIE_COUNT 520
and invested approximately 8 hours of time in experiments on
May I request that this code, the rationale for which was described in:
www.nntp.perl.org/group/perl.perl5.porters/2006/12/msg119236.html
[in summary: added in Dec 2006 to Perl 5.8.8 to fix a specific bug on OS/2,
makes no sense now and should be altered, removed, or reimplemented, because
One possible minimally intrusive solution would be to turn the limit from
Best Wishes
Perl Info
From @jkeenan
On Wed Nov 12 07:09:58 2014, d.white@imperial.ac.uk wrote:
1. Can you suggest any way to reproduce this problem outside of your customized environment?
2. Have you ruled out the possibility that the problem lies in the version of Ubuntu you are using (which I also use) as distinct from Perl? If so, how did you rule it out?
3. Is the 'perl -V' output attached to your bug report from the machine where you observed the problems? If not, can you provide that output?
Thank you very much.
The RT System itself - Status changed from 'new' to 'open'
From @Leont
On Wed, Nov 12, 2014 at 4:09 PM, via RT <perlbug-followup@perl.org> wrote:
Thread::Queue doesn't use signals. It would be helpful to strace your
It's just as arbitrary as the previous value. Odds are it would trigger
Probably.
There is some precedent for such things (such as the PERL_SIGNALS
Leon
From d.white@imperial.ac.uk
Dear James,
On 13/11/14 01:41, James E Keenan via RT wrote:
I was thinking about that, I might be able to whip up a test version
That is, of course, totally possible at several levels.
- Perl 5.18.2 as packaged by Ubuntu has many Ubuntu-specific patches.
- Underneath Perl itself, gcc could compile Perl slightly differently
I should mention that all the PCs in the tests (the controller and
I would still argue that the presence of an arbitrary 8 year old limit
Yes, it is from the machine where we observed the problem. I checked
Thanks for getting in touch so quickly.
cheers
From @craigberry
On Thu, Nov 13, 2014 at 12:18 PM, Duncan White <d.white@imperial.ac.uk> wrote:
Yep:
http://perl5.git.perl.org/perl.git/commit/2563cec55ae473562ff3ccda41cd10289db419be?f=mg.c
I agree it's not nice to have these guessed-at, hard-coded numbers,
FWIW, in general, newer Perls should have fewer pending signals
A quick look at threads::shared (used by Thread::Queue) indicates that
So I think the real question here is what causes a lot more signals to
If you are stuck building your own Perl from source for now, you
./Configure -Accflags=-DSIG_PENDING_DIE_COUNT=N
where N is 520 or whatever number works for you.
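To make the mechanism behind that Configure flag concrete, here is a minimal sketch of a compile-time limit on deferred signals. It is a simplified model and not the code in mg.c: `pending_total`, `note_pending` and `drain_pending` are hypothetical names chosen for illustration, and the exact comparison perl uses may differ. Only the `#ifndef`/`#define` override pattern and the default of 120 come from the thread.

```c
#include <assert.h>
#include <signal.h>

/* A compile-time default that -DSIG_PENDING_DIE_COUNT=N on the compiler
 * command line replaces (which is what -Accflags passes through). */
#ifndef SIG_PENDING_DIE_COUNT
#  define SIG_PENDING_DIE_COUNT 120
#endif

/* Hypothetical stand-in for the interpreter's count of signals that have
 * been caught but not yet dispatched to their Perl-level handlers. */
static volatile sig_atomic_t pending_total = 0;

/* Model of the C-level handler: it only records the signal for later.
 * Returns 0 normally, or -1 at the point where this model would give up
 * with a "Maximal count of pending signals exceeded" style error. */
static int note_pending(int sig)
{
    assert(sig > 0);
    (void)sig;
    if (++pending_total >= SIG_PENDING_DIE_COUNT)
        return -1;
    return 0;
}

/* Model of a safe point between operations: everything that accumulated
 * is dispatched, the counter is reset, and the backlog size is returned. */
static int drain_pending(void)
{
    int n = (int)pending_total;
    pending_total = 0;
    return n;
}
```

In this model the limit only trips when signals arrive faster than safe points are reached, which is why raising N hides the symptom without explaining where the extra signals come from.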
From d.white@imperial.ac.uk
Dear all,
On 14/11/14 00:51, Craig Berry via RT wrote:
Understood.
I agree we need to investigate this further. We are working on a
That's useful info, we built a vanilla 5.18.2 perl with the #define
I'll send more info when I have it.
cheers
From d.white@imperial.ac.uk
I wrote:
Update: We have done a number of additional tests and investigations,
http://search.cpan.org/dist/perl-5.16.0/pod/perldelta.pod
"system now temporarily blocks the SIGCHLD signal handler, to prevent
https://rt.perl.org/Public/Bug/Display.html?id=105700
talks of system() returning -1 [which we see, btw] and blocking signals.
In more detail:
- we have built and used several Perl 5.18.2s (Ubuntu pkg build;
- we have built a vanilla Perl 5.14 and CANNOT get that to show the bug,
- we are just building a vanilla Perl 5.16 and will test it shortly;
- we have built several cutdown versions of our code, all of which
- finally, I have built yet-another Perl 5.18.2 with a hack patch I
cheers
From d.white@imperial.ac.uk
Dear all,
On 19/11/14 13:34, I (Duncan White) wrote:
I should stress here that this means that our earlier belief that the
I spent much of this afternoon running tests using our vanilla Perl
Thus, our current position is that Perl 5.14 does NOT suffer the bug,
https://rt.perl.org/Public/Bug/Display.html?id=105700
(introduced in Perl 5.16) may be the cause of the changed behaviour.
We will continue to experiment with our cutdown versions, attempting
cheers
From @Leont
On Wed, Nov 19, 2014 at 2:34 PM, Duncan White <d.white@imperial.ac.uk>
If a signal is blocked, it will not be delivered to perl in the first place
Also, shortly after the system() call the delayed signal handler should run
Can you «strace -e signal» your program? That should tell you more about
Leon
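The point about blocked signals can be checked at the C level. The sketch below is a single-threaded demonstration, assuming Linux behaviour for standard (non-realtime) signals; `raise_while_blocked` and `on_usr1` are hypothetical names, and SIGUSR1 stands in for SIGCHLD so that no child processes are needed. While the signal is blocked the handler never runs, and repeated instances collapse into one pending signal that is delivered once on unblocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handled = 0;

static void on_usr1(int sig)
{
    (void)sig;
    handled++;   /* only touches a sig_atomic_t, so safe in a handler */
}

/* Blocks SIGUSR1, raises it `times` times, then unblocks it.
 * Returns the number of handler invocations, or -1 on setup failure.
 * *was_pending is set to 1 if SIGUSR1 was reported pending while blocked.
 * Single-threaded demo, so sigprocmask is acceptable here; threaded code
 * would use pthread_sigmask instead. */
static int raise_while_blocked(int times, int *was_pending)
{
    struct sigaction sa;
    sigset_t usr1, pend;

    assert(times > 0 && was_pending != NULL);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &usr1, NULL) != 0)
        return -1;

    handled = 0;
    for (int i = 0; i < times; i++)
        raise(SIGUSR1);          /* held by the kernel; handler not run yet */

    if (sigpending(&pend) != 0)
        return -1;
    *was_pending = sigismember(&pend, SIGUSR1);

    /* Unblocking delivers the pending signal before sigprocmask returns. */
    if (sigprocmask(SIG_UNBLOCK, &usr1, NULL) != 0)
        return -1;

    return (int)handled;
}
```

If this holds for the program in question, signals that arrive while blocked cannot by themselves push a handler-side counter past its limit; the count can only grow from signals delivered to a thread that has them unblocked.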
From @craigberry
On Wed, Nov 19, 2014 at 11:17 AM, Leon Timmermans <fawaka@gmail.com> wrote:
I wonder if the mix of signals and threads is involved. We're
Just thinking out loud and haven't really analyzed anything.
From @Leont
On Wed, Nov 19, 2014 at 7:28 PM, Craig A. Berry <craig.a.berry@gmail.com>
sigprocmask is unspecified in a multithreaded program. Signal masks are per
A process targeted signal (which is almost any signal except faults and
Actually, that may well be the issue here. The child threads generate
Not sure how many threads the OP has, but
Threads and signals are a problematic combination. You really don't want to
Leon
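The per-thread nature of signal masks discussed above can be sketched with POSIX threads. This is an illustration under assumptions, not anything perl does internally: `blocked_here`, `worker` and `compare_masks` are hypothetical names, and the caller is assumed to start with SIGCHLD unblocked. A worker thread blocks SIGCHLD with `pthread_sigmask`, and the calling thread's mask is left untouched, so a process-directed SIGCHLD can still be delivered to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Returns 1 if `sig` is blocked in the calling thread, 0 if it is not,
 * and -1 on error. Passing NULL as the new set only queries the mask. */
static int blocked_here(int sig)
{
    sigset_t cur;
    if (pthread_sigmask(SIG_SETMASK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, sig);
}

/* Worker: blocks SIGCHLD in this thread only, then records its own view. */
static void *worker(void *arg)
{
    int *result = arg;
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0) {
        *result = -1;
        return NULL;
    }
    *result = blocked_here(SIGCHLD);
    return NULL;
}

/* Runs the worker to completion, then reports whether SIGCHLD is blocked
 * in the worker and in the caller. Returns 0 on success, -1 on failure. */
static int compare_masks(int *in_worker, int *in_caller)
{
    pthread_t t;

    assert(in_worker != NULL && in_caller != NULL);
    if (pthread_create(&t, NULL, worker, in_worker) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    *in_caller = blocked_here(SIGCHLD);
    return 0;
}
```

Compile with `-pthread`. Under the stated assumption the worker reports the signal blocked and the caller reports it unblocked, which is the situation in which signals generated on behalf of many threads can all land on one thread.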
From d.white@imperial.ac.uk
Updates of our investigations, plus answering several points Leon and
- Leon suggested we run strace -e signal on our code, done, attached.
- Leon suggested that signals and threads may interact badly. While
- Craig asked about how many threads I have, yes, typically 40-80,
- important new info: Leon suggested that we compile a Perl 5.18.2 with
We've run over 1000 "dump all machines" cycles under conditions of
This is substantial evidence that somehow the (sensible) block-
- To summarise the evidence so far:
  - we have NOT SEEN the "more than 120 pending signals" bug
  - a vanilla Perl 5.16 build DOES show this bug.
  - only one version of perl 5.18.2 does NOT SHOW this bug - the is
  - every OTHER version of perl 5.18.2 DOES show this bug.
- The second strand of investigation is cutting down our exam-control
- The final strand of investigation is that I have built a version of
My "working conclusion" is still that the 120 pending signals patch is
cheers
From @Leont
On Fri, Nov 21, 2014 at 4:33 PM, Duncan White <d.white@imperial.ac.uk>
Not much more than confirming what we already suspected :-/
I didn't mean to suggest this wasn't a serious issue, we do take
Yes, that is very relevant.
This is substantial evidence that somehow the (sensible) block-
Agreed. My hypothesis is that it's because of signals being delivered to
I have attached a tiny program that shows exactly this problem. The
Could be interesting. The count is per thread, so some kind of thread id
Leon
From d.white@imperial.ac.uk
Dear Leon,
On 21/11/14 17:44, Leon Timmermans wrote:
sure:-)
I wasn't criticising, I just wanted to dispel the "here be dragons"
Fascinating! could well be right.
Impressive! now that's a proper test case, showing the problem quickly!
Umm, as yet I don't know how (in mg.c) to access the Perl thread id,
I now have some time series data, the (utterly crap) diagnostic patch
But I think the data is sound, and shows the "magnification effect"
Good luck tracking things down, I'm off home now. I'll intermittently
cheers
From @Leont
On Fri, Nov 21, 2014 at 6:44 PM, Leon Timmermans <fawaka@gmail.com> wrote:
Except that I had already removed that, since it didn't prove to be
Leon
From @craigberry
On Fri, Nov 21, 2014 at 12:42 PM, Duncan White <d.white@imperial.ac.uk> wrote:
I worked my way up to a $count of 500 with 100 threads on both Mac OS
From @Leont
On Fri, Nov 21, 2014 at 7:42 PM, Duncan White <d.white@imperial.ac.uk>
This whole class of issues is fairly "here be dragons" if you ask me.
Getting the perl-level tid is a bit tricky I'm afraid, but printing aTHX
I now have some time series data, the (utterly crap) diagnostic patch
You're doing buffered IO in a signal handler, not-crashing 1 out of 5 times
Masking the signal in the main thread appears to make the problem go away,
Leon
From @Leont
On Sat, Nov 22, 2014 at 12:03 AM, Craig A. Berry <craig.a.berry@gmail.com>
If they don't prefer delivery to the main thread, then will not see this
Actually this might offer a clean way out. The original problem existed on
There are both per-process and per-thread signal queues but nothing else is
Leon
From d.white@imperial.ac.uk
Dear Leon, all,
On 21/11/14 23:11, Leon Timmermans wrote:
Fair enough:-) Guess I was pretty lucky (only 7 years or more) up to
I'm afraid I didn't have time to try adding this into the diag patch.
Absolutely:-)
I wanted to check - I wasn't really sure whether you guys were needing
I have a couple of small new things to add: I explained this perl bug
#! /usr/bin/env perl
On my machine, this reliably crashes almost instantly. I may
Another version of this, only a few lines longer, made the signal
#! /usr/bin/env perl
At first, I forgot to make $count ": shared" (and use threads::shared),
Of course, when I made it : shared, the only change was that the
I wondered whether this tells us anything new about signal
The final thing to say is that I have worked round my production-
Migrated from rt.perl.org#123188 (status was 'open')
Searchable as RT123188$