perl5.18 segfault after wait() when system don't have much free memory #13962
From efimov@reg.ru
The following code segfaults under FreeBSD 10
1) Can't locate object method "Dump" via package "Data::Dumper"
2) Attempt to free unreferenced scalar: SV 0x801d18ea0, Perl
3) Attempt to free nonexistent shared string 'usevendorprefix', Perl
4) Not a CODE reference at ./gemu-test-worker.pl line 1311, <STDIN> line
5) END failed--call queue aborted at ./gemu-test-worker.pl line 1311, <STDIN>
6) Core dump stack: ...
7) or evidence of data corruption, such as %INC entries replaced with empty strings
From @iabyn
On Mon, Jun 30, 2014 at 11:23:40AM -0700, Victor Efimov wrote:
I can't reproduce this on Linux, so what follows is just speculation.
The RT System itself - Status changed from 'new' to 'open'
From victor@vsespb.ru
On Tue Jul 01 04:13:40 2014, davem wrote:
Yes, I was unable to reproduce it on Linux.
Today, probably after a VM reboot, it was very hard to reproduce the segfault, but I adjusted the script a bit and here it is (perl5.16, clang):
====
$ truss -o /tmp/truss.out perl 4.pl
for (1..1000) {
print Dumper \%INC;
====
The full truss (strace) output is attached. I tried truss previously and it was the same: wait() and then a segfault.
Today I also saw, with print Dumper { a => 42}, the output $VAR1 = { and sometimes $VAR1 = { (i.e. corrupted output). I have seen this previously too.
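Because 4.pl is only partially quoted, the following is a hedged reconstruction of the pattern the thread describes (fork a child that touches memory and exits, wait() for it in the parent, then dump %INC), not the reporter's actual script. The per-child allocation size ($child_mb) is an assumption: the reports tie the crash to low free memory, so on a small-memory machine it may need to be raised to get close to the reported conditions. On a system without the bug, the script should simply print %INC on every iteration and exit normally.

```perl
#!/usr/bin/perl
# Hypothetical reconstruction of the fork/wait/Dumper pattern discussed
# in this thread; it is NOT the original 4.pl, which is only partly quoted.
use strict;
use warnings;
use Data::Dumper;

# Autoflush STDOUT so buffered parent output is not duplicated when a
# forked child exits and flushes its own copy of the buffer.
$| = 1;

my $iterations = 1000;   # mirrors the "for (1..1000)" fragment above
my $child_mb   = 16;     # assumption: memory each child touches

for my $i (1 .. $iterations) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: build a large string so the pages are really used,
        # then leave via a normal exit(), as in the reported crashes.
        my $buf = 'x' x ($child_mb * 1024 * 1024);
        exit 0;
    }

    # Parent: reap the child. The reports place the segfault right after
    # wait() on FreeBSD 10.0 amd64 when free memory is low.
    my $reaped = wait();
    die "wait() returned $reaped, expected $pid" unless $reaped == $pid;

    # On affected systems this dump was reported to crash or to show
    # corrupted %INC entries (e.g. values replaced with empty strings).
    print Dumper \%INC;
}
```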
From victor@vsespb.ru
$ cat /tmp/truss.out
From lsim42@gmail.com
Hi,
The original author was correct to say that perl with a large data set
I have also found that the problem exists *only* on FreeBSD amd64 systems
On amd64, I've installed different versions of perl and all had the same
I have also written a small C program to mimic the perl script and it runs
The core dump occurs when the child dies. This also seems to corrupt the
I have reported this problem to FreeBSD and am awaiting their response.
Thanks
From Mark.Martinec@ijs.si
Sim, I'm glad that I'm not alone with this problem. I have been experiencing
The above list *exactly* matches my experience.
Please provide a link.

So here are my findings. Let me start with a slightly redacted copy
======>
First we saw these with perl 5.20 from ports (which was built with
Next we downgraded to 5.18 from ports, which just shifted the
Then I built 5.18 from ports, this time without ithreads.
Next I built 5.18 with -O0, still with clang 3.3.
I guess I should try to build perl with gcc now, haven't
The same application was working flawlessly under 5.18 (and earlier
The perl program (amavisd + SpamAssassin) consists of a master
I have tried to eliminate the use of most perl modules
One thing I tried is to add:
use sigtrap qw(stack-trace BUS SEGV EMT FPE ILL SYS TRAP);
in the program, and run the master process attached to a
Just wanted to let you know that we may be experiencing the

Now for my more recent findings.
Crashes occur on FreeBSD 10.0 (amd64), but not on older OS versions.
Perl crashes regardless of the version of perl (5.18, 5.20, 5.21),
I have been running with malloc checks enabled for a long time
I tried to run the SpamAssassin tests under valgrind, which didn't show
The parent process (the master amavisd process) doesn't do much
This seems to be happening: at some point during operation
I call this state of the master process a 'primed' state
From this point on, all newly forked child processes are
Restarting the master process solves the problem for a while
I have collected perl core dumps (with debugging symbols) of:
I have browsed a bit through these core dumps, although I'm not
What I can say is that a 'tainted' child process can crash just
Unfortunately I'm unable to cause the problem at will. It just
From sim@s1.uniqstats.com
Hi Mark,
Thanks for your reply. It was very painful just reading your email
But I think I've managed to sidestep the bug for my applications.
But first, the critical components that will cause the core dump!
In the past, my application would run for 4 or more hours before it core dumped
<<<<<<<<<< start of code >>>>>>>>>>
use strict;
# -----------------------------------------------------------------------------
## Just to ensure that child uses VM
# Causes coredump on dell1 1G mem, (10.0-RELEASE-p3 amd64) perl v5.16.3
# NO coredump! on cs 1G mem, (10.0-RELEASE-p3 amd64) perl v5.14.4
# NO coredump! on s1 2G mem, ( 6.2-RELEASE-p7 i386 ) perl v5.8.8
# NO coredump! on cs1 1G mem, (10.0-RELEASE-p9 i386 ) perl v5.16.3
# -----------------------------------------------------------------------------
For the core dump to happen, it needs all of the following conditions:
As you can see, this pretty much covers most seriously used perl programs.
I've tested on i386 versions and the bug is not there.
It has something to do with perl's pseudo fork model, perl modules (via 'use'
For my applications, in the past I had a parent continuously forking out
I've had to change this to a parent forking out <NCPU> child processes
As you can see, I skip the wait() call for the parent and for the
I've tried switching over to threads in perl; it works but performance
<<<<<<<<<< start of p1 code >>>>>>>>>>
use strict;
# -----------------------------------------------------------------------------
## <Process the Job>
run(2);
So we run in shell:
<mark that child is alive>
sync_wait_for_child_end
The other trick is to let a non-perl (e.g. bash) script/program to
I had to use my approach as the application is actually quite
I've stress-tested my new code for days now and it has not core dumped
Well, best wishes to you. Will try to reply when I have time
See you
Some small comments below
I gave up, as there was no reply; I might re-send it to the FreeBSD perl mailing list later.
The death of the child when it calls 'exit()' triggers the corruption in the parent,
Yup!
This is the first real bug I have seen in perl that even tempted me to
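Sim's workaround replaces fork-per-job with a fixed pool of <NCPU> long-lived children, so fork(), child exit() and reaping in the parent happen far less often. His actual code is only partially preserved above, so the following is a generic pre-forked pool sketch, not his implementation: the worker count, the job list and process_job are hypothetical stand-ins, and unlike his description this sketch still reaps each worker once at the end. It only makes the problematic path rarer; it does not fix the underlying issue.

```perl
#!/usr/bin/perl
# Generic pre-forked worker pool sketch: fork a fixed number of workers
# once, instead of forking (and reaping) one child per job.
use strict;
use warnings;

my $ncpu = 4;            # assumption: stands in for <NCPU>
my @jobs = (1 .. 20);    # hypothetical job list

# Hypothetical stand-in for "<Process the Job>".
sub process_job {
    my ($job) = @_;
    return $job * $job;
}

my @pids;
for my $slot (0 .. $ncpu - 1) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Worker: take every $ncpu-th job starting at its own slot, so the
        # whole job list is covered without any further fork() calls.
        for (my $i = $slot; $i < @jobs; $i += $ncpu) {
            process_job($jobs[$i]);
        }
        exit 0;
    }
    push @pids, $pid;
}

# Parent: reap each worker exactly once, after all jobs are done.
my $reaped = 0;
for my $pid (@pids) {
    waitpid($pid, 0) == $pid or die "waitpid($pid) failed: $!";
    $reaped++;
}
print "reaped $reaped workers\n";
```

With four workers and twenty jobs, each worker handles five jobs, and the parent makes four waitpid() calls in total instead of one per job.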
From @iabyn
On Sat, Oct 18, 2014 at 03:39:17AM +1100, L.O.Sim wrote:
Can you (or someone else with the ability to reproduce this) please run
Also, if possible, run it under valgrind, or clang with AddressSanitizer,
From Mark.Martinec@ijs.si
2014-10-20 12:38, Dave Mitchell via RT wrote:
Thanks, Dave, for your interest and concern in this matter.
Meanwhile our mail filtering application had one particularly
This only confirms that the problem does not originate from some XS code
The few experiments that I have run under valgrind did not result
Mark
From lsim42@gmail.com
Hi all,
The saga grinds on :-)
Sim
On 4 Nov 2014, at 7:30 am, Mark Martinec via RT <perlbug-followup@perl.org> wrote:
From victor@vsespb.ru
Possibly related: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=122868
From Mark.Martinec@ijs.si
After upgrading to FreeBSD 10.1 (from 10.0) and running the same application
I can also confirm that the problem under 10.0 can be avoided by
A brief summary of the problem:
Setup: an application consisting of a master perl process spawning worker
Environment:
What seems to be happening:
So it seems the problem is somehow connected with how FreeBSD 10.0
From Mark.Martinec@ijs.si
The problem is definitely specific to version 10.0 of FreeBSD,
I can attest that the bug does not occur in versions 10.1 and
So I guess this ticket can be closed; it seems the bug is not
@iabyn - Status changed from 'open' to 'rejected'
Migrated from rt.perl.org#122199 (status was 'rejected')
Searchable as RT122199$