Report information
Id: 1564
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: Richard.Hensgens [at] nl.origin-it.com
Cc:
AdminCc:

Operating System: Solaris
PatchStatus: (no value)
Severity: medium
Type: library
Perl Version: (no value)
Fixed In: (no value)



From: "Hensgens, Richard" <Richard.Hensgens [...] nl.origin-it.com>
To: "'perlbug [...] perl.com'" <perlbug [...] perl.com>
Cc: "Zuijdwijk, Pieter" <Pieter.Zuijdwijk [...] nl.origin-it.com>
Subject: FW: Call Nr. 6583771 (!!! HELP !!!)
Date: Mon, 4 Oct 1999 17:22:53 +0200
L.S.,

We have encountered a very interesting problem, on which you are really our last resort:

Exiting a child in a forking server (example on page 194 of 'Advanced Perl Programming', O'Reilly) seems to clean up the server socket of the parent on newer levels of Solaris. The parent exits with 'EBADF (Bad file number)' after having served one client request.

We have tried almost everything within our power, e.g.:

* compiling Perl on a working OS level and copying the binaries to the non-working OS level,
* compiling the current development version (5.005_61),
* different GNU compilers (2.8.1 and 2.95.1),
* SUN Workshop Compiler C/C++ 4.2,
* hacking in 'config.sh' (e.g. 'usevfork=false/true', multithreaded/non-multithreaded).

Nothing works.

After we filed a bug report, SUN responded with the following:

<<FW: Bug ID# 4146098>>

but this is too low-level for us to understand what's really going on. The troubling patch from SUN seems to be 105210-17 or above. In more understandable language, they claimed that older versions of Solaris had a bug which is fixed in newer releases, and that Perl has probably been working around that bug. Now that the bug has been removed from the OS, Perl is still working around it, but this time unsuccessfully.

Does this make sense to you? Can you help?

P.S.: Below you can find all mail communications with SUN. If you need additional information, please let us know.

Met vriendelijke groet/Kind regards,

Richard Hensgens
ORIGIN B.V. - Managed Services - Distributed Systems
Building VA-171, E-Mail: Richard.Hensgens@nl.origin-it.com
Phone: (+31:4027)87097, Fax: (+31:4027)83962

The unix guru's view on sex:
# unzip; strip; touch; finger; mount; fsck; more; yes; umount; sleep

> -----Original Message-----
> From: Zuijdwijk, Pieter
> Sent: Thursday, September 30, 1999 6:44 PM
> To: 'dispatch@holland.sun.com'
> Cc: Zuijdwijk, Pieter; Hensgens, Richard
> Subject: Call Nr. 6583771
>
> Hereby the "truss -aef" output of 2 SUN systems running 2 different OS
> levels:
>
> OK files: SunOS ... 5.6 Generic_105181-06 sun4u sparc SUNW,Ultra-4
> NOK files: SunOS ... 5.6 Generic_105181-15 sun4u sparc SUNW,Ultra-Enterprise
>
> <<Client.truss.out.NOK>> <<Client.truss.out.OK>> <<Client.pl>>
> <<Server.pl>> <<Server.truss.out.NOK>> <<Server.truss.out.OK>>
As you can see, we also have problems on 5.6 Generic_105181-15 on an Ultra-Enterprise 3000, so this is not a specific Solaris 7 issue after all.

Thanks in advance.

> Pieter Zuijdwijk
> Origin TIS-DS-UNIX-SUN
> Groenewoudseweg 1
> 5621 BA Eindhoven, The Netherlands
> Building VA-169
> Phone +31 (0)40 27 89605
> Fax +31 (0)40 27 89362
-----Original Message-----
From: Hensgens, Richard
Sent: Tuesday, September 28, 1999 1:09 PM
To: Zuijdwijk, Pieter
Subject: Bug Solaris 2.7

Pieter,

Before we start downgrading the SUN box, maybe first a bug report to SUN?

Regular examples from the O'Reilly Perl books work differently on Solaris 2.6 and Solaris 2.7 with exactly the same Perl version (5.005_03):

Server.pl:

#!/usr/bin/perl

use IO::Socket;

$SIG{CHLD} = sub { wait() };

$Sock = new IO::Socket::INET( LocalPort => 9000, Proto => 'tcp', Listen => SOMAXCONN, Reuse => 1 ) or die "SOCKET() error [$!]";

while ( $NewSock = $Sock->accept() )
{
    $Pid = fork();

    if ( $Pid == 0 )
    {
        while ( defined( $Buffer = <$NewSock> ) )
        {
            print( $Buffer );
        }

        exit( 0 );
    }
}

close( $Sock );
exit( 0 );

Client.pl:

#!/usr/bin/perl

use IO::Socket;

$Sock = new IO::Socket::INET( PeerAddr => 'tsesun01', PeerPort => 9000, Proto => 'tcp' ) or die "SOCKET() error [$!]";

foreach ( 1..10 )
{
    print( $Sock "Msg $_: How are you ?\n" );
}

close( $Sock );
exit( 0 );

Output on Solaris 2.6:

nl1sahd1:root> ./Server.pl
nl1sahd1:root> jobs
[1] + Running ./Server.pl &
nl1sahd1:root> ./Client.pl
nl1sahd1:root> Msg 1: How are you ?
Msg 2: How are you ?
Msg 3: How are you ?
Msg 4: How are you ?
Msg 5: How are you ?
Msg 6: How are you ?
Msg 7: How are you ?
Msg 8: How are you ?
Msg 9: How are you ?
Msg 10: How are you ?
nl1sahd1:root> jobs
[1] + Running ./Server.pl &

Server serves as many requests as it should.

Output on Solaris 2.7:

tsesun01:root> ./Server.pl &
[1] 12331
tsesun01:root> jobs
[1] + Running ./Server.pl &
tsesun01:root> ./Client.pl
Msg 1: How are you ?
Msg 2: How are you ?
tsesun01:root> Msg 3: How are you ?
Msg 4: How are you ?
Msg 5: How are you ?
Msg 6: How are you ?
Msg 7: How are you ?
Msg 8: How are you ?
Msg 9: How are you ?
Msg 10: How are you ?
[1] + Done ./Server.pl &
tsesun01:root> jobs

Server only serves one request and ends !!!!!
Message-ID: <986AEA765305D311AA7B0008C75D97AFBB8A23@NLEHX020.origimail.origin-it.com>
From: "Hensgens, Richard" <Richard.Hensgens@nl.origin-it.com>
To: "Hensgens, Richard" <Richard.Hensgens@nl.origin-it.com>
Subject: FW: Bug ID# 4146098
Date: Mon, 4 Oct 1999 17:19:14 +0200
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain; charset="iso-8859-1"

Bug Id: 4146098
Product: sunos
Category: network
Subcategory: socket
Bug/Rfe/Eou: bug
State: integrated
Development Status: INT
Synopsis: connect() and accept() can RESTART instead of returning EINTR
Keywords: esc#514623
Severity: 2
Severity Impact: 1
Severity Functionality: 0
Priority: 2

Description:

When SA_RESTART is passed to sigaction(), connect() and accept() restart instead of returning with errno EINTR.

CONNECT(2)    SYSTEM CALLS    CONNECT(2)

EINTR    The connection attempt was interrupted before any data arrived by the delivery of a signal.

Sun Release 4.1    Last change: 21 January 1990    3

=============================================================================

SVR4 example

sigaction() is needed to set SA_RESTART.

c_test_sys5 <hostname> <portNO>

<hostname> is selected that will not respond to connect.
kill -ALARM <process id> is sent to process while waiting.
/*
   cc -o c_test_sys5 c_test_sys5.c -lsocket -lnsl
   Usage: c_test_sys5 <hostname> <portNO>
*/
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <signal.h>

void handler(sig)
int sig;
{
    printf("SIGNAL CATCHED\n");
}

main(argc, argv)
int argc;
char **argv;
{
    struct sockaddr_in ser;
    struct hostent *serhost;
    int sock;
    int n;
    char buf[256];
    struct sigaction sa;

    if(argc != 3){
        fprintf(stderr, "Usage: client <hostname-of-server> <portNO>\n");
        exit(1);
    }

    sa.sa_flags = SA_RESTART;
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    serhost = gethostbyname(argv[1]);
    if(serhost == NULL){
        fprintf(stderr, "bad hostname\n");
        exit(1);
    }

    memset((char *)&ser, 0, sizeof(ser));
    ser.sin_family = AF_INET;
    ser.sin_port = atoi(argv[2]);
    memcpy(&ser.sin_addr, serhost->h_addr, serhost->h_length);

    sock = socket(AF_INET, SOCK_STREAM, 0);
    if(sock == -1){
        fprintf(stderr, "socket failed\n");
        exit(1);
    }

    if(connect(sock, (struct sockaddr *)&ser, sizeof(ser)) == -1){
        perror("CONNECT");
        exit(1);
    }

    while(1){
        n = read(sock, buf, sizeof(buf));
        if(n == 0)
            break;
        if(n < 0){
            fprintf(stderr, "file read error\n");
            exit(1);
        }
        write(1, buf, n);
    }

    close(sock);
    exit(0);
}

Justification:
This is the root cause of Escalation # 514623 bug# 4132657, Customer needs a patch for 5.6.

Work around:

Suggested fix:
Diffs are shown below for sparc and x86 (diffs are identical for sparc and sparcv9).
The entire set of files changed are:

usr/src/lib/libc/i386/sys/_so_accept.s
usr/src/lib/libc/i386/sys/_so_connect.s
usr/src/lib/libc/sparc/sys/_so_accept.s
usr/src/lib/libc/sparc/sys/_so_connect.s
usr/src/lib/libc/sparcv9/sys/_so_accept.s
usr/src/lib/libc/sparcv9/sys/_so_connect.s

note _cerror() maps ERESTART to EINTR

####### usr/src/lib/libc/sparc/sys ######

% diff -c _so_connect.s.1.2 _so_connect.s
*** _so_connect.s.1.2 Thu May 22 14:38:48 1997
--- _so_connect.s Fri Jun 5 08:28:00 1998
***************
*** 18,24 ****
  #include "SYS.h"
! SYSCALL2_RESTART(_so_connect,connect)
  RET
  SET_SIZE(_so_connect)
--- 18,24 ----
  #include "SYS.h"
! SYSCALL2(_so_connect,connect)
  RET
  SET_SIZE(_so_connect)

% diff -c _so_accept.s.1.2 _so_accept.s
*** _so_accept.s.1.2 Thu May 22 14:38:48 1997
--- _so_accept.s Fri Jun 5 08:27:12 1998
***************
*** 19,25 ****
  #include "SYS.h"
! SYSCALL2_RESTART(_so_accept,accept)
  RET
  SET_SIZE(_so_accept)
--- 19,25 ----
  #include "SYS.h"
! SYSCALL2(_so_accept,accept)
  RET
  SET_SIZE(_so_accept)

sctesrv 54: ##################### usr/src/lib/libc/i386/sys

% diff -c _so_connect.s.1.5 _so_connect.s
*** _so_connect.s.1.5 Fri Jun 5 08:59:52 1998
--- _so_connect.s Fri Jun 5 09:01:48 1998
***************
*** 18,25 ****
  movl $CONNECT,%eax
  lcall $SYSCALL_TRAPNUM,$0
  jae noerror
- cmpb $ERESTART,%al
- je _so_connect
  _prologue_
  _m4_ifdef_(`DSHLIB',
  `pushl %eax',
--- 18,23 ----

sctesrv 43: diff -c _so_accept.s.1.5
diff: two filename arguments required
sctesrv 44: diff -c _so_accept.s.1.5 _so_accept.s
*** _so_accept.s.1.5 Fri Jun 5 09:02:12 1998
--- _so_accept.s Fri Jun 5 09:02:53 1998
***************
*** 18,25 ****
  movl $ACCEPT,%eax
  lcall $SYSCALL_TRAPNUM,$0
  jae noerror
- cmpb $ERESTART,%al
- je _so_accept
  _prologue_
  _m4_ifdef_(`DSHLIB',
  `pushl %eax',
--- 18,23 ----

State triggers:
Accepted: yes
Evaluated: yes

Evaluation:
4132657 covers the binary compatibility problem.
When the sample program from 4132657 is compiled and tested on 5.6, the result is:

$ /ws/on297-tools/SUNWspro/SC4.2/bin/cc x.c -lsocket -lnsl
$ ./x fade 15000 &
18519
$ kill -ALRM 18519
$ SIGNAL CATCHED
CONNECT: Interrupted system call
$ wait

That is, the behavior is correct. Thus the only problem appears to be the BCP one and this one isn't reproducible.

=================================
updated description with reproducible example
1998-06-16
=================================
1998-07-20
----------------------------
I thought this fix was being done as part of the escalation process (4132567 and this are essentially the same bug for pre-kernel socket and post-kernel socket source bases...not sure why they got split into two bugs. The fixes are different because of different sources, bugs are not). Will try to fix and test this. The code in Suggested Fix should work.
1998-07-23
----------------------------
My guess is that this bug got split into 2.5.1 and 2.6-and-later versions since this might be not-quite-easily fixable for 2.5.1, since that would involve changing the restartable nature of the getmsg()/putmsg() system calls.

The fix here is to make the system calls underlying the connect() and accept() interfaces NOT restartable, as they currently (and erroneously) are. This makes the behavior compatible with SunOS 4.x and also fixes it for SunOS 5.x [ The BCP interfaces are implemented using the native OS interfaces, a BCP program just happens to have uncovered this bug ].

The emails in the "Comments" section further clarify some of the technical background behind this fix.

The program in the description section tests only the connect() interface. Test programs with slight modifications were used to test both the connect() and accept() interfaces, and those test programs have been added to the attachments.
WITHOUT THE FIX, the observed behavior is as follows, with output slightly edited for clarity:
===
% ./accept_test 1234 &
[1] 668
Process id is 668
% truss -v all -p 668
accept(3, 0xEFFFF9D4, 0xEFFFF9C0, 1) (sleeping...)
^C% kill -ALRM 668
SIGNAL CATCHED
% truss -v all -p 668
accept(3, 0xEFFFF9D4, 0xEFFFF9C0, 1) (sleeping...)

Thus the accept() call is restarted and continues sleeping.

% ./connect_test bobo 1234 &
[1] 671
Process id is 671
% kill -ALRM 671
% SIGNAL CATCHED
CONNECT: Operation already in progress
[1] Exit 1 ./connect_test bobo 1234

The connect() call is restarted and fails with EINPROGRESS.
====

WITH THE FIX, the observed behavior is as follows, with output slightly edited for clarity:
===
% ./accept_test 1234 &
[1] 4523
Process id is 4523
% kill -ALRM 4523
% SIGNAL CATCHED
ACCEPT: Interrupted system call
[1] Exit 1 ./accept_test 1234

The accept() call now fails with EINTR even when SA_RESTART is set.

% ./connect_test bobo 1234 &
[1] 4525
Process id is 4525
% kill -ALRM 4525
% SIGNAL CATCHED
CONNECT: Interrupted system call
[1] Exit 1 ./connect_test bobo 1234

The connect() call now fails with EINTR even when SA_RESTART is set.
====

Commit to fix in releases: generic, s998_20
Fixed in releases: s998_20
Integrated in releases: s998_20
Verified in releases:
Closed because:
Incomplete because:
Duplicate of:
Introduced in Release:
Root cause:
Program management:
Fix affects documentation: no
Exempt from dev rel: no
Fix affects L10N: no
Patch id:

Comments:
==============================
added sys5 example to description and reopened bug.
1998-06-16
==============================
1998-07-20
------------------------------
An archive of two emails which are part of discussions relevant to this bug, and which also point to a man page deficiency.
=========

> Roger,
>
> Jim seems to claim that all system calls except connect() were automatically
> restarted after a signal in SunOS 4.X. Is this really true i.e. did 4.X
> have different restart semantics for different system calls?
> (I figured asking you would be quicker than reading the 4.x source.)
>
> My assumption is that in 5.X SA_RESTART should/must apply to all
> interruptible system calls i.e. that we should not treat connect()
> differently. Correct?
>
> Note that connect() is odd because it can fail with EINTR/ERESTART after
> having started the connect attempt. Thus when connect() is restarted
> the 2nd one might fail with EALREADY or EISCONN even though connect
> was successful. I don't know of any other syscalls that modify "state"
> before returning EINTR.
4.x never restarted anything other than what 5.x does with SA_RESTART passed to sigaction(). SA_RESTART does not mean that all interruptible system calls are restarted. Only a subset. This is what the man page for sigaction(2) says. This is also true of 4.x:

    SA_RESTART    If set and the signal is caught, certain
                  functions that are interrupted by the
                  execution of this signal's handler are
                  transparently restarted by the system;
                  namely, read(2) or write(2) on slow devices
                  like terminals, ioctl(2), fcntl(2),
                  wait(2), and waitid(2). Otherwise, that
                  function returns an EINTR error.

Roger

====================
MIME-Version: 1.0

Thanks for the info.

> 4.x never restarted anything other than what 5.x does with
> SA_RESTART passed to sigaction(). SA_RESTART does not mean
> that all interruptible system calls are restarted. Only a subset.
> This is what the man page for sigaction(2) says.
> This is also true of 4.x:
>
>     SA_RESTART    If set and the signal is caught, certain
>                   functions that are interrupted by the
>                   execution of this signal's handler are
>                   transparently restarted by the system;
>                   namely, read(2) or write(2) on slow devices
>                   like terminals, ioctl(2), fcntl(2),
>                   wait(2), and waitid(2). Otherwise, that
>                   function returns an EINTR error.
The above man page doesn't take sockets into account. In 4.X the source code tells me that restart also applies to send, sendto, sendmsg, recv, recvmsg, recvfrom. Thus we clearly need to fix the man page to say that for SA_RESTART.

But what about getmsg and putmsg on slow devices? Shouldn't they get the same treatment as read/write/send*/recv*?

The SunOS 5.6 source code shows the following ERESTARTs:

    fcntl
    getmsg getpmsg           NOT in man page
    putmsg putpmsg           NOT in man page
    ioctl
    read
    pread readv              NOT in man page
    write
    pwrite writev            NOT in man page
    wait
    waitid
    connect accept           THIS is a bug
    recv recvfrom recvmsg    NOT in man page
    send sendto sendmsg      NOT in man page

Thus the man page had it right 6 out of 14 prior to kernel sockets and 6 out of 20 with kernel sockets!!!

Tim, assuming Roger doesn't have an issue with documenting all 20, can you file a man page bug to have the 14 missing calls added to the SA_RESTART description?

Also, fix the connect and accept wrappers in libc (sparc and x86) to not use the restart macro/code. That will fix this "BCP problem".

Erik

================

See also: 4132657

History:
Submitter: wadej Date: Jun 5 1998 10:02AM
Dispatch operator: bugtraq Date: Jun 5 1998 10:02AM
Acceptor: cs Date: Jun 11 1998 1:19PM
Evaluator: cs Date: Jun 11 1998 1:19PM
Commit operator: mukesh Date: Jul 27 1998 5:14PM
Fix operator: mukesh Date: Jul 27 1998 5:14PM
Integrating operator: bmc Date: Jul 28 1998 12:24PM
Verify operator: Date:
Closeout operator: Date:
Called in by:
Download Client.truss.out.NOK
application/octet-stream 19k

Message body not shown because it is not plain text.

Download Client.truss.out.OK
application/octet-stream 19.4k

Message body not shown because it is not plain text.

Download Client.pl
text/x-perl 248b

Message body is not shown because sender requested not to inline it.

Download Server.pl
text/x-perl 269b

Message body is not shown because sender requested not to inline it.

To: perl5-porters [...] perl.org
Subject: [ID 19991004.002] server exits early
Date: Thu, 21 Dec 2000 14:57:39 -0500
From: "Stephen P. Potter" <spp [...] spotter.yi.org>
This bug still seems to be present in 5.7.0@8221, only on Solaris.

-spp
Date: Thu, 21 Dec 2000 11:46:51 -0800
From: ___cliff rayman___ <cliff [...] genwax.com>
To: "Stephen P. Potter" <spp [...] spotter.yi.org>, perl5-porters [...] perl.org
Subject: Re: [ID 19991004.002] server exits early
Perhaps on Solaris 2.7 a shutdown is being performed on the socket when the child closes. As an experiment/workaround, try explicitly closing the listening socket in the child, as shown below.

"Stephen P. Potter" wrote:

> Server.pl:
> #!/usr/bin/perl
>
> use IO::Socket;
>
> $SIG{CHLD} = sub { wait() };
>
> $Sock = new IO::Socket::INET( LocalPort => 9000, Proto => 'tcp', Listen =>
> SOMAXCONN, Reuse => 1 ) or die "SOCKET() error [$!]";
>
> while ( $NewSock = $Sock->accept() )
> {
>     $Pid = fork();
>
>     if ( $Pid == 0 )
>     {

close $Sock and $sockClosed = 1;

>
>         while ( defined( $Buffer = <$NewSock> ) )
>         {
>             print( $Buffer );
>         }
>
>         exit( 0 );
>     }
> }
>
> close( $Sock );

close $Sock unless $sockClosed;
To: ___cliff rayman___ <cliff [...] genwax.com>
Cc: perl5-porters [...] perl.org
Subject: Re: [ID 19991004.002] server exits early
Date: Thu, 21 Dec 2000 15:56:08 -0500
From: "Stephen P. Potter" <spp [...] spotter.yi.org>
Lightning flashed, thunder crashed and ___cliff rayman___ <cliff@genwax.com> whispered:
| perhaps on solaris 2.7 a shutdown is being performed on the socket when the
| child closes. as an 'experiment/work around', try specifically close the
| listening socket in the child as per below.

I think the point being made in this report is that the script functions differently between Solaris versions. Sun claims to have fixed a bug that we may have been working around, and that the workaround may no longer be needed and may be causing the problem.

-spp
Date: Thu, 21 Dec 2000 23:46:17 +0000
From: Alan Burlison <Alan.Burlison [...] uk.sun.com>
To: "Stephen P. Potter" <spp [...] spotter.yi.org>
Cc: perl5-porters [...] perl.org
Subject: Re: [ID 19991004.002] server exits early
"Stephen P. Potter" wrote:
> We have encountered a very interesting problem on which you are really our
> last resort:
>
> Exiting a child in a forking server (example on page 194 of 'Advanced Perl
> Programming' O'Reilly) seems to clean-up the server socket of the parent on
> newer levels of Solaris. The parent exits with a 'EBADF (Bad file number)'
> after having served one client request.
>
> We have tried almost everything within our power, e.g.:
>
> * compiling Perl on a working OS level and copying the binaries to
> the non-working OS level,
> * compiling the current development version (5.005_61),
> * different GNU compilers (2.8.1 and 2.95.1),
> * SUN Workshop Compiler C/C++ 4.2,
> * hacking in 'config.sh' (e.g. 'usevfork=false/true',
> multithreaded/non-multithreaded).
>
> Nothing works out.
Right - I've read the bugrep, played with the example code, and here is the story.

Prior to the fix, accept() and connect() were erroneously being restarted when a signal was caught. The correct behaviour according to the SVR4 spec is for them to return with EINTR, even if SA_RESTART has been passed to sigaction().

The bugfix changed the behaviour so that if a signal is caught while either accept() or connect() is in progress, they fail with EINTR instead of being restarted.

There are two ways to fix the example script. The first is to redo the accept() if EINTR is returned. The problem with this approach is that the IO::Socket library doesn't check the return value of the accept() call, and then tries to do some I/O ops [llseek()] on the invalid file handle. This means that by the time your script can get hold of errno, it is set to EBADF instead of EINTR.

The quick and easy fix is to ignore SIGCHLD rather than catching it - this way no zombie child processes are created and no signals are generated to screw up the accept() call. Change the line

    $SIG{CHLD} = sub { wait() };

to

    $SIG{CHLD} = 'IGNORE';

And the script then works as expected.

Hope that helps,

Alan Burlison
Solaris Kernel Development, Sun Microsystems
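The first fix described above, redoing the accept() when it fails with EINTR, might look roughly like the sketch below. This is not code from the thread: accept_retrying and serve are hypothetical names, and the sketch assumes that calling Perl's built-in accept() directly (rather than the IO::Socket method) lets $! be inspected immediately after the system call, before other code can overwrite it. The handler reaps children with a non-blocking waitpid() instead of a bare wait().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX qw(WNOHANG);

# Reap every finished child without blocking inside the handler.
$SIG{CHLD} = sub { 1 while waitpid(-1, WNOHANG) > 0 };

# Hypothetical helper: accept one connection on $listener, retrying
# whenever a signal (such as SIGCHLD) interrupts the call.
sub accept_retrying {
    my ($listener) = @_;
    while (1) {
        # Built-in accept(): $! is still the errno of the system call here.
        return $_[1] = undef unless defined $listener;
        if (accept(my $conn, $listener)) {
            return $conn;
        }
        next if $!{EINTR};    # interrupted by a signal: try again
        return;               # any other error: give up, $! says why
    }
}

# Hypothetical helper: the forking echo loop from Server.pl, rebuilt
# around accept_retrying(). Not called automatically.
sub serve {
    my ($listener) = @_;
    while (my $conn = accept_retrying($listener)) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            close $listener;          # the child only needs the connection
            print while <$conn>;
            exit 0;
        }
        close $conn;                  # the parent only needs the listener
    }
}
```

Calling serve() with a listening IO::Socket::INET handle, for example one created with LocalPort => 9000 as in Server.pl, would run the server. The second fix from the message needs none of this: it is the single line $SIG{CHLD} = 'IGNORE'.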
Date: Thu, 21 Dec 2000 17:47:43 -0600
From: Jarkko Hietaniemi <jhi [...] iki.fi>
To: Alan Burlison <Alan.Burlison [...] uk.sun.com>
Cc: "Stephen P. Potter" <spp [...] spotter.yi.org>, perl5-porters [...] perl.org
Subject: Re: [ID 19991004.002] server exits early
> There are two ways to fix the example script. The first is to redo the
> accept() if EINTR is returned. The problem with this approach is that
> the IO::Socket library doesn't check the return value of the accept()
> call, and then tries to do some I/O ops [llseek()] on the invalid file
> handle. This then means that by the time your script can get hold of
> errno it is set to EBADF instead of EINTR.
>
> The quick and easy fix is to ignore SIGCHLD rather than catching it -

Can I still fix IO::Socket? :-)

> this way no zombie child processes are created and no signals are
> generated to screw up the accept() call. Change the line
> $SIG{CHLD} = sub { wait() };
> to
> $SIG{CHLD} = 'IGNORE';
> And the script then works as expected.
>
> Hope that helps,
>
> Alan Burlison
> Solaris Kernel Development, Sun Microsystems
Date: Fri, 22 Dec 2000 00:09:40 +0000
From: Alan Burlison <Alan.Burlison [...] uk.sun.com>
To: Jarkko Hietaniemi <jhi [...] iki.fi>
Cc: "Stephen P. Potter" <spp [...] spotter.yi.org>, perl5-porters [...] perl.org
Subject: Re: [ID 19991004.002] server exits early
Jarkko Hietaniemi wrote:
> > There are two ways to fix the example script. The first is to redo the
> > accept() if EINTR is returned. The problem with this approach is that
> > the IO::Socket library doesn't check the return value of the accept()
> > call, and then tries to do some I/O ops [llseek()] on the invalid file
> > handle. This then means that by the time your script can get hold of
> > errno it is set to EBADF instead of EINTR.
> >
> > The quick and easy fix is to ignore SIGCHLD rather than catching it -
>
> Can I still fix IO::Socket? :-)

Hey, you're the main man...

:-)

Actually I was surmising from the truss output that the problem was in IO::Socket. I had a quick look and it doesn't seem to be doing anything naughty. I've had a look at pp_sys.c as well, and I can't see it there either. Hmmm, wonder what is doing it?

Alan Burlison
Date: Fri, 22 Dec 2000 00:21:50 +0000
From: Graham Barr <gbarr [...] pobox.com>
To: Alan Burlison <Alan.Burlison [...] uk.sun.com>
Cc: Jarkko Hietaniemi <jhi [...] iki.fi>, "Stephen P. Potter" <spp [...] spotter.yi.org>, perl5-porters [...] perl.org
Subject: Re: [ID 19991004.002] server exits early
On Fri, Dec 22, 2000 at 12:09:40AM +0000, Alan Burlison wrote:
> Jarkko Hietaniemi wrote:
>
> > > There are two ways to fix the example script. The first is to redo the
> > > accept() if EINTR is returned. The problem with this approach is that
> > > the IO::Socket library doesn't check the return value of the accept()
> > > call, and then tries to do some I/O ops [llseek()] on the invalid file
> > > handle. This then means that by the time your script can get hold of
> > > errno it is set to EBADF instead of EINTR.
> > >
> > > The quick and easy fix is to ignore SIGCHLD rather than catching it -
> >
> > Can I still fix IO::Socket? :-)
>
> Hey, you're the main man...
>
> :-)
>
> Actually I was surmising from the truss output that the problem was in
> IO::Socket. I had a quick look and it doesn't seem to be doing anything
> naughty. I've had a look at pp_sys.c as well, and I can't see it there
> either. Hmmm, wonder what is doing it?

It may be something along the lines that IO::Socket::accept creates a new object which gets destroyed when the method exits with an error. And during that destroy process various calls may be made, I suppose.

If this is the case, changing the return to something like the following may help:

    $peer = accept($new,$sock) or do { local $!; undef $new; return };

Graham.
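The idea behind the `local $!` in that suggestion can be shown in isolation. The sketch below is only an illustration under stated assumptions: noisy_cleanup and errno_after_failure are hypothetical names, the EINTR failure is simulated by assigning to $!, and the cleanup stands in for whatever the object's destruction might do inside IO::Socket.

```perl
use strict;
use warnings;
use Errno qw(EINTR EBADF);

# Stand-in for cleanup code (for example an object destructor) that makes
# a failing system call of its own and therefore overwrites $!.
sub noisy_cleanup { $! = EBADF }

# Hypothetical helper: simulate a call that failed with EINTR, run the
# cleanup, and report the errno the caller would finally see.
sub errno_after_failure {
    my ($protect) = @_;
    $! = EINTR;              # pretend accept() has just failed with EINTR
    if ($protect) {
        local $!;            # the saved EINTR is restored when the block exits
        noisy_cleanup();
    }
    else {
        noisy_cleanup();     # unprotected: the cleanup's errno leaks out
    }
    return $! + 0;
}

printf "protected:   %s\n", errno_after_failure(1) == EINTR ? "EINTR" : "other";
printf "unprotected: %s\n", errno_after_failure(0) == EBADF ? "EBADF" : "other";
```

With the `local`, the caller still sees EINTR and can decide to retry; without it, the caller sees the cleanup's error, which matches the EBADF symptom reported in this ticket.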
To: Alan Burlison <Alan.Burlison [...] uk.sun.com>
Cc: perl5-porters [...] perl.org
Subject: Re: [ID 19991004.002] server exits early
Date: Thu, 21 Dec 2000 23:42:30 -0500
From: "Stephen P. Potter" <spp [...] spotter.yi.org>
Lightning flashed, thunder crashed and Alan Burlison <Alan.Burlison@uk.sun.com> whispered:
| There are two ways to fix the example script. The first is to redo the
| accept() if EINTR is returned. The problem with this approach is that
| the IO::Socket library doesn't check the return value of the accept()
| call, and then tries to do some I/O ops [llseek()] on the invalid file
| handle. This then means that by the time your script can get hold of
| errno it is set to EBADF instead of EINTR.

What I'm getting from all this is that there isn't a perceived bug in perl, so I should go ahead and close the ticket. Is that correct? How do I explain that the script works as the user expects on other OSes (and earlier versions of Solaris)? A bug in those other OSes?

-spp
Date: Fri, 22 Dec 2000 09:07:28 GMT
Subject: Re: [ID 19991004.002] server exits early
To: Alan.Burlison [...] uk.sun.com
From: Nick Ing-Simmons <nik [...] tiuk.ti.com>
Cc: "Stephen P. Potter" <spp [...] spotter.yi.org>, perl5-porters [...] perl.org
Alan Burlison <Alan.Burlison@uk.sun.com> writes:
>The bugfix changed the behaviour so that if a signal was caught when
>either accept() or connect() are in progress they fail with EINTR
>instead of being restarted.
>
>There are two ways to fix the example script. The first is to redo the
>accept() if EINTR is returned. The problem with this approach is that
>the IO::Socket library doesn't check the return value of the accept()
>call,
So we can consider this a bug in IO::Socket.
Date: Fri, 22 Dec 2000 09:28:37 +0000
From: Alan Burlison <Alan.Burlison [...] uk.sun.com>
To: "Stephen P. Potter" <spp [...] spotter.yi.org>
Cc: perl5-porters [...] perl.org
Subject: Re: [ID 19991004.002] server exits early
> What I'm getting from all this is that there isn't a perceived bug in perl,
> so I should go ahead and close the ticket. Is that correct? How do I
> explain that the script works as the user expects on other OSes (and
> earlier versions of Solaris)? A bug in those other OSes?

Correct - there is no bug in perl (well, perhaps it should return EINTR instead of EBADF...)

I've tried to track down exactly which standard mandates this behaviour, but without a lot of success. Signals are one of the areas where different Unixes tend to differ wildly, and this particular problem is a manifestation of those differences rather than a bug per se - the behaviour will depend on which standards a particular Unix is based on, and how closely it adheres to those standards.

The sigaction manpage for Solaris says this:

    SA_RESTART    If set and the signal is caught, functions
                  that are interrupted by the execution of
                  this signal's handler are transparently
                  restarted by the system, namely fcntl(2),
                  ioctl(2), wait(2), waitid(2), and the
                  following functions on slow devices like
                  terminals: getmsg() and getpmsg() (see
                  getmsg(2)); putmsg() and putpmsg() (see
                  putmsg(2)); pread(), read(), and readv()
                  (see read(2)); pwrite(), write(), and
                  writev() (see write(2)); recv(),
                  recvfrom(), and recvmsg() (see
                  recv(3SOCKET)); and send(), sendto(), and
                  sendmsg() (see send(3SOCKET)). Otherwise,
                  the function returns an EINTR error.

So in fact the behaviour seen is as documented on Solaris.
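The difference SA_RESTART makes for a call that is on the restart list, such as read(), can be observed from Perl. The following is a rough sketch under several assumptions: read_across_alarm is a hypothetical helper, the platform restarts a blocked read() on a pipe when SA_RESTART is set, and Perl's sysread() reports EINTR to the caller rather than retrying. It says nothing about accept(), which per this thread returns EINTR on patched Solaris regardless of the flag.

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM SA_RESTART);

# Hypothetical helper: install an empty SIGALRM handler with the given
# sigaction flags, then block in sysread() on a pipe while an alarm fires
# after one second. A child writes one byte after $write_delay seconds.
# Returns "data" if the read completed, "EINTR" if it was interrupted.
sub read_across_alarm {
    my ($flags, $write_delay) = @_;
    pipe(my $rd, my $wr) or die "pipe: $!";

    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        close $rd;
        sleep $write_delay;
        syswrite($wr, "x");
        exit 0;
    }
    close $wr;

    my $action = POSIX::SigAction->new(sub { }, POSIX::SigSet->new, $flags);
    POSIX::sigaction(SIGALRM, $action) or die "sigaction: $!";

    alarm 1;
    # Check errno in the same statement as the read.
    my $interrupted = !defined(sysread($rd, my $buf, 1)) && $!{EINTR} ? 1 : 0;
    alarm 0;

    close $rd;
    kill 'TERM', $pid;       # the child may still be sleeping
    waitpid($pid, 0);
    return $interrupted ? "EINTR" : "data";
}

print "flags 0:    ", read_across_alarm(0, 3), "\n";
print "SA_RESTART: ", read_across_alarm(SA_RESTART, 2), "\n";
```

Without the flag the read is expected to give up with EINTR when the alarm fires; with SA_RESTART it should be resumed and return the byte written a second later.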

