race condition+fail in dist\IO\t\cachepropagate-tcp.t #12979

Closed
p5pRT opened this issue May 20, 2013 · 21 comments

Comments

@p5pRT

p5pRT commented May 20, 2013

Migrated from rt.perl.org#118059 (status was 'resolved')

Searchable as RT118059$

@p5pRT
Author

p5pRT commented May 20, 2013

From @bulk88

Created by @bulk88

There is a race condition in cachepropagate-tcp.t between the parent
proc's accept() and the child proc's connection since the child proc has
a "sleep(1);" delay. On my Win32 32 bit, Server 2003 x64, 8 core, VC
2008 Perl, the ->accept() times out, does not return an obj, and then
fatally errors when ->sockdomain() is called on an undefined scalar. Example

___________________________________________________________________________
C:\p519\src\t>..\perl.exe -I..\lib harness ../dist/IO/t/cachepropagate-tcp.t
../dist/IO/t/cachepropagate-tcp.t .. 1/8 Can't call method "sockdomain" on an undefined value at t/cachepropagate-tcp.t line 46.
# Looks like you planned 8 tests but ran 5.
# Looks like your test exited with 9 just after 5.
../dist/IO/t/cachepropagate-tcp.t .. Dubious, test returned 9 (wstat 2304, 0x900)
Failed 3/8 subtests

Test Summary Report
-------------------
../dist/IO/t/cachepropagate-tcp.t (Wstat: 2304 Tests: 5 Failed: 0)
  Non-zero exit status: 9
  Parse errors: Bad plan. You planned 8 tests but ran 5.
Files=1, Tests=5, 2 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU)
Result: FAIL

C:\p519\src\t>
___________________________________________________________________________

running with -v,
___________________________________________________________________________
C:\p519\src\t>..\perl.exe -I..\lib harness -v ../dist/IO/t/cachepropagate-tcp.t
../dist/IO/t/cachepropagate-tcp.t ..
1..8
ok 1 - socket created
ok 2 - protocol defined
ok 3 - domain defined
ok 4 - type defined
ok 5 - spawned a child
Can't call method "sockdomain" on an undefined value at t/cachepropagate-tcp.t line 46.
# Looks like you planned 8 tests but ran 5.
# Looks like your test exited with 9 just after 5.
Dubious, test returned 9 (wstat 2304, 0x900)
Failed 3/8 subtests

Test Summary Report
-------------------
../dist/IO/t/cachepropagate-tcp.t (Wstat: 2304 Tests: 5 Failed: 0)
  Non-zero exit status: 9
  Parse errors: Bad plan. You planned 8 tests but ran 5.
Files=1, Tests=5, 1 wallclock secs ( 0.02 usr + 0.00 sys = 0.02 CPU)
Result: FAIL

C:\p519\src\t>
___________________________________________________________________________

line 46 is
__________________________________________________________________________
  my $new = $listener->accept();

  is($new->sockdomain(), $d, 'domain match');
__________________________________________________________________________

The accept() function called inside the IO::Socket::accept method fails
with $! set to "Bad file descriptor" (9). From my research, on my machine
the timeout is undefined, so the IO::Select and can_read parts of
IO::Socket::accept are not executed.
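
The exit status reported by the harness lines up with this errno. As a minimal sketch (in C, with hypothetical helper names that are not part of the test or of the harness), the wstat value 2304 (0x900) carries the exit code in its second byte, which decodes to 9, the same number as the "Bad file descriptor" errno. That is consistent with die exiting with the value of $! when it is non-zero.

```c
#include <stdio.h>

/* Extract the exit code from a wait-style status word as printed by the
 * harness. Assumes the conventional layout: exit code in bits 8..15. */
int exit_code_from_wstat(int wstat)
{
    return (wstat >> 8) & 0xff;
}

/* Hypothetical helper: print the decoded form of a harness "wstat" value. */
void describe_wstat(int wstat)
{
    printf("wstat %d (0x%x) -> exit code %d\n",
           wstat, (unsigned)wstat, exit_code_from_wstat(wstat));
}
```

Calling describe_wstat(2304) prints the decimal value, the hex value and the decoded exit code on one line.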

The failing test was added in
http://perl5.git.perl.org/perl.git/commit/93a5d7bfc07a41ef26fb3e3b298a7d88c3741ed1?f=dist/IO/t/cachepropagate-tcp.t
as part of CPAN RT #61577 and was written by Daniel Kahn Gillmor. I don't
see any explanation for the "sleep(1);" in the child fork proc.

If I put a "sleep(1);" before the ->accept(), it passes for me most of
the time (4 tries passed, the 5th failed). If I put a 2 second sleep
there, it always passes (10 tries, no failures), but 2 seconds of
sleeping is a lot of wall time wasted. If I remove the sleep(1) from the
child proc, it always fails (10 failures out of 10 tries). If I put a
sleep(1) at the accept and remove the sleep(1) from the child (so the
child does not sleep), I get 10 passes out of 10 tries. I am not very
familiar with this sockets/Unix IO/Unix events stuff, so I don't know
how to fix it.

Perl Info

Flags:
    category=library
    severity=medium
    module=IO

Site configuration information for perl 5.19.0:

Configured by Administrator at Sun May 19 19:59:16 2013.

Summary of my perl5 (revision 5 version 19 subversion 0 patch blead 
2013-05-19.21:05:35 bb003204009d113d60d4173c3ed72b10c8169f14 
v5.18.0-25-gbb00320) configuration:
  Snapshot of: bb003204009d113d60d4173c3ed72b10c8169f14
  Platform:
    osname=MSWin32, osvers=5.2, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -GF -W3 -O1 -MD -Zi -DNDEBUG -GS- -GL 
-DWIN32 -D_CONSOLE -DNO_STRICT -D_CRT_SECURE_NO_DEPRECATE 
-D_CRT_NONSTDC_NO_DEPRECATE  -DPERL_TEXTMODE_SCRIPTS 
-DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO',
    optimize='-O1 -MD -Zi -DNDEBUG -GS- -GL',
    cppflags='-DWIN32'
    ccversion='15.00.30729.01', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', 
lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf 
-ltcg  -libpath:"c:\p519\lib\CORE"  -machine:x86'
    libpth=\lib
    libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  
netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib 
odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
    perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib 
winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib 
oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  
version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl519.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug 
-opt:ref,icf -ltcg  -libpath:"c:\p519\lib\CORE"  -machine:x86'

Locally applied patches:
    


@INC for perl 5.19.0:
    C:/p519/site/lib
    C:/p519/lib
    .


Environment for perl 5.19.0:
    CYGWIN=tty
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    
PATH=C:\p519\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program 
Files (x86)\Microsoft Visual Studio 9.0\VC\bin;C:\Program Files 
(x86)\Microsoft Visual Studio 9.0\VC;C:\Program Files\TortoiseGit\bin
    PERL_BADLANG (unset)
    SHELL (unset)


@p5pRT
Author

p5pRT commented May 20, 2013

From @tonycoz

On Sun May 19 20​:24​:50 2013, bulk88 wrote​:

> There is a race condition in cachepropagate-tcp.t between the parent
> proc's accept() and the child proc's connection since the child proc has
> a "sleep(1);" delay. On my Win32 32 bit, Server 2003 x64, 8 core, VC
> 2008 Perl, the ->accept() times out, does not return an obj, and then
> fatally errors when ->sockdomain() is called on an undefined scalar.
> Example
>
> ___________________________________________________________________________
>
> C:\p519\src\t>..\perl.exe -I..\lib harness ../dist/IO/t/cachepropagate-tcp.t
> ../dist/IO/t/cachepropagate-tcp.t .. 1/8 Can't call method "sockdomain" on an undefined value at t/cachepropagate-tcp.t line 46.
> # Looks like you planned 8 tests but ran 5.
> # Looks like your test exited with 9 just after 5.
> ../dist/IO/t/cachepropagate-tcp.t .. Dubious, test returned 9 (wstat 2304, 0x900)
> Failed 3/8 subtests

I see this failure maybe 1 in 20 runs (Windows 7 x64, Core i7).

I suspect the problem is that winsock is invalidating the accept socket
once the child socket goes out of scope, causing a race in the accept()
implementation.

I changed the code so the child reads a line from the socket, and
the parent closes the socket. With that I had 2700 successful runs
before I stopped testing.
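
The ordering described above can be sketched as follows. This is an illustration in C using POSIX fork() and loopback TCP, not the change made to cachepropagate-tcp.t, and the function name run_handshake is hypothetical. It assumes the child simply waits for end-of-file: the child connects and then blocks in read() until the parent closes the accepted socket, so the child's end cannot go away while the parent is still inside accept().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if accept() succeeded and the child exited cleanly, -1 otherwise. */
int run_handshake(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the OS pick a free port */
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(listener, 1) != 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &len) != 0) {
        close(listener);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(listener);
        return -1;
    }
    if (pid == 0) {
        /* Child: connect, then block in read() until the parent closes. */
        char buf[16];
        int c = socket(AF_INET, SOCK_STREAM, 0);
        close(listener);
        if (c < 0 || connect(c, (struct sockaddr *)&addr, sizeof addr) != 0)
            _exit(1);
        while (read(c, buf, sizeof buf) > 0)
            ;                                /* loop ends at end-of-file */
        close(c);
        _exit(0);
    }

    /* Parent: the child stays connected until this accept() has returned. */
    int conn = accept(listener, NULL, NULL);
    int result = (conn >= 0) ? 0 : -1;
    if (conn >= 0)
        close(conn);                         /* releases the child from read() */
    close(listener);

    int status;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        result = -1;
    return result;
}
```

The point of the design is that neither side relies on sleep(): the child's lifetime is tied to an event the parent controls.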

I'll try a smoke-me to see if it breaks anywhere else.

Tony

@p5pRT
Author

p5pRT commented May 20, 2013

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented May 26, 2013

From @kmx

I get the same failure with perl-5.18.0 + gcc-4.7.3 + 32bit MS Windows
on my Windows 7 box.

It fails always

--
kmx

@p5pRT
Author

p5pRT commented Jun 27, 2013

From @bulk88

On Mon May 20 02​:10​:13 2013, tonyc wrote​:

> I see this failure maybe 1 in 20 runs (Windows 7 x64, Core i7).
>
> I suspect the problem is that winsock is invalidating the accept socket
> once the child socket goes out of scope, causing a race in the accept()
> implementation.
>
> I changed the code so the child reads a line from the socket, and
> the parent closes the socket. With that I had 2700 successful runs
> before I stopped testing.
>
> I'll try a smoke-me to see if it breaks anywhere else.
>
> Tony

Any updates TonyC?

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Author

p5pRT commented Jun 28, 2013

From @tonycoz

On Thu, Jun 27, 2013 at 04​:34​:17PM -0700, bulk88 via RT wrote​:

> On Mon May 20 02:10:13 2013, tonyc wrote:
>
> > I see this failure maybe 1 in 20 runs (Windows 7 x64, Core i7).
> >
> > I suspect the problem is that winsock is invalidating the accept socket
> > once the child socket goes out of scope, causing a race in the accept()
> > implementation.
> >
> > I changed the code so the child reads a line from the socket, and
> > the parent closes the socket. With that I had 2700 successful runs
> > before I stopped testing.
> >
> > I'll try a smoke-me to see if it breaks anywhere else.
> >
> > Tony
>
> Any updates TonyC?

It smoked ok, but I don't trust that it fixes the underlying problem
(which may not be fixable, leaving us with the work-around.)

My theory above, as written, is hopefully nonsense - Windows returning
a socket and suddenly making it not-a-socket, rather than return an
end-of-file or EPIPE on the next operation would be even more broken
than I expect from Microsoft.

For a Real™ fork() any open file handles or sockets are cloned in the
child - the child can exit or explicitly close their socket handle, but
it won't have an effect on the socket handle the parent has.
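
A small POSIX sketch of that property, assuming a Unix-like system and using a pipe rather than a socket for brevity (the function name is made up for this example): the child closes its copies of the descriptors and exits, and the parent's copies keep working.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the parent can still use its pipe descriptors after the
 * forked child has closed its own copies, 0 if not, -1 on setup failure. */
int parent_fd_survives_child_close(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: close its cloned descriptors and exit. */
        close(fds[0]);
        close(fds[1]);
        _exit(0);
    }

    /* Parent: wait until the child has definitely closed and exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* The parent's descriptors are unaffected: write a byte and read it back. */
    char out = 'x', in = 0;
    int ok = write(fds[1], &out, 1) == 1 &&
             read(fds[0], &in, 1) == 1 &&
             in == 'x';
    close(fds[0]);
    close(fds[1]);
    return ok ? 1 : 0;
}
```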

Under Win32 we emulate that, which I suspect is the real cause of the
problem here - when a thread is created fp_dup() calls
PerlIO_fdupopen() which does reasonable things on Unix, but on Win32
that leaves all the work to win32_fdupopen().

win32_fdupopen() calls win32_dup(), a trivial wrapper around dup(),
and I don't see how that works for a socket fd unless the CRT dup() is
tolerant of errors from DuplicateHandle().

Tony

@p5pRT
Author

p5pRT commented Aug 27, 2013

From @tonycoz

On Thu Jun 27 23​:50​:22 2013, tonyc wrote​:

> It smoked ok, but I don't trust that it fixes the underlying problem
> (which may not be fixable, leaving us with the work-around.)
>
> My theory above, as written, is hopefully nonsense - Windows returning
> a socket and suddenly making it not-a-socket, rather than return an
> end-of-file or EPIPE on the next operation would be even more broken
> than I expect from Microsoft.
>
> For a Real™ fork() any open file handles or sockets are cloned in the
> child - the child can exit or explicitly close their socket handle, but
> it won't have an effect on the socket handle the parent has.
>
> Under Win32 we emulate that, which I suspect is the real cause of the
> problem here - when a thread is created fp_dup() calls
> PerlIO_fdupopen() which does reasonable things on Unix, but on Win32
> that leaves all the work to win32_fdupopen().
>
> win32_fdupopen() calls win32_dup(), a trivial wrapper around dup(),
> and I don't see how that works for a socket fd unless the CRT dup() is
> tolerant of errors from DuplicateHandle().

This theory turned out to be nonsense, I'm exploring the behaviour some
more.

Tony

@p5pRT
Author

p5pRT commented Aug 29, 2013

From @tonycoz

On Tue Aug 27 16​:41​:11 2013, tonyc wrote​:

> On Thu Jun 27 23:50:22 2013, tonyc wrote:
>
> > It smoked ok, but I don't trust that it fixes the underlying problem
> > (which may not be fixable, leaving us with the work-around.)
> >
> > My theory above, as written, is hopefully nonsense - Windows returning
> > a socket and suddenly making it not-a-socket, rather than return an
> > end-of-file or EPIPE on the next operation would be even more broken
> > than I expect from Microsoft.
> >
> > For a Real™ fork() any open file handles or sockets are cloned in the
> > child - the child can exit or explicitly close their socket handle, but
> > it won't have an effect on the socket handle the parent has.
> >
> > Under Win32 we emulate that, which I suspect is the real cause of the
> > problem here - when a thread is created fp_dup() calls
> > PerlIO_fdupopen() which does reasonable things on Unix, but on Win32
> > that leaves all the work to win32_fdupopen().
> >
> > win32_fdupopen() calls win32_dup(), a trivial wrapper around dup(),
> > and I don't see how that works for a socket fd unless the CRT dup() is
> > tolerant of errors from DuplicateHandle().
>
> This theory turned out to be nonsense, I'm exploring the behaviour some
> more.

Amongst many other things I tried, I changed win32_accept to​:

SOCKET
win32_accept(SOCKET s, struct sockaddr *addr, int *addrlen)
{
    SOCKET r, s2;
    SOCKET x;

    SOCKET_TEST((r = accept(TO_SOCKET(s), addr, addrlen)), INVALID_SOCKET);
    if (r == INVALID_SOCKET) {
        /* debug: report a failed accept() with the listener's fd, OS handle and errno */
        dTHX;
        PerlIO_printf(PerlIO_stderr(), "accept(%d (%p)) => %d failed %d\n",
                      (int)s, (void *)_get_osfhandle(s), (int)r, errno);
    }
    s2 = OPEN_SOCKET(r);
    x = _get_osfhandle(s2);
    if (x != r) {
        /* debug: the new fd no longer maps back to the accepted OS handle */
        dTHX;
        PerlIO_printf(PerlIO_stderr(),
                      "accept(%d (%d)) => %d but osfhandle returned bad %d\n",
                      (int)s, (int)_get_osfhandle(s), (int)r, (int)x);
    }
    return s2;
}

The output from a failed run looked like​:

1..8
ok 1 - socket created
ok 2 - protocol defined
ok 3 - domain defined
ok 4 - type defined
ok 5 - spawned a child
accept(3 (112)) => 236 but osfhandle returned bad -1
Can't use an undefined value as a symbol reference at ../dist/IO/t/cachepropagate-tcp.t line 49.
# Looks like you planned 8 tests but ran 5.
# Looks like your test exited with 9 just after 5.

So it seems some race is closing the accept()ed socket before we get to
use it.

Pretty much any output I do in the child or parent before the accept()
prevents the problem from occurring for me, which makes it difficult to test.

Tony

@p5pRT
Author

p5pRT commented Sep 20, 2013

From @bulk88

On Wed Aug 28 18​:09​:32 2013, tonyc wrote​:

> The output from a failed run looked like:
>
> 1..8
> ok 1 - socket created
> ok 2 - protocol defined
> ok 3 - domain defined
> ok 4 - type defined
> ok 5 - spawned a child
> accept(3 (112)) => 236 but osfhandle returned bad -1
> Can't use an undefined value as a symbol reference at ../dist/IO/t/cachepropagate-tcp.t line 49.
> # Looks like you planned 8 tests but ran 5.
> # Looks like your test exited with 9 just after 5.
>
> So it seems some race is closing the accept()ed socket before we get to
> use it.
>
> Pretty much any output I do in the child or parent before the accept()
> prevents the problem from occurring for me, which makes it difficult to
> test.
>
> Tony

The bug can be prevented by using the freeze/thaw features on the parent
and child OS threads. I have 2 theories:

1. Somewhere a CloseHandle was done on a socket handle, which isn't
allowed on paper, because supposedly there are Winsock user-mode
resources which aren't cleaned up when the kernel handle is closed. (A
socket on NT is a kernel handle from the AFD driver; IDK if it's true
that the AFD driver won't clean up or call back into the user-mode
Winsock side when the kernel handle is closed.)

2. A double free (using the correct closesocket() call) was done when
the child pseudo-proc exited.

I include 2 screenshots showing why I think the above. The break
happened at an invalid handle exception from ZwClose/NtClose. Notice it
came from the user-mode Winsock DLL.

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Author

p5pRT commented Sep 20, 2013

From @bulk88

On Fri Sep 20 01​:14​:51 2013, bulk88 wrote​:

> I include 2 screenshots showing why I think the above. The break
> happened at an invalid handle exception from ZwClose/NtClose. Notice it
> came from the user-mode Winsock DLL.

forgot the attachments

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Author

p5pRT commented Sep 20, 2013

From @bulk88

118059-9-20-2.GIF

@p5pRT
Author

p5pRT commented Sep 20, 2013

From @bulk88

118059-9-20-1.GIF

@p5pRT
Author

p5pRT commented Sep 26, 2013

From @bulk88

Adding a trimmed-down version of cachepropagate-tcp.t to isolate the
problem. Another bug I am investigating: in the bad run, $^E is 6, which
is ERROR_INVALID_HANDLE, although the accept returned
10038/0x2736/WSAENOTSOCK to perl. There is a line that must be commented
out to get a bad run; uncomment it (the print in the child thread adds
delay) and it will be a good run.

Shouldn't $^E be 10038, not 6, at this point? Is this another "bug" or
not? I'll have more to say tomorrow after rewriting an IRC p5p convo
from today.

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Author

p5pRT commented Sep 26, 2013

From @bulk88

1..6
ok 1 - socket created
ok 2 - protocol defined
ok 3 - domain defined
ok 4 - type defined
listener socket is OS Hnd 368
Press any key to continue . . .
ok 5 - spawned a child
client socket os hnd >2d4< fileno >4<
rawerr pass cerr >25< winerr >0<
$VAR1 = {
  'SO_ERROR' => 4103,
  'SOL_SOCKET' => 65535,
  '$listener' => bless( \*Symbol​::GEN0, 'IO​::Socket​::INET' ),
  'SockOptRes' => 0
  };
IsPerlIOValid($listener) 1
post accept cerr >25< winerr >6< child OS Hnd >2d4< parent accept OS Hnd >2c8<
child fds >4< parent fds >5<
ok 6 - domain match
Press any key to continue . . .

@p5pRT
Author

p5pRT commented Sep 26, 2013

From @bulk88

tcphandlebugs.pl

@p5pRT
Author

p5pRT commented Sep 26, 2013

From @bulk88

1..6
ok 1 - socket created
ok 2 - protocol defined
ok 3 - domain defined
ok 4 - type defined
listener socket is OS Hnd 368
Press any key to continue . . .
ok 5 - spawned a child
rawerr pass cerr >9< winerr >0<
$VAR1 = {
  '$listener' => bless( \*Symbol​::GEN0, 'IO​::Socket​::INET' ),
  'SOL_SOCKET' => 65535,
  'SO_ERROR' => 4103,
  'SockOptRes' => 0
  };
IsPerlIOValid($listener) 1
post accept cerr >9< winerr >6< child OS Hnd >2d4< parent accept OS Hnd >1.#QNAN<
child fds >4< parent fds >$new is undef<
sleep block Excpt​: "Can't call method "sockdomain" on an undefined value at tcphandlebugs.pl line 75." 2d4child fds 4
Press any key to continue . . .

@p5pRT
Author

p5pRT commented Sep 26, 2013

From @bulk88

On Wed Sep 25 23​:42​:55 2013, bulk88 wrote​:

> Adding a trimmed-down version of cachepropagate-tcp.t to isolate the
> problem. Another bug I am investigating: in the bad run, $^E is 6, which
> is ERROR_INVALID_HANDLE, although the accept returned
> 10038/0x2736/WSAENOTSOCK to perl. There is a line that must be commented
> out to get a bad run; uncomment it (the print in the child thread adds
> delay) and it will be a good run.

I made an ithreads version which freezes the server thread inside its
accept(), for the whole lifetime of the child thread I think (I didn't
verify with a C debugger). It does NOT fail, and the accept returns a
working handle.

Is this a handle leak in Perl PLUS a race in winsock?

The script can be messed around with.
--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Author

p5pRT commented Sep 26, 2013

From @bulk88

tcphandlebugst.pl

@p5pRT
Author

p5pRT commented Sep 30, 2013

From @bulk88

Adding some things I said in IRC for archival reasons.

Would this make any sense: in Winsock, if a process makes a loopback
connection, Winsock creates a pipe OS handle instead of a packetized
TCP OS handle, then registers the pipe OS handle with a Winsock
user-mode struct so all socket funcs will work as normal on the pipe
handle; then socket() returns the same handle that accept() returns,
and instead of going through the OS packet switcher, they talk over a
pipe, which is more efficient? About the bug, is what happens that the
accept() calls into kernel mode and the server is descheduled from the
CPU, then the child pseudo-fork runs, does its connect(), then
closesocket()s during perl ithread global destruction, then the accept()
unblocks from kernel mode, but the handle that unblocked the accept is
already dead?

Invalid handle exception with C debugger.
______________________________
  ntdll.dll!_ZwClose@4() + 0x12 bytes
  mswsock.dll!_SockCloseSocket@4() + 0x240 bytes
  mswsock.dll!_WSPAccept@24() + 0xaef bytes
  ws2_32.dll!_WSAAccept@20() + 0x85 bytes
  ws2_32.dll!_accept@12() + 0x17 bytes
  perl519.dll!win32_accept(unsigned int s=3, sockaddr * addr=0x0012f700, int * addrlen=0x0012fb00) Line 370 + 0x12 bytes C
  perl519.dll!PerlSockAccept(IPerlSock * piPerl=0x01b67218, unsigned int s=3, sockaddr * addr=0x0012f700, int * addrlen=0x0012fb00) Line 1227 + 0x11 bytes C++
  perl519.dll!Perl_pp_accept(interpreter * my_perl=0x01b64e0c) Line 2557 + 0x37 bytes C
  perl519.dll!Perl_runops_standard(interpreter * my_perl=0x01b64e0c) Line 42 + 0xc bytes C
  perl519.dll!S_run_body(interpreter * my_perl=0x01b64e0c, long oldscope=1) Line 2500 + 0xf bytes C
  perl519.dll!perl_run(interpreter * my_perl=0x01b64e0c) Line 2419 C
  perl519.dll!RunPerl(int argc=2, char * * argv=0x01b64d68, char * * env=0x01b63430) Line 270 + 0x9 bytes C++
  perl.exe!__tmainCRTStartup() Line 582 + 0x17 bytes C
  kernel32.dll!_BaseProcessStart@4() + 0x28 bytes
______________________________

The invalid handle passed to NtClose in the parent thread is the same OS
kernel handle that appeared on the CRT level from $connector.
_________________________________
  my $connector = IO::Socket::INET->new(PeerAddr => '127.0.0.1',
                                        PeerPort => $port,
                                        Proto    => 'tcp');
  Win32::API::WriteMemory($ptr, pack('I',
      Win32API::File::GetOsFHandle(*{$connector}{IO})), 4);
_________________________________
So Winsock somehow lets accept() and its callees in parent thread see
the OS handle of the child thread and try to close it from the parent
thread.

The pointer memory write stuff was the fastest way I thought of for the
child thread to report its data without doing stdio to the console,
which then stops the race/bug.

The accept actually fails with 10038 WSAENOTSOCK, but PerlIO corrupts it
to ERROR_INVALID_HANDLE/6. Disassembly of WSPAccept and more debugging
show that the 10038 comes from a NtDeviceIoControlFile call in WSPAccept
with IOCTL 0x12010, which Google and ReactOS say is a constant called
AFD_ACCEPT. The NtDeviceIoControlFile AFD_ACCEPT call returns 0xc0000008
STATUS_INVALID_HANDLE. The asm shows the SockCloseSocket branch is only
taken on error codes != 0, so it's resource cleanup code on failure.
NtStatusToSocketError later in WSPAccept converts STATUS_INVALID_HANDLE
to WSAENOTSOCK, which becomes the public-facing error code of the failed
accept().
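
The error translation described above can be summarised in a short C sketch. The constants are defined locally with the values quoted in this comment, and the function names are hypothetical stand-ins: this is not the code of mswsock.dll or of NtStatusToSocketError, only the one mapping and the cleanup-on-failure branch reported here.

```c
#include <stdint.h>

/* Values quoted in this thread, defined locally so the sketch builds anywhere. */
#define SKETCH_STATUS_SUCCESS        0x00000000u
#define SKETCH_STATUS_INVALID_HANDLE 0xC0000008u  /* NTSTATUS seen from AFD_ACCEPT */
#define SKETCH_WSAENOTSOCK           10038        /* 0x2736, returned to perl */
#define SKETCH_ERROR_INVALID_HANDLE  6            /* value later seen in $^E */
#define SKETCH_IOCTL_AFD_ACCEPT      0x12010u     /* IOCTL used by WSPAccept */

/* Hypothetical stand-in for the single status mapping described above.
 * Returns 0 for success and -1 for statuses this sketch does not cover. */
int sketch_ntstatus_to_socket_error(uint32_t status)
{
    switch (status) {
    case SKETCH_STATUS_SUCCESS:        return 0;
    case SKETCH_STATUS_INVALID_HANDLE: return SKETCH_WSAENOTSOCK;
    default:                           return -1;
    }
}

/* Models the reported accept() error path: the cleanup branch (the
 * SockCloseSocket call in the stack trace) is taken only when the
 * status is non-zero, and the status is then translated. */
int sketch_accept_result(uint32_t afd_status, int *closed_socket)
{
    *closed_socket = (afd_status != SKETCH_STATUS_SUCCESS);
    return sketch_ntstatus_to_socket_error(afd_status);
}
```

With the failing status 0xC0000008 the sketch reports the cleanup branch as taken and yields 10038, matching the WSAENOTSOCK seen by the caller.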

Because of the theory I present above (that a loopback within the same
process is implemented by Winsock using pipes), TonyC asked whether the
handles from the accept() in the parent thread and the connect() in the
child thread are the same. On a successful accept(), the OS handle
returned is DIFFERENT from the OS handle in $connector in the child. Not
in the IRC comment: I didn't research whether the 2 handles are the same
kernel object/struct in the object manager, with Winsock doing a
DuplicateHandle internally.

This is an answer to another TonyC question. The fds in a successful
accept() are different also. There is a class of bugs/limitations in the
Win32 pseudo-fork where the fileno/fds aren't virtualized between the
pseudo-procs. In my tests, the child socket is fd 4 and the parent
socket is fd 5 at the Perl level on a successful accept(). See earlier
posts in this ticket.

@p5pRT
Author

p5pRT commented Nov 2, 2013

From @steve-m-hay

Fixed by commit b47a847 from #120091, so closing this ticket too :-)

@p5pRT
Author

p5pRT commented Nov 2, 2013

@steve-m-hay - Status changed from 'open' to 'resolved'

h-vn added a commit to radiator-software/p5-net-ssleay that referenced this issue Feb 2, 2022
…ls on Windows. (#366)

Calling a one second sleep before each connect to the server seems to fix the occasionally occurring failures. Sleep is now done when running on Windows and Perl is earlier than 5.20.0.

For the details, see GH-356 and look for CloseHandle() in Perl 5.20.0 changelog.

It seems that this is the Perl bug that causes the socket failures:
  Perl/perl5#12979