Skip Menu |
 
Report information
Id: 88840
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: tonyc <tony [at] develop-help.com>
Cc: bulk88 <bulk88 [at] hotmail.com>
AdminCc:

Operating System: mswin32
PatchStatus: (no value)
Severity: medium
Type:
  • core
  • patch
  • patch
Perl Version: 5.14.0
Fixed In: (no value)



Subject: t/op/fork.t blocks in external process test on Win32
Date: Mon, 18 Apr 2011 23:04:42 +1000
To: perlbug [...] perl.org
From: Tony Cook <tony [...] develop-help.com>
This is a bug report for perl from tony@develop-help.com, generated with the help of perlbug 1.39 running under perl 5.14.0. ----------------------------------------------------------------- [Please describe your issue here] t/op/fork.t blocks running the test code: system $^X, "-e", "if (\$pid=fork){kill(9, \$pid)} else {sleep 5}"; This leaves a process running that Ctrl-C does not kill and needs to be killed with Task Manager. Simply reproducable with: ..\perl -I..\lib -le "if ($pid=fork){ kill 9, $pid; } else { sleep 5;}" Loading a non-trivial module prevents the lockup: ..\perl -I..\lib -MEncode -le "if ($pid=fork){ kill 9, $pid; } else { sleep 5;}" Modifying win32.c to include Sleep(1) immediately before the call to TerminateThread() prevents the lockup. Sleep(0) does not prevent the problem. [Please do not change anything below this line] ----------------------------------------------------------------- --- Flags: category=core severity=medium --- Site configuration information for perl 5.14.0: Configured by tony at Mon Apr 18 20:41:14 2011. Summary of my perl5 (revision 5 version 14 subversion 0) configuration: Derived from: d12b49d6e531fdc354980af14eb2efa34756c1d8 Platform: osname=MSWin32, osvers=6.1, archname=MSWin32-x86-multi-thread uname='' config_args='undef' hint=recommended, useposix=true, d_sigaction=undef useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=undef, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cl', ccflags ='-nologo -GF -W3 -Od -MD -Zi -DDEBUGGING -DWIN32 -D_CONSOLE -DNO_STRICT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO', optimize='-Od -MD -Zi -DDEBUGGING', cppflags='-DWIN32' ccversion='15.00.30729.01', gccversion='', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='link', ldflags ='-nologo -nodefaultlib -debug -libpath:"c:\notthere\lib\CORE" -machine:x86 "/manifestdependency:type='Win32' name='Microsoft.Windows.Common-Controls' version='6.0.0.0' processorArchitecture='*' publicKeyToken='6595b64144ccf1df' language='*'"' libpth=\lib libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl514.lib gnulibc_version='' Dynamic Linking: dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' ' cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -libpath:"c:\notthere\lib\CORE" -machine:x86 "/manifestdependency:type='Win32' name='Microsoft.Windows.Common-Controls' version='6.0.0.0' processorArchitecture='*' publicKeyToken='6595b64144ccf1df' language='*'"' Locally applied patches: RC0 --- @INC for perl 5.14.0: lib C:/Users/tony/dev/perl/git/perl/lib . --- Environment for perl 5.14.0: HOME (unset) LANG (unset) LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE;C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\BIN;C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\Tools;C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\Tools\bin;C:\Windows\Microsoft.NET\Framework\;C:\Windows\Microsoft.NET\Framework\\Microsoft .NET Framework 3.5 (Pre-Release Version);C:\Windows\Microsoft.NET\Framework\v2.0.50727;C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\VCPackages;C:\Program Files\Microsoft SDKs\Windows\v7.0\bin;C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\Bin\amd64;C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\vcpackages;C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE;C:\apps\platsdk\win7\Bin\x64;C:\apps\platsdk\win7\Bin;C:\Windows\Microsoft.NET\Framework64\v3.5;C:\Windows\Microsoft.NET\Framework\v3.5;C:\Windows\Microsoft.NET\Framework64\v2.0.50727;C:\Windows\Microsoft.NET\Framework\v2.0.50727;C:\apps\platsdk\win7\Setup;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\Common Files\Microsoft Shared\Windows Live;C:\Program Files (x86)\MiKTeX 2.8\miktex\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\WIDCOMM\Bluetooth Software\;C:\Program Files\WIDCOMM\Bluetooth Software\syswow64;C:\Program Files (x86)\QuickTime\QTSystem\;C:\apps\git\Git\cmd PERL_BADLANG (unset) SHELL (unset)
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 5.7k
I am using Show quoted text
__________________________________________________________ C:\>perl -V Summary of my perl5 (revision 5 version 17 subversion 3 patch blead 2012-08-02.1 5:28:40 4276fd3639b8e3fdebe28a2eb81d455f0f33fd05 v5.17.2-193-g4276fd3) configura tion: Snapshot of: 4276fd3639b8e3fdebe28a2eb81d455f0f33fd05 Platform: osname=MSWin32, osvers=5.2, archname=MSWin32-x64-multi-thread uname='' config_args='undef' hint=recommended, useposix=true, d_sigaction=undef useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=define, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cl', ccflags ='-nologo -GF -W3 -Od -MD -Zi -DDEBUGGING -fp:precise -DWIN 32 -D_CONSOLE -DNO_STRICT -DWIN64 -DCONSERVATIVE -D_CRT_SECURE_NO_DEPRECATE -D_C RT_NONSTDC_NO_DEPRECATE -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_ IMPLICIT_SYS -DUSE_PERLIO', optimize='-Od -MD -Zi -DDEBUGGING -fp:precise', cppflags='-DWIN32' ccversion='15.00.30729.01', gccversion='', gccosandvers='' intsize=4, longsize=4, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8 ivtype='__int64', ivsize=8, nvtype='double', nvsize=8, Off_t='__int64', lsee ksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='link', ldflags ='-nologo -nodefaultlib -debug -libpath:"c:\p517\lib\COR E" -machine:AMD64 "/manifestdependency:type='Win32' name='Microsoft.Windows.Com mon-Controls' version='6.0.0.0' processorArchitecture='*' publicKeyToken='6595b6 4144ccf1df' language='*'"' libpth=\lib libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.l ib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32 .lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt. lib perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg 32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws 2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msv crt.lib libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl517.lib gnulibc_version='' Dynamic Linking: dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' ' cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -libpath:"c:\p 517\lib\CORE" -machine:AMD64 "/manifestdependency:type='Win32' name='Microsoft. Windows.Common-Controls' version='6.0.0.0' processorArchitecture='*' publicKeyTo ken='6595b64144ccf1df' language='*'"' Characteristics of this binary (from libperl): Compile-time options: DEBUGGING HAS_TIMES HAVE_INTERP_INTERN MULTIPLICITY PERLIO_LAYERS PERL_DONT_CREATE_GVSV PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS PERL_MALLOC_WRAP PERL_PRESERVE_IVUV PERL_TRACK_MEMPOOL USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF Built under MSWin32 Compiled at Aug 2 2012 17:01:32 @INC: C:/p517/site/lib C:/p517/lib . C:\>
______________________________________________________________ VS 2008 and Server 2003 x64, and a x64 perl. The /op/fork.t freeze is this being run "perl -e "if ($pid=fork){kill(9, $pid)} else {sleep 5}"". The callstack from the only thread at that point in the process is
___________________________________________________ ntdll.dll!ZwWaitForSingleObject() + 0xa bytes ntdll.dll!RtlpWaitOnCriticalSection() - 0x1aa bytes ntdll.dll!LdrUnloadDll() - 0x1d9a0 bytes kernel32.dll!FreeLibrary() + 0x71 bytes
> perl517.dll!VMem::~VMem() Line 159 C++
perl517.dll!VMem::`scalar deleting destructor'() + 0x17 bytes C++ perl517.dll!VMem::Release() Line 268 + 0x2b bytes C++ perl517.dll!CPerlHost::~CPerlHost() Line 2081 C++ perl517.dll!CPerlHost::`scalar deleting destructor'() + 0x17 bytes C++ perl517.dll!win32_delete_internal_host(void * h=0x0000000000346e60) Line 202 + 0x2b bytes C++ perl517.dll!perl_free(interpreter * my_perl=0x0000000001aa6a08) Line 1335 C perl517.dll!RunPerl(int argc=3, char * * argv=0x0000000000343d90, char * * env=0x00000000003441b0) Line 277 C++ perl.exe!main(int argc=3, char * * argv=0x0000000000343d90, char * * env=0x0000000000344820) Line 24 C perl.exe!__tmainCRTStartup() Line 586 + 0x19 bytes C kernel32.dll!BaseProcessStart() + 0x2c bytes
__________________________________________________ This is the parent "process" in ithreads, the child thread would have a different callstack begining. In vmem.h at line 158.
_____________________________________________ VMem::~VMem(void) { #ifdef _USE_LINKED_LIST while (m_Dummy.pNext != &m_Dummy) { Free(m_Dummy.pNext+1); } #endif if (m_hLib) FreeLibrary(m_hLib); DeleteCriticalSection(&m_cs); }
____________________________________________ m_hLib is 0x000007ff7fc00000 which is C:\WINDOWS\system32\msvcrt.dll according to Process Explorer (HMODULEs are pointers to a memory mapped DLL PE file header). The problem with vmem.h is that it calls msvcrt.dll instead of being hard coded to whatever the static linked malloc or static linked CRT is, so 2 CRTs wind up in the process space always unless its compiled with Platform SDK or VC 6 (ActivePerl is made this way) which doesnt use VS specific CRTs but msvcrt.dll for a c std lib. This is an optimization issue but is not the cause of the deadlock, since vmem.h and msvcrt.dll have been around for almost a decade ( added in http://perl5.git.perl.org/perl.git/commit/f57e8d3bb1a36000e1951a4d2d5b0104b39bde4b ) So I got private symbol global variable "LdrpLoaderLock" (google it) from ntdll and dumped it.
Download ldrpdump.PNG
image/png 74.7k
ldrpdump.PNG
Download (untitled) / with headers
text/plain 1.5k
The only thread in the process is PID 45692, which is parent thread based on callstack. 0x000000000000815c is 33116 in dec. So I presume PID 33116 was the child thread. So I put a breakpoint on TerminateThread in win32_kill in win32.c and restart the perl process Show quoted text
____________________________________________________ case 9: /* kill -9 style un-graceful exit */ if (TerminateThread(hProcess, sig)) { /* Allow the scheduler to finish cleaning up the other thread. * Otherwise, if we ExitProcess() before another context switch * happens we will end up with a process exit code of "sig" instead * of our own exit status. * See also: https://rt.cpan.org/Ticket/Display.html?id=66016#txn-908976 */ Sleep(0); remove_dead_pseudo_process(child);
____________________________________________________ Ldr critical section is owned by the child thread (the PID in the critical section is the child thread at the top tab in VS) at the point where we will call TerminateThread. Not 1 Perl machine opcode executed yet in the child thread. The NT DLL loader is calling the PE header entry point on kernel32.dll (function BaseDllInitialize in kernel32.dll) and kernel32.dll the dll is setting itself up before running any !!!win32!!! user mode code. For more on the behavior of the DLL loader lock, read this by MS http://download.microsoft.com/download/a/f/7/af7777e5-7dcd-4800-8a0a-b18336565f5b/dll_bestprac.doc .
Download childthread.png
image/png 76.2k
childthread.png
Download (untitled) / with headers
text/plain 3.1k
The simplest (it can get much more complicated, I discuss later) fix is either to wait for the child interp in the child thread to run at PerlProcFork/CreateThread time (bad, a context switch which 99% of the isn't needed), or at win32_kill time (good, I explain later). CreateThread on Windows is lazy, you get back a handle even if the thread never gets scheduled ( http://msdn.microsoft.com/en-us/library/ms682453%28VS.85%29.aspx ) or access vio function pointer to start the thread. So I think for win32_kill, win32_kill needs to get the PerlInterpreter * of the child interp, and check if the child interp knows its thread id yet, relying on this line http://perl5.git.perl.org/perl.git/blob/HEAD:/win32/perlhost.h#l1711 . At the TerminateThread in win32_kill do something like if(! child_perl->Isys_intern->psuedo_id){ Sleep(0); if(! child_perl->Isys_intern->psuedo_id){ Sleep(1); //maybe some sync IO or mutex collision in thread attach in some dll entry point, how many millsec do we wait? if(! child_perl->Isys_intern->psuedo_id){ croak("panic: child interp never scheduled"); } } } The problem is, I dont think that the PerlInterpreter * of child interp is stored anywhere in the parent interp, maybe I just can't find where it is stored (newbie). Anyone know how to get the PerlInterpreter * of the child interp from the parent interp's PerlInterpreter struct or somehow else? Reason it is better to check if the child interp was ever scheduled at kill time than create time is most often the parent will go do work then block on IO or wait for a child interp or run bytecode till its quantum expires, or hits a malloc mutex, or something and the child thread is scheduled enough times to go through the DLL thread initialization and reach win32_start_child function. I slightly screwed up the description of DLL entry points above, they run on process start, and on thread start (usually, see http://msdn.microsoft.com/en-us/library/windows/desktop/ms682579%28v=vs.85%29.aspx usually called from process attach of dll for dll that doesn't care about threads) and 2 other cases, see http://msdn.microsoft.com/en-us/library/windows/desktop/ms682583%28v=vs.85%29.aspx for the 4 times dll entry point is called/runs. A more complicated alternative/higher overhead is creating a 1 time use event handle http://msdn.microsoft.com/en-us/library/windows/desktop/ms682655%28v=vs.85%29.aspx that win32_start_child activates and wait on it in parent interp at kill time with a timeout. What the timeout should be IDK. A more complicated solution is to try to obtain the dll loader lock using NT Native API, http://www.geoffchappell.com/studies/windows/win32/ntdll/api/ldrapi/lockloaderlock.htm and not call TerminateThread if parent interp can't get the lock, the solution is not WinCE compatible (is this deprecated/near dead platform for Perl or not?), and not DOS Windows compatible (code removed in 5.15 (I think) for DOS Windows by Jan Dubois). At one time Perl did use Native API for NT 4 support but Jan removed ntdll from Perl in http://perl5.git.perl.org/perl.git/commitdiff/8cbe99e5b6fe99a6bc17c0b0cee249bac3565da4 .
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 620b
A half baked solution is to not have perlhost.h load a 2nd C std Lib into the perl OS process, and therefore never need to call FreeLibrary in ~Vmem destroyer which will freeze the parent interp. Why the loader lock never times out with a software exception (similar to access vio GUI wise) IDK enough about windows to say. Any XS library that does a FreeLibrary or a LoadLibrary, either explicitly or secretly in a pure C/C++ DLL it wraps, probably even Dynaloader bootstrap will freeze indefinitely, aslong as you Perl Lang kill 9ed a child fork "process" earlier in runtime fast enough to never run win32_start_child.
RT-Send-CC: perl5-porters [...] perl.org
I am done, I am waiting for comments from the rest of P5P now.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 645b
I've been wondering, has this test always failed? or it only fails on SMP >1 CPU Windows (fails for me on 2 core and 8 core PCs)? Or does it only fail on fast CPUs? or on Server 2003 (my machine) or NT 6 and not XP (does for me) or 2000 because kernel scheduler tweaks by MS over the years. Does it only fail on Visual Studio compiled Perl that uses a VS specific CRT instead of msvcrt.dll (ActivePerl's C lib) (I blindly guess, because a FreeLibrary that lowers the reference count but doesn't free the DLL does so without using the loader lock maybe???). How was 5.16.0 released with this failing supposedly since 5.14 according to tony cook?
CC: perl5-porters [...] perl.org
Subject: Re: [perl #88840] t/op/fork.t blocks in external process test on Win32
Date: Sun, 5 Aug 2012 17:09:20 +1000
To: bulk 88 via RT <perlbug-followup [...] perl.org>
From: Tony Cook <tony [...] develop-help.com>
Download (untitled) / with headers
text/plain 947b
On Fri, Aug 03, 2012 at 12:59:18PM -0700, bulk 88 via RT wrote: Show quoted text
> I've been wondering, has this test always failed? or it only fails on > SMP >1 CPU Windows (fails for me on 2 core and 8 core PCs)? Or does it > only fail on fast CPUs? or on Server 2003 (my machine) or NT 6 and not > XP (does for me) or 2000 because kernel scheduler tweaks by MS over the > years.
This is my suspicion for the cause. Show quoted text
> Does it only fail on Visual Studio compiled Perl that uses a VS > specific CRT instead of msvcrt.dll (ActivePerl's C lib) (I blindly > guess, because a FreeLibrary that lowers the reference count but doesn't > free the DLL does so without using the loader lock maybe???). How was > 5.16.0 released with this failing supposedly since 5.14 according to > tony cook?
Because no-one with sufficient Win32 knowledge cares enough about fixing the problems. Win32 isn't the only platform where blead fails, and has failed for a long time. Tony
CC: perl5-porters [...] perl.org
Subject: Re: [perl #88840] t/op/fork.t blocks in external process test on Win32
Date: Sun, 5 Aug 2012 17:15:08 +1000
To: bulk 88 via RT <perlbug-followup [...] perl.org>
From: Tony Cook <tony [...] develop-help.com>
Download (untitled) / with headers
text/plain 1.1k
On Thu, Aug 02, 2012 at 06:03:33PM -0700, bulk 88 via RT wrote: Show quoted text
> A half baked solution is to not have perlhost.h load a 2nd C std Lib > into the perl OS process, and therefore never need to call FreeLibrary > in ~Vmem destroyer which will freeze the parent interp. Why the loader > lock never times out with a software exception (similar to access vio > GUI wise) IDK enough about windows to say. Any XS library that does a > FreeLibrary or a LoadLibrary, either explicitly or secretly in a pure > C/C++ DLL it wraps, probably even Dynaloader bootstrap will freeze > indefinitely, aslong as you Perl Lang kill 9ed a child fork "process" > earlier in runtime fast enough to never run win32_start_child.
The problem is other libraries (eg. any XS module) loads DLLs at runtime, which means there's always the possibility of the loader lock causing a problem. It doesn't need to be an early kill to cause the lockup. I can see ways to fix this specific instance of the bug - eg. have win32_kill() wait a little for the child hwnd to be created (see the default case in the win32_kill() thread kill code.), but it won't protect against the more general problem that TerminateThread() isn't a safe function to call. Tony
CC: bulk 88 via RT <perlbug-followup [...] perl.org>, perl5-porters [...] perl.org
Subject: Re: [perl #88840] t/op/fork.t blocks in external process test on Win32
Date: Sun, 5 Aug 2012 08:34:39 +0100
To: Tony Cook <tony [...] develop-help.com>
From: Nicholas Clark <nick [...] ccl4.org>
On Sun, Aug 05, 2012 at 05:09:20PM +1000, Tony Cook wrote: Show quoted text
> On Fri, Aug 03, 2012 at 12:59:18PM -0700, bulk 88 via RT wrote:
Show quoted text
> > Does it only fail on Visual Studio compiled Perl that uses a VS > > specific CRT instead of msvcrt.dll (ActivePerl's C lib) (I blindly > > guess, because a FreeLibrary that lowers the reference count but doesn't > > free the DLL does so without using the loader lock maybe???). How was > > 5.16.0 released with this failing supposedly since 5.14 according to > > tony cook?
Because it's not a regression, Ricardo has no way to compel anyone to fix it, and if the choice is between delaying a release indefinitely and making a release with known problems, releasing with known problems is the lesser of two evils. Show quoted text
> Because no-one with sufficient Win32 knowledge cares enough about > fixing the problems.
There actually aren't very many people working on the perl core. (I'd describe it as about 4 and 4 halves worth of very active contributors, and it's been this way for years. This surprises people. Ohloh has graphs, eg https://www.ohloh.net/p/perl/contributors/summary ) Even out into the "long tail" there is very little knowledge of Win32 specifics. As there's also plenty of other stuff that needs doing, pretty much everyone elects to do things they are able to do well, as that gets the best return soonest. So Win32 doesn't get worked on that way. And as there's no hiring organisation in "charge" of Perl 5, it means that it doesn't get worked on the "other" way, by paying people to work on the things that no volunteer volunteers to work on. Show quoted text
> Win32 isn't the only platform where blead fails, and has failed for a > long time.
And whilst we'd love it to be otherwise, it's really not obvious how to get from where we are to where we want to be, short of hoping that someone (or someones) volunteers to start digging into fixing it. There's no hostility to Win32 (or any other platform). It's just that there are no active core developers who use it as a primary development platform. Nicholas Clark
CC: <perl5-porters [...] perl.org>
Subject: RE: [perl #88840] t/op/fork.t blocks in external process test on Win32
Date: Mon, 6 Aug 2012 13:14:15 -0700
To: "'Tony Cook'" <tony [...] develop-help.com>, "'bulk 88 via RT'" <perlbug-followup [...] perl.org>
From: "Jan Dubois" <jand [...] activestate.com>
Download (untitled) / with headers
text/plain 703b
On Sun, 05 Aug 2012, Tony Cook wrote: Show quoted text
> On Fri, Aug 03, 2012 at 12:59:18PM -0700, bulk 88 via RT wrote:
> > I've been wondering, has this test always failed? or it only fails on > > SMP >1 CPU Windows (fails for me on 2 core and 8 core PCs)? Or does it > > only fail on fast CPUs? or on Server 2003 (my machine) or NT 6 and not > > XP (does for me) or 2000 because kernel scheduler tweaks by MS over the > > years.
> > This is my suspicion for the cause.
I suspect it also doesn't happen when you link Perl against MSVCRT.dll because LoadLibrary("MSVCRT.dll") then just increments the reference count and won't hold the loader lock for very long. Both VC6 and MinGW builds use MSVCRT. Cheers, -Jan
CC: <perl5-porters [...] perl.org>
Subject: RE: [perl #88840] t/op/fork.t blocks in external process test on Win32
Date: Mon, 6 Aug 2012 13:25:18 -0700
To: "'Tony Cook'" <tony [...] develop-help.com>, "'bulk 88 via RT'" <perlbug-followup [...] perl.org>
From: "Jan Dubois" <jand [...] activestate.com>
Download (untitled) / with headers
text/plain 1.8k
On Sun, 05 Aug 2012, Tony Cook wrote: Show quoted text
> On Thu, Aug 02, 2012 at 06:03:33PM -0700, bulk 88 via RT wrote:
> > A half baked solution is to not have perlhost.h load a 2nd C std Lib > > into the perl OS process, and therefore never need to call FreeLibrary > > in ~Vmem destroyer which will freeze the parent interp. Why the loader > > lock never times out with a software exception (similar to access vio > > GUI wise) IDK enough about windows to say. Any XS library that does a > > FreeLibrary or a LoadLibrary, either explicitly or secretly in a pure > > C/C++ DLL it wraps, probably even Dynaloader bootstrap will freeze > > indefinitely, aslong as you Perl Lang kill 9ed a child fork "process" > > earlier in runtime fast enough to never run win32_start_child.
I wonder if there isn't a simpler solution: Just add #undef malloc #undef free #undef realloc after all the Perl headers have been #included, and then later assign m_pfree = (LPFREE)free; etc and get rid of the whole LoadLibrary/FreeLibrary stuff. That way VMem will just use the current CRT versions of these functions. I suspect the whole LoadLibrary/FreeLibrary dance is only done because the Perl headers redefine malloc to Perl_malloc, and VMem needs to provide the implementation to Perl_malloc, so can't use the redefined malloc() functions. Just getting rid of the wrapper macros should solve the issue though. There are more chances for code cleanup, but someone who observes the problem should try this "fix" to see if it works first. :) Show quoted text
> [...] It won't > protect against the more general problem that TerminateThread() isn't > a safe function to call.
Yes, the whole fork() emulation stuff is rather brittle and can never cease to be "experimental". I wish we could just get rid of it, but CPAN modules seem to rely on it to use forking regression tests on Windows, so I guess it is at least marginally useful. Cheers, -Jan
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.5k
Another more rare way of deadlocking, child thread was killed while holding malloc/heap lock, on fast kill 9 on psuedo fork process test. Show quoted text
_______________________________________________________ ntdll.dll!_KiFastSystemCallRet@0() ntdll.dll!_ZwWaitForSingleObject@12() + 0xc ntdll.dll!_RtlpWaitForCriticalSection@4() + 0x8c ntdll.dll!_RtlEnterCriticalSection@4() + 0x46 ntdll.dll!_RtlDebugFreeHeap@12() + 0x6a ntdll.dll!_RtlFreeHeapSlowly@12() + 0x246cf ntdll.dll!_RtlFreeHeap@12() + 0x17646 msvcrt.dll!_free() + 0xc3
> perl510.dll!VMem::Free(void * pMem=0x0183780c) Line 230 C++
perl510.dll!PerlMemFree(IPerlMem * piPerl=0x002342a4, void * ptr=0x0183780c) Line 310 C++ perl510.dll!Perl_safesysfree(void * where=0x0183780c) Line 250 + 0xe C perl510.dll!Perl_sv_clear(interpreter * my_perl=0x00235ffc, sv * sv=0x00000006) Line 5260 C perl510.dll!Perl_sv_free2(interpreter * my_perl=0x00235ffc, sv * sv=0x0182bee4) Line 5367 C perl510.dll!Perl_sv_free(interpreter * my_perl=0x00235ffc, sv * sv=0x0182bee4) Line 5345 + 0xa C perl510.dll!Perl_av_undef(interpreter * my_perl=0x00235ffc, av * av=0x0023ad5c) Line 485 + 0xd C perl510.dll!Perl_sv_clear(interpreter * my_perl=0x0183780c, sv * sv=0x0140fa94) Line 5191 + 0x7 C perl510.dll!Perl_sv_2pv_flags(interpreter * my_perl=0x00000ff0, sv * sv=0x00081414, unsigned int * lp=0x0000007c, long flags=4080) Line 2776 C 0000007b()
_________________________________________________ Symbols files werent perfect for this run but it shows of the point of deadlock on heap lock. I will have more comments in the next day or 2 on this bug.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 183b
https://rt.cpan.org/Ticket/Display.html?id=66016 one of these is a duplicate of the other, or atleast they are related. Someone link them in RT. I only noticed the older bug just now.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 253b
On Mon Aug 06 21:15:09 2012, bulk88 wrote: Show quoted text
> https://rt.cpan.org/Ticket/Display.html?id=66016 one of these is a > duplicate of the other, or atleast they are related. Someone link them > in RT. I only noticed the older bug just now.
NVM. Different RTs.
Subject: [PATCH] t/op/fork.t blocks in external process test on Win32
Download (untitled) / with headers
text/plain 213b
Wrote up a patch. fork.t/"fast kill 9 after fork" stopped deadlocking for me. I wonder if the 5 ms delay is enough (I really dont know). I came up with the 5 ms delay from the previous code that did 5 "Sleep(0)"s.
Download commit-2d681a6
application/octet-stream 6.1k

Message body not shown because it is not plain text.

RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 333b
On Wed Aug 08 17:59:27 2012, bulk88 wrote: Show quoted text
> Wrote up a patch. fork.t/"fast kill 9 after fork" stopped deadlocking > for me. I wonder if the 5 ms delay is enough (I really dont know). I > came up with the 5 ms delay from the previous code that did 5 "Sleep(0)"s.
My last post made it to Perl RT but not P5P. This one hopefully will.
CC: perl5-porters [...] perl.org
Subject: Re: [perl #88840] [PATCH] t/op/fork.t blocks in external process test on Win32
Date: Fri, 10 Aug 2012 17:32:35 +1000
To: bulk 88 via RT <perlbug-followup [...] perl.org>
From: Tony Cook <tony [...] develop-help.com>
Download (untitled) / with headers
text/plain 539b
On Wed, Aug 08, 2012 at 05:59:28PM -0700, bulk 88 via RT wrote: Show quoted text
> Wrote up a patch. fork.t/"fast kill 9 after fork" stopped deadlocking > for me. I wonder if the 5 ms delay is enough (I really dont know). I > came up with the 5 ms delay from the previous code that did 5 "Sleep(0)"s.
I tried this and ran t/op/fork.t in a loop. First time it deadlocked after 7 runs, second it deadlocked after 27 runs. I increased delay to 10 ms and it successfully ran the test 500 times. Also, please use git format-patch to produce patches. Tony
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 4.2k
On Fri Aug 10 00:33:07 2012, tonyc wrote: Show quoted text
> I tried this and ran t/op/fork.t in a loop. > > First time it deadlocked after 7 runs, second it deadlocked after 27 > runs. > > I increased delay to 10 ms and it successfully ran the test 500 times. > > Also, please use git format-patch to produce patches. > > Tony >
I wrote a script Show quoted text
_______________________________ for(0..1000) { system $^X, "-e", "if (\$pid=fork){sleep 1;kill(9, \$pid)} else {sleep 5}"; } print "done\n";
_______________________________ Ran it twice on a DEBUGGING 5.17 with my patch, each one succeeded, no deadlock and no timeout message (this machine did previously deadlock as I described above). Did it deadlock (no message) for you or it hit the croak I put in? Are you using a DEBUGGING perl or not (according to smokes, it only affects DEBUGGING builds, http://www.nntp.perl.org/group/perl.daily-build.reports/2012/08/msg125162.html )? Did you change the ms time to 10 (so 10 x Sleep(1)s happened), or you put in a Sleep(10)? If it deadlocked for you, I am very interested in that, since my patch is supposed to die or succeed, not deadlock. The only thing I can think of, unless you can produce a callstack that shows otherwise (that would be greatly appreciated), you are probably deadlocking on the heap critical section (but how timing wise? child interp is supposed to be in a sleep 5;, is it really taking that long from the hwnd message send, thus unlocking get_hwnd_delay to runops calling the sleep opcode function?), since the DLL loader lock is guaranteed to not be on in the child interp unless your doing XS. In child interp's VMem, the LoadLibrary on msvcrt.dll is done in the parent interp thread and it does use the loader lock even though msvcrt.dll is already in the process from the parent interp's VMem. Why MS uses the loader lock on a refcount increase only LoadLibrary IDK.
____________________________________________________________
> ntdll.dll!RtlEnterCriticalSection() + 0xa bytes
ntdll.dll!LdrLockLoaderLock() + 0x78 bytes ntdll.dll!LdrLoadDll() + 0xeb bytes kernel32.dll!LoadLibraryExW() + 0x120 bytes kernel32.dll!LoadLibraryA() + 0x86 bytes perl517.dll!VMem::VMem() Line 142 + 0xd bytes C++ perl517.dll!CPerlHost::CPerlHost() Line 1949 + 0x21 bytes C++ perl517.dll!perl_alloc() Line 179 + 0x21 bytes C++ perl517.dll!RunPerl(int argc=0x00000003, char * * argv=0x0000000000343d90, char * * env=0x0000000000344820) Line 248 + 0x5 bytes C++ perl.exe!main(int argc=0x00000003, char * * argv=0x0000000000343d90, char * * env=0x0000000000344820) Line 24 C perl.exe!__tmainCRTStartup() Line 586 + 0x19 bytes C kernel32.dll!BaseProcessStart() + 0x2c bytes
________________________________________________________ The FreeLibrary of the child interp's VMem happens in the child thread, unlike the LoadLibrary of the child interp (" perl -e "if ($pid=fork){waitpid($pid,0);} else {sleep 1;}"") but this code is never reached in a early kill 9 after fork scenario. I stepped through the FreeLibrary asm, it does not get/use the loader lock unless refcount reaches 0 and instead uses inlined Interlocked* asm ops to decrease refcount. So even if get_hwnd_delay took 5 clock seconds (a better question is why? Perl's fault or user's fault?) (the Perl sleep period in the child fork), since FreeLibrary doesn't get the loader lock, a TerminateThread wouldn't deadlock there, unless, TerminateThread was called after the win32_start_child "exited" and was in DLL_THREAD_DETACH which will have the loader lock. I haven't seen this scenario happen on my machine.
________________________________________________________
> kernel32.dll!FreeLibrary()
perl517.dll!VMem::~VMem() Line 159 C++ perl517.dll!VMem::`scalar deleting destructor'() + 0x17 bytes C++ perl517.dll!VMem::Release() Line 268 + 0x2b bytes C++ perl517.dll!CPerlHost::~CPerlHost() Line 2081 C++ perl517.dll!CPerlHost::`scalar deleting destructor'() + 0x17 bytes C++ perl517.dll!win32_delete_internal_host(void * h=0x000000000034aae0) Line 202 + 0x2b bytes C++ perl517.dll!perl_free(interpreter * my_perl=0x0000000001af8888) Line 1335 C perl517.dll!win32_start_child(void * arg=0x0000000001af8888) Line 1804 C++ kernel32.dll!BaseThreadStart() + 0x3a bytes
________________________________________________________
From 2d681a6d132537b5febb79b96d5169b822b510cc Mon Sep 17 00:00:00 2001 From: Daniel Dragan <bulk88@hotmail.com> Date: Wed, 8 Aug 2012 20:46:31 -0400 Subject: [PATCH] fix RT#88840, don't terminate a child fork psuedo process in DLL Loader Lock TerminateThread will terminate a thread but leaks all resources of that thread, and all locks will never be released, as documented in MSDN. There is no alternative to locks not being released that I see, but atleast -e "if ($pid=fork){kill(9,$pid)} else {sleep 5}" in fork.t won't deadlock with this patch since win32_start_child be reached before TerminateThread happens. The 5 ms timeout can be increased if problems arise in the future. The HWND of the child is delivered by win32_start_child very early, before any perl bytecode is executed, therefore the delay is keeping in spirit with a kill 9. In any case, if the child thread fails to schedule, (a DllMain in DLL_THREAD_ATTACH of some DLL in the process deadlocks or does very long (over 5 ms right now) sync IO), the parent interp will bail out. --- win32/win32.c | 88 ++++++++++++++++++++++++++++++++++++++------------------ 1 files changed, 60 insertions(+), 28 deletions(-) diff --git a/win32/win32.c b/win32/win32.c index 211ca6f..54159ca 100644 --- a/win32/win32.c +++ b/win32/win32.c @@ -1271,6 +1271,46 @@ my_kill(int pid, int sig) return retval; } + + +#ifdef USE_ITHREADS +/* + will get a child psuedo process hwnd, with retrying and sleeping + success is hwnd != INVALID_HANDLE_VALUE, so NULL HWND can be returned + ms is milliseconds to sleep/tries, each try is 1 millisec, fatally + errors if child psuedo process doesn't schedule and deliver a HWND in the + time period specified, 0 milliseconds causes only Sleep(0) to be used + with "no" OS delay being given to the calling thread, 0 msis not recommended +*/ +static HWND +get_hwnd_delay(pTHX, long child, DWORD ms){ + HWND hwnd = w32_pseudo_child_message_hwnds[child]; +/* pseudo-process has not yet properly initialized if hwnd isn't set */ + if (hwnd != INVALID_HANDLE_VALUE) return hwnd; +/*fast sleep, on some NT Kernels/systems, a Sleep(0) won't deschedule a +thread 100% of the time since threads are sticked to a CPU for NUMA +and caching reasons, and the child thread was stickied to a different CPU +therefore there is no workload on that CPU, and Sleep(0) returns control +without yielding the time slot +https://rt.perl.org/rt3/Ticket/Display.html?id=88840 +*/ + Sleep(0); + win32_async_check(aTHX); + hwnd = w32_pseudo_child_message_hwnds[child]; + if (hwnd != INVALID_HANDLE_VALUE) return hwnd; + { + int count = 0; + while (count++ < ms){ /*ms=0 no Sleep(1),just fail by now*/ + Sleep(1); + win32_async_check(aTHX); + hwnd = w32_pseudo_child_message_hwnds[child]; + if (hwnd != INVALID_HANDLE_VALUE) return hwnd; + } + } + Perl_croak(aTHX_ "panic: child psuedo process was never scheduled"); +} +#endif + DllExport int win32_kill(int pid, int sig) { @@ -1281,15 +1321,16 @@ win32_kill(int pid, int sig) /* it is a pseudo-forked child */ child = find_pseudo_pid(-pid); if (child >= 0) { - HWND hwnd = w32_pseudo_child_message_hwnds[child]; HANDLE hProcess = w32_pseudo_child_handles[child]; switch (sig) { case 0: /* "Does process exist?" use of kill */ return 0; - case 9: + case 9: { /* kill -9 style un-graceful exit */ +/*do a wait to make sure child starts and isnt in DLL Loader Lock*/ + HWND hwnd = get_hwnd_delay(aTHX, child, 5);/*XXX change delay*/ if (TerminateThread(hProcess, sig)) { /* Allow the scheduler to finish cleaning up the other thread. * Otherwise, if we ExitProcess() before another context switch @@ -1301,36 +1342,27 @@ win32_kill(int pid, int sig) remove_dead_pseudo_process(child); return 0; } + } break; default: { - int count = 0; - /* pseudo-process has not yet properly initialized if hwnd isn't set */ - while (hwnd == INVALID_HANDLE_VALUE && count < 5) { - /* Yield and wait for the other thread to send us its message_hwnd */ - Sleep(0); - win32_async_check(aTHX); - hwnd = w32_pseudo_child_message_hwnds[child]; - ++count; - } - if (hwnd != INVALID_HANDLE_VALUE) { - /* We fake signals to pseudo-processes using Win32 - * message queue. In Win9X the pids are negative already. */ - if ((hwnd != NULL && PostMessage(hwnd, WM_USER_KILL, sig, 0)) || - PostThreadMessage(-pid, WM_USER_KILL, sig, 0)) - { - /* Don't wait for child process to terminate after we send a SIGTERM - * because the child may be blocked in a system call and never receive - * the signal. - */ - if (sig == SIGTERM) { - Sleep(0); - w32_pseudo_child_sigterm[child] = 1; - } - /* It might be us ... */ - PERL_ASYNC_CHECK(); - return 0; + HWND hwnd = get_hwnd_delay(aTHX, child, 5);/*XXX change delay*/ + /* We fake signals to pseudo-processes using Win32 + * message queue. In Win9X the pids are negative already. */ + if ((hwnd != NULL && PostMessage(hwnd, WM_USER_KILL, sig, 0)) || + PostThreadMessage(-pid, WM_USER_KILL, sig, 0)) + { + /* Don't wait for child process to terminate after we send a SIGTERM + * because the child may be blocked in a system call and never receive + * the signal. + */ + if (sig == SIGTERM) { + Sleep(0); + w32_pseudo_child_sigterm[child] = 1; } + /* It might be us ... */ + PERL_ASYNC_CHECK(); + return 0; } break; } -- 1.7.9.msysgit.0
CC: perl5-porters [...] perl.org
Subject: Re: [perl #88840] t/op/fork.t blocks in external process test on Win32
Date: Sun, 12 Aug 2012 12:48:51 +1000
To: bulk 88 via RT <perlbug-followup [...] perl.org>
From: Tony Cook <tony [...] develop-help.com>
Download (untitled) / with headers
text/plain 1.4k
On Fri, Aug 10, 2012 at 09:42:22AM -0700, bulk 88 via RT wrote: Show quoted text
> On Fri Aug 10 00:33:07 2012, tonyc wrote:
> > I tried this and ran t/op/fork.t in a loop. > > > > First time it deadlocked after 7 runs, second it deadlocked after 27 > > runs. > > > > I increased delay to 10 ms and it successfully ran the test 500 times. > > > > Also, please use git format-patch to produce patches. > > > > Tony > >
> I wrote a script
Show quoted text
> If it deadlocked for you, I am very interested in that, since my patch > is supposed to die or succeed, not deadlock. The only thing I can think > of, unless you can produce a callstack that shows otherwise (that would > be greatly appreciated), you are probably deadlocking on the heap > critical section (but how timing wise? child interp is supposed to be in > a sleep 5;, is it really taking that long from the hwnd message send, > thus unlocking get_hwnd_delay to runops calling the sleep opcode > function?), since the DLL loader lock is guaranteed to not be on in the > child interp unless your doing XS. In child interp's VMem, the > LoadLibrary on msvcrt.dll is done in the parent interp thread and it > does use the loader lock even though msvcrt.dll is already in the > process from the parent interp's VMem. Why MS uses the loader lock on a > refcount increase only LoadLibrary IDK.
I rebuilt from scratch and ran my test again, and couldn't reproduce the failures. Sorry for the noise. So I'm happy with this change. Tony
RT-Send-CC: perl5-porters [...] perl.org
Can someone apply the patch to perl git?
CC: <perl5-porters [...] perl.org>
Subject: RE: [perl #88840] t/op/fork.t blocks in external process test on Win32
Date: Tue, 14 Aug 2012 09:14:54 +0100
To: <perlbug-followup [...] perl.org>, "OtherRecipients of perl Ticket #88840" <"OtherRecipients of perl Ticket #88840:;" [...] planit.com>
From: "Steve Hay" <Steve.Hay [...] verosoftware.com>
Download (untitled) / with headers
text/plain 308b
bulk 88 via RT wrote on 2012-08-14: Show quoted text
> Can someone apply the patch to perl git? > > --- > via perlbug: queue: perl5 status: open > https://rt.perl.org:443/rt3/Ticket/Display.html?id=88840
Apologies for the delay. I will test it and expect to apply it in a few hours. Thanks for your contributions!
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 430b
On Tue Aug 14 01:15:45 2012, Steve.Hay@verosoftware.com wrote: Show quoted text
> bulk 88 via RT wrote on 2012-08-14:
> > Can someone apply the patch to
> perl git?
> > > > --- > > via perlbug: queue: perl5 status: open > >
> https://rt.perl.org:443/rt3/Ticket/Display.html?id=88840 > > Apologies > for the delay. I will test it and expect to apply it in a few hours. > Thanks for your contributions!
Thanks again, now committed in d903973c05.


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org