Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

threads sort crashes with sort subroutine (but not with sort block) #7372

Closed
p5pRT opened this issue Jun 17, 2004 · 14 comments
Closed

threads sort crashes with sort subroutine (but not with sort block) #7372

p5pRT opened this issue Jun 17, 2004 · 14 comments

Comments

@p5pRT
Copy link

p5pRT commented Jun 17, 2004

Migrated from rt.perl.org#30333 (status was 'resolved')

Searchable as RT30333$

@p5pRT
Copy link
Author

p5pRT commented Jun 17, 2004

From ajsavige@yahoo.com.au

The test program below demonstrates a threads-related crash on
both Linux and Windows. The crashes only seem to occur when
sorting using an explicit subroutine name (not when using
a sort block). The crashes are somewhat intermittent, but
will generally occur when you run the test program below
(at least, that's what I see).

There has been some discussion of this interesting bug at​:
http​://www.perlmonks.org/index.pl?node_id=367162

-----------snip------------snip------------snip-----------------------
#!/usr/bin/perl -w

use strict;
use threads;
select(STDERR);$|=1;select(STDOUT);$|=1;

sub mycmp { length($b) <=> length($a) }

sub do_one_thread {
  my $kid = shift;
  warn "kid $kid before sort\n";
  my @​list = ( 'x', 'yy', 'zzz', 'a', 'bb', 'ccc', 'aaaaa', 'z',
  'hello', 's', 'thisisalongname', '1', '2', '3',
  'abc', 'xyz', '1234567890', 'm', 'n', 'p' );
  for my $j (1..99999) {
  # This does not crash
  # for my $k (sort { length($b) <=> length($a) } @​list) {}
  # Yet this does crash
  for my $k (sort mycmp @​list) {}
  # This does not crash
  # sort mycmp @​list;
  # This does crash
  # my @​sl = sort mycmp @​list;
  }
  warn "kid $kid after sort, sleeping 1\n";
  sleep(1);
  warn "kid $kid exit\n";
}

sub do_threads {
  my $nthreads = shift;
  my @​kids = ();
  for my $i (1..$nthreads) {
  my $t = threads->new(\&do_one_thread, $i);
  warn "parent $$​: continue\n";
  push(@​kids, $t);
  }
  for my $t (@​kids) {
  warn "parent $$​: waiting for join\n";
  $t->join();
  warn "parent $$​: thread exited\n";
  }
}

# do_threads(1); # does not crash
do_threads(2); # crashes
-----------snip------------snip------------snip-----------------------

/-\


Flags​:
  category=
  severity=


Site configuration information for perl v5.8.4​:

Configured by andrews at Sun Jun 6 11​:34​:57 2004.

Summary of my perl5 (revision 5 version 8 subversion 4) configuration​:
  Platform​:
  osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread
  uname=''
  config_args='undef'
  hint=recommended, useposix=true, d_sigaction=undef
  usethreads=undef use5005threads=undef useithreads=define
usemultiplicity=define
  useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
  use64bitint=undef use64bitall=undef uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cl', ccflags ='-nologo -Gf -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE
-DNO_STRICT -DHAVE_DES_FCRYPT -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS
-DUSE_PERLIO -DPERL_MSVCRT_READFIX',
  optimize='-MD -Zi -DNDEBUG -O1',
  cppflags='-DWIN32'
  ccversion='', gccversion='', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64',
lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='link', ldflags ='-nologo -nodefaultlib -debug -opt​:ref,icf
-libpath​:"c​:\vperl\lib\CORE" -machine​:x86'
  libpth="C​:\Program Files\Microsoft Visual Studio\VC98\lib"
  libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib
uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib
msvcrt.lib
  perllibs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib
uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib
msvcrt.lib
  libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
  gnulibc_version='undef'
  Dynamic Linking​:
  dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
  cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt​:ref,icf
-libpath​:"c​:\vperl\lib\CORE" -machine​:x86'

Locally applied patches​:


@​INC for perl v5.8.4​:
  C​:/vperl/lib
  C​:/vperl/site/lib
  .


Environment for perl v5.8.4​:
  HOME (unset)
  LANG (unset)
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=C​:\vperl\bin;C​:\tools\bin;C​:\Program Files\Microsoft Visual
Studio\common\msdev98\BIN;C​:\Program Files\Microsoft Visual
Studio\VC98\BIN;C​:\Program Files\Microsoft Visual
Studio\common\TOOLS\WINNT;C​:\Program Files\Microsoft Visual
Studio\common\TOOLS;C​:\windows\system32;C​:\windows;C​:\windows\system32\WBEM
  PERL_BADLANG (unset)
  SHELL (unset)

Find local movie times and trailers on Yahoo! Movies.
http​://au.movies.yahoo.com

@p5pRT
Copy link
Author

p5pRT commented Jun 25, 2004

From ajsavige@yahoo.com.au

--- Andrew Savige <perlbug-followup@​perl.org> wrote​:

The test program below demonstrates a threads-related crash on
both Linux and Windows. The crashes only seem to occur when
sorting using an explicit subroutine name (not when using
a sort block). The crashes are somewhat intermittent, but
will generally occur when you run the test program below
(at least, that's what I see).

There has been some discussion of this interesting bug at​:
http​://www.perlmonks.org/index.pl?node_id=367162

-----------snip------------snip------------snip-----------------------
#!/usr/bin/perl -w

use strict;
use threads;
select(STDERR);$|=1;select(STDOUT);$|=1;

sub mycmp { length($b) <=> length($a) }

sub do_one_thread {
my $kid = shift;
warn "kid $kid before sort\n";
my @​list = ( 'x', 'yy', 'zzz', 'a', 'bb', 'ccc', 'aaaaa', 'z',
'hello', 's', 'thisisalongname', '1', '2', '3',
'abc', 'xyz', '1234567890', 'm', 'n', 'p' );
for my $j (1..99999) {
# This does not crash
# for my $k (sort { length($b) <=> length($a) } @​list) {}
# Yet this does crash
for my $k (sort mycmp @​list) {}
# This does not crash
# sort mycmp @​list;
# This does crash
# my @​sl = sort mycmp @​list;
}
warn "kid $kid after sort, sleeping 1\n";
sleep(1);
warn "kid $kid exit\n";
}

sub do_threads {
my $nthreads = shift;
my @​kids = ();
for my $i (1..$nthreads) {
my $t = threads->new(\&do_one_thread, $i);
warn "parent $$​: continue\n";
push(@​kids, $t);
}
for my $t (@​kids) {
warn "parent $$​: waiting for join\n";
$t->join();
warn "parent $$​: thread exited\n";
}
}

# do_threads(1); # does not crash
do_threads(2); # crashes
-----------snip------------snip------------snip-----------------------

I have analysed some sort crashes and have some more information.

As will soon become obvious, the analysis below is written from the
point of view of a perl internals novice and I have no idea how to
fix this. :-(

This is a typical crash​:

1) Thread 1 is inside CALLRUNOPS (called from sortcv in pp_sort.c).
  i.e. it is executing the OPs of the mycmp subroutine.

2) Thread 2 does a POPBLOCK/LEAVE scope as is exits pp_sort.
  i.e. it has just finished a sort.

3) Thread 1 now executes the final leavesub OP inside CALLRUNOPS,
  but instead of calling pp_null (as it usually does) it instead
  calls pp_leavesub, resulting in my_perl->Tfirstgv being set to
  zero [inside Perl_leave_scope (case​: SAVEt_SPTR)].

4) Thread 1 happily exits sortcv, but when it reenters it, there
  is trouble because Tfirstgv is still zero. So when it tries to
  execute the first statement of sortcv, namely​:
  GvSV(PL_firstgv) = a;
  which, after macro expansion, is​:
  ((((XPVGV*)((my_perl->Tfirstgv))->sv_any)->xgv_gp)->gp_sv) = a;
  it crashes because my_perl->Tfirstgv is zero.

Point 3) above is a mystery to me, and I hope someone can shed some
light on it. I might be going mad, but stepping thru in the debugger
in the non-crashing cases, OP 167 (leavesub) did not call pp_leavesub
as I expected, but instead called pp_null!

This is just a guess really, but pp_sort lines 1472/1473 look odd​:

  SAVEVPTR(CvROOT(cv)->op_ppaddr);
  CvROOT(cv)->op_ppaddr = PL_ppaddr[OP_NULL];

I am guessing that these two lines set ppaddr of the leavesub OP to
the address of pp_null -- and when thread 2 leaves pp_sort, the
ppaddr of the leavesub OP reverts to pp_leavesub (??).

In case you're interested, the original place where this crash was
causing me grief was a multi-threaded LWP program, AFAICT because
of this line​:
  foreach $key (sort _header_cmp keys %$self) {
in HTTP​::Headers.

/-\

Find local movie times and trailers on Yahoo! Movies.
http​://au.movies.yahoo.com

@p5pRT
Copy link
Author

p5pRT commented Jun 28, 2004

From @iabyn

I've traced the location of this threads/sort segfault to some code
in pp_sort​:

  SAVEVPTR(CvROOT(cv)->op_ppaddr);
  CvROOT(cv)->op_ppaddr = PL_ppaddr[OP_NULL];

This is deeply, deeply, not thread-safe. It's supposed to turn the
leavesub op temporarily into a null op​: the sort sub is invoked via a
lightweight mechanism that doesn't require a leavesub at the end; but
since the op tree is shared between threads, the other thread may have
already done the same and then restored it, leading to leavesub being
erroneouly called, and general corruption ensuing.

However, it's too late at night for me to work out how to fix this yet.

Dave.

--
Justice is when you get what you deserve.
Law is when you get what you pay for.

@p5pRT
Copy link
Author

p5pRT commented Jun 28, 2004

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jun 28, 2004

From @rgs

Dave Mitchell wrote​:

I've traced the location of this threads/sort segfault to some code
in pp_sort​:

    SAVEVPTR\(CvROOT\(cv\)\->op\_ppaddr\);
    CvROOT\(cv\)\->op\_ppaddr = PL\_ppaddr\[OP\_NULL\];

This is deeply, deeply, not thread-safe. It's supposed to turn the
leavesub op temporarily into a null op​: the sort sub is invoked via a
lightweight mechanism that doesn't require a leavesub at the end; but
since the op tree is shared between threads, the other thread may have
already done the same and then restored it, leading to leavesub being
erroneouly called, and general corruption ensuing.

However, it's too late at night for me to work out how to fix this yet.

That's a difficult problem because :
  * we must not make leavesub any slower, this is not an option
  * we must not modify an optree (even in non-threaded perls this can
  be unsafe)
  * we don't know at compile-time whether a particular sub will be used
  as a sort subroutine, thus we can't compile them with particular
  hints, flags or ops
  * we don't want to make sort noticeably slower
I see two options so far :
  * cloning the optree of the sort sub (heavy)
  * dropping the leavesub optimisation (IIUC it's only used for _named_
  sort subroutines) (easier, slower)
Better ideas?

@p5pRT
Copy link
Author

p5pRT commented Jun 29, 2004

From ajsavige@yahoo.com.au

I've just done a full search of my perl install (perl-5.8.4 plus quite
a few CPAN modules). The only code I found that actually uses sort
subroutines is​:
HTTP/Headers.pm (sort _header_cmp) -- I just raised RT cpan #6781 for this
Test/More.pm (sort _bogus_sort) -- I just raised RT cpan #6782 for this
For both of these, it seems easy to work around this bug by inlining the
sort subroutine as a sort block.

Luckily, it seems that sort subroutines are rarely used.

/-\

Find local movie times and trailers on Yahoo! Movies.
http​://au.movies.yahoo.com

@p5pRT
Copy link
Author

p5pRT commented Jun 29, 2004

From @rgs

Tassilo von Parseval wrote​:

FWIW, I have one module that exports sort-functions from a C library as
XSUBS (it uses the prototyped sort subroutines when available). I am not
sure though whether this leavesub optimization kicks in for XSUBS as
well.

Apparently not : from pp_sort.c,

  if (is_xsub)
  PL_sortcop = (OP*)cv;
  else {
  PL_sortcop = CvSTART(cv);
  SAVEVPTR(CvROOT(cv)->op_ppaddr);
  CvROOT(cv)->op_ppaddr = PL_ppaddr[OP_NULL];

  PAD_SET_CUR(CvPADLIST(cv), 1);
  }

@p5pRT
Copy link
Author

p5pRT commented Jun 29, 2004

From tassilo.parseval@post.rwth-aachen.de

On Tue, Jun 29, 2004 at 01​:38​:54PM +1000 Andrew Savige wrote​:

I've just done a full search of my perl install (perl-5.8.4 plus quite
a few CPAN modules). The only code I found that actually uses sort
subroutines is​:
HTTP/Headers.pm (sort _header_cmp) -- I just raised RT cpan #6781 for this
Test/More.pm (sort _bogus_sort) -- I just raised RT cpan #6782 for this
For both of these, it seems easy to work around this bug by inlining the
sort subroutine as a sort block.

Luckily, it seems that sort subroutines are rarely used.

FWIW, I have one module that exports sort-functions from a C library as
XSUBS (it uses the prototyped sort subroutines when available). I am not
sure though whether this leavesub optimization kicks in for XSUBS as
well.

Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@​"tnirp}3..0}_$;//​::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?&lt;=sub).+q#q!'"qq.\t$&amp;."'!#+sexisexiixesixeseg;y~\n~~dddd;eval

@p5pRT
Copy link
Author

p5pRT commented Jun 29, 2004

From ajsavige@yahoo.com.au

Tassilo von Parseval wrote​:

FWIW, I have one module that exports sort-functions from a C library as
XSUBS (it uses the prototyped sort subroutines when available). I am not
sure though whether this leavesub optimization kicks in for XSUBS as
well.

I think you're OK. I've tested four cases (with 2 threads)​:
  1) sort mycmp @​list; # where mycmp is a Perl subroutine
  2) sort mycmp @​list; # Like 1) but mycmp has $$ prototype
  3) sort mycmp @​list; # where mycmp is a XSUB
  4) sort { block } @​list;
Cases 1) and 2) above both pass thru the nasty​:
  SAVEVPTR(CvROOT(cv)->op_ppaddr);
  CvROOT(cv)->op_ppaddr = PL_ppaddr[OP_NULL];
code path. Cases 3) and 4) do not.

Cases 3 and 4 never crash for me. Case 1 always crashes for me.
Case 2 crashes and/or gives this nasty error​:
  NULL OP IN RUN
  thread failed to start​: Sort subroutine didn't return single value

When the smoke has cleared, I'd be happy to contribute a test case.

/-\

Find local movie times and trailers on Yahoo! Movies.
http​://au.movies.yahoo.com

@p5pRT
Copy link
Author

p5pRT commented Jun 29, 2004

From tassilo.parseval@post.rwth-aachen.de

On Tue, Jun 29, 2004 at 09​:39​:25AM +0200 Rafael Garcia-Suarez wrote​:

Tassilo von Parseval wrote​:

FWIW, I have one module that exports sort-functions from a C library as
XSUBS (it uses the prototyped sort subroutines when available). I am not
sure though whether this leavesub optimization kicks in for XSUBS as
well.

Apparently not : from pp_sort.c,

    if \(is\_xsub\)
    PL\_sortcop = \(OP\*\)cv;
    else \{
    PL\_sortcop = CvSTART\(cv\);
    SAVEVPTR\(CvROOT\(cv\)\->op\_ppaddr\);
    CvROOT\(cv\)\->op\_ppaddr = PL\_ppaddr\[OP\_NULL\];

    PAD\_SET\_CUR\(CvPADLIST\(cv\)\, 1\);
        \}

Oh, good. In this case you have my full blessing to do any
un-optimization necessary to fix the problem. :-)

Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@​"tnirp}3..0}_$;//​::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?&lt;=sub).+q#q!'"qq.\t$&amp;."'!#+sexisexiixesixeseg;y~\n~~dddd;eval

@p5pRT
Copy link
Author

p5pRT commented Jun 30, 2004

From @iabyn

On Mon, Jun 28, 2004 at 09​:32​:23PM +0200, Rafael Garcia-Suarez wrote​:

Dave Mitchell wrote​:

I've traced the location of this threads/sort segfault to some code
in pp_sort​:

    SAVEVPTR\(CvROOT\(cv\)\->op\_ppaddr\);
    CvROOT\(cv\)\->op\_ppaddr = PL\_ppaddr\[OP\_NULL\];

This is deeply, deeply, not thread-safe. It's supposed to turn the
leavesub op temporarily into a null op​: the sort sub is invoked via a
lightweight mechanism that doesn't require a leavesub at the end; but
since the op tree is shared between threads, the other thread may have
already done the same and then restored it, leading to leavesub being
erroneouly called, and general corruption ensuing.

However, it's too late at night for me to work out how to fix this yet.

That's a difficult problem because :
* we must not make leavesub any slower, this is not an option

Hmm, I just tried fixing it in leavesub (see the untested diff below), and
with the following code got these timings (run several times)​:

  use Time​::HiRes qw( usleep ualarm gettimeofday tv_interval );

  sub f {};

  $t = [gettimeofday];
  for (1..50000){
  f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f();
  f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f();
  f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f();
  f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f();
  f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f();
  f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f();
  f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f();
  f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f();
  f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f();
  f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f();
  f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f();
  f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f();
  }
  $elasped = tv_interval ($t);
  printf "%2.5f\n", $elasped;

pre-patch​:

8.57977
8.80304
9.21903
8.65993
8.84125
8.57947
9.34966

post-patch​:

8.78078
8.58703
8.54905
8.56382
8.59135
8.79712
8.67267

from I which conclude that fiddling with leavesub speeds it up. Don't you
just love benchmarks!

Dave.

--
Counsellor Troi states something other than the blindingly obvious.
  -- Things That Never Happen in "Star Trek" #16

Inline Patch
--- ../22960.ORIG/pp_hot.c	Sat Jun 19 10:59:42 2004
+++ pp_hot.c	Wed Jun 30 01:42:26 2004
@@ -2344,6 +2344,8 @@ PP(pp_leavesub)
     register PERL_CONTEXT *cx;
     SV *sv;
 
+    if (PL_curstackinfo->si_type == PERLSI_SORT && cxstack_ix == PL_sortcxix)
+	return Nullop;
     POPBLOCK(cx,newpm);
     cxstack_ix++; /* temporarily protect top context */
 
--- ../22960.ORIG/pp_sort.c	Sat Jun 19 10:59:59 2004
+++ pp_sort.c	Wed Jun 30 01:42:50 2004
@@ -1469,8 +1469,8 @@ PP(pp_sort)
 		PL_sortcop = (OP*)cv;
 	    else {
 		PL_sortcop = CvSTART(cv);
-		SAVEVPTR(CvROOT(cv)->op_ppaddr);
-		CvROOT(cv)->op_ppaddr = PL_ppaddr[OP_NULL];
+		/* XXX SAVEVPTR(CvROOT(cv)->op_ppaddr);
+		CvROOT(cv)->op_ppaddr = PL_ppaddr[OP_NULL]; */
 
 		PAD_SET_CUR(CvPADLIST(cv), 1);
             }

@p5pRT
Copy link
Author

p5pRT commented Jun 30, 2004

From @rgs

Dave Mitchell wrote​:

Hmm, I just tried fixing it in leavesub (see the untested diff below), and
with the following code got these timings (run several times)​:
...
from I which conclude that fiddling with leavesub speeds it up. Don't you
just love benchmarks!

Funny thing, I have opposite results (8.31 average timing without the
patch, 8.35 with).

@p5pRT
Copy link
Author

p5pRT commented Sep 27, 2010

From @cpansprout

On Sun Jun 27 17​:58​:55 2004, davem@​iabyn.com wrote​:

I've traced the location of this threads/sort segfault to some code
in pp_sort​:

    SAVEVPTR\(CvROOT\(cv\)\->op\_ppaddr\);
    CvROOT\(cv\)\->op\_ppaddr = PL\_ppaddr\[OP\_NULL\];

This is deeply, deeply, not thread-safe. It's supposed to turn the
leavesub op temporarily into a null op​: the sort sub is invoked via a
lightweight mechanism that doesn't require a leavesub at the end; but
since the op tree is shared between threads, the other thread may have
already done the same and then restored it, leading to leavesub being
erroneouly called, and general corruption ensuing.

However, it's too late at night for me to work out how to fix this yet.

Dave.

I cannot reproduced this in 5.10.0, probably because commit
9850bf2 removed those lines.

@p5pRT
Copy link
Author

p5pRT commented Sep 27, 2010

@cpansprout - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant