Segmentation fault with "thread" version of perl 5.8.3 #7218

Closed
p5pRT opened this issue Apr 8, 2004 · 19 comments

Comments

@p5pRT

p5pRT commented Apr 8, 2004

Migrated from rt.perl.org#28369 (status was 'resolved')

Searchable as RT28369$

@p5pRT
Author

p5pRT commented Apr 8, 2004

From nog@MPA-Garching.MPG.DE

Created by nog@mpa-garching.mpg.de

My name is Norbert Gruener. I am owner and maintainer of the Perl XS
module AFS.

Recently I got two reports that my module is crashing under "Debian
unstable" while it is running under "Debian stable". I could isolate
the problem to the following situation. When I am using the
"threaded" version

  "perl, v5.8.3 built for i686-linux-thread-multi"

then my module crashes. Whereas if I am using the version without
"threading"

  "perl, v5.8.3 built for i686-linux"

everything is working and no segmentation fault shows up.

I don't know if this is of any interest for you or if you are saying
"threads are not recommended, so forget it"?

If you are interested in this case I can supply you with details of
this problem.

Cheers,

Norbert

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.3:

Configured by nog at Thu Apr  8 12:06:32 CEST 2004.

Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration:
  Platform:
    osname=linux, osvers=2.4.25, archname=i686-linux-thread-multi
    uname='linux ncf-15 2.4.25 #1 smp thu feb 19 13:14:53 cet 2004 i686 unknown '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O3',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='3.3.1', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.3:
    /tmp/local/lib/perl5/5.8.3/i686-linux-thread-multi
    /tmp/local/lib/perl5/5.8.3
    /tmp/local/lib/perl5/site_perl/5.8.3/i686-linux-thread-multi
    /tmp/local/lib/perl5/site_perl/5.8.3
    /tmp/local/lib/perl5/site_perl
    .


Environment for perl v5.8.3:
    HOME=/afs/mpa/home/nog
    LANG (unset)
    LANGUAGE (unset)
    LC_CTYPE=
    LD_LIBRARY_PATH=/opt/gnome/lib
    LOGDIR (unset)
    PATH=/afs/mpa/home/nog/bin:/afs/mpa/home/nog/pmtools:/opt/gnome/bin:/usr/common/sbin:/usr/local/sbin:/usr/afsws/etc:/usr/local/bin:/usr/common/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/afsws/bin:
    PERL_BADLANG (unset)
    SHELL=/usr/local/bin/tcsh

@p5pRT
Author

p5pRT commented Apr 11, 2004

From @lizmat

At 11​:18 +0000 4/8/04, Norbert Gruener (via RT) wrote​:

My name is Norbert Gruener. I am owner and maintainer of the Perl XS
module AFS.

Recently I got two reports that my module is crashing under "Debian
unstable" while it is running under "Debian stable". I could isolate
the problem to the following situation. When I am using the
"threaded" version

  "perl, v5.8.3 built for i686-linux-thread-multi"

then my module crashes. Whereas if I am using the version without
"threading"

  "perl, v5.8.3 built for i686-linux"

everything is working and no segmentation fault shows up.

I don't know if this is of any interest for you or if you are saying
"threads are not recommended, so forget it"?

If you are interested in this case I can supply you with details of
this problem.

Either that or build in some code so that the module refuses to
operate in a threaded environment?
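
A minimal sketch of such a guard, assuming it is compiled together with the perl headers: `USE_ITHREADS` is the macro those headers define for a build with `useithreads=define`, while the function names `built_for_threaded_perl` and `afs_refuse_if_threaded` are hypothetical and not part of the AFS module.

```c
#include <stdio.h>

/*
 * Returns 1 when compiled against perl headers that define USE_ITHREADS
 * (a threaded build), otherwise 0. Compiled standalone, without the perl
 * headers, the macro is absent and the function returns 0.
 */
int built_for_threaded_perl(void)
{
#ifdef USE_ITHREADS
    return 1;
#else
    return 0;
#endif
}

/*
 * Hypothetical guard for the module's start-up code.
 * Returns 0 when it is fine to continue, -1 when the module should refuse.
 */
int afs_refuse_if_threaded(void)
{
    if (built_for_threaded_perl()) {
        fprintf(stderr, "AFS: threaded perl builds are not supported\n");
        return -1;
    }
    return 0;
}
```

In real XS code the refusal would more likely be a `croak()` issued from the `BOOT:` section, so that `use AFS` fails early with a readable message instead of a segmentation fault.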

This is definitely _not_ enough to go on.

Does this only happen on Debian? Are other distributions unaffected
or simply not tested?

Liz

@p5pRT
Author

p5pRT commented Apr 11, 2004

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Apr 11, 2004

From @iabyn

On Thu, Apr 08, 2004 at 11​:18​:49AM -0000, Norbert Gruener wrote​:

My name is Norbert Gruener. I am owner and maintainer of the Perl XS
module AFS.

Recently I got two reports that my module is crashing under "Debian
unstable" while it is running under "Debian stable". I could isolate
the problem to the following situation. When I am using the
"threaded" version

  "perl, v5.8.3 built for i686-linux-thread-multi"

then my module crashes. Whereas if I am using the version without
"threading"

  "perl, v5.8.3 built for i686-linux"

everything is working and no segmentation fault shows up.

I don't know if this is of any interest for you or if you are saying
"threads are not recommended, so forget it"?

If you are interested in this case I can supply you with details of
this problem.

Questions​:

Is the fault reproducible?
Is it reproducible without having an AFS filesystem around?
Is the code that is faulting actually multi-threaded code, or is it
just single-threaded code that happens to crash on a thread-enabled
interpreter?

Dave.

--
Never do today what you can put off till tomorrow.

@p5pRT
Author

p5pRT commented Apr 13, 2004

From nog@MPA-Garching.MPG.DE

Hi Liz,

On Sun, Apr 11 2004, Elizabeth Mattijsen via RT wrote​:

At 11​:18 +0000 4/8/04, Norbert Gruener (via RT) wrote​:

I don't know if this is of any interest for you or if you are saying
"threads are not recommended, so forget it"?

If you are interested in this case I can supply you with details of
this problem.

Either that or build in some code so that the module refuses to
operate in a threaded environment?

This is definitely _not_ enough to go on.

Yes, I am aware of that :-)

Does this only happen on Debian? Are other distributions unaffected
or simply not tested?

In the meantime I could reproduce the problem at my office. We are
running a distribution-independent Linux installation, so it has
nothing to do with the Debian distribution. It is tied only to the
"threaded" build of the plain, standard Perl 5.8.3 running on Linux,
taking all "configure" defaults except the "threading".

There are more details in the answer to Dave.

Norbert
--
Ceterum censeo | PGP encrypted mail preferred.
Redmond esse delendam. | PGP Key at www.MPA-Garching.MPG.de/~nog/

@p5pRT
Author

p5pRT commented Apr 13, 2004

From nog@MPA-Garching.MPG.DE

Hi Dave,

On Sun, Apr 11 2004, Dave Mitchell via RT wrote​:

On Thu, Apr 08, 2004 at 11​:18​:49AM -0000, Norbert Gruener wrote​:

If you are interested in this case I can supply you with details of
this problem.

Questions​:

Is the fault reproducible?

Yes, it is.

Is it reproducible without having an AFS filesystem around?

I am not sure. At least I was not able to come up with a tiny test
case.

Is the code that is faulting actually multi-threaded code, or is it
just single-threaded code that happens to crash on a thread-enabled
interpreter?

To be honest, I do not have enough experience with "threading" to
judge that.

So, let me give you some more details where the AFS API crashes. The
place where everything is happening, is very deep in one of the
OpenAFS libraries. It is the function "savecontext". I have attached
a modified version of this function containing several test prints.

And these are the test outputs

  with "unthreaded perl"         with "threaded perl"

  LWP2-SaveContext-DEBUG-5       LWP2-SaveContext-DEBUG-5
  LWP2-SaveContext-DEBUG-5-1     LWP2-SaveContext-DEBUG-5-1
  LWP2-SaveContext-DEBUG-5-2     LWP2-SaveContext-DEBUG-5-2
  LWP2-SaveContext-DEBUG-5-3     LWP2-SaveContext-DEBUG-5-3
  LWP2-SaveContext-DEBUG-5-4     LWP2-SaveContext-DEBUG-5-4
  LWP2-SaveContext-DEBUG-5-5     LWP2-SaveContext-DEBUG-5-5
  LWP2-SaveContext-DEBUG-5-1     Segmentation fault

As you can see, the call for "longjmp" is crashing in "threaded". And
there is definitely not a problem in the function "savecontext" since
this function is used many times in the OpenAFS system without any
problems.

Cheers,

Norbert
--
Ceterum censeo | PGP encrypted mail preferred.
Redmond esse delendam. | PGP Key at www.MPA-Garching.MPG.de/~nog/

@p5pRT
Author

p5pRT commented Apr 13, 2004

From nog@MPA-Garching.MPG.DE

static jmp_buf jmp_tmp;
static char (*EP)();
static int rc;
static jmp_buf_type *jmpBuffer;

afs_int32 savecontext(ep, savearea, sp)
    char (*ep)();
    struct lwp_context *savearea;
    char* sp;
{
    int code;

    printf("LWP2-SaveContext-DEBUG-1 \n");
    PRE_Block = 1;
    EP = ep;

    printf("LWP2-SaveContext-DEBUG-2 \n");
    /* Save the current context; setjmp returns 0 now and the longjmp value on resume. */
    code = setjmp(savearea->setjmp_buffer);
    jmpBuffer = (jmp_buf_type *)savearea->setjmp_buffer;
    savearea->topstack = (char*)jmpBuffer[LWP_SP];
    printf("LWP2-SaveContext-DEBUG-3 \n");

    printf("LWP2-SaveContext-DEBUG-4 \n");
    switch ( code )
    {
    case 0:
        if ( !sp )
            (*EP)();
        else
        {
            printf("LWP2-SaveContext-DEBUG-5 \n");
            rc = setjmp(jmp_tmp);
            printf("LWP2-SaveContext-DEBUG-5-1 \n");
            switch ( rc )
            {
            case 0:
                printf("LWP2-SaveContext-DEBUG-5-2 \n");
                jmpBuffer = (jmp_buf_type *)jmp_tmp;
                printf("LWP2-SaveContext-DEBUG-5-3 \n");
                /* Overwrite the saved stack pointer slot with the new stack. */
                jmpBuffer[LWP_SP] = (jmp_buf_type)sp;
                printf("LWP2-SaveContext-DEBUG-5-4 \n");
                printf("LWP2-SaveContext-DEBUG-5-5 \n");
                longjmp(jmp_tmp,1);
                break;
            case 1:
                (*EP)();
                assert(0); /* never returns */
                break;
            default:
                perror("Error in setjmp1\n");
                exit(2);
            }
        }
        break;
    case 2: /* restoring frame */
        printf("LWP2-SaveContext-DEBUG-6 \n");
        break;

    default:
        perror("Error in setjmp2 : restoring\n");
        exit(3);
    }
    printf("LWP2-SaveContext-DEBUG-7 \n");
    return 0;
}
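
The function above depends on `setjmp` returning twice, which is why the unthreaded output shows "DEBUG-5-1" a second time after "DEBUG-5-5". A minimal sketch of just that round trip in standard C follows; it deliberately leaves out the stack-pointer patching, and the names `setjmp_roundtrip`, `trace` and `ntrace` are made up for illustration.

```c
#include <setjmp.h>

static jmp_buf env;
static int trace[4];   /* records which setjmp return values were seen */
static int ntrace;

/*
 * Calls setjmp once and longjmp once. setjmp first returns 0; after the
 * longjmp, control re-enters at the same setjmp, which now returns 1.
 * Returns the number of times the switch was entered, or -1 on an
 * unexpected setjmp value.
 */
int setjmp_roundtrip(void)
{
    ntrace = 0;
    switch (setjmp(env)) {
    case 0:
        trace[ntrace++] = 0;   /* direct return from setjmp */
        longjmp(env, 1);       /* does not return */
    case 1:
        trace[ntrace++] = 1;   /* arrived here through longjmp */
        break;
    default:
        return -1;
    }
    return ntrace;
}
```

The counters are static on purpose: automatic variables modified between `setjmp` and `longjmp` have indeterminate values afterwards unless they are declared `volatile`.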

@p5pRT
Author

p5pRT commented Apr 13, 2004

From @lizmat

At 07​:56 +0200 4/13/04, Norbert Gruener wrote​:

On Sun, Apr 11 2004, Dave Mitchell via RT wrote​:

On Thu, Apr 08, 2004 at 11​:18​:49AM -0000, Norbert Gruener wrote​:

If you are interested in this case I can supply you with details of
this problem.

Questions​:

Is the fault reproducible?

Yes, it is.

Is it reproducible without having an AFS filesystem around?

I am not sure. At least I was not able to come up with a tiny test
case.

Is the code that is faulting actually multi-threaded code, or is it
just single-threaded code that happens to crash on a thread-enabled
interpreter?

To be honest, I do not have enough experience with "threading" to
judge that.

So, let me give you some more details where the AFS API crashes. The
place where everything is happening, is very deep in one of the
OpenAFS libraries. It is the function "savecontext". I have attached
a modified version of this function containing several test prints.

And these are the test outputs

  with "unthreaded perl"         with "threaded perl"

  LWP2-SaveContext-DEBUG-5       LWP2-SaveContext-DEBUG-5
  LWP2-SaveContext-DEBUG-5-1     LWP2-SaveContext-DEBUG-5-1
  LWP2-SaveContext-DEBUG-5-2     LWP2-SaveContext-DEBUG-5-2
  LWP2-SaveContext-DEBUG-5-3     LWP2-SaveContext-DEBUG-5-3
  LWP2-SaveContext-DEBUG-5-4     LWP2-SaveContext-DEBUG-5-4
  LWP2-SaveContext-DEBUG-5-5     LWP2-SaveContext-DEBUG-5-5
  LWP2-SaveContext-DEBUG-5-1     Segmentation fault

As you can see, the call for "longjmp" is crashing in "threaded". And
there is definitely not a problem in the function "savecontext" since
this function is used many times in the OpenAFS system without any
problems.

Have you tried running this under valgrind? ( http://valgrind.kde.org )

Maybe that will provide some clues.

Liz

@p5pRT
Author

p5pRT commented Apr 13, 2004

From @lizmat

At 15​:42 +0200 4/13/04, Norbert Gruener wrote​:

On Tue, Apr 13 2004, Elizabeth Mattijsen wrote​:

At 07​:56 +0200 4/13/04, Norbert Gruener wrote​:

As you can see, the call for "longjmp" is crashing in "threaded". And
there is definitely not a problem in the function "savecontext" since
this function is used many times in the OpenAFS system without any
problems.

Have you tried running this under valgrind? ( http://valgrind.kde.org )

Not yet.

Maybe that will provide some clues.

OK, here it comes ...

I have attached the output.

After the line "LWP2-SaveContext-DEBUG-5-5" normally the test crashes.
Now there are some lines of "messages". I don't know if you can
interpret them.

Well, maybe​:

==32696==
==32696== Invalid write of size 4
==32696== at 0x41C38654: savecontext (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
==32696== Address 0x41A3602C is on thread 1's stack
==32696==
==32696== Invalid read of size 4
==32696== at 0x402DAEE9​: _IO_puts (in /lib/libc-2.3.2.so)
==32696== Address 0x41A3602C is on thread 1's stack

I understand from your problem description that you only used a
threaded Perl, but did not actually start any threads, right? This
message implies to me that there are multiple threads running (don't
thread numbers start at 0?).

Liz

@p5pRT
Author

p5pRT commented Apr 13, 2004

From nick@ing-simmons.net

Norbert Gruener <nog@​MPA-Garching.MPG.DE> writes​:

Hi Dave,

On Sun, Apr 11 2004, Dave Mitchell via RT wrote​:

On Thu, Apr 08, 2004 at 11​:18​:49AM -0000, Norbert Gruener wrote​:

If you are interested in this case I can supply you with details of
this problem.

Questions​:

Is the fault reproducible?

Yes, it is.

Is it reproducible without having an AFS filesystem around?

I am not sure. At least I was not able to come up with a tiny test
case.

Is the code that is faulting actually multi-threaded code, or is it
just single-threaded code that happens to crash on a thread-enabled
interpreter?

To be honest, I do not have enough experience with "threading" to
judge that.

So, let me give you some more details where the AFS API crashes. The
place where everything is happening, is very deep in one of the
OpenAFS libraries.

It may well be that those libraries are "thread aware" and, if linked
with a threading library, will spawn threads.

It is the function "savecontext". I have attached
a modified version of this function containing several test prints.

And these are the test outputs

  with "unthreaded perl"         with "threaded perl"

  LWP2-SaveContext-DEBUG-5       LWP2-SaveContext-DEBUG-5
  LWP2-SaveContext-DEBUG-5-1     LWP2-SaveContext-DEBUG-5-1
  LWP2-SaveContext-DEBUG-5-2     LWP2-SaveContext-DEBUG-5-2
  LWP2-SaveContext-DEBUG-5-3     LWP2-SaveContext-DEBUG-5-3
  LWP2-SaveContext-DEBUG-5-4     LWP2-SaveContext-DEBUG-5-4
  LWP2-SaveContext-DEBUG-5-5     LWP2-SaveContext-DEBUG-5-5
  LWP2-SaveContext-DEBUG-5-1     Segmentation fault

As you can see, the call for "longjmp" is crashing in "threaded". And
there is definitely not a problem in the function "savecontext" since
this function is used many times in the OpenAFS system without any
problems.

Perl uses longjmp too so that should not itself be a problem.

Enabling threads in perl does two things (mainly) as far as XS code
is concerned​:

  1. Changes #define-s so that perl variables are accessed via the
  my_perl pointer rather than as globals, and dTHX and friends
  stop being no-ops. This tends to mean your XS code should
  be written in a style that uses PERL_NO_GET_CONTEXT and dTHX/pTHX,
  as otherwise it gets slow.
  Snags with this should show up at compile time.

  2. Links with special "threading" versions of system libraries.
  This is the most likely cause of the problem you are seeing.
  On Linux this basically means libpthread.so gets used,
  and so now you are using pthread's version of longjmp().
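
The first point can be pictured with a plain C analogy. This is only a sketch of the idea and not Perl's real macros or data structures: `interp`, `set_context`, `get_context`, `bump_implicit` and `bump_explicit` are all hypothetical names. The "implicit" style looks the context up on every access, the "explicit" style receives it as a parameter.

```c
#include <stddef.h>

/* Hypothetical stand-in for an interpreter; not Perl's real structure. */
typedef struct interp {
    int counter;
} interp;

/* Stands in for the per-thread "current interpreter" slot. */
static interp *current_interp = NULL;

void set_context(interp *p) { current_interp = p; }

/* Stands in for a context lookup, which may be costly in a real build. */
interp *get_context(void) { return current_interp; }

/* Implicit style: every access looks the context up again. */
int bump_implicit(void)
{
    get_context()->counter++;
    return get_context()->counter;
}

/* Explicit style: the caller hands the context down as a parameter. */
int bump_explicit(interp *my_interp)
{
    my_interp->counter++;
    return my_interp->counter;
}
```

Both functions do the same work; the explicit one simply avoids the repeated lookups, which is the reason for the advice to pass the context along in XS code.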

Cheers,

Norbert

@p5pRT
Author

p5pRT commented Apr 13, 2004

From nog@MPA-Garching.MPG.DE

On Tue, Apr 13 2004, Elizabeth Mattijsen wrote​:

At 07​:56 +0200 4/13/04, Norbert Gruener wrote​:

As you can see, the call for "longjmp" is crashing in "threaded". And
there is definitely not a problem in the function "savecontext" since
this function is used many times in the OpenAFS system without any
problems.

Have you tried running this under valgrind? ( http://valgrind.kde.org )

Not yet.

Maybe that will provide some clues.

OK, here it comes ...

I have attached the output.

After the line "LWP2-SaveContext-DEBUG-5-5" normally the test crashes.
Now there are some lines of "messages". I don't know if you can
interpret them.

Norbert
--
Ceterum censeo | PGP encrypted mail preferred.
Redmond esse delendam. | PGP Key at www.MPA-Garching.MPG.de/~nog/

@p5pRT
Author

p5pRT commented Apr 13, 2004

From nog@MPA-Garching.MPG.DE

~/AFS.short/src>/tmp/local/bin/valgrind /tmp/local/bin/perl ../examples/constructor
==32696== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
==32696== Copyright (C) 2002-2003, and GNU GPL'd, by Julian Seward.
==32696== Using valgrind-2.0.0, a program supervision framework for x86-linux.
==32696== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward.
==32696== Estimated CPU clock rate is 1595 MHz
==32696== For more details, rerun with​: -v
==32696==
==32696== Conditional jump or move depends on uninitialised value(s)
==32696== at 0x4000882A​: _dl_relocate_object (in /lib/ld-2.3.2.so)
==32696== by 0x40380950​: (within /lib/libc-2.3.2.so)
==32696== by 0x4000AEE5​: _dl_catch_error (in /lib/ld-2.3.2.so)
==32696== by 0x40380BBB​: _dl_open (in /lib/libc-2.3.2.so)
==32696==
==32696== Conditional jump or move depends on uninitialised value(s)
==32696== at 0x40008875​: _dl_relocate_object (in /lib/ld-2.3.2.so)
==32696== by 0x40380950​: (within /lib/libc-2.3.2.so)
==32696== by 0x4000AEE5​: _dl_catch_error (in /lib/ld-2.3.2.so)
==32696== by 0x40380BBB​: _dl_open (in /lib/libc-2.3.2.so)
DEBUG-11​: 1
DEBUG-12
RX-DEBUG-1
RX-DEBUG-2
RX-DEBUG-3
RX-DEBUG-4
RX-DEBUG-5
RX-DEBUG-6
RX-DEBUG-7
RX-LWP-Init-Thread-DEBUG-1
LWP2-SaveContext-DEBUG-1
LWP2-SaveContext-DEBUG-2
LWP2-SaveContext-DEBUG-3
LWP2-SaveContext-DEBUG-4
LWP-Dispatcher-DEBUG-3
LWP-Dispatcher-DEBUG-4
LWP-Dispatcher-DEBUG-7
LWP-Dispatcher-DEBUG-8
LWP-Dispatcher-DEBUG-9
LWP-Dispatcher-DEBUG-11
LWP-Dispatcher-DEBUG-12
LWP2-SaveContext-DEBUG-3
LWP2-SaveContext-DEBUG-4
LWP2-SaveContext-DEBUG-6
LWP2-SaveContext-DEBUG-7
LWP2-SaveContext-DEBUG-1
LWP2-SaveContext-DEBUG-2
LWP2-SaveContext-DEBUG-3
LWP2-SaveContext-DEBUG-4
LWP-Dispatcher-DEBUG-3
LWP-Dispatcher-DEBUG-4
LWP-Dispatcher-DEBUG-7
LWP-Dispatcher-DEBUG-8
LWP-Dispatcher-DEBUG-9
LWP-Dispatcher-DEBUG-11
LWP-Dispatcher-DEBUG-12
LWP2-SaveContext-DEBUG-3
LWP2-SaveContext-DEBUG-4
LWP2-SaveContext-DEBUG-6
LWP2-SaveContext-DEBUG-7
RX-LWP-Init-Thread-DEBUG-2
IOMGR-Init-DEBUG-1
IOMGR-Init-DEBUG-2
IOMGR-Init-DEBUG-3
IOMGR-Init-DEBUG-4
IOMGR-Init-DEBUG-5
IOMGR-Init-DEBUG-8
LWP1-Create-Proc-DEBUG-1
LWP1-Create-Proc-DEBUG-2
LWP1-Create-Proc-DEBUG-3
LWP1-Create-Proc-DEBUG-4
LWP1-Create-Proc-DEBUG-5
LWP1-Create-Proc-DEBUG-6
LWP1-Create-Proc-DEBUG-7
LWP1-Create-Proc-DEBUG-9
LWP1-Create-Proc-DEBUG-10
LWP1-Create-Proc-DEBUG-11
==32696==
==32696== Invalid write of size 1
==32696== at 0x41C38480​: Initialize_Stack (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
==32696== by 0x41C37661​: LWP_CreateProcess (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
==32696== by 0x41C39121​: IOMGR_Initialize (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
==32696== by 0x41C36B4D​: rxi_InitializeThreadSupport (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
==32696== Address 0x41A3602C is 0 bytes after a block of size 196608 alloc'd
==32696== at 0x4002AA2D​: malloc (vg_replace_malloc.c​:153)
==32696== by 0x41C375F3​: LWP_CreateProcess (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
==32696== by 0x41C39121​: IOMGR_Initialize (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
==32696== by 0x41C36B4D​: rxi_InitializeThreadSupport (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
LWP1-Create-Proc-DEBUG-12
LWP1-Create-Proc-DEBUG-13
LWP1-Create-Proc-DEBUG-18
LWP2-SaveContext-DEBUG-1
LWP2-SaveContext-DEBUG-2
LWP2-SaveContext-DEBUG-3
LWP2-SaveContext-DEBUG-4
LWP2-SaveContext-DEBUG-5
LWP2-SaveContext-DEBUG-5-1
LWP2-SaveContext-DEBUG-5-2
LWP2-SaveContext-DEBUG-5-3
LWP2-SaveContext-DEBUG-5-4
LWP2-SaveContext-DEBUG-5-5
==32696==
==32696== Invalid write of size 4
==32696== at 0x41C38654​: savecontext (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
==32696== Address 0x41A3602C is on thread 1's stack
==32696==
==32696== Invalid read of size 4
==32696== at 0x402DAEE9​: _IO_puts (in /lib/libc-2.3.2.so)
==32696== Address 0x41A3602C is on thread 1's stack
LWP2-SaveContext-DEBUG-5-1
LWP1-Create-Proc2-DEBUG-1
LWP1-Create-Proc2-DEBUG-2
LWP2-SaveContext-DEBUG-1
LWP2-SaveContext-DEBUG-2
LWP2-SaveContext-DEBUG-3
LWP2-SaveContext-DEBUG-4
LWP-Dispatcher-DEBUG-3
LWP-Dispatcher-DEBUG-4
LWP-Dispatcher-DEBUG-7
LWP-Dispatcher-DEBUG-8
LWP-Dispatcher-DEBUG-9
LWP-Dispatcher-DEBUG-11
LWP-Dispatcher-DEBUG-12
LWP2-SaveContext-DEBUG-3
LWP2-SaveContext-DEBUG-4
LWP2-SaveContext-DEBUG-6
LWP2-SaveContext-DEBUG-7
LWP1-Create-Proc-DEBUG-19
==32696==
==32696== ERROR SUMMARY​: 10 errors from 5 contexts (suppressed​: 0 from 0)
==32696== malloc/free​: in use at exit​: 983273 bytes in 13312 blocks.
==32696== malloc/free​: 25891 allocs, 12579 frees, 13728687 bytes allocated.
==32696== For a detailed leak analysis, rerun with​: --leak-check=yes
==32696== For counts of detected errors, rerun with​: -v
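
A note on reading the log above: the "Invalid write of size 1" in Initialize_Stack is reported at address 0x41A3602C, "0 bytes after a block of size 196608", and the later "Invalid write of size 4" in savecontext hits the same address. The small helper below is only a sketch of that arithmetic. The block start 0x41A0602C is not printed in the log; it is derived here by subtracting the block size from the reported address, and the name `bytes_past_block` is made up.

```c
#include <stdint.h>

/*
 * Signed distance of addr from the end of the block [base, base + size).
 * A result of 0 means addr is the first byte after the block; a negative
 * result means addr lies before the end of the block.
 */
int64_t bytes_past_block(uint64_t base, uint64_t size, uint64_t addr)
{
    return (int64_t)addr - (int64_t)(base + size);
}
```

With base 0x41A0602C and size 196608 (0x30000), the address 0x41A3602C gives 0, which is the "0 bytes after" that valgrind reports: the write lands exactly one position past the malloc'ed area.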

@p5pRT
Author

p5pRT commented Apr 14, 2004

From nick@ing-simmons.net

Norbert Gruener <nog@​MPA-Garching.MPG.DE> writes​:

2. Links with special "threading" versions of system libraries.
This is the most likely cause of the problem you are seeing.
On linux this basically means libpthread.so gets used,
and so now you are using pthread's version of longjmp().

this makes complete sense to me. But what is the conclusion then? Is
"libpthread.so" buggy, or one of the "OpenAFS" system libraries, or is
it just an incompatible interconnection of "threaded Perl" and
"OpenAFS" which is surely doing some threadening?

That is the issue - is OpenAFS doing threading?
Can you ask it not to?
Does OpenAFS "callback" into perl?
If perl code gets invoked by another thread then thread-local stuff
will not be pointing at the right place.

At the moment I don't have any idea which direction I should go.

Prove it works okay with non-threaded perl?

Norbert

@p5pRT
Author

p5pRT commented Apr 14, 2004

From nog@MPA-Garching.MPG.DE

On Wed, Apr 14 2004, Nick Ing-Simmons wrote​:

Norbert Gruener <nog@​MPA-Garching.MPG.DE> writes​:

2. Links with special "threading" versions of system libraries.
This is the most likely cause of the problem you are seeing.
On linux this basically means libpthread.so gets used,
and so now you are using pthread's version of longjmp().

this makes complete sense to me. But what is the conclusion then? Is
"libpthread.so" buggy, or one of the "OpenAFS" system libraries, or is
it just an incompatible interconnection of "threaded Perl" and
"OpenAFS" which is surely doing some threadening?

  Ooops ^^^^^^^^^^^ :-)
  this is true but it's not what
  I meant :-)
 

That is the issue - is OpenAFS doing threading?

I checked the sparse documentation. Yes, OpenAFS is doing threading
but ... OpenAFS uses its own thread library called "Light Weight
Process package" (LWP). It is definitely not using "pthreads".

Can you ask it not to?

I haven't found any pointer on how to do that.

Does OpenAFS "callback" into perl?

I don't think so in this case. The XS code makes an OpenAFS
"initialization" call. From there it steps down several OpenAFS
system library calls and then it crashes in the savecontext function
at the "longjmp" statement.

If perl code gets invoked by another thread then thread-local stuff
will not be pointing at the right place.

Do you think it is possible that the coincidence of "pthread" in Perl
and the "LWP thread" in OpenAFS is causing that segmentation fault?

At the moment I don't have any idea which direction I should go.

Prove it works okay with non-threaded perl?

Well, I am a little bit reluctant to call it "proven" but the AFS XS
package has been working with the non-threaded perl for nearly ten
years now.

Norbert
--
Ceterum censeo | PGP encrypted mail preferred.
Redmond esse delendam. | PGP Key at www.MPA-Garching.MPG.de/~nog/

@p5pRT
Author

p5pRT commented Apr 14, 2004

From nog@MPA-Garching.MPG.DE

Hi Nick,

On Tue, Apr 13 2004, Nick Ing-Simmons wrote​:

Norbert Gruener <nog@​MPA-Garching.MPG.DE> writes​:

It is the function "savecontext". I have attached
a modified version of this function containing several test prints.

And these are the test outputs

  with "unthreaded perl"         with "threaded perl"

  LWP2-SaveContext-DEBUG-5       LWP2-SaveContext-DEBUG-5
  [snipped several lines of debug output]
  LWP2-SaveContext-DEBUG-5-1     Segmentation fault

As you can see, the call for "longjmp" is crashing in "threaded". And
there is definitely not a problem in the function "savecontext" since
this function is used many times in the OpenAFS system without any
problems.

Perl uses longjmp too so that should not itself be a problem.

Enabling threads in perl does two things (mainly) as far as XS code
is concerned​:

1. Changes #define-s so that perl variables are accessed via
[snipped several lines of explanation]
Snags with this should show up at compile time.

2. Links with special "threading" versions of system libraries.
This is the most likely cause of the problem you are seeing.
On linux this basically means libpthread.so gets used,
and so now you are using pthread's version of longjmp().

this makes complete sense to me. But what is the conclusion then? Is
"libpthread.so" buggy, or one of the "OpenAFS" system libraries, or is
it just an incompatible interconnection of "threaded Perl" and
"OpenAFS" which is surely doing some threadening?

At the moment I don't have any idea which direction I should go.

Norbert
--
Ceterum censeo | PGP encrypted mail preferred.
Redmond esse delendam. | PGP Key at www.MPA-Garching.MPG.de/~nog/

@p5pRT
Author

p5pRT commented Apr 14, 2004

From nog@MPA-Garching.MPG.DE

Hi Liz,

On Tue, Apr 13 2004, Elizabeth Mattijsen wrote​:

At 15​:42 +0200 4/13/04, Norbert Gruener wrote​:

I have attached the output.

After the line "LWP2-SaveContext-DEBUG-5-5" normally the test crashes.
Now there are some lines of "messages". I don't know if you can
interpret them.

Well, maybe​:

==32696==
==32696== Invalid write of size 4
==32696== at 0x41C38654: savecontext (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
==32696== Address 0x41A3602C is on thread 1's stack
==32696==
==32696== Invalid read of size 4
==32696== at 0x402DAEE9​: _IO_puts (in /lib/libc-2.3.2.so)
==32696== Address 0x41A3602C is on thread 1's stack

I understand from your problem description that you only used a
threaded Perl, but did not actually start any threads, right?

That is correct.

This message implies to me that there are multiple threads running
(don't thread numbers start at 0?).

I really don't know.

The only thing which makes me a little bit sceptical is the output of
a "valgrind" run against the equivalent OpenAFS command.

I have attached the output of the equivalent OpenAFS command. The
only difference from the perl version is the entry point of the
program. As you can see from the test prints, both programs run
through the same test prints. Both outputs are nearly identical.
The Perl version seg-faults and the OpenAFS command works fine.

So, at the moment I am absolutely clueless :-((

Norbert
--
Ceterum censeo | PGP encrypted mail preferred.
Redmond esse delendam. | PGP Key at www.MPA-Garching.MPG.de/~nog/

@p5pRT
Author

p5pRT commented Apr 14, 2004

From nog@MPA-Garching.MPG.DE

~/AFS.short>/tmp/local/bin/valgrind /tmp/openafs/sbin/vos exa home nog
==763== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
==763== Copyright (C) 2002-2003, and GNU GPL'd, by Julian Seward.
==763== Using valgrind-2.0.0, a program supervision framework for x86-linux.
==763== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward.
==763== Estimated CPU clock rate is 1608 MHz
==763== For more details, rerun with​: -v
==763==
VOS-DEBUG-1
VOS-DEBUG-2
VOS-DEBUG-3
VSU-DEBUG-1
RX-DEBUG-1
RX-DEBUG-2
RX-DEBUG-3
RX-DEBUG-4
RX-DEBUG-5
RX-DEBUG-6
RX-DEBUG-7
RX-LWP-Init-Thread-DEBUG-1
LWP2-SaveContext-DEBUG-1
LWP2-SaveContext-DEBUG-2
LWP2-SaveContext-DEBUG-3
LWP2-SaveContext-DEBUG-4
LWP-Dispatcher-DEBUG-3
LWP-Dispatcher-DEBUG-4
LWP-Dispatcher-DEBUG-7
LWP-Dispatcher-DEBUG-8
LWP-Dispatcher-DEBUG-9
LWP-Dispatcher-DEBUG-11
LWP-Dispatcher-DEBUG-12
LWP2-SaveContext-DEBUG-3
LWP2-SaveContext-DEBUG-4
LWP2-SaveContext-DEBUG-6
LWP2-SaveContext-DEBUG-7
LWP2-SaveContext-DEBUG-1
LWP2-SaveContext-DEBUG-2
LWP2-SaveContext-DEBUG-3
LWP2-SaveContext-DEBUG-4
LWP-Dispatcher-DEBUG-3
LWP-Dispatcher-DEBUG-4
LWP-Dispatcher-DEBUG-7
LWP-Dispatcher-DEBUG-8
LWP-Dispatcher-DEBUG-9
LWP-Dispatcher-DEBUG-11
LWP-Dispatcher-DEBUG-12
LWP2-SaveContext-DEBUG-3
LWP2-SaveContext-DEBUG-4
LWP2-SaveContext-DEBUG-6
LWP2-SaveContext-DEBUG-7
RX-LWP-Init-Thread-DEBUG-2
IOMGR-Init-DEBUG-1
IOMGR-Init-DEBUG-2
IOMGR-Init-DEBUG-3
IOMGR-Init-DEBUG-4
IOMGR-Init-DEBUG-5
IOMGR-Init-DEBUG-8
LWP1-Create-Proc-DEBUG-1
LWP1-Create-Proc-DEBUG-2
LWP1-Create-Proc-DEBUG-3
LWP1-Create-Proc-DEBUG-4
LWP1-Create-Proc-DEBUG-5
LWP1-Create-Proc-DEBUG-6
LWP1-Create-Proc-DEBUG-7
LWP1-Create-Proc-DEBUG-9
LWP1-Create-Proc-DEBUG-10
LWP1-Create-Proc-DEBUG-11
==763== Invalid write of size 1
==763== at 0x808CCF0​: (within /tmp/openafs/sbin/vos)
==763== by 0x808BED1​: (within /tmp/openafs/sbin/vos)
==763== by 0x808D991​: (within /tmp/openafs/sbin/vos)
==763== by 0x808B3BD​: (within /tmp/openafs/sbin/vos)
==763== Address 0x411802FC is 0 bytes after a block of size 196608 alloc'd
==763== at 0x4002AA2D​: malloc (vg_replace_malloc.c​:153)
==763== by 0x808BE63​: (within /tmp/openafs/sbin/vos)
==763== by 0x808D991​: (within /tmp/openafs/sbin/vos)
==763== by 0x808B3BD​: (within /tmp/openafs/sbin/vos)
LWP1-Create-Proc-DEBUG-12
LWP1-Create-Proc-DEBUG-13
LWP1-Create-Proc-DEBUG-18
LWP2-SaveContext-DEBUG-1
LWP2-SaveContext-DEBUG-2
LWP2-SaveContext-DEBUG-3
LWP2-SaveContext-DEBUG-4
LWP2-SaveContext-DEBUG-5
LWP2-SaveContext-DEBUG-5-1
LWP2-SaveContext-DEBUG-5-2
LWP2-SaveContext-DEBUG-5-3
LWP2-SaveContext-DEBUG-5-4
LWP2-SaveContext-DEBUG-5-5
==763==
==763== Invalid write of size 4
==763== at 0x808CEC4​: (within /tmp/openafs/sbin/vos)
==763== Address 0x411802FC is on thread 1's stack
==763==
==763== Invalid read of size 4
==763== at 0x402AEEE9​: _IO_puts (in /lib/libc-2.3.2.so)
==763== Address 0x411802FC is on thread 1's stack
LWP2-SaveContext-DEBUG-5-1
LWP1-Create-Proc2-DEBUG-1
LWP1-Create-Proc2-DEBUG-2
LWP2-SaveContext-DEBUG-1
LWP2-SaveContext-DEBUG-2
LWP2-SaveContext-DEBUG-3
LWP2-SaveContext-DEBUG-4
LWP-Dispatcher-DEBUG-3
LWP-Dispatcher-DEBUG-4
LWP-Dispatcher-DEBUG-7
LWP-Dispatcher-DEBUG-8
LWP-Dispatcher-DEBUG-9
LWP-Dispatcher-DEBUG-11
LWP-Dispatcher-DEBUG-12
LWP2-SaveContext-DEBUG-3
LWP2-SaveContext-DEBUG-4
LWP2-SaveContext-DEBUG-6
LWP2-SaveContext-DEBUG-7
LWP1-Create-Proc-DEBUG-19
==763==
==763== ERROR SUMMARY​: 6 errors from 3 contexts (suppressed​: 0 from 0)
==763== malloc/free​: in use at exit​: 250063 bytes in 676 blocks.
==763== malloc/free​: 676 allocs, 0 frees, 250063 bytes allocated.
==763== For a detailed leak analysis, rerun with​: --leak-check=yes
==763== For counts of detected errors, rerun with​: -v

@p5pRT
Author

p5pRT commented Apr 15, 2004

From nog@MPA-Garching.MPG.DE

Hi Nick, hi Dave, hi Liz,

On Wed, Apr 14 2004, Norbert Gruener wrote​:

On Wed, Apr 14 2004, Nick Ing-Simmons wrote​:

2. Links with special "threading" versions of system libraries.
This is the most likely cause of the problem you are seeing.
On linux this basically means libpthread.so gets used,
and so now you are using pthread's version of longjmp().

this statement put me on the right track. Thank you Nick !!! :-)

The problem is the "pthread" version of longjmp in connection with
OpenAFS and its "own threading". Once I had understood that
completely, I could prove that OpenAFS gets the same problem if I force
it to use the "pthread" version of longjmp. Then that specific
OpenAFS server also crashes at the same statement.
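
For comparison only, here is a sketch of switching onto a private, malloc'ed stack without touching the inside of a jmp_buf. It assumes a platform that still ships the obsolescent POSIX `getcontext`/`makecontext`/`swapcontext` functions (glibc does). This is a different technique, not how the OpenAFS LWP package works and not what was done for this ticket; `run_coroutine`, `coroutine` and `steps` are made-up names.

```c
#include <stdlib.h>
#include <ucontext.h>

#define CO_STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, co_ctx;
static int steps;   /* counts how far the coroutine has progressed */

/* Runs on the private stack; yields once, then finishes. */
static void coroutine(void)
{
    steps++;
    swapcontext(&co_ctx, &main_ctx);   /* yield back to the caller */
    steps++;
    /* returning here resumes the context named by uc_link (main_ctx) */
}

/* Returns the number of steps the coroutine completed, or -1 on error. */
int run_coroutine(void)
{
    char *stack = malloc(CO_STACK_SIZE);
    if (stack == NULL)
        return -1;

    steps = 0;
    if (getcontext(&co_ctx) == -1) {
        free(stack);
        return -1;
    }
    co_ctx.uc_stack.ss_sp = stack;          /* the private stack */
    co_ctx.uc_stack.ss_size = CO_STACK_SIZE;
    co_ctx.uc_link = &main_ctx;             /* where to go when it ends */
    makecontext(&co_ctx, coroutine, 0);

    swapcontext(&main_ctx, &co_ctx);        /* run until the first yield */
    swapcontext(&main_ctx, &co_ctx);        /* resume; coroutine finishes */

    free(stack);
    return steps;
}
```

The design point is that the stack to run on is handed to the C library through a documented field (`uc_stack`) rather than written into an opaque buffer, so it does not depend on which longjmp implementation ends up being linked in.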

So, whoever is allowed to modify the status of that request,

  please close request # 28369

This is not a Perl problem.

I want to thank all of you for your patience and your assistance. You
have been a great help in tracing my problem. I am really proud
of the Perl community and especially of you guys.

Thank you,

Norbert
--
Ceterum censeo | PGP encrypted mail preferred.
Redmond esse delendam. | PGP Key at www.MPA-Garching.MPG.de/~nog/

@p5pRT
Author

p5pRT commented Apr 15, 2004

@Tux - Status changed from 'open' to 'resolved'
