Unicode tr/// Fails Inside of ithreads #8717

Closed
p5pRT opened this issue Dec 23, 2006 · 14 comments

Comments

p5pRT commented Dec 23, 2006

Migrated from rt.perl.org#41124 (status was 'resolved')

Searchable as RT41124$

p5pRT commented Dec 23, 2006

From imacat@mail.imacat.idv.tw

Created by imacat@mail.imacat.idv.tw

  Hi. This is imacat from Taiwan. Unicode tr/// seems to fail inside
of ithreads. It seems that in utf8_heavy.pl SWASHGET() takes the @​_
from the main thread as its arguments.

  Here is the example. Please tell me if you need any more information.

imacat@​rinse ~ % perl -mthreads -e'@​_ = qw(abc); threads->new(sub { $_ = "z"; tr/\x{FF21}/A/; })->join;'
thread failed to start​: Can't use string ("abc") as a HASH ref while "strict refs" in use at /usr/lib/perl5/5.8.8/utf8_heavy.pl line 275.
imacat@​rinse ~ % perl -mthreads -e'threads->new(sub { $_ = "A"; tr/\x{FF21}/A/; })->join;'
Use of uninitialized value in addition (+) at /usr/lib/perl5/5.8.8/utf8_heavy.pl line 279.
Use of uninitialized value in addition (+) at /usr/lib/perl5/5.8.8/utf8_heavy.pl line 279.
Use of uninitialized value in subtraction (-) at /usr/lib/perl5/5.8.8/utf8_heavy.pl line 282.
Use of uninitialized value in vec at /usr/lib/perl5/5.8.8/utf8_heavy.pl line 282.
thread failed to start​: Negative offset to vec in lvalue context at /usr/lib/perl5/5.8.8/utf8_heavy.pl line 282.
imacat@​rinse ~ %

Perl Info

Flags:
    category=core
    severity=high

Site configuration information for perl v5.8.8:

Configured by imacat at Tue May  9 09:15:58 CST 2006.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.16.14, archname=x86_64-linux-thread-multi-ld
    uname='linux yuki 2.6.16.14 #1 smp mon may 8 23:44:32 cst 2006 x86_64 gnulinux '
    config_args='-s -d -Dusethreads -Dcc=gcc -Duselongdouble -Doptimize=-g -O3 -Duse64bitint -Duse64bitall -Dprefix=/usr -Dd_dosuid -Dotherlibdirs=/usr/share/perl5 -Dinc_version_list=none -Acccdlflags=-fPIC -Duseshrplib=true -Dcf_email=imacat@mail.imacat.idv.tw'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=define uselongdouble=define
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-g -O3',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include'
    ccversion='', gccversion='3.4.4 20050314 (prerelease) (Debian 3.4.3-13)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='long double', nvsize=16, Off_t='off_t', lseeksize=8
    alignbytes=16, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.8/x86_64-linux-thread-multi-ld/CORE'
    cccdlflags=' -fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.8:
    /home/imacat/lib/perl5
    /usr/lib/perl5/5.8.8/x86_64-linux-thread-multi-ld
    /usr/lib/perl5/5.8.8
    /usr/lib/perl5/site_perl/5.8.8/x86_64-linux-thread-multi-ld
    /usr/lib/perl5/site_perl/5.8.8
    /usr/lib/perl5/site_perl
    /usr/share/perl5
    .


Environment for perl v5.8.8:
    HOME=/home/imacat
    LANG=zh_TW
    LANGUAGE=zh_TW
    LC_COLLATE=zh_TW
    LC_CTYPE=zh_TW
    LC_MESSAGES=zh_TW
    LC_MONETARY=zh_TW
    LC_NUMERIC=zh_TW
    LC_TIME=zh_TW
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/imacat/bin:/bin:/usr/bin:/opt/java/bin:/usr/local/bin
    PERL5LIB=/home/imacat/lib/perl5
    PERL5_CPANPLUS_CONFIG=/home/imacat/.cpanplus/config
    PERL_BADLANG (unset)
    SHELL=/bin/zsh

p5pRT commented Dec 26, 2006

From BQW10602@nifty.com

Hi. This is imacat from Taiwan. Unicode tr/// seems to fail inside
of ithreads. It seems that in utf8_heavy.pl SWASHGET() takes the @_
from the main thread as its arguments.

Here is the example. Please tell me if you need any more information.

imacat@​rinse ~ % perl -mthreads -e'@​_ = qw(abc); threads->new(sub { $_ = "z"; tr/\x{FF21}/A/; })->join;'
thread failed to start​: Can't use string ("abc") as a HASH ref while "strict refs" in use at /usr/lib/perl5/5.8.8/utf8_heavy.pl line 275.
imacat@​rinse ~ % perl -mthreads -e'threads->new(sub { $_ = "A"; tr/\x{FF21}/A/; })->join;'
Use of uninitialized value in addition (+) at /usr/lib/perl5/5.8.8/utf8_heavy.pl line 279.
Use of uninitialized value in addition (+) at /usr/lib/perl5/5.8.8/utf8_heavy.pl line 279.
Use of uninitialized value in subtraction (-) at /usr/lib/perl5/5.8.8/utf8_heavy.pl line 282.
Use of uninitialized value in vec at /usr/lib/perl5/5.8.8/utf8_heavy.pl line 282.
thread failed to start​: Negative offset to vec in lvalue context at /usr/lib/perl5/5.8.8/utf8_heavy.pl line 282.
imacat@​rinse ~ %

In perl-current, utf8::SWASHGET() has been removed.

But something still seems wrong with tr///, while character classes,
properties and case mapping functions seem to work well.
(Note: all of those use the swash.)

So tr///, not the swash, seems to be to blame.

%perl -mthreads -e "threads->new(sub { $_ = qq/\x{391}/; s/[\x{391}-\x{39F}]/\x{3A1}/g; })->join;"

%perl -mthreads -e "threads->new(sub { $_ = qq/\x{391}/; tr/\x{391}-\x{39F}/\x{3A1}/; })->join;"
Unbalanced string table refcount​: (1) for "Î" during global destruction.

%perl -mthreads -e "threads->new(sub { $_ = qq/\x{391}/; s/\p{Greek}/\x{3A1}/g; })->join;"

%perl -mthreads -e "sub ToLower { qq/0391\t039F\t03A1/ } threads->new(sub { $_ = qq/\x{391}/; $_ = lc; })->join;"

Note: "Î" (0xCE) is the first octet of the UTF-8 encoding of U+0391..U+039F.
That is the same as the hash key in a swash for these characters.
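
To make the note about 0xCE concrete, here is a minimal C sketch. It assumes (this is an assumption, not something stated in the thread) that a swatch is keyed by the UTF-8 bytes of a character with the final byte dropped. The names utf8_encode and swatch_key are hypothetical helpers, not functions from the perl source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a code point (up to U+10FFFF) as UTF-8; returns the byte count. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: copy every UTF-8 byte except the last into key[]
 * and return the key length. Assumed to mirror how a swatch is keyed. */
size_t swatch_key(uint32_t cp, unsigned char key[3])
{
    unsigned char buf[4];
    size_t len = utf8_encode(cp, buf);
    assert(len >= 1 && len <= 4);
    for (size_t i = 0; i + 1 < len; i++)
        key[i] = buf[i];
    return len - 1;
}
```

Under that assumption, every character in U+0391..U+039F encodes as 0xCE followed by one more byte, so all of them share the one-byte key 0xCE, while an ASCII character has a zero-length key.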

Regards,
SADAHIRO Tomoyuki

p5pRT commented Dec 26, 2006

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Dec 27, 2006

From @jdhedden

SADAHIRO Tomoyuki wrote​:

%perl -mthreads -e "threads->new(sub { $_ = qq/\x{391}/;
tr/\x{391}-\x{39F}/\x{3A1}/; })->join;"
Unbalanced string table refcount​: (1) for "Î" during global
destruction.

With -DDEBUGGING, this also produces​:

  Attempt to free unreferenced scalar​: SV 0x3cf5b8,
  Perl interpreter​: 0x2ecf00.

Jerry D. Hedden wrote in
http​://www.nntp.perl.org/group/perl.perl5.porters/119425​:

leak($_); # Produces "Scalars leaked​: 1"

It may be a stretch, but could these be related?

p5pRT commented Dec 27, 2006

From @nwc10

On Wed, Dec 27, 2006 at 12​:00​:32AM +0900, SADAHIRO Tomoyuki wrote​:

%perl -mthreads -e "threads->new(sub { $_ = qq/\x{391}/; tr/\x{391}-\x{39F}/\x{3A1}/; })->join;"
Unbalanced string table refcount​: (1) for "Î" during global destruction.

op.c has this​:

  case OP_TRANS:
      if (o->op_private & (OPpTRANS_FROM_UTF|OPpTRANS_TO_UTF)) {
          SvREFCNT_dec(cSVOPo->op_sv);
          cSVOPo->op_sv = NULL;
      }
      else {
          Safefree(cPVOPo->op_pv);
          cPVOPo->op_pv = NULL;
      }
      break;

That SV appears to be in the *shared* optree. :-(

It's getting freed once in each thread, hence the error​:

Attempt to free unreferenced scalar​: SV 0x8371e4c, Perl interpreter​: 0x82b0008.

I wonder - would there be a way under some level of DEBUGGING to assert that
SVs are being freed in their correct thread? Is there already a backpointer
from an SV to its creating thread?
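
As a rough illustration of the backpointer idea, here is a toy C sketch. The toy_sv type and its owner_id field are hypothetical and are not the real SV layout. It only shows what such a check could look like if each value recorded the interpreter that allocated it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an SV: a refcount plus a backpointer
 * (here just an id) to the interpreter that allocated it. */
typedef struct {
    int refcnt;
    int owner_id;
} toy_sv;

toy_sv *toy_sv_new(int owner_id)
{
    toy_sv *sv = malloc(sizeof *sv);
    if (sv) {
        sv->refcnt = 1;
        sv->owner_id = owner_id;
    }
    return sv;
}

/* Drop one reference on behalf of interpreter caller_id.
 * Returns false and leaves the value untouched when the caller is not
 * the owner; that is the condition a debugging build could assert on. */
bool toy_sv_dec(toy_sv *sv, int caller_id)
{
    assert(sv != NULL && sv->refcnt > 0);
    if (sv->owner_id != caller_id)
        return false;
    if (--sv->refcnt == 0)
        free(sv);
    return true;
}
```

With a check like this, the second free from the wrong thread would be reported at the point where it happens rather than surfacing later as an unbalanced refcount.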

Nicholas Clark

p5pRT commented Dec 30, 2006

From BQW10602@nifty.com

On Wed, 27 Dec 2006 18​:03​:35 +0000, Nicholas Clark wrote

On Wed, Dec 27, 2006 at 12​:00​:32AM +0900, SADAHIRO Tomoyuki wrote​:

%perl -mthreads -e "threads->new(sub { $_ = qq/\x{391}/; tr/\x{391}-\x{39F}/\x{3A1}/; })->join;"
Unbalanced string table refcount​: (1) for "Î" during global destruction.

Other instances​:

%perl -mthreads -e "threads->new(sub { $_ = qq/A/; tr/\x{391}-\x{39F}/\x{3A1}/; })->join;"
Unbalanced string table refcount​: (1) for "" during global destruction.

Note​: "" (empty string) is the hash key of the swatch for "A"

%perl -mthreads -e "threads->new(sub { $_ = qq//; tr/\x{391}-\x{39F}/\x{3A1}/; })->join;"

? No error? In this case no swatch is created, as swash_fetch() in utf8.c
isn't called.

op.c has this​:

case OP_TRANS:
    if (o->op_private & (OPpTRANS_FROM_UTF|OPpTRANS_TO_UTF)) {
        SvREFCNT_dec(cSVOPo->op_sv);
        cSVOPo->op_sv = NULL;
    }
    else {
        Safefree(cPVOPo->op_pv);
        cPVOPo->op_pv = NULL;
    }
    break;

That SV appears to be in the *shared* optree. :-(

At run time, tr/// calls swash_fetch(), which stores the swatch into the
swash to which cSVOP->op_sv holds a reference. Moreover, the HV of
the swash has HvSHAREKEYS turned on.
Both seem to be related to the failure during destruction.

The swash for tr/// is created at compile time, by calling swash_init()
from pmtrans() in op.c.
The swash for m// and s/// is created at run time, by calling swash_init()
from regclass_swash() in regexec.c, and seems to be bound to
PL_regex_pad rather than to an opcode.
The swash for uc() etc. is created by calling swash_init() from
to_utf8_case() in utf8.c and is stored in PL_utf8_toupper etc.

So should the swash for tr/// be stored somewhere other than cSVOP->op_sv?

Regards,
SADAHIRO Tomoyuki

p5pRT commented Dec 30, 2006

From BQW10602@nifty.com

On Wed, 27 Dec 2006 06​:57​:33 -0800 (PST), "Jerry D. Hedden" wrote

SADAHIRO Tomoyuki wrote​:

%perl -mthreads -e "threads->new(sub { $_ = qq/\x{391}/;
tr/\x{391}-\x{39F}/\x{3A1}/; })->join;"
Unbalanced string table refcount​: (1) for "Î" during global
destruction.

With -DDEBUGGING, this also produces​:

Attempt to free unreferenced scalar​: SV 0x3cf5b8,
Perl interpreter​: 0x2ecf00.

Jerry D. Hedden wrote in
http​://www.nntp.perl.org/group/perl.perl5.porters/119425​:

leak($_); # Produces "Scalars leaked​: 1"

It may be a stretch, but could these be related?

Not certain, but it seems that for tr/// the swash might
not be cloned properly.

Regards,
SADAHIRO Tomoyuki

p5pRT commented Dec 31, 2006

From BQW10602@nifty.com

On Sat, 30 Dec 2006 22​:11​:09 +0900, SADAHIRO Tomoyuki wrote

The swash for tr/// is created by calling swash_init() from pmtrans()
in op.c at the compile time.
The swash for m// and s/// is created by calling swash_init() from
regclass_swash() in regexec.c at the run time and seems to bound
with PL_regex_pad, instead of opcode.
The swash for uc() etc. is created by calling swash_init() from
to_utf8_case() in utf8.c and stored in PL_utf8_toupper etc.

Then should the swash for tr/// be stored somewhere but cSVOP->op_sv?

To cache the swashes, a new variable PL_trans_padav is added.
A patch is attached: wtrans_padav.patch.gz

P.S.
I'm not sure why storing the address of the swash in op_sv
[say, cSVOPo->op_sv = newSViv(PTR2IV(swash)); and fetching it
  with INT2PTR(SV*, SvIV((SV*)cSVOP->op_sv)) ] doesn't do the trick
  under threads.

Hence the index into PL_trans_padav is stored in op_sv,
and each do_trans_<something>_utf8 needs an av_fetch().

Regards,
SADAHIRO Tomoyuki

p5pRT commented Dec 31, 2006

p5pRT commented Jan 3, 2007

From @nwc10

On Sun, Dec 31, 2006 at 10​:55​:25PM +0900, SADAHIRO Tomoyuki wrote​:

On Sat, 30 Dec 2006 22​:11​:09 +0900, SADAHIRO Tomoyuki wrote

The swash for tr/// is created by calling swash_init() from pmtrans()
in op.c at the compile time.
The swash for m// and s/// is created by calling swash_init() from
regclass_swash() in regexec.c at the run time and seems to bound
with PL_regex_pad, instead of opcode.
The swash for uc() etc. is created by calling swash_init() from
to_utf8_case() in utf8.c and stored in PL_utf8_toupper etc.

Then should the swash for tr/// be stored somewhere but cSVOP->op_sv?

For cache of swashes, a new variable PL_trans_padav is added.
Here is a patch attached​: wtrans_padav.patch.gz

I don't think that this is going to work - see below​:

P.S.
I'm not sure why storing the address of swash into op_sv​:
[say, cSVOPo->op_sv = newSViv(PTR2IV(swash)); and fetch it
by INT2PTR(SV*, SvIV((SV*)cSVOP->op_sv)) ] won't do trick
under threads.

The problem is that storing anything SV-like in the optree under ithreads
isn't valid, because the optree is shared between threads, whereas SVs, AVs,
HVs, GVs, CVs etc. all belong to a particular thread, and are automatically
freed when that thread terminates. Hence if the optree outlives the thread
that created it, the optree now points to something that no longer exists.

I'm not sure what to do here. The "obvious" thing is to indirect this swash
via the pad, and duplicate it at thread clone time.
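
The pad indirection can be sketched in plain C as follows. This is a toy model under stated assumptions, not the perl implementation: toy_op, toy_interp and the functions below are hypothetical names, and a string stands in for the swash. The shared op stores only a slot index, and each interpreter resolves that index through its own pad.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAD_SLOTS 4

/* Shared between all interpreters: holds an index, never a pointer. */
typedef struct { size_t pad_ix; } toy_op;

/* Per interpreter: owns its own copy of every pad entry. */
typedef struct { char *pad[PAD_SLOTS]; } toy_interp;

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy)
        memcpy(copy, s, n);
    return copy;
}

/* Compile time: put the data in the creating interpreter's pad and
 * record only the slot number in the op. Returns 1 on success. */
int toy_op_init(toy_op *op, toy_interp *in, size_t slot, const char *data)
{
    assert(slot < PAD_SLOTS);
    in->pad[slot] = dup_str(data);
    op->pad_ix = slot;
    return in->pad[slot] != NULL;
}

/* Thread clone time: deep-copy every pad entry for the new interpreter. */
void toy_interp_clone(toy_interp *dst, const toy_interp *src)
{
    for (size_t i = 0; i < PAD_SLOTS; i++)
        dst->pad[i] = src->pad[i] ? dup_str(src->pad[i]) : NULL;
}

/* Run time: resolve the op through the current interpreter's pad. */
char *toy_op_data(const toy_op *op, toy_interp *in)
{
    return in->pad[op->pad_ix];
}

/* Teardown: an interpreter frees only the entries it owns. */
void toy_interp_free(toy_interp *in)
{
    for (size_t i = 0; i < PAD_SLOTS; i++) {
        free(in->pad[i]);
        in->pad[i] = NULL;
    }
}
```

Because the op never holds a pointer into any one interpreter, tearing down the child in this model leaves the parent's entry intact, and nothing is freed twice.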

But a part of me is wondering if a "non-obvious" solution would work better,
and also for various other structures. Would it be viable to allow the optree
to point to SVs etc. stored in the shared interpreter that threads::shared
uses? Am I right in thinking that that shared interpreter isn't destroyed
until global destruction of the first interpreter (i.e. it will outlive all
optrees)? If so, it seems to offer a way of creating read-only structures
that don't need to be copied.

Nicholas Clark

p5pRT commented Jan 3, 2007

From @iabyn

On Wed, Jan 03, 2007 at 06​:21​:38PM +0000, Nicholas Clark wrote​:

But a part of me is wondering if a "non-obvious" solution would work better,
and also for various other structures. Would it be viable to allow the optree
to point to SVs etc stored in the shared interpreter that threads​::shared
uses? Am I right in thinking that that shared interpreter isn't destroyed
until global destruction of the first interpreter (ie it will outlive all
optrees). If so, it seems to offer a way of creating read-only structures
that don't need to be copied.

But that runs into the problem that 'read-only' SVs actually get modified
- their refcnts change, they get magic attached, they get upgraded from IV
to IVPV etc etc.
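
A toy C sketch of the refcount part of this objection follows. The toy_ro_sv type and its functions are hypothetical and are not perl's SV API. The payload is never written, yet every borrow still writes to the value's header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "read-only" value: the payload never changes, but the
 * bookkeeping next to it does. */
typedef struct {
    int refcnt;           /* written on every borrow and release */
    const char *payload;  /* never modified after creation */
} toy_ro_sv;

/* Reading the payload still mutates the header. */
const char *toy_borrow(toy_ro_sv *sv)
{
    sv->refcnt++;
    return sv->payload;
}

void toy_release(toy_ro_sv *sv)
{
    assert(sv->refcnt > 0);
    sv->refcnt--;
}
```

If two threads called toy_borrow() on the same value without locking, the increments could race, which is why "read-only" in the Perl-level sense is not enough to make a value safe to share.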

--
"The greatest achievement of the Austrians has been convincing the world
that Hitler was German, and Mozart Austrian."

p5pRT commented Jan 3, 2007

From @nwc10

On Wed, Jan 03, 2007 at 06​:40​:44PM +0000, Dave Mitchell wrote​:

On Wed, Jan 03, 2007 at 06​:21​:38PM +0000, Nicholas Clark wrote​:

But a part of me is wondering if a "non-obvious" solution would work better,
and also for various other structures. Would it be viable to allow the optree
to point to SVs etc stored in the shared interpreter that threads​::shared
uses? Am I right in thinking that that shared interpreter isn't destroyed
until global destruction of the first interpreter (ie it will outlive all
optrees). If so, it seems to offer a way of creating read-only structures
that don't need to be copied.

But that runs into the problem that 'read-only' SVs actually get modified
- their refcnts change, they get magic attached, they get upgraded from IV
to IVPV etc etc.

True. I'd not fully thought that through.
But in theory anything that really is read-only in how it is handled
(e.g. hashes used only for string data lookups, never referenced) could work,
if done carefully and with all those conditions met?

Nicholas Clark

p5pRT commented Jan 12, 2007

From @iabyn

tr/// failing under threads has (hopefully) been fixed by changes

  29765 make tr/// threadsafe by moving swash into pad
  29711 allocate op_pv strings from shared mem pool

In the utf8 case these move the swash from op_sv into the pad; in the
non-utf8 case they allocate op_pv from the shared pool.

--
There's a traditional definition of a shyster​: a lawyer who, when the law
is against him, pounds on the facts; when the facts are against him,
pounds on the law; and when both the facts and the law are against him,
pounds on the table.
  -- Eben Moglen referring to SCO

p5pRT commented Jul 4, 2007

@iabyn - Status changed from 'open' to 'resolved'
