tr/...//CU core dumps #1945

Closed
p5pRT opened this issue May 7, 2000 · 4 comments

p5pRT commented May 7, 2000

Migrated from rt.perl.org#3215 (status was 'resolved')

Searchable as RT3215$

p5pRT commented May 7, 2000

From mschilli@perlmeister.com

Created by mschilli1@aol.com

This is a bug report for perl from mschilli1@aol.com,
generated with the help of perlbug 1.28 running under perl v5.6.0.

-----------------------------------------------------------------
UTF8 support for the tr// operator doesn't seem to work properly.
The following snippet should, as advertised in
'perldoc perlunicode', convert $string from latin1 to utf8:

  while (<>) {
      tr/\0-\xff//CU; # latin1 char to utf8
  }

It throws two (compile-time) warnings:

  Malformed UTF-8 character at ./t line 4.
  Malformed UTF-8 character at ./t line 4.

And the snippet below, when presented with latin1 chars, throws a
"Segmentation fault (core dumped)":

  $latin1 = "Abc ääää";
  ($utf8 = $latin1) =~ tr/\0-\0177//CU;

Would be great if you guys could take a look.

Thanks,

-- Mike Schilli

Perl Info

Flags:
    category=core
    severity=high

Site configuration information for perl v5.6.0:

Configured by mschilli at Sun Mar 26 23:14:38 PST 2000.

Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.2.12-20, archname=i686-linux
    uname='linux www.noevalley.com 2.2.12-20 #1 mon sep 27 10:40:35 edt 1999 i686 unknown '
    config_args='-d -D prefix=/home/mschilli/PERL-5.6.0 -e'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define 
    use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef
  Compiler:
    cc='cc', optimize='-O2', gccversion=egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
    cppflags='-fno-strict-aliasing -I/usr/local/include'
    ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt
    libc=/lib/libc-2.1.2.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.6.0:
    /home/mschilli/PERL-5.6.0/lib/perl5/5.6.0/i686-linux
    /home/mschilli/PERL-5.6.0/lib/perl5/5.6.0
    /home/mschilli/PERL-5.6.0/lib/perl5/site_perl/5.6.0/i686-linux
    /home/mschilli/PERL-5.6.0/lib/perl5/site_perl/5.6.0
    /home/mschilli/PERL-5.6.0/lib/perl5/site_perl
    .


Environment for perl v5.6.0:
    HOME=/home/mschilli
    LANG=en_US
    LANGUAGE (unset)
    LC_ALL=en_US
    LD_LIBRARY_PATH=/usr/local/lib:/home/mschilli/download/xerces-c_1_0_0-linux/lib
    LOGDIR (unset)
    PATH=/usr/local/prod/bin:/home/mschilli/PERL/bin:/home/mschilli/teTeX/bin:/home/cm/bin/Linux:/bin:/usr/bin:/home/cm/bin/ksh:/home/cm/bin/ksh/prm:/home/cm/bin/linux:/usr/local/bin:/bin:/usr/bin:/usr/X11/bin:/usr/andrew/bin:/usr/openwin/bin:/usr/games:.:~/bin:/sbin:/services/bsi/bin:./bin:../bin:/home/mschilli/download/xerces-c_1_0_0-linux/bin:/usr/X11R6/bin:/opt/local/bin:/home/mschilli/INSTALL/framemaker/FM556_linux/bin:
    PERL_BADLANG (unset)
    SHELL=/bin/bash


p5pRT commented May 7, 2000

From @simoncozens

> The following snippet should, as advertised in
> 'perldoc perlunicode', convert $string from latin1 to utf8:
>
>   while (<>) {
>       tr/\0-\xff//CU; # latin1 char to utf8
>   }

Bleh. Yes, it should, but toke.c is incorrectly marking the
left-hand side of that expression as being a Unicode string;
if you say tr/\0-\xff//UC, it marks it as being non-Unicode.
pmtrans actually expects a range of the form
"Unicode char255 Unicode" even if it's converting C->U.
Currently, it only Unicodifies if you're doing UC, so the right
fix is to get toke.c to treat CU the same as UC:
not expand the range, but convert the LHS to Unicode.

This does that:

Inline Patch
--- toke.c~      Mon May 08 14:38:48 2000
+++ toke.c     Mon May 08 14:38:29 2000
@@ -1448,7 +1448,7 @@
                         }
                     }

-                    if (thisutf || uv > 255) {
+                    if (utf || uv > 255) {
                        d = (char*)uv_to_utf8((U8*)d, uv);
                        has_utf = TRUE;
                     }

I then tried this:

#!/usr/bin/perl -w
use Devel::Peek;

$unistr = v300.202.203;
Dump($unistr);
($bytestr=$unistr) =~ tr/\0-\x{ff}//UC;
Dump($bytestr);
($unistr2=$bytestr) =~ tr/\0-\xff//CU;
Dump($unistr2);

And got:
SV = PV(0xa04142c) at 0xa053c98
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0xa048578 "\304\254\303\212\303\213"\0
  CUR = 6
  LEN = 7
SV = PV(0xa041480) at 0xa058fe0
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0xa048550 ",\312\313"\0
  CUR = 3
  LEN = 7
SV = PV(0xa04151c) at 0xa06c3b8
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0xa0487f0 ",\303\212\303\213"\0
  CUR = 5
  LEN = 6

Which is fine apart from the fact that, amusingly, tr///CU
fails to set the UTF8 flag on the result (note the missing UTF8
in the FLAGS of the third dump). This patch fixes that:

Inline Patch
--- doop.c~     Mon May 08 15:23:34 2000
+++ doop.c      Mon May 08 15:24:46 2000
@@ -321,6 +321,7 @@
     }
     *d = '\0';
     sv_usepvn_mg(sv, (char*)dst, d - dst);
+    SvUTF8_on(sv);

     return matches;
 }
@@ -389,6 +390,7 @@
     }
     *d = '\0';
     sv_usepvn_mg(sv, (char*)dst, d - dst);
+    SvUTF8_on(sv);

     return matches;
 }

And it now all plays nicely.
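
For readers who want to check the byte sequences in the dumps above, here is a minimal sketch. It assumes a perl with the Encode module (core since 5.8, so not the 5.6.0 discussed in this report) and does not use tr///CU at all; the variable names are only illustrative. It encodes the same three code points (300, 202, 203) to UTF-8 and prints the octets as octal escapes, then checks the internal UTF8 flag, which is what SvUTF8_on sets at the C level.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same three code points as v300.202.203 in the test script above.
my $unistr = join '', map { chr($_) } 300, 202, 203;

# Encode to UTF-8 octets and render each octet as an octal escape,
# the same notation Devel::Peek uses in the PV field.
my $octets = encode('UTF-8', $unistr);
my $octal  = join '', map { sprintf('\\%o', ord($_)) } split //, $octets;
print "$octal\n";    # \304\254\303\212\303\213

# utf8::is_utf8 reports whether the scalar carries the internal UTF8 flag.
print utf8::is_utf8($unistr) ? "UTF8 flag on\n" : "UTF8 flag off\n";
```

The six octets match the PV of the first dump, and the flag check corresponds to the UTF8 entry in FLAGS that the third dump was missing before the doop.c patch.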

I am working on making UTF8 treatment the default and
deprecating utf8.pm; demand-loading the tables at the right
place is the tricky bit.

> And the snippet below, when presented with latin1 chars, throws a
> "Segmentation fault (core dumped)":

Yep, I reported that before. Looks like it's fixed in perl-current.

> UTF8 support for the tr// operator doesn't seem to work properly.

Does now. :)

Simon


p5pRT commented May 8, 2000

From @gsar

On Mon, 08 May 2000 15:21:10 +0900, simon.p.cozens@jp.pwcglobal.com wrote:

> > UTF8 support for the tr// operator doesn't seem to work properly.
>
> Does now. :)

Please note: Larry wants tr///CU/UC removed entirely rather than fixed,
since it is a rather limiting interface. The intent is to replace it
with Unicode::Map. If you have tuits to help integrate that into the
distribution, let me know.

Sarathy
gsar@ActiveState.com

@p5pRT p5pRT closed this as completed Nov 28, 2003

p5pRT commented Nov 28, 2003

From The RT System itself

The tr///CU feature was *removed* in 5.7.0 and will also be removed in 5.6.1, because the interface was a mistake. For similar functionality, use the new pack('U0', ...).
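
On later perls, the Latin-1 to UTF-8 conversion that the original report attempted can be sketched with the Encode module instead. This is a substitute technique, not the pack('U0', ...) approach named above (whose behaviour has varied between perl versions), and it assumes perl 5.8 or newer, where Encode is in core. The input is the same "Abc ääää" string from the report, written as explicit Latin-1 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Latin-1 input as raw bytes: "Abc " followed by four 0xE4 (a-umlaut) bytes.
my $latin1 = "Abc \xE4\xE4\xE4\xE4";

# Step 1: decode the Latin-1 bytes into Perl characters.
my $chars = decode('ISO-8859-1', $latin1);

# Step 2: encode those characters as UTF-8 octets.
my $utf8 = encode('UTF-8', $chars);

printf "%d Latin-1 bytes -> %d UTF-8 bytes\n", length($latin1), length($utf8);
# 8 Latin-1 bytes -> 12 UTF-8 bytes
```

Each 0xE4 byte becomes the two-octet sequence 0xC3 0xA4, which is why the output is four bytes longer than the input.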
