perlunicode.pod still mentions tr///CU #2619

Closed
p5pRT opened this issue Sep 12, 2000 · 6 comments

Comments

p5pRT commented Sep 12, 2000

Migrated from rt.perl.org#4299 (status was 'resolved')

Searchable as RT4299$

p5pRT commented Sep 12, 2000

From @nwc10

Created by nick@bagpuss.unfortu.net

man perlunicode says:

  · The `tr///' operator translates characters instead of
  bytes. It can also be forced to translate between
  8-bit codes and UTF-8. For instance, if you know your
  input in Latin-1, you can say:

  while (<>) {
  tr/\0-\xff//CU; # latin1 char to utf8
  ...
  }

  Similarly you could translate your output with

  tr/\0-\x{ff}//UC; # utf8 to latin1 char

  No, `s///' doesn't take /U or /C (yet?).

Cribbing the wording from perldelta.pod suggests the following patch.
Should "work in progress on improved interfaces" (the Encode.pm work that
prompted me to read this man page to try to get my head around the confusion)
be mentioned, or is that implicit in the
"WARNING: The implementation of Unicode support in Perl is incomplete."
at the top?

Nicholas Clark

Inline Patch
--- perlunicode.pod.orig        Tue Aug  1 03:32:06 2000
+++ perlunicode.pod     Tue Sep 12 22:37:20 2000
@@ -157,20 +157,9 @@
 
 =item *
 
-The C<tr///> operator translates characters instead of bytes.  It can also
-be forced to translate between 8-bit codes and UTF-8.  For instance, if you
-know your input in Latin-1, you can say:
-
-    while (<>) {
-       tr/\0-\xff//CU;         # latin1 char to utf8
-       ...
-    }
-
-Similarly you could translate your output with
-
-    tr/\0-\x{ff}//UC;          # utf8 to latin1 char
-
-No, C<s///> doesn't take /U or /C (yet?).
+The C<tr///> operator translates characters instead of bytes.  Note that
+the C<tr///CU> functionality has been removed, as the interface was a
+mistake.  For similar functionality see pack('U0', ...) and pack('C0', ...).
 
 =item *
 
Perl Info

Flags:
    category=docs
    severity=medium

Site configuration information for perl v5.7.0:

Configured by nick at Wed Sep  6 23:16:48 BST 2000.

Summary of my perl5 (revision 5.0 version 7 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.2.16-rmk3, archname=armv4l-linux
    uname='linux bagpuss.unfortu.net 2.2.16-rmk3 #2 fri sep 1 08:55:31 bst 2000 armv4l unknown '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
  Compiler:
    cc='/usr/local/bin/gcc', ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2', cppflags='-fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='2.95.2 20000516 (release) [Rebel.com]', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='/usr/local/bin/gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lsfio -lnsl -lndbm -ldb -ldl -lm -lc -lposix -lcrypt -lutil
    libc=/lib/libc-2.1.3.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.7.0:
    /usr/local/lib/perl5/5.7.0/armv4l-linux
    /usr/local/lib/perl5/5.7.0
    /usr/local/lib/perl5/site_perl/5.7.0/armv4l-linux
    /usr/local/lib/perl5/site_perl/5.7.0
    /usr/local/lib/perl5/site_perl
    .


Environment for perl v5.7.0:
    HOME=/home/nick
    LANG (unset)
    LANGUAGE (unset)
    LC_CTYPE=en_GB.ISO-8859-1
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/nick/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/sbin:/usr/sbin:/usr/local/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash


p5pRT commented Sep 13, 2000

From @ysth

In article <E13YxwU-000DDQ-00@plum.flirble.org>,
Nick Clark <nick@plum.flirble.org> wrote:

> --- perlunicode.pod.orig Tue Aug 1 03:32:06 2000
> +++ perlunicode.pod Tue Sep 12 22:37:20 2000

Here's another, albeit in a comment:

Inline Patch
--- op.h.orig	Sun Aug 13 12:35:24 2000
+++ op.h	Tue Sep 12 21:41:12 2000
@@ -130,9 +130,7 @@
 /* Private for OP_TRANS */
 #define OPpTRANS_FROM_UTF	1
 #define OPpTRANS_TO_UTF		2
-#define OPpTRANS_IDENTICAL	4
-	/* When CU or UC, means straight latin-1 to utf-8 or vice versa */
-	/* Otherwise, IDENTICAL means the right side is the same as the left */
+#define OPpTRANS_IDENTICAL	4	/* right side is same as left */
 #define OPpTRANS_SQUASH		8
 #define OPpTRANS_DELETE		16
 #define OPpTRANS_COMPLEMENT	32
End of Patch.
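
For context on the comment being removed above: a "straight latin-1 to utf-8 or vice versa" conversion is a fixed byte-level mapping, which is what `tr/\0-\xff//CU` and its `UC` counterpart were documented to do. The following is only a minimal sketch in C of that mapping, not code from the Perl source. The names `latin1_to_utf8`, `utf8_to_latin1` and `CONV_ERROR` are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error marker for this sketch; not a Perl core symbol. */
#define CONV_ERROR ((size_t)-1)

/* Encode len Latin-1 bytes from src as UTF-8 into dst.
 * Bytes below 0x80 are copied; bytes 0x80..0xFF become two bytes.
 * Returns the number of bytes written, or CONV_ERROR if dst is too small. */
size_t latin1_to_utf8(const unsigned char *src, size_t len,
                      unsigned char *dst, size_t dst_cap)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            if (out + 1 > dst_cap) return CONV_ERROR;
            dst[out++] = c;
        } else {
            if (out + 2 > dst_cap) return CONV_ERROR;
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte: 0xC2 or 0xC3 */
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    return out;
}

/* Decode UTF-8 from src into Latin-1 bytes in dst.
 * Only code points U+0000..U+00FF are representable; anything else
 * (or malformed input) yields CONV_ERROR. */
size_t utf8_to_latin1(const unsigned char *src, size_t len,
                      unsigned char *dst, size_t dst_cap)
{
    size_t out = 0;
    size_t i = 0;
    assert(src != NULL && dst != NULL);
    while (i < len) {
        unsigned char c = src[i];
        unsigned char decoded;
        if (c < 0x80) {
            decoded = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len
                   && (src[i + 1] & 0xC0) == 0x80) {
            decoded = (unsigned char)(((c & 0x03) << 6) | (src[i + 1] & 0x3F));
            i += 2;
        } else {
            return CONV_ERROR; /* outside Latin-1, or malformed sequence */
        }
        if (out >= dst_cap) return CONV_ERROR;
        dst[out++] = decoded;
    }
    return out;
}
```

Under these assumptions, the four Latin-1 bytes `c a f 0xE9` encode to five UTF-8 bytes ending in `0xC3 0xA9`, and decoding them gives back the original four. Input such as U+20AC, which has no Latin-1 equivalent, is rejected rather than silently truncated.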

p5pRT commented Sep 14, 2000

From @nwc10

I've just checked perl5.6's perlop and it happily mentions tr///CU as if
nothing is going to happen. I assume that it would be a good idea to update
this for 5.6.1 to say that this (experimental, because all Unicode is
experimental?) feature is deprecated and will be removed in the 5.8
release of perl.

(except that I'm not convinced that the above is near to a good wording, hence
no comprehensive search of the pods and patch)

Nicholas Clark

p5pRT commented Sep 14, 2000

From @gsar

On Thu, 14 Sep 2000 14:13:20 BST, Nicholas Clark wrote:

> I've just checked perl5.6's perlop and it happily mentions tr///CU as if
> nothing is going to happen. I assume that it would be a good idea to update
> this for 5.6.1 to say that this (experimental, because all unicode is
> experimental?) feature is deprecated and will be removed in the 5.8
> release of perl.

It shall be removed in 5.6.1. Larry/Camel-III say so.

Speaking of 5.6.1, if folks can send me (not the list) recommendations
for patches from 5.7.0 they'd like to see in 5.6.1, I'm listening.
Negative votes for patches are welcome too. Mind however that there's
no guarantee your pet patch will end up in there if it doesn't pass my
own sanity checks. :-)

I plan to put out a 5.6.1-trial1 "soon".

Sarathy
gsar@ActiveState.com

p5pRT commented Sep 17, 2000

From [Unknown Contact. See original ticket]

> I've just checked perl5.6's perlop and it happily mentions tr///CU as if
> nothing is going to happen. I assume that it would be a good idea to update
> this for 5.6.1 to say that this (experimental, because all unicode is
> experimental?) feature is deprecated and will be removed in the 5.8
> release of perl.
>
> (except that I'm not convinced that the above is near to a good wording, hence
> no comprehensive search of the pods and patch)

FYI: the new Camel and Pockref intentionally mention them not at all,
as Larry pronounced them nasty and dead. Probably better that the deed
were done earlier than later.

--tom

p5pRT closed this as completed Nov 28, 2003

p5pRT commented Nov 28, 2003

From The RT System itself

will be in 5.6.1
