Interaction of \U/\L and \u/\l escapes are undocumented #5467

Open
p5pRT opened this issue May 16, 2002 · 5 comments

Comments

@p5pRT

p5pRT commented May 16, 2002

Migrated from rt.perl.org#9360 (status was 'open')

Searchable as RT9360$

@p5pRT

p5pRT commented May 16, 2002

From scs@superior.inland-sea.com

This is a bug report for perl from scs@di.org,
generated with the help of perlbug 1.33 running under perl v5.6.1.


When doing case manipulation with \L, \U, \l and \u, the results of
mixed operations are unexpected and (IMHO) a bug. The following
script:

  use strict;
  print "host\unaMe is in lower case except for the N and M.\n";
  print "\Lhost\unaMe\E is in lower case, even the N.\n";

prints

  hostNaMe is in lower case except for the N and M.
  hostname is in lower case, even the N.

with both perl5.6.1 and perl5.00503. In the second line printed,
I believe that the proper functioning should be to capitalize the
N in hostname.

One might argue that these are functional (and indeed, they may
be implemented internally as functions), so that the second print
is interpreted something like:

  print lc( "host" . uc( "n" ) . "aMe" ) .
        " is in lower case, even the N.\n";
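The functional reading can be checked from perl itself. This is a minimal sketch, assuming (as the later comments in this thread describe) that the tokenizer compiles \u inside \L to ucfirst() applied to the rest of the \L span; for this input that gives the same result as the uc("n") form above.

```perl
use strict;
use warnings;

# The escape sequence exactly as written in the report...
my $interpolated = "\Lhost\unaMe\E";
# ...and the explicit function composition the tokenizer effectively builds.
my $functional   = lc("host" . ucfirst("naMe"));

print "$interpolated\n";    # hostname
print "$functional\n";      # hostname
print "same\n" if $interpolated eq $functional;
```

Because the outer lc() is applied last, it undoes the upper-casing done by the inner ucfirst().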

I believe that a state-wise interpretation is more reasonable, i.e.,
at the beginning of \L the state becomes `force characters to lower
case until \E seen.' When the \u is seen the state switches to
`force next char to upper case, then revert to previous state'.
The \E means abandon all forcing of case.
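The proposed state-wise reading can be sketched as a small interpreter over a template string. This is a sketch under stated assumptions: `casemod_stateful` is a hypothetical helper (not a perl built-in and not how the tokenizer works today), it handles only \L, \U, \l, \u and \E, and it reads the template literally (single-quoted) instead of hooking into string interpolation.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the *proposed* state-wise semantics.
# Pass the template in single quotes so the escapes stay literal.
sub casemod_stateful {
    my ($template) = @_;
    my $out     = '';
    my $mode    = '';   # 'L' or 'U' while forcing case until \E, else ''
    my $pending = '';   # 'l' or 'u' to force only the next character, else ''

    # Split into case-modifier escapes and single characters.
    for my $tok ($template =~ /(\\[LUluE]|.)/gs) {
        if ($tok eq '\\L' or $tok eq '\\U') {
            $mode = substr($tok, 1);          # enter the forced-case state
        }
        elsif ($tok eq '\\l' or $tok eq '\\u') {
            $pending = substr($tok, 1);       # override for one character
        }
        elsif ($tok eq '\\E') {
            ($mode, $pending) = ('', '');     # abandon all forcing of case
        }
        else {
            my $c = $tok;
            if    ($pending eq 'u') { $c = uc $c }
            elsif ($pending eq 'l') { $c = lc $c }
            elsif ($mode    eq 'U') { $c = uc $c }
            elsif ($mode    eq 'L') { $c = lc $c }
            $pending = '';                    # revert to the previous state
            $out .= $c;
        }
    }
    return $out;
}

print casemod_stateful('\Lhost\unaMe\E'), "\n";   # hostName
print casemod_stateful('\Un\lext'), "\n";         # NeXT
```

The one-character override is checked before the \L/\U state, which is what lets \u win inside \L under this proposal.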

The usage I suggest is more consistent with Perl's current treatment
of backslashed single-character escapes, e.g.,

  print "\Uhost\nnaMe\E\n";

does not print

  HOSTNAME

IMHO the programmer who writes "\Lhost\unaMe\E" clearly wants
`hostName', and IMHO that's what he should get.



Flags:
  category=core
  severity=low


Site configuration information for perl v5.6.1:

Configured by root at Tue Mar 26 11:46:11 GMT 2002.

Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration:
  Platform:
  osname=freebsd, osvers=4.5-release, archname=i386-freebsd
  uname='freebsd gohan11.freebsd.org 4.5-release freebsd 4.5-release #0: sun apr 1 02:34:56 pst 2002 asami@bento.freebsd.org:usrsrcsyscompilebento i386 '
  config_args='-sde -Dprefix=/usr/local -Darchlib=/usr/local/lib/perl5/5.6.1/mach -Dprivlib=/usr/local/lib/perl5/5.6.1 -Dman3dir=/usr/local/lib/perl5/5.6.1/man/man3 -Dsitearch=/usr/local/lib/perl5/site_perl/5.6.1/mach -Dsitelib=/usr/local/lib/perl5/site_perl/5.6.1 -Ui_gdbm -Ui_malloc -Dccflags=-DAPPLLIB_EXP="/usr/local/lib/perl5/5.6.1/BSDPAN"'
  hint=recommended, useposix=true, d_sigaction=define
  usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
  useperlio=undef d_sfio=undef uselargefiles=define usesocks=undef
  use64bitint=undef use64bitall=undef uselongdouble=undef
  Compiler:
  cc='cc', ccflags ='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.6.1/BSDPAN" -fno-strict-aliasing',
  optimize='-O -pipe ',
  cppflags='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.6.1/BSDPAN" -fno-strict-aliasing'
  ccversion='', gccversion='2.95.3 20010315 (release) [FreeBSD]', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
  ld='cc', ldflags ='-Wl,-E '
  libpth=/usr/lib
  libs=-lm -lc -lcrypt -lutil
  perllibs=-lm -lc -lcrypt -lutil
  libc=, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
  cccdlflags='-DPIC -fpic', lddlflags='-shared '

Locally applied patches:
 


@INC for perl v5.6.1:
  /usr/local/lib/perl5/5.6.1/BSDPAN
  /usr/local/lib/perl5/5.6.1/mach
  /usr/local/lib/perl5/5.6.1
  /usr/local/lib/perl5/site_perl/5.6.1/mach
  /usr/local/lib/perl5/site_perl/5.6.1
  /usr/local/lib/perl5/site_perl
  .


Environment for perl v5.6.1:
  HOME=/home/scs
  LANG (unset)
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/home/scs/.bin:/usr/inland-sea/bin:/usr/local/sbin:/usr/local/bin:/home/scs/.bin:/usr/inland-sea/bin:/usr/local/bin:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/X11R6/bin
  PERL_BADLANG (unset)
  SHELL=/usr/local/bin/zsh

@p5pRT

p5pRT commented Jun 7, 2006

From dland@landgren.net

This isn't limited to FreeBSD; it's general to all perls from at least
5.005_03 onwards. I independently rediscovered this bug a few weeks
ago.

(\l is "lower case next letter", \U is "uppercase until \E or EOS").

For instance:

"\Un\lext" eq "NeXT" # wrong, currently "NEXT"

because toke.c breaks this up as

  uc("n" . lcfirst("ext"))
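The decomposition can be confirmed, and worked around, from perl itself. This is a minimal sketch, assuming a perl whose tokenizer behaves as described here; the `$workaround` string is only an illustration of ending the outer \U with \E before the \l starts, not a documented idiom.

```perl
use strict;
use warnings;

my $interpolated = "\Un\lext";
my $functional   = uc("n" . lcfirst("ext"));   # decomposition described above
# Closing \U with \E first keeps the outer uc() away from the \l part,
# so this becomes "N" . lcfirst("EXT").
my $workaround   = "\Un\E\lEXT";

print "$interpolated\n";   # NEXT
print "$functional\n";     # NEXT
print "$workaround\n";     # NeXT
```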

And while this is fixable, I think it would be unwise for maint.

For maint+blead, should we document that \U takes precedence over \l,
and \L takes precedence over \u (documenting the implementation)?

Do we fix it for blead? If not, then the bug could be rejected.

David

@p5pRT

p5pRT commented Jul 5, 2016

From @dcollinsn

It seems to me that we should at least document this behavior and test for it. Since changing it now could affect existing code, and evidently is not easy (since the tokenizer implements them as uc() and lcfirst(), it would take a significant overhaul to change this behavior), we should resolve the documentation ambiguity. This patch does so, and adds a test.

@p5pRT

p5pRT commented Jul 5, 2016

From @dcollinsn

0001-RT-9360-Document-interaction-of-U-L-u-l.patch
From f699759c922a0976bdd5295e9f7e7c58ceee7ffc Mon Sep 17 00:00:00 2001
From: Dan Collins <dcollinsn@gmail.com>
Date: Mon, 4 Jul 2016 21:25:30 -0400
Subject: [PATCH] [RT #9360] Document interaction of \U \L \u \l

---
 pod/perlop.pod | 5 +++++
 t/op/lc.t      | 6 +++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/pod/perlop.pod b/pod/perlop.pod
index 365c962..42ef2d7 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -1592,6 +1592,11 @@ C<\E> for each.  For example:
  say"This \Qquoting \ubusiness \Uhere isn't quite\E done yet,\E is it?";
  This quoting\ Business\ HERE\ ISN\'T\ QUITE\ done\ yet\, is it?
 
+In the case of a conflict between C<\L>, C<\U>, C<\l>, and C<\u>, the
+outermost escape sequence will apply. For example, C<\L\Utest> is C<test>,
+and C<\Lt\uest> is C<test>. If you find this surprising, consider that
+the latter example is interpreted as C<lc('t' . ucfirst('est'))>.
+
 If a S<C<use locale>> form that includes C<LC_CTYPE> is in effect (see
 L<perllocale>), the case map used by C<\l>, C<\L>, C<\u>, and C<\U> is
 taken from the current locale.  If Unicode (for example, C<\N{}> or code
diff --git a/t/op/lc.t b/t/op/lc.t
index 2ce65ac..565c1cb 100644
--- a/t/op/lc.t
+++ b/t/op/lc.t
@@ -16,7 +16,7 @@ BEGIN {
 
 use feature qw( fc );
 
-plan tests => 139 + 4 * 256;
+plan tests => 142 + 4 * 256;
 
 is(lc(undef),	   "", "lc(undef) is ''");
 is(lcfirst(undef), "", "lcfirst(undef) is ''");
@@ -341,6 +341,10 @@ SKIP: {
     is($x, "A", "first { fc }");
 }
 
+# RT #9360: \L and \U vs \l and \u
+is("\Utest",   'TEST', 'RT #9360: \L, \U, \l, \u');
+is("\Ute\Est", 'TEst', 'RT #9360: \L, \U, \l, \u');
+is("\Ut\lest", 'TEST', 'RT #9360: \L, \U, \l, \u');
 
 my $utf8_locale = find_utf8_ctype_locale();
 
-- 
2.8.1

@p5pRT

p5pRT commented Aug 10, 2016

From @khwilliamson

On Mon Jul 04 18:31:20 2016, dcollinsn@gmail.com wrote:

> It seems to me that we should at least document this behavior and test
> for it. Since changing it now could affect existing code, and
> evidently is not easy (since the tokenizer implements them as uc() and
> lcfirst(), it would take a significant overhaul to change this
> behavior), we should resolve the documentation ambiguity. This patch
> does so, and adds a test.

Please read the thread beginning at
http://www.nntp.perl.org/group/perl.perl5.porters/2012/01/msg181429.html

I'm unsure as to the right course.
--
Karl Williamson
