Interaction of \U/\L and \u/\l escapes are undocumented #5467

p5pRT · 2002-05-16T11:18:07Z

Migrated from rt.perl.org#9360 (status was 'open')

Searchable as RT9360$

p5pRT · 2002-05-16T11:18:07Z

From scs@superior.inland-sea.com

This is a bug report for perl from scs@di.org,
generated with the help of perlbug 1.33 running under perl v5.6.1.

When doing case manipulation with \L, \U, \l and \u, the results of
mixed operations are unexpected and (IMHO) a bug. The following
script:

use strict;
print "host\unaMe is in lower case except for the N and M.\n";
print "\Lhost\unaMe\E is in lower case, even the N.\n";

prints

hostNaMe is in lower case except for the N and M.
hostname is in lower case, even the N.

with both perl5.6.1 and perl5.00503. In the second line printed,
I believe that the proper functioning should be to capitalize the
N in hostname.

One might argue that these are functional (and indeed, they may
be implemented internally as functions), so that the second print
is interpreted something like:

print lc( "host" . uc( "n" ) . "aMe" ) .
" is in lower case, even the N.\n";

I believe that a state-wise interpretation is more reasonable, i.e.,
at the beginning of \L the state becomes `force characters to lower
case until \E seen.' When the \u is seen the state switches to
`force next char to upper case, then revert to previous state'.
The \E means abandon all forcing of case.
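
A minimal sketch of the state-wise reading described above, for illustration only. The helper name statewise_case is hypothetical and this is not how toke.c works. It takes the escapes as literal backslash sequences in a single-quoted string, so that perl's own interpolation does not process them first.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the proposed state-wise rules to a string
# in which \L, \U, \E, \l and \u appear as literal two-character sequences.
sub statewise_case {
    my ($template) = @_;
    my $mode    = '';    # persistent state: 'L', 'U', or '' (no forcing)
    my $oneshot = '';    # pending override for the next character: 'l', 'u', or ''
    my $out     = '';

    while ( $template =~ /\G(?:\\([LUElu])|(.))/gcs ) {
        if ( defined $1 ) {
            my $esc = $1;
            if    ( $esc eq 'E' )                { $mode = ''; $oneshot = '' }
            elsif ( $esc eq 'L' || $esc eq 'U' ) { $mode = $esc }
            else                                 { $oneshot = $esc }
            next;
        }
        my $ch = $2;
        # A one-shot escape wins for exactly one character, then the
        # persistent mode applies again.
        if    ( $oneshot eq 'u' ) { $ch = uc $ch }
        elsif ( $oneshot eq 'l' ) { $ch = lc $ch }
        elsif ( $mode eq 'U' )    { $ch = uc $ch }
        elsif ( $mode eq 'L' )    { $ch = lc $ch }
        $oneshot = '';
        $out .= $ch;
    }
    return $out;
}

print statewise_case('\Lhost\unaMe\E'), "\n";    # hostName
```

Under these rules the one-shot escape overrides the enclosing mode for a single character, which is the opposite of the nested-function result shown in the report.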

The usage I suggest is more consistent with perl's current treatment
of other backslashed single-character escapes, e.g.,

print "\Uhost\nnaMe\E\n";

does not print

HOSTNAME

IMHO the programmer who writes "\Lhost\unaMe\E" clearly wants
"hostName", and IMHO that's what he should get.
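
Whatever the escapes end up doing, spelling the nesting out by hand gives the desired string today. A small sketch, assuming only the core functions lc and ucfirst; the variable names are illustrative. The second form puts \u outside \L, which is the ordering perlfaq4 uses for capitalising words.

```perl
use strict;
use warnings;

# Explicit composition: lower-case the tail first, then capitalise its
# first character, so the capital is applied last and survives.
my $explicit = "host" . ucfirst( lc("naMe") );

# Escape form of the same nesting: \u on the outside, \L on the inside.
my $escaped = "host\u\LnaMe\E";

print "$explicit\n";    # hostName
print "$escaped\n";
```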

Flags:
category=core
severity=low

Site configuration information for perl v5.6.1:

Configured by root at Tue Mar 26 11:46:11 GMT 2002.

Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration:
Platform:
osname=freebsd, osvers=4.5-release, archname=i386-freebsd
uname='freebsd gohan11.freebsd.org 4.5-release freebsd 4.5-release #0: sun apr 1 02:34:56 pst 2002 asami@bento.freebsd.org:usrsrcsyscompilebento i386 '
config_args='-sde -Dprefix=/usr/local -Darchlib=/usr/local/lib/perl5/5.6.1/mach -Dprivlib=/usr/local/lib/perl5/5.6.1 -Dman3dir=/usr/local/lib/perl5/5.6.1/man/man3 -Dsitearch=/usr/local/lib/perl5/site_perl/5.6.1/mach -Dsitelib=/usr/local/lib/perl5/site_perl/5.6.1 -Ui_gdbm -Ui_malloc -Dccflags=-DAPPLLIB_EXP="/usr/local/lib/perl5/5.6.1/BSDPAN"'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=undef d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
Compiler:
cc='cc', ccflags ='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.6.1/BSDPAN" -fno-strict-aliasing',
optimize='-O -pipe ',
cppflags='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.6.1/BSDPAN" -fno-strict-aliasing'
ccversion='', gccversion='2.95.3 20010315 (release) [FreeBSD]', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, usemymalloc=n, prototype=define
Linker and Libraries:
ld='cc', ldflags ='-Wl,-E '
libpth=/usr/lib
libs=-lm -lc -lcrypt -lutil
perllibs=-lm -lc -lcrypt -lutil
libc=, so=so, useshrplib=false, libperl=libperl.a
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
cccdlflags='-DPIC -fpic', lddlflags='-shared '

Locally applied patches:

@INC for perl v5.6.1:
/usr/local/lib/perl5/5.6.1/BSDPAN
/usr/local/lib/perl5/5.6.1/mach
/usr/local/lib/perl5/5.6.1
/usr/local/lib/perl5/site_perl/5.6.1/mach
/usr/local/lib/perl5/site_perl/5.6.1
/usr/local/lib/perl5/site_perl
.

Environment for perl v5.6.1:
HOME=/home/scs
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/scs/.bin:/usr/inland-sea/bin:/usr/local/sbin:/usr/local/bin:/home/scs/.bin:/usr/inland-sea/bin:/usr/local/bin:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/X11R6/bin
PERL_BADLANG (unset)
SHELL=/usr/local/bin/zsh

p5pRT · 2006-06-07T14:03:44Z

From dland@landgren.net

This isn't limited to FreeBSD; it's general to all perls from at least
5.005_03 onwards. I independently rediscovered this bug a few weeks
ago.

(\l is "lower case next letter", \U is "uppercase until \E or EOS").

For instance:

"\Un\lext" eq "NeXT" # wrong, currently "NEXT"

because toke.c breaks this up as

uc("n" . lcfirst("ext"))

And while this is fixable, I think it would be unwise for maint.

For maint+blead, should we document that \U takes precedence over \l,
and \L takes precedence over \u (documenting the implementation)?
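
The precedence described here can be checked against the nested-function reading. A short sketch; the variable names are illustrative, and the interpolated results are those reported in this thread rather than documented behaviour.

```perl
use strict;
use warnings;

# \U outside, \l inside: the outer uc() overrides lcfirst().
my $u_over_l = uc( "n" . lcfirst("ext") );    # "NEXT"

# \L outside, \u inside: the outer lc() overrides ucfirst().
my $l_over_u = lc( "t" . ucfirst("est") );    # "test"

# Compare the interpolated forms with the function forms.
print "\Un\lext\E", ( "\Un\lext\E" eq $u_over_l ? " matches" : " differs" ), "\n";
print "\Lt\uest\E", ( "\Lt\uest\E" eq $l_over_u ? " matches" : " differs" ), "\n";
```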

Do we fix it for blead? If not, then the bug could be rejected.

David

p5pRT · 2016-07-05T01:31:20Z

From @dcollinsn

It seems to me that we should at least document this behavior and test for it. Since changing it now could affect existing code, and evidently is not easy (since the tokenizer implements them as uc() and lcfirst(), it would take a significant overhaul to change this behavior), we should resolve the documentation ambiguity. This patch does so, and adds a test.

p5pRT · 2016-07-05T01:31:20Z

From @dcollinsn

0001-RT-9360-Document-interaction-of-U-L-u-l.patch

From f699759c922a0976bdd5295e9f7e7c58ceee7ffc Mon Sep 17 00:00:00 2001
From: Dan Collins <dcollinsn@gmail.com>
Date: Mon, 4 Jul 2016 21:25:30 -0400
Subject: [PATCH] [RT #9360] Document interaction of \U \L \u \l

---
 pod/perlop.pod | 5 +++++
 t/op/lc.t      | 6 +++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/pod/perlop.pod b/pod/perlop.pod
index 365c962..42ef2d7 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -1592,6 +1592,11 @@ C<\E> for each.  For example:
  say"This \Qquoting \ubusiness \Uhere isn't quite\E done yet,\E is it?";
  This quoting\ Business\ HERE\ ISN\'T\ QUITE\ done\ yet\, is it?
 
+In the case of a conflict between C<\L>, C<\U>, C<\l>, and C<\u>, the
+outermost escape sequence will apply. For example, C<\L\Utest> is C<test>,
+and C<\Lt\uest> is C<test>. If you find this surprising, consider that
+the latter example is interpreted as C<lc('t' . ucfirst('est'))>.
+
 If a S<C<use locale>> form that includes C<LC_CTYPE> is in effect (see
 L<perllocale>), the case map used by C<\l>, C<\L>, C<\u>, and C<\U> is
 taken from the current locale.  If Unicode (for example, C<\N{}> or code
diff --git a/t/op/lc.t b/t/op/lc.t
index 2ce65ac..565c1cb 100644
--- a/t/op/lc.t
+++ b/t/op/lc.t
@@ -16,7 +16,7 @@ BEGIN {
 
 use feature qw( fc );
 
-plan tests => 139 + 4 * 256;
+plan tests => 142 + 4 * 256;
 
 is(lc(undef),	   "", "lc(undef) is ''");
 is(lcfirst(undef), "", "lcfirst(undef) is ''");
@@ -341,6 +341,10 @@ SKIP: {
     is($x, "A", "first { fc }");
 }
 
+# RT #9360: \L and \U vs \l and \u
+is("\Utest",   'TEST', 'RT #9360: \L, \U, \l, \u');
+is("\Ute\Est", 'TEst', 'RT #9360: \L, \U, \l, \u');
+is("\Ut\lest", 'TEST', 'RT #9360: \L, \U, \l, \u');
 
 my $utf8_locale = find_utf8_ctype_locale();
 
-- 
2.8.1

p5pRT · 2016-08-10T22:03:37Z

From @khwilliamson

On Mon Jul 04 18:31:20 2016, dcollinsn@gmail.com wrote:

> It seems to me that we should at least document this behavior and test
> for it. Since changing it now could affect existing code, and
> evidently is not easy (since the tokenizer implements them as uc() and
> lcfirst(), it would take a significant overhaul to change this
> behavior), we should resolve the documentation ambiguity. This patch
> does so, and adds a test.

Please read the thread beginning at
http://www.nntp.perl.org/group/perl.perl5.porters/2012/01/msg181429.html

I'm unsure as to the right course.
--
Karl Williamson

p5pRT added Severity Low distro-All hasTest type-core labels Oct 18, 2019

xenu removed affects-5.5 labels Nov 19, 2021

xenu removed the Severity Low label Dec 29, 2021

This was referenced Aug 4, 2022

\U ... \Q ... \E ... \E #8846

Open

Interaction of case-modifiers (\U, \L, \u, \l, \F, \Q, \E) in double quoted strings #20042

Open
