Bug report: error with \L, \l, \U and \u operators #11145

Open
p5pRT opened this issue Feb 21, 2011 · 11 comments

Comments

@p5pRT

p5pRT commented Feb 21, 2011

Migrated from rt.perl.org#84578 (status was 'open')

Searchable as RT84578$

@p5pRT
Author

p5pRT commented Feb 21, 2011

From g+i@gameintellect.com

Hello Perl maintainers,

Do the operators \L, \l, \U, and \u associate right to left,
or the other way round? I think they should associate right to left, like =.
Do these operators also have a precedence?

print "\u\LdD\n"; # Dd It seems \L is applied first, then \u. Right.
print "\u\la\n"; # A First \l, then \u. Right
print "\l\ua\n"; # a First \u, then \l. Right
print "\L\udD\n"; # Dd It seems \L is applied first, then \u. I think it's a bug!
print lc "\udD\n"; # dd Yes, the result differs from the previous line!
print "\LdD\udD\n"; # dddd It seems \u is applied first, then \L, hm...

print "\L\Ua\n"; # Syntax error, oops!
print "\U\La\n"; # Syntax error
print "\L\La\n"; # Syntax error
print "\u\la\n"; # A Right
print "\u\ua\n"; # A Right

--
Regards,
Serge

@p5pRT
Author

p5pRT commented Feb 22, 2011

From @Abigail

On Mon, Feb 21, 2011 at 06:57:58AM -0800, Serge wrote:

# New Ticket Created by Serge
# Please include the string: [perl #84578]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=84578 >

Hello Perl maintainers,

Do the operators \L, \l, \U, and \u associate right to left,
or the other way round? I think they should associate right to left, like =.
Do these operators also have a precedence?

All that's being said about priorities in "Gory details of parsing quoted
constructs" in perlop is:

  All operations above are performed simultaneously, left to right.

Which is vague enough that any behaviour described below can be explained
away as "not a bug". ;-)

print "\u\LdD\n"; # Dd It seems \L is applied first, then \u. Right.
print "\u\la\n"; # A First \l, then \u. Right
print "\l\ua\n"; # a First \u, then \l. Right
print "\L\udD\n"; # Dd It seems \L is applied first, then \u. I think it's a bug!

I tend to agree. I'd expect it to be equivalent to

  lc ucfirst "dD";

But there's the "left to right" statement. Whatever that means.
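
The two candidate readings can be spelled out with the built-in functions instead of the escapes. This is only a minimal sketch of why the order of composition matters; it makes no claim about what the tokenizer does internally, and the variable names are invented for illustration.

```perl
use strict;
use warnings;

# Two possible readings of "\L\udD", written with built-in functions.
# Which one the escapes *should* mean is the open question in this ticket.
my $ucfirst_then_lc = lc(ucfirst("dD"));   # \u applied first, then \L over the whole string
my $lc_then_ucfirst = ucfirst(lc("dD"));   # \L applied first, then \u on the result

print "lc(ucfirst(...)): $ucfirst_then_lc\n";   # dd
print "ucfirst(lc(...)): $lc_then_ucfirst\n";   # Dd
```

The report above gives Dd as the output of "\L\udD", which corresponds to the second reading rather than the first.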

print lc "\udD\n"; # dd Yes, the result differs from the previous line!
print "\LdD\udD\n"; # dddd It seems \u is applied first, then \L, hm...

That appears to be inconsistent with "\L\udD\n";

print "\L\Ua\n"; # Syntax error, oops!
print "\U\La\n"; # Syntax error
print "\L\La\n"; # Syntax error

That's just plain weird, IMO.

print "\u\la\n"; # A Right
print "\u\ua\n"; # A Right

Abigail

@p5pRT
Author

p5pRT commented Feb 22, 2011

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Aug 15, 2016

From @dcollinsn

On Tue Feb 22 02:31:59 2011, abigail@abigail.be wrote:

print "\L\udD\n"; # Dd It seems \L is applied first, then \u. I think it's a bug!

I tend to agree. I'd expect it to be equivalent to

lc ucfirst "dD";

But there's the "left to right" statement. Whatever that means.

print lc "\udD\n"; # dd Yes, the result differs from the previous line!

That appears to be inconsistent with "\L\udD\n";

print "\L\Ua\n"; # Syntax error, oops!
print "\U\La\n"; # Syntax error
print "\L\La\n"; # Syntax error

That's just plain weird, IMO.

Abigail

This is profoundly strange, and is still in blead as described above. Precedence issues aside, I think that "\L\udD" should eq "dd", and "\L\UdD" should also eq "dd" (and in any event should be valid syntax). I thought I understood how these parsed after digging into the other precedence ticket out there, but evidently I do not.

--
Respectfully,
Dan Collins

@p5pRT
Author

p5pRT commented Aug 15, 2016

From @khwilliamson

On 08/15/2016 10:40 AM, Dan Collins via RT wrote:

On Tue Feb 22 02:31:59 2011, abigail@abigail.be wrote:

print "\L\udD\n"; # Dd It seems \L is applied first, then \u. I think it's a bug!

I tend to agree. I'd expect it to be equivalent to

lc ucfirst "dD";

But there's the "left to right" statement. Whatever that means.

print lc "\udD\n"; # dd Yes, the result differs from the previous line!

That appears to be inconsistent with "\L\udD\n";

print "\L\Ua\n"; # Syntax error, oops!
print "\U\La\n"; # Syntax error
print "\L\La\n"; # Syntax error

That's just plain weird, IMO.

Abigail

This is profoundly strange, and is still in blead as described above. Precedence issues aside, I think that "\L\udD" should eq "dd", and "\L\UdD" should also eq "dd" (and in any event should be valid syntax). I thought I understood how these parsed after digging into the other precedence ticket out there, but evidently I do not.

The whole thing is broken. I'm not sure I agree with your assessment.

IIRC we decided that someone would look thoroughly at the situation and
come back with a proposal. I thought demerphq was doing it, and he
thought I was doing it, and we both hoped someone else would do it. And
there it remains.

In thinking about it lately, I:

a) wonder if we should create a single ticket for this, including \Q,
and merge all the other tickets into it.

b) note that the regex pattern results diverge from the double-quoted
string results, and the latter is more sane; so that the regex code
should be made to work more like the double-quoted code.

$ blead -le 'print qr/\L\ABCD/'
(?^:\abcd)

silently turns what probably was meant to be the assertion \A into a
BELL character.

$ blead -le 'print "\L\ABCD"'
Unrecognized escape \A passed through at -e line 1.
abcd

behaves in what I consider a sane way, as does this:

$ blead -le 'print "\l\ABCD"'
Unrecognized escape \A passed through at -e line 1.
aBCD

but I don't know about this:

blead -le 'print qr/\l\ABCD/'
(?^:\ABCD)

@p5pRT
Author

p5pRT commented Aug 15, 2016

From @cpansprout

On Mon Aug 15 12:00:59 2016, public@khwilliamson.com wrote:

but I don't know about this:

blead -le 'print qr/\l\ABCD/'
(?^:\ABCD)

lcfirst '\A' is equivalent to lc('\\') . 'A'. No surprises there.
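
That equivalence can be checked directly. The following is a small sketch with illustrative variable names; it treats the pattern source as plain single-quoted text, which is an assumption about how the case modifier sees it.

```perl
use strict;
use warnings;

# In a single-quoted string, '\A' is two characters: a backslash and "A".
my $pattern_text = '\A';

# lcfirst only touches the first character, which here is the backslash.
# A backslash has no lowercase form, so the text comes back unchanged.
my $via_lcfirst = lcfirst $pattern_text;
my $by_hand     = lc('\\') . 'A';

print length($pattern_text), "\n";                        # 2
print $via_lcfirst eq $by_hand ? "same\n" : "differ\n";   # same
```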

Most of the code that handles this is in the tokenizer. I know that code fairly well, so I could fix it easily. I just need to know *how* things *should* behave.

--

Father Chrysostomos

@p5pRT
Author

p5pRT commented Aug 15, 2016

From @cpansprout

On Mon Aug 15 14:24:23 2016, sprout wrote:

On Mon Aug 15 12:00:59 2016, public@khwilliamson.com wrote:

but I don't know about this:

blead -le 'print qr/\l\ABCD/'
(?^:\ABCD)

lcfirst '\A' is equivalent to lc('\\') . 'A'. No surprises there.

Oh, I see what you are getting at. qq behaves differently, because things happen in a different order:

$ perl -lwe 'print "\l\ABCD"'
Unrecognized escape \A passed through at -e line 1.
aBCD

--

Father Chrysostomos

@p5pRT
Author

p5pRT commented Aug 15, 2016

From @cpansprout

On Mon Aug 15 12:00:59 2016, public@khwilliamson.com wrote:

In thinking about it lately, I:

a) wonder if we should create a single ticket for this, including \Q,
and merge all the other tickets into it.

I think we actually have two separate issues here. This ticket is about \L\l\U\u etc. not ‘nesting’ consistently (sometimes nesting; sometimes not; sometimes implicitly transposed).

b) note that the regex pattern results diverge from the double-quoted
string results, and the latter is more sane; so that the regex code
should be made to work more like the double-quoted code.

$ blead -le 'print qr/\L\ABCD/'
(?^:\abcd)

silently turns what probably was meant to be the assertion \A into a
BELL character.

And this is a *separate* issue; namely, that regular expressions do not apply character escapes and case modifiers in the same order.

They do not have to be fixed at the same time.

--

Father Chrysostomos

@p5pRT
Author

p5pRT commented Aug 15, 2016

From @khwilliamson

On 08/15/2016 03:54 PM, Father Chrysostomos via RT wrote:

On Mon Aug 15 12:00:59 2016, public@khwilliamson.com wrote:

In thinking about it lately, I:

a) wonder if we should create a single ticket for this, including \Q,
and merge all the other tickets into it.

I think we actually have two separate issues here. This ticket is about \L\l\U\u etc. not ‘nesting’ consistently (sometimes nesting; sometimes not; sometimes implicitly transposed).

b) note that the regex pattern results diverge from the double-quoted
string results, and the latter is more sane; so that the regex code
should be made to work more like the double-quoted code.

$ blead -le 'print qr/\L\ABCD/'
(?^:\abcd)

silently turns what probably was meant to be the assertion \A into a
BELL character.

And this is a *separate* issue; namely, that regular expressions do not apply character escapes and case modifiers in the same order.

They do not have to be fixed at the same time.

Perhaps not, but any decision will need to consider the effects on the
totality of the language.

@Grinnz
Contributor

Grinnz commented Aug 4, 2022

As I recently commented on the mailing list:

To put my 2c in for this part, it is necessary and useful that certain ones nest:

perl -E'say "\u\LfoO"'
Foo

perl -E'say "\l\UFoO"'
fOO

So unless there's a compelling reason otherwise it seems intuitive for them all to work consistently with that.
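
As a rough sketch of what "nesting" means here, the two cases above can be compared with the corresponding nested function calls. The table layout and variable names are just for illustration.

```perl
use strict;
use warnings;

# Each row: the escapes as text, the interpolated string, the nested function call.
my @cases = (
    [ '\u\L', "\u\LfoO", ucfirst(lc("foO")) ],
    [ '\l\U', "\l\UFoO", lcfirst(uc("FoO")) ],
);

for my $case (@cases) {
    my ($escapes, $interpolated, $nested_call) = @$case;
    my $verdict = $interpolated eq $nested_call ? "eq" : "ne";
    printf "%-5s %s %s %s\n", $escapes, $interpolated, $verdict, $nested_call;
}
```

With the outputs quoted above (Foo and fOO), both rows should report eq, that is, the inner \L or \U is applied first and the outer \u or \l is applied to its result.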

@demerphq
Collaborator

demerphq commented Aug 5, 2022

Just wanted to note that double-quoted strings and regexes necessarily behave differently with regard to escape characters, and that this in turn changes how each interacts with \U, \L and friends. The basic issue is that in the regex engine an escape does not mean its literal equivalent, whereas in a double-quoted string it does. Arguably, in regex quoting, \Q, \L, \U and friends should be deferred to the regex engine and act as modifiers to the regex parser, rather than being converted by the toker at all. We should focus on getting the rules right for double-quoted strings, and then have the regex engine simulate that as much as is sensible.

Consider that /a\x{7c}b/ matches very differently to /a|b/, but "a\x{7c}b" and "a|b" are the same strings.
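
A minimal sketch of that difference, with anchors added so that each pattern has to match the whole string (the anchors and variable names are not part of the example above):

```perl
use strict;
use warnings;

# In a double-quoted string, \x{7c} is just another way to write "|".
my $same_string = ("a\x{7c}b" eq "a|b") ? 1 : 0;

# In a pattern, \x{7c} is a literal "|" character; a bare | is alternation.
my $escaped_matches_a     = ("a"   =~ /^a\x{7c}b$/) ? 1 : 0;   # needs the literal text a|b
my $escaped_matches_pipe  = ("a|b" =~ /^a\x{7c}b$/) ? 1 : 0;
my $alternation_matches_a = ("a"   =~ /^(?:a|b)$/)  ? 1 : 0;   # "a" or "b"

print "strings equal:            $same_string\n";             # 1
print "/a\\x{7c}b/ matches 'a':   $escaped_matches_a\n";       # 0
print "/a\\x{7c}b/ matches 'a|b': $escaped_matches_pipe\n";    # 1
print "/a|b/ matches 'a':        $alternation_matches_a\n";   # 1
```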
