[PATCH] addition to perlrecharclass about '$' as "special" #15601

Closed
p5pRT opened this issue Sep 15, 2016 · 19 comments

p5pRT commented Sep 15, 2016

Migrated from rt.perl.org#129277 (status was 'resolved')

Searchable as RT129277$

p5pRT commented Sep 15, 2016

From cpan@goess.org

Created by cpan@goess.org

I'm suggesting a documentation change. perlrecharclass says "Most characters
that are meta characters in regular expressions...lose their special meaning
and can be used inside a character class without the need to escape them" and
goes on to list the ones that do need to be escaped. It does *not* list a '$'.
But this will not match a dollar sign or a comma:

  [$,]

and it would be good advice to point out that a '$' is as special inside a
character class as it is anywhere else in a regular expression.

Perl Info


[documentation-irrelevant systems information elided for security paranoia]

p5pRT commented Sep 15, 2016

From cpan@goess.org

0001-adding-note-re-in-character-classes.patch
From bcda3765acb6e782fac719483de50b4ab51ab13b Mon Sep 17 00:00:00 2001
From: "Kevin M. Goess" <cpan@goess.org>
Date: Thu, 15 Sep 2016 11:19:06 -0700
Subject: [PATCH] adding note re '$' in character classes

It would be helpful to know that a $ in a character class really *does* want to
be escaped, unless you really do intend to refer to the perl special variable
whose name you've happened to create.  Which is to say:

    [$!]

should probably really be

    [\$!]
---
 pod/perlrecharclass.pod | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index 89f4a7e..61c8b3a 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -543,6 +543,12 @@ bracketed character class.
 Also, a backslash followed by two or three octal digits is considered an octal
 number.
 
+A C<$> is just as special in a character class as it is anywhere else in a
+regular expression: if it's not marking the end of a pattern then it's the
+sigil for a perl variable. So the class C<[$?]> actually interpolates to this
+C<[0]> or whatever the current value of your $CHILD_ERROR special variable
+happens to be.
+
 A C<[> is not special inside a character class, unless it's the start of a
 POSIX character class (see L</POSIX Character Classes> below). It normally does
 not need escaping.
-- 
1.8.3.1

p5pRT commented Sep 15, 2016

From @jkeenan

On Thu Sep 15 11:34:02 2016, cpan@goess.org wrote:

This is a bug report for perl from cpan@goess.org,
generated with the help of perlbug 1.40 running under perl 5.25.5.

-----------------------------------------------------------------
[Please describe your issue here]

I'm suggesting a documentation change. perlrecharclass says "Most characters
that are meta characters in regular expressions...lose their special meaning
and can be used inside a character class without the need to escape them" and
goes on to list the ones that do need to be escaped. It does *not* list a '$'.
But this will not match a dollar sign or a comma:

[$,]

and it would be good advice to point out that a '$' is as special inside a
character class as it is anywhere else in a regular expression.

While I agree with the general thrust of the patch, I think we're going to have to brainstorm for edge cases before we nail down its final wording.

Consider the following:

#####
$ cat 129277-charclass-dollar.pl
#####
# perl
use strict;
use warnings;
use 5.10.1;

{
  my $str;
  local $/ = "\n";

  $str = 'This is a string with hard-quoted $/ in its middle.';
  say $str;
  say ( ($str =~ m{[$/]}) ? "Yes" : "No" );
  say ( ($str =~ m{[\$/]}) ? "Yes" : "No" );
  say ( ($str =~ m{[^$/]}) ? "Yes" : "No" );
  say ( ($str =~ m{[^\$/]}) ? "Yes" : "No" );

  $str = "This is a string with interpolated $/ in its middle.";
  say $str;
  say ( ($str =~ m{[$/]}) ? "Yes" : "No" );
  say ( ($str =~ m{[\$/]}) ? "Yes" : "No" );
  say ( ($str =~ m{[^$/]}) ? "Yes" : "No" );
  say ( ($str =~ m{[^\$/]}) ? "Yes" : "No" );
}

__END__
#####

Output:

#####
$ perl 129277-charclass-dollar.pl
#####
This is a string with hard-quoted $/ in its middle.
No
Yes
Yes
Yes
This is a string with interpolated
in its middle.
Yes
No
Yes
Yes
#####

Is that the output we all expect? How do we describe the behavior of the "dollar-variables" inside negated character classes?

Thank you very much.

--
James E Keenan (jkeenan@cpan.org)

p5pRT commented Sep 15, 2016

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Sep 15, 2016

From @mauke

On 15.09.2016 at 20:34, Kevin Goess (via RT) wrote:

# New Ticket Created by Kevin Goess
# Please include the string: [perl #129277]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=129277 >

I'm suggesting a documentation change. perlrecharclass says "Most characters
that are meta characters in regular expressions...lose their special meaning
and can be used inside a character class without the need to escape them" and
goes on to list the ones that do need to be escaped. It does *not* list a '$'.
But this will not match a dollar sign or a comma:

[$,]

It will if you use it in m'...', as in m'[$,]'.

--
Lukas Mai <plokinom@gmail.com>

p5pRT commented Sep 15, 2016

From @Abigail

On Thu, Sep 15, 2016 at 11:34:02AM -0700, Kevin Goess wrote:

I'm suggesting a documentation change. perlrecharclass says "Most characters
that are meta characters in regular expressions...lose their special meaning
and can be used inside a character class without the need to escape them" and
goes on to list the ones that do need to be escaped. It does *not* list a '$'.
But this will not match a dollar sign or a comma:

[$,]

But it does:

  $ perl -wE 'say q{$} =~ q{[$,]}'
  1
  $

and it would be good advice to point out that a '$' is as special inside a
character class as it is anywhere else in a regular expression.

Now, if you use delimiters which allow for interpolation of variables,
said interpolation will happen -- but it will happen before perl even
knows there is a character class.
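To make the point about interpolation order concrete, here is a minimal Perl sketch (an illustration, not code from the thread) that sends the same class through both kinds of delimiters. The value assigned to `$,` is an illustrative assumption borrowed from the example in the documentation patch proposed later in this thread, and the test strings are arbitrary. If interpolation runs first, `m/[$,]/` should behave like a class built from the characters of `$,`, while `m'[$,]'` should keep `$` and `,` as literal class members.

```perl
use strict;
use warnings;

my @inputs = ('$', ',', "\t", '|', ' ', 'x');   # arbitrary test strings
my (%single, %double);

{
    # Illustrative assumption: the same value used in the proposed doc patch.
    local $, = "\t| ";

    for my $s (@inputs) {
        # m'...' suppresses interpolation, so the regex compiler sees the
        # literal class [$,]; inside a class, '$' is an ordinary character.
        $single{$s} = ($s =~ m'[$,]') ? 1 : 0;

        # m/.../ interpolates $, first, so the regex compiler only ever sees
        # a class made of the characters in "\t| " (tab, pipe, space).
        $double{$s} = ($s =~ m/[$,]/) ? 1 : 0;
    }
}

# local has restored $, by this point; printf does not use it anyway.
for my $s (@inputs) {
    (my $shown = $s) =~ s/\t/\\t/;    # make the tab visible in the report
    printf "%-6s single-quotish: %d  double-quotish: %d\n",
        "'" . $shown . "'", $single{$s}, $double{$s};
}
```

Running it should show `$` and `,` matching only in the single-quotish column, and tab, pipe, and space matching only in the double-quotish column.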

Abigail

p5pRT commented Oct 18, 2016

From @khwilliamson

On Thu Sep 15 16:13:23 2016, abigail@abigail.be wrote:

Now, if you use delimiters which allow for interpolation of variables,
said interpolation will happen -- but it will happen before perl even
knows there is a character class.

Abigail

How about the attached patch instead?

--
Karl Williamson

p5pRT commented Oct 18, 2016

From @khwilliamson

0001-alternate-patch-for-129277.patch
From a6b63f4e4781c3ae0a4dc1e4ae0eedd98bb7e781 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Thu, 15 Sep 2016 21:52:44 -0600
Subject: [PATCH] alternate patch for 129277

---
 pod/perlrecharclass.pod | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index 89f4a7e..51ad3db 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -512,7 +512,13 @@ is, characters that carry a special meaning like C<.>, C<*>, or C<(>) lose
 their special meaning and can be used inside a character class without
 the need to escape them. For instance, C<[()]> matches either an opening
 parenthesis, or a closing parenthesis, and the parens inside the character
-class don't group or capture.
+class don't group or capture.  Beware that, unless the pattern is
+enclosed in single-quotes, variable interpolation will take place before
+the bracketed class is parsed:
+
+ $, = "\t| ";
+ $a =~ m'[$,]';        # single-quotish: matches '$' or ','
+ $a =~ m/[$,]/;        # double-quotish: matches "\t", "|", or " "
 
 Characters that may carry a special meaning inside a character class are:
 C<\>, C<^>, C<->, C<[> and C<]>, and are discussed below. They can be
-- 
2.7.4

p5pRT commented Oct 18, 2016

From @khwilliamson

Or this slightly improved version?

p5pRT commented Oct 18, 2016

From @khwilliamson

0001-alternate-patch-for-129277.patch
From 8828f70c81bbe6d7af0213f6345831c120b72ad9 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Thu, 15 Sep 2016 21:52:44 -0600
Subject: [PATCH] alternate patch for 129277

---
 pod/perlrecharclass.pod | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index 89f4a7e..89a481c 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -512,7 +512,14 @@ is, characters that carry a special meaning like C<.>, C<*>, or C<(>) lose
 their special meaning and can be used inside a character class without
 the need to escape them. For instance, C<[()]> matches either an opening
 parenthesis, or a closing parenthesis, and the parens inside the character
-class don't group or capture.
+class don't group or capture.  Beware that, unless the pattern is
+evaluated in single-quotish context, variable interpolation will take
+place before the bracketed class is parsed:
+
+ $, = "\t| ";
+ $a =~ m'[$,]';        # single-quotish: matches '$' or ','
+ $a =~ q{[$,]};        # same
+ $a =~ m/[$,]/;        # double-quotish: matches "\t", "|", or " "
 
 Characters that may carry a special meaning inside a character class are:
 C<\>, C<^>, C<->, C<[> and C<]>, and are discussed below. They can be
-- 
2.7.4

p5pRT commented Oct 18, 2016

From kevin@goess.org

Instead of "beware that" I might say "be aware that", but otherwise it
looks good to me.

On Mon, Oct 17, 2016 at 7:44 PM, Karl Williamson via RT <perlbug-followup@perl.org> wrote:

How about the attached patch instead?

--
Karl Williamson

p5pRT commented Oct 26, 2016

From @khwilliamson

In the absence of further comment, I changed it to 'Be aware', and applied it as
dff5470f076e80e45a5a05627dcc8622402d6416
--
Karl Williamson

p5pRT commented Oct 26, 2016

@khwilliamson - Status changed from 'open' to 'pending release'

p5pRT commented Oct 26, 2016

From @khwilliamson

On Wed Oct 26 10:35:42 2016, khw wrote:

In the absence of further comment, I changed it to 'Be aware', and applied it as
dff5470f076e80e45a5a05627dcc8622402d6416

Oops, the commit really was
6e16fd3

--
Karl Williamson

p5pRT commented Nov 11, 2016

From @Abigail

On Mon, Oct 17, 2016 at 07:44:39PM -0700, Karl Williamson via RT wrote:

How about the attached patch instead?

--
Karl Williamson

Looks good to me.

Abigail

p5pRT commented May 30, 2017

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.26.0, this and 210 other issues have been
resolved.

Perl 5.26.0 may be downloaded via:
https://metacpan.org/release/XSAWYERX/perl-5.26.0

If you find that the problem persists, feel free to reopen this ticket.

p5pRT commented May 30, 2017

@khwilliamson - Status changed from 'pending release' to 'resolved'
