underscore regex delimiters #9853

Closed
p5pRT opened this issue Aug 26, 2009 · 11 comments

Comments

@p5pRT

p5pRT commented Aug 26, 2009

Migrated from rt.perl.org#68804 (status was 'resolved')

Searchable as RT68804$

@p5pRT

p5pRT commented Aug 26, 2009

From chip@seas.upenn.edu

Created by chip@seas.upenn.edu

Substitutions don't work if the delimiter is the underscore character.
Reproduced under Suse 11.1 and RHEL 5.3.

$ perl -p -e 's#/32##;'
58.128.45.21/32 <<<<< input
58.128.45.21 >>>>> output
^C
$ perl -p -e 's_/32__;'
58.128.45.21/32 <<<<< input
58.128.45.21/32 >>>>> output
^C

Perl Info

Flags:
    category=core
    severity=medium

This perlbug was built using Perl 5.10.0 - Wed Jan 28 15:26:33 UTC 2009
It is being executed now by  Perl 5.10.0 - Wed Jan 28 15:18:54 UTC 2009.

Site configuration information for perl 5.10.0:

Configured by abuild at Wed Jan 28 15:18:54 UTC 2009.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.27, archname=x86_64-linux-thread-multi
    uname='linux nono 2.6.27 #1 smp thu may 17 14:00:09 utc 2007 x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true -Doptimize=-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -pipe -Accflags=-DPERL_USE_SAFE_PUTENV'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -pipe',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -DDEBUGGING -fno-strict-aliasing -pipe'
    ccversion='', gccversion='4.3.2 [gcc-4_3-branch revision 141291]', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/libc-2.9.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.9'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.10.0/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64'

Locally applied patches:
    


@INC for perl 5.10.0:
    /usr/lib/perl5/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/5.10.0
    /usr/lib/perl5/site_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0
    /usr/lib/perl5/vendor_perl
    .


Environment for perl 5.10.0:
    HOME=/home1/c/chip
    LANG=en_US
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home1/c/chip/Scripts:/home1/c/chip/Scripts:/home1/c/chip/Scripts:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/opt/kde3/bin:/opt/gnome/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/opt/kde3/bin:/opt/gnome/bin
    PERL_BADLANG (unset)
    SHELL=/pkg/bin/bash

@p5pRT

p5pRT commented Aug 27, 2009

From @rgs

2009/8/26 chip@seas.upenn.edu (via RT) <perlbug-followup@perl.org>:

Substitutions don't work if the delimiter is the underscore character.
Reproduced under Suse 11.1 and rhel 5.3 .

Hmm, but _ is a word character. What if I have a function named
s_ubst_itutin_g, for example?

@p5pRT

p5pRT commented Aug 27, 2009

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Aug 27, 2009

From zefram@fysh.org

chip@seas.upenn.edu (via RT) wrote:

$ perl -p -e 's_/32__;'

The underscore is not perceived as a delimiter there, but as part of
an identifier. Observe how it was parsed:

$ perl -MO=Deparse -e 's#/32##;'
s[/32][];
-e syntax OK
$ perl -MO=Deparse -e 's_/32__;'
's_' / 32;
-e syntax OK

Other clues are available if you turn on warnings:

$ perl -wce 's_/32__;'
Misplaced _ in number at -e line 1.
Misplaced _ in number at -e line 1.
Useless use of division (/) in void context at -e line 1.
-e syntax OK

And if the substitution had contained other text then it would have
blown up earlier:

$ perl -wce 's_\\__;'
Backslash found where operator expected at -e line 1, near "s_\"
syntax error at -e line 1, near "s_\"
-e had compilation errors.

I believe the documentation is at fault. perlop(1) says:

  Any non-alphanumeric, non-whitespace delimiter may replace the
  slashes.

Underscore is not alphanumeric or whitespace, but is evidently
being treated the same way that an alphanumeric character would be.
The prohibition is really on identifier characters, not alphanumerics.

-zefram

@p5pRT

p5pRT commented Aug 27, 2009

From @Abigail

On Thu, Aug 27, 2009 at 09:33:13AM +0100, Zefram wrote:


Underscore is not alphanumeric or whitespace, but is evidently
being treated the same way that an alphanumeric character would be.
The prohibition is really on identifier characters, not alphanumerics.

But even then, the documentation isn't correct. You *can* use word characters
as delimiters​:

  s _/32__

and

  s s/32ss

are fine.

The problem here lies in the tokenization part: the first token of
C<< s_/32__ >> is C<< s_ >>, which isn't the substitution operator.

In fact, this issue isn't any different from saying you cannot use C<< _ >>
as a function argument because you wrote:

  C<< func_ >>

which isn't parsed as C<< func (_) >> either.
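A minimal Perl sketch of Abigail's tokenization point, assuming Perl 5.10 or later (the version in the report): once whitespace separates the operator from the delimiter, both _ and s work as delimiters, while the unspaced s_/32__ leaves the string alone because s_ is read as one identifier. The variable names are illustrative, and the unspaced form is wrapped in a string eval only so that it cannot break compilation of the rest of the script.

```perl
use strict;
use warnings;

my $addr = "58.128.45.21/32";

# Whitespace after "s" makes "_" the delimiter: equivalent to s{/32}{}.
(my $underscore = $addr) =~ s _/32__;

# Even "s" itself works as a delimiter once it is separated from the operator.
(my $letter = $addr) =~ s s/32ss;

# Without the space, "s_" is tokenized as a single identifier (the Deparse
# output in this thread shows it parsed as 's_' / 32). Whether the string
# eval compiles it as a division or fails to compile it, no substitution
# takes place, so $_ is left unchanged.
my $unspaced;
{
    local $_ = $addr;
    eval q{ no strict; no warnings; s_/32__; 1 };
    $unspaced = $_;
}

print "$underscore\n$letter\n$unspaced\n";
```

The first two lines printed are 58.128.45.21; the third still ends in /32.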

Abigail

@p5pRT

p5pRT commented Aug 27, 2009

From @demerphq

2009/8/27 Abigail <abigail@abigail.be>:

On Thu, Aug 27, 2009 at 09:33:13AM +0100, Zefram wrote:


I believe the documentation is at fault. perlop(1) says:

    Any non-alphanumeric, non-whitespace delimiter may replace the
    slashes.

Underscore is not alphanumeric or whitespace, but is evidently
being treated the same way that an alphanumeric character would be.
The prohibition is really on identifier characters, not alphanumerics.

The documentation should read "non-perl-word, non-whitespace delimiter".

We use "alphanumeric" fairly regularly when we speak of
"perl-word-characters" or "identifier" (the latter cannot be expressed by a
single character class, as it cannot start with a digit yet can end with
one).

In fact, this issue isn't any different from saying you cannot use C<< _ >>
as a function argument because you wrote​:

   C<< func_ >>

which isn't parsed as C<< func (_) >> either.

Right. This is not a bug.

Cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
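demerphq's remark about identifiers can be illustrated with a small sketch. It assumes plain ASCII, unqualified names (real Perl identifiers may also contain Unicode word characters and package separators), and the pattern names are hypothetical: a run of word characters is a single character class, whereas an identifier needs a second class for its first character.

```perl
use strict;
use warnings;

# One character class: a run of ASCII "word" characters, digits included.
my $word_run = qr/\A[A-Za-z0-9_]+\z/;

# An identifier needs two classes: it may not start with a digit,
# but digits are allowed after the first character.
my $identifier = qr/\A[A-Za-z_][A-Za-z0-9_]*\z/;

for my $s (qw(_ s_ x32 32x)) {
    printf "%-4s word_run=%d identifier=%d\n",
        $s, ($s =~ $word_run ? 1 : 0), ($s =~ $identifier ? 1 : 0);
}
```

Only 32x separates the two patterns: it is a run of word characters but not an identifier.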

@p5pRT

p5pRT commented Aug 28, 2009

From @nwc10

On Thu, Aug 27, 2009 at 02:03:46PM +0200, demerphq wrote:


Right. This is not a bug.

in the implementation.

I think we agree that the documentation needs to be improved.

Nicholas Clark

@p5pRT

p5pRT commented Aug 29, 2009

From @davidnicol

I think we agree that the documentation needs to be improved.

Nicholas Clark

Inline Patch
diff --git a/pod/perlop.pod b/pod/perlop.pod
index 0170202..26d0f1e 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -1167,11 +1167,13 @@ process modifiers are available:
     c  Do not reset search position on a failed match when /g is in effect.

 If "/" is the delimiter then the initial C<m> is optional.  With the C<m>
-you can use any pair of non-alphanumeric, non-whitespace characters
+you can use any pair of non-whitespace characters
 as delimiters.  This is particularly useful for matching path names
 that contain "/", to avoid LTS (leaning toothpick syndrome).  If "?" is
 the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
 If "'" is the delimiter, no interpolation is performed on the PATTERN.
+When using a character valid in an identifier, whitespace is required
+after the C<m>.

 PATTERN may contain variables, which will be interpolated (and the
 pattern recompiled) every time the pattern search is evaluated, except
@@ -1381,13 +1383,13 @@ specific options:
     e  Evaluate the right side as an expression.
     ee  Evaluate the right side as a string then eval the result

-Any non-alphanumeric, non-whitespace delimiter may replace the
-slashes.  If single quotes are used, no interpretation is done on the
-replacement string (the C</e> modifier overrides this, however).  Unlike
-Perl 4, Perl 5 treats backticks as normal delimiters; the replacement
-text is not evaluated as a command.  If the
-PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own
-pair of quotes, which may or may not be bracketing quotes, e.g.,
+Any non-whitespace delimiter may replace the slashes.  Add space after
+the C<s> when using a character allowed in identifiers.  If single quotes
+are used, no interpretation is done on the replacement string (the C</e>
+modifier overrides this, however).  Unlike Perl 4, Perl 5 treats backticks
+as normal delimiters; the replacement text is not evaluated as a command.
+If the PATTERN is delimited by bracketing quotes, the REPLACEMENT has
+its own pair of quotes, which may or may not be bracketing quotes, e.g.,
 C<s(foo)(bar)> or C<< s<foo>/bar/ >>.  A C</e> will cause the
 replacement portion to be treated as a full-fledged Perl expression
 and evaluated right then and there.  It is, however, syntax checked at
diff --git a/t/op/subst.t b/t/op/subst.t
index 06c04e8..fe03599 100755
--- a/t/op/subst.t
+++ b/t/op/subst.t
@@ -7,7 +7,7 @@ BEGIN {
 }

 require './test.pl';
-plan( tests => 139 );
+plan( tests => 140 );

 $x = 'foo';
 $_ = "x";
@@ -263,6 +263,9 @@ eval 's{foo} # this is a comment, not a delimiter
        {bar};';
 ok( ! @?, 'parsing of split subst with comment' );

+$snum = eval '$_="exactly"; s sxsys;m 3(yactl)3;$1'  #68804
+ok( $snum eq 'yactl', 'alpha delimiters are allowed' );
+
 $_="baacbaa";
 $snum = tr/a/b/s;
 ok( $_ eq "bbcbb" && $snum == 4,
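The regression test added in this patch packs two word-character delimiters into one line. A sketch of what it does, spelled out with the original delimiters next to conventional ones (variable names are illustrative; assumes Perl 5.10 or later):

```perl
use strict;
use warnings;

# "s sxsys" uses "s" as the delimiter: it is s/x/y/.
my $alpha = "exactly";
$alpha =~ s sxsys;

# The same substitution with conventional delimiters.
my $plain = "exactly";
$plain =~ s/x/y/;

# "m 3(yactl)3" uses "3" as the delimiter: it is m/(yactl)/.
my ($captured) = $alpha =~ m 3(yactl)3;

print "$alpha $plain $captured\n";   # eyactly eyactly yactl
```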



-- 

"Yes to health care, no to wealth care!"

@p5pRT

p5pRT commented Aug 30, 2009

From @rgs

Thanks, applied to bleadperl.


@p5pRT

p5pRT commented Aug 30, 2009

@rgs - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Aug 30, 2009
@p5pRT

p5pRT commented Sep 1, 2009

From @davidnicol

On Sun, Aug 30, 2009 at 7:42 AM, Rafael
Garcia-Suarez <rgarciasuarez@gmail.com> wrote:

Thanks, applied to bleadperl.

This is more thorough, and the test is more fun:

Inline Patch
diff --git a/pod/perlop.pod b/pod/perlop.pod
index adf0718..31489e1 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -962,6 +962,9 @@ from the next line.  This allows you to write:
     s {foo}  # Replace foo
       {bar}  # with bar.

+You can use alphanumerics for quote delimiters too, in which case
+the whitespace is required.
+
 The following escape sequences are available in constructs that interpolate
 and in transliterations.
 X<\t> X<\n> X<\r> X<\f> X<\b> X<\a> X<\e> X<\x> X<\0> X<\c> X<\N>
diff --git a/t/op/subst.t b/t/op/subst.t
index 30af8a2..f1b16ee 100644
--- a/t/op/subst.t
+++ b/t/op/subst.t
@@ -263,8 +263,8 @@ eval 's{foo} # this is a comment, not a delimiter
        {bar};';
 ok( ! @?, 'parsing of split subst with comment' );

-$snum = eval '$_="exactly"; s sxsys;m 3(yactl)3;$1';
-is( $snum, 'yactl', 'alpha delimiters are allowed' );
+{ my $match = eval '$_= q laurel;y subcostalis;s paperclip;m elite;$&';
+is( $match, 'lit', 'alpha delimiters are allowed' ); }

 $_="baacbaa";
 $snum = tr/a/b/s;
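The rewritten test is a small puzzle. Decoded step by step (a sketch; the variable names are illustrative), each word-character-delimited statement feeds the next, and a version with conventional delimiters reaches the same result:

```perl
use strict;
use warnings;

# Original spelling from the test, with word-character delimiters.
$_ = q laurel;       # q{aure}
y subcostalis;       # tr/ubco/tali/ : "aure" becomes "atre"
s paperclip;         # s/a/ercli/    : "atre" becomes "erclitre"
m elite;             # m/lit/        : matches "lit" inside "erclitre"
my $fun = $&;

# The same steps with conventional delimiters.
my $plain = q{aure};
$plain =~ tr/ubco/tali/;
$plain =~ s/a/ercli/;
my ($plain_match) = $plain =~ /(lit)/;

print "$fun $plain_match\n";   # lit lit
```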
