Keeping track of 'Unescaped left brace in regex is deprecated' #12137

Closed
p5pRT opened this issue May 26, 2012 · 51 comments

p5pRT commented May 26, 2012

Migrated from rt.perl.org#113094 (status was 'resolved')

Searchable as RT113094$

p5pRT commented May 26, 2012

From @andk

This ticket could serve as a meeting point for authors affected. I'll
write individual tickets to the authors and link to this ticket for
further information.

The commit itself describes the change very well, so there may be
nothing to add. I think it boils down to this: a literal '{' in a regexp
needs to be written as '\{'.

The commit itself


http://perl5.git.perl.org/perl.git/commit/2a53d3314d380af5ab5283758219417c6dfa36e9

The first fail reports


DRTECH/HTML-StripScripts-1.05.tar.gz
http://www.cpantesters.org/cpan/report/44b2867a-a69c-11e1-ad06-f004f4b14d39

HMBRAND/Data-Peek-0.37.tgz

HMBRAND/Spreadsheet-Read-0.46.tgz

JAK/File-ANVL-1.04.tar.gz
http://www.cpantesters.org/cpan/report/8d6ccd90-a6b4-11e1-8cad-34edf3b14d39

JSTEBENS/POE-Component-Server-REST-1.11.tar.gz

JUSTER/WWW-AUR-0.14.tar.gz

KJETILK/RDF-Trine-Node-Literal-XML-0.16.tar.gz

KRYDE/Perl-Critic-Pulp-70.tar.gz

KRYDE/distlinks-5.tar.gz
http://www.cpantesters.org/cpan/report/a1288590-a6d7-11e1-a26b-8e4cf4b14d39

MAROS/Business-UPS-Tracking-1.09.tar.gz
http://www.cpantesters.org/cpan/report/e5f091ea-a646-11e1-bb55-de44f4b14d39

MONS/XML-RPC-Fast-0.8.tar.gz
http://www.cpantesters.org/cpan/report/557692fc-a634-11e1-9aa5-411ff4b14d39

SDPRICE/App-Framework-Lite-1.08.tar.gz
http://www.cpantesters.org/cpan/report/2bbe4ca0-a6b3-11e1-bdb5-314cf4b14d39

WARRINGD/Elive-1.26.tar.gz

perl -V


Summary of my perl5 (revision 5 version 17 subversion 0) configuration​:
  Commit id​: 2a53d33
  Platform​:
  osname=linux, osvers=3.2.0-2-amd64, archname=x86_64-linux-ld
  uname='linux k83 3.2.0-2-amd64 #1 smp mon apr 30 05​:20​:23 utc 2012 x86_64 gnulinux '
  config_args='-Dprefix=/home/src/perl/repoperls/installed-perls/perl/v5.16.0-225-g2a53d33/127e -Dmyhostname=k83 -Dinstallusrbinperl=n -Uversiononly -Dusedevel -des -Ui_db -Uuseithreads -Duselongdouble -DDEBUGGING=-g'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=undef, usemultiplicity=undef
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=define, use64bitall=define, uselongdouble=define
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-O2 -g',
  cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.6.3', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=8, nvtype='long double', nvsize=16, Off_t='off_t', lseeksize=8
  alignbytes=16, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
  libs=-lnsl -ldb -ldl -lm -lcrypt -lutil -lc
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
  libc=, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.13'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'

Characteristics of this binary (from libperl)​:
  Compile-time options​: HAS_TIMES PERLIO_LAYERS PERL_DONT_CREATE_GVSV
  PERL_MALLOC_WRAP PERL_PRESERVE_IVUV PERL_USE_DEVEL
  USE_64_BIT_ALL USE_64_BIT_INT USE_LARGE_FILES
  USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE
  USE_LOCALE_NUMERIC USE_LONG_DOUBLE USE_PERLIO
  USE_PERL_ATOF
  Built under linux
  Compiled at May 25 2012 07​:02​:38
  @​INC​:
  /home/src/perl/repoperls/installed-perls/perl/v5.16.0-225-g2a53d33/127e/lib/site_perl/5.17.0/x86_64-linux-ld
  /home/src/perl/repoperls/installed-perls/perl/v5.16.0-225-g2a53d33/127e/lib/site_perl/5.17.0
  /home/src/perl/repoperls/installed-perls/perl/v5.16.0-225-g2a53d33/127e/lib/5.17.0/x86_64-linux-ld
  /home/src/perl/repoperls/installed-perls/perl/v5.16.0-225-g2a53d33/127e/lib/5.17.0
  .

--
andreas

p5pRT commented May 27, 2012

From @khwilliamson

On 05/26/2012 02​:59 AM, (Andreas J. Koenig) (via RT) wrote​:

The first fail reports
----------------------
DRTECH/HTML-StripScripts-1.05.tar.gz
http​://www.cpantesters.org/cpan/report/44b2867a-a69c-11e1-ad06-f004f4b14d39

This new deprecation message appears to have exposed a real bug in this
code. It looks like a missing "}" to me, which silently caused a
would-be-quantifier to be treated as a literal.

# Failed test 'use HTML::StripScripts;'
# at t/10basic.t line 7.
# Tried to use 'HTML::StripScripts'.
# Error: Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/^\s*([+-]?\d{1,20}(?:\.\d{ <-- HERE 1,20)?)\s*((?:\%|\*|ex|px|pc|cm|mm|in|pt|em)?)\s*$/ at /tmp/loop_over_bdir-4PCTbR/HTML-StripScripts-1.05-Id9VGu/blib/lib/HTML/StripScripts.pm line 1633.
# Compilation failed in require at (eval 4) line 2.
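
For illustration, here is a minimal sketch of what the pattern presumably meant to say, assuming the only defect is the missing "}" after the second "1,20". The name $size_re and the sample inputs are made up for this example, and this is not necessarily the change that went into HTML::StripScripts.

```perl
use strict;
use warnings;

# Presumed intent (an assumption): close the second quantifier so that
# "\.\d{1,20}" means "a dot followed by 1 to 20 digits".
my $size_re = qr/^\s*([+-]?\d{1,20}(?:\.\d{1,20})?)\s*((?:\%|\*|ex|px|pc|cm|mm|in|pt|em)?)\s*$/;

# Hypothetical sample inputs: two valid sizes and one with a dangling dot.
for my $input ('12.5px', '+3em', '12.px') {
    if (my ($number, $unit) = $input =~ $size_re) {
        print "$input => number '$number', unit '$unit'\n";
    }
    else {
        print "$input => no match\n";
    }
}
```

With the quantifier closed, a value such as "12.px" is rejected, because a dot must now be followed by at least one digit.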

p5pRT commented May 27, 2012

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented May 27, 2012

From @clintongormley

This new deprecation message appears to have exposed a real bug in this
code. It looks like a missing "}" to me, which silently caused a
would-be-quantifier to be treated as a literal.

... which embarrassingly wasn't being tested either.

thanks!

clint

p5pRT commented May 28, 2012

From @andk

Found in GRANTM/XML-Simple-2.18.tar.gz in lib/XML/Simple.pm​:

  995 $val =~ s{\$\{([\w.]+)\}}{ $self->get_var($1) }ge;
  1031 $val =~ s{\$\{(\w+)\}}{ $self->get_var($1) }ge;

% make test
[...]
t/0_Config.t .. Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE ([\w.]+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 995.
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE (\w+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 1031.
# Package Version
# perl 5.17.0
# XML​::Simple 2.18
# Storable 2.35
# XML​::Parser 2.41
# XML​::SAX 0.99
# XML​::NamespaceSupport 1.11
t/0_Config.t .. ok

Looks like perl miscounts one backslash. I would expect the regexp
to be accepted by perl because the brace is escaped.

--
andreas

p5pRT commented May 28, 2012

From @cpansprout

On Mon May 28 00​:51​:00 2012, andreas.koenig.7os6VVqR@​franz.ak.mind.de wrote​:

Found in GRANTM/XML-Simple-2.18.tar.gz in lib/XML/Simple.pm​:

995 $val =~ s{\$\{([\w.]+)\}}{ $self->get_var($1) }ge;

1031 $val =~ s{\$\{(\w+)\}}{ $self->get_var($1) }ge;

% make test
[...]
t/0_Config.t .. Unescaped left brace in regex is deprecated, passed
through in regex; marked by <-- HERE in m/\${ <-- HERE ([\w.]+)}/
at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 995.
Unescaped left brace in regex is deprecated, passed through in regex;
marked by <-- HERE in m/\${ <-- HERE (\w+)}/ at
/tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm
line 1031.
# Package Version
# perl 5.17.0
# XML​::Simple 2.18
# Storable 2.35
# XML​::Parser 2.41
# XML​::SAX 0.99
# XML​::NamespaceSupport 1.11
t/0_Config.t .. ok

Look like perl miscounts one backslash. I would expect that the regexp
is accepted by perl because the brace is escaped.

I think this pretty much ends this deprecation. Too many people use {}
as delimiters.

What’s happening above is that delimiter escapes are removed before that
pattern reaches the regular expression engine.

The same thing happens with m.\.., which is equivalent to /./, not /\./.

If one is using {} delimiters, then there is no way to match a literal {
or } without doing something like [{] or [}].

On the other hand, m{ a\{1,2\} }x doesn’t do what most people think it does.

--

Father Chrysostomos
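
A minimal sketch of two ways to write the XML::Simple substitution so that the braces are literal no matter how a backslashed delimiter is treated. The %vars table and both function names are hypothetical stand-ins for $self->get_var, used only to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical lookup table standing in for $self->get_var (illustration only).
my %vars = (home => '/usr/local', 'app.name' => 'demo');

# Option 1: non-brace delimiters. "{" is not the delimiter here, so the
# backslash in "\{" is left alone and the regex engine sees an escaped brace.
sub expand_slash {
    my ($val) = @_;
    $val =~ s/\$\{([\w.]+)\}/ defined $vars{$1} ? $vars{$1} : '' /ge;
    return $val;
}

# Option 2: keep the brace delimiters and write each literal brace as a
# one-character bracketed class, which needs no backslash at all.
sub expand_class {
    my ($val) = @_;
    $val =~ s{\$[{]([\w.]+)[}]}{ defined $vars{$1} ? $vars{$1} : '' }ge;
    return $val;
}

print expand_slash('${home}/bin'), "\n";
print expand_class('${app.name}.conf'), "\n";
```

The bracketed-class form still nests correctly for the tokenizer, since "[{]" and "[}]" contribute one opening and one closing brace to its count.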

p5pRT commented May 28, 2012

From @demerphq

On 28 May 2012 17​:45, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Mon May 28 00​:51​:00 2012, andreas.koenig.7os6VVqR@​franz.ak.mind.de wrote​:

Found in GRANTM/XML-Simple-2.18.tar.gz in lib/XML/Simple.pm​:

    995       $val =~ s{\$\{([\w.]+)\}}{ $self->get_var($1) }ge;
   1031         $val =~ s{\$\{(\w+)\}}{ $self->get_var($1) }ge;

% make test
[...]
t/0_Config.t .. Unescaped left brace in regex is deprecated, passed
   through in regex; marked by <-- HERE in m/\${ <-- HERE ([\w.]+)}/
   at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 995.
Unescaped left brace in regex is deprecated, passed through in regex;
   marked by <-- HERE in m/\${ <-- HERE (\w+)}/ at
   /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm
   line 1031.
# Package                        Version
#  perl                           5.17.0
#  XML​::Simple                    2.18
#  Storable                       2.35
#  XML​::Parser                    2.41
#  XML​::SAX                       0.99
#  XML​::NamespaceSupport          1.11
t/0_Config.t .. ok

Look like perl miscounts one backslash. I would expect that the regexp
is accepted by perl because the brace is escaped.

I think this pretty much ends this deprecation.  Too many people use {}
as delimiters.

Well, we could fix how this is handled. But one wonders if it's worth
it, outside of a larger effort anyway.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented May 28, 2012

From zefram@fysh.org

Father Chrysostomos via RT wrote​:

I think this pretty much ends this deprecation. Too many people use {}
as delimiters.

Not at all. It's an especially confusing case, especially deserving of
the clarification wrought by the deprecation.

The same thing happens with m.\.., which is equivalent to /./, not /\./.

Maybe there should be a warning for every use of a backslashed
metacharacter in this manner. It's too easy to be mistaken about
m.\.. and its ilk.

-zefram

p5pRT commented May 28, 2012

From @hvds

andreas.koenig.7os6VVqR@​franz.ak.mind.de (Andreas J. Koenig) wrote​:
:Found in GRANTM/XML-Simple-2.18.tar.gz in lib/XML/Simple.pm​:
:
: 995 $val =~ s{\$\{([\w.]+)\}}{ $self->get_var($1) }ge;
: 1031 $val =~ s{\$\{(\w+)\}}{ $self->get_var($1) }ge;
:
:% make test
:[...]
:t/0_Config.t .. Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE ([\w.]+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 995.
:Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE (\w+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 1031.
:# Package Version
:# perl 5.17.0
:# XML​::Simple 2.18
:# Storable 2.35
:# XML​::Parser 2.41
:# XML​::SAX 0.99
:# XML​::NamespaceSupport 1.11
:t/0_Config.t .. ok
:
:
:Look like perl miscounts one backslash. I would expect that the regexp
:is accepted by perl because the brace is escaped.

IIRC, when you use a regex metacharacter as a delimiter, escaping it
gives you the metacharacter​:

% perl -wle 'print "line\n" =~ m$\$$'
1
%

So I think the warning is correct, though somewhat misleadingly expressed.

I don't remember if there is a way to get past that to match the literal.

Hugo
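
One way to match the literal character that should not depend on how the backslashed delimiter is handled is to spell it as a hex escape, so that the delimiter never appears inside the pattern. This is a small sketch with made-up sample strings, not a statement about what the tokenizer does with "\$" or "\.".

```perl
use strict;
use warnings;

# "\x24" is "$" and "\x2E" is "." written as hex escapes. Neither contains
# the delimiter character, so the tokenizer has nothing to strip and the
# regex engine treats each one as a literal character.
my $has_dollar = ('price: $5' =~ m$\x24\d$) ? 1 : 0;
my $has_dot    = ('a.b' =~ m.a\x2Eb.)       ? 1 : 0;
my $no_dot     = ('axb' =~ m.a\x2Eb.)       ? 1 : 0;

print "$has_dollar $has_dot $no_dot\n";   # prints "1 1 0"
```

Choosing a delimiter that does not occur in the pattern at all is the more readable fix; the hex form is only useful when the delimiter cannot be changed.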

p5pRT commented May 28, 2012

From @demerphq

On 28 May 2012 20​:31, Zefram <zefram@​fysh.org> wrote​:

Father Chrysostomos via RT wrote​:

I think this pretty much ends this deprecation.  Too many people use {}
as delimiters.

Not at all.  It's an especially confusing case, especially deserving of
the clarification wrought by the deprecation.

The alternative is to stop the tokenizer from doing this type of
unescaping on regex patterns. There is no need to do it, and it breaks
stuff. Seems like a good reason to stop doing something.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented May 28, 2012

From @ikegami

On Mon, May 28, 2012 at 11​:45 AM, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

The same thing happens with m.\.., which is equivalent to /./, not /\./.

Wow. That's unexpected! I would call it a bug even.

p5pRT commented May 29, 2012

From vadim.konovalov@alcatel-lucent.com

On Mon, May 28, 2012 at 11​:45 AM, Father Chrysostomos via RT <perlbug-followup@​perl.org<mailto​:perlbug-followup@​perl.org>> wrote​:
The same thing happens with m.\.., which is equivalent to /./, not /\./.

Wow. That's unexpected! I would call it a bug even.

not a bug.

p5pRT commented May 29, 2012

From vadim.konovalov@alcatel-lucent.com

From​: Zefram [mailto​:zefram@​fysh.org]
Father Chrysostomos via RT wrote​:

I think this pretty much ends this deprecation. Too many
people use {}
as delimiters.

Not at all. It's an especially confusing case, especially
deserving of
the clarification wrought by the deprecation.

wow. that's tough.

I use s{}{}ge often, and then escape any { and } by \, which
happens to be just fine with me.

AFAIR I saw this somewhere in perl documentation and got used
to it.

So there are many constructs of this type:
  s{} {
    if (something) \{
    \} else \{ \} }ge;

Please do not deprecate this. Having "{}" delimiters is fun and
symmetrical.

The same thing happens with m.\.., which is equivalent to
/./, not /\./.

This is fine, and meets my expectations.

Maybe there should be a warning for every use of a backslashed
metacharacter in this manner. It's too easy to be mistaken about
m.\.. and its ilk.

please don't

p5pRT commented May 29, 2012

From @ikegami

On Tue, May 29, 2012 at 12​:49 AM, Konovalov, Vadim (Vadim)** CTR ** <
vadim.konovalov@​alcatel-lucent.com> wrote​:

On Mon, May 28, 2012 at 11​:45 AM, Father Chrysostomos via RT <
perlbug-followup@​perl.org> wrote​:

The same thing happens with m.\.., which is equivalent to /./, not /\./.

Wow. That's unexpected! I would call it a bug even.

not a bug.

The escape doesn't escape! How is that not a bug?

p5pRT commented May 29, 2012

From @demerphq

On 29 May 2012 07​:17, Eric Brine <ikegami@​adaelis.com> wrote​:

On Tue, May 29, 2012 at 12​:49 AM, Konovalov, Vadim (Vadim)** CTR **
<vadim.konovalov@​alcatel-lucent.com> wrote​:

On Mon, May 28, 2012 at 11​:45 AM, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

The same thing happens with m.\.., which is equivalent to /./, not /\./.

Wow. That's unexpected! I would call it a bug even.

 not a bug.

The escape doesn't escape! How is that not a bug?

Because it is documented to happen.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented May 29, 2012

From vadim.konovalov@alcatel-lucent.com

From​: demerphq
On 29 May 2012 07​:17, Eric Brine wrote​:

On Mon, May 28, 2012 at 11​:45 AM, Father Chrysostomos via RT wrote​:

The same thing happens with m.\.., which is equivalent to
/./, not /\./.

Wow. That's unexpected! I would call it a bug even.

The escape doesn't escape! How is that not a bug?

Because it is documented to happen.

And also, it escapes the delimiters, which is what I do expect.

regards,
Vadim.

p5pRT commented May 29, 2012

From @ikegami

On Tue, May 29, 2012 at 1​:26 AM, demerphq <demerphq@​gmail.com> wrote​:

On 29 May 2012 07​:17, Eric Brine <ikegami@​adaelis.com> wrote​:

The escape doesn't escape! How is that not a bug?

Because it is documented to happen.

Where? If so, it directly contradicts perlre.

"Quote the next metacharacter."

"So anything that looks like \\, \(, \), \<, \>, \{, or \} is always
interpreted as a literal character, not a metacharacter."

"Any single character matches itself, unless it is a metacharacter with a
special meaning described here or above. You can cause characters that
normally function as metacharacters to be interpreted literally by
prefixing them with a "\" (e.g., "\." matches a ".", not any character;
"\\" matches a "\"). This escape mechanism is also required for the
character used as the pattern delimiter."

p5pRT commented May 29, 2012

From @demerphq

On 29 May 2012 07​:45, Konovalov, Vadim (Vadim)** CTR **
<vadim.konovalov@​alcatel-lucent.com> wrote​:

From​: demerphq
On 29 May 2012 07​:17, Eric Brine wrote​:

On Mon, May 28, 2012 at 11​:45 AM, Father Chrysostomos via RT wrote​:

The same thing happens with m.\.., which is equivalent to
/./, not /\./.

Wow. That's unexpected! I would call it a bug even.

The escape doesn't escape! How is that not a bug?

Because it is documented to happen.

And also, it escapes the delimeters, which is what I do expect.

Sorry? No. The tokenizer *unescapes* the delimiters. There is NO way
to pass esc-delimiter "through" the tokenizer to something deeper.
Which makes sense for everything but regexes.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented May 29, 2012

From @demerphq

On 29 May 2012 08​:23, Eric Brine <ikegami@​adaelis.com> wrote​:

On Tue, May 29, 2012 at 1​:26 AM, demerphq <demerphq@​gmail.com> wrote​:

On 29 May 2012 07​:17, Eric Brine <ikegami@​adaelis.com> wrote​:

The escape doesn't escape! How is that not a bug?

Because it is documented to happen.

Where? If so, it directly contradicts perlre.

"Quote the next metacharacter."

"So anything that looks like \\, \(, \), \<, \>, \{, or \} is always
interpreted as a literal character, not a metacharacter."

"Any single character matches itself, unless it is a metacharacter with a
special meaning described here or above. You can cause characters that
normally function as metacharacters to be interpreted literally by prefixing
them with a "\" (e.g., "\." matches a ".", not any character; "\\" matches a
"\"). This escape mechanism is also required for the character used as the
pattern delimiter."

I posted the relevant docs in the mail titled "Oh dear, maybe we have
to rethink 'Unescaped left brace in regex is deprecated' warnings..."
(did no one see that?) But here it is again. From perlop, in the
section titled "Gory details of parsing quoted constructs", under the
subheading "RE" in "?RE?", "/RE/", "m/RE/", "s/RE/foo/":

  The lack of processing of "\\" creates specific restrictions on the
  post-processed text. If the delimiter is "/", one cannot get the
  combination "\/" into the result of this step. "/" will finish the
  regular expression, "\/" will be stripped to "/" on the previous
  step, and "\\/" will be left as is. Because "/" is equivalent to
  "\/" inside a regular expression, this does not matter unless the
  delimiter happens to be character special to the RE engine, such as
  in "s*foo*bar*", "m[foo]", or "?foo?"; or an alphanumeric char, as in:

    m m ^ a \s* b mmx;

  In the RE above, which is intentionally obfuscated for illustration,
  the delimiter is "m", the modifier is "mx", and after
  delimiter-removal the RE is the same as for "m/ ^ a \s* b /mx".
  There's more than one reason you're encouraged to restrict your
  delimiters to non-alphanumeric, non-whitespace choices.

Although it is documented, I think the behavior is undesirable and
should be changed.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented May 29, 2012

From vadim.konovalov@alcatel-lucent.com

From​: demerphq
On 29 May 2012 07​:45, Konovalov, Vadim wrote​:

And also, it escapes the delimeters, which is what I do expect.

Sorry? No. The tokenizer *unescapes* the delimiters. There is NO way
to pass esc-delimiter "through" the tokenizer to something deeper.
Which makes sense for everything but regexes.

I am not talking about tokenizer,

I just see that '\' escapes a dot in m.\...
and this is what I had in mind when talking about escaping;
I had no intention of commenting on what happens internally.

On the other side, escaping by '\' in replacement part of the
  s{}{}e
construct works just fine for me - so I have no problem with
passing esc-delimiter to something deeper

Regards,
Vadim.

p5pRT commented May 29, 2012

From @demerphq

On 28 May 2012 20​:31, Zefram <zefram@​fysh.org> wrote​:

Father Chrysostomos via RT wrote​:

I think this pretty much ends this deprecation.  Too many people use {}
as delimiters.

Not at all.  It's an especially confusing case, especially deserving of
the clarification wrought by the deprecation.

The same thing happens with m.\.., which is equivalent to /./, not /\./.

Maybe there should be a warning for every use of a backslashed
metacharacter in this manner.  It's too easy to be mistaken about
m.\.. and its ilk.

I don't think that is really a good idea. It would have to be handled
by the tokenizer, which would mean the tokenizer needs to know regex
syntax, which we really don't want; consider regex engine plugins.

I think we should "just" disentangle the regex parsing from normal
string parsing. Or, hmm. Or we could change the regex engine
interface to pass in the quote chars used in the pattern... Which
would also cause issues with plugins, but might be a good idea anyway.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented May 29, 2012

From @demerphq

On 29 May 2012 08​:38, Konovalov, Vadim (Vadim)** CTR **
<vadim.konovalov@​alcatel-lucent.com> wrote​:

From​: demerphq
On 29 May 2012 07​:45, Konovalov, Vadim wrote​:

And also, it escapes the delimeters, which is what I do expect.

Sorry? No. The tokenizer *unescapes* the delimiters. There is NO way
to pass esc-delimiter "through" the tokenizer to something deeper.
Which makes sense for everything but regexes.

I am not talking about tokenizer,

I just see that '\' escapes a dot in m.\...
and this is what I had in mind when talking about escaping,
I had no intentions to mention on what happens internally.

On the other side, escaping by '\' in replacement part of the
 s{}{}e
construct works just fine for me - so I have no problem with
passing esc-delimiter to something deeper

I don't think you are understanding this issue properly. What you just
said either isn't what we are talking about, or it doesn't do what you
think it does.

perl -le'$_="x"; s/x/\//; print'

does NOT pass through '\/' to the regex engine. You can pass "\\/"
through to the regex engine, but not "\/" if the delims are /.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
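
A small sketch of the two cases in the message above, using "/" delimiters. The variable names and sample strings are invented for the illustration.

```perl
use strict;
use warnings;

# Replacement side: the backslash only keeps "/" from ending the
# replacement, so the replacement text is a bare "/".
(my $swapped = 'x') =~ s/x/\//;
print "$swapped\n";                      # prints "/"

# Pattern side: "\\" is an escaped backslash, so this pattern matches one
# literal backslash and the "/" after it closes the pattern.
my $has_backslash = ('a\\b' =~ /\\/) ? 1 : 0;
print "$has_backslash\n";                # prints "1"
```

Because "/" means the same thing to the regex engine with or without a backslash, the stripping is invisible here; it only becomes visible when the delimiter is a character the engine treats specially.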

p5pRT commented May 29, 2012

From @Tux

On Mon, 28 May 2012 19​:19​:25 +0100, hv@​crypt.org wrote​:

andreas.koenig.7os6VVqR@​franz.ak.mind.de (Andreas J. Koenig) wrote​:
:Found in GRANTM/XML-Simple-2.18.tar.gz in lib/XML/Simple.pm​:
:
: 995 $val =~ s{\$\{([\w.]+)\}}{ $self->get_var($1) }ge;
: 1031 $val =~ s{\$\{(\w+)\}}{ $self->get_var($1) }ge;
:
:% make test
:[...]
:t/0_Config.t .. Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE ([\w.]+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 995.
:Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE (\w+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 1031.
:# Package Version
:# perl 5.17.0
:# XML​::Simple 2.18
:# Storable 2.35
:# XML​::Parser 2.41
:# XML​::SAX 0.99
:# XML​::NamespaceSupport 1.11
:t/0_Config.t .. ok
:
:
:Look like perl miscounts one backslash. I would expect that the regexp
:is accepted by perl because the brace is escaped.

IIRC, when you use a regex metacharacter as a delimiter, escaping it
gives you the metacharacter​:

% perl -wle 'print "line\n" =~ m$\$$'
1
%

So I think the warning is correct, though somewhat misleadingly expressed.

I don't remember if there is a way to get past that to match the literal.

Hugo

Got this report this morning​:

http://www.cpantesters.org/cpan/report/b40c2a9c-a87e-11e1-85e6-b1e975b706df

t/11_DDumper.t .... ok

# Failed test 'no warnings'
# at /home/src/perl/repoperls/installed-perls/perl/v5.16.0-226-g760209f/165a/lib/site_perl/5.17.0/Test/NoWarnings.pm line 45.
# There were 1 warning(s)
# Previous test 0 ''
# Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/^PVIV\("a\\(n|12)\\342\\202\\254"\\0\) \[UTF8 "a\\?n\\x{ <-- HERE 20ac}"\]/ at t/20_DPeek.t line 72.
# at t/20_DPeek.t line 72.
#
# Looks like you failed 1 test of 50.
t/20_DPeek.t ......
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/50 subtests

code that causes it

  SKIP​: {
  $] <= 5.008001 and skip "UTF8 tests useless in this ancient perl version", 1;
  $VAR = "a\x0a\x{20ac}";
  like (DPeek ($VAR), qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\) \[UTF8 "a\\?n\\x{20ac}"\]',
  ' $VAR "a\x0a\x{20ac}"');
  }

I just added a \ before the {; with that change it passes on 5.8.0 up
to blead (64 versions of perl).

--
H.Merijn Brand http​://tux.nl Perl Monger http​://amsterdam.pm.org/
using perl5.00307 .. 5.14 porting perl5 on HP-UX, AIX, and openSUSE
http​://mirrors.develooper.com/hpux/ http​://www.test-smoke.org/
http​://qa.perl.org http​://www.goldmark.org/jeff/stupid-disclaimers/

p5pRT commented May 29, 2012

From @ikegami

On Tue, May 29, 2012 at 2​:34 AM, demerphq <demerphq@​gmail.com> wrote​:

I posted the relevent docs in the mail titled "Oh dear, maybe we have
to rethink 'Unescaped left brace in regex is deprecated' warnings..."

(did noone see that?) But here it is again..

I understand why it behaves the way it does.

From perlop in the section titled "Gory details of parsing quoted
constructs"

with the subheading​:"RE" in "?RE?", "/RE/", "m/RE/", "s/RE/foo/"​:

Ah, so there is some disagreement, then. perlre says it is required for
meta characters *and also for delimiters*, with no caveat. The underlying
part indicates it's talking about literals, not just patterns.

- Eric

p5pRT commented May 29, 2012

From @cpansprout

On Mon May 28 23​:42​:13 2012, demerphq wrote​:

On 29 May 2012 08​:38, Konovalov, Vadim (Vadim)** CTR **
<vadim.konovalov@​alcatel-lucent.com> wrote​:

From​: demerphq
On 29 May 2012 07​:45, Konovalov, Vadim wrote​:

And also, it escapes the delimeters, which is what I do expect.

Sorry? No. The tokenizer *unescapes* the delimiters. There is NO way
to pass esc-delimiter "through" the tokenizer to something deeper.
Which makes sense for everything but regexes.

I am not talking about tokenizer,

I just see that '\' escapes a dot in m.\...
and this is what I had in mind when talking about escaping,
I had no intentions to mention on what happens internally.

On the other side, escaping by '\' in replacement part of the
 s{}{}e
construct works just fine for me - so I have no problem with
passing esc-delimiter to something deeper

I dont think you are understanding this issue properly. What you just
said either isnt what we are talking about, or it doesnt do what you
think it does.

perl -le'$_="x"; s/x/\//; print'

does NOT pass through '\/' to the regex engine. You can pass "\\/"
through to the regex engine, but not "\/" if the delims are /.

But \ does escape the delimiter in that it stops it from being
interpreted as a delimiter.

--

Father Chrysostomos

p5pRT commented May 29, 2012

From @hvds

demerphq <demerphq@​gmail.com> wrote​:
:On 28 May 2012 20​:31, Zefram <zefram@​fysh.org> wrote​:
:> Father Chrysostomos via RT wrote​:
:>>I think this pretty much ends this deprecation.  Too many people use {}
:>>as delimiters.
:>
:> Not at all.  It's an especially confusing case, especially deserving of
:> the clarification wrought by the deprecation.
:
:The alternative is to stop the tokenizer from doing this type of
:unescaping on regex patterns. There is no need to do it, and it breaks
:stuff. Seems like a good reason to stop doing something.

It occurs to me that we could draw a useful distinction between balanced
delimiters and others.

I think <> are the only balanced delimiters used in an unbalanced way
in regular expression syntax. If we were to disable the escape-stripping
for those cases, the cost (I think) would be you could no longer use
lookbehinds and cuts in <>-delimited regexps, which sounds like a small
price; but now you would be able to use literal (), [] and {} even when
they matched your delimiters, by escaping them.

Of course that would involve a) disentangling at least some of the work
from the tokenizer, and b) an incompatible change requiring either
a deprecation cycle or protection behind a feature. Oh, and I guess also
c) an undertaking not to introduce new unbalanced uses of these in regexp
syntax, like (?[...), though I don't particularly imagine we would have
considered such.

I'm not really sure how much work this would be, however.

Hugo

p5pRT commented May 29, 2012

From @demerphq

On 29 May 2012 09​:27, <hv@​crypt.org> wrote​:

demerphq <demerphq@​gmail.com> wrote​:
:On 28 May 2012 20​:31, Zefram <zefram@​fysh.org> wrote​:
:> Father Chrysostomos via RT wrote​:
:>>I think this pretty much ends this deprecation.  Too many people use {}
:>>as delimiters.
:>
:> Not at all.  It's an especially confusing case, especially deserving of
:> the clarification wrought by the deprecation.
:
:The alternative is to stop the tokenizer from doing this type of
:unescaping on regex patterns. There is no need to do it, and it breaks
:stuff. Seems like a good reason to stop doing something.

It occurs to me that we could draw a useful distinction between balanced
delimiters and others.

I think <> are the only balanced delimiters used in an unbalanced way
in regular expression syntax. If we were to disable the escape-stripping
for those cases, the cost (I think) would be you could no longer use
lookbehinds and cuts in <>-delimited regexps, which sounds like a small
price; but now you would be able to use literal (), [] and {} even when
they matched your delimiters, by escaping them.

Of course that would involve a) disentangling at least some of the work
from the tokenizer, and b) an incompatible change requiring either
a deprecation cycle or protection behind a feature. Oh, and I guess also
c) an undertaking not to introduce new unbalanced uses of these in regexp
syntax, like (?[...), though I don't particularly imagine we would have
considered such.

I'm not really sure how much work this would be, however.

I plan to look into it. Your analysis is much appreciated.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Jun 1, 2012

From @khwilliamson

The example below uses single quotes as qr delimiters

On 05/29/2012 12​:42 AM, H.Merijn Brand wrote​:

code that causes it

SKIP​: {
$]<= 5.008001 and skip "UTF8 tests useless in this ancient perl version", 1;
$VAR = "a\x0a\x{20ac}";
like (DPeek ($VAR), qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\) \[UTF8 "a\\?n\\x{20ac}"\]',
' $VAR "a\x0a\x{20ac}"');
}

Bug #21491 says that single quotes should not interpolate. But this
code assumes that it does. If we fixed #21491, I believe it would break
this code, would it not?

I wonder how much code is out there that depends on #21491 being broken.
  We might have to mark it as won't fix, then.

@p5pRT

p5pRT commented Jun 2, 2012

From @cpansprout

On Fri Jun 01 14​:40​:56 2012, public@​khwilliamson.com wrote​:

The example below uses single quotes as qr delimiters

On 05/29/2012 12​:42 AM, H.Merijn Brand wrote​:

code that causes it

SKIP​: {
$]<= 5.008001 and skip "UTF8 tests useless in this ancient
perl version", 1;
$VAR = "a\x0a\x{20ac}";
like (DPeek ($VAR), qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\)
\[UTF8 "a\\?n\\x{20ac}"\]',
' $VAR
"a\x0a\x{20ac}"');
}

Bug #21491 says that single quotes should not interpolate. But this
code assumes that it does. If we fixed #21491, I believe it would
break
this code, would it not?

Yes, and it would diverge from the long-documented behaviour​:

  Customary Generic Meaning Interpolates
  '' q{} Literal no
  "" qq{} Literal yes
  `` qx{} Command yes*
  qw{} Word list no
  // m{} Pattern match yes*
  qr{} Pattern yes*
  s{}{} Substitution yes*
  tr{}{} Transliteration no (but see below)
  y{}{} Transliteration no (but see below)
  <<EOF here-doc yes*

  * unless the delimiter is ''.
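
The footnote concerns variable interpolation. A minimal sketch of what it means for `qr`, using a made-up variable `$name` that does not come from the thread:

```perl
use strict;
use warnings;

my $name = 'world';

# With ordinary delimiters, $name is interpolated before the regex is compiled.
my $interpolated = qr/hello $name/;

# With '' as the delimiter, the text is handed over as written, so the pattern
# contains the literal characters "$name" ($ is then the end-of-string anchor).
my $raw = qr'hello $name';

# Stringifying a qr// object shows the pattern text it was built from.
print "interpolated: $interpolated\n";
print "raw:          $raw\n";
```

The second pattern cannot match "hello world", because `$` anchors at the end of the string and `name` would still have to follow it.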

I wonder how much code is out there that depends on #21491 being
broken.
We might have to mark it as won't fix, then.

Yes, and not-a-bug.

--

Father Chrysostomos

@p5pRT

p5pRT commented Jun 2, 2012

From @cpansprout

On Fri Jun 01 17​:57​:55 2012, sprout wrote​:

On Fri Jun 01 14​:40​:56 2012, public@​khwilliamson.com wrote​:

The example below uses single quotes as qr delimiters

On 05/29/2012 12​:42 AM, H.Merijn Brand wrote​:

code that causes it

SKIP​: {
$]<= 5.008001 and skip "UTF8 tests useless in this ancient
perl version", 1;
$VAR = "a\x0a\x{20ac}";
like (DPeek ($VAR), qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\)
\[UTF8 "a\\?n\\x{20ac}"\]',
' $VAR
"a\x0a\x{20ac}"');
}

Bug #21491 says that single quotes should not interpolate. But this
code assumes that it does. If we fixed #21491, I believe it would
break
this code, would it not?

Yes, and it would diverge from the long-documented behaviour​:

    Customary  Generic        Meaning      Interpolates
      ''       q{}          Literal             no
      ""      qq{}          Literal             yes
      ``      qx{}          Command             yes*
              qw{}         Word list            no
      //       m{}       Pattern match          yes*
              qr{}          Pattern             yes*
               s{}{}      Substitution          yes*
              tr{}{}    Transliteration         no (but see below)
               y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

      * unless the delimiter is ''.

I wonder how much code is out there that depends on #21491 being
broken.
We might have to mark it as won't fix, then.

Yes, and not-a-bug.

Sorry, I was a little confused.

The reason for it not being a bug is that, if m '\n' stops matching
"\n", then $foo =~ $user_pat will stop working if the user enters '\n'.
That means ack '\n' won’t work any more.

--

Father Chrysostomos

@p5pRT

p5pRT commented Jun 2, 2012

From @demerphq

On 2 June 2012 03​:01, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Fri Jun 01 17​:57​:55 2012, sprout wrote​:

On Fri Jun 01 14​:40​:56 2012, public@​khwilliamson.com wrote​:

The example below uses single quotes as qr delimiters

On 05/29/2012 12​:42 AM, H.Merijn Brand wrote​:

code that causes it

   SKIP​: {
       $]<= 5.008001 and skip "UTF8 tests useless in this ancient
perl version", 1;
       $VAR = "a\x0a\x{20ac}";
       like (DPeek ($VAR), qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\)
\[UTF8 "a\\?n\\x{20ac}"\]',
                                                   ' $VAR
"a\x0a\x{20ac}"');
       }

Bug #21491 says that single quotes should not interpolate.  But this
code assumes that it does.  If we fixed #21491, I believe it would
break
this code, would it not?

Yes, and it would diverge from the long-documented behaviour​:

    Customary  Generic        Meaning      Interpolates
      ''       q{}          Literal             no
      ""      qq{}          Literal             yes
      ``      qx{}          Command             yes*
              qw{}         Word list            no
      //       m{}       Pattern match          yes*
              qr{}          Pattern             yes*
               s{}{}      Substitution          yes*
              tr{}{}    Transliteration         no (but see below)
               y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

      * unless the delimiter is ''.

I wonder how much code is out there that depends on #21491 being
broken.
  We might have to mark it as won't fix, then.

Yes, and not-a-bug.

Sorry, I was a little confused.

The reason for it not being a bug is that, if m '\n' stops matching
"\n", then $foo =~ $user_pat will stop working if the user enters '\n'.
 That means ack '\n' won’t work any more.

That doesn't make sense. Single quotes for ack are a shell quoting
issue. ack doesn't see the quotes.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Jun 2, 2012

From @cpansprout

On Sat Jun 02 02​:07​:11 2012, demerphq wrote​:

On 2 June 2012 03​:01, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Fri Jun 01 17​:57​:55 2012, sprout wrote​:

On Fri Jun 01 14​:40​:56 2012, public@​khwilliamson.com wrote​:

The example below uses single quotes as qr delimiters

On 05/29/2012 12​:42 AM, H.Merijn Brand wrote​:

code that causes it

   SKIP​: {
       $]<= 5.008001 and skip "UTF8 tests useless in this ancient
perl version", 1;
       $VAR = "a\x0a\x{20ac}";
       like (DPeek ($VAR),
qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\)
\[UTF8 "a\\?n\\x{20ac}"\]',
                                                   ' $VAR
"a\x0a\x{20ac}"');
       }

Bug #21491 says that single quotes should not interpolate.  But this
code assumes that it does.  If we fixed #21491, I believe it would
break
this code, would it not?

Yes, and it would diverge from the long-documented behaviour​:

    Customary  Generic        Meaning      Interpolates
      ''       q{}          Literal             no
      ""      qq{}          Literal             yes
      ``      qx{}          Command             yes*
              qw{}         Word list            no
      //       m{}       Pattern match          yes*
              qr{}          Pattern             yes*
               s{}{}      Substitution          yes*
              tr{}{}    Transliteration         no (but see below)
               y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

      * unless the delimiter is ''.

I wonder how much code is out there that depends on #21491 being
broken.
  We might have to mark it as won't fix, then.

Yes, and not-a-bug.

Sorry, I was a little confused.

The reason for it not being a bug is that, if m '\n' stops matching
"\n", then $foo =~ $user_pat will stop working if the user enters '\n'.
 That means ack '\n' won’t work any more.

That doesnt make sense. Single quotes for ack are a shell quoting
issue. ack doesnt see the quotes.

Either way, the regular expression engine itself has to interpret \n.
It can’t rely on m// syntax to resolve it.

--

Father Chrysostomos

@p5pRT

p5pRT commented Jun 2, 2012

From @demerphq

On 2 June 2012 14​:29, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Sat Jun 02 02​:07​:11 2012, demerphq wrote​:

On 2 June 2012 03​:01, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Fri Jun 01 17​:57​:55 2012, sprout wrote​:

On Fri Jun 01 14​:40​:56 2012, public@​khwilliamson.com wrote​:

The example below uses single quotes as qr delimiters

On 05/29/2012 12​:42 AM, H.Merijn Brand wrote​:

code that causes it

   SKIP​: {
       $]<= 5.008001 and skip "UTF8 tests useless in this ancient
perl version", 1;
       $VAR = "a\x0a\x{20ac}";
       like (DPeek ($VAR),
qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\)
\[UTF8 "a\\?n\\x{20ac}"\]',
                                                   ' $VAR
"a\x0a\x{20ac}"');
       }

Bug #21491 says that single quotes should not interpolate.  But this
code assumes that it does.  If we fixed #21491, I believe it would
break
this code, would it not?

Yes, and it would diverge from the long-documented behaviour​:

    Customary  Generic        Meaning      Interpolates
      ''       q{}          Literal             no
      ""      qq{}          Literal             yes
      ``      qx{}          Command             yes*
              qw{}         Word list            no
      //       m{}       Pattern match          yes*
              qr{}          Pattern             yes*
               s{}{}      Substitution          yes*
              tr{}{}    Transliteration         no (but see below)
               y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

      * unless the delimiter is ''.

I wonder how much code is out there that depends on #21491 being
broken.
  We might have to mark it as won't fix, then.

Yes, and not-a-bug.

Sorry, I was a little confused.

The reason for it not being a bug is that, if m '\n' stops matching
"\n", then $foo =~ $user_pat will stop working if the user enters '\n'.
 That means ack '\n' won’t work any more.

That doesnt make sense. Single quotes for ack are a shell quoting
issue. ack doesnt see the quotes.

Either way, the regular expression engine itself has to interpret \n.
It can’t rely on m// syntax to resolve it.

It can't rely on the tokenizer to resolve it, no. That is a general
rule, nearly.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Jun 4, 2012

From @rjbs

* Karl Williamson <public@​khwilliamson.com> [2012-06-01T17​:40​:18]

Bug #21491 says that single quotes should not interpolate. But this
code assumes that it does. If we fixed #21491, I believe it would
break this code, would it not?

I wonder how much code is out there that depends on #21491 being
broken. We might have to mark it as won't fix, then.

This very naive CPAN search indicates "not much."

  http​://grep.cpan.me/?q=qr%27%5B%5E%27%5Cn%5D%2B%5C%24%5B%5E%27%5D&page=2

I think that bug should still be fixed.

--
rjbs

@p5pRT

p5pRT commented Jun 4, 2012

From @rjbs

* Father Chrysostomos via RT <perlbug-followup@​perl.org> [2012-06-02T08​:29​:07]

The reason for it not being a bug is that, if m '\n' stops matching
"\n", then $foo =~ $user_pat will stop working if the user enters '\n'.
 That means ack '\n' won’t work any more.

That doesnt make sense. Single quotes for ack are a shell quoting
issue. ack doesnt see the quotes.

Either way, the regular expression engine itself has to interpret \n.
It can’t rely on m// syntax to resolve it.

My understanding is that the regular expression engine has its own machinery
for turning \n into a \n to match, apart from the qq-ish behavior.

I could be wrong. Somebody tell me.

--
rjbs

@p5pRT

p5pRT commented Jun 5, 2012

From @cpansprout

On Mon Jun 04 16​:52​:26 2012, perl.p5p@​rjbs.manxome.org wrote​:

* Father Chrysostomos via RT <perlbug-followup@​perl.org> [2012-06-
02T08​:29​:07]

The reason for it not being a bug is that, if m '\n' stops
matching
"\n", then $foo =~ $user_pat will stop working if the user
enters '\n'.
 That means ack '\n' won’t work any more.

That doesnt make sense. Single quotes for ack are a shell quoting
issue. ack doesnt see the quotes.

Either way, the regular expression engine itself has to interpret
\n.
It can’t rely on m// syntax to resolve it.

My understanding is that the regular expression engine has its own
machinery
for turning \n into a \n to match, apart from the qq-ish behavior.

I could be wrong. Somebody tell me.

That’s right, which is why "\n" =~ '\n' matches, and why "\n" =~ m'\n'
should continue to match.
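
A short sketch of the two cases named here. `$user_pat` stands in for a pattern typed by a user (for example taken from `@ARGV`); the variable names and sample text are illustrative only:

```perl
use strict;
use warnings;

# Two characters: a backslash followed by "n", exactly as a user would type it.
my $user_pat = '\n';

my $text = "line one\nline two";

# A plain string used as a pattern: the regex engine itself interprets \n.
print "string pattern matches\n" if $text =~ $user_pat;

# The same pattern text written with '' delimiters.
print "m'' pattern matches\n"    if $text =~ m'\n';
```

In both cases no double-quote-style processing has turned `\n` into a newline before the regex engine sees it.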

--

Father Chrysostomos

@p5pRT

p5pRT commented Jun 5, 2012

From @demerphq

On 5 June 2012 01​:51, Ricardo Signes <perl.p5p@​rjbs.manxome.org> wrote​:

* Father Chrysostomos via RT <perlbug-followup@​perl.org> [2012-06-02T08​:29​:07]

The reason for it not being a bug is that, if m '\n' stops matching
"\n", then $foo =~ $user_pat will stop working if the user enters '\n'.
 That means ack '\n' won’t work any more.

That doesnt make sense. Single quotes for ack are a shell quoting
issue. ack doesnt see the quotes.

Either way, the regular expression engine itself has to interpret \n.
It can’t rely on m// syntax to resolve it.

My understanding is that the regular expression engine has its own machinery
for turning \n into a \n to match, apart from the qq-ish behavior.

I could be wrong.  Somebody tell me.

I already said this was the case. The regex engine cannot depend on
the tokenizer handling *any* escapes, although there are some that are
handled by both the tokenizer AND the regex engine.

Tokenizer does nothing​:

$ perl -Mre=debug -e'/\n/'
Compiling REx "\n"
Final program​:
  1​: EXACT <\n> (3)
  3​: END (0)
anchored "%n" at 0 (checking anchored isall) minlen 1
Freeing REx​: "\n"

Tokenizer does something​:

$ perl -Mre=debug -e'my $x="\n"; /$x/'
Compiling REx "%n"
Final program​:
  1​: EXACT <\n> (3)
  3​: END (0)
anchored "%n" at 0 (checking anchored isall) minlen 1
Freeing REx​: "%n"

Notice the difference?
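
The same difference can be seen without `-Mre=debug` by stringifying a `qr//` object, which shows the pattern text the engine was given. This is only a sketch with illustrative variable names:

```perl
use strict;
use warnings;

my $literal = qr/\n/;   # the engine receives a backslash followed by "n"
my $x       = "\n";
my $interp  = qr/$x/;   # the engine receives an actual newline character

printf "literal contains backslash-n: %s\n", ("$literal" =~ /\\n/ ? 'yes' : 'no');
printf "interp  contains backslash-n: %s\n", ("$interp"  =~ /\\n/ ? 'yes' : 'no');

# Both patterns still match a newline.
print "both match\n" if "\n" =~ $literal && "\n" =~ $interp;
```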

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Jun 5, 2012

From @demerphq

On 1 June 2012 23​:40, Karl Williamson <public@​khwilliamson.com> wrote​:

The example below uses single quotes as qr delimiters

On 05/29/2012 12​:42 AM, H.Merijn Brand wrote​:

code that causes it

  SKIP​: {
      $]<= 5.008001 and skip "UTF8 tests useless in this ancient perl
version", 1;
      $VAR = "a\x0a\x{20ac}";
      like (DPeek ($VAR), qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\) \[UTF8
"a\\?n\\x{20ac}"\]',
                                                  ' $VAR
"a\x0a\x{20ac}"');
      }

Bug #21491 says that single quotes should not interpolate.  But this code
assumes that it does.  If we fixed #21491, I believe it would break this
code, would it not?

Are you sure about that? I don't see any interpolation there. Do you
mean escape handling?

As far as I understand things \\ and \' are *supposed* to be unescaped
inside m''.

So what do you mean by this? I see nothing unexpected here.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Jun 5, 2012

From @rjbs

* demerphq <demerphq@​gmail.com> [2012-06-05T02​:12​:29]

On 1 June 2012 23​:40, Karl Williamson <public@​khwilliamson.com> wrote​:

Bug #21491 says that single quotes should not interpolate.  But this code
assumes that it does.  If we fixed #21491, I believe it would break this
code, would it not?

Are you sure about that? I don't see any interpolation there. Do you
mean escape handling?

As far as I understand things \\ and \' are *supposed* to be unescaped
inside m''.

So what do you mean by this? I see nothing unexpected here.

I was confused by this, too, and foolishly didn't go read 21491. This is about
escape sequences, not variable interpolation. I was rushing through my mail
queue and not verifying everything I read. I'm sorry if this led to spreading
any confusion!

Yes, fixing this looks like it would break the world. In fact, I think the
busted thing is likely the documentation, although it looks like it needs a
careful read before I really state that with confidence.

--
rjbs

@p5pRT

p5pRT commented Jun 5, 2012

From @demerphq

On 5 June 2012 14​:26, Ricardo Signes <perl.p5p@​rjbs.manxome.org> wrote​:

* demerphq <demerphq@​gmail.com> [2012-06-05T02​:12​:29]

On 1 June 2012 23​:40, Karl Williamson <public@​khwilliamson.com> wrote​:

Bug #21491 says that single quotes should not interpolate.  But this code
assumes that it does.  If we fixed #21491, I believe it would break this
code, would it not?

Are you sure about that? I don't see any interpolation there. Do you
mean escape handling?

As far as I understand things \\ and \' are *supposed* to be unescaped
inside m''.

So what do you mean by this? I see nothing unexpected here.

I was confused by this, too, and foolishly didn't go read 21491.  This is about
escape sequences, not variable interpolation.  I was rushing through my mail
queue and not verifying everything I read.  I'm sorry if this lead to spreading
any confusion!

Aren't the case that Karl mentioned and the case in the bug different?

The case in the bug comes down to this​:

my $pat= "\\n"; print "\n"=~/$pat/;

Which matches, because the toker first turns "\\n" into "\n" and then
hands it to the regex engine which turns the "\n" into a literal $n.

This behaviour has changed over time and the docs should probably
explain that \n IS a regex escape sequence just like \w, which
"happens" to match the same thing that "\n" is unescaped into.

Yes, fixing this looks like it would break the world.  In fact, I think the
busted thing is likely the documentation, although it looks like it needs a
careful read before I really state that with confidence.

Well, I can argue the case of​: $x="\\n"; "\n"=~/$x/, but the case of
"\n"=~m'\\n' is a lot easier to say is a bug. Even if neither is
entirely clear.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Jun 11, 2012

From @rurban

This is a bug report for perl from rurban@​cpanel.net,
generated with the help of perlbug 1.39 running under perl 5.17.1.

From a375e6bcfaaf64bae9ab3e153f1721225d6ae631 Mon Sep 17 00​:00​:00 2001
From​: Reini Urban <rurban@​x-ray.at>
Date​: Mon, 11 Jun 2012 09​:18​:21 -0500
Subject​: [PATCH] [perl #113094] Fix a couple of Unescaped left brace in regex


cpan/ExtUtils-MakeMaker/t/MM_OS2.t | 2 +-
lib/DB.t | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

Inline Patch
diff --git a/cpan/ExtUtils-MakeMaker/t/MM_OS2.t b/cpan/ExtUtils-MakeMaker/t/MM_OS2.t
index 4d88e85..2997541 100644
--- a/cpan/ExtUtils-MakeMaker/t/MM_OS2.t
+++ b/cpan/ExtUtils-MakeMaker/t/MM_OS2.t
@@ -42,7 +42,7 @@ delete $mm->{SKIPHASH};
 my $res = $mm->dlsyms();
 like( $res, qr/baseext\.def: Makefile/,
 	'... without flag, should return make targets' );
-like( $res, qr/"DL_FUNCS" => {  }/, 
+like( $res, qr/"DL_FUNCS" => \{  }/,
 	'... should provide empty hash refs where necessary' );
 like( $res, qr/"DL_VARS" => \[]/, '... and empty array refs too' );
 
diff --git a/lib/DB.t b/lib/DB.t
index a1fadf3..cdb6583 100644
--- a/lib/DB.t
+++ b/lib/DB.t
@@ -126,7 +126,7 @@ is( DB::_clientname('bar'), undef,
         my @ret = eval { DB->backtrace() };
         like( $ret[0], qr/file.+\Q$0\E/, 'DB::backtrace() should report current file');
         like( $ret[0], qr/line $line/, '... should report calling line number' );
-        like( $ret[0], qr/eval {...}/, '... should catch eval BLOCK' );
+        like( $ret[0], qr/eval \{...}/, '... should catch eval BLOCK' );
 
         @ret = eval "one(2)";
         is( scalar @ret, 1, '... should report from provided stack frame number' );
-- 
1.7.10
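
For module authors hitting the same warning, a minimal sketch of three ways to spell a literal left brace in a `/`-delimited pattern. The sample string is made up for illustration and is not real `DB->backtrace()` output:

```perl
use strict;
use warnings;

my $frame = 'main::(eval {...}) called at script.pl line 3';

# Escaping only the left brace, as in the patch above.
print "escaped brace matches\n"   if $frame =~ /eval \{...}/;

# A bracketed character class is another way to write a literal "{".
print "character class matches\n" if $frame =~ /eval [{]...}/;

# quotemeta (or \Q...\E) escapes every metacharacter in a fixed string.
my $needle = quotemeta 'eval {...}';
print "quotemeta matches\n"       if $frame =~ /$needle/;
```

The character-class form is best kept to patterns that are not themselves delimited by braces, since an unescaped `{` counts toward delimiter nesting there.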

Flags​:
  category=library
  severity=low


Site configuration information for perl 5.17.1​:

Configured by rurban at Tue Jun 5 09​:12​:24 CDT 2012.

Summary of my perl5 (revision 5 version 17 subversion 1) configuration​:
  Derived from​: 65bc432f8bb96d463b290c78d34350cb2d289cbc
  Platform​:
  osname=linux, osvers=3.2.0-2-amd64, archname=x86_64-linux-thread-multi-debug@​65bc432
  uname='linux reini 3.2.0-2-amd64 #1 smp mon may 21 17​:45​:41 utc 2012 x86_64 gnulinux '
  config_args='-de -Dusedevel -Dinstallman1dir=none -Dinstallman3dir=none -Dinstallsiteman1dir=none -Dinstallsiteman3dir=none -Dmksymlinks -DEBUGGING -Doptimize=-g3 -Duseithreads -Accflags='-msse4.2' -Accflags='-march=corei7' -Dcf_email='rurban@​cpanel.net' -Dperladmin='rurban@​cpanel.net' -Duseshrplib'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -msse4.2 -march=corei7 -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-g3',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -msse4.2 -march=corei7 -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.6.3', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
  libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
  libc=, so=so, useshrplib=true, libperl=libperl.so
  gnulibc_version='2.13'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/local/lib/perl5/5.17.1/x86_64-linux-thread-multi-debug@​65bc432/CORE'
  cccdlflags='-fPIC', lddlflags='-shared -g3 -L/usr/local/lib -fstack-protector'

Locally applied patches​:
  [cpan #72700] List​::Util heap-overflow


@​INC for perl 5.17.1​:
  /usr/local/lib/perl5/site_perl/5.17.1/x86_64-linux-thread-multi-debug@​65bc432
  /usr/local/lib/perl5/site_perl/5.17.1
  /usr/local/lib/perl5/5.17.1/x86_64-linux-thread-multi-debug@​65bc432
  /usr/local/lib/perl5/5.17.1
  /usr/local/lib/perl5/site_perl
  .


Environment for perl 5.17.1​:
  HOME=/home/rurban
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/home/rurban/bin​:/usr/local/bin​:/usr/bin​:/bin​:/usr/games
  PERL_BADLANG (unset)
  SHELL=/bin/bash

@p5pRT

p5pRT commented Jul 21, 2012

From @trwyant

The ExtUtils-MakeMaker and DB warnings reported by Reini Urban are still
present in 5.17.2. The former is also
https://rt.cpan.org/Ticket/Display.html?id=77468

@p5pRT

p5pRT commented Jul 21, 2012

From @trwyant

Interesting thing found under 5.17.2​:

$ perl -E 'm{ \$ \{ }x'
Unescaped left brace in regex is deprecated, passed through in regex;
marked by <-- HERE in m/ \$ { <-- HERE / at -e line 1.
$ perl -E 'm{ x \{ }x'
Unescaped left brace in regex is deprecated, passed through in regex;
marked by <-- HERE in m/ x { <-- HERE / at -e line 1.
$ perl -E 'm{ \{ }x'
$ perl -E 'm< \$ \{ >x'
$

Is this a weird bug in the escape code, or am I missing something?
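
The warning text above shows the pattern arriving without its backslash when `{}` are the delimiters. A sketch of two spellings that avoid depending on that, with a made-up sample string:

```perl
use strict;
use warnings;

my $text = 'prefix ${ suffix';

# \x7B is the code point of "{", so no brace appears in the pattern source
# and nothing depends on how escaped delimiters are treated.
print "hex escape matches\n"      if $text =~ m{ \$ \x7B }x;

# A delimiter that is not a brace leaves \{ untouched.
print "other delimiter matches\n" if $text =~ m< \$ \{ >x;
```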

@p5pRT

p5pRT commented Jul 21, 2012

From rmbarker.cpan@btinternet.com

On Sat, 2012-07-21 at 07​:26 -0700, Tom Wyant via RT wrote​:

The ExtUtils-MakeMaker and DB warnings reported by Reini Urban are still
present in 5.17.2. The former is also
https://rt.cpan.org/Ticket/Display.html?id=77468

The latter is fixed by 7150f91

@p5pRT

p5pRT commented Jul 21, 2012

From @khwilliamson

On 07/21/2012 11:44 AM, Robin Barker wrote:

I suggest this bug is marked as resolved.

The only warnings now come from cpan/ modules.
[perl#113094] is tracking the \{ issue for CPAN modules.

OK, moving it to the 113094

On 07/21/2012 07​:51 AM, Dave Mitchell wrote​:

On Fri, Jul 20, 2012 at 08​:35​:27PM -0700, Reverend Chip wrote​:

On 7/19/2012 10​:15 AM, Karl Williamson wrote​:

3) Consider this as acceptable collateral breakage, document it, and
keep the warning for a cycle or two, after which we prohibit unescaped
literal left brackets.

The final possibility is only feasible if there is very little current
breakage.

My 2c​: I think this is ideal, since stripping \ on {} in regexes seems
like something we should never have done in the first place.

Note that if we changed it so that escaped delimiters are no longer
stripped, *all* the following regexes would change their meaning; the last
three would become compile errors, while the first four would just
silently start matching different things​:

 qr[^\[a-z\]$]

 qr#  a
 \#xxx
 b
 #x;

 qr(^\(x\)$);

 m?^xy\?$?

 qr!a(?\!b)!;

 qr<a(?\<foo\>b)>;

 qr|a(?\|foo)|;

Correct me if I'm wrong, but I believe that the most encompassing change
would cause breakage only if the delimiter is one of the dirty dozen
metacharacters, and no others. Since '/' isn't one of the 12, there
should be no potential problems with it.

Obviously, if we decided to, we could restrict the change to just '{',
as that is the only one we care about now. But that isn't aesthetically
pleasing.

@p5pRT

p5pRT commented Jul 22, 2012

From @iabyn

On Sat, Jul 21, 2012 at 12​:42​:07PM -0700, Karl Williamson via RT wrote​:

Note that if we changed it so that escaped delimiters are no longer
stripped, *all* the following regexes would change their meaning; the last
three would become compile errors, while the first four would just
silently start matching different things​:

 qr[^\[a-z\]$]

 qr#  a
 \#xxx
 b
 #x;

 qr(^\(x\)$);

 m?^xy\?$?

 qr!a(?\!b)!;

 qr<a(?\<foo\>b)>;

 qr|a(?\|foo)|;

Correct me if I'm wrong, but I believe that the most encompassing change
would have breakages with if the delimiter is any of the dirty dozen
metacharacters, but no others. Since '/' isn't one of the 12, there
should be no potential problems with it.

Correct. It's only cases where a regex metacharacter is used as a delimiter,
and the char has to be escaped to allow the string as a whole to be
correctly delimited. Formerly, the quoting mechanism would strip the \,
allowing the metachar to be seen by the regex engine. Under the new proposal,
the engine would see the char as still escaped, and thus no longer a
metachar.
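
Which of the two behaviours a given perl implements can be checked directly. The sketch below makes no assumption about the answer: it prints the pattern text that reached the regex engine and what it matches. The variable names are illustrative:

```perl
use strict;
use warnings;

# Same pattern source, once delimited by a metacharacter pair and once by "/".
my $paren_delimited = qr(\(x\));
my $slash_delimited = qr/\(x\)/;

print 'perl version:    ', $], "\n";
print "paren-delimited: $paren_delimited\n";
print "slash-delimited: $slash_delimited\n";

# If the backslashes were stripped, the parens form a group and "x" matches;
# if they were kept, only the literal text "(x)" matches.
for my $candidate ('x', '(x)') {
    printf "%-4s against paren-delimited: %s\n", $candidate,
        ($candidate =~ /\A$paren_delimited\z/ ? 'match' : 'no match');
}
```

The `/`-delimited form is unaffected either way, since `/` is not a regex metacharacter.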

Obviously, if we decided to, we could restrict the change to just '{',
as that is the only one we care about now. But that isn't aesthetically
pleasing.

My own opinion is that quoting for literal patterns is already complex
enough without introducing a special case for only certain delimiters.

--
Any [programming] language that doesn't occasionally surprise the
novice will pay for it by continually surprising the expert.
  -- Larry Wall

@p5pRT

p5pRT commented Jul 25, 2012

From @cpansprout

On Sat Jul 21 12​:42​:06 2012, khw wrote​:

On 07/21/2012 11​:44 AM, Robin Barker wrote​:> I suggest this bug is
marked as resolved.

The only warnings now come from cpan/ modules.
[perl#113094] is tracking the \{ issue for CPAN modules.

OK, moving it to the 113094

On 07/21/2012 07​:51 AM, Dave Mitchell wrote​:

On Fri, Jul 20, 2012 at 08​:35​:27PM -0700, Reverend Chip wrote​:

On 7/19/2012 10​:15 AM, Karl Williamson wrote​:

3) Consider this as acceptable collateral breakage, document it, and
keep the warning for a cycle or two, after which we prohibit unescaped
literal left brackets.

The final possibility is only feasible if there is very little current
breakage.

My 2c​: I think this is ideal, since stripping \ on {} in regexes seems
like something we should never have done in the first place.

Note that if we changed it so that escaped delimiters are no longer
stripped, *all* the following regexes would change their meaning;
the last
three would become compile errors, while the first four would just
silently start matching different things​:

 qr[^\[a-z\]$]

 qr#  a
 \#xxx
 b
 #x;

 qr(^\(x\)$);

 m?^xy\?$?

 qr!a(?\!b)!;

 qr<a(?\<foo\>b)>;

 qr|a(?\|foo)|;

Correct me if I'm wrong, but I believe that the most encompassing change
would have breakages with if the delimiter is any of the dirty dozen
metacharacters, but no others.

Punctuation variables. In a match-once pattern, I can refer to $? as $\?.

Anyway, if the interpretation of escaped delimiters is going to change
(which I still oppose), there is no reason that regexps should differ
from strings in this regard. This should still match, no matter what​:

q n\nn =~ m n\nn;

Also, please take m?fo\?? into account. I would have to rewrite that
m?fo{0,1}?.

--

Father Chrysostomos

@p5pRT

p5pRT commented Jul 26, 2012

From @khwilliamson

On 07/22/2012 03​:30 AM, Dave Mitchell wrote​:

On Sat, Jul 21, 2012 at 12​:42​:07PM -0700, Karl Williamson via RT wrote​:

Note that if we changed it so that escaped delimiters are no longer
stripped, *all* the following regexes would change their meaning; the last
three would become compile errors, while the first four would just
silently start matching different things​:

  qr[^\[a-z\]$]

  qr#  a
  \#xxx
  b
  #x;

  qr(^\(x\)$);

  m?^xy\?$?

  qr!a(?\!b)!;

  qr<a(?\<foo\>b)>;

  qr|a(?\|foo)|;

Correct me if I'm wrong, but I believe that the most encompassing change
would have breakages with if the delimiter is any of the dirty dozen
metacharacters, but no others. Since '/' isn't one of the 12, there
should be no potential problems with it.

Correct. It's only cases where a regex metacharacter is used a delimiter,
and the char has to be escaped to allow the string as a whole to be
correctly delimited. Formerly, the quoting mechanism would strip the \,
allowing the metachar to been by the regex engine. Under the new proposal,
the engine would see the char as still escaped, and thus no longer a
metachar.

Obviously, if we decided to, we could restrict the change to just '{',
as that is the only one we care about now. But that isn't aesthetically
pleasing.

My own opinion is that quoting for literal patterns is already complex
enough without introducing a special case for only certain delimiters.

It is currently quite possible to make an easily overlooked error in
specifying the general form of a quantifier, and have it silently match
the literal characters specified instead of the intended meaning.
Indeed, one of the first CPAN failures reported when the left brace
experimental change was made is an example of this that had gone
uncaught. I think that that demonstrated deficiency in the current
scheme should carry significant weight in considering what to do.
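
To make the kind of slip concrete, here is a rough sketch of a check for the classic quantifier shapes `{n}`, `{n,}` and `{n,m}`. The helper `looks_like_quantifier` is hypothetical, not part of perl, and later Perl releases may accept further forms that it does not model:

```perl
use strict;
use warnings;

# Hypothetical helper: true if a brace group has one of the classic
# quantifier shapes {n}, {n,} or {n,m}. Under the rules discussed in this
# thread, anything else in braces is matched as literal characters.
sub looks_like_quantifier {
    my ($braces) = @_;
    return $braces =~ /\A\{\d+(?:,\d*)?\}\z/ ? 1 : 0;
}

for my $braces ('{3}', '{2,}', '{2,5}', '{2, 5}', '{2;5}') {
    printf "%-7s %s\n", $braces,
        looks_like_quantifier($braces) ? 'quantifier' : 'literal text';
}
```

A stray space or a wrong separator is enough to move a brace group from the first category to the second without any diagnostic.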

We also have a need going forward to be able to specify new
constructions to support the many Unicode features that we don't
currently. The logical candidate for these is using braces.

One option is to say that this new rule applies only to balanced
delimiters. Then the special case seems more logical and easier to
remember. It reduces the issues to just 4 delimiters. The number of
affected cases in CPAN is minuscule; results given at the end.

Another option would be to have an optional feature enabled under 'use
5.18'.

Two other options I've mentioned previously are to abandon any work in
this area, or to have a deprecation cycle for unnecessarily escaping the
delimiters.

Other option ideas are welcome.

Here is what I found with grepping cpan

Left brace as a delimiter​: {

yielded no problems, as previously reported.

Less-than sign as a delimiter​: <
\b([ms]|qr)\s*<[^<]*\\<

I found no examples where this would be a problem. There is one
instance of this use​:

Regexp-NamedCaptures-0.05/lib/Regexp/NamedCaptures.pm

  { type => SCALAR,
  regex => qr<\A\(\?\<.+\>.*\)\z>s
  }

But the meaning is unchanged.
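
The search pattern for `<` can be tried on individual lines as follows. This is a sketch: the first sample is the line quoted above, the second is a made-up line with no escaped delimiter:

```perl
use strict;
use warnings;

# The search pattern quoted above: m, s or qr, optional whitespace, a "<"
# delimiter, then an escaped "<" before any other "<".
my $escaped_lt = qr/\b([ms]|qr)\s*<[^<]*\\</;

my @samples = (
    'regex => qr<\A\(\?\<.+\>.*\)\z>s',   # the NamedCaptures line
    'my $re = qr<abc>;',                   # no escaped delimiter
);

for my $line (@samples) {
    printf "%-8s %s\n", ($line =~ $escaped_lt ? 'flagged' : 'clean'), $line;
}
```

A hit only means an escaped `<` appears inside a `<>`-delimited pattern; whether the meaning would change still has to be judged by reading the code, as was done for the line above.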

Left paren as a delimiter​: (
(((^|[^\\])\b[ms])|\bqr)\s*\([^(]*\\\(

(returned only false positives. The regex is more complicated because
of things like \s(... which are false positives.)

Left bracket as a delimiter​: [

yielded a single potential problem​:

perl-5.15.6/ext/B/t/OptreeCheck.pm

  # symbolic hints from the golden results.
  $str =~ s[( # capture
  \(\?​:next\|db\)state # the regexp matching next/db state
  .* # all sorts of things follow it
  v # The opening v
  )
  :(?​:\\[{*] # \{ or \*
  |[^,\\]) # or other symbols on their own

@p5pRT

p5pRT commented Jul 27, 2012

From @iabyn

On Thu, Jul 26, 2012 at 10​:51​:46AM -0600, Karl Williamson wrote​:

Two other options I've mentioned previously are to abandon any work
in this area, or to have a deprecation cycle for unnecessary
escaping the delimiters.

I think I like the idea of a deprecation cycle.
i.e. warn on any literal regex which includes an escaped delimiter where
the delimiter is a regex metachar.
By the sound of it, these are quite rare.

But have quite a long cycle; e.g. two major releases that have the
deprecation warning. Then in the 3rd release we stop stripping the \, and
add the unescaped-{ warning.

--
Never work with children, animals, or actors.

@p5pRT

p5pRT commented Jan 20, 2013

From @khwilliamson

Commit e62d0b1 reverts the offending commit.

Instead, commit 4d68ffa0f7f345bc1ae6751744518ba4bc3859bd implements a
restricted version of what was discussed in the final few messages of
the discussion before this message.

--
Karl Williamson

@p5pRT

p5pRT commented Jan 20, 2013

@khwilliamson - Status changed from 'open' to 'resolved'
