
Unclear note about \Q, etc., in perlretut #11215

Closed
p5pRT opened this issue Mar 27, 2011 · 17 comments

Comments

@p5pRT

p5pRT commented Mar 27, 2011

Migrated from rt.perl.org#87128 (status was 'resolved')

Searchable as RT87128$

@p5pRT

p5pRT commented Mar 27, 2011

From @cpansprout

At <http://www.nntp.perl.org/group/perl.perl5.porters/2011/03/msg170208.html>, Tom Christiansen pointed out that this paragraph I added to perlretut is missing \l and \u:

  +C<\Q>, C<\L>, C<\U> and C<\E> are actually part of the syntax of regular
  +expression I<literals>, and are not part of regexp syntax proper. So they
  +do not work in interpolated patterns.

Then Karl Williamson pointed out that ‘regular expression literal’ is not defined anywhere.

So I would like to change this to:

C<\Q>, C<\L>, C<\l>, C<\U>, C<\u> and C<\E> are actually part of
double-quotish syntax, and not part of regexp syntax proper. They will
work if they appear in a regular expression embedded directly in a
program, but not when contained in a string that is interpolated in a
pattern.
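A minimal sketch of the behaviour the proposed paragraph describes, assuming a reasonably recent perl (the variable names are illustrative only). The escapes take effect when they are written in a pattern literal, and do not take effect when they arrive inside a single-quoted string that is later interpolated into a pattern.

```perl
use strict;
use warnings;

# Pattern literals get double-quotish processing first: the lexer turns
# /\Ua/ into /A/ and /^\uperl$/ into /^Perl$/ before the regex compiler runs.
my $upper   = "A"    =~ /\Ua/      ? 1 : 0;
my $ucfirst = "Perl" =~ /^\uperl$/ ? 1 : 0;

# A single-quoted string keeps the characters \U as they are. When it is
# interpolated into a pattern, no case change is applied, so "A" does not
# match. Depending on the perl version the regex compiler may warn about
# the unrecognized escape (silenced here) or reject it (caught by eval).
my $p      = q{\Ua};
my $interp = do { no warnings; (eval { "A" =~ /$p/ }) ? 1 : 0 };

print "upper=$upper ucfirst=$ucfirst interp=$interp\n";   # upper=1 ucfirst=1 interp=0
```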


Flags:
  category=docs
  severity=low


Site configuration information for perl 5.13.11:

Configured by sprout at Thu Mar 24 14:02:46 PDT 2011.

Summary of my perl5 (revision 5 version 13 subversion 11) configuration:
  Snapshot of: cc13eef
  Platform:
  osname=darwin, osvers=10.5.0, archname=darwin-thread-multi-2level
  uname='darwin pint.local 10.5.0 darwin kernel version 10.5.0: fri nov 5 23:20:39 pdt 2010; root:xnu-1504.9.17~1release_i386 i386 '
  config_args='-Dusedevel -de -Duseithreads -Doptimize=-g'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=undef, use64bitall=undef, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler:
  cc='cc', ccflags ='-fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
  optimize='-g',
  cppflags='-no-cpp-precomp -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.2.1 (Apple Inc. build 5664)', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries:
  ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib
  libs=-ldbm -ldl -lm -lutil -lc
  perllibs=-ldl -lm -lutil -lc
  libc=/usr/lib/libc.dylib, so=dylib, useshrplib=false, libperl=libperl.a
  gnulibc_version=''
  Dynamic Linking:
  dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
  cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib -fstack-protector'

Locally applied patches:
 


@INC for perl 5.13.11:
  /usr/local/lib/perl5/site_perl/5.13.11/darwin-thread-multi-2level
  /usr/local/lib/perl5/site_perl/5.13.11
  /usr/local/lib/perl5/5.13.11/darwin-thread-multi-2level
  /usr/local/lib/perl5/5.13.11
  /usr/local/lib/perl5/site_perl
  .


Environment for perl 5.13.11:
  DYLD_LIBRARY_PATH (unset)
  HOME=/Users/sprout
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/local/bin
  PERL_BADLANG (unset)
  SHELL=/bin/bash

@p5pRT

p5pRT commented Mar 27, 2011

From tchrist@perl.com

> C<\Q>, C<\L>, C<\l>, C<\U>, C<\u> and C<\E> are actually part of
> double-quotish syntax, and not part of regexp syntax proper. They will
> work if they appear in a regular expression embedded directly in a
> program, but not when contained in a string that is interpolated in a
> pattern.

And \N{}.

--tom

@p5pRT

p5pRT commented Mar 27, 2011

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Mar 27, 2011

From @khwilliamson

On 03/27/2011 01:47 PM, Father Chrysostomos (via RT) wrote:

> So I would like to change this to:
>
> C<\Q>, C<\L>, C<\l>, C<\U>, C<\u> and C<\E> are actually part of
> double-quotish syntax, and not part of regexp syntax proper. They will
> work if they appear in a regular expression embedded directly in a
> program, but not when contained in a string that is interpolated in a
> pattern.

+1

@p5pRT

p5pRT commented Mar 27, 2011

From @cpansprout

On Mar 27, 2011, at 12:57 PM, tchrist1 via RT wrote:

> And \N{}.

$ perl5.13.11 -le '$p = q"\N{U+66}"; print "ofo" =~ /o $p o/x'
1

What am I missing?
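Sprout's one-liner, expanded into a small script as a sketch (assuming a perl that supports the \N{U+hex} form in patterns, as 5.13.11 does). The string is single-quoted, so the lexer never processes the escape; the match still succeeds because the regex compiler itself understands the code-point form. The replies below explain that Tom's point concerns the named form, \N{NAME}.

```perl
use strict;
use warnings;

# Single-quoted (q"..."), so the lexer leaves \N{U+66} alone; the regex
# compiler interprets the code-point form itself at run time.
my $p       = q"\N{U+66}";
my $runtime = "ofo" =~ /o $p o/x ? 1 : 0;   # /x ignores the spaces

# The same escape written directly in a pattern literal.
my $literal = "ofo" =~ /o\N{U+66}o/ ? 1 : 0;

print "runtime=$runtime literal=$literal\n";   # runtime=1 literal=1
```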

@p5pRT

p5pRT commented Mar 27, 2011

From tchrist@perl.com

> $ perl5.13.11 -le '$p = q"\N{U+66}"; print "ofo" =~ /o $p o/x'
> 1
>
> What am I missing?

Gr.

It's the charnames issue.

--tom

@p5pRT

p5pRT commented Mar 27, 2011

From @khwilliamson

On 03/27/2011 02:24 PM, Tom Christiansen wrote:

> It's the charnames issue.

By which he means that in something like \N{COLON}, the name is
looked up during tokenizing, and you get an error message if you try to
bypass that and go directly to the regex compiler.
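To illustrate Karl's description, a sketch assuming `use charnames ':full'` (needed on 5.14; more recent perls load charnames automatically for \N{NAME}). A name written in a pattern literal, or in a double-quoted string, is resolved while the program is tokenized. So one way to get a named character into a pattern built at run time is to resolve it in a string first, instead of handing the text \N{COLON} to the regex compiler.

```perl
use strict;
use warnings;
use charnames ':full';   # makes \N{NAME} available on older perls too

# In a pattern literal the name is looked up while the program is
# tokenized, so the regex compiler only sees an already-resolved form.
my $literal = "a:b" =~ /^a\N{COLON}b$/ ? 1 : 0;

# For a pattern assembled at run time, resolve the name in a double-quoted
# string (also handled by the lexer) and interpolate the resulting character.
my $sep     = "\N{COLON}";               # the single character ":"
my $runtime = "a:b" =~ /^a\Q$sep\Eb$/ ? 1 : 0;

print "literal=$literal runtime=$runtime\n";   # literal=1 runtime=1
```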

@p5pRT

p5pRT commented Mar 28, 2011

From @demerphq

On 27 March 2011 22:44, Karl Williamson <public@khwilliamson.com> wrote:

> By which he means that in something like \N{COLON}, the name is looked up
> during tokenizing, and you get an error message if you try to bypass that
> and go directly to the regex compiler.

I'm kinda aware of the issues here and I didn't understand that
sentence, so I'm betting people that don't understand the issues at all
don't either. Care to rephrase yourself?

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Mar 28, 2011

From @khwilliamson

On 03/28/2011 06:45 AM, demerphq wrote:

> I'm kinda aware of the issues here and I didn't understand that
> sentence, so I'm betting people that don't understand the issues at all
> don't either. Care to rephrase yourself?

http://groups.google.com/group/perl.perl5.porters/msg/0716102cfbcde32e
and http://rt.perl.org/rt3//Public/Bug/Display.html?id=56444

@p5pRT

p5pRT commented Mar 28, 2011

From @khwilliamson

On 03/28/2011 09:43 AM, Karl Williamson wrote:

> http://groups.google.com/group/perl.perl5.porters/msg/0716102cfbcde32e
> and http://rt.perl.org/rt3//Public/Bug/Display.html?id=56444

To add some to that. The solution to #56444 was to move the parsing of
\N{} back to the lexer. And Tom in the mail message is pointing out
that that has downsides which aren't adequately documented.

The problem is that regex compilation is not done at the time of the
original parse, and the \N{} definitions are not invariant, so the regex
compilation needs to be cognizant of what they were at the time of the
parse. What the #56444 solution does is to compile them at parse time
into an intermediate invariant form, but this has downsides, as Tom's
email (in the first link) shows.

Since the time I fixed #56444, it has come to light that the \N{}
module, charnames, had significant scoping bugs; and I fixed all known
problems involving those for 5.14. I got involved in fixing 56444 not
because of any expertise on my part, but in answer to Jesse's plea for
someone to fix this 5.12 release blocker; I learned just enough to get
the job done, along lines outlined by Yves, as I understood them. I
have heard since that Dave Mitchell had an idea for fixing it along some
other path of attack; but I don't know what that might be.

But a possibility for future work is to revisit this fix, and put things
back into the regex compiler. Now that charnames has those fixes, I
believe that all that is necessary to get the correct results is to give
it the correct %^H, the one that was in effect at the time of the
lexing. I don't know how to do that.

But in 5.14, \N should be listed in the pods as not being part of the
regex syntax.

@p5pRT

p5pRT commented Mar 28, 2011

From @demerphq

On 28 March 2011 18:14, Karl Williamson <public@khwilliamson.com> wrote:

> But in 5.14, \N should be listed in the pods as not being part of the regex
> syntax.

Ok. I get it now.

The lexer only sees patterns compiled at compile time, not those
compiled at run time. Thus *any* construct the lexer handles for the
regex engine (and does not pass through) has to be equivalently
supported in the regex compiler as well.
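One practical consequence of Yves's point, as a sketch (the variable names are illustrative). A qr// object is itself a pattern literal, so the lexer applies escapes such as \Q when it is written. The compiled result can then be interpolated into patterns assembled later without losing that processing.

```perl
use strict;
use warnings;

my $s  = "a.b";
# \Q...\E is applied while this qr// literal is processed, so the dot is
# escaped once, here, and stays escaped wherever $re is interpolated later.
my $re = qr/\Q$s\E/;

my $hit  = "xa.by" =~ /^x${re}y$/ ? 1 : 0;   # 1: matches a literal dot
my $miss = "xaXby" =~ /^x${re}y$/ ? 1 : 0;   # 0: "." is not a wildcard

print "re=$re hit=$hit miss=$miss\n";
```

The stringified form of `$re` differs between perl versions, which is why no exact output is shown for it.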

Historically:

Long ago BOTH the regex engine and the lexer handled \N{ .... }.

Then we moved it into the regex engine only. (because we thought that
fixed bugs)

Then we moved it back to the lexer as a translator. But I guess when
we moved it back to the lexer as a translator we forgot to ALSO
provide equivalent logic in the regex engine.

So I don't see this as a problem of moving the logic back to the lexer,
it's rather the problem that we forgot to support it in the regex
compiler too.

To repeat: *any* construct the lexer handles for the regex engine (and
does not pass through) has to be equivalently supported in the regex
compiler as well.

We should put that in a comment somewhere.

cheers,
Yves


@p5pRT

p5pRT commented Mar 28, 2011

From @demerphq

On 28 March 2011 18:14, Karl Williamson <public@khwilliamson.com> wrote:

> But a possibility for future work is to revisit this fix, and put things
> back into the regex compiler.  Now that charnames has those fixes, I believe
> that all that is necessary to get the correct results is to give it the
> correct %^H, the one that was in effect at the time of the lexing.  I don't
> know how to do that.

I don't get you here. I don't see how charnames *can* be fixed and not
have the problem that we solve by translation.

The problem here is purely that the translation logic has to be
supported via two separate entry points, code and strings.

if ($foo=~/\N{bar}/) {

and via a string in something like:

if ($foo=~/$bar/) {

So when $bar contains "\N{bar}" we will find it in the regex engine.
We will then have to do some trickery to convert it into a \N{U+...}
sequence when the pattern is compiled (this will mean changing the
stored pattern as well).
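A rough sketch of the translation Yves describes: rewriting \N{NAME} inside a pattern string into the invariant \N{U+XXXX} form, which the regex compiler accepts at run time. The helper names are hypothetical and this is not how perl implements it. It assumes `charnames::vianame`, which returns the numeric code point for a character name, and it ignores escaped backslashes and named sequences.

```perl
use strict;
use warnings;
use charnames ':full';   # provides charnames::vianame

# Hypothetical helper: map a character name to the \N{U+hex} form.
sub name_to_codepoint_escape {
    my ($name) = @_;
    my $cp = charnames::vianame($name);
    die "Unknown charname '$name'\n" unless defined $cp;
    return sprintf '\\N{U+%X}', $cp;
}

# Hypothetical helper: rewrite every \N{NAME} in a pattern string, leaving
# escapes that are already in \N{U+...} form untouched.
sub expand_named_escapes {
    my ($pattern) = @_;
    $pattern =~ s{\\N\{(?!U\+)([^\}]+)\}}{name_to_codepoint_escape($1)}ge;
    return $pattern;
}

my $bar   = q{a\N{COLON}b};                # text as it might arrive at run time
my $fixed = expand_named_escapes($bar);    # 'a\N{U+3A}b'
print "$fixed\n";
print "a:b" =~ /^$fixed$/ ? "match\n" : "no match\n";
```

Doing the lookup up front, in the caller's scope, is what makes the result independent of whatever charnames settings are in effect when the pattern is finally compiled.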

Yves


@p5pRT

p5pRT commented Mar 29, 2011

From @cpansprout

On Sun Mar 27 12:47:01 2011, sprout wrote:

> So I would like to change this to:
>
> C<\Q>, C<\L>, C<\l>, C<\U>, C<\u> and C<\E> are actually part of
> double-quotish syntax, and not part of regexp syntax proper. They will
> work if they appear in a regular expression embedded directly in a
> program, but not when contained in a string that is interpolated in a
> pattern.

I have just committed this change as 8e71069.

@p5pRT

p5pRT commented Mar 29, 2011

@cpansprout - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Mar 29, 2011
@p5pRT

p5pRT commented Mar 30, 2011

From @epa

Father Chrysostomos <perlbug-followup <at> perl.org> writes:

> C<\Q>, C<\L>, C<\l>, C<\U>, C<\u> and C<\E> are actually part of
> double-quotish syntax, and not part of regexp syntax proper. They will
> work if they appear in a regular expression embedded directly in a
> program,

Unless that regular expression uses '' as delimiters?

--
Ed Avis <eda@waniasset.com>

@p5pRT

p5pRT commented Apr 3, 2011

From @cpansprout

On Mar 30, 2011, at 6:04 AM, Ed Avis via RT wrote:

> Unless that regular expression uses '' as delimiters?

Yes, but I did say ‘double-quotish syntax’. Since this is a tutorial, I think we can leave that implicit.
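Ed's caveat in concrete form, as a sketch: with m'...' the pattern gets no double-quotish processing, so \U reaches the regex compiler unapplied. A string eval is used because some perls may warn about, or reject, the unrecognized escape when the pattern is compiled.

```perl
use strict;
use warnings;

# Ordinary delimiters: double-quotish processing applies \U first.
my $dq = "A" =~ m/\Ua/ ? 1 : 0;

# Single-quote delimiters: no interpolation and no \U processing, so the
# regex compiler gets \Ua as written and the pattern does not match "A".
my $sq = do { no warnings; (eval q{ "A" =~ m'\Ua' }) ? 1 : 0 };

print "dq=$dq sq=$sq\n";   # dq=1 sq=0
```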
