\Q buggy, eg /x on \Q#foo\E doesn't match '#foo', # becomes special #13257

Open
p5pRT opened this issue Sep 15, 2013 · 10 comments

Comments

@p5pRT

p5pRT commented Sep 15, 2013

Migrated from rt.perl.org#119793 (status was 'open')

Searchable as RT119793$

@p5pRT
Author

p5pRT commented Sep 15, 2013

From @bulk88

Created by @bulk88

Following perlre, I used \Q\E to escape things from /x.
The \Q\E line in perlre was added in June 2006 in
http://perl5.git.perl.org/perl.git/commitdiff/1031e5dba2bc40203b5942f84d3d2bc335470dba .

_______________________________________________
$_ = '# These definitions are from config.sh (via C:\p519\src\lib/Config.pm).';
if(/\Q# These definitions are from config.sh (via \E/x) {print 1;}
else {print 0;}
______________________________________________
With the /x modifier it prints 0; without it, it prints 1. The bug is that the "#" is still special after \Q\E, but only under /x, and doesn't become a normal dead character. This contradicts the perlre suggestion. It also contradicts the claim that \Q\E is supposed to be the same as quotemeta(), according to jamesw on IRC. In bulk88's opinion, this line from perlfunc supports that claim: http://perl5.git.perl.org/perl.git/blob/HEAD:/pod/perlfunc.pod#l5334 .

______________________________________________
$_ = '# These definitions are from config.sh (via C:\p519\src\lib/Config.pm).';
my $x = quotemeta;
print /$x/x;
______________________________________________
prints 1.

I don't know what the correct behavior is here, or whether this is a doc problem, a regexp bug, or something else, but something has to change. Tested with Perl 5.12.3 and Perl 5.19.4.
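
Until the behavior is settled, here is a minimal sketch of ways to match a literal "#" under /x without putting \Q...\E inside the pattern itself. The first is the quotemeta() approach from the report above. The other two rely on perlre's rule that /x leaves backslash-escaped characters and bracketed character classes alone. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = '# These definitions are from config.sh (via C:\p519\src\lib/Config.pm).';

# 1. Quote the literal text outside the pattern, then interpolate it.
my $literal       = quotemeta('# These definitions are from config.sh (via ');
my $via_quotemeta = $line =~ /^ $literal /x ? 1 : 0;

# 2. Escape the "#" by hand; "\ " is a literal space under /x.
my $via_escape = $line =~ /^ \# \ These \ definitions /x ? 1 : 0;

# 3. Put "#" and the spaces in bracketed character classes, which /x ignores.
my $via_class = $line =~ /^ [#] [ ] These [ ] definitions /x ? 1 : 0;

print "$via_quotemeta $via_escape $via_class\n";    # 1 1 1
```

None of these depend on how \Q is lexed, so they should behave the same with or without the bug.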

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.19.4:

Configured by Administrator at Thu Sep  5 21:44:53 2013.

Summary of my perl5 (revision 5 version 19 subversion 4) configuration:
  Derived from: 
  Platform:
    osname=MSWin32, osvers=5.2, archname=MSWin32-x64-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -GF -W3 -O1 -MD -Zi -DNDEBUG -GL -fp:precise -DWIN32 -D_CONSOLE -DNO_STRICT -DWIN64 -DCONSERVATIVE -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE  -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO',
    optimize='-O1 -MD -Zi -DNDEBUG -GL -fp:precise',
    cppflags='-DWIN32'
    ccversion='15.00.30729.01', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='__int64', ivsize=8, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -ltcg  -libpath:"c:\p519\lib\CORE"  -machine:AMD64 "/manifestdependency:type='Win32' name='Microsoft.Windows.Common-Controls' version='6.0.0.0' processorArchitecture='*' publicKeyToken='6595b64144ccf1df' language='*'"'
    libpth=\lib
    libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
    perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl519.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -ltcg  -libpath:"c:\p519\lib\CORE"  -machine:AMD64 "/manifestdependency:type='Win32' name='Microsoft.Windows.Common-Controls' version='6.0.0.0' processorArchitecture='*' publicKeyToken='6595b64144ccf1df' language='*'"'

Locally applied patches:
    uncommitted-changes


@INC for perl 5.19.4:
    C:/p519/site/lib
    C:/p519/lib
    .


Environment for perl 5.19.4:
    CYGWIN=tty
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/usr/lib/x86:/usr/X11R6/lib
    LOGDIR (unset)
    PATH=C:\p519\bin;C:\Program Files (x86)\Intel\Composer XE 2011 SP1\redist\ia32\tbb\vc9;C:\Program Files (x86)\Intel\Composer XE 2011 SP1\redist\intel64\tbb\vc9;C:\Program Files (x86)\Intel\Composer XE 2011 SP1\redist\intel64\ipp;C:\Program Files (x86)\Intel\Composer XE 2011 SP1\redist\ia32\ipp;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64\compiler;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\ia32\compiler;C:\Perl\site\bin;C:\Perl\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin;C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC;C:\Program Files\TortoiseGit\bin
    PERL_BADLANG (unset)
    SHELL (unset) 		 	   		  

@p5pRT
Author

p5pRT commented Sep 18, 2013

From @nwc10

On Sun, Sep 15, 2013 at 12:44:18AM -0700, bulk88 wrote:

> Following perlre, I used \Q\E to escape things from /x.
> The \Q\E line in perlre was added in June 2006 in
> http://perl5.git.perl.org/perl.git/commitdiff/1031e5dba2bc40203b5942f84d3d2bc335470dba .
>
> _______________________________________________
> $_ = '# These definitions are from config.sh (via C:\p519\src\lib/Config.pm).';
> if(/\Q# These definitions are from config.sh (via \E/x) {print 1;}
> else {print 0;}
> ______________________________________________
> With /x modifier, it prints 0, without prints 1. The bug is the "#" is still special after \Q\E, but only under /x, and doesn't become a normal dead character. This contradicts the perlre suggestion. It also contradicts that \Q\E are supposed to be the same as quotemeta() according to jamesw on irc. This line from perlfunc supports that in bulk88's opinion http://perl5.git.perl.org/perl.git/blob/HEAD:/pod/perlfunc.pod#l5334 .
>
> ______________________________________________
> $_ = '# These definitions are from config.sh (via C:\p519\src\lib/Config.pm).';
> my $x = quotemeta;
> print /$x/x;
> ______________________________________________
> prints 1.

Nice bug. I'm not sure if the behaviour is as simple as # being a "dead"
character or not. It differs depending on whether \E is present:

$ perl -e 'warn "a# " =~ /\Qa# \E/ ? "M" : "."'
M at -e line 1.
$ perl -e 'warn "a# " =~ /\Qa# / ? "M" : "."'
M at -e line 1.
$ perl -e 'warn "a# " =~ /\Qa# \E/x ? "M" : "."'
. at -e line 1.
$ perl -e 'warn "a# " =~ /\Qa# /x ? "M" : "."'
M at -e line 1.

It seems to have happened between 5.000 and 5.001:

$ ./perl -e 'warn "a# " =~ /\Qa# \E/ ? "M" : "."'
M at -e line 1.
$ ./perl -e 'warn "a# " =~ /\Qa# / ? "M" : "."'
M at -e line 1.
$ ./perl -e 'warn "a# " =~ /\Qa# \E/x ? "M" : "."'
. at -e line 1.
$ ./perl -e 'warn "a# " =~ /\Qa# /x ? "M" : "."'
M at -e line 1.

bisect.pl --target miniperl --start=perl-5.000 --end=perl-5.001 -e 'if ("a#b" !~ /\Qa#b\E/x) { exit 1 }'

reports:

commit 748a930
Author: Larry Wall <lwall@netlabs.com>
Date: Sun Mar 12 22:32:14 1995 -0800
  Perl 5.001
  [See the Changes file for a list of changes]

The symptom may relate to the code referred to by this entry in its Changes:

+NETaa13369: # is now a comment character, and \# should be left for regcomp.
+From: Simon Parsons
+Files patched: toke.c
+ It was not skipping the comment when it skipped the white space, and construct
+ an opcode that tried to match a null string. Unfortunately, the previous
+ star tried to use the first character of the null string to optimize where
+ to recurse, so it never matched.

but the fact that even back then it differs depending on \E being present or
implicit makes me think that it's actually a bug somewhere else.

And -Dr suggests something far more screwed up:

$ ./perl -Dr -e 'warn "a# " =~ /\Qa# \E/x ? "M" : "."'
Compiling REx "a\#\ \\E"
rarest char \ at 3
Final program:
  1: EXACT <a# \\E> (4)
  4: END (0)
anchored "a# \E" at 0 (checking anchored isall) minlen 5
Enabling $` $& $' support (0x7).

EXECUTING...

String shorter than min possible regex match (3 < 5)
. at -e line 1.
Freeing REx​: "a\#\ \\E"

It seems that the trailing \E is being left in the string, and then ending up
as something which the engine attempts to match against.

Looks like 5.001 makes exactly the same mistake:

$ ./miniperl -Dr -e 'warn "a# " =~ /\Qa# \E/x ? "M" : "."'
rarest char \ at 3
first 14 next 97 offset 0
1:BRANCH(15)
5:EXACTLY(15) <a# \E>
15:END(0)
start `a# \E' minlen 5

EXECUTING...

. at -e line 1.
$ ./miniperl -Dr -e 'warn "a# " =~ /\Qa# \E/ ? "M" : "."'
rarest char # at 1
first 14 next 97 offset 0
1:BRANCH(13)
5:EXACTLY(13) <a# >
13:END(0)
start `a# ' minlen 3

EXECUTING...

1:BRANCH <a# >
5:EXACTLY <a# >
13:END <>
M at -e line 1.
$ ./miniperl -Dr -e 'warn "a# " =~ /\Qa# /x ? "M" : "."'
rarest char # at 1
first 14 next 97 offset 0
1:BRANCH(13)
5:EXACTLY(13) <a# >
13:END(0)
start `a# ' minlen 3

EXECUTING...

1:BRANCH <a# >
5:EXACTLY <a# >
13:END <>
M at -e line 1.

> I don't know what the correct behavior is here, and if there is a doc problem here or a regexp bug or what but something has to change. Tested with Perl 5.12.3 and Perl 5.19.4.

I think that the implementation is at fault here. In that, conceptually,
\Q\E processing is meant to be an earlier step than comment stripping,
hence \Q\E should apply to comments too.
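
To make that ordering concrete, here is a rough sketch of the conceptual model only: quote the \Q...\E spans first, and let /x comment stripping see the result afterwards. It operates on a raw pattern string and is not how toke.c works; `expand_quoted_spans` is a hypothetical helper invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl): expand \Q...\E spans in a raw
# pattern string before the pattern is compiled. An unterminated \Q runs
# to the end of the string, as an implicit \E would.
sub expand_quoted_spans {
    my ($raw) = @_;
    $raw =~ s/\\Q(.*?)(?:\\E|\z)/quotemeta($1)/gse;
    return $raw;
}

# Each case pairs a raw pattern with the string it is expected to match.
my @cases = (
    [ '\Qa# \E',  'a# '  ],
    [ '\Qa# ',    'a# '  ],
    [ '\Qa#b\Ec', 'a#bc' ],
);

my @results;
for my $case (@cases) {
    my ($raw, $target) = @$case;
    my $expanded = expand_quoted_spans($raw);    # step 1: quoting
    my $re       = qr/$expanded/x;               # step 2: /x sees only escaped text
    push @results, $target =~ $re ? "M" : ".";
}
print "@results\n";    # M M M
```

Under this model the explicit and implicit \E cases agree, because the "#" is already escaped by the time /x looks at the pattern.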

Nicholas Clark

@p5pRT
Author

p5pRT commented Sep 18, 2013

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Sep 18, 2013

From @ikegami

On Wed, Sep 18, 2013 at 10:56 AM, Nicholas Clark <nick@ccl4.org> wrote:

> Nice bug. I'm not sure if the behaviour is as simple as # being a "dead"
> character or not. It differs depending on whether \E is present:
>
> $ perl -e 'warn "a# " =~ /\Qa# \E/ ? "M" : "."'
> M at -e line 1.
> $ perl -e 'warn "a# " =~ /\Qa# / ? "M" : "."'
> M at -e line 1.
> $ perl -e 'warn "a# " =~ /\Qa# \E/x ? "M" : "."'
> . at -e line 1.
> $ perl -e 'warn "a# " =~ /\Qa# /x ? "M" : "."'
> M at -e line 1.

Nice indeed. Easy to see with qr//

$re = qr/\Q#abc\E/x; print "$re\n"; print '#abc' =~ /$re/ || 0, "\n";
$re = qr/\Q#abc/x; print "$re\n"; print '#abc' =~ /$re/ || 0, "\n";
$re = qr/\Qabc\E/x; print "$re\n"; print 'abc' =~ /$re/ || 0, "\n";

(?^x:\#abc\\E)
0
(?^x:\#abc)
1
(?^x:abc)
1

@p5pRT
Author

p5pRT commented Jul 9, 2015

From @mauke

Created by @mauke

% perl -wle 'print "#\\E \\z" =~ / \Q#\E \z/x ? "wtf" : "k"'
wtf

Compiling REx " \#\\E\ \\z"
Final program:
  1: EXACT <#\\E \\z> (4)
  4: END (0)

For some reason the # turns everything after it into literal text instead of a
comment. My guess is that \Q skips over comments, doesn't find its \E and
so proceeds to quotemeta all of '#\E \z' as is, which then somehow makes # a
not-comment again.

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.22.0:

Configured by mauke at Sat Jul  4 14:56:57 CEST 2015.

Summary of my perl5 (revision 5 version 22 subversion 0) configuration:
   
  Platform:
    osname=linux, osvers=4.0.1-1-arch, archname=i686-linux
    uname='linux simplicio 4.0.1-1-arch #1 smp preempt wed apr 29 12:15:20 cest 2015 i686 gnulinux '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='5.1.0', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234, doublekind=3
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12, longdblkind=3
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/i686-pc-linux-gnu/5.1.0/include-fixed /usr/lib /lib
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.21.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.21'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'



@INC for perl 5.22.0:
    /home/mauke/usr/lib/perl5/site_perl/5.22.0/i686-linux
    /home/mauke/usr/lib/perl5/site_perl/5.22.0
    /home/mauke/usr/lib/perl5/5.22.0/i686-linux
    /home/mauke/usr/lib/perl5/5.22.0
    .


Environment for perl 5.22.0:
    HOME=/home/mauke
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=POSIX
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/mauke/perl5/perlbrew/bin:/home/mauke/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
    PERLBREW_BASHRC_VERSION=0.69
    PERLBREW_HOME=/home/mauke/.perlbrew
    PERLBREW_ROOT=/home/mauke/perl5/perlbrew
    PERL_BADLANG (unset)
    PERL_UNICODE=SAL
    SHELL=/bin/bash

@p5pRT
Author

p5pRT commented Jan 16, 2016

From @bulk88

On Thu Jul 09 13:33:29 2015, mauke- wrote:

> This is a bug report for perl from l.mai@web.de,
> generated with the help of perlbug 1.40 running under perl 5.22.0.
>
> -----------------------------------------------------------------
> [Please describe your issue here]
>
> % perl -wle 'print "#\\E \\z" =~ / \Q#\E \z/x ? "wtf" : "k"'
> wtf
>
> Compiling REx " \#\\E\ \\z"
> Final program:
> 1: EXACT <#\\E \\z> (4)
> 4: END (0)
>
> For some reason the # turns everything after it into literal text instead of a
> comment. My guess is that \Q skips over comments, doesn't find its \E and
> so proceeds to quotemeta all of '#\E \z' as is, which then somehow makes # a
> not-comment again.

Is this a dup of https://rt-archive.perl.org/perl5/Ticket/Display.html?id=119793 ?

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Author

p5pRT commented Jan 16, 2016

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Feb 7, 2017

From @hvds

On Fri, 15 Jan 2016 16:56:40 -0800, bulk88 wrote:

> On Thu Jul 09 13:33:29 2015, mauke- wrote:
>
> > % perl -wle 'print "#\\E \\z" =~ / \Q#\E \z/x ? "wtf" : "k"'
> > wtf
> >
> > Compiling REx " \#\\E\ \\z"
> > Final program:
> > 1: EXACT <#\\E \\z> (4)
> > 4: END (0)
> >
> > For some reason the # turns everything after it into literal text instead of a
> > comment. My guess is that \Q skips over comments, doesn't find its \E and
> > so proceeds to quotemeta all of '#\E \z' as is, which then somehow makes # a
> > not-comment again.
>
> Is this a dup of https://rt-archive.perl.org/perl5/Ticket/Display.html?id=119793 ?

Yes, I'll merge them.

It's still happening:

% perl -Dr -we '
qr{\Qa#bc};
qr{\Qa#b\Ec};
qr{\Qa#bc}x;
qr{\Qa#b\Ec}x;
' 2>&1 | grep 'EXACT'
  1: EXACT <a#bc> (3)
  1: EXACT <a#bc> (3)
  1: EXACT <a#bc> (3)
  1: EXACT <a#b\\Ec> (4)
%

Hugo

@p5pRT
Author

p5pRT commented Apr 1, 2017

From @khwilliamson

I looked at this a little, and it appears to me that there is a fundamental flaw in how \Q is processed.

An example is

qr/\Q\N{COLON}/

This evaluates to

Final program:
  1: EXACT <\\N{U+3A}> (4)
  4: END (0)

which is very wrong.

perlop's "Gory Details" says that it looks for the end of a quoting construct. In my example and the ones in the ticket, it does find the ending pattern delimiter, but within the pattern I don't see how it is looking for the end of \Q. In yylex(), it does this:

  NEXTVAL_NEXTTOKE.ival = OP_QUOTEMETA;

And then calls itself recursively. I don't understand the overall picture of lexing, but this appears to me to be setting something up for runtime rather than for continued parsing. It would seem to me that \Q should set up some recursive state, as sublex_push() apparently does, but it doesn't.

This kind of thing has been hashed over before:

Github issue #11145
http://nntp.perl.org/group/perl.perl5.porters/179078
http://nntp.perl.org/group/perl.perl5.porters/206466

@Grinnz
Contributor

Grinnz commented Jun 30, 2021

To add another weird case I just found: https://perl.bot/p/5guy5u

use strict;
use warnings;

chomp(my $str = <<'EOF');
foo\\"bar
EOF

chomp(my $repl = <<'EOF');
\"
EOF

[$str =~ s/\Q\\"/X/gr, $str =~ s/\Q$repl/X/gr]
["fooXbar","foo\\Xbar"]

It seems that when writing a literal (escaped) backslash after \Q, the regex will match two literal backslashes, but when the backslash is included after \Q from interpolation, it only matches one literal backslash.
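
For comparison, a small sketch of the interpolated form, which sidesteps the question of how backslashes written after \Q are lexed. It assumes the intent is to match one backslash followed by a double quote; `$needle` and `$result` are illustrative names. The eval output above shows strings in escaped form, so "foo\\Xbar" there stands for the text foo\Xbar.

```perl
use strict;
use warnings;

# Same input as above: the quoted heredoc keeps both backslashes literally.
chomp(my $str = <<'EOF');
foo\\"bar
EOF

# Build the literal outside the pattern. In a single-quoted string, \\ is
# one backslash, so this is the two characters \ and ".
my $needle = quotemeta('\\"');

# Requires perl 5.14+ for the /r modifier.
my $result = $str =~ s/$needle/X/gr;
print "$result\n";    # foo\Xbar
```

Only the second backslash and the quote are replaced, which matches the `\Q$repl` result in the report.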

@xenu xenu removed the affects-5.19 label Nov 19, 2021
@xenu xenu removed the Severity Low label Dec 29, 2021
@khwilliamson khwilliamson changed the title /x on \Q#foo\E doesn't match '#foo', # becomes special \Q buggy, eg /x on \Q#foo\E doesn't match '#foo', # becomes special Apr 16, 2022