\c\@ and \c\\@ weirdness in patterns #14331

Open
p5pRT opened this issue Dec 13, 2014 · 3 comments

p5pRT commented Dec 13, 2014

Migrated from rt.perl.org#123423 (status was 'open')

Searchable as RT123423$

p5pRT commented Dec 13, 2014

From @cpansprout

Let’s look at qq strings first:

In the first case, the \c swallows up the \ and the @a that follows interpolates:

$ ./perl -Ilib -le 'use Devel::Peek; @a = "a"; Dump "\c\@a"'
SV = PV(0x7f9d228071c8) at 0x7f9d2282ff30
  REFCNT = 1
  FLAGS = (PADTMP,POK,pPOK)
  PV = 0x7f9d224209b8 "\34a"\0
  CUR = 2
  LEN = 10

In the second case, \c\ again is treated as a unit and \@a is a literal '@a' because the @ is escaped:

$ ./perl -Ilib -le 'use Devel::Peek; @a = "a"; Dump "\c\\@a"'
SV = PV(0x7fb0118072d8) at 0x7fb01182ff30
  REFCNT = 1
  FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK)
  PV = 0x7fb0114209b8 "\34@a"\0
  CUR = 3
  LEN = 10
  COW_REFCNT = 0

But it behaves differently when it comes to patterns:

$ ./perl -Ilib -Mre=debug -le '@a = "a"; /\c\@a/'
Compiling REx "\c\@a"
Final program:
  1: EXACT <\34@a> (3)
  3: END (0)
anchored "%34@a" at 0 (checking anchored isall) minlen 3
Freeing REx: "\c\@a"

There the backslash after the \c is \c’s character and *also* escapes the @. It is doing two things at once.

With a double backslash, it is just bizarre:

$ ./perl -Ilib -Mre=debug -le '@a = "a"; /\c\\@a/'
Compiling REx "\c\\a"
Final program:
  1: EXACT <\34\7> (3)
  3: END (0)
anchored "%34%7" at 0 (checking anchored isall) minlen 2
Freeing REx: "\c\\a"

Here it appears that the double backslash allows interpolation of @a, resulting in \c\\a, which then compiles to \c\ (\34) followed by \a (\7).

So I managed to create an escape sequence with a literal backslash and an interpolated variable. That is quite something.

I think the regexp cases should behave very similarly to the qq cases. I think the fix is for the lexer to treat \c\ exactly the way it treats \\: as one unit to skip over. That would also change string literals. "\c\\" is currently equivalent to "\34\\", believe it or not; with the fix it would be a syntax error, "\c\" would give "\34", and "\c\\\" would give "\34\\". It would possibly break about 3 CPAN modules.

--

Father Chrysostomos

p5pRT commented Dec 13, 2014

From @demerphq

On 13 December 2014 at 05:36, Father Chrysostomos <perlbug-followup@perl.org> wrote:

This all makes sense to me.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Dec 13, 2014

The RT System itself - Status changed from 'new' to 'open'
