Error producing ^\ (chr 28) with "\c\\" #875

Closed
p5pRT opened this issue Nov 18, 1999 · 27 comments

Comments

@p5pRT
p5pRT commented Nov 18, 1999

Migrated from rt.perl.org#1806 (status was 'resolved')

Searchable as RT1806$

p5pRT commented Nov 18, 1999

From newton@ficus.frogspace.net

It appears to be impossible to produce a ^\ character (ASCII 28)
using \c notation. "\x1c" and "\034" work fine, but "\c\\" gives
a two-character string (ASCII 28, 92 i.e. ^\ followed by backslash)
and "\c\" gives "Can't find string terminator '"' anywhere before
EOF".

Examples​:

$ perl -wle 'print join ", ", map ord, split //, "\c\"'
Can't find string terminator '"' anywhere before EOF at -e line 1.
$ perl -wle 'print join ", ", map ord, split //, "\c\\"'
28, 92

"\c\\" would be the correct syntax in my opinion; however, it appears
that the \c logic sees two backslashes -- that the two backslashes
aren't first reduced to one (as per double-quoting usually) before \c
sees it.
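
The working spellings mentioned above can be checked side by side. This is a minimal sketch, assuming an ASCII platform; the helper name code_points is made up for illustration. It deliberately avoids \c and only shows that "\x1c", "\034" and chr(28) all give the one-character string the report is after, and what the two-character result looks like.

```perl
use strict;
use warnings;

# Hypothetical helper: return the code points of a string as a list.
sub code_points {
    my ($str) = @_;
    return map { ord($_) } split //, $str;
}

# Three spellings of code point 28 that do not rely on \c notation.
my @spellings = ("\x1c", "\034", chr(28));

for my $s (@spellings) {
    printf "length=%d ord=%d\n", length($s), ord($s);
}

# The two-character string (28, 92) that the report says "\c\\" produces,
# written here with \x so the result does not depend on \c parsing.
print join(", ", code_points("\x1c\\")), "\n";   # prints: 28, 92
```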

Perl Info


Site configuration information for perl 5.00503:

Configured by frogleg at Sun Aug  8 13:32:51 EDT 1999.

Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
  Platform:
    osname=linux, osvers=5.2, archname=i686-linux
    uname='linux ficus.frogspace.net 2.2.6-ac3 #2 thu aug 5 09:35:04 edt 1999 i686 unknown '
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef useperlio=undef d_sfio=undef
  Compiler:
    cc='gcc', optimize='-O2', gccversion=2.7.2.3
    cppflags='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    ccflags ='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt
    libc=, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl 5.00503:
    /usr/local/lib/perl5/5.00503/i686-linux
    /usr/local/lib/perl5/5.00503
    /usr/local/lib/perl5/site_perl/5.005/i686-linux
    /usr/local/lib/perl5/site_perl/5.005
    .


Environment for perl 5.00503:
    HOME=/home/newton
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

p5pRT commented Nov 19, 1999

From [Unknown Contact. See original ticket]

,,, writes​:

"\c\\" would be the correct syntax in my opinion; however, it appears
that the \c logic sees two backslashes -- that the two backslashes
aren't first reduced to one (as per double-quoting usually) before \c
sees it.

And what would "\c\c\c\c\\" do, in your opinion? ;-)

perlop/"Gory details..."

Ilya

p5pRT commented Nov 19, 1999

From [Unknown Contact. See original ticket]

(cc'ed to Ilya Zakharevich)

On Fri, 19 Nov 1999, Ilya Zakharevich wrote​:

,,, writes​:

"\c\\" would be the correct syntax in my opinion; however, it appears
that the \c logic sees two backslashes -- that the two backslashes
aren't first reduced to one (as per double-quoting usually) before \c
sees it.

And what would "\c\c\c\c\\" do, in your opinion? ;-)

My guess is ^\ + 'c' + ^\ + 'c' + '\\', i.e. ASCII 28, 99, 28, 99, 92. And
that's what Perl does.

This still doesn't explain to me, though, how to produce a string
consisting solely of the character ^\.
--
Philip Newton <newton@​newton.digitalspace.net>

p5pRT commented Nov 19, 1999

From [Unknown Contact. See original ticket]

On Fri, Nov 19, 1999 at 11​:17​:03AM -0500, Philip Newton wrote​:

And what would "\c\c\c\c\\" do, in your opinion? ;-)

My guess is ^\ + 'c' + ^\ + 'c' + '\\', i.e. ASCII 28, 99, 28, 99, 92. And
that's what Perl does.

So you want it to be parsed left-to-right. But you want \c\\ to be
parsed right-to-left. Choose one.

This still doesn't explain to me, though, how to produce a string
consisting solely of the character ^\.

TIMTOWTDI. chr(ord('\\')-62) (or is it 32?) is one of them.

Ilya
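
The offset in question can be checked with a few lines of arithmetic. This is a minimal sketch, assuming an ASCII platform where ord('\\') is 92: of the three candidates tried below, only 64 lands on code point 28.

```perl
use strict;
use warnings;

my $backslash = ord('\\');          # 92 on ASCII platforms

# Try each candidate offset and mark the one that yields code point 28.
for my $offset (62, 32, 64) {
    my $code = $backslash - $offset;
    printf "ord('\\\\') - %d = %d%s\n", $offset, $code,
        $code == 28 ? "  <- ctrl-\\" : "";
}

# The one-character string ^\ built without any \c escape.
my $fs = chr($backslash - 64);
```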

p5pRT commented Nov 20, 1999

From @ysth

In article <19991119124913.F20768@​monk.mps.ohio-state.edu>,
Ilya Zakharevich <ilya@​math.ohio-state.edu> wrote​:

On Fri, Nov 19, 1999 at 11​:17​:03AM -0500, Philip Newton wrote​:

And what would "\c\c\c\c\\" do, in your opinion? ;-)

My guess is ^\ + 'c' + ^\ + 'c' + '\\', i.e. ASCII 28, 99, 28, 99, 92. And
that's what Perl does.

So you want it to be parsed left-to-right. But you want \c\\ to be
parsed right-to-left. Choose one.

He stated he expected either "\c\\" or "\c\" to produce chr(28) and
found that neither of them did. He then said that in his opinion
"\c\\" should do it. I infer that what he *wants* is for either of
them to do it.

[D:\susv2]perl -wlne "eval $_; print $@ if $@"
print length "\c\\"
2
print length "\c\"
Can't find string terminator '"' anywhere before EOF at (eval 2) line 1, <> chunk 2.
exit

I lean toward considering the second of these a bug.

p5pRT commented Nov 20, 1999

From [Unknown Contact. See original ticket]

Yitzchak Scott-Thoennes writes​:

[D:\susv2]perl -wlne "eval $_; print $@ if $@"
print length "\c\\"
2
print length "\c\"
Can't find string terminator '"' anywhere before EOF at (eval 2) line 1, <> chunk 2.
exit

I lean toward considering the second of these a bug.

Then read the docs.

Ilya

p5pRT commented Nov 20, 1999

From @ysth

In article <199911210533.AAA01763@​monk.mps.ohio-state.edu>,
Ilya Zakharevich <ilya@​math.ohio-state.edu> wrote​:

Yitzchak Scott-Thoennes writes​:

[D:\susv2]perl -wlne "eval $_; print $@ if $@"
print length "\c\\"
2
print length "\c\"
Can't find string terminator '"' anywhere before EOF at (eval 2) line 1, <> chunk 2.
exit

I lean toward considering the second of these a bug.

Then read the docs.

Thank you, I already did when you gave the doc reference before.

I realise that this behavior is perfectly in accord with perlop/"Gory
details"/"Finding the end". That doesn't mean it's not a bug. It
just means that if it is a code bug it is also a doc bug.

IMO, it also contradicts the beginning of perlop/"Gory details"​:

  When presented with something which may have several
  different interpretations, Perl uses the principle DWIM
  (expanded to Do What I Mean - not what I wrote) to pick up
  the most probable interpretation of the source.

A​: "\c\"
B​: "\c\"more string here"

The question is, is the current incomprehensible behavior of string B
more important to maintain than following DWIM for "string" A?

p5pRT commented Nov 20, 1999

From [Unknown Contact. See original ticket]

Yitzchak Scott-Thoennes writes​:

I realise that this behavior is perfectly in accord with perlop/"Gory
details"/"Finding the end". That doesn't mean it's not a bug. It
just means that if it is a code bug it is also a doc bug.

IMO, it also contradicts the beginning of perlop/"Gory details"​:

  When presented with something which may have several
  different interpretations, Perl uses the principle DWIM
  (expanded to Do What I Mean - not what I wrote) to pick up
  the most probable interpretation of the source.

This is taken out of context.

A​: "\c\"
B​: "\c\"more string here"

The question is, is the current incomprehensible behavior of string B
more important to maintain than following DWIM for "string" A?

Perl uses a very simple rule to find an end of a quoted construct.
All one needs to do is to learn it.

Hope this helps,
Ilya

p5pRT commented Nov 21, 1999

From @ysth

Cc'd to​: ilya@​math.ohio-state.edu

In article <199911210756.CAA02439@​monk.mps.ohio-state.edu>,
Ilya Zakharevich <ilya@​math.ohio-state.edu> wrote​:

Yitzchak Scott-Thoennes writes​:

I realise that this behavior is perfectly in accord with perlop/"Gory
details"/"Finding the end". That doesn't mean it's not a bug. It
just means that if it is a code bug it is also a doc bug.

IMO, it also contradicts the beginning of perlop/"Gory details"​:

  When presented with something which may have several
  different interpretations, Perl uses the principle DWIM
  (expanded to Do What I Mean - not what I wrote) to pick up
  the most probable interpretation of the source.

This is taken out of context.

I disagree. This reads to me like a "mission statement" for Perl
parsing. If one of the many detailed parsing rules that follow
violates this statement, then that is an indication that a modification
may be necessary.

A​: "\c\"
B​: "\c\"more string here"

The question is, is the current incomprehensible behavior of string B
more important to maintain than following DWIM for "string" A?

Perl uses a very simple rule to find an end of a quoted construct.
All one needs to do is to learn it.

Hope this helps,
Ilya

Thanks, I thought I already said I understand the docs and behavior
for finding the end of a quoted construct are in agreement.

But application of these rules of finding the end, left-to-right
parse, etc. leaves the \c\ construct without a clear meaning.

One of the following should be true in all cases​:
1. \c\ produces a fatal exception
2. the second \ is an escape character and the following character is
  the 'argument' to \c
3. the \ is the 'argument' to \c and the following character is unaffected
4. the behavior is undefined and a warning is given

Currently none of these is true.
The behavior is pretty close to number 3 except for a couple of odd cases​:

[D:\]perl -wlne "print map {length,':',map {;' ',ord} split //} eval"
"\c\\"
2: 28 92
"\c\""
1: 98
exit

In the first case, the string seems to contain two characters​: \c\ and \.
The anomaly is that the trailing backslash doesn't escape the ".

In the second case, the string should IMO end with the 2nd ". Instead,
the 2nd \ of \c\ escapes it, contrary to \c\ behavior elsewhere. Thus
the string becomes \c" -> chr((ord('"')-64) & 127) -> b.
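
The arithmetic in that last step can be spelled out. This is a minimal sketch, assuming an ASCII platform and assuming that \c maps a character by uppercasing it and flipping bit 6 (XOR with 64). The names ctrl_of and ctrl_of_subtract are hypothetical helpers, not part of Perl. For 7-bit characters the subtract-and-mask form used above gives the same result as the XOR form.

```perl
use strict;
use warnings;

# Hypothetical helper: the character that \c followed by $char stands for,
# assuming the ASCII rule "uppercase, then XOR with 64".
sub ctrl_of {
    my ($char) = @_;
    return chr(ord(uc $char) ^ 64);
}

# The same mapping written as (ord - 64) & 127, as in the message above.
sub ctrl_of_subtract {
    my ($char) = @_;
    return chr((ord(uc $char) - 64) & 127);
}

# Characters that come up in this thread: quote, backslash, dollar, c.
for my $char ('"', '\\', '$', 'c') {
    printf "\\c%s -> %d\n", $char, ord(ctrl_of($char));
}
```

Under these assumptions the quote maps to 98 ('b'), the backslash to 28, the dollar sign to 100 ('d') and c to 3, which matches the values reported in the thread.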

p5pRT commented Nov 21, 1999

From [Unknown Contact. See original ticket]

On Sun, Nov 21, 1999 at 10​:04​:34AM -0800, Yitzchak Scott-Thoennes wrote​:

But application of these rules of finding the end, left-to-right
parse, etc. leaves the \c\ construct without a clear meaning.

On the contrary, they *give* \c\ a clear meaning​: see qq[\c\c].

Ilya

p5pRT commented Nov 21, 1999

From [Unknown Contact. See original ticket]

Yitzchak Scott-Thoennes <sthoenna@​efn.org> writes​:

A​: "\c\"
B​: "\c\"more string here"

The question is, is the current incomprehensible behavior of string B
more important to maintain than following DWIM for "string" A?

Yes.
print "this is \"important\"!\n";

--
Nick Ing-Simmons

p5pRT commented Nov 23, 1999

From [Unknown Contact. See original ticket]

On Fri, 19 Nov 1999, Ilya Zakharevich wrote​:

On Fri, Nov 19, 1999 at 11​:17​:03AM -0500, Philip Newton wrote​:

And what would "\c\c\c\c\\" do, in your opinion? ;-)

My guess is ^\ + 'c' + ^\ + 'c' + '\\', i.e. ASCII 28, 99, 28, 99, 92. And
that's what Perl does.

So you want it to be parsed left-to-right. But you want \c\\ to be
parsed right-to-left. Choose one.

Not quite. I want '\\' to be translated to \ in a previous pass, before
the \c mechanism sees it. After the '\\' -> \ pass, I want left-to-right.

This still doesn't explain to me, though, how to produce a string
consisting solely of the character ^\.

TIMTOWTDI. chr(ord('\\')-62) (or is it 32?) is one of them.

Well, and "\x1c" and "\034" work, of course. I was just disappointed that
ctrl-\ seems to be the only character that's difficult to produce in \c
notation.

Cheers,
Philip

p5pRT commented Nov 23, 1999

From [Unknown Contact. See original ticket]

On Tue, Nov 23, 1999 at 05​:00​:24AM -0500, Philip Newton wrote​:

So you want it to be parsed left-to-right. But you want \c\\ to be
parsed right-to-left. Choose one.

Not quite. I want '\\' to be translated to \ in a previous pass, before
the \c mechanism sees it. After the '\\' -> \ pass, I want left-to-right.

Then you are advising that "\\nc" and "\cc" should be parsed the same, right?

Ilya

p5pRT commented Nov 23, 1999

From @TimToady

Philip Newton writes​:
: Not quite. I want '\\' to be translated to \ in a previous pass, before
: the \c mechanism sees it. After the '\\' -> \ pass, I want left-to-right.

It would have to be done in the same pass, by pretending that \c is a
funny kind of \. We go to great lengths to avoid doing multiple passes
in Perl, and when we do do multiple passes, we go to great lengths to
hide that fact. For instance, we pretend that regular expressions
are interpolated and interpreted just like double-quoted strings, but
in fact, the lexer must treat them entirely differently to preserve
that illusion, because the regular expression parser does a separate
pass after interpolation. Not only must the lexer pass backslashed
sequences through to the regular expression parser, but it has to decide
which dollar signs indicate something to be interpolated immediately​:

  /$foo/

and which have to be passed through to the regular expression engine​:

  /foo$/
  /(foo$|bar$)/
  /(?{ $foo += 1 })/

Actually, that last one doesn't need to pass $foo--it could conceivably
just pass a pointer to some precompiled code, but I don't think it
does. If I recall, it's more like an eval.

But anyway, we don't cavalierly add multiple passes to Perl. Multiple
passes tend to make things easier for the implementer, but harder for
the user. Perl's loyalties lie with the user.

Larry
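
The distinction between a dollar sign that is interpolated and one that is passed through to the regular expression engine can be seen in a small test. This is a minimal sketch with made-up variable names and sample strings; it only exercises the first three pattern shapes listed above.

```perl
use strict;
use warnings;

my $foo = 'bar';

# $foo is interpolated before the regex engine runs: the pattern is /bar/.
my $interpolated = 'rebar' =~ /$foo/ ? 1 : 0;

# A trailing $ reaches the regex engine as the end-of-string anchor.
my $anchored   = 'xfoo' =~ /foo$/ ? 1 : 0;
my $not_at_end = 'food' =~ /foo$/ ? 1 : 0;

# Inside a group, $| and $) are still treated as anchors, not variables.
my $alternation = 'rebar' =~ /(foo$|bar$)/ ? 1 : 0;

print "$interpolated $anchored $not_at_end $alternation\n";   # prints: 1 1 0 1
```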

p5pRT commented Nov 23, 1999

From @ysth

Cc'd to​: larry@​wall.org

In article <199911231731.JAA16685@​kiev.wall.org>,
Larry Wall <larry@​wall.org> wrote​:

It would have to be done in the same pass, by pretending that \c is a
funny kind of \.

At last, someone with a glimmer of sense! \c *already* is a funny
kind of \. Except with respect to whichever character indicates the
end of the quoted string. This inconsistency is a *bug*.

For instance​:

$foo = 'bar'; print "\c$foo"; yields 'dfoo', not an error
$foo = 'bar'; print "\c\$foo"; yields chr(28).'bar', not 'dfoo'

From this, "\c\" should yield chr(28).
And "\c"" should yield 'b', just as qq'\c"' does.

Alternatively, \c\ should always apply the \c 'operator' to the
following character, so that "\c\"" works like qq'\c"' (as it
currently does) but it takes \c\\ to get a chr(28).

p5pRT commented Nov 23, 1999

From @ysth

Cc'd to​: nick@​ing-simmons.net

In article <199911212139.VAA03794@​bactrian.ni-s.u-net.com>,
Nick Ing-Simmons <nick@​ing-simmons.net> wrote​:

Yitzchak Scott-Thoennes <sthoenna@​efn.org> writes​:

A​: "\c\"
B​: "\c\"more string here"

The question is, is the current incomprehensible behavior of string B
more important to maintain than following DWIM for "string" A?

Yes.
print "this is \"important\"!\n";

$what='this';
print "\c$what do I print?";
print "\c\$what do I print?";

And why? When you understand what I am talking about, feel free to comment.

p5pRT commented Nov 23, 1999

From [Unknown Contact. See original ticket]

On Tue, 23 Nov 1999, Ilya Zakharevich wrote​:

On Tue, Nov 23, 1999 at 05​:00​:24AM -0500, Philip Newton wrote​:

So you want it to be parsed left-to-right. But you want \c\\ to be
parsed right-to-left. Choose one.

Not quite. I want '\\' to be translated to \ in a previous pass, before
the \c mechanism sees it. After the '\\' -> \ pass, I want left-to-right.

Then you are advising that "\\nc" and "\cc" should be parsed the same, right?

I don't get it. The first I would parse as three characters​: backslash
(the two backslashes become one), n, c. The second as one character​:
ctrl-C ("\x03").

Cheers,
Philip
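
The reading described in that message can be checked directly. This is a minimal sketch, assuming an ASCII platform; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $escaped_backslash = "\\nc";   # backslash, n, c
my $control_c         = "\cc";    # ctrl-C

# Print the length and code points of each string.
printf "%d chars: %s\n", length($escaped_backslash),
    join(' ', map { ord($_) } split //, $escaped_backslash);
printf "%d char: %d\n", length($control_c), ord($control_c);
```

On an ASCII platform this should report three characters (92 110 99) for the first string and a single character with code point 3 for the second.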

p5pRT commented Nov 23, 1999

From [Unknown Contact. See original ticket]

On Tue, 23 Nov 1999, Larry Wall wrote​:

Philip Newton writes​:
: Not quite. I want '\\' to be translated to \ in a previous pass, before
: the \c mechanism sees it. After the '\\' -> \ pass, I want left-to-right.

It would have to be done in the same pass, by pretending that \c is a
funny kind of \.

Is this then where the current problem comes from? "\c\\" gets parsed
left-to-right, and if \c is a funny escape sign, then we have the "token"
\c + backslash, followed by another backslash. \c + backslash is converted
to ^\, and the final backslash stays chr(92). The parser (lexer?) never
sees "\\" to convert to one backslash because the first backslash is
already eaten by the \c "escape".

Maybe some magic, then, which makes "\c\\" into one token, which gets
eaten whole by the "maximal munch" strategy?

Cheers,
Philip
--
Philip Newton <newton@​newton.digitalspace.net>

p5pRT commented Nov 24, 1999

From [Unknown Contact. See original ticket]

On Wed, Nov 24, 1999 at 03​:10​:58AM -0500, Philip Newton wrote​:

So you want it to be parsed left-to-right. But you want \c\\ to be
parsed right-to-left. Choose one.

Not quite. I want '\\' to be translated to \ in a previous pass, before
the \c mechanism sees it. After the '\\' -> \ pass, I want left-to-right.

Then you are advising that "\\nc" and "\cc" should be parsed the same, right?

It should have been, of course, \\cc vs \cc

I don't get it. The first I would parse as three characters​: backslash
(the two backslashes become one), n, c. The second as one character​:
ctrl-C ("\x03").

Nope. You want two backslashes converted to one *before* \c
interpolation is done. This \\cc will behave the same as \cc.

Ilya

p5pRT commented Nov 25, 1999

From [Unknown Contact. See original ticket]

On Wed, 24 Nov 1999, Ilya Zakharevich wrote​:

On Wed, Nov 24, 1999 at 03​:10​:58AM -0500, Philip Newton wrote​:

So you want it to be parsed left-to-right. But you want \c\\ to be
parsed right-to-left. Choose one.

Not quite. I want '\\' to be translated to \ in a previous pass, before
the \c mechanism sees it. After the '\\' -> \ pass, I want left-to-right.

Then you are advising that "\\nc" and "\cc" should be parsed the same, right?

It should have been, of course, \\cc vs \cc

OK. I see what you mean now. No, I suppose I don't want that. I suppose
what I want was expressed, more or less, by Yitzchak Scott-Thoennes
elsewhere in this thread. Something along the lines of having \-conversion
(\n \f etc.) done in parallel with \c-conversion, and having \c\ do
roughly the same thing as \c if the following character is either a
backslash or the closing delimiter, e.g. "\c\\", "\c\"", qq^\c\^^.

Cheers,
Philip
--
Philip Newton <newton@​newton.digitalspace.net>

p5pRT commented Nov 25, 1999

From [Unknown Contact. See original ticket]

Philip Newton (lists.p5p)​:

OK. I see what you mean now. No, I suppose I don't want that. I suppose
what I want was expressed, more or less, by Yitzchak Scott-Thoennes
elsewhere in this thread. Something along the lines of having \-conversion
(\n \f etc.) done in parallel with \c-conversion, and having \c\ do
roughly the same thing as \c if the following character is either a
backslash or the closing delimiter, e.g. "\c\\", "\c\"", qq^\c\^^.

This now makes sense, and that's roughly how I had thunk it should go.
At least this way is relatively easy to implement​: special-case
\c\[something] then fall back to parsing from \c if there wasn't a
following backslash - that way we could get it all in one left-right
pass, which seems the most intuitive, even if it isn't.

It strikes me as being the solution most mentally compatible with the
rest of Perl's escaping/metacharactering. Feel free to violently
disagree.

To save someone the bother of bringing up the degenerate case, what the
heck should \c\cx do?

--
Q​: How many IBM CPU's does it take to execute a job?
A​: Four; three to hold it down, and one to rip its head off.

p5pRT commented Nov 25, 1999

From [Unknown Contact. See original ticket]

On Thu, Nov 25, 1999 at 04​:55​:59AM -0500, Philip Newton wrote​:

OK. I see what you mean now. No, I suppose I don't want that. I suppose
what I want was expressed, more or less, by Yitzchak Scott-Thoennes
elsewhere in this thread. Something along the lines of having \-conversion
(\n \f etc.) done in parallel with \c-conversion, and having \c\ do
roughly the same thing as \c if the following character is either a
backslash or the closing delimiter, e.g. "\c\\", "\c\"", qq^\c\^^.

you are missing the *most important* point again​: closing " is found first.

Ilya

p5pRT commented Nov 25, 1999

From [Unknown Contact. See original ticket]

Simon Cozens writes​:

This now makes sense, and that's roughly how I had thunk it should go.
At least this way is relatively easy to implement​: special-case
\c\[something] then fall back to parsing from \c if there wasn't a
following backslash - that way we could get it all in one left-right
pass, which seems the most intuitive, even if it isn't.

As Larry explains, it breaks many other expectations one has about
quoting. Currently

  $a = '\c\\';
  /$a/

does what one expects. Your change will break it.

Ilya

p5pRT commented Nov 26, 1999

From [Unknown Contact. See original ticket]

On Fri, 26 Nov 1999, Ilya Zakharevich wrote​:

On Thu, Nov 25, 1999 at 04​:55​:59AM -0500, Philip Newton wrote​:

Something along the lines of having \-conversion
(\n \f etc.) done in parallel with \c-conversion, and having \c\ do
roughly the same thing as \c if the following character is either a
backslash or the closing delimiter, e.g. "\c\\", "\c\"", qq^\c\^^.

you are missing the *most important* point again​: closing " is found first.

Yes, but according to perlop/Gory Details, while searching for the closing
" of "" (or closing ^ of qq^^, etc.), the combinations \\ and \" (or \^,
etc.) are skipped. Hence, according to my understanding, "\c\\" (after the
first "pass", finding the end) turns into >>\c\\ inside ""<<; "\c\"" into
>>\c\" inside ""<<; and qq^\c\^^ into >>\c\^ inside qq^^<<.

After this, backslash+delimiter are turned into plain delimiter (while
backslash+backslash is kept), and knowledge of the original delimiter is
lost, so the three strings become \c\\, \c" and \c^, respectively.

Hmmm, I think I begin to understand. No change is needed for
control-(closing delimiter), since the interpolation step doesn't see
backslashes there any more. However, I still believe that handling of \c\\
should be changed to produce ctrl-\ instead of ctrl-\, \.

Cheers,
Philip
--
Philip Newton <newton@​newton.digitalspace.net>
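
The first "pass" described above, finding the end of the string, can be modelled in a few lines. This is a toy sketch and not Perl's actual lexer: the function name find_body is hypothetical, and it assumes the simplest form of the rule, where a backslash and the character after it are skipped as a unit and the first unescaped delimiter ends the string. The input is the source text that follows the opening delimiter.

```perl
use strict;
use warnings;

# Toy model only: scan left to right for the closing delimiter, treating a
# backslash plus the character after it as a unit that cannot end the string.
# Returns the raw body, or nothing if no terminator is found.
sub find_body {
    my ($src, $delim) = @_;
    my $i = 0;
    while ($i < length $src) {
        my $ch = substr($src, $i, 1);
        if ($ch eq '\\') { $i += 2; next; }           # skip the escaped pair
        return substr($src, 0, $i) if $ch eq $delim;  # unescaped delimiter
        $i++;
    }
    return;
}

my $bs = chr(92);   # one backslash, spelled out to avoid quoting confusion
my %cases = (
    'one trailing backslash'   => $bs . 'c' . $bs . '"',
    'two trailing backslashes' => $bs . 'c' . $bs . $bs . '"',
    'escaped quote'            => $bs . 'c' . $bs . '"' . '"',
);
for my $name (sort keys %cases) {
    my $body = find_body($cases{$name}, '"');
    print "$name: ", defined $body ? "body is $body" : "no terminator found", "\n";
}
```

In this model the source "\c\" never terminates, while "\c\\" and "\c\"" yield the raw bodies \c\\ and \c\" respectively, in line with the description above. What the later unescaping step does with those bodies is outside the scope of the sketch.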

p5pRT commented Nov 26, 1999

From [Unknown Contact. See original ticket]

On Fri, Nov 26, 1999 at 07​:26​:36AM -0500, Philip Newton wrote​:

you are missing the *most important* point again​: closing " is found first.

Yes, but according to perlop/Gory Details, while searching for the closing
" of "" (or closing ^ of qq^^, etc.), the combinations \\ and \" (or \^,
etc.) are skipped. Hence, according to my understanding, "\c\\" (after the
first "pass", finding the end) turns into >>\c\\ inside ""<<; "\c\"" into
>>\c\" inside ""<<; and qq^\c\^^ into >>\c\^ inside qq^^<<.

After this, backslash+delimiter are turned into plain delimiter (while
backslash+backslash is kept), and knowledge of the original delimiter is
lost, so the three strings become \c\\, \c" and \c^, respectively.

This is how I would expect things work.

Hmmm, I think I begin to understand. No change is needed for
control-(closing delimiter), since the interpolation step doesn't see
backslashes there any more. However, I still believe that handling of \c\\
should be changed to produce ctrl-\ instead of ctrl-\, \.

This will break compatibility with

  $x = 'whatever';
  /$x/;

It is not a win-win situation. Being such, I do not feel any urge to
have it changed.

Ilya

p5pRT commented Nov 28, 1999

From [Unknown Contact. See original ticket]

On Sat, 27 Nov 1999, Ilya Zakharevich wrote​:

On Fri, Nov 26, 1999 at 07​:26​:36AM -0500, Philip Newton wrote​:

However, I still believe that handling of \c\\
should be changed to produce ctrl-\ instead of ctrl-\, \.

This will break compatibility with

$x = 'whatever';
/$x/;

I do not understand. Please provide a concrete example.

Cheers,
Philip
--
Philip Newton <newton@​newton.digitalspace.net>

@p5pRT p5pRT closed this as completed Nov 28, 2003
p5pRT commented Aug 17, 2007

From guido@imperia.net

Hi!

My perl version is 5.8.8. In the following I use vi notation for control
characters in strings, ie. CTRL-J is ^J.

When I wanted to include CTRL-\ in a double quoted string with \c
escapes, I first tried the obvious solution for me​:

  $ print "a\c\\nb";
  a^J
  b

It turns out that this yields ``a^\^Jb''. In other words​: \c consumes
exactly the next byte/character even if it is a backslash. This leads
to the interesting question how a string ending in CTRL-\ can be written​:

  print "...\c\";

This will not work, because the backslash escapes the trailing quote.

  print "...\c\\";

This compiles but produces the string ``...^\\'' instead of ``...^\'',
i.e. there is a gratuitous trailing backslash.

Of course, I can use other means like octal or hexadecimal escapes. But
I think that this shows an inconsistency. Take the double quoted string
"...\c\\". The tokenizer takes the last backslash as is, escaped by the
one before as the escaping character. The unescaper takes the last but
one backslash as is and silently tolerates the lone trailing backslash.
The double quoted string looks different from inside than from outside.

IMHO the correct solution would be to represent CTRL-\ as "\c\\". At the
end of the double quoted string and everywhere else. Breaking
compatibility here is probably acceptable. The case is really esoteric.

Cheers,
Guido
--
Imperia AG, Development
Leyboldstr. 10 - D-50354 Hürth - http​://www.imperia.net/
