unterminateable heredocs caused by newline in delimiter #12267

Open
p5pRT opened this issue Jul 12, 2012 · 10 comments

p5pRT commented Jul 12, 2012

Migrated from rt.perl.org#114102 (status was 'open')

Searchable as RT114102$

p5pRT commented Jul 12, 2012

From @mauke

Created by @mauke

#!perl
$_ = <<"X
";
hi
X

__END__

Can't find string terminator "X
" anywhere before EOF at foo.pl line 2.

The error message is bogus because there is in fact a "X\n" in the code. The
real problem is that a heredoc like this cannot be terminated in any way.

Either the heredoc parser needs to be fixed or this case should be diagnosed as
soon as the tokenizer encounters a newline in the delimiter string.

Perl Info

Flags:
    category=core
    severity=low

This perlbug was built using Perl 5.12.1 - Thu Jun  3 20:09:15 CEST 2010
It is being executed now by  Perl 5.16.0 - Mon May 21 12:24:16 CEST 2012.

Site configuration information for perl 5.16.0:

Configured by mauke at Mon May 21 12:24:16 CEST 2012.

Summary of my perl5 (revision 5 version 16 subversion 0) configuration:
   
  Platform:
    osname=linux, osvers=2.6.38-gentoo-r6, archname=i686-linux
    uname='linux nora 2.6.38-gentoo-r6 #1 preempt sat aug 6 03:05:34 cest 2011 i686 amd athlon(tm) 64 processor 3200+ authenticamd gnulinux '
    config_args='-Dcc=cgcc -Dprefix=/home/mauke/usr/local -Dman1dir=none -Dman3dir=none -Dinc_version_list=none -Doptimize=-O2 -flto'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cgcc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -flto',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.6.3', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cgcc', ldflags ='-fstack-protector -L/usr/local/lib -O2 -flto'
    libpth=/usr/local/lib /lib/../lib /usr/lib/../lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.14.1.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.14.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -flto -L/usr/local/lib -fstack-protector'

Locally applied patches:
    SAVEARGV0 - disable magic open in <ARGV>


@INC for perl 5.16.0:
    /home/mauke/usr/local/lib/perl5/site_perl/5.16.0/i686-linux
    /home/mauke/usr/local/lib/perl5/site_perl/5.16.0
    /home/mauke/usr/local/lib/perl5/5.16.0/i686-linux
    /home/mauke/usr/local/lib/perl5/5.16.0
    .


Environment for perl 5.16.0:
    HOME=/home/mauke
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=POSIX
    LD_LIBRARY_PATH=/home/mauke/usr/local/lib
    LOGDIR (unset)
    PATH=/home/mauke/usr/perlbrew/bin:/home/mauke/usr/local/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/4.4.5:/opt/sun-jdk-1.4.2.13/bin:/opt/sun-jdk-1.4.2.13/jre/bin:/opt/sun-jdk-1.4.2.13/jre/javaws:/opt/dmd/bin:/usr/games/bin
    PERLBREW_HOME=/home/mauke/.perlbrew
    PERLBREW_PATH=/home/mauke/usr/perlbrew/bin
    PERLBREW_ROOT=/home/mauke/usr/perlbrew
    PERLBREW_VERSION=0.27
    PERL_BADLANG (unset)
    PERL_UNICODE=SAL
    SHELL=/bin/bash

p5pRT commented Jul 13, 2012

From @jkeenan

On Thu Jul 12 03:06:39 2012, l.mai@web.de wrote:

> This is a bug report for perl from l.mai@web.de,
> generated with the help of perlbug 1.39 running under perl 5.16.0.
>
> #!perl
> $_ = <<"X
> ";
> hi
> X
>
> __END__
>
> Can't find string terminator "X
> " anywhere before EOF at foo.pl line 2.
>
> The error message is bogus because there is in fact a "X\n" in the code. The
> real problem is that a heredoc like this cannot be terminated in any way.

Which we have already documented. At the end of the section on
here-documents in 'perlop', I read this:

#####
Finally, quoted strings cannot span multiple lines. The general rule is
that the identifier must be a string literal. Stick with that, and you
should be safe.
#####

Moreover, with respect to this particular error message, 'perldiag' says:

#####
If you're getting this error from a here-document, you may have included
unseen whitespace before or after your closing tag or there may not be a
linebreak after it.
#####

Why does that not suffice?

Thank you very much.
Jim Keenan

p5pRT commented Jul 13, 2012

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Jul 13, 2012

From @doy

On Fri, Jul 13, 2012 at 03:18:52PM -0700, James E Keenan via RT wrote:

> On Thu Jul 12 03:06:39 2012, l.mai@web.de wrote:
>
> > This is a bug report for perl from l.mai@web.de,
> > generated with the help of perlbug 1.39 running under perl 5.16.0.
> >
> > #!perl
> > $_ = <<"X
> > ";
> > hi
> > X
> >
> > __END__
> >
> > Can't find string terminator "X
> > " anywhere before EOF at foo.pl line 2.
> >
> > The error message is bogus because there is in fact a "X\n" in the code. The
> > real problem is that a heredoc like this cannot be terminated in any way.
>
> Which we have already documented. At the end of the section on
> here-documents in 'perlop', I read this:
>
> #####
> Finally, quoted strings cannot span multiple lines. The general rule is
> that the identifier must be a string literal. Stick with that, and you
> should be safe.
> #####

If this is the case, then trying to use such a string should be a
compilation error.

-doy

p5pRT commented Jul 14, 2012

From @mauke

On 2012-07-13 James E Keenan via RT wrote:

> On Thu Jul 12 03:06:39 2012, l.mai@web.de wrote:
>
> > This is a bug report for perl from l.mai@web.de,
> > generated with the help of perlbug 1.39 running under perl 5.16.0.
> >
> > #!perl
> > $_ = <<"X
> > ";
> > hi
> > X
> >
> > __END__
> >
> > Can't find string terminator "X
> > " anywhere before EOF at foo.pl line 2.
> >
> > The error message is bogus because there is in fact a "X\n" in the code. The
> > real problem is that a heredoc like this cannot be terminated in any way.
>
> Which we have already documented. At the end of the section on
> here-documents in 'perlop', I read this:
>
> #####
> Finally, quoted strings cannot span multiple lines. The general rule
> is that the identifier must be a string literal. Stick with that,
> and you should be safe.
> #####

That description ("... must be a string literal") is nonsensical. "X
" *is* a string literal. And if they "cannot span multiple lines", then
perl should check that and complain loudly ("Invalid character in
heredoc terminator: \n") instead of silently accepting it (which
happens to cause other errors later on in this case, but come on).

> Moreover, with respect to this particular error message, 'perldiag'
> says:
>
> #####
> If you're getting this error from a here-document, you may have
> included unseen whitespace before or after your closing tag or there
> may not be a linebreak after it.
> #####
>
> Why does that not suffice?

Because that has nothing to do with the problem. There is a closing tag
in my code (X <newline> after "hi"), there are no spaces before or
after it, and there is another linebreak after it. Doesn't apply.

Lukas

p5pRT commented Jul 16, 2012

From @nwc10

On Sat, Jul 14, 2012 at 12:50:38PM +0200, Lukas Mai wrote:

> That description ("... must be a string literal") is nonsensical. "X
> " *is* a string literal. And if they "cannot span multiple lines", then
> perl should check that and complain loudly ("Invalid character in
> heredoc terminator: \n") instead of silently accepting it (which
> happens to cause other errors later on in this case, but come on).

Yes, if it can't ever work, why is it even accepted?

Strikes me that it's buggy to accept a terminator which contains a newline
(or anything else which the parser *cannot* later deal with), and that
really the only sane thing to do is to report it as an error early.

I also *think* that changing this can't actually change the behaviour of
any existing program, because (if I understand it correctly), all currently
fail to parse, due to the "missing" heredoc terminator.

All it does is change the error reported, to one that is meaningful.

Nicholas Clark
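
A rough sketch of the early diagnostic suggested in this comment, written in Perl purely for illustration. The real check would have to live in the tokenizer (toke.c, in C). The function name `check_heredoc_delimiter` is hypothetical, and the message text is borrowed from Lukas's suggestion earlier in the thread, not from any released perl.

```perl
use strict;
use warnings;

# Hypothetical helper: NOT part of perl. It mimics the proposed rule that a
# quoted here-doc delimiter containing a line break is rejected up front,
# instead of failing later with "Can't find string terminator".
sub check_heredoc_delimiter {
    my ($delimiter) = @_;

    # \n and \r are the two characters reported in this ticket.
    if ($delimiter =~ /([\n\r])/) {
        my $shown = $1 eq "\n" ? '\n' : '\r';
        return "Invalid character in heredoc terminator: $shown";
    }
    return;    # undef in scalar context: the delimiter is acceptable
}

for my $delimiter ("X", "X\n", "\r") {
    my $error = check_heredoc_delimiter($delimiter);

    # Escape line breaks so the report itself stays on one line.
    (my $printable = $delimiter) =~ s/\n/\\n/g;
    $printable =~ s/\r/\\r/g;

    printf "%-4s %s\n", $printable, defined $error ? $error : 'ok';
}
```

Whether such a check should apply to string eval as well is exactly the open question raised in the follow-ups, so the sketch takes no position on it.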

p5pRT commented Aug 2, 2012

From @cpansprout

On Mon Jul 16 03:04:46 2012, nicholas wrote:

> Yes, if it can't ever work, why is it even accepted?
>
> Strikes me that it's buggy to accept a terminator which contains a newline
> (or anything else which the parser *cannot* later deal with), and that
> really the only sane thing to do is to report it as an error early.
>
> I also *think* that changing this can't actually change the behaviour of
> any existing program, because (if I understand it correctly), all currently
> fail to parse, due to the "missing" heredoc terminator.
>
> All it does is change the error reported, to one that is meaningful.

It works in string eval. See also #78348 and #114040.

--

Father Chrysostomos
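
For anyone who wants to compare the two code paths on their own perl, here is a small probe. It is only a sketch: the helper names `visible`, `try_eval` and `try_file` are made up for this example, and no expected output is given because the result depends on the perl version. It feeds the test case from the report once through string eval (whole source in one buffer) and once through `do FILE` (source read from a file).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The test case from the original report: a here-doc whose quoted
# delimiter is "X\n", followed by a body and a line containing just X.
my $source = join "\n", '<<"X', '";', 'hi', 'X', '', '';

# Make line breaks visible so they cannot garble the terminal output.
sub visible {
    my ($text) = @_;
    $text =~ s/\n/\\n/g;
    $text =~ s/\r/\\r/g;
    return $text;
}

# Compile and run the code as a string eval.
sub try_eval {
    my ($code) = @_;
    local $@;
    my $value = eval $code;
    return defined $value ? 'ok: ' . visible($value) : 'error: ' . visible($@);
}

# Compile and run the same code from a temporary file.
sub try_file {
    my ($code) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} $code;
    close $fh or die "close failed: $!";
    local $@;
    my $value = do $path;
    return defined $value ? 'ok: ' . visible($value) : 'error: ' . visible($@);
}

print "perl $]\n";
print "eval: ", try_eval($source), "\n";
print "file: ", try_file($source), "\n";
```

If the two lines differ, that is the discrepancy being discussed here. If they agree, the perl in question handles both paths the same way for this input.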

p5pRT commented Aug 2, 2012

From @kentfredric

On 2 August 2012 19:20, Father Chrysostomos via RT
<perlbug-followup@perl.org> wrote:

> It works in string eval. See also #78348 and #114040.

Side note: I saw this and decided to goof around a bit with different
values of X in

$foo = <<"X";

Seems a literal \r also triggers this.

Sample script base64 encoded to preserve \r

IyEvdXNyL2Jpbi9wZXJsIAoKdXNlIDUuMTYuMDsKdXNlIHN0cmljdDsKdXNlIHdhcm5pbmdzOwoK
bXkgJGNvbnRlbnQ7CiRjb250ZW50ID08PCINIjsKQmFkIApOZXdzCmZvciB5b3UKDQoKc2F5ICRj
b250ZW50OwoK

This also has the amusing side effect of displaying only

" anywhere before EOF at /tmp/test.pl line 8

On the terminal due to the \r being output unescaped.

--
Kent

perl -e "print substr( \"edrgmaM SPA NOcomil.ic\\@​tfrken\", \$_ * 3,
3 ) for ( 9,8,0,7,1,6,5,4,3,2 );"

http​://kent-fredric.fox.geek.nz
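
The base64 sample above can be unpacked without letting the raw carriage returns reach the terminal. This is a minimal sketch using the core module MIME::Base64; the variable names are arbitrary.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# The three base64 lines exactly as posted in the comment above.
my $encoded = join '',
    'IyEvdXNyL2Jpbi9wZXJsIAoKdXNlIDUuMTYuMDsKdXNlIHN0cmljdDsKdXNlIHdhcm5pbmdzOwoK',
    'bXkgJGNvbnRlbnQ7CiRjb250ZW50ID08PCINIjsKQmFkIApOZXdzCmZvciB5b3UKDQoKc2F5ICRj',
    'b250ZW50OwoK';

my $script = decode_base64($encoded);

# Replace each carriage return with a visible marker before printing, so the
# terminal does not jump back to column 0 and overwrite the line.
(my $printable = $script) =~ s/\r/\\r/g;
print $printable;
```

The same escaping step is what the error message would need in order to avoid the truncated display described above.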

p5pRT commented Aug 2, 2012

From @nwc10

On Thu, Aug 02, 2012 at 12:20:19AM -0700, Father Chrysostomos via RT wrote:

> On Mon Jul 16 03:04:46 2012, nicholas wrote:
>
> > Yes, if it can't ever work, why is it even accepted?
> >
> > Strikes me that it's buggy to accept a terminator which contains a newline
> > (or anything else which the parser *cannot* later deal with), and that
> > really the only sane thing to do is to report it as an error early.
> >
> > I also *think* that changing this can't actually change the behaviour of
> > any existing program, because (if I understand it correctly), all currently
> > fail to parse, due to the "missing" heredoc terminator.
> >
> > All it does is change the error reported, to one that is meaningful.
>
> It works in string eval. See also #78348 and #114040.

OK, so *right now* if we change the code to reject a terminator containing
a newline for not-a-string-eval we could at least give a more helpful
error message of "not yet supported" ?

And your comment in #114040

  It seems to me that toke.c was written under the assumption that the
  current buffer would only contain one line of code. And then string eval
  came along, so we ended up with if(in_eval) sprinkled here and there. If
  we could unify the code, we could avoid many discrepancies that result.

eval was added later:

commit a559c25
Author: Larry Wall <lwall@jpl-devvax.jpl.nasa.gov>
Date: Wed Jan 27 22:18:25 1988 +0000

  perl 1.0 patch 8: perl needed an eval operator and a symbolic debugger

  I didn't add an eval operator to the original perl because
  I hadn't thought of any good uses for it. Recently I thought
  of some. Along with creating the eval operator, this patch
  introduces a symbolic debugger for perl scripts, which makes
  use of eval to interpret some debugging commands. Having eval
  also lets me emulate awk's FOO=bar command line behavior with
  a line such as the one a2p now inserts at the beginning of
  translated scripts.

Do we have a meta ticket for discrepancies due to eval being parsed with
all the source code available, versus files being parsed line by line?

Also, I'm not sure if we can fix all of them safely. By definition, we haven't
exhausted memory if we get to running the string eval :-)
(Something too big has already hit problems)

Whereas parsing a large file is potentially unbounded - when do we stop
reading ahead into ever growing buffers?

To be fair, this case, a heredoc terminator, the problem is bounded.

Nicholas Clark

p5pRT commented Aug 2, 2012

From @cpansprout

On Thu Aug 02 02:19:38 2012, nicholas wrote:

> On Thu, Aug 02, 2012 at 12:20:19AM -0700, Father Chrysostomos via RT wrote:
>
> > On Mon Jul 16 03:04:46 2012, nicholas wrote:
> >
> > > Yes, if it can't ever work, why is it even accepted?
> > >
> > > Strikes me that it's buggy to accept a terminator which contains a newline
> > > (or anything else which the parser *cannot* later deal with), and that
> > > really the only sane thing to do is to report it as an error early.
> > >
> > > I also *think* that changing this can't actually change the behaviour of
> > > any existing program, because (if I understand it correctly), all currently
> > > fail to parse, due to the "missing" heredoc terminator.
> > >
> > > All it does is change the error reported, to one that is meaningful.
> >
> > It works in string eval. See also #78348 and #114040.
>
> OK, so *right now* if we change the code to reject a terminator containing
> a newline for not-a-string-eval we could at least give a more helpful
> error message of "not yet supported" ?

We could. I’m not volunteering. :-) But I’ll consider any patch that
comes along.

> And your comment in #114040
>
>   It seems to me that toke.c was written under the assumption that the
>   current buffer would only contain one line of code. And then string eval
>   came along, so we ended up with if(in_eval) sprinkled here and there. If
>   we could unify the code, we could avoid many discrepancies that result.
>
> eval was added later:
>
> commit a559c25
> Author: Larry Wall <lwall@jpl-devvax.jpl.nasa.gov>
> Date: Wed Jan 27 22:18:25 1988 +0000
>
>   perl 1.0 patch 8: perl needed an eval operator and a symbolic debugger
>
>   I didn't add an eval operator to the original perl because
>   I hadn't thought of any good uses for it. Recently I thought
>   of some. Along with creating the eval operator, this patch
>   introduces a symbolic debugger for perl scripts, which makes
>   use of eval to interpret some debugging commands. Having eval
>   also lets me emulate awk's FOO=bar command line behavior with
>   a line such as the one a2p now inserts at the beginning of
>   translated scripts.
>
> Do we have a meta ticket for discrepancies due to eval being parsed with
> all the source code available, versus files being parsed line by line?
>
> Also, I'm not sure if we can fix all of them safely. By definition, we haven't
> exhausted memory if we get to running the string eval :-)
> (Something too big has already hit problems)
>
> Whereas parsing a large file is potentially unbounded - when do we stop
> reading ahead into ever growing buffers?

When we run out of memory. Yes, that does mean a missing " can cause
memory errors;

$ perl -e 'print q|"|; print " "x70, "\n" for 1..100000000' |perl
perl(40446) malloc: *** mmap(size=2397048832) failed (error code=12)
*** error​: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Out of memory!

but the (worse) alternative is to impose arbitrary restrictions in a
place where people require otherwise. (I know that some code has 40K
strings embedded in it. It would not surprise me if others have used
much larger strings.)

> To be fair, this case, a heredoc terminator, the problem is bounded.

Indeed.

$ perl -le 'print "<<", f x 254' | perl
Can't find string terminator
"ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"
anywhere before EOF at - line 1.
Pint:perl.git-copy sprout$ perl -le 'print "<<", f x 255' | perl
Delimiter for here document is too long at - line 1.

--

Father Chrysostomos
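
The length limit shown in the transcript above can also be probed from inside a single script with string eval. This is a sketch under stated assumptions: the helper name `try_delimiter_length` is invented for this example, the lengths 254 and 255 are taken from the transcript, and the exact boundary may differ on other perl versions, so the loop only reports what it observes.

```perl
use strict;
use warnings;

# Build a here-doc that uses a bareword delimiter of the given length and
# evaluate it. Returns the here-doc body (undef on failure) and the
# compile error (empty string on success).
sub try_delimiter_length {
    my ($length) = @_;
    my $delimiter = 'f' x $length;
    my $code = "<<$delimiter\nbody\n$delimiter\n";

    local $@;
    my $value = eval $code;
    my $error = $@;
    return ($value, $error);
}

for my $length (10, 254, 255, 1000) {
    my ($value, $error) = try_delimiter_length($length);
    $error =~ s/\s+\z//;    # drop the trailing newline of the message
    printf "%4d: %s\n", $length,
        defined $value ? 'accepted' : "rejected ($error)";
}
```

Because the delimiter buffer has a fixed size, the work done for an over-long delimiter stays bounded, which is the point made in the reply above.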
