Perl parser sometimes tolerates stray nulls, sometimes not #11799

Open
p5pRT opened this issue Dec 11, 2011 · 4 comments

p5pRT commented Dec 11, 2011

Migrated from rt.perl.org#105920 (status was 'open')

Searchable as RT105920$


p5pRT commented Dec 11, 2011

From @cpansprout

Stray nulls are tolerated in files but not in string evals. Why is that? I know which piece of code is doing it, but is it by design? Why the inconsistency?

#!perl -l
print eval "6+5\0+3";
warn $@ if $@;

require PerlIO::scalar;
unshift @INC, sub {
  return unless $_[1] eq 'foo.pm';
  open my $fh, '<', \"6+5\0+3"; $fh
};
print require foo;
__END__

This prints:

syntax error at (eval 1) line 1, at EOF
14


Flags:
  category=core
  severity=low


Site configuration information for perl 5.15.5:

Configured by sprout at Sat Nov 26 11:40:22 PST 2011.

Summary of my perl5 (revision 5 version 15 subversion 5) configuration:
  Snapshot of: c071f8d
  Platform:
  osname=darwin, osvers=10.5.0, archname=darwin-thread-multi-2level
  uname='darwin pint.local 10.5.0 darwin kernel version 10.5.0: fri nov 5 23:20:39 pdt 2010; root:xnu-1504.9.17~1release_i386 i386 '
  config_args='-de -Dusedevel -Duseithreads -Dmad'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=undef, use64bitall=undef, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler:
  cc='cc', ccflags ='-fno-common -DPERL_DARWIN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
  optimize='-O3',
  cppflags='-fno-common -DPERL_DARWIN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.2.1 (Apple Inc. build 5664)', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries:
  ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib
  libs=-ldbm -ldl -lm -lutil -lc
  perllibs=-ldl -lm -lutil -lc
  libc=, so=dylib, useshrplib=false, libperl=libperl.a
  gnulibc_version=''
  Dynamic Linking:
  dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
  cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib -fstack-protector'

Locally applied patches:
 


@INC for perl 5.15.5:
  /usr/local/lib/perl5/site_perl/5.15.5/darwin-thread-multi-2level
  /usr/local/lib/perl5/site_perl/5.15.5
  /usr/local/lib/perl5/5.15.5/darwin-thread-multi-2level
  /usr/local/lib/perl5/5.15.5
  /usr/local/lib/perl5/site_perl
  .


Environment for perl 5.15.5:
  DYLD_LIBRARY_PATH (unset)
  HOME=/Users/sprout
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/local/bin
  PERL_BADLANG (unset)
  SHELL=/bin/bash


p5pRT commented Aug 7, 2013

From @Hugmeir

On Sun Dec 11 13:15:49 2011, sprout wrote:

Stray nulls are tolerated in files but not in string evals. Why is
that? I know which piece of code is doing it, but is it by design?
Why the inconsistency?

#!perl -l
print eval "6+5\0+3";
warn $@ if $@;

require PerlIO::scalar;
unshift @INC, sub {
return unless $_[1] eq 'foo.pm';
open my $fh, '<', \"6+5\0+3"; $fh
};
print require foo;
__END__

This prints:

syntax error at (eval 1) line 1, at EOF
14

I would imagine this was not by design -- probably a leftover from back
when Perl used strlen() extensively.

--hugmeir


p5pRT commented Aug 7, 2013

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented Dec 13, 2017

From zefram@fysh.org

Father Chrysostomos wrote:

Stray nulls are tolerated in files but not in string evals. Why is that?
I know which piece of code is doing it, but is it by design? Why the
inconsistency?

Prior to Perl 3.0, the tokeniser made no attempt to handle NULs in source.
It perceived a null byte strictly as indicating the end of the buffer.
The treatment of end of buffer differs depending on whether we're parsing
a string or a file: if it's a string then end of buffer is end of source,
but if it's a file then we try to read more into the buffer. When end
of file is reached, in some cases the tokeniser has some epilogue code
that it puts in the buffer and continues tokenising in string mode.

Perl 3.0 in 1989 was the first version that claimed to support binary
data. The tokeniser was changed to generally cope with the possibility
of NULs in source and so in the buffer. They're accepted as literal
characters in string syntax, for example. In general tokenising context,
then as now, a source NUL and end of buffer are both initially routed
to the same branch of the main switch. One would think that at that
point the first priority would be to distinguish source NUL from end of
buffer, but actually the new check for that wasn't put first. Instead,
the old check for string vs file was left as the first thing, with the
check for buffer end coming next, only executed if parsing from a file.
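
As a rough illustration of that ordering, here is a toy model in Perl. It is not code from toke.c: the function name on_zero_byte, its three arguments, and its return strings are made up for this sketch, and a position equal to the buffer length stands in for the zero byte that terminates the buffer.

```perl
use strict;
use warnings;

# Toy model only; names and return values are invented for illustration.
# Called when the scanner meets a zero byte at offset $pos of $buf.
# $pos == length($buf) stands for the terminator at end of buffer.
sub on_zero_byte {
    my ($buf, $pos, $from_file) = @_;

    # Old check, still first: string source, so treat it as end of source.
    return 'eof' unless $from_file;

    # Newer check, reached only for files: source NUL or end of buffer?
    return $pos < length($buf) ? 'skip' : 'refill';
}

my $src = "6+5\0+3";
print on_zero_byte($src, 3, 0), "\n";             # string eval: eof
print on_zero_byte($src, 3, 1), "\n";             # file: skip
print on_zero_byte($src, length($src), 1), "\n";  # file, buffer end: refill
```

In this model the NUL at offset 3 ends the source for a string but is skipped for a file, which matches the two results in the original report.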

The treatment of a NUL, when detected, was as it is now, to skip past it.
The line in toke.c with the comment /* ignore stray nulls */ dates back
to 3.0; clearly to ignore NULs in files was intentional behaviour.

Indeed, it's a time-honoured interpretation of a NUL: all bits zero means
this is a blank part of the paper tape that needs to be skipped past.
One could edit a tape by punching new content into sections left blank for
this purpose. (This is the origin of DEL: all bits one means there used
to be some content here but it's been erased by punching all positions,
so this too needs to be skipped over.)

Of course, this time-honoured treatment of NUL is more at home in the
1960s than the 1980s, let alone today. Few of us use paper tape any more.
(The characteristics of Flash memory have revived interest in data
structures designed for overpunch-type editing, but in that context it's
not often applied to ASCII.) And if NULs are skipped on that basis then
it makes no conceptual sense to treat them differently depending on the
grammatical context, as is done by accepting NULs as literals in strings.
Even having NULs interrupt barewords, which Perl 3.0 and blead both
do, is incompatible with that interpretation. The syntactic treatment
of NULs in files is actually as if they're whitespace characters, an
interpretation that has much less historical justification.
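
For anyone who wants the file-style treatment from a string eval today, one option is to translate NULs to spaces before evaluating. A minimal sketch, assuming a hypothetical helper named eval_like_file. It is only an approximation, because it also rewrites NULs inside string literals, which the file path would keep as literal characters.

```perl
use strict;
use warnings;

# Hypothetical helper: approximate the whitespace-like treatment that
# files get by turning each NUL into a space before the string eval.
# Caveat: this also changes NULs that sit inside string literals.
sub eval_like_file {
    my ($src) = @_;
    (my $spaced = $src) =~ tr/\0/ /;
    return eval $spaced;
}

print eval_like_file("6+5\0+3"), "\n";   # evaluates "6+5 +3", prints 14
```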

Anyway, was it intentional that NULs in eval strings are errors? The answer
is in fact that the question is wrong. NULs in eval strings *aren't* errors,
and never have been. Did you notice that the error message you get is a
"syntax error", not the "unrecognized character" that you get by including
an arbitrary invalid character?

  $ perl -e 'eval "3+4\0"; print $@'
  syntax error at (eval 1) line 1, at EOF
  $ perl -e 'eval "3+4\xa1"; print $@'
  Unrecognized character \xA1; marked by <-- HERE after 3+4<-- HERE near column 4 at (eval 1) line 1.

The error is coming from the yacc parser, not from the tokeniser.
The tokeniser is interpreting the null byte in the pre-3.0 way​: as the
end of the string, and so the end of the source. It returns a YYEOF
token to perly. If the string content up to that point was good, then
parsing will succeed:

  $ perl -e 'eval "print qq(hi\\n);\0garbage here"; print $@ || "OK\n"'
  hi
  OK

Why do you usually get a syntax error? Because of semicolons.
The grammar specifies that normal statements must end with a semicolon.
The semicolon is implicit at the end of the source, and in the case of
an eval this is and always was implemented by appending a semicolon
character to the string before feeding it to the tokeniser. If you
terminate the string early with a NUL, you don't get the benefit of the
implicit semicolon.
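
The effect on the semicolon can be modelled in a few lines. This is a sketch under the assumptions stated in the paragraph above: the eval source gets a semicolon appended, and everything from the first NUL onward is never seen. The helper name visible_source is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the part of an eval string that the
# tokeniser gets to see, assuming a ';' is appended to the source and
# the first NUL is taken as the end of the source.
sub visible_source {
    my ($src) = @_;
    my $fed = $src . ';';              # the implicit semicolon
    my $nul = index($fed, "\0");
    return $nul < 0 ? $fed : substr($fed, 0, $nul);
}

print visible_source("6+5"), "\n";        # 6+5;  (semicolon survives)
print visible_source("6+5\0+3"), "\n";    # 6+5   (semicolon cut off)
print visible_source("6+5;\0+3"), "\n";   # 6+5;  (explicit one survives)
```

Only the second case loses its terminating semicolon, which is the case that produces the syntax error in the original report.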

It is clearly a bug that NUL in an eval string terminates tokenisation
early. Perl 3.0 didn't quite live up to its hype about binary
cleanliness, and neither has any subsequent Perl. The intent for Perl
3.0 was obviously that NULs would consistently behave as whitespace.
It would have achieved that goal had two conditions just been tested in
the opposite order. We could fulfil that intent in today's Perl by just
moving two lines of code, fixing that old bug.
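
The reordering described above can be pictured with a toy model. This is a sketch in Perl rather than the actual C from toke.c; on_zero_byte_swapped, its arguments, and its return strings are invented here, and a position equal to the buffer length stands in for the zero byte that terminates the buffer.

```perl
use strict;
use warnings;

# Toy model of the swapped check order; not the real tokeniser code.
# Called when the scanner meets a zero byte at offset $pos of $buf.
sub on_zero_byte_swapped {
    my ($buf, $pos, $from_file) = @_;

    # First: is this a source NUL inside the buffer? If so, skip it,
    # whether the source is a string or a file.
    return 'skip' if $pos < length($buf);

    # Only at the real end of buffer does string vs file matter.
    return $from_file ? 'refill' : 'eof';
}

my $src = "6+5\0+3";
print on_zero_byte_swapped($src, 3, 0), "\n";             # string: skip
print on_zero_byte_swapped($src, 3, 1), "\n";             # file: skip
print on_zero_byte_swapped($src, length($src), 0), "\n";  # string end: eof
print on_zero_byte_swapped($src, length($src), 1), "\n";  # file end: refill
```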

However, as a matter of language design, it seems silly to be treating
NULs like this. NUL isn't a whitespace character. A more appealing
way to resolve the inconsistency is to deprecate both of the current
NUL behaviours, eventually making NUL illegal in general tokenisation
context, just like most other control characters are.
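
Until something like that happens in core, code that evals untrusted or generated strings could enforce the stricter rule itself. A minimal sketch, assuming a hypothetical wrapper named eval_without_nuls; the error text is made up and is not a real Perl diagnostic. Note that it rejects every NUL, including one inside a string literal, so it is stricter than the proposal above.

```perl
use strict;
use warnings;

# Hypothetical wrapper: refuse any eval source containing a NUL byte,
# then evaluate the source and rethrow any compile or runtime error.
sub eval_without_nuls {
    my ($src) = @_;
    my $pos = index($src, "\0");
    die sprintf("NUL byte at offset %d in eval source\n", $pos)
        if $pos >= 0;
    my $value = eval $src;
    die $@ if $@;
    return $value;
}

print eval_without_nuls("6+5+3"), "\n";              # prints 14
my $ok = eval { eval_without_nuls("6+5\0+3"); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```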

-zefram
