require parses barewords strangely #11824

p5pRT · 2011-12-25T22:20:49Z

Migrated from rt.perl.org#107004 (status was 'open')

Searchable as RT107004$

p5pRT · 2011-12-25T22:20:49Z

From @cpansprout

When require is followed by a bareword, if that bareword is followed by an operator that has higher precedence than require, it falls back to treating its argument as a normal expression, but not quite.

Here is the special behaviour:

$ perl5.15.5 -MO=Concise -e 'require a::b'
5 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
4 <1> require sK/1 ->5
3 <$> const[PV "a/b.pm"] s/BARE ->4
-e syntax OK

Notice the "a/b.pm".
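The same conversion can be observed at run time without B::Concise. This is a minimal sketch; the module name No::Such::Module::Rt107004 is made up for illustration and is assumed not to exist anywhere in @INC. The bareword form should report a path with slashes and .pm, while the quoted string is looked up verbatim.

```perl
use strict;
use warnings;

# Hypothetical module name, assumed to be absent from @INC.
eval { require No::Such::Module::Rt107004 };
my $bareword_error = $@;

# Same name as a quoted string: no ::-to-/ conversion, no .pm appended.
eval { require "No::Such::Module::Rt107004" };
my $string_error = $@;

print "bareword: $bareword_error";
print "string:   $string_error";
```

The first message should begin with "Can't locate No/Such/Module/Rt107004.pm in @INC" and the second with "Can't locate No::Such::Module::Rt107004 in @INC"; the rest of each message varies between perl versions.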

Now if we put ‘. 1’ after it (+1 doesn’t work, because of bug #105924):

$ perl5.15.5 -MO=Concise -we 'require a::b . 1'
7 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
6 <1> require sK/1 ->7
5 <2> concat[t1] sK/2 ->6
3 <$> const[PV "a::b"] s/BARE ->4
4 <$> const[IV 1] s ->5
-e syntax OK

That I can understand: in the absence of any other interpretation, a::b is a string, and the argument to require is ‘a::b . 1’, which is more than a single bareword.

But if there is a subroutine named a::b, things get strange:

$ perl5.15.5 -MO=Concise -we 'sub a::b; require a::b . 1'
7 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
6 <1> require sK/1 ->7
5 <2> concat[t1] sK/2 ->6
3 <$> const[PV "a::b"] s/BARE ->4
4 <$> const[IV 1] s ->5
-e syntax OK

The a::b should be a subroutine call.

This may be related to #36333.

Flags:
category=core
severity=low

Site configuration information for perl 5.15.5:

Configured by sprout at Sun Dec 18 11:26:14 PST 2011.

Summary of my perl5 (revision 5 version 15 subversion 5) configuration:
Snapshot of: 5dca8ed
Platform:
osname=darwin, osvers=10.5.0, archname=darwin-2level
uname='darwin pint.local 10.5.0 darwin kernel version 10.5.0: fri nov 5 23:20:39 pdt 2010; root:xnu-1504.9.17~1release_i386 i386 '
config_args='-de -Dusedevel -DDEBUGGING=-g'
hint=recommended, useposix=true, d_sigaction=define
useithreads=undef, usemultiplicity=undef
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=undef, use64bitall=undef, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-fno-common -DPERL_DARWIN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
optimize='-O3 -g',
cppflags='-fno-common -DPERL_DARWIN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
ccversion='', gccversion='4.2.1 (Apple Inc. build 5664)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -fstack-protector -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib
libs=-ldbm -ldl -lm -lutil -lc
perllibs=-ldl -lm -lutil -lc
libc=, so=dylib, useshrplib=false, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib -fstack-protector'

Locally applied patches:

@INC for perl 5.15.5:
/usr/local/lib/perl5/site_perl/5.15.5/darwin-2level
/usr/local/lib/perl5/site_perl/5.15.5
/usr/local/lib/perl5/5.15.5/darwin-2level
/usr/local/lib/perl5/5.15.5
/usr/local/lib/perl5/site_perl
.

Environment for perl 5.15.5:
DYLD_LIBRARY_PATH (unset)
HOME=/Users/sprout
LANG=en_US.UTF-8
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/local/bin
PERL_BADLANG (unset)
SHELL=/bin/bash

p5pRT · 2012-05-04T05:47:36Z

From @cpansprout

When ‘require’ is followed by a bareword, it is treated specially in
three different ways:

1. The token following it is forced to be a bareword, even if there is a
subroutine with that name, and even under strict mode.
2. The bareword has its double colons changed to slashes and has .pm
appended to the end.
3. A stash is autovivified during compilation.

Items 1 and 3 happen in the tokeniser, based on whether the token that
immediately follows the ‘require’ token is an identifier. Item 2
happens in op.c when the op tree is being built, and depends on whether
the child op is a single constant that was a bareword.

This means that cases like ‘require a::b . "foo"’ treat a::b as a
bareword, exempt from strict mode, but that bareword does not undergo
the s|::|/|g and .= ".pm" treatment. So the bareword is only half special.
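To make item 2 concrete, here is a rough pure-Perl equivalent of the rewrite that op.c applies to a lone bareword. The helper name bareword_to_path is hypothetical (it is not a core function), and the sketch ignores anything else op.c may do.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of perl: mimics the s|::|/|g and
# .= ".pm" treatment that a lone bareword after require receives.
sub bareword_to_path {
    my ($name) = @_;
    (my $path = $name) =~ s{::}{/}g;
    return "$path.pm";
}

print bareword_to_path('a::b'), "\n";   # a/b.pm
print 'a::b' . 'foo', "\n";             # a::bfoo (no conversion at all)
```

The second line corresponds to the ‘require a::b . "foo"’ case: the constant stays "a::b" and is merely concatenated.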

(This also means that ‘require foo’ is allowed under strict, but
‘require(foo)’ isn’t, even though the latter, when used outside of
strict mode, turns into ‘require "foo.pm"’. However, ‘require(foo)’
treats foo as a sub call if there is a foo sub in scope. I don’t want
to deal with that just yet.)

I’m wondering whether it would be possible to make the require code in
toke.c scan past the bareword and see whether it is followed by
something that would make the child op of require into something more
than just a bareword; i.e., an infix op of higher precedence than
require (<< >> and above) or an opening parenthesis (for require foo()).
If such a character is *not* found, then the bareword can get its
special treatment via S_force_word, etc. Can anyone see anything I’ve
missed?
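As a toy model of that lookahead, written in Perl rather than the C of toke.c, something like the following could classify what follows the bareword. The names $makes_expression and bareword_stands_alone are invented for this sketch, and the operator list is only my reading of "<< >> and above" in perlop's precedence table. It deliberately ignores ambiguities such as ‘/’ starting a pattern, ‘%’ starting a hash, or ‘<<’ starting a here-doc.

```perl
use strict;
use warnings;

# Toy model only: the real tokeniser is C and far more careful.
# Matches text that would make the operand of require more than a
# lone bareword: an opening parenthesis, or an infix operator that
# binds tighter than named unary operators.
my $makes_expression = qr{
    \A \s* (?:
        \(                     # require foo()
      | \*\* (?!=)             # exponentiation, but not **=
      | \*   (?![*=])          # multiplication, but not *=
      | (?: << | >> ) (?!=)    # shifts, but not <<= or >>=
      | [-+/%] (?!=)           # + - / %, which also covers -> ++ --
      | \.   (?![.=])          # concatenation, but not .. or .=
      | x    (?![\w=])         # repetition, but not x= or an identifier
      | [=!]~                  # binding operators
    )
}x;

# Hypothetical name: true if the bareword could keep its special treatment.
sub bareword_stands_alone {
    my ($rest_of_line) = @_;
    return $rest_of_line !~ $makes_expression;
}

for my $rest ('', ';', ' . 1', '()', ' .. 2', ' < 3') {
    printf "%-8s %s\n", "'$rest'",
        bareword_stands_alone($rest) ? 'special' : 'expression';
}
```

Only ‘ . 1’ and ‘()’ are classified as turning the operand into an expression; the other inputs leave the bareword eligible for the special treatment.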

Simply moving all the bareword handling to op.c won’t work, because
‘require foo’ will turn into ‘require(foo())’ if there is a foo sub.
Though it’s usually possible to detect that, it isn’t with constant subs,
which are inlined and so leave no sub call in the op tree.

--

Father Chrysostomos

p5pRT · 2012-05-04T05:47:36Z

@cpansprout - Status changed from 'new' to 'open'

p5pRT · 2012-05-05T00:01:58Z

From @cpansprout

On Thu May 03 22:47:36 2012, sprout wrote:

> When ‘require’ is followed by a bareword, it is treated specially in
> three different ways:
>
> 1. The token following it is forced to be a bareword, even if there is a
> subroutine with that name, and even under strict mode.
> 2. The bareword has its double colons changed to slashes and has .pm
> appended to the end.
> 3. A stash is autovivified during compilation.
>
> Items 1 and 3 happen in the tokeniser, based on whether the token that
> immediately follows the ‘require’ token is an identifier. Item 2
> happens in op.c when the op tree is being built, and depends on whether
> the child op is a single constant that was a bareword.
>
> This means that cases like ‘require a::b . "foo"’ treat a::b as a
> bareword, exempt from strict mode, but that bareword does not undergo
> the s|::|/|g and .= ".pm" treatment. So the bareword is only half
> special.
>
> (This also means that ‘require foo’ is allowed under strict, but
> ‘require(foo)’ isn’t, even though the latter, when used outside of
> strict mode, turns into ‘require "foo.pm"’. However, ‘require(foo)’
> treats foo as a sub call if there is a foo sub in scope. I don’t want
> to deal with that just yet.)
>
> I’m wondering whether it would be possible to make the require code in
> toke.c scan past the bareword and see whether it is followed by
> something that would make the child op of require into something more
> than just a bareword; i.e., an infix op of higher precedence than
> require (<< >> and above) or an opening parenthesis (for require foo()).
> If such a character is *not* found, then the bareword can get its
> special treatment via S_force_word, etc. Can anyone see anything I’ve
> missed?

I would prefer to avoid this method, if possible. If we introduce
pluggable infix operators, this won’t work or will complicate things.

> Simply moving all the bareword handling to op.c won’t work, because
> ‘require foo’ will turn into ‘require(foo())’ if there is a foo sub.
> Though it’s usually possible to detect that, it isn’t with constant subs.

I wonder whether we should somehow record the name of a sub (more
precisely, the name whereby it is invoked) that is inlined as a constant.

--

Father Chrysostomos

p5pRT · 2012-05-11T01:38:26Z

From @cpansprout

The plot thickens:

$ perl -e 'sub v123{a} require v123()'
syntax error at -e line 1, near "require v123("
Execution of -e aborted due to compilation errors.

$ perl -e 'sub v123{a} require +v123()'
Warning: Use of "require" without parentheses is ambiguous at -e line 1.
Can't locate a in @INC (...) at -e line 1.

$ perl -e 'sub v123{a} require v123.""'
Warning: Use of "require" without parentheses is ambiguous at -e line 1.
Can't locate v123 in @INC (...) at -e line 1.

$ perl -e 'sub v123{a} require v123 .""'
Warning: Use of "require" without parentheses is ambiguous at -e line 1.
Can't locate { in @INC (...) at -e line 1.

$ perl -e 'sub v123{a} require v123'
Perl v123.0.0 required--this is only v5.10.1, stopped at -e line 1.

$ perl -e 'sub v123{a} require +v123'
Warning: Use of "require" without parentheses is ambiguous at -e line 1.
Can't locate a in @INC (...) at -e line 1.

That whitespace is significant is troubling. That v123 could be
interpreted as a bareword with no => following it is more troubling.

It is not at all clear how these things are supposed to behave.

--

Father Chrysostomos
