Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split() in do{} and eval{} blocks lacks implicit limit #13104

Open
p5pRT opened this issue Jul 11, 2013 · 12 comments
Open

split() in do{} and eval{} blocks lacks implicit limit #13104

p5pRT opened this issue Jul 11, 2013 · 12 comments

Comments

@p5pRT
Copy link

p5pRT commented Jul 11, 2013

Migrated from rt.perl.org#118879 (status was 'open')

Searchable as RT118879$

@p5pRT
Copy link
Author

p5pRT commented Jul 11, 2013

From agalama@mailfish.de

perldoc split says​: "when assigning to a list, if LIMIT is omitted (or
zero), then LIMIT is treated as though it were one larger than the
number of variables in the list".

Apparently, this is not the case within a do{} or eval{} block​:

use Data​::Dumper;

# implicit limit = 3 in accordance with the docs
# -> empty trailing field preserved
($a, $b) = split /,/, 'a,';
print Dumper($b); # empty string

# implicit limit = 0 contrary to the docs
# -> empty trailing field stripped
($a, $b) = do { split /,/, 'a,' };
print Dumper($b); # undef

($a, $b) = eval { split /,/, 'a,' };
print Dumper($b); # undef

__END__


Flags​:
  category=core
  severity=low


This perlbug was built using Perl 5.14.4 in the Fedora build system.
It is being executed now by Perl 5.14.4 - Thu Apr 11 12​:56​:04 UTC 2013.

Site configuration information for perl 5.14.4​:

Configured by Red Hat, Inc. at Thu Apr 11 12​:56​:04 UTC 2013.

Summary of my perl5 (revision 5 version 14 subversion 4) configuration​:

  Platform​:
  osname=linux, osvers=2.6.32-358.2.1.el6.x86_64,
archname=x86_64-linux-thread-multi
  uname='linux buildvm-11.phx2.fedoraproject.org
2.6.32-358.2.1.el6.x86_64 #1 smp wed feb 20 12​:17​:37 est 2013 x86_64
x86_64 x86_64 gnulinux '
  config_args='-des -Doptimize=-O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m64 -mtune=generic
-Dccdlflags=-Wl,--enable-new-dtags -Dlddlflags=-shared -O2 -g -pipe
-Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m64 -mtune=generic -Wl,-z,relro
-DDEBUGGING=-g -Dversion=5.14.4 -Dmyhostname=localhost
-Dperladmin=root@​localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr
-Dvendorprefix=/usr -Dsiteprefix=/usr/local
-Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5
-Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl
-Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl
-Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64
/usr/lib64 -Duseshrplib -Dusethreads -Duseithreads
-Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db
-Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio
-Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly
-Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto
-Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto
-Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing
-pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64',
  optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe
-fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.7.2 20120921 (Red Hat 4.7.2-2)',
gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='gcc', ldflags =' -fstack-protector'
  libpth=/usr/local/lib64 /lib64 /usr/lib64
  libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread
-lc -lgdbm_compat
  perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
  libc=, so=so, useshrplib=true, libperl=libperl.so
  gnulibc_version='2.15'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef,
ccdlflags='-Wl,--enable-new-dtags -Wl,-rpath,/usr/lib64/perl5/CORE'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m64 -mtune=generic -Wl,-z,relro '

Locally applied patches​:


@​INC for perl 5.14.4​:
  /home/xxx/src/perl
  /usr/local/lib64/perl5
  /usr/local/share/perl5
  /usr/lib64/perl5/vendor_perl
  /usr/share/perl5/vendor_perl
  /usr/lib64/perl5
  /usr/share/perl5
  .


Environment for perl 5.14.4​:
  HOME=/home/xxx
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)

PATH=/usr/lib64/qt-3.3/bin​:/usr/local/bin​:/usr/bin​:/usr/local/sbin​:/usr/sbin​:/home/xxx/.local/bin​:/home/xxx/bin
  PERL5LIB=/home/xxx/src/perl
  PERL_BADLANG (unset)
  SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Jul 11, 2013

From @jkeenan

On Thu Jul 11 11​:35​:33 2013, galama wrote​:

perldoc split says​: "when assigning to a list, if LIMIT is omitted (or
zero), then LIMIT is treated as though it were one larger than the
number of variables in the list".

Apparently, this is not the case within a do{} or eval{} block​:

use Data​::Dumper;

# implicit limit = 3 in accordance with the docs
# -> empty trailing field preserved
($a, $b) = split /,/, 'a,';
print Dumper($b); # empty string

# implicit limit = 0 contrary to the docs
# -> empty trailing field stripped
($a, $b) = do { split /,/, 'a,' };
print Dumper($b); # undef

($a, $b) = eval { split /,/, 'a,' };
print Dumper($b); # undef

__END__

Confirmatory code​:

$ perl -MData​::Dumper -E '$Data​::Dumper​::Indent=0;($a, $b) = split /,/,
"a,"; say Dumper ([$a,$b]);'
$VAR1 = ['a',''];

$ perl -MData​::Dumper -E '$Data​::Dumper​::Indent=0;($a, $b) = do { split
/,/, "a," }; say Dumper ([$a,$b]);'
$VAR1 = ['a',undef];

$ perl -MData​::Dumper -E '$Data​::Dumper​::Indent=0;($a, $b) = eval {
split /,/, "a," }; say Dumper ([$a,$b]);'
$VAR1 = ['a',undef];

@p5pRT
Copy link
Author

p5pRT commented Jul 11, 2013

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jul 14, 2013

From @rjbs

* Anand Galama <perlbug-followup@​perl.org> [2013-07-11T14​:35​:33]

perldoc split says​: "when assigning to a list, if LIMIT is omitted (or
zero), then LIMIT is treated as though it were one larger than the
number of variables in the list".

Apparently, this is not the case within a do{} or eval{} block​:

I imagine the easiest and more appropriate fix is to say "when immediately
assigning" rather thant o try to address every form of indirection.

--
rjbs

@p5pRT
Copy link
Author

p5pRT commented Jul 14, 2013

From @cpansprout

On Sun Jul 14 05​:44​:46 2013, perl.p5p@​rjbs.manxome.org wrote​:

* Anand Galama <perlbug-followup@​perl.org> [2013-07-11T14​:35​:33]

perldoc split says​: "when assigning to a list, if LIMIT is omitted (or
zero), then LIMIT is treated as though it were one larger than the
number of variables in the list".

Apparently, this is not the case within a do{} or eval{} block​:

I imagine the easiest and more appropriate fix is to say "when immediately
assigning" rather thant o try to address every form of indirection.

I always thought this was just an optimisation and it was only
documented so that one could intentionally trigger it.

But as the OP points out, it can change undef into an empty string.

Do we want to make it so that this optimisation is purely an optimisation?

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Jul 14, 2013

From agalama@mailfish.de

On Sun Jul 14 05​:44​:46 2013, perl.p5p@​rjbs.manxome.org wrote​:

I imagine the easiest and more appropriate fix is to say "when immediately
assigning" rather thant o try to address every form of indirection.

Thanks, this just lead me to another even simpler example. :-)

($n, $a, $b) = (1, split /,/, 'a,');

This also makes $b undef just like in the do{} and eval{} case but I
would still call it an "immediate" assignment.

@p5pRT
Copy link
Author

p5pRT commented Jul 16, 2013

From @ap

* Father Chrysostomos via RT <perlbug-followup@​perl.org> [2013-07-14 16​:00]​:

Do we want to make it so that this optimisation is purely an
optimisation?

I think it should be.

But how to go about it? Do you want to expose this capability in some
way so the user can invoke it explicitly? If not, would you remove the
paragraph about optimisation from perlfunc?

This is also a very wide-spread usage, undoubtedly relied on unwittingly
by a *lot* of code. How to detect that? Regular deprecation does not
seem the answer, since this deprecation warning would pour forth from
perl like floodwaters.

Off the cuff the only thing I can think of is for `split` to return
a special magicalised empty string during the deprecation cycle, which
would trigger the deprecation warning when used in… certain contexts.
(Non-boolean?)

(No doubt this would also clean up code that is inadvertently buggy
though.)

* Anand Galama via RT <perlbug-followup@​perl.org> [2013-07-15 05​:55]​:

($n, $a, $b) = (1, split /,/, 'a,');

This also makes $b undef just like in the do{} and eval{} case but
I would still call it an "immediate" assignment.

Does this do it for you?

Inline Patch
diff --git i/pod/perlfunc.pod w/pod/perlfunc.pod
index 18ecd40..c108733 100644
--- i/pod/perlfunc.pod
+++ w/pod/perlfunc.pod
@@ -6959,8 +6959,9 @@ produces the output 'a:b:c', but the following:
 produces the output 'a:b:c:::'.

 In time-critical applications, it is worthwhile to avoid splitting
-into more fields than necessary.  Thus, when assigning to a list,
-if LIMIT is omitted (or zero), then LIMIT is treated as though it
+into more fields than necessary.  Thus, when the right-hand side of
+a list assignment consists only of a C<split>, and LIMIT is omitted
+(or zero), then LIMIT is treated as though it
 were one larger than the number of variables in the list; for the
 following, LIMIT is implicitly 3:

@p5pRT
Copy link
Author

p5pRT commented Jul 16, 2013

From @ap

* Aristotle Pagaltzis <pagaltzis@​gmx.de> [2013-07-16 03​:00]​:

(No doubt this would also clean up code that is inadvertently buggy
though.)

(By which I mean code that’s currently written to expect it will get
undef for missing trailing fields, but triggers the optimisation and
therefore doesn’t.)

@p5pRT
Copy link
Author

p5pRT commented Jul 16, 2013

From @cpansprout

On Mon Jul 15 17​:58​:37 2013, aristotle wrote​:

* Father Chrysostomos via RT <perlbug-followup@​perl.org> [2013-07-14
16​:00]​:

Do we want to make it so that this optimisation is purely an
optimisation?

I think it should be.

But how to go about it? Do you want to expose this capability in some
way so the user can invoke it explicitly?

An explicit third argument does that, does it not?

If not, would you remove the
paragraph about optimisation from perlfunc?

Instead of explaining how the optimisation works, just state the
circumstance in which it occurs.

Then we can fix #103212 as well.

This is also a very wide-spread usage, undoubtedly relied on unwittingly
by a *lot* of code. How to detect that? Regular deprecation does not
seem the answer, since this deprecation warning would pour forth from
perl like floodwaters.

Off the cuff the only thing I can think of is for `split` to return
a special magicalised empty string during the deprecation cycle, which
would trigger the deprecation warning when used in… certain contexts.
(Non-boolean?)

If the deprecation warning could be spelt ‘Use of uninitialized value’,
I think we already have such a scalar. :-)

Or do you mean one for which defined($_) returns true, but which is
treated as undef elsewhere? Yes, that would be possible.

But how much code is really relying on that single empty string?

$ ./perl -Ilib -e '($a,$b,$c,$d,$e,$f) = split //, "abc"; warn
"[",$_//"undef","]" for $a,$b,$c,$d,$e,$f'
[a] at -e line 1.
[b] at -e line 1.
[c] at -e line 1.
[] at -e line 1.
[undef] at -e line 1.
[undef] at -e line 1.

(No doubt this would also clean up code that is inadvertently buggy
though.)

* Anand Galama via RT <perlbug-followup@​perl.org> [2013-07-15 05​:55]​:

($n, $a, $b) = (1, split /,/, 'a,');

This also makes $b undef just like in the do{} and eval{} case but
I would still call it an "immediate" assignment.

Does this do it for you?

diff --git i/pod/perlfunc.pod w/pod/perlfunc.pod
index 18ecd40..c108733 100644
--- i/pod/perlfunc.pod
+++ w/pod/perlfunc.pod
@​@​ -6959,8 +6959,9 @​@​ produces the output 'a​:b​:c', but the following​:
produces the output 'a​:b​:c​::​:'.

In time-critical applications, it is worthwhile to avoid splitting
-into more fields than necessary. Thus, when assigning to a list,
-if LIMIT is omitted (or zero), then LIMIT is treated as though it
+into more fields than necessary. Thus, when the right-hand side of
+a list assignment consists only of a C<split>, and LIMIT is omitted
+(or zero), then LIMIT is treated as though it
were one larger than the number of variables in the list; for the
following, LIMIT is implicitly 3​:

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Jul 16, 2013

From @ap

* Father Chrysostomos via RT <perlbug-followup@​perl.org> [2013-07-16 03​:15]​:

On Mon Jul 15 17​:58​:37 2013, aristotle wrote​:

* Father Chrysostomos via RT <perlbug-followup@​perl.org> [2013-07-14 16​:00]​:

Do we want to make it so that this optimisation is purely an
optimisation?

I think it should be.

But how to go about it? Do you want to expose this capability in some
way so the user can invoke it explicitly?

An explicit third argument does that, does it not?

No?

Now I’m not sure we’re even talking about the same thing.

An explicit third argument tells `split` to stop splitting at the given
number of fields, *and* return a trailing empty string field if any.

The user can omit the third argument (and trip up the optimisation) to
make `split` *not* stop splitting *and* *not* return any trailing empty
string field.

With the current interface there is no way for the user to ask for one
of these (stop splitting at a certain field) *but not* the other (return
a trailing empty string field).

What I understood you to be asking is whether

  perl -E '($n, $a, $b) = (1, split /,/, "a,"); say 1 if defined $b'
  perl -E ' ($a, $b) = (split /,/, "a,"); say 1 if defined $b'

should both be silent (where currently the latter isn’t) while at the
same time the `split` on the second line should know to stop splitting
at the third field, unlike the one on the first line (as is currently
true).

In other words, should `split` be able to do something (stop splitting
*but not* return a trailing empty field) that the user can’t currently
explicitly ask it for?

Now when you deparse `($a,$b) = split//`, it actually says this​:

  ($a, $b) = split(//, $_, 3);

If the optimisation were made transparent, then deparsing the `split`
like this would be a lie, because if the user wrote that, she would get
a subtly different behaviour than if the optimiser makes it like that.

Either the optimisation would have to trigger in some other way (within
`pp_split`?) and become invisible at the op tree level – i.e. deparsing
that statement would yield this​:

  ($a, $b) = split(//, $_);

If that is not the case, and the optimisation is reflected in the op
tree, then the interface to `split` would have to change such that when
the user writes the same code as Deparse outputs, she gets the exact
same behaviour (i.e. it stops splitting *but doesn’t* return a trailing
empty string).

This is also a very wide-spread usage, undoubtedly relied on
unwittingly by a *lot* of code. How to detect that? Regular
deprecation does not seem the answer, since this deprecation warning
would pour forth from perl like floodwaters.

Off the cuff the only thing I can think of is for `split` to return
a special magicalised empty string during the deprecation cycle,
which would trigger the deprecation warning when used in… certain
contexts. (Non-boolean?)

If the deprecation warning could be spelt ‘Use of uninitialized
value’, I think we already have such a scalar. :-)

I am *not* proposing to deprecate the absence of a warning by warning
that in the future an undef warning will be emitted. :-P

Or do you mean one for which defined($_) returns true, but which is
treated as undef elsewhere? Yes, that would be possible.

Yes, that is what I was thinking of.

Very specifically the case of `defined $trailing_field` is what would
need to deprecation-warn, even if nothing else does, because that is
unambiguously a case that would change behaviour after patching `split`.

Currently that expression returns true, after the change it would return
false – so any conditional that tests definedness would take the other
branch in the future. This is a change in the behaviour of user code. If
anything requires a deprecation warning, that does.

Other uses of $trailing_field are murkier in whether they will cause any
change in behaviour. I don’t know exactly how trigger-happy the warning
should be. I am still trying to reason it out methodically, which is not
my strength.

But how much code is really relying on that single empty string?

$ ./perl -Ilib -e '($a,$b,$c,$d,$e,$f) = split //, "abc"; warn
"[",$_//"undef","]" for $a,$b,$c,$d,$e,$f'
[a] at -e line 1.
[b] at -e line 1.
[c] at -e line 1.
[] at -e line 1.
[undef] at -e line 1.
[undef] at -e line 1.

I don’t know. I’m not worried about the total amount in and of itself.
It is more the sense that the amount of code relying on this might not
be entirely trivial, even if it is small, along with the sense that most
reliance on this behaviour is inadvertent. That is to say, if we change
this, a small but not asymptotically zero amount of code would change
behaviour in hard-to-pinpoint circumstances. That makes it important, to
my mind, to give users the help and time they’ll need to spot it.

It can also construed a documented and advertised feature if you read
the documentation hard enough. Therefore “fixing” it could be construed
as breaking a documented interface.

But I suppose running a CPAN smoke using the transparent optimisation
could tell us how much immediately obvious breakage results, before we
think further on this. My worry may be unwarranted?

PS.​: it is irritating how much verbiage it takes to talk clearly about
  these in-and-of-themselves fairly small issues. I would have
  written a shorter bikeshed, but I did not have the knowledge how.

Regards,
--
Aristotle Pagaltzis // <http​://plasmasturm.org/>

@p5pRT
Copy link
Author

p5pRT commented Jul 16, 2013

From agalama@mailfish.de

On Mon Jul 15 17​:58​:37 2013, aristotle wrote​:

Does this do it for you?

diff --git i/pod/perlfunc.pod w/pod/perlfunc.pod
index 18ecd40..c108733 100644
--- i/pod/perlfunc.pod
+++ w/pod/perlfunc.pod
@​@​ -6959,8 +6959,9 @​@​ produces the output 'a​:b​:c', but the following​:
produces the output 'a​:b​:c​::​:'.

In time-critical applications, it is worthwhile to avoid splitting
-into more fields than necessary. Thus, when assigning to a list,
-if LIMIT is omitted (or zero), then LIMIT is treated as though it
+into more fields than necessary. Thus, when the right-hand side of
+a list assignment consists only of a C<split>, and LIMIT is omitted
+(or zero), then LIMIT is treated as though it
were one larger than the number of variables in the list; for the
following, LIMIT is implicitly 3​:

Hmmm​:

($a, $b) = split /,/, 'a,'; # ('a', '') as expected
($a, $b) = @​a = split /,/, 'a,'; # also ('a', '') (!) although​:
@​a = split /,/, 'a,'; # just ('a'), as expected

The second one would have to be changed if you change the docs your way.

Interestingly​:

($a, $b) = @​a[0,1] = split /,/, 'a,'; # ('a', undef)

I am sure Schrödinger would have called his cat "Split". :-)

@p5pRT
Copy link
Author

p5pRT commented Jul 17, 2013

From @cpansprout

On Mon Jul 15 19​:59​:07 2013, aristotle wrote​:

The user can omit the third argument (and trip up the optimisation) to
make `split` *not* stop splitting *and* *not* return any trailing
empty
string field.

With the current interface there is no way for the user to ask for one
of these (stop splitting at a certain field) *but not* the other
(return
a trailing empty string field).

I see what you are saying. That would require yet some other syntax. I
don’t see much value in it.

What I understood you to be asking is whether

perl \-E '\($n\, $a\, $b\) = \(1\, split /\,/\, "a\,"\); say 1 if defined $b'
perl \-E '    \($a\, $b\) =    \(split /\,/\, "a\,"\); say 1 if defined $b'

should both be silent (where currently the latter isn’t) while at the
same time the `split` on the second line should know to stop splitting
at the third field, unlike the one on the first line (as is currently
true).

In other words, should `split` be able to do something (stop splitting
*but not* return a trailing empty field) that the user can’t currently
explicitly ask it for?

If it is just an optimisation, then the calling code can’t know whether
perl finished splitting to the end of the string. (‘Write your code
like this, and you will make it easier for the compiler to optimise
things and make them faster.’) So does it matter?

Now when you deparse `($a,$b) = split//`, it actually says this​:

\($a\, $b\) = split\(//\, $\_\, 3\);

If the optimisation were made transparent, then deparsing the `split`
like this would be a lie, because if the user wrote that, she would
get
a subtly different behaviour than if the optimiser makes it like that.

If we change the behaviour, then we will obviously have to change the
deparsed output. It is as simple as setting a flag on the split op and
have B​::Deparse check that.

--

Father Chrysostomos

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants