$1 not localized when calling sub #16337

Open
p5pRT opened this issue Dec 23, 2017 · 9 comments

Comments

@p5pRT

p5pRT commented Dec 23, 2017

Migrated from rt.perl.org#132647 (status was 'open')

Searchable as RT132647$

@p5pRT
Author

p5pRT commented Dec 23, 2017

From @jimav

Created by @jimav

This is a bug report for perl from jim.avera@gmail.com,
generated with the help of perlbug 1.40 running under perl 5.26.0.

-----------------------------------------------------------------
If $1 is passed as an arg to a function, and that function
internally performs a regex match, then the argument seen from
inside the func is corrupted.

I'm guessing this is because the localization of $1 does not also localize
aliases to $1 such as in @_. This is a nasty trap, and it would be great
if perl could at least diagnose it if it happens (the passed-in $1 is,
after all, nominally read-only and a direct assignment results in a
fatal "Modification of a read-only value attempted"; so one could argue
that any operation which similarly could modify that argument should
be flagged as well).

If not fixable or catchable, then I'd like to suggest adding an explicit
mention of this trap to the docs, e.g. «perlsub».

#!/usr/bin/perl
use strict; use warnings;

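sub func($) {
  my $saved = $_[0];
  if ($_[0] =~ /(\d+)/) { }   # any successful match inside the sub
  warn "\$_[0] MUTATED from '$saved' to '$_[0]'\n"
    if $_[0] ne $saved;
}

func "a123b";                            # plain string argument: no warning
if ("c456d" =~ /(.*)/) { func($1) }    # $1 argument: warns, 'c456d' became '456'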

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.26.0:

Configured by Debian Project at Fri Sep 15 16:13:42 UTC 2017.

Summary of my perl5 (revision 5 version 26 subversion 0) configuration:
   
  Platform:
    osname=linux
    osvers=4.9.0
    archname=x86_64-linux-gnu-thread-multi
    uname='linux localhost 4.9.0 #1 smp debian 4.9.0 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dcc=x86_64-linux-gnu-gcc -Dcpp=x86_64-linux-gnu-cpp -Dld=x86_64-linux-gnu-gcc -Dccflags=-DDEBIAN -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fdebug-prefix-map=/build/perl-4JQEGJ/perl-5.26.0=. -fstack-protector-strong -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions -Wl,-z,relro -Dlddlflags=-shared -Wl,-Bsymbolic-functions -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.26 -Darchlib=/usr/lib/x86_64-linux-gnu/perl/5.26 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/x86_64-linux-gnu/perl5/5.26 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.26.0 -Dsitearch=/usr/local/lib/x86_64-linux-gnu/perl/5.26.0 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint
-Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -dEs -Duseshrplib -Dlibperl=libperl.so.5.26.0'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='x86_64-linux-gnu-gcc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-O2 -g'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion=''
    gccversion='7.2.0'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='x86_64-linux-gnu-gcc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=libc-2.26.so
    so=so
    useshrplib=true
    libperl=libperl.so.5.26
    gnulibc_version='2.26'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -L/usr/local/lib -fstack-protector-strong'

Locally applied patches:
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - https://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
    DEBPKG:debian/enc2xs_inc - https://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - https://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
    DEBPKG:debian/libperl_embed_doc - https://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
    DEBPKG:fixes/respect_umask - Respect umask during installation
    DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
    DEBPKG:debian/extutils_set_libperl_path - EU:MM: set location of libperl.a under /usr/lib
    DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/prune_libs - https://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
    DEBPKG:debian/perlivp - https://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
    DEBPKG:debian/deprecate-with-apt - https://bugs.debian.org/747628 Point users to Debian packages of deprecated core modules
    DEBPKG:debian/squelch-locale-warnings - https://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
    DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository
    DEBPKG:debian/patchlevel - https://bugs.debian.org/567489 List packaged patches for 5.26.0-8ubuntu1 in patchlevel.h
    DEBPKG:fixes/document_makemaker_ccflags - https://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
    DEBPKG:debian/find_html2text - https://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
    DEBPKG:debian/perl5db-x-terminal-emulator.patch - https://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl
    DEBPKG:debian/cpan-missing-site-dirs - https://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
    DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] https://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected
    DEBPKG:debian/regen-skip - Skip a regeneration check in unrelated git repositories
    DEBPKG:debian/makemaker-pasthru - https://bugs.debian.org/758471 Pass LD settings through to subdirectories
    DEBPKG:debian/makemaker-manext - https://bugs.debian.org/247370 Make EU::MakeMaker honour MANnEXT settings in generated manpage headers
    DEBPKG:debian/kfreebsd-softupdates - https://bugs.debian.org/796798 Work around Debian Bug#796798
    DEBPKG:fixes/autodie-scope - https://bugs.debian.org/798096 Fix a scoping issue with "no autodie" and the "system" sub
    DEBPKG:fixes/memoize-pod - [rt.cpan.org #89441] Fix POD errors in Memoize
    DEBPKG:debian/hurd-softupdates - https://bugs.debian.org/822735 Fix t/op/stat.t failures on hurd
    DEBPKG:fixes/math_complex_doc_great_circle - https://bugs.debian.org/697567 [rt.cpan.org #114104] Math::Trig: clarify definition of great_circle_midpoint
    DEBPKG:fixes/math_complex_doc_see_also - https://bugs.debian.org/697568 [rt.cpan.org #114105] Math::Trig: add missing SEE ALSO
    DEBPKG:fixes/math_complex_doc_angle_units - https://bugs.debian.org/731505 [rt.cpan.org #114106] Math::Trig: document angle units
    DEBPKG:fixes/cpan_web_link - https://bugs.debian.org/367291 CPAN: Add link to main CPAN web site
    DEBPKG:fixes/time_piece_doc - https://bugs.debian.org/817925 Time::Piece: Improve documentation for add_months and add_years
    DEBPKG:fixes/extutils_makemaker_reproducible - https://bugs.debian.org/835815 https://bugs.debian.org/834190 Make perllocal.pod files reproducible
    DEBPKG:fixes/file_path_hurd_errno - File-Path: Fix test failure in Hurd due to hard-coded ENOENT
    DEBPKG:debian/hppa_op_optimize_workaround - https://bugs.debian.org/838613 Temporarily lower the optimization of op.c on hppa due to gcc-6 problems
    DEBPKG:debian/installman-utf8 - https://bugs.debian.org/840211 Generate man pages with UTF-8 characters
    DEBPKG:fixes/file_path_chmod_race - https://bugs.debian.org/863870 [rt.cpan.org #121951] Prevent directory chmod race attack.
    DEBPKG:fixes/extutils_file_path_compat - Correct the order of tests of chmod(). (#294)
    DEBPKG:fixes/getopt-long-2 - [rt.cpan.org #120300] Withdraw part of commit 5d9947fb445327c7299d8beb009d609bc70066c0, which tries to implement more GNU getopt_long campatibility. GNU
    DEBPKG:fixes/getopt-long-3 - provide a default value for optional arguments
    DEBPKG:fixes/getopt-long-4 - https://bugs.debian.org/864544 [rt.cpan.org #122068] Fix issue #122068.
    DEBPKG:fixes/fbm-instr-crash - [bb152a4] [perl #131575] don't call Perl_fbm_instr() with negative length
    DEBPKG:fixes/test-builder-reset - https://bugs.debian.org/865894 Reset inside subtest maintains parent
    DEBPKG:debian/CVE-2016-1238/base-pm-amends-pt2 - [a77da41] Limit dotless-INC effect on base.pm with guard:
    DEBPKG:debian/hppa_opmini_optimize_workaround - https://bugs.debian.org/869122 Lower the optimization level of opmini.c on hppa
    DEBPKG:debian/sh4_op_optimize_workaround - https://bugs.debian.org/869373 Also lower the optimization level of op.c and opmini.c on sh4
    DEBPKG:fixes/json-pp-example - [rt.cpan.org #92793] https://bugs.debian.org/871837 fix RT-92793: bug in SYNOPSIS
    DEBPKG:debian/customized - Update customized.dat for files patched in Debian
    DEBPKG:fixes/CVE-2017-12837 - https://bugs.debian.org/875596 [perl #131582] [66288bb] regcomp [perl #131582]
    DEBPKG:fixes/CVE-2017-12883 - https://bugs.debian.org/875597 [perl #131598] [2692dda] PATCH: [perl #131598]


@INC for perl 5.26.0:
    /home/jima/lib/perl
    /home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi
    /home/jima/perl5/lib/perl5/5.26.0/x86_64-linux-gnu-thread-multi
    /home/jima/perl5/lib/perl5/5.26.0
    /home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi
    /home/jima/perl5/lib/perl5
    /etc/perl
    /usr/local/lib/x86_64-linux-gnu/perl/5.26.0
    /usr/local/share/perl/5.26.0
    /usr/lib/x86_64-linux-gnu/perl5/5.26
    /usr/share/perl5
    /usr/lib/x86_64-linux-gnu/perl/5.26
    /usr/share/perl/5.26
    /usr/local/lib/site_perl
    /usr/lib/x86_64-linux-gnu/perl-base


Environment for perl 5.26.0:
    HOME=/home/jima
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=C
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/jima/perl5/bin:/home/jima/bin:/home/jima/jima_tools/x86_64/bin:/home/jima/jima_tools/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/bin/X11:/usr/local/bin:/usr/local/sbin:/usr/games:/usr/local/games:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/db/bin:/usr/lib/jvm/java-8-oracle/jre/bin:.
    PERL5LIB=/home/jima/lib/perl:/home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi:/home/jima/perl5/lib/perl5
    PERL_BADLANG (unset)
    PERL_LOCAL_LIB_ROOT=/home/jima/perl5
    PERL_MB_OPT=--install_base /home/jima/perl5
    PERL_MM_OPT=INSTALL_BASE=/home/jima/perl5
    SHELL=/bin/bash

@p5pRT
Author

p5pRT commented Dec 28, 2017

From @iabyn

On Sat, Dec 23, 2017 at 12:05:35PM -0800, via RT wrote:

If $1 is passed as an arg to a function, and that function
internally performs a regex match, then the argument seen from
inside the func is corrupted.

I'm guessing this is because the localization of $1 does not also localize
aliases to $1 such as in @_. This is a nasty trap, and it would be great
if perl could at least diagnose it if it happens (the passed-in $1 is,
after all, nominally read-only and a direct assignment results in a
fatal "Modification of a read-only value attempted"; so one could argue
that any operation which similarly could modify that argument should
be flagged as well).

$1 et al act like tied variables: whenever their value is retrieved, they
are set to a value from the current match. They are not scoped or
localised, but the match object is.

This should explain all the behaviour you see.
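
Here is a small sketch that makes the "tied variable" description concrete. The names `$ref` and `@seen` are arbitrary. The code takes a reference to `$1` once and reads through it after different matches. The container stays the same, but the value it reports follows whichever successful match is currently in scope, including the restore when a block ends.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $ref = \$1;              # a reference to the one global $1 container
my @seen;                   # values read through $ref, in order

"outer" =~ /(\w+)/;
push @seen, $$ref;          # "outer"

{
    "inner" =~ /(\w+)/;     # this match is scoped to the enclosing block
    push @seen, $$ref;      # "inner": same container, newer match
}

push @seen, $$ref;          # "outer" again: the inner match went out of scope

print "@seen\n";            # outer inner outer
```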

If not fixable or catchable, then I'd like to suggest adding an explicit
mention of this trap to the docs, e.g. «perlsub».

I can't see any sane way to fix this without introducing weird
special-cased behaviour, e.g. turning every bare $N in a function call's
args into a "$N".

I suppose in places where $1 et al could get aliased (such as function
calls and maybe foreach) a warning could be emitted, but that might be
noisy. I don't know whether there are valid use cases, but grepping CPAN
shows 1400+ distributions matching foo($N,...), although some of the foo's
are things like substr and index.
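
The foreach case mentioned above can be sketched the same way. This is a hedged illustration: the loop variable `$v` and the arrays are arbitrary names. The first loop aliases `$v` to `$1` itself. The second loop iterates over the string copy `"$1"`.

```perl
use strict;
use warnings;

my (@aliased, @copied);

if ("foo" =~ /(.*)/) {
    for my $v ($1) {            # $v aliases the $1 container itself
        push @aliased, $v;      # "foo"
        "bar" =~ /(.*)/;        # a newer match inside the loop body
        push @aliased, $v;      # "bar": reading $v re-reads $1
    }
}

if ("foo" =~ /(.*)/) {
    for my $v ("$1") {          # "$1" makes an independent string copy
        push @copied, $v;       # "foo"
        "bar" =~ /(.*)/;
        push @copied, $v;       # still "foo"
    }
}

print "aliased: @aliased\n";    # aliased: foo bar
print "copied:  @copied\n";     # copied:  foo foo
```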

--
"Strange women lying in ponds distributing swords is no basis for a system
of government. Supreme executive power derives from a mandate from the
masses, not from some farcical aquatic ceremony."
  -- Dennis, "Monty Python and the Holy Grail"

@p5pRT
Author

p5pRT commented Dec 28, 2017

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Dec 28, 2017

From @jimav

On 12/28/17 3:40 AM, Dave Mitchell via RT wrote:

On Sat, Dec 23, 2017 at 12:05:35PM -0800, via RT wrote:

If $1 is passed as an arg to a function, and that function
internally performs a regex match, then the argument seen from
inside the func is corrupted
I can't see any sane way to fix this without introducing weird
special-cased behaviour, e.g. turning every bare $N in a function call's
args into a "$N"

It sounds like there's likely nothing to do about it, and making args
into "$1" etc would copy data.  I realize now that since the match
object doesn't contain matched data, there is no way to make $1 a real
alias because there's nothing to alias it to.

But I can think of a non-trivial solution...

Replace references to $N, %+ and related vars when they appear in
sub/foreach args with a dynamically-created object which decorates the
normal thingie with a check that the current match result is the same
one which was current when the arg was created; if not, it would throw
an error "reference to no-longer-current match result".  I suppose there
might exist code which actually wants a $N passed as an arg to reference
a to-be-created-in-the-future match result.

-Jim

@p5pRT
Author

p5pRT commented Dec 29, 2017

From @demerphq

On 28 Dec 2017 12:41, "Dave Mitchell" <davem@iabyn.com> wrote:

On Sat, Dec 23, 2017 at 12:05:35PM -0800, via RT wrote:

If $1 is passed as an arg to a function, and that function
internally performs a regex match, then the argument seen from
inside the func is corrupted.

I'm guessing this is because the localization of $1 does not also localize
aliases to $1 such as in @_. This is a nasty trap, and it would be great
if perl could at least diagnose it if it happens (the passed-in $1 is,
after all, nominally read-only and a direct assignment results in a
fatal "Modification of a read-only value attempted"; so one could argue
that any operation which similarly could modify that argument should
be flagged as well).

$1 et al act like tied variables: whenever their value is retrieved, they
are set to a value from the current match. They are not scoped or
localised, but the match object is.

This should explain all the behaviour you see.

Well, that combined with the fact that perl is a pass by alias language.

The OP seems to expect pass by value semantics, which is simply a
fundamental misunderstanding of how Perl's @_ works.

If not fixable or catchable, then I'd like to suggest adding an explicit
mention of this trap to the docs, e.g. «perlsub».

I can't see any sane way to fix this without introducing weird
special-cased behaviour, e.g. turning every bare $N in a function call's
args into a "$N".

I don't think there is anything to fix. Newcomers to perl encounter this at
some point, then learn not to do this, either by copying regex vars early
or by explicitly copying the vars as arguments by double quoting them. You
will find this issue raised countless times on perlmonks.

I suppose in places where $1 et al could get aliased (such as function
calls and maybe foreach) a warning could be emitted, but that might be
noisy. I don't know whether there are valid use cases, but grepping cpan
shows 1400+ distributions matching foo($N,...), although some of the foo's
are things like substr and index.

Are we to do this for every tied object? How are we to know which are
volatile?

A doc patch might be in order but IMO no more; vars like $! and $1 are
volatile, and it is the programmer's responsibility to copy them to non-volatile
storage or suffer the consequences.

Yves

@p5pRT
Author

p5pRT commented Dec 29, 2017

From @jimav

On 12/29/17 1:07 AM, yves orton via RT wrote:

the fact that perl is a pass by alias language
The OP seems to expect pass by value semantics which is simply a
fundamental misunderstanding of how Perl's @_ works

Not exactly.   Unlike almost anything else in Perl, if $1 is passed as
an argument, it is not an alias to the caller's match result -- it is
more like a _name_ which is effectively eval'd inside the sub each time
it is referenced.  If a parameter bound to $1 actually aliased the
captured text, it would still refer to that text after another match
result was lexically pushed inside the sub.

Consider this analogy:

  sub func($) { local $_ = "bar"; print "func called with $_[0]\n"; }
  $_ = "foo";
  func($_);

No sane programmer would expect the function to print "bar", and it
doesn't.
The arg aliases $_, but the alias points to the _data_, not the name "$_".

But this:

  sub func($) { "bar" =~ /(.*)/; print "func called with $_[0]\n"; }
  "foo" =~ /(.*)/;
  func($1);

does what no sane programmer would expect, i.e., it prints "bar".

Now I think Dave Mitchell's idea of converting $1 to "$1" when passed as
a sub arg might really be a Good Thing.  In an ideal universe Perl might
allow aliases which refer to a _substring_, and then $1 could really
refer directly to the captured text.  But it can't, so making a copy
when passed as a sub arg is good enough to shield the sub's code from
having to be aware of this issue. Bear in mind that $1 can't be used as
an lvalue anyway, so neither can $_[n] if it aliases $1; so
pass-by-value in this case should be invisible.

@p5pRT
Author

p5pRT commented Dec 29, 2017

From @cpansprout

On Fri, 29 Dec 2017 13:35:53 -0800, jim.avera@gmail.com wrote:

In an ideal universe Perl might
allow aliases which refer to a _substring_,

Perl already does that with the return value from substr(). This works:

$_ = "HELO";
for (substr $_, 1, 1) {
  $_ = "EL";
}

except that the special substr scalar does get its own copy of the substring internally, which is unavoidable due to the requirement that string buffers end in a null.

and then $1 could really
refer directly to the captured text.

I wondered for a moment why $1 could not be like a substr scalar, but then I realized: you can modify the original string and $1 does not change.

However, $1 currently *does* retrieve its string value dynamically from the pre-match copy, which because of COW is usually the original string buffer. (But, again, because of null-termination, $1 does get its own copy of the string buffer when you use it.)
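
Father Chrysostomos's distinction can be checked with a short sketch. Variable names are arbitrary. An lvalue `substr` writes through to its string. `$1`, by contrast, keeps reporting the text captured at match time even after the subject string is reassigned.

```perl
use strict;
use warnings;

# substr() as an lvalue: assigning through the alias edits the original string.
my $greeting = "HELO";
for (substr $greeting, 1, 1) {
    $_ = "EL";
}
print "$greeting\n";            # HELLO

# $1 is not a live view into the subject: it reads from a copy made at
# match time, so reassigning the subject afterwards leaves $1 alone.
my $subject = "abc";
my $captured;
if ($subject =~ /(b)/) {
    $subject  = "xyz";
    $captured = $1;
}
print "$captured\n";            # b
```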

But it can't, so making a copy
when passed as a sub arg is good enough to shield the sub's code from
having to be aware of this issue. Bear in mind that $1 can't be used as
an lvalue anyway, so neither can $_[n] if it aliases $1; so
pass-by-value in this case should be invisible.

It wouldn’t be invisible, as referential identity would be lost. It might break a lot of introspection code.
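
The loss of referential identity is easy to observe. In this sketch, `arg_ref` is a hypothetical helper that returns a reference to its first argument. Passing `$x` yields a reference to `$x` itself. Passing the copy `"$x"` yields a reference to a different scalar. That difference is exactly what an automatic `$1` to `"$1"` conversion would expose to introspecting code.

```perl
use strict;
use warnings;

# Hypothetical helper: return a reference to whatever container $_[0] aliases.
sub arg_ref { return \$_[0] }

my $x = "data";
my $same_by_alias = (arg_ref($x)   == \$x) ? 1 : 0;  # alias: the same SV
my $same_by_copy  = (arg_ref("$x") == \$x) ? 1 : 0;  # copy: a fresh SV

print "alias keeps identity: $same_by_alias\n";     # 1
print "copy keeps identity:  $same_by_copy\n";      # 0
```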

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Dec 30, 2017

From @demerphq

On 29 December 2017 at 22:35, Jim Avera <jim.avera@gmail.com> wrote:

On 12/29/17 1:07 AM, yves orton via RT wrote:

the fact that perl is a pass by alias language
The OP seems to expect pass by value semantics which is simply a
fundamental misunderstanding of how Perl's @_ works

Not exactly. Unlike almost anything else in Perl, if $1 is passed as an
argument, it is not an alias to the caller's match result -- it is more like
a _name_ which is effectively eval'd inside the sub each time it is
referenced. If a parameter bound to $1 actually aliased the captured text,
it would still refer to that text after another match result was lexically
pushed inside the sub.

Consider this analogy:

sub func($) { local $_ = "bar"; print "func called with $_[0]\n"; }
$_ = "foo";
func($_);

No sane programmer would expect the function to print "bar", and it doesn't.

This is not the same thing. local changes which SV an identifier
resolves to; it does not change the SV itself, and it does not
interfere with any refs or aliases to other versions.

  local $foo= "bar";
  my $bar_ref= \$foo;
  local $foo= "baz";
  print $$bar_ref;

prints out "bar" as I would expect. $_[0] in the case you showed is
still an alias to whatever SV $_ was pointing at in the first place.
Localization did not modify that var in any way.

Put another way, after the local call there are *two* SV's in
existence. With the regex case there is only one, $1.

The arg aliases $_, but the alias points to the _data_, not the name "$_".

It points at the _container_ SV. _data_ implies that it is a value; it
is not, it is a container.

It is not uncommon for even experienced Perl programmers to conflate
values and containers when discussing scalars. 1 is a value. $x is a
scalar container which may contain 1.

Aliasing occurs at the *container* level.
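
The container/value distinction can be shown with a sketch. The sub `bump` is a made-up name. `$_[0]` aliases whatever container the caller passed, so incrementing it changes the caller's variable. A literal has no modifiable container behind it, so the same operation dies instead.

```perl
use strict;
use warnings;

sub bump { $_[0]++ }            # increments whatever container $_[0] aliases

my $n = 1;
bump($n);                       # $_[0] is an alias to the container $n
print "n is now $n\n";          # n is now 2

# A literal such as 1 is a read-only value, not a modifiable container,
# so the same operation dies with a read-only modification error.
my $ok  = eval { bump(1); 1 };
my $err = $ok ? '' : $@;
print $ok ? "no error\n" : "error: $err";
```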

But this:

sub func($) { "bar" =~ /(.*)/; print "func called with $_[0]\n"; }
"foo" =~ /(.*)/;
func($1);

does what no sane programmer would expect, i.e., it prints "bar".

It does what every experienced Perl programmer would expect.

$1 is a container which when used as an rvalue returns the value of
the most recent successful match in scope.

func($1)

calls func and puts an alias to the container $1 into $_[0].

You could just as easily have said:

  func("$1")

and created a copy. Or you could have written func() like this​:

  sub func { my $thing= shift; "bar" =~ /(.*)/; print "func called with $thing\n"; }

and created a copy inside the func instead.
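
Here are the three calls side by side, as a sketch. `sees_alias` and `sees_copy` are illustrative names. Like `func` above, each sub runs its own match on "bar" before looking at its argument.

```perl
use strict;
use warnings;

# Each sub runs its own match on "bar" before using its argument.
sub sees_alias { "bar" =~ /(.*)/; return "$_[0]" }                  # reads $_[0] after matching
sub sees_copy  { my $thing = shift; "bar" =~ /(.*)/; return $thing } # copies before matching

my @results;
if ("foo" =~ /(.*)/) {
    push @results, sees_alias($1);     # "bar": $_[0] aliases the $1 container
    push @results, sees_alias("$1");   # "foo": the caller passed a copy
    push @results, sees_copy($1);      # "foo": the callee copied on entry
}
print "@results\n";                    # bar foo foo
```

The two fixes differ only in who pays for the copy: quoting at the call site protects one call, while copying on entry protects every caller.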

This is a standard issue with aliasing. Because @_ contains aliases to
the arguments, it is potentially volatile, and if this breaks your
expectations then you should make a copy.

Here is an example which I consider to be exactly equivalent to the
issue with $1 and @_, but which uses no regex magic. IMO it is very
clear that how it behaves is by design and that none of this is a bug,
no matter how surprising you might consider it to be:

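  sub othersub { $_[0]->{x}++ }
  sub whatever {
    my ($value,$hashref)= @_;
    print "$value:$_[0]\n";   # $value is a copy, $_[0] aliases $hash{x}
    othersub($hashref);       # bumps $hash{x} through the hashref
    print "$value:$_[0]\n";   # the copy is unchanged, the alias sees 2
    $_[0]+=2;                 # writes through the alias to $hash{x}
  }
  my %hash=(x=>1);
  whatever($hash{x},\%hash);
  print "$hash{x}\n";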

which prints out:

1:1
1:2
4

Which to me is no different from the regex case.

So to me this thread is basically the result of mistaken assumptions
about how aliasing works and how regex magic variables work. We can
improve the docs to explain this stuff better, but I strongly feel
there is no bug here, and the best we can do is improve how we educate
people about the subtleties.

I mean, a simple rule is:

Operating on @_ directly has subtle implications which may surprise
the unwary or inexperienced. Copying the arguments as early as
possible ensures that many of these traps are avoided, and should be
general practice. In particular the programmer should remember that
any argument in @_ could be volatile, and operations performed by the
subroutine may result in the arguments changing value between the time
of entry to the subroutine and the time of access of the variable.
When in doubt copy early.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Author

p5pRT commented Dec 31, 2017

From @jimav

On 12/30/17 4:25 AM, yves orton via RT wrote:

The programmer should remember that any argument in @_ could be volatile, and operations performed by the
subroutine may result in the arguments changing value...

Thanks, I understand what you are saying.  But a programmer should only
need to worry about "operations" which a) refer to sub arguments, or b)
use dynamic variables which have not first been localized within
the sub.  I don't think application programmers should have to defend
against weird tied vars or equivalent which break normal localization
semantics.

The key point is that perlvar says "These variables are read-only and
dynamically-scoped".

As you mentioned, $1 is not a normal variable but "is a container
which...returns the value of the most recent successful match in
scope". And I think that is the crux of the problem: It does not behave
like "dynamically scoped" variables elsewhere in Perl.

If $1 were implicitly localized in scopes containing a regex match, but
otherwise behaved normally, then passing $1 to a sub would create an
alias to an SV and inside the sub $1 would, after being localized, point
to a different SV.  There would be no trap.

In reality, only the _data_ of match results is dynamically scoped,
magically, behind the scenes.  The variables used to get at that data
are effectively crippled so that localization has no effect (even an
explicit "local $1" does nothing).

perl cannot localize $1 as long as captured text is not actually stored
anywhere as such.  That's efficient, but feels like a semantic wart.

If this behavior isn't changed, then, perhaps the docs could be modified
along these lines:

  <perlvar> could say "Capture result data is read-only and
dynamically-scoped. However the variables $1, %+ et al. are magical
and cannot be localized; these variables I<and any aliases> always
return the most recent successful match results which are in scope at
the point of reference. For example, if you pass $1 as an argument to
a sub, then the sub must copy $_[n] before performing its own regex
match in order to see the caller's intended argument."

  <perlre> might say [as-is:] "Capture group contents are dynamically
scoped and available..." [added:] "However $1 and related variables, and
any aliases for them, cannot be localized and always refer to the
most recent successful match which is in scope at the point where the
variable or alias is referenced."
