bug in the 'exists' function #1288

Closed
p5pRT opened this issue Mar 8, 2000 · 25 comments

@p5pRT

p5pRT commented Mar 8, 2000

Migrated from rt.perl.org#2285 (status was 'resolved')

Searchable as RT2285$

@p5pRT

p5pRT commented Mar 8, 2000

From AMOSCM@eurodis.com

Hi

This email has 3 sections​:

1. perl version details from 'perl -V'
2. perl program demonstrating the (I think) bug
3. output from the program

Let me know if I can be of help

Thanks
Chris Amos.



Section 1 - Perl version

Summary of my perl5 (5.0 patchlevel 4 subversion 4) configuration​:
  Platform​:
  osname=dec_osf, osvers=4.0, archname=alpha-dec_osf
  uname='osf1 nasaxp.rto.dec.com v4.0 564 alpha '
  hint=recommended, useposix=true, d_sigaction=define
  bincompat3=y useperlio=undef d_sfio=undef
  Compiler​:
  cc='cc', optimize='-O4', gccversion=
  cppflags='-std -D_INTRINSICS -I/usr/local/include -D__LANGUAGE_C__'
  ccflags ='-std -D_INTRINSICS -I/usr/local/include -D__LANGUAGE_C__'
  stdchar='unsigned char', d_stdstdio=define, usevfork=false
  voidflags=15, castflags=0, d_casti32=define, d_castneg=define
  intsize=4, alignbytes=8, usemymalloc=y, prototype=define
  Linker and Libraries​:
  ld='ld', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /usr/shlib /shlib /lib /usr/lib /usr/ccs/lib
  libs=-lgdbm -ldbm -ldb -lm
  libc=/usr/shlib/libc.so, so=so
  useshrplib=true, libperl=libperl.so
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='
-Wl,-rpath,/usr/opt/PERL5004/lib/perl5/alpha-dec_osf/5.00404/CORE'
  cccdlflags=' ', lddlflags='-shared -expect_unresolved "*" -O4 -msym -s
-L/usr/local/lib'

Characteristics of this binary (from libperl):
  Built under dec_osf
  Compiled at Nov 7 1997 20:19:49
  @INC:
  /usr/opt/PERL5004/lib/perl5/alpha-dec_osf/5.00404
  /usr/opt/PERL5004/lib/perl5
  /usr/opt/PERL5004/lib/perl5/site_perl/alpha-dec_osf
  /usr/opt/PERL5004/lib/perl5/site_perl
  .



Section 2 - the program

$h{key1_x}{key2_x} = 'I exist';

#
# Does the element we created exist?
#
print "Test 1 - 2D hash element we know exists\n";

if (exists $h{key1_x}{key2_x})
{
  #
  # The hash element exists - that's what we expected
  #
  print "Hoorahh! he exists​: '$h{key1_x}{key2_x}'\n";
}
else
{
  #
  # The hash element doesn't exist - that's bad
  #
  print "Uh oh, he's gone awol\n";
}

#################################################################

#
# Change key 1 and key2 of the hash - does it exist?
#
print "Test 2 - 2D hash element we know doesn't exists (new key1 and key2)\n";

if (exists $h{key1_y}{key2_y})
{
  #
  # The hash element exists - that's bad
  #
  print "Uh oh, he does exist​: $h{key1_y}{key2_y}\n";
}
else
{
  #
  # The hash element doesn't exist - that's what we expected
  #
  print "Hoorahh! he doesn't exist\n";
}

#################################################################

print "Test 3 - Does key1 of the 2D hash exist (same key1 as test 2)?\n";

#
# Use key1 from test 2 - it shouldn't exist.
#
if (exists $h{key1_y})
{
  #
  # The hash element exists - that's bad
  #
  print "Uh oh, he does exist​: $h{key1_y}\n";
}
else
{
  #
  # The hash element doesn't exist - that's what we expected
  #
  print "Hoorahh! he doesn't exist\n";
}



Section 3 - The program output

Test 1 - 2D hash element we know exists
Hoorahh! he exists​: 'I exist'
Test 2 - 2D hash element we know doesn't exists (new key1 and key2)
Hoorahh! he doesn't exist
Test 3 - Does key1 of the 2D hash exist (same key1 as test 2)?
Uh oh, he does exist​: HASH(0x14000b718)



@p5pRT

p5pRT commented Mar 8, 2000

From [Unknown Contact. See original ticket]

Chris Amos writes​:
| Hi
|
| This email has 3 sections​:
|
| 1. perl version details from 'perl -V'
| 2. perl program demonstrating the (I think) bug
| 3. output from the program
|
| <mostly snipped>

| if (exists $h{key1_y}{key2_y})
| {
| print "Uh oh, he does exist​: $h{key1_y}{key2_y}\n";
| }
| else
| {
| print "Hoorahh! he doesn't exist\n";
| }
|
| if (exists $h{key1_y})
| {
| print "Uh oh, he does exist​: $h{key1_y}\n";
| }

Chris,
  There is no bug here. The test exists $h{key1_y}{key2_y} autovivifies
(calls into existence) all intermediate keys; i.e., $h{key1_y} here.
This is documented; from perldoc -f exists​:


  if (exists &{$ref->{A}{B}{$key}}) { }

Although the deepest nested array or hash will not
spring into existence just because its existence
was tested, any intervening ones will. Thus
`$ref->{"A"}' and `$ref->{"A"}->{"B"}' will spring
into existence due to the existence test for the
$key element above. This happens anywhere the
arrow operator is used, including even​:

  undef $ref;
  if (exists $ref->{"Some key"}) { }
  print $ref; # prints HASH(0x80d3d5c)


HTH,
Mx.
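
The behaviour described above can be reproduced with a minimal sketch. It reuses the key names from the report; the variable names `$before`, `$leaf`, `$after` and `$inner_keys` are only illustrative.

```perl
use strict;
use warnings;

my %h;
$h{key1_x}{key2_x} = 'I exist';

# Before any two-level test, key1_y is absent.
my $before = exists $h{key1_y} ? 1 : 0;

# The two-level test says "no" for the leaf, but creates $h{key1_y} on the way.
my $leaf = exists $h{key1_y}{key2_y} ? 1 : 0;

# The one-level test now finds the autovivified (and still empty) inner hash.
my $after      = exists $h{key1_y} ? 1 : 0;
my $inner_keys = scalar keys %{ $h{key1_y} };

print "before=$before leaf=$leaf after=$after inner_keys=$inner_keys\n";
# prints: before=0 leaf=0 after=1 inner_keys=0
```

The inner hash is empty, which matches the quoted documentation: the deepest element is not created, only the levels leading to it.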

@p5pRT

p5pRT commented Mar 8, 2000

From [Unknown Contact. See original ticket]

$h{key1_x}{key2_x} = 'I exist';
if (exists $h{key1_x}{key2_x})
if (exists $h{key1_y}{key2_y})
if (exists $h{key1_y})

This has nothing to do with what function you call. It's the fact
that Perl evaluates a function's arguments before it calls the
function. That way when you call fn(2**8), that function fn sees one
argument, which is already in the form of the number 256, not as
the unevaluated "2**8". When you evaluate an expression using a
dereferencing arrow, even an implicit one that you don't see
there, the act of that evaluation fills in the thing you're looking
at. Again, that it should happen to be exists() is completely
immaterial.

You're implicitly using the dereference arrow. Whenever you
dereference something, whether in lvalue or rvalue context, you
autovivify its left-hand operand.

  undef %h;
  $h{key1_x}{key2_x} = 'I define';
  if (defined $h{key1_x}{key2_x}) { print "yes 1\n" }
  if (defined $h{key1_y}{key2_y}) { print "yes 2\n" }
  if (defined $h{key1_y}) { print "yes 3\n" }
leads to
  yes 1
  yes 3

And​:

  undef %h;
  $h{key1_x}{key2_x} = 'I am';
  if ($h{key1_x}{key2_x}) { print "yes 4\n" }
  if ($h{key1_y}{key2_y}) { print "yes 5\n" }
  if ($h{key1_y}) { print "yes 6\n" }
leads to
  yes 4
  yes 6

In fact, even

  undef $hr;
  if ($hr) { print "yes 1" }
  if ($hr->[4]) { print "yes 2" }
  if ($hr) { print "yes 3" }
leads to
  yes 3

From perlfunc​:

  Note that the EXPR can be arbitrarily complicated as long as
  the final operation is a hash or array key lookup or subroutine
  name​:

  if (exists $ref->{A}->{B}->{$key}) { }
  if (exists $hash{A}{B}{$key}) { }

  if (exists $ref->{A}->{B}->[$ix]) { }
  if (exists $hash{A}{B}[$ix]) { }

  if (exists &{$ref->{A}{B}{$key}}) { }

  Although the deepest nested array or hash will not spring into
  existence just because its existence was tested, any intervening
  ones will. Thus $ref->{"A"} and $ref->{"A"}->{"B"} will
  spring into existence due to the existence test for the $key
  element above. This happens anywhere the arrow operator is
  used, including even​:

  undef $ref;
  if (exists $ref->{"Some key"}) { }
  print $ref; # prints HASH(0x80d3d5c)

  This surprising autovivification in what does not at first--or
  even second--glance appear to be an lvalue context may be fixed
  in a future release.

--tom
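
As a small sketch of the point that exists() is incidental, the snippet below performs nothing but a plain read through an undefined reference. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $ref;                                   # starts out undefined
my $was_defined = defined $ref ? 1 : 0;

my $value = $ref->[4];                     # a plain read; no exists or defined involved

my $kind  = ref $ref;                      # the read turned $ref into an array reference
my $count = scalar @$ref;                  # but element 4 itself was not created

print "was_defined=$was_defined kind=$kind count=$count\n";
# prints: was_defined=0 kind=ARRAY count=0
```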

@p5pRT

p5pRT commented Mar 8, 2000

From [Unknown Contact. See original ticket]

if (exists &{$ref->{A}{B}{$key}}) { }

Although the deepest nested array or hash will not
spring into existence just because its existence
was tested, any intervening ones will.

Either people aren't reading that, or they aren't understanding
it. They keep thinking that this has anything to do with exists
(it doesn't) and now that I look at it, the phrasing there
may not help them out of this confusion. I wonder whether this
would help.

  As a workaround for rvalued autovivification of dereferencing
  an undefined reference, instead of writing this​:

  $ref->[$x][$y]{$key}[2] ) { ... }

  write this​:

  if ($ref &&
  $ref->[$x] &&
  $ref->[$x][$y] &&
  $ref->[$x][$y]{$key} &&
  $ref->[$x][$y]{$key}[2] ) { ... }

  Likewise, instead of writing this​:

  $answer = $ref->[$x][$y]{$key}[2];

  write this​:

  $answer = $ref &&
  $ref->[$x] &&
  $ref->[$x][$y] &&
  $ref->[$x][$y]{$key} &&
  $ref->[$x][$y]{$key}[2];

Thank goodness && returns the real thing, not just 1 or 0.
Or else they'd have to write​:

  $answer = ( $ref &&
  $ref->[$x] &&
  $ref->[$x][$y] &&
  $ref->[$x][$y]{$key}
  ) ? $ref->[$x][$y]{$key}[2]
  : undef;

Eek.

--tom
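
A short sketch comparing the two forms, assuming an empty array reference as the starting point (`$plain` and `$careful` are made-up names). It checks only what each read leaves behind in the structure.

```perl
use strict;
use warnings;

my ($x, $y, $key) = (0, 1, 'k');

# Unguarded read: every intermediate level is created on the way down.
my $plain     = [];
my $unguarded = $plain->[$x][$y]{$key}[2];

# Guarded read: each level is tested before it is dereferenced,
# so the chain stops at the first missing level.
my $careful = [];
my $guarded = $careful
           && $careful->[$x]
           && $careful->[$x][$y]
           && $careful->[$x][$y]{$key}
           && $careful->[$x][$y]{$key}[2];

my $plain_size   = scalar @$plain;
my $careful_size = scalar @$careful;
print "plain=$plain_size careful=$careful_size\n";
# prints: plain=1 careful=0
```

One caveat with the guarded form: it yields the first false value it meets, so a stored 0 or empty string at any level ends the chain just as a missing level does.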

@p5pRT

p5pRT commented Mar 8, 2000

From [Unknown Contact. See original ticket]

As a workaround for rvalued autovivification of dereferencing
an undefined reference, instead of writing this​:

  $ref->[$x][$y]{$key}[2] ) { ... }

Should of course be

  if ( $ref->[$x][$y]{$key}[2] ) { ... }

--tom

@p5pRT

p5pRT commented Mar 8, 2000

From [Unknown Contact. See original ticket]

Hi Guys

I looked this up on my online docs before logging the "bug". My docs
definitely do not say anything about the autovivification. My perl version
is a couple of years old though. I was using the "man perlfunc" manpage.

Thanks for looking at it though. I worked around it by checking the
existence of the first reference before dereferencing it.

Cheers
Chris

-----Original Message-----
From: Tom Christiansen [SMTP:tchrist@chthon.perl.com]
Sent: Wednesday, March 08, 2000 12:50 PM
To: Martyn Pearce
Cc: Chris Amos; perl5-porters@perl.org; tchrist@chthon.perl.com
Subject: Re: [ID 20000308.003] bug in the 'exists' function

if (exists &{$ref->{A}{B}{$key}}) { }

Although the deepest nested array or hash will not
spring into existence just because its existence
was tested, any intervening ones will.

Either people aren't reading that, or they aren't understanding
it. They keep thinking that this has anything to do with exists
(it doesn't) and now that I look at it, the phrasing there
may not help them out of this confusion. I wonder whether this
would help.

As a workaround for rvalued autovivification of dereferencing
an undefined reference, instead of writing this:

    $ref->[$x][$y]{$key}[2] ) { ... }

write this:

    if ($ref                 &&
        $ref->[$x]           &&
        $ref->[$x][$y]       &&
        $ref->[$x][$y]{$key} &&
        $ref->[$x][$y]{$key}[2] ) { ... }

Likewise, instead of writing this:

    $answer = $ref->[$x][$y]{$key}[2];

write this:

    $answer = $ref                 &&
              $ref->[$x]           &&
              $ref->[$x][$y]       &&
              $ref->[$x][$y]{$key} &&
              $ref->[$x][$y]{$key}[2];

Thank goodness && returns the real thing, not just 1 or 0.
Or else they'd have to write:

    $answer = (   $ref                 &&
                  $ref->[$x]           &&
                  $ref->[$x][$y]       &&
                  $ref->[$x][$y]{$key}
              ) ? $ref->[$x][$y]{$key}[2]
                : undef;

Eek.

--tom

@p5pRT

p5pRT commented Mar 8, 2000

From [Unknown Contact. See original ticket]

"Tom" == Tom Christiansen <tchrist@​chthon.perl.com> writes​:

  >> if (exists &{$ref->{A}{B}{$key}}) { } Although the deepest
  >> nested array or hash will not spring into existence just
  >> because its existence was tested, any intervening ones will.

  Tom> Either people aren't reading that, or they aren't
  Tom> understanding it. They keep thinking that this has anything
  Tom> to do with exists (it doesn't) and now that I look at it, the
  Tom> phrasing there may not help them out of this confusion. I
  Tom> wonder whether this would help.

  Tom> As a workaround for rvalued autovivification of dereferencing
  Tom> an undefined reference, instead of writing this​:

  Tom> $ref->[$x][$y]{$key}[2] ) { ... }

  Tom> write this​:

  Tom> if ($ref &&
  Tom> $ref->[$x] &&
  Tom> $ref->[$x][$y] &&
  Tom> $ref->[$x][$y]{$key} &&
  Tom> $ref->[$x][$y]{$key}[2] ) { ... }

  Tom> Likewise, instead of writing this​:

  Tom> $answer = $ref->[$x][$y]{$key}[2];

  Tom> write this​:

  Tom> $answer = $ref &&
  Tom> $ref->[$x] &&
  Tom> $ref->[$x][$y] &&
  Tom> $ref->[$x][$y]{$key} &&
  Tom> $ref->[$x][$y]{$key}[2];

  Tom> Thank goodness && returns the real thing, not just 1 or 0.
  Tom> Or else they'd have to write​:

  Tom> $answer = ( $ref &&
  Tom> $ref->[$x] &&
  Tom> $ref->[$x][$y] &&
  Tom> $ref->[$x][$y]{$key}
  Tom> ) ? $ref->[$x][$y]{$key}[2]
  Tom> : undef;

  Tom> Eek.

  Tom> --tom

I put a module up on CPAN to do this. Hash::NoVivify allows you to do:

  $answer = Exists($ref, $x, $y, $key, 2) ?
      $ref->[$x]->[$y]->{$key}->[2] :
      undef;

@p5pRT

p5pRT commented Mar 8, 2000

From [Unknown Contact. See original ticket]

> I put a module up on CPAN to do this. Hash::NoVivify allows you to do:
>
>   $answer = Exists($ref, $x, $y, $key, 2) ?
>       $ref->[$x]->[$y]->{$key}->[2] :
>       undef;

How does it know [] vs {}?
Also, oughtn't NoVivify be Mortify? :-)

--tom

@p5pRT

p5pRT commented Mar 8, 2000

From @tamias

On Wed, Mar 08, 2000 at 10:11:34AM -0700, Tom Christiansen wrote:

>> I put a module up on CPAN to do this. Hash::NoVivify allows you to do:
>>
>>   $answer = Exists($ref, $x, $y, $key, 2) ?
>>       $ref->[$x]->[$y]->{$key}->[2] :
>>       undef;
>
> How does it know [] vs {}?

Probably calls ref on each step and checks whether it's ARRAY or HASH. :)

Ronald
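
The guess above can be sketched as follows. This is not the Hash::NoVivify implementation, which is not shown in this thread; `path_exists` is a hypothetical helper written only to illustrate stepping through a structure with ref() so that no level is ever dereferenced before it is known to be there.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a nested structure one subscript at a time and
# return 0 as soon as a level is missing, so nothing is autovivified.
sub path_exists {
    my ($node, @path) = @_;
    for my $step (@path) {
        my $type = ref $node;
        if ($type eq 'HASH') {
            return 0 unless exists $node->{$step};
            $node = $node->{$step};
        }
        elsif ($type eq 'ARRAY') {
            # Only non-negative integer subscripts inside the current bounds count.
            return 0 unless $step =~ /^\d+$/ && $step < @$node;
            $node = $node->[$step];
        }
        else {
            return 0;    # not a container, so the path cannot continue
        }
    }
    return 1;
}

my $data = [ [ undef, { key => [ 'a', 'b', 'c' ] } ] ];

my $hit  = path_exists($data, 0, 1, 'key', 2);
my $miss = path_exists($data, 0, 5, 'key', 2);
my $size = scalar @{ $data->[0] };    # unchanged by the failed lookup

print "hit=$hit miss=$miss size=$size\n";
# prints: hit=1 miss=0 size=2
```

The array branch treats a slot that holds undef as present; whether that is the right choice depends on what "exists" should mean for arrays in the calling code.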

@p5pRT

p5pRT commented Mar 8, 2000

From @nwc10

On Wed, Mar 08, 2000 at 12:09:29PM -0500, Brent B. Powers wrote:

> "Tom" == Tom Christiansen <tchrist@chthon.perl.com> writes:
>
>   (explanation)
>
> I put a module up on CPAN to do this. Hash::NoVivify allows you to do:
>
>   $answer = Exists($ref, $x, $y, $key, 2) ?
>       $ref->[$x]->[$y]->{$key}->[2] :
>       undef;

Would it be a good idea to add Tom's fuller explanation of the problem
(or at least one paragraph in the same style explaining for
$ref->{A}->{B}->{$key} ) and a mention of Hash::NoVivify to the docs for
exists, which for 5.005_63 (the most recent I've built) reads:

  Note that the EXPR can be arbitrarily complicated as
  long as the final operation is a hash key lookup​:

  if (exists $ref->{A}->{B}->{$key}) { }
  if (exists $hash{A}{B}{$key}) { }

  Although the last element will not spring into
  existence just because its existence was tested,
  intervening ones will. Thus `$ref->{"A"}' and
  `$ref->{"A"}->{"B"}' will spring into existence due
  to the existence test for a $key element. This
  happens anywhere the arrow operator is used,
  including even

  undef $ref;
  if (exists $ref->{"Some key"}) { }
  print $ref; # prints HASH(0x80d3d5c)

  This surprising autovivification in what does not at
  first--or even second--glance appear to be an lvalue
  context may be fixed in a future release.

?

The entry for defined in perlfunc appears to be completely lacking anything
about the gotcha of autovivification of references midway in the expression.
Would it be worth adding a warning there?

Nicholas Clark

@p5pRT

p5pRT commented Mar 8, 2000

From [Unknown Contact. See original ticket]

> The entry for defined in perlfunc appears to be completely lacking anything
> about the gotcha of autovivification of references midway in the expression.
> Would it be worth adding a warning there?

You can say the same thing about *all* functions, operators, and
keywords. This is not a function of the function. This is a
function of the way expression evaluation works.

--tom

@p5pRT

p5pRT commented Mar 8, 2000

From @nwc10

On Wed, Mar 08, 2000 at 10:31:11AM -0700, Tom Christiansen wrote:

>> The entry for defined in perlfunc appears to be completely lacking anything
>> about the gotcha of autovivification of references midway in the expression.
>> Would it be worth adding a warning there?
>
> You can say the same thing about *all* functions, operators, and
> keywords. This is not a function of the function. This is a
> function of the way expression evaluation works.

I agree that it is true for all functions, but I think that it's most likely to
be a user gotcha (it got me) only for defined and exists. It's easy to make
the mistake for defined and exists because one can (wrongly) think of them as
functions that test for definedness or existence and hence assume that they
only test their arguments without vivifying.

Currently perlfunc's exists has a warning paragraph, but perlfunc's defined
does not, which seems inconsistent.

Nicholas Clark

@p5pRT

p5pRT commented Mar 8, 2000

From [Unknown Contact. See original ticket]

> I agree that it is true for all functions, but I think that it's most likely to
> be a user gotcha (it got me) only for defined and exists. It's easy to make
> the mistake for defined and exists because one can (wrongly) think of them as
> functions that test for definedness or existence and hence assume that they
> only test their arguments without vivifying.

They do. They do not autovivify their arguments. If I saw

  if ( $x{1} )
  if ( $x{1} - 2 )
  if ( defined $x{1} )
  if ( exists $x{1} )

These also fail to autovivify their argument​:

  if ( $x{1}[2] )
  if ( $x{1}[2] - 2 )
  if ( defined $x{1}[2] )
  if ( exists $x{1}[2] )

None of these autovivify their argument. However, many
people misunderstand what their argument is.

These, however, do​:

  $x{1}[2]++;
  $x{1}[2] *= 3;
  $x{1}[2] = "fred";

The [2] slot is all those if's are asking about. They do not
have built-in anding. They *do* answer the question. But no
one understands the question they're asking.

--tom
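
A minimal sketch of that distinction, using the same `$x{1}[2]` expression. The test leaves the [2] slot alone and fills in only the path to it, while a modification creates the slot itself. The variable names are illustrative.

```perl
use strict;
use warnings;

my %x;

# The test asks about slot [2]; $x{1} is only the path leading to it.
my $slot_exists    = exists $x{1}[2] ? 1 : 0;
my $path_kind      = ref $x{1};            # the path was filled in
my $len_after_test = scalar @{ $x{1} };    # the slot was not

# A modification is different: it creates the slot itself.
$x{1}[2]++;
my $len_after_incr = scalar @{ $x{1} };

print "slot_exists=$slot_exists path_kind=$path_kind ",
      "len_after_test=$len_after_test len_after_incr=$len_after_incr\n";
# prints: slot_exists=0 path_kind=ARRAY len_after_test=0 len_after_incr=3
```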

@p5pRT

p5pRT commented Mar 8, 2000

From [Unknown Contact. See original ticket]

Tom Christiansen writes:

>> The entry for defined in perlfunc appears to be completely lacking anything
>> about the gotcha of autovivification of references midway in the expression.
>> Would it be worth adding a warning there?
>
> You can say the same thing about *all* functions, operators, and
> keywords. This is not a function of the function. This is a
> function of the way expression evaluation works.

While this is absolutely correct, the behaviour you describe is *also*
a bug. While documenting a workaround for this bug, please be careful
in your wording to not codify the current (mis)behaviour.

The bug is that $foo{bar} of

  rvalue_context( $foo{bar}->{baz} )

is executed in lvalue context. In fact it should be executed in
rvalue context, but the following $ref->{baz} should be special-cased
to "allow" $ref to be undef.

Ilya

@p5pRT

p5pRT commented Mar 8, 2000

From @nwc10

On Wed, Mar 08, 2000 at 11:11:25AM -0700, Tom Christiansen wrote:

>> I agree that it is true for all functions, but I think that it's most likely to
>> be a user gotcha (it got me) only for defined and exists. It's easy to make
>> the mistake for defined and exists because one can (wrongly) think of them as
>> functions that test for definedness or existence and hence assume that they
>> only test their arguments without vivifying.
>
> They do. They do not autovivify their arguments. If I saw
>
>   if ( $x{1} )
>   if ( $x{1} - 2 )
>   if ( defined $x{1} )
>   if ( exists $x{1} )
>
> These also fail to autovivify their argument:
>
>   if ( $x{1}[2] )
>   if ( $x{1}[2] - 2 )
>   if ( defined $x{1}[2] )
>   if ( exists $x{1}[2] )
>
> None of these autovivify their argument. However, many
> people misunderstand what their argument is.

That's the important bit, which I failed to express anywhere near clearly.

> These, however, do:
>
>   $x{1}[2]++;
>   $x{1}[2] *= 3;
>   $x{1}[2] = "fred";
>
> The [2] slot is all those if's are asking about. They do not
> have built-in anding. They *do* answer the question. But no
> one understands the question they're asking.

Because it is easy for people to misunderstand what the argument to defined
and exists is, and this misunderstanding becomes the cause of bugs in people's
scripts, is it worth

1: copying the paragraph warning about this from the exists section to defined?
2: slightly expanding the paragraph with your code showing explicit anding to
  ram home the distinction in the reader's mind and hence stop many falling
  into the trap?

Nicholas Clark

@p5pRT

p5pRT commented Mar 9, 2000

From @nwc10

On Wed, Mar 08, 2000 at 03:32:27PM -0500, Ilya Zakharevich wrote:

> Tom Christiansen writes:
>
>>> The entry for defined in perlfunc appears to be completely lacking anything
>>> about the gotcha of autovivification of references midway in the expression.
>>> Would it be worth adding a warning there?
>>
>> You can say the same thing about *all* functions, operators, and
>> keywords. This is not a function of the function. This is a
>> function of the way expression evaluation works.
>
> While this is absolutely correct, the behaviour you describe is *also*
> a bug. While documenting a workaround for this bug, please be careful
> in your wording to not codify the current (mis)behaviour.

Aren't we arguing semantics here?
The current behaviour is no bug of perl as implemented; behaviour and
documentation are consistent.
It's a "bug" in that it doesn't "do what I mean" for most people's idea of
"what I mean".

> The bug is that $foo{bar} of
>
>   rvalue_context( $foo{bar}->{baz} )
>
> is executed in lvalue context. In fact it should be executed in
> rvalue context, but the following $ref->{baz} should be special-cased
> to "allow" $ref to be undef.

The "bug" is that perl doesn't have an rvalue context that encompasses the
whole expression and hence could prevent auto-vivification.
I'm not convinced that the current rvalue context is a "bug"; the current
behaviour and the behaviour that you propose are simply different-but-equal
rather than one-wrong-one-right.

(As an aside I don't know if it's even easy to change rvalue context not
to autovivify. And if it is, is it practical to distinguish between normal
rvalue context and defined context, if such a distinction is desired?)

I don't like the idea of the special case

Current behaviour means you get
bash-2.02$ perl -le 'print defined $ref; print $ref->{baz} - 0; print defined $ref'

0
1
bash-2.02$

If we make rvalue context not autovivify for defined and exists, I'd argue
that the special case would be bad and hence we'd need to change it for all
rvalue context. So my script output would change to

bash-2.02$ perl -le 'print defined $ref; print $ref->{baz} - 0; print defined $ref'

0

bash-2.02$

Would this break many existing programs?

Nicholas Clark

@p5pRT

p5pRT commented Mar 9, 2000

From [Unknown Contact. See original ticket]

> If we make rvalue context not autovivify for defined and exists, I'd argue
> that the special case would be bad and hence we'd need to change it for all
> rvalue context.

Yes, adding special magical cases doesn't seem the way to go.

> Would this break many existing programs?

We've had a long history of rvalue/lvalue existential crises.
Remember that functions are pass-by-alias (implicit ref). It certainly
took a while to get consistent behaviour, so that the function
could alter $_[0] and that hash elt would magically populate, but
if the function did not modify $_[0], that element did not.

  sub fx { $_[0] = $_[1] if @_ > 1 }

  printf "1: FOO exists %d\n", exists $H{FOO};
  fx $H{FOO};
  printf "2: FOO exists %d\n", exists $H{FOO};
  fx $H{FOO}, 1;
  printf "3: FOO exists %d\n", exists $H{FOO};

  printf "4: BAR exists %d\n", exists $H{BAR};
  fx $H{BAR};
  printf "5: BAR exists %d\n", exists $H{BAR};
  fx $H{BAR}, undef;
  printf "6: BAR exists %d\n", exists $H{BAR};

Which of course produces:

  1: FOO exists 0
  2: FOO exists 0
  3: FOO exists 1
  4: BAR exists 0
  5: BAR exists 0
  6: BAR exists 1

Now, consider

  fx $H{FEE}{FI}{FO}{FUM}, "stuff";

Right now, only {FUM} is treated gingerly, which is consistent and
correct, but as we have witnessed, nevertheless unexpected by the
unwary. The rest have been evaluated already, and filled in en
route. Can you imagine the pain needed to make this "work" the way
they're expecting? There's quite a lot. You'd have to carry
all values of all the subscripts with you.

Now, if that's not enough to strike fear into your heart, consider

  fx $A[ fN($i++) ]{ fS($i++) }[ fN($i++) ]{ fS($i++) }, "stuff";

See what I mean?

--tom
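
The nested case described above can be sketched with the same fx() sub. Only the last subscript is held back until something is assigned through $_[0], while the levels leading to it are filled in by the call itself. The variable names other than fx and %H are illustrative.

```perl
use strict;
use warnings;

my %H;
sub fx { $_[0] = $_[1] if @_ > 1 }

# Pass the element without giving fx anything to assign.
fx($H{FEE}{FI}{FO}{FUM});
my $top_after_call  = exists $H{FEE} ? 1 : 0;               # the path was filled in
my $leaf_after_call = exists $H{FEE}{FI}{FO}{FUM} ? 1 : 0;  # the leaf was held back

# Pass it again; this time fx assigns through $_[0].
fx($H{FEE}{FI}{FO}{FUM}, "stuff");
my $leaf_after_assign = exists $H{FEE}{FI}{FO}{FUM} ? 1 : 0;

print "top_after_call=$top_after_call leaf_after_call=$leaf_after_call ",
      "leaf_after_assign=$leaf_after_assign\n";
# prints: top_after_call=1 leaf_after_call=0 leaf_after_assign=1
```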

@p5pRT

p5pRT commented Mar 9, 2000

From @pjscott

At 05:14 AM 3/9/00 -0700, Tom Christiansen wrote:

> Now, consider
>
>   fx $H{FEE}{FI}{FO}{FUM}, "stuff";
>
> Right now, only {FUM} is treated gingerly, which is consistent and
> correct, but as we have witnessed, nevertheless unexpected by the
> unwary. The rest have been evaluated already, and filled in en
> route. Can you imagine the pain needed to make this "work" the way
> they're expecting? There's quite a lot. You'd have to carry
> all values of all the subscripts with you.
>
> Now, if that's not enough to strike fear into your heart, consider
>
>   fx $A[ fN($i++) ]{ fS($i++) }[ fN($i++) ]{ fS($i++) }, "stuff";

Not that my brain has any chance of encompassing a fix, but can we agree in
principle that it should be changed to the "expected" behavior if at all
possible? It violates the principle of least surprise; one just doesn't
expect the act of observing something to change it, or something else. It
screams Heisenbug. Simple enough to remember the exception once you learn
it, but it's not Perlish to make the user accommodate themselves to a
language rather than the other way around.

Not that I am undervaluing in any way your estimate of the difficulty of
the task.

--
Peter Scott
Pacific Systems Design Technologies

@p5pRT

p5pRT commented Mar 9, 2000

From [Unknown Contact. See original ticket]

>> Now, if that's not enough to strike fear into your heart, consider
>>   fx($A[ fN($i++) ]{ fS($i++) }[ fN($i++) ]{ fS($i++) }, "stuff");
>
> Not that my brain has any chance of encompassing a fix, but can we agree in
> principle that it should be changed to the "expected" behavior if at all
> possible?

I do not think it is possible without breaking Perl, which, I would argue,
would *not* be expected behavior.

> It violates the principle of least surprise; one just doesn't
> expect the act of observing something to change it, or something else.

But that's not merely "observing" something. You're calling
functions. You're changing variables. I do not believe that you
can change this without altering the semantics of the language,
which guarantees that those operations will always be run, and that
they will always be *before* the function is invoked. Furthermore,
it is guaranteed that assigning to $_[0] will alter the scalar
originally passed. In order for such a scalar to be assignable,
Perl has to know where to assign it.

Here's another example​:

  sysrw(FH, $buff[$i][$j]);
  sub sysrw {
      if ($READING) {
          sysread $_[0], $_[1], $BUFSIZE;
      }
      else {
          syswrite $_[0], $_[1];
      }
  }

Feel free to invent an understandable semantic here, but right now,
I cannot see how you can get what you want. There are fundamental
flaws that I cannot see how to overcome. Just dealing with something
as simple as $x[fx()][fy()] will drive you crazy. Did you or did
you not call both of those functions? Right now, the language
behaves completely deterministically. In a WYSIWYG display, those
functions are indeed called, and they are called right in the act
of expression evaluation.

It's all very well and good to say that since

  $a[$i] = "V" if $b[$i];

doesn't create $b[$i] but does create $a[$i], and since

  $a[$i][$j] = "V" if $b[$i][$j];

doesn't create $b[$i][$j] but does create $a[$i][$j], that this
latter example should apparently lead one to believe that it should
also create only $a[$i] yet not $b[$i].

This is easy to voice, but that's not the hard problem.

I don't understand how you can make this "work" without forbidding
such things.

  $a[fn($i)][fn($j)] = "V" if $b[fn($i)][fn($j)];

(Remember also that $i and $j may have been modified, or other
side-effects incurred, and that this is guaranteed not to peter out
and short circuit.)

And I don't know how you can make

  if (fn($a[$i][$j][$k])) { ... }

work, where fn() may sometimes want to assign to its argument--the
scalar sitting at the kth slot, not at the ith or jth ones.

One other crippling concern here is that you cannot do lexical
analysis on fn() to know what it's going to do, nor even what it
*could* do! These fall in areas provably unsolvable for halting-problem
reasons, the possibility of dynamic code generation and consequent
late-binding, little autoloading annoyances, indirect function calls
such as from method calls or CODE refs, and because of externally
linked C-coded subroutines.

If you have some idea for how all this could possibly be done without
breaking Perl, do tell. Maybe it's obvious, but I just don't see
it. Meanwhile, the best I can think to do is document this.
Again. :-( As one cannot hope to document it in all possible
functions and expressions, probably it should go in under the
references section. There's a section in perlref mentioning lvalue
autovivification. One might consider expanding this.

--tom
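
The two-level example from this message can be checked with a small sketch, assuming both arrays start out empty. The condition is false, so nothing is assigned to @a, yet the read of $b[$i][$j] still leaves an empty array reference behind in $b[$i].

```perl
use strict;
use warnings;

my (@a, @b);
my ($i, $j) = (0, 1);

# The condition is false, so the assignment never runs ...
$a[$i][$j] = "V" if $b[$i][$j];

my $a_size  = scalar @a;            # ... and @a is untouched,
my $b_size  = scalar @b;            # but reading $b[$i][$j] created $b[$i]
my $b_kind  = ref $b[$i];
my $b_inner = scalar @{ $b[$i] };   # as an empty array, without slot $j

print "a_size=$a_size b_size=$b_size b_kind=$b_kind b_inner=$b_inner\n";
# prints: a_size=0 b_size=1 b_kind=ARRAY b_inner=0
```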

@p5pRT

p5pRT commented Mar 9, 2000

From [Unknown Contact. See original ticket]

On Thu, Mar 09, 2000 at 11:44:56AM +0000, Nicholas Clark wrote:

> Aren't we arguing semantics here?

Looks like *you* are. ;-)

> The current behaviour is no bug of perl as implemented; behaviour and
> documentation are consistent.

This still does not make it a non-bug. We do not document when things
are autovivified, but this would not allow

  $a = $b + 3;

making $b into ['added', 'to',3] if $b were undef.

Ilya

@p5pRT

p5pRT commented Mar 10, 2000

From [Unknown Contact. See original ticket]

Tom Christiansen <tchrist@chthon.perl.com> writes:
|| >>Now, if that's not enough to strike fear into your heart, consider
|| >> fx($A[ fN($i++) ]{ fS($i++) }[ fN($i++) ]{ fS($i++) }, "stuff");
||
|| >Not that my brain has any chance of encompassing a fix, but can we agree in
|| >principle that it should be changed to the "expected" behavior if at all
|| >possible?
||
|| I do not think it is possible without breaking Perl, which, I would argue,
|| would *not* be expected behavior.

I was thinking it might be possible, but...

The first rule, though, would have to unstrike the fear from our hearts.
Avoiding auto-vivification should not short-circuit subscript evaluation.
So, in Tom's example above, the various increments of $i would all happen
even if nothing in @A got auto-vivified.

In thinking about multi-level auto-vivification, though, I thought of a
case which may cause problems even to the current implementation (that
avoids auto-vivification of a subroutine argument unless it actually gets
used in an lvalue context).

In​:

  my $a = { yes=>1 };
 
  sub foo { $a->{maybe} = 2; }
 
  sub bar { foo }
 
  bar $a->{maybe};

Is there any conflict between the code that is determining that
the argument to bar does not need to be auto-vivified and the
code that actually does auto-vivify the same element in the
nested call to foo?

This sort of situation gets worse in a multi-level auto-vivify
scenario, since the code that makes the tentative auto-vivify
actually occur or not occur would have more places that might
have been independently vivified. (Change the argument to bar
to $a->{maybe}[10] and the assignment in foo to assign to
$a->{maybe}[1] for example.)

An even worse problem for the multi-level auto-vivify would occur
if the argument to bar were $a->{maybe}{hash} while the assignment
was to $a->{maybe}[1]. Then there is a time frame where the two
auto-vivifications are in conflict. If the argument to the sub
were assigned to, the assignment in foo would be illegal. After
the assignment in foo, the argument to bar can no longer be legally
auto-vivified. That has significant effects on how the multi-level
potential auto-vivify could be implemented - different ways would
manifest themselves in different ways in what they determined to be
illegal and when.

|| Feel free to invent an understandable semantic here, but right now,
|| I cannot see how you can get what you want. There are fundamental
|| flaws that I cannot see how to overcome. Just dealing with something
|| as simple as $x[fx()][fy()] will drive you crazy. Did you or did
|| you not call both of those functions? Right now, the language
|| behaves completely deterministically. In a WYSIWYG display, those
|| functions are indeed called, and they are called right in the act
|| of expression evaluation.

That deterministic behaviour should remain - both functions
should be called, even in an rvalue context which was made to
not auto-vivify $x[fx()]. fy() would be called regardless.

|| And I don't know how you can make
||
|| if (fn($a[$i][$j][$k])) { ... }
||
|| work, where fn() may sometimes want to assign to its argument--the
|| scalar sitting at the kth slot, not at the ith or jth ones.

I can see two ways, but both have problems. Either the argument
to fn has a kind of magic set that causes the multiple levels of
auto-vivify to occur the first time the argument is used in an
lvalue context, or else the caller does a tentative auto-vivify
and undoes it after the return.

|| One other crippling concern here is that you cannot do lexical
|| analysis on fn() to know what it's going to do, nor even what it
|| *could* do! These fall in areas provably unsolvable for halting-problem
|| reasons, the possibility of dynamic code generation and consequent
|| late-binding, little autoloading annoyances, indirect function calls
|| such as from method calls or CODE refs, and because of externally
|| linked C-coded subroutines.

You cannot require such analysis to be done. You can however use
such analysis, when it can be done, to provide optimizations. In
my example above, if $a were a my variable declared after the foo
and bar subs and not visible to any other subs, the compiler could
know that there could be no alternate vivifications and use a much
simpler form that only worried about whether the entire chain of
auto-vivifies was done. When there is potential aliasing, as in
foo above, a slower form of argument that dealt with issues of
partial and conflicting auto-vivification occurring would be used.

|| If you have some idea for how all this could possibly be done without
|| breaking Perl, do tell. Maybe it's obvious, but I just don't see
|| it. Meanwhile, the best I can think to do is document this.
|| Again. :-( As one cannot hope to document it in all possible
|| functions and expressions, probably it should go in under the
|| references section. There's a section in perlref mentioning lvalue
|| autovivification. One might consider expanding this.

Document it for 5.6. There is the possibility (but I'm rather
pessimistic) of really dealing with it for 5.7. Certainly it is
not obvious how to deal with the entire problem; although I think
that the rule of "never short-circuit subscript evaluation solely
due to auto-vivification considerations" does take care of one
field of landmines.

--
John Macdonald jmm@​jmm.pickering.elegant.com

p5pRT commented Mar 10, 2000

From [Unknown Contact. See original ticket]

John Macdonald wrote​:

> I can see two ways, but both have problems. Either the argument
> to fn has a kind of magic set that causes the multiple levels of
> auto-vivify to occur the first time the argument is used in an
> lvalue context, or else the caller does a tentative auto-vivify
> and undoes it after the return.

A third possibility (and I acknowledge that this is an extremely nasty
kludge ;) would be to add (yet another) flag​:

When a thingy is auto-vivified in a (for want of a better term) chained
rvalue context, set a flag which would cause that thingy to report
itself as undefined (and, for hash elements, non-existent) on subsequent
rvalue accesses. Clear the flag on first lvalue access. An amalgamation
of your two possibilities, if you like; the possibility of really
undoing the auto-vivifications to free up memory remains should it be
deemed safe...

I'm assuming, without looking at the code, that Perl can (or could) tell
the difference at auto-vivification time between, say, C<$foo{BAR}=1>
and C<$baz=$foo{BAR}{FNORD}>.
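
For concreteness, here is a small sketch of the second case as it behaves today (the variable names %quux, $bar_created and so on are illustrative only): the chained rvalue read creates the intermediate level, and guarding that level by hand avoids the side effect.

```perl
use strict;
use warnings;

my %foo;
my $baz = $foo{BAR}{FNORD};                  # rvalue read, two levels deep
my $bar_created = exists $foo{BAR} ? 1 : 0;  # the intermediate $foo{BAR} now exists

my %quux;
# Test the first level before descending, so nothing gets vivified.
my $val = exists $quux{BAR} ? $quux{BAR}{FNORD} : undef;
my $quux_created = exists $quux{BAR} ? 1 : 0;

print "$bar_created $quux_created\n";        # prints "1 0"
```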

Pete
--
use Disclaimer​::Standard; # Motorola GSM Software Factory
my ($phone, $fax)=map {"+44 1793 $_"} 564450, 566918;
"'Not twisted,' Salzy once said of her own passion, 'it is helical. That
sounds better.'"

p5pRT commented Mar 10, 2000

From [Unknown Contact. See original ticket]

Pete Jordan <pjordan1@​email.mot.com> writes​:
|| John Macdonald wrote​:
||
|| > I can see two ways, but both have problems. Either the argument
|| > to fn has a kind of magic set that causes the multiple levels of
|| > auto-vivify to occur the first time the argument is used in an
|| > lvalue context, or else the caller does a tentative auto-vivify
|| > and undoes it after the return.
||
|| A third possibility (and I acknowledge that this is an extremely nasty
|| kludge ;) would be to add (yet another) flag​:
||
|| When a thingy is auto-vivified in a (for want of a better term) chained
|| rvalue context, set a flag which would cause that thingy to report
|| itself as undefined (and, for hash elements, non-existent) on subsequent
|| rvalue accesses. Clear the flag on first lvalue access. An amalgamation
|| of your two possibilities, if you like; the possibility of really
|| undoing the auto-vivifications to free up memory remains should it be
|| deemed safe...

That's essentially what I figured would be done for the tentative
creation (so that it would be known whether it is necessary to blow
it away again after the return). I hadn't thought of leaving the
tentatively vivified chunk around even if it hadn't become truly
vivified - that would be a big win if you had a loop calling subs
with the same unvivified element since it wouldn't have to be
created and destroyed each time. There would be a wasted space
cost most of the time, though, and removing them later gets tricky
(determining that a tentative item is really available to be freed
requires confirming that no parent subroutine has it as an argument
that might be lvalue-established when the parent is returned to).

--
John Macdonald jmm@​jmm.pickering.elegant.com

p5pRT commented Mar 10, 2000

From [Unknown Contact. See original ticket]

John Macdonald writes​:

> The first rule, though, would have to unstrike the fear from our hearts.
> Avoiding auto-vivification should not short-circuit subscript evaluation.

How could it?

perl -Ilib -MO=Terse,exec -e "$a->[2][3]"

SVOP (0x48c08) gv GV (0x47958) *a
UNOP (0xf2888) rv2sv
UNOP (0x114608) rv2av [1]
SVOP (0x114388) const IV (0x31200) 2
BINOP (0x114748) aelem
UNOP (0x112e48) rv2av [2]
SVOP (0x1141c8) const IV (0xd4c14) 3
BINOP (0x112d88) aelem

The only change would be omitting the rv2av nodes, and changing aelem
to aelem_undef_ok.

> In thinking about multi-level auto-vivification, though, I thought of a
> case which may cause problems even to the current implementation (that
> avoids auto-vivification of a subroutine argument unless it actually gets
> used in an lvalue context).

This (postponed autovivification) was a questionable "feature". But if
we suppose we *want* to make an argument to a function be
not-an-lvalue (as it should be!), then a complicated scheme similar to
what you propose may be needed. We need aelem_undef_postpone which
would propagate undefs supplied with an autovivification magic.

> This sort of situation gets worse in a multi-level auto-vivify
> scenario,

I do not think so. Some recursion in the autovivification magic will do it.

> An even worse problem for the multi-level auto-vivify would occur
> if the argument to bar were $a->{maybe}{hash} while the assignment
> was to $a->{maybe}[1]. Then there is a time frame where the two
> auto-vivifications are in conflict.

It does not matter. Autovivification magic would check that what it
autovivifies is undef or already of the correct type. If it is not, it
will croak.
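
For comparison, ordinary (non-deferred) dereferencing already behaves this way today. A minimal sketch, using a made-up variable $data in place of $a; this shows existing behaviour, not the proposed magic:

```perl
use strict;
use warnings;

my $data = { yes => 1 };

$data->{maybe}[1] = 2;          # vivifies $data->{maybe} as an array reference

# Using the same slot as a hash is a run-time error, trapped here with eval.
my $ok  = eval { $data->{maybe}{hash} = 3; 1 };
my $err = $@;

print "conflict detected\n" if !$ok && $err =~ /Not a HASH reference/;
```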

> Document it for 5.6. There is the possibility (but I'm rather
> pessimistic) of really dealing with it for 5.7. Certainly it is
> not obvious how to deal with the entire problem;

Certainly it is not any bit harder than what we already do.

Ilya

p5pRT commented Mar 11, 2000

From [Unknown Contact. See original ticket]

Ilya Zakharevich <ilya@​math.ohio-state.edu> writes​:
|| John Macdonald writes​:
|| > The first rule, though, would have to unstrike the fear from our hearts.
|| > Avoiding auto-vivification should not short-circuit subscript evaluation.
||
|| How could it?

If the compiler is being changed so that in an rvalue context,
no autovivification occurs (instead of the current situation
where the terminating auto-vivification does not occur but
earlier ones do), then an "obvious" optimization would be to
short-circuit evaluation of subscript index values after the
initial undefined one was found​:

  @a = ( 1 );
  $x = $a[2][$i+$j];

As soon as the index 2 is used, the result of the right hand side
is known to be undef and it is tempting to skip evaluating $i+$j
since that result will have no effect. But if it were $i++
instead, there is a side-effect and short-circuiting the
side-effect would be bad. As Tom worried, it would make it
difficult to write correct programs or understand programs, since
you might not know until run-time just where this short-circuit
would be applied.
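
A small sketch of the $i++ variant as it behaves today, with no short-circuit (variable names follow the snippet above, with my added so that it runs under strict): the second subscript is still evaluated, so the increment happens, and the intermediate $a[2] is vivified as a side effect of the read.

```perl
use strict;
use warnings;

my @a = ( 1 );
my $i = 0;

my $x = $a[2][$i++];      # rvalue read through a missing intermediate element

my $incremented = $i;                             # 1: the subscript expression ran
my $length      = scalar @a;                      # 3: @a was extended to hold $a[2]
my $vivified    = ref $a[2] eq 'ARRAY' ? 1 : 0;   # 1: $a[2] is now an empty array ref

print "$incremented $length $vivified\n";         # prints "1 3 1"
```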

Anyhow, this is not going to happen for 5.6, so let's drop this
discussion for now.

--
John Macdonald jmm@​jmm.pickering.elegant.com
