SV *Perl_cv_const_sv_or_av(const CV *const): Assertion `((svtype)((cv)->sv_flags & 0xff)) == SVt_PVCV || ((svtype)((cv)->sv_flags & 0xff)) == SVt_PVFM' failed (op.c:7926) #15548

Open
p5pRT opened this issue Aug 24, 2016 · 10 comments

@p5pRT

p5pRT commented Aug 24, 2016

Migrated from rt.perl.org#129068 (status was 'open')

Searchable as RT129068$

@p5pRT
Author

p5pRT commented Aug 24, 2016

From @geeknik

v5.25.4-5-g92d73bf

./perl -e 'my __PACKAGE__(&p0000;0;p0000'

perl: op.c:7926: Perl_cv_const_sv_or_av: Assertion `((svtype)((cv)->sv_flags & 0xff)) == SVt_PVCV || ((svtype)((cv)->sv_flags & 0xff)) == SVt_PVFM' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff6cf2067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff6cf2067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff6cf3448 in __GI_abort () at abort.c:89
#2 0x00007ffff6ceb266 in __assert_fail_base (fmt=0x7ffff6e24238 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
  assertion=assertion@entry=0x6023e8 "((svtype)((cv)->sv_flags & 0xff)) == SVt_PVCV || ((svtype)((cv)->sv_flags & 0xff)) == SVt_PVFM",
  file=file@entry=0x6c0ae2 "op.c", line=line@entry=7926,
  function=function@entry=0x6157b0 <__PRETTY_FUNCTION__.18188> "Perl_cv_const_sv_or_av") at assert.c:92
#3 0x00007ffff6ceb312 in __GI___assert_fail (
  assertion=assertion@entry=0x6023e8 "((svtype)((cv)->sv_flags & 0xff)) == SVt_PVCV || ((svtype)((cv)->sv_flags & 0xff)) == SVt_PVFM",
  file=file@entry=0x6c0ae2 "op.c", line=line@entry=7926,
  function=function@entry=0x6157b0 <__PRETTY_FUNCTION__.18188> "Perl_cv_const_sv_or_av") at assert.c:101
#4 0x0000000000426f1d in Perl_cv_const_sv_or_av (cv=<optimized out>) at op.c:7926
#5 0x0000000000476323 in Perl_yylex () at toke.c:7154
#6 0x000000000048a223 in Perl_yyparse (gramtype=103) at perly.c:334
#7 0x0000000000450cc8 in S_parse_body (env=env@entry=0x0, xsinit=xsinit@entry=0x421940 <xs_init>) at perl.c:2373
#8 0x000000000045285d in perl_parse (my_perl=<optimized out>, xsinit=xsinit@entry=0x421940 <xs_init>, argc=3, argv=0x7fffffffe6a8,
  env=env@entry=0x0) at perl.c:1689
#9 0x00000000004217b0 in main (argc=3, argv=0x7fffffffe6a8, env=0x7fffffffe6c8) at perlmain.c:121

@p5pRT
Author

p5pRT commented Jan 29, 2017

From zefram@fysh.org

Brian Carpenter wrote:

> ./perl -e 'my __PACKAGE__(&p0000;0;p0000'

Reduces to 'my main(&z;0;z'. This failure mode raises wider issues that
need to be decided in order to determine how to fix it. It's all about
how my(...) lists are parsed.

Where a single item is being lexicalised, as in "my $x", the item is
syntactically required to be a scalar, array, or hash. Thus "my &z"
is rejected early on. But where a parenthesised list of items is being
lexicalised, the syntax permits the parens to contain any expression
whatsoever. The restriction on what can be lexicalised is instead
implemented by walking the optree of the completed list, checking that
it semantically only contains acceptable items. Hence this difference
in diagnostics:

$ ./perl -le 'my &z'
syntax error at -e line 1, near "my &z"
Execution of -e aborted due to compilation errors.
$ ./perl -le 'my(&z)'
Can't declare subroutine entry in "my" at -e line 1, at EOF
Execution of -e aborted due to compilation errors.

Actually it's a little more complicated because things like "my $$p"
are syntactically valid but subject to the same semantic check. I haven't
managed to do anything really interesting with that, so I won't consider
it further.

If the list is well behaved, the difference in the mode of checking
doesn't matter. But it's possible for a carefully crafted list to
sneak contraband past the semantic check, where the syntactic check
would have caught it. The danger in skipping the check is that things
that were syntactically mentioned in the list are declared as lexicals
even if they are not valid for this purpose. They get added to the pad
optimistically, by the lexer, while in a my(...) list, with the lexer
relying on the parser's checks to reject invalid stuff before any code
can be affected by the declaration. In particular, "&z" is good enough
to get this provisional declaration behaviour, even though it's never
valid in this kind of "my" expression.

In the case with which this ticket is concerned (and using my minimised
form of it), the partial list "(&z" is enough to get a pad entry named
"&z". Omitting the closing paren succeeds in skipping the semantic check,
because the list is never syntactically complete. The later instance of
"z" is then looked up in the pad, which is a problem because it's not a
fully-formed lexical sub. The lexer wants to see whether it's a constant
sub, but it picks up the type-constraining stash instead of an actual CV,
leading to the assertion failure. This failure mode is rather tricky,
because skipping the semantic check came at the cost of creating a syntax
error, so the details are tied up in the parser's error recovery.
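
To make the last step concrete, here is a toy C model of the kind of type-tag check that fires. It is only a sketch: the toy_ names and the numeric tag values are invented for illustration and are not perl's real definitions. Only the "sv_flags & 0xff" masking is taken from the assertion text quoted in this ticket. A stash is a hash, so its tag is neither the CV tag nor the format tag, and a lookup that assumes it was handed a CV trips its assertion.

```c
#include <assert.h>
#include <stdint.h>

/* Toy type tags. The numeric values are hypothetical, not perl's. */
typedef enum {
    TOY_SVt_PVHV = 12,  /* hash; a stash is a hash */
    TOY_SVt_PVCV = 13,  /* code value (subroutine) */
    TOY_SVt_PVFM = 14   /* format */
} toy_svtype;

/* Toy SV header: the low 8 bits of sv_flags carry the type tag,
 * mirroring the "(cv)->sv_flags & 0xff" expression in the assertion. */
typedef struct {
    uint32_t sv_flags;
} toy_sv;

toy_svtype toy_sv_type(const toy_sv *sv)
{
    return (toy_svtype)(sv->sv_flags & 0xff);
}

/* Predicate with the same shape as the failing assertion. */
int toy_is_cv_or_fm(const toy_sv *sv)
{
    toy_svtype t = toy_sv_type(sv);
    return t == TOY_SVt_PVCV || t == TOY_SVt_PVFM;
}

/* A lookup that trusts its caller to pass a CV. In a debugging build
 * the assert aborts if it is handed something else, such as a stash. */
int toy_cv_const_lookup(const toy_sv *cv)
{
    assert(toy_is_cv_or_fm(cv));
    return 1; /* placeholder for "inspect the CV" */
}
```

Passing a toy_sv whose tag is TOY_SVt_PVHV to toy_cv_const_lookup() aborts in the same manner as the report, while one tagged TOY_SVt_PVCV passes.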

A more enlightening way to skip the semantic check is to use a
conditional expression with constant condition. The false branch of the
conditional doesn't appear in the optree, and so doesn't get checked for
acceptability, but the lexer saw it. Putting aside the lexical-sub case
for a minute, consider what mischief we can get up to with just scalars.
What should this program output:

  sub foo {
      my (1 ? $x : $y);
      $y++;
      print $y;
  }
  $y = 5;
  foo; foo;

If the lexical declaration were merely "my ($x)" then all instances of
$y would refer to the package variable, and we'd get "67". If it were
"my ($x, $y)" then we'd get "11", incrementing $y from undef afresh on
each call. What we actually get is "12". The "my" has the effect of
declaring $y as a lexical, so the later uses of $y in the sub refer to
the lexical. But with $y having been elided from the optree of the "my"
expression, it doesn't get reset for each call. The lexical declaration
acts as "my $x; my $y if 0", but doesn't get the deprecation warning that
"if 0" provokes.
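
The mismatch described above can be sketched as a toy model in C. This is an illustration under stated assumptions, not perl's implementation: the node type and both collect_ functions are hypothetical. One pass stands in for the lexer, which declares every variable it sees in the text. The other stands in for the folded optree, in which only the taken branch of a constant conditional survives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { NODE_VAR, NODE_COND } node_kind;

/* Hypothetical expression node: a variable, or a conditional whose
 * condition is a compile-time constant. */
typedef struct node {
    node_kind kind;
    const char *name;             /* NODE_VAR: variable name */
    bool cond_value;              /* NODE_COND: the constant condition */
    const struct node *if_true;   /* NODE_COND: branch taken when true */
    const struct node *if_false;  /* NODE_COND: branch taken when false */
} node;

#define MAX_NAMES 8
typedef struct {
    const char *names[MAX_NAMES];
    size_t count;
} name_set;

void add_name(name_set *s, const char *n)
{
    assert(s->count < MAX_NAMES);
    s->names[s->count++] = n;
}

bool has_name(const name_set *s, const char *n)
{
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->names[i], n) == 0)
            return true;
    return false;
}

/* "Lexer" view: every variable written in the list gets declared,
 * whichever branch it sits in. */
void collect_declared(const node *n, name_set *out)
{
    if (n->kind == NODE_VAR) {
        add_name(out, n->name);
        return;
    }
    collect_declared(n->if_true, out);
    collect_declared(n->if_false, out);
}

/* "Optree" view after constant folding: only the taken branch remains,
 * so only its variables are checked and re-initialised per call. */
void collect_surviving(const node *n, name_set *out)
{
    if (n->kind == NODE_VAR) {
        add_name(out, n->name);
        return;
    }
    collect_surviving(n->cond_value ? n->if_true : n->if_false, out);
}
```

For a tree modelling (1 ? $x : $y), collect_declared() reports both $x and $y while collect_surviving() reports only $x. $y is the variable that ends up declared but never checked or reset.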

The deparser doesn't handle this situation. It goes by the optree, and
so the "my (1 ? $x : $y)" is emitted as "my $x", but the later references
in the sub to the lexical $y are still emitted as "$y". Compiling the
resulting code loses the lexicalness of $y. The same problem happens
with "my $y if 0", except that in that case the deparser emits "'???'",
at least showing that something was optimised out.

Getting back to "&z", we can use the conditional to evoke the same
failure mode without a syntax error:

$ perl -le 'my main (1 ? $x : &z); z'
perl: op.c:8062: Perl_cv_const_sv_or_av: Assertion `((svtype)((cv)->sv_flags & 0xff)) == SVt_PVCV || ((svtype)((cv)->sv_flags & 0xff)) == SVt_PVFM' failed.
Abort

We can also get a related failure by omitting the type restriction:

$ perl -le 'my (1 ? $x : &z); z'
perl: pp.c:183: Perl_pp_clonecv: Assertion `protocv' failed.
Abort

There are even failures without any subsequent reference to the broken
lexical sub:

$ perl -le 'my (1 ? $x : &z);'
perl: pp.c:183: Perl_pp_clonecv: Assertion `protocv' failed.
Abort
$ perl -le 'my main (1 ? $x : &z);'
Segmentation fault

Note that the latter SEGVs even in a debugging build, rather than
asserting.

We need to decide how to treat these conditionals in "my" lists.
Determine it for the (relatively) easy cases, and that will point to
how to handle the tricky case with which this ticket started.

-zefram

@p5pRT
Author

p5pRT commented Jan 29, 2017

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Mar 28, 2017

From @iabyn

On Sun, Jan 29, 2017 at 02:27:31AM +0000, Zefram wrote:

> Where a single item is being lexicalised, as in "my $x", the item is
> syntactically required to be a scalar, array, or hash. Thus "my &z"
> is rejected early on. But where a parenthesised list of items is being
> lexicalised, the syntax permits the parens to contain any expression
> whatsoever. The restriction on what can be lexicalised is instead
> implemented by walking the optree of the completed list, checking that
> it semantically only contains acceptable items. Hence this difference
> in diagnostics:
>
> $ ./perl -le 'my &z'
> syntax error at -e line 1, near "my &z"
> Execution of -e aborted due to compilation errors.
> $ ./perl -le 'my(&z)'
> Can't declare subroutine entry in "my" at -e line 1, at EOF
> Execution of -e aborted due to compilation errors.

Is there any reason why a my() list can't be enforced (in a strict manner)
by the lexer / grammar itself rather than post-hoc by optree inspection?

--
"Foul and greedy Dwarf - you have eaten the last candle."
  -- "Hordes of the Things", BBC Radio.

@p5pRT
Author

p5pRT commented Mar 28, 2017

From zefram@fysh.org

Dave Mitchell wrote:

> Is there any reason why a my() list can't be enforced (in a strict manner)
> by the lexer / grammar itself rather than post-hoc by optree inspection?

If we were building this from scratch then grammatical enforcement
would clearly be a good idea, with only the minor downside that we
need to duplicate the productions for list syntax. Having already
got into the present state, though, we can't just casually change it.
It's still a good direction to go in, but we'd need a deprecation cycle,
in case anyone is using syntactically strange stuff to legitimate effect.
I can't imagine that there's any usage in that category that we'd actually
want to preserve.

A deprecation to narrow a grammatical production like this is a tricky
thing to arrange. perl is not able to retry parsing of arbitrary
code. I fear that we would need to duplicate a lot of the expression
productions, putting deprecation warnings on all the productions other
than the small group of approved items. It would be way more hassle
than the qw-as-list deprecation of a few years ago.

A grammatical restriction on the list content would sort out these issues
with conditionals and subroutines. There might remain issues with being
able to sneak stuff into single "my" items, although I wasn't able to
find such a problem. If one turns up, we could do a similar deprecation
to narrow the syntax of what's permitted in each "my" item.

-zefram

@p5pRT
Author

p5pRT commented Mar 29, 2017

From @cpansprout

On Tue, 28 Mar 2017 16:17:43 -0700, zefram@fysh.org wrote:

> Dave Mitchell wrote:
>
> > Is there any reason why a my() list can't be enforced (in a strict manner)
> > by the lexer / grammar itself rather than post-hoc by optree inspection?
>
> If we were building this from scratch then grammatical enforcement
> would clearly be a good idea, with only the minor downside that we
> need to duplicate the productions for list syntax. Having already
> got into the present state, though, we can't just casually change it.
> It's still a good direction to go in, but we'd need a deprecation cycle,
> in case anyone is using syntactically strange stuff to legitimate effect.
> I can't imagine that there's any usage in that category that we'd actually
> want to preserve.

One that I can think of is unary plus. It’s not likely to occur often, but it is often used for disambiguation and could conceivably end up in a my() list after refactoring (and somebody forgot to remove the +, but the code worked anyway).

Yes, this is only theoretical. I don’t know whether it is worth preserving that. my +$x is an error.

--

Father Chrysostomos

@p5pRT
Author

p5pRT commented Mar 29, 2017

From zefram@fysh.org

Father Chrysostomos via RT wrote:

> One that I can think of is unary plus.

Ah, indeed, that could appear in a legitimate program. It's not a usage
to preserve, there being no need for such disambiguation in this context.
(If we did want to preserve it, for consistency a similar unary plus
ought to be permitted on parameters declared in subroutine signatures.)

A related usage is nested parens. These are legal, don't break anything,
and could arise by accident from refactoring. But, like unary plus,
they also don't achieve anything in the "my" context, and it's not worth
the complexity of preserving their permissibility.

-zefram

@p5pRT
Author

p5pRT commented Mar 29, 2017

From @iabyn

On Wed, Mar 29, 2017 at 03:27:41AM +0100, Zefram wrote:

> Father Chrysostomos via RT wrote:
>
> > One that I can think of is unary plus.
>
> Ah, indeed, that could appear in a legitimate program. It's not a usage
> to preserve, there being no need for such disambiguation in this context.
> (If we did want to preserve it, for consistency a similar unary plus
> ought to be permitted on parameters declared in subroutine signatures.)
>
> A related usage is nested parens. These are legal, don't break anything,
> and could arise by accident from refactoring. But, like unary plus,
> they also don't achieve anything in the "my" context, and it's not worth
> the complexity of preserving their permissibility.

Is there any reason we couldn't just add a check to Perl_localize,
Perl_my_attrs (and maybe a few other places), which gives a deprecation
warning if o isn't of one or two simple forms (like
list/pushmark/(pad[sah]v x n))?

--
This is a great day for France!
  -- Nixon at Charles De Gaulle's funeral

@p5pRT
Author

p5pRT commented Mar 29, 2017

From zefram@fysh.org

Dave Mitchell wrote:

> Is there any reason we couldn't just add a check to Perl_localize,
> Perl_my_attrs (and maybe a few other places), which gives a deprecation
> warning if o isn't of one or two simple forms (like
> list/pushmark/(pad[sah]v x n))?

That's effectively what we've already got. It's an error rather
than a warning, and the check is in S_my_kid() which is called
from Perl_my_attrs(). This ticket is concerned with items that have
problematic effects when lexed in a "my" list but duck this semantic check
(by causing a parse error or by not leaving any evidence in the optree).
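
For readers unfamiliar with this kind of check, here is a rough C sketch of a shape test of the form "list / pushmark / (pad[sah]v x n)". It is a toy: the toy_op struct and the TOY_OP_ constants are invented for illustration, and this is not the code of S_my_kid(). It also shows the limitation under discussion, because the test can only inspect ops that are actually present in the tree it is given.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical op kinds, loosely named after the forms mentioned above. */
typedef enum {
    TOY_OP_LIST,
    TOY_OP_PUSHMARK,
    TOY_OP_PADSV,     /* lexical scalar */
    TOY_OP_PADAV,     /* lexical array */
    TOY_OP_PADHV,     /* lexical hash */
    TOY_OP_ENTERSUB   /* subroutine entry, not acceptable in "my" */
} toy_optype;

/* Toy op node: first child plus next sibling. */
typedef struct toy_op {
    toy_optype type;
    const struct toy_op *first;
    const struct toy_op *sibling;
} toy_op;

bool toy_is_pad_var(const toy_op *o)
{
    return o->type == TOY_OP_PADSV
        || o->type == TOY_OP_PADAV
        || o->type == TOY_OP_PADHV;
}

/* True only for a list op whose children are a pushmark followed by
 * zero or more padsv/padav/padhv ops and nothing else. */
bool toy_is_simple_my_list(const toy_op *o)
{
    if (o == NULL || o->type != TOY_OP_LIST)
        return false;

    const toy_op *kid = o->first;
    if (kid == NULL || kid->type != TOY_OP_PUSHMARK)
        return false;

    for (kid = kid->sibling; kid != NULL; kid = kid->sibling) {
        if (!toy_is_pad_var(kid))
            return false;
    }
    return true;
}
```

A list containing a TOY_OP_ENTERSUB child is rejected, but an item that never reaches the tree (a folded-away branch, or a list cut short by a parse error) is never presented to toy_is_simple_my_list() at all.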

-zefram

@demerphq
Collaborator

demerphq commented Sep 8, 2022

The examples zefram provided still segfault or assert in 5.37.4 and 5.34.1.
