Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BBC Commit 73b9584 breaks Roman::Unicode #16915

Closed
p5pRT opened this issue Apr 1, 2019 · 11 comments
Closed

BBC Commit 73b9584 breaks Roman::Unicode #16915

p5pRT opened this issue Apr 1, 2019 · 11 comments
Assignees
Labels
BBC Blead Breaks CPAN - changes in blead broke a cpan module(s)

Comments

@p5pRT
Copy link

p5pRT commented Apr 1, 2019

Migrated from rt.perl.org#133970 (status was 'open')

Searchable as RT133970$

@p5pRT
Copy link
Author

p5pRT commented Apr 1, 2019

From @khwilliamson

The commit involved moving user-defined properties into core. It is a
separate issue from #16850 which stems from the same commit, but
the cause is actually that module inadvertently relying on a bug that
this commit fixes.

@p5pRT
Copy link
Author

p5pRT commented Apr 1, 2019

From @khwilliamson

On Sun, 31 Mar 2019 21​:50​:32 -0700, public@​khwilliamson.com wrote​:

This is a bug report for perl from khw@​khw-XPS-
8930.hsd1.co.comcast.net,
generated with the help of perlbug 1.40 running under perl 5.26.1.

-----------------------------------------------------------------
The commit involved moving user-defined properties into core. It is a
separate issue from [perl #133860] which stems from the same commit,
but
the cause is actually that module inadvertently relying on a bug that
this commit fixes.
-----------------------------------------------------------------
---
Flags​:
category=core
severity=high
---

I wrote this ticket with the system perl info instead of blead's. But it is actually irrelevant.

Tony Cook++ tracked down the cause of this for me. The stack needed to pushed/popped around the call to the user-defined perl subroutine. But then he wanted to see what the code this replaces does, and it turns out it does several more things, which are similar to what the code block regex code does, which is similar to what other places do that call out to perl code.

But not exact

So is it a bug that for code blocks, there is no SAVEHINTS, or that it pushes (the undocumented) PERLSI_REQUIRE instead of (the undocumented) PERLSI_MAGIC? And when should one use save_re_context()? My guess is that SAVETMPS and FREETMPS are just niceties, but I don't know.

None of this is obvious to me. My guess is that the disparities in calling things are actually bugs, and there should be a routine that standardizes things, or at least documentation as to what to do.

--
Karl Williamson

@p5pRT
Copy link
Author

p5pRT commented Apr 1, 2019

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Apr 1, 2019

From @tonycoz

On Sun, 31 Mar 2019 22​:12​:02 -0700, khw wrote​:

On Sun, 31 Mar 2019 21​:50​:32 -0700, public@​khwilliamson.com wrote​:

This is a bug report for perl from khw@​khw-XPS-
8930.hsd1.co.comcast.net,
generated with the help of perlbug 1.40 running under perl 5.26.1.

-----------------------------------------------------------------
The commit involved moving user-defined properties into core. It is
a
separate issue from [perl #133860] which stems from the same commit,
but
the cause is actually that module inadvertently relying on a bug that
this commit fixes.
-----------------------------------------------------------------
---
Flags​:
category=core
severity=high
---

I wrote this ticket with the system perl info instead of blead's. But
it is actually irrelevant.

Tony Cook++ tracked down the cause of this for me. The stack needed
to pushed/popped around the call to the user-defined perl subroutine.
But then he wanted to see what the code this replaces does, and it
turns out it does several more things, which are similar to what the
code block regex code does, which is similar to what other places do
that call out to perl code.

But not exact

So is it a bug that for code blocks, there is no SAVEHINTS, or that it
pushes (the undocumented) PERLSI_REQUIRE instead of (the undocumented)
PERLSI_MAGIC? And when should one use save_re_context()? My guess is
that SAVETMPS and FREETMPS are just niceties, but I don't know.

None of this is obvious to me. My guess is that the disparities in
calling things are actually bugs, and there should be a routine that
standardizes things, or at least documentation as to what to do.

They're required because you're doing something unusual - calling a perl sub while compiling a regular expression, *or* while matching[1] a regular expression.

For compile-time the SAVEHINTS is required so that changes to $^H or %^H don't result in lexical changes to the code being compiled (like turning off strict.)

The PUSHSTACK is needed because the code seems to be called from places that don't expect the stack to move. I'm not sure exactly where that is in this case, but it's an issue for various types of magic, eg.

  pv = SvPV(somesv); /* can call get or overload magic */

but since that's the uncommon case it's best to optimize for magic not being called and put the magic code through the extra work of switching stacks.

The re_save_context() is more a run-time issue, so that the code you call doesn't modify $1 etc from underneath the caller.

All that said, I don't know that I would have realized this was necessary if I was writing the code.

Tony

[1] IIRC from a #p5p conversation a while back, the user sub can be called at runtime if it wasn't defined at regular expression compile-time. It did appear to be called at runtime when I was debugging this issue.

@p5pRT
Copy link
Author

p5pRT commented Apr 3, 2019

From @iabyn

On Mon, Apr 01, 2019 at 04​:34​:33PM -0700, Tony Cook via RT wrote​:

On Sun, 31 Mar 2019 22​:12​:02 -0700, khw wrote​:

None of this is obvious to me. My guess is that the disparities in
calling things are actually bugs, and there should be a routine that
standardizes things, or at least documentation as to what to do.

Basically what Tony says below.

They're required because you're doing something unusual - calling a perl
sub while compiling a regular expression, *or* while matching[1] a
regular expression.

For compile-time the SAVEHINTS is required so that changes to $^H or %^H
don't result in lexical changes to the code being compiled (like turning
off strict.)

The PUSHSTACK is needed because the code seems to be called from places
that don't expect the stack to move. I'm not sure exactly where that is
in this case, but it's an issue for various types of magic, eg.

pv = SvPV(somesv); /* can call get or overload magic */

Yeah, I think the "stack-of-stacks" mechanism was introduced when it was
realised that calling out to magic or overload methods could extend and
relloc() the stack from under you. But we still try very hard in the
caller code to be immune to that too, via PUTBACK/SPAGAIN etc. So a sort
of belt and braces approach.

The re_save_context() is more a run-time issue, so that the code you
call doesn't modify $1 etc from underneath the caller.

s/re_save_context/save_re_context/

I remember save_re_context() being a bit of a hack, but can't remember
why.

On Sun, 31 Mar 2019 22​:12​:02 -0700, khw wrote​:

So is it a bug that for code blocks, there is no SAVEHINTS, or that it
pushes (the undocumented) PERLSI_REQUIRE instead of (the undocumented)
PERLSI_MAGIC? And when should one use save_re_context()? My guess is
that SAVETMPS and FREETMPS are just niceties, but I don't know.

The regex code-block calling mechanism (?{...}) is very specialised,
because its optimised to call the same code block many times in quick
succession, and because the design of code blocks decreed that there is
no local scope, so in (?{ local $x = ... }) the locals are only undone
at the end of the match, or during backtracking, not at the end of executing an individual code block.

In general, when calling code you should do ENTER/LEAVE and
SAVETMPS/FREETMPS to ensure things are cleaned up in a timely manner.
Otherwise destructors may not get called until the caller also exits its
own scope, and multiple calls to the code may accumulate temps which
again aren't freed until the caller exits.

--
The Enterprise is involved in a bizarre time-warp experience which is in
some way unconnected with the Late 20th Century.
  -- Things That Never Happen in "Star Trek" #14

@p5pRT
Copy link
Author

p5pRT commented Apr 10, 2019

From @khwilliamson

On 4/3/19 4​:39 AM, Dave Mitchell wrote​:

On Mon, Apr 01, 2019 at 04​:34​:33PM -0700, Tony Cook via RT wrote​:

On Sun, 31 Mar 2019 22​:12​:02 -0700, khw wrote​:

None of this is obvious to me. My guess is that the disparities in
calling things are actually bugs, and there should be a routine that
standardizes things, or at least documentation as to what to do.

Basically what Tony says below.

They're required because you're doing something unusual - calling a perl
sub while compiling a regular expression, *or* while matching[1] a
regular expression.

For compile-time the SAVEHINTS is required so that changes to $^H or %^H
don't result in lexical changes to the code being compiled (like turning
off strict.)

The PUSHSTACK is needed because the code seems to be called from places
that don't expect the stack to move. I'm not sure exactly where that is
in this case, but it's an issue for various types of magic, eg.

pv = SvPV(somesv); /* can call get or overload magic */

Yeah, I think the "stack-of-stacks" mechanism was introduced when it was
realised that calling out to magic or overload methods could extend and
relloc() the stack from under you. But we still try very hard in the
caller code to be immune to that too, via PUTBACK/SPAGAIN etc. So a sort
of belt and braces approach.

The re_save_context() is more a run-time issue, so that the code you
call doesn't modify $1 etc from underneath the caller.

s/re_save_context/save_re_context/

I remember save_re_context() being a bit of a hack, but can't remember
why.

On Sun, 31 Mar 2019 22​:12​:02 -0700, khw wrote​:

So is it a bug that for code blocks, there is no SAVEHINTS, or that it
pushes (the undocumented) PERLSI_REQUIRE instead of (the undocumented)
PERLSI_MAGIC? And when should one use save_re_context()? My guess is
that SAVETMPS and FREETMPS are just niceties, but I don't know.

The regex code-block calling mechanism (?{...}) is very specialised,
because its optimised to call the same code block many times in quick
succession, and because the design of code blocks decreed that there is
no local scope, so in (?{ local $x = ... }) the locals are only undone
at the end of the match, or during backtracking, not at the end of executing an individual code block.

In general, when calling code you should do ENTER/LEAVE and
SAVETMPS/FREETMPS to ensure things are cleaned up in a timely manner.
Otherwise destructors may not get called until the caller also exits its
own scope, and multiple calls to the code may accumulate temps which
again aren't freed until the caller exits.

The code in swash_init has this​:

/* We might get here via a subroutine signature which uses a utf8
  * parameter name, at which point PL_subname will have been set
* but not yet used. */
save_item(PL_subname);

I'm wondering if that could affect code blocks

@p5pRT
Copy link
Author

p5pRT commented Apr 11, 2019

From @iabyn

On Wed, Apr 10, 2019 at 03​:21​:21PM -0600, Karl Williamson wrote​:

The code in swash_init has this​:

/* We might get here via a subroutine signature which uses a utf8 *
parameter name, at which point PL_subname will have been set
* but not yet used. */
save_item(PL_subname);

I'm wondering if that could affect code blocks

PL_subname is the name of the currently-compiling sub, so I can't really
see that pattern code-blocks, which are called at runtime, would be
affected. The swash stuff was a bit special because it could be called at
compile time.

--
I before E. Except when it isn't.

@p5pRT
Copy link
Author

p5pRT commented Apr 22, 2019

From @khwilliamson

The module now tests OK, so removing from blockers list. I still need to write a test so keeping the ticket open
--
Karl Williamson

@jkeenan
Copy link
Contributor

jkeenan commented Feb 3, 2020

The module now tests OK, so removing from blockers list. I still need to write a test so keeping the ticket open

Karl Williamson

@khwilliamson Have you written the test indicated above? If so, is this ticket closable?

Thank you very much.
Jim Keenan

@jkeenan jkeenan added the Closable? We might be able to close this ticket, but we need to check with the reporter label Feb 3, 2020
@toddr toddr added BBC Blead Breaks CPAN - changes in blead broke a cpan module(s) and removed Closable? We might be able to close this ticket, but we need to check with the reporter labels Feb 17, 2020
@toddr toddr added this to the 5.32.0 milestone Feb 17, 2020
@xsawyerx
Copy link
Member

xsawyerx commented Apr 1, 2020

I think we can resolve this ticket and open a ticket for the test we want to write. I don't consider this a blocker anymore since the BBC tests pass.

Any strong objections?

@khwilliamson khwilliamson removed this from the 5.32.0 milestone Apr 3, 2020
@khwilliamson
Copy link
Contributor

I'm afraid I can't remember what test I was supposed to write, nor do I see any guidance in the ticket.

But, since then, there have been a bunch of tests written exercising this area of the code. So I am closing the ticket

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BBC Blead Breaks CPAN - changes in blead broke a cpan module(s)
Projects
None yet
Development

No branches or pull requests

5 participants