Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic: swash_fetch got swatch of unexpected bit width #14600

Closed
p5pRT opened this issue Mar 18, 2015 · 18 comments
Closed

panic: swash_fetch got swatch of unexpected bit width #14600

p5pRT opened this issue Mar 18, 2015 · 18 comments

Comments

@p5pRT
Copy link

p5pRT commented Mar 18, 2015

Migrated from rt.perl.org#124109 (status was 'resolved')

Searchable as RT124109$

@p5pRT
Copy link
Author

p5pRT commented Mar 18, 2015

From citrin@citrin.ru

After upgrading to perl 5.20.2 my code fails with message​:
'panic​: swash_fetch got swatch of unexpected bit width, slen=1024, needents=64'

Simple test case attached.
On perl 5.20.1 this code works without panic.

The bug is subtle and is not reproduced after small modifications to test case (like storing string in intermediate variable).

@p5pRT
Copy link
Author

p5pRT commented Mar 18, 2015

@p5pRT
Copy link
Author

p5pRT commented Mar 18, 2015

From citrin@citrin.ru

There is related issue with utf8 and regexps in perl 5.20.2

This oneliner works correctly on perl 5.20.1 and show wrong result on perl 5.20.2

perl 5.20.1 > perl -C -Mutf8 -E '$x = "Москва"; $x =~ /^(.*)$/; say lc $1'
москва
perl 5.20.2 > perl -C -Mutf8 -E '$x = "Москва"; $x =~ /^(.*)$/; say lc $1'
1400ква

@p5pRT
Copy link
Author

p5pRT commented Mar 18, 2015

From [Unknown Contact. See original ticket]

There is related issue with utf8 and regexps in perl 5.20.2

This oneliner works correctly on perl 5.20.1 and show wrong result on perl 5.20.2

perl 5.20.1 > perl -C -Mutf8 -E '$x = "Москва"; $x =~ /^(.*)$/; say lc $1'
москва
perl 5.20.2 > perl -C -Mutf8 -E '$x = "Москва"; $x =~ /^(.*)$/; say lc $1'
1400ква

@p5pRT
Copy link
Author

p5pRT commented Mar 18, 2015

From @Tux

On Wed Mar 18 08​:45​:28 2015, citrin.ru wrote​:

There is related issue with utf8 and regexps in perl 5.20.2

This oneliner works correctly on perl 5.20.1 and show wrong result on
perl 5.20.2

perl 5.20.1 > perl -C -Mutf8 -E '$x = "Москва"; $x =~ /^(.*)$/; say lc
$1'
москва
perl 5.20.2 > perl -C -Mutf8 -E '$x = "Москва"; $x =~ /^(.*)$/; say lc
$1'
1400ква

=== base/tperl5.8.9 5.008009 x86_64-linux-thread-multi-ld
Unrecognized switch​: -E (-h will show valid options).
Exit status​: 512
=== base/perl5.10.0 5.010000 x86_64-linux
москва
=== base/tperl5.10.0 5.010000 x86_64-linux-thread-multi-ld
москва
=== base/perl5.10.1 5.010001 x86_64-linux
москва
=== base/tperl5.10.1 5.010001 x86_64-linux-thread-multi-ld
москва
=== base/perl5.11.0 5.011000 x86_64-linux
москва
=== base/tperl5.11.0 5.011000 x86_64-linux-thread-multi-ld
москва
=== base/perl5.11.1 5.011001 x86_64-linux
москва
=== base/tperl5.11.1 5.011001 x86_64-linux-thread-multi-ld
москва
=== base/perl5.11.2 5.011002 x86_64-linux
москва
=== base/tperl5.11.2 5.011002 x86_64-linux-thread-multi-ld
москва
=== base/perl5.11.3 5.011003 x86_64-linux
москва
=== base/tperl5.11.3 5.011003 x86_64-linux-thread-multi-ld
москва
=== base/perl5.11.4 5.011004 x86_64-linux
москва
=== base/tperl5.11.4 5.011004 x86_64-linux-thread-multi-ld
москва
=== base/perl5.11.5 5.011005 x86_64-linux
москва
=== base/tperl5.11.5 5.011005 x86_64-linux-thread-multi-ld
москва
=== base/perl5.12.0 5.012000 x86_64-linux
москва
=== base/tperl5.12.0 5.012000 x86_64-linux-thread-multi-ld
москва
=== base/perl5.12.1 5.012001 x86_64-linux
москва
=== base/tperl5.12.1 5.012001 x86_64-linux-thread-multi-ld
москва
=== base/perl5.12.2 5.012002 x86_64-linux
москва
=== base/tperl5.12.2 5.012002 x86_64-linux-thread-multi-ld
москва
=== base/perl5.12.3 5.012003 x86_64-linux
москва
=== base/tperl5.12.3 5.012003 x86_64-linux-thread-multi-ld
москва
=== base/perl5.12.4 5.012004 x86_64-linux
москва
=== base/tperl5.12.4 5.012004 x86_64-linux-thread-multi-ld
москва
=== base/perl5.12.5 5.012005 x86_64-linux
москва
=== base/tperl5.12.5 5.012005 x86_64-linux-thread-multi-ld
москва
=== base/perl5.13.0 5.013000 x86_64-linux
москва
=== base/tperl5.13.0 5.013000 x86_64-linux-thread-multi-ld
москва
=== base/perl5.13.1 5.013001 x86_64-linux
москва
=== base/tperl5.13.1 5.013001 x86_64-linux-thread-multi-ld
москва
=== base/perl5.13.2 5.013002 x86_64-linux
москва
=== base/tperl5.13.2 5.013002 x86_64-linux-thread-multi-ld
москва
=== base/perl5.13.3 5.013003 x86_64-linux
москва
=== base/tperl5.13.3 5.013003 x86_64-linux-thread-multi-ld
москва
=== base/perl5.13.4 5.013004 x86_64-linux
москва
=== base/tperl5.13.4 5.013004 x86_64-linux-thread-multi-ld
москва
=== base/perl5.13.5 5.013005 x86_64-linux
москва
=== base/tperl5.13.5 5.013005 x86_64-linux-thread-multi-ld
москва
=== base/perl5.13.6 5.013006 x86_64-linux
москва
=== base/tperl5.13.6 5.013006 x86_64-linux-thread-multi-ld
москва
=== base/perl5.13.7 5.013007 x86_64-linux
москва
=== base/tperl5.13.7 5.013007 x86_64-linux-thread-multi-ld
москва
=== base/perl5.13.8 5.013008 x86_64-linux
москва
=== base/tperl5.13.8 5.013008 x86_64-linux-thread-multi-ld
москва
=== base/perl5.13.9 5.013009 x86_64-linux
москва
=== base/tperl5.13.9 5.013009 x86_64-linux-thread-multi-ld
москва
=== base/perl5.13.10 5.013010 x86_64-linux
москва
=== base/tperl5.13.10 5.013010 x86_64-linux-thread-multi-ld
москва
=== base/perl5.13.11 5.013011 x86_64-linux
москва
=== base/tperl5.13.11 5.013011 x86_64-linux-thread-multi-ld
москва
=== base/perl5.14.0 5.014000 x86_64-linux
москва
=== base/tperl5.14.0 5.014000 x86_64-linux-thread-multi-ld
москва
=== base/perl5.14.1 5.014001 x86_64-linux
москва
=== base/tperl5.14.1 5.014001 x86_64-linux-thread-multi-ld
москва
=== base/perl5.14.2 5.014002 x86_64-linux
москва
=== base/tperl5.14.2 5.014002 x86_64-linux-thread-multi-ld
москва
=== base/perl5.14.3 5.014003 x86_64-linux
москва
=== base/tperl5.14.3 5.014003 x86_64-linux-thread-multi-ld
москва
=== base/perl5.14.4 5.014004 x86_64-linux
москва
=== base/tperl5.14.4 5.014004 x86_64-linux-thread-multi-ld
москва
=== base/perl5.15.0 5.015000 x86_64-linux
москва
=== base/tperl5.15.0 5.015000 x86_64-linux-thread-multi-ld
москва
=== base/perl5.15.1 5.015001 x86_64-linux
москва
=== base/tperl5.15.1 5.015001 x86_64-linux-thread-multi-ld
москва
=== base/perl5.15.2 5.015002 x86_64-linux
москва
=== base/tperl5.15.2 5.015002 x86_64-linux-thread-multi-ld
москва
=== base/perl5.15.3 5.015003 x86_64-linux
москва
=== base/tperl5.15.3 5.015003 x86_64-linux-thread-multi-ld
москва
=== base/perl5.15.4 5.015004 x86_64-linux
москва
=== base/tperl5.15.4 5.015004 x86_64-linux-thread-multi-ld
москва
=== base/perl5.15.5 5.015005 x86_64-linux
москва
=== base/tperl5.15.5 5.015005 x86_64-linux-thread-multi-ld
москва
=== base/perl5.15.6 5.015006 x86_64-linux
москва
=== base/tperl5.15.6 5.015006 x86_64-linux-thread-multi-ld
москва
=== base/perl5.15.7 5.015007 x86_64-linux
москва
=== base/tperl5.15.7 5.015007 x86_64-linux-thread-multi-ld
москва
=== base/perl5.15.8 5.015008 x86_64-linux
москва
=== base/tperl5.15.8 5.015008 x86_64-linux-thread-multi-ld
москва
=== base/perl5.15.9 5.015009 x86_64-linux
москва
=== base/tperl5.15.9 5.015009 x86_64-linux-thread-multi-ld
москва
=== base/perl5.16.0 5.016000 x86_64-linux
москва
=== base/tperl5.16.0 5.016000 x86_64-linux-thread-multi-ld
москва
=== base/perl5.16.1 5.016001 x86_64-linux
москва
=== base/tperl5.16.1 5.016001 x86_64-linux-thread-multi-ld
москва
=== base/perl5.16.2 5.016002 x86_64-linux
москва
=== base/tperl5.16.2 5.016002 x86_64-linux-thread-multi-ld
москва
=== base/perl5.16.3 5.016003 x86_64-linux
москва
=== base/tperl5.16.3 5.016003 x86_64-linux-thread-multi-ld
москва
=== base/perl5.17.0 5.017000 x86_64-linux
москва
=== base/tperl5.17.0 5.017000 x86_64-linux-thread-multi-ld
москва
=== base/perl5.17.1 5.017001 x86_64-linux
москва
=== base/tperl5.17.1 5.017001 x86_64-linux-thread-multi-ld
москва
=== base/perl5.17.2 5.017002 x86_64-linux
москва
=== base/tperl5.17.2 5.017002 x86_64-linux-thread-multi-ld
москва
=== base/perl5.17.3 5.017003 x86_64-linux
москва
=== base/tperl5.17.3 5.017003 x86_64-linux-thread-multi-ld
москва
=== base/perl5.17.4 5.017004 x86_64-linux
москва
=== base/tperl5.17.4 5.017004 x86_64-linux-thread-multi-ld
москва
=== base/perl5.17.5 5.017005 x86_64-linux
москва
=== base/tperl5.17.5 5.017005 x86_64-linux-thread-multi-ld
москва
=== base/perl5.17.6 5.017006 x86_64-linux
москва
=== base/tperl5.17.6 5.017006 x86_64-linux-thread-multi-ld
москва
=== base/perl5.17.7 5.017007 x86_64-linux
москва
=== base/tperl5.17.7 5.017007 x86_64-linux-thread-multi-ld
москва
=== base/perl5.17.8 5.017008 x86_64-linux
москва
=== base/tperl5.17.8 5.017008 x86_64-linux-thread-multi-ld
москва
=== base/perl5.17.9 5.017009 x86_64-linux
москва
=== base/tperl5.17.9 5.017009 x86_64-linux-thread-multi-ld
москва
=== base/perl5.17.10 5.017010 x86_64-linux
москва
=== base/tperl5.17.10 5.017010 x86_64-linux-thread-multi-ld
москва
=== base/perl5.17.11 5.017011 x86_64-linux
москва
=== base/tperl5.17.11 5.017011 x86_64-linux-thread-multi-ld
москва
=== base/perl5.18.0 5.018000 x86_64-linux
москва
=== base/tperl5.18.0 5.018000 x86_64-linux-thread-multi-ld
москва
=== base/perl5.18.1 5.018001 x86_64-linux
москва
=== base/tperl5.18.1 5.018001 x86_64-linux-thread-multi-ld
москва
=== base/perl5.18.2 5.018002 x86_64-linux
москва
=== base/tperl5.18.2 5.018002 x86_64-linux-thread-multi-ld
москва
=== base/perl5.18.3 5.018003 x86_64-linux
москва
=== base/tperl5.18.3 5.018003 x86_64-linux-thread-multi-ld
москва
=== base/perl5.18.4 5.018004 x86_64-linux
москва
=== base/tperl5.18.4 5.018004 x86_64-linux-thread-multi-ld
москва
=== base/perl5.19.0 5.019000 x86_64-linux
москва
=== base/tperl5.19.0 5.019000 x86_64-linux-thread-multi-ld
москва
=== base/perl5.19.1 5.019001 x86_64-linux
москва
=== base/tperl5.19.1 5.019001 x86_64-linux-thread-multi-ld
москва
=== base/perl5.19.2 5.019002 x86_64-linux
москва
=== base/tperl5.19.2 5.019002 x86_64-linux-thread-multi-ld
москва
=== base/perl5.19.3 5.019003 x86_64-linux
москва
=== base/tperl5.19.3 5.019003 x86_64-linux-thread-multi-ld
москва
=== base/perl5.19.4 5.019004 x86_64-linux
москва
=== base/tperl5.19.4 5.019004 x86_64-linux-thread-multi-ld
москва
=== base/perl5.19.5 5.019005 x86_64-linux
москва
=== base/tperl5.19.5 5.019005 x86_64-linux-thread-multi-ld
москва
=== base/perl5.19.6 5.019006 x86_64-linux
москва
=== base/tperl5.19.6 5.019006 x86_64-linux-thread-multi-ld
москва
=== base/perl5.19.7 5.019007 x86_64-linux
москва
=== base/tperl5.19.7 5.019007 x86_64-linux-thread-multi-ld
москва
=== base/perl5.19.8 5.019008 x86_64-linux
москва
=== base/tperl5.19.8 5.019008 x86_64-linux-thread-multi-ld
москва
=== base/perl5.19.9 5.019009 x86_64-linux
москва
=== base/tperl5.19.9 5.019009 x86_64-linux-thread-multi-ld
москва
=== base/perl5.19.10 5.019010 x86_64-linux
москва
=== base/tperl5.19.10 5.019010 x86_64-linux-thread-multi-ld
москва
=== base/perl5.19.11 5.019011 x86_64-linux
москва
=== base/tperl5.19.11 5.019011 x86_64-linux-thread-multi-ld
москва
=== base/perl5.20.0 5.020000 x86_64-linux
москва
=== base/tperl5.20.0 5.020000 x86_64-linux-thread-multi-ld
москва
=== base/perl5.20.1 5.020001 x86_64-linux
москва
=== base/tperl5.20.1 5.020001 x86_64-linux-thread-multi-ld
москва
=== base/perl5.20.2 5.020002 x86_64-linux
1400ква
=== base/tperl5.20.2 5.020002 x86_64-linux-thread-multi-ld
1400ква
=== base/perl5.21.0 5.021000 x86_64-linux
москва
=== base/tperl5.21.0 5.021000 x86_64-linux-thread-multi-ld
москва
=== base/perl5.21.1 5.021001 x86_64-linux
москва
=== base/tperl5.21.1 5.021001 x86_64-linux-thread-multi-ld
москва
=== base/perl5.21.2 5.021002 x86_64-linux
москва
=== base/tperl5.21.2 5.021002 x86_64-linux-thread-multi-ld
москва
=== base/perl5.21.3 5.021003 x86_64-linux
москва
=== base/tperl5.21.3 5.021003 x86_64-linux-thread-multi-ld
москва
=== base/perl5.21.4 5.021004 x86_64-linux
18a0ква
=== base/tperl5.21.4 5.021004 x86_64-linux-thread-multi-ld
18a0ква
=== base/perl5.21.5 5.021005 x86_64-linux
18a0ква
=== base/tperl5.21.5 5.021005 x86_64-linux-thread-multi-ld
18a0ква
=== base/perl5.21.6 5.021006 x86_64-linux
18a0ква
=== base/tperl5.21.6 5.021006 x86_64-linux-thread-multi-ld
18a0ква
=== base/perl5.21.7 5.021007 x86_64-linux
18a0ква
=== base/tperl5.21.7 5.021007 x86_64-linux-thread-multi-ld
18a0ква
=== base/perl5.21.8 5.021008 x86_64-linux
18a0ква
=== base/tperl5.21.8 5.021008 x86_64-linux-thread-multi-ld
18a0ква
=== base/perl5.21.9 5.021009 x86_64-linux
18a0ква
=== base/tperl5.21.9 5.021009 x86_64-linux-thread-multi-ld
18a0ква
=== /usr/bin/perl 5.010001 x86_64-linux-thread-multi
москва

@p5pRT
Copy link
Author

p5pRT commented Mar 18, 2015

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Mar 18, 2015

From @iabyn

On Wed, Mar 18, 2015 at 08​:45​:29AM -0700, Anton V. Yuzhaninov via RT wrote​:

There is related issue with utf8 and regexps in perl 5.20.2

This oneliner works correctly on perl 5.20.1 and show wrong result on perl 5.20.2

perl 5.20.1 > perl -C -Mutf8 -E '$x = "Москва"; $x =~ /^(.*)$/; say lc $1'
москва
perl 5.20.2 > perl -C -Mutf8 -E '$x = "Москва"; $x =~ /^(.*)$/; say lc $1'
1400ква

It bisects to this​:

commit 2c1f00b
Author​: Yves Orton <demerphq@​gmail.com>
AuthorDate​: Thu Sep 11 21​:55​:08 2014 +0200
Commit​: Yves Orton <demerphq@​gmail.com>
CommitDate​: Thu Sep 11 22​:45​:31 2014 +0200

  perl #122747​: localize PL_curpm to null in _core_swash_init
 
  Set PL_curpm to null before we do any swash intialization
  in _core_swash_init(). This "hides" the current regop from the
  swash code, with the intent of prevent weird reentrancy bugs
  when the swashes are initialized.
 
  Long term you could argue that we should just not use the regex
  engine to initialize a swash, and then this would be unnecessary.
 
  Thanks to FC for the suggestion!

which was cherry-picked into 5.20.2 by
b0ed92c.

--
The Enterprise's efficient long-range scanners detect a temporal vortex
distortion in good time, allowing it to be safely avoided via a minor
course correction.
  -- Things That Never Happen in "Star Trek" #21

@p5pRT
Copy link
Author

p5pRT commented Mar 19, 2015

From @iabyn

On Wed, Mar 18, 2015 at 09​:13​:21PM +0000, Dave Mitchell wrote​:

On Wed, Mar 18, 2015 at 08​:45​:29AM -0700, Anton V. Yuzhaninov via RT wrote​:

There is related issue with utf8 and regexps in perl 5.20.2

This oneliner works correctly on perl 5.20.1 and show wrong result on perl 5.20.2

perl 5.20.1 > perl -C -Mutf8 -E '$x = "Москва"; $x =~ /^(.*)$/; say lc $1'
москва
perl 5.20.2 > perl -C -Mutf8 -E '$x = "Москва"; $x =~ /^(.*)$/; say lc $1'
1400ква

It bisects to this​:

I'm just about to have a look at this, in case anyone's already started.

--
Music lesson​: a symbiotic relationship whereby a pupil's embellishments
concerning the amount of practice performed since the last lesson are
rewarded with embellishments from the teacher concerning the pupil's
progress over the corresponding period.

@p5pRT
Copy link
Author

p5pRT commented Mar 20, 2015

From @iabyn

On Thu, Mar 19, 2015 at 03​:04​:08PM +0000, Dave Mitchell wrote​:

On Wed, Mar 18, 2015 at 09​:13​:21PM +0000, Dave Mitchell wrote​:

On Wed, Mar 18, 2015 at 08​:45​:29AM -0700, Anton V. Yuzhaninov via RT wrote​:

There is related issue with utf8 and regexps in perl 5.20.2

This oneliner works correctly on perl 5.20.1 and show wrong result on perl 5.20.2

perl 5.20.1 > perl -C -Mutf8 -E '$x = "Москва"; $x =~ /^(.*)$/; say lc $1'
москва
perl 5.20.2 > perl -C -Mutf8 -E '$x = "Москва"; $x =~ /^(.*)$/; say lc $1'
1400ква

It bisects to this​:

I'm just about to have a look at this, in case anyone's already started.

I've now pushed the branch smoke-me/davem/save_re for smoking.
The top commit is shown below; two earlier commits revert removing
Perl_save_re_context().

For maint-5.20, I'm not sure whether its best to revert Yves'
2c1f00b, which fixes a longstanding "use re 'taint'" issue but
introduces a regression; or cherry-pick my fix, which fixes the
regression, leaves the longstanding issue fixed, but might introduce
other regressions - since it now scans %​:: decide which of $1,$2...
to localise, rather than using RX_NPARENS(PL_curpm).

commit 9c186b0
Author​: David Mitchell <davem@​iabyn.com>
AuthorDate​: Thu Mar 19 20​:26​:39 2015 +0000
Commit​: David Mitchell <davem@​iabyn.com>
CommitDate​: Fri Mar 20 11​:42​:13 2015 +0000

  Reinstate save_re_context()'s "local $n" action
 
  RT #124109.
 
  2c1f00b localised PL_curpm to NULL when calling swash init code
  (i.e. perl-level code that is loaded and executed when something
  like "lc $large_codepoint" is executed).
 
  b4fa55d followed this up by gutting Perl_save_re_context(), since
  that function did, basically,
 
  if (PL_curpm) {
  for (i = 1; i <= RX_NPARENS(PM_GETRE(PL_curpm))) {
  do the C equivalent of the perl code "local ${i}";
  }
  }
 
  and now that PL_curpm was null, the code wasn't called any more. However,
  it turns out that the localisation *was* still needed, it's just that
  nothing in the test suite actually tested for it.
 
  In something like the following​:
 
  $x = "\x{41c}";
  $x =~ /(.*)/;
  $s = lc $1;
 
  pp_lc() calls get magic on $1, which sets $1's PV value to a copy of the
  substring captured by the current pattern match.
  Then pp_lc() calls a function to convert the string to upper case, which
  triggers a swash load, which calls perl code that does a pattern match
  and, most importantly, uses the value of $1. This triggers get magic on
  $1, which overwrites $1's PV value with a new value. When control returns
  to pp_lc(), $1 now holds the wrong string value.
 
  Hence $1, $2 etc need localising as well as PL_curpm.
 
  The old way that Perl_save_re_context() used to work (localising
  $1..${RX_NPARENS}) is no longer viable, as we don't know what regex the
  caller has as its current pm any more. Instead I've made
  Perl_save_re_context() scan %​:: for any entries that match $`,$&amp;,$', $1,
  $2,.... and localise them.

--
"But Sidley Park is already a picture, and a most amiable picture too.
The slopes are green and gentle. The trees are companionably grouped at
intervals that show them to advantage. The rill is a serpentine ribbon
unwound from the lake peaceably contained by meadows on which the right
amount of sheep are tastefully arranged." -- Lady Croom, "Arcadia"

@p5pRT
Copy link
Author

p5pRT commented Mar 25, 2015

From @cpansprout

On Fri Mar 20 04​:56​:03 2015, davem wrote​:

I've now pushed the branch smoke-me/davem/save_re for smoking.
The top commit is shown below; two earlier commits revert removing
Perl_save_re_context().

For maint-5.20, I'm not sure whether its best to revert Yves'
2c1f00b, which fixes a longstanding "use re 'taint'" issue but
introduces a regression; or cherry-pick my fix, which fixes the
regression, leaves the longstanding issue fixed, but might introduce
other regressions - since it now scans %​:: decide which of $1,$2...
to localise, rather than using RX_NPARENS(PL_curpm).

I think for that reason it is problematic, not only for maint, but also for blead. We should not internally be resetting the iterator provided for Perl space. This would break​:

while(my($key,$val) = each %​::) {
  # read $1
}

If this is the only possible approach, then it seems that ‘manual’ iterator without using the iterator is the way to go. I think we already have code like that in dump.c.

pp_lc() calls get magic on $1, which sets $1's PV value to a copy of
the
substring captured by the current pattern match.
Then pp_lc() calls a function to convert the string to upper case,
which
triggers a swash load, which calls perl code that does a pattern match

Is that in utf8_heavy.pl?

and, most importantly, uses the value of $1. This triggers get magic
on
$1, which overwrites $1's PV value with a new value. When control
returns
to pp_lc(), $1 now holds the wrong string value.

Ow. That potentially affects any magical variable read by swash code. Until we find a cleaner solution, wouldn’t it be better to localise only $1, $2 and $3, which are all that utf8_heavy.pl uses? We could, as an interim measure, add a porting test (oh no!) that ensures we do not accidentally add $4 to utf8_heavy.pl without also localising it.

Also, if we handle this issue when swashes are loaded, rather than in the regexp engine, wouldn’t the localisation take place less often? Doing the localisation in a different place perhaps should way, though, till after 5.22. After all, we *know* it works in the current spot.

Then pp_lc() calls a function to convert the string to upper case,

Surely it converts it to lowercase.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Mar 25, 2015

From @iabyn

On Tue, Mar 24, 2015 at 11​:28​:27PM -0700, Father Chrysostomos via RT wrote​:

pp_lc() calls get magic on $1, which sets $1's PV value to a copy of
the
substring captured by the current pattern match.
Then pp_lc() calls a function to convert the string to upper case,
which
triggers a swash load, which calls perl code that does a pattern match

Is that in utf8_heavy.pl?

Yes, it's the method SWASHNEW() in utf8_heavy.pl.
But note that the swash mechanism loads utf8.pm, which has an AUTOLOAD
method, so that when SWASHNEW() is called, utf8_heavy.pl is required,
which also loads strict, warnings and re. So quite a bit a code is
required and/or executed on first swash loading.

and, most importantly, uses the value of $1. This triggers get magic
on
$1, which overwrites $1's PV value with a new value. When control
returns
to pp_lc(), $1 now holds the wrong string value.

Ow. That potentially affects any magical variable read by swash code.
Until we find a cleaner solution, wouldn’t it be better to localise only
$1, $2 and $3, which are all that utf8_heavy.pl uses? We could, as an
interim measure, add a porting test (oh no!) that ensures we do not
accidentally add $4 to utf8_heavy.pl without also localising it.

Also, if we handle this issue when swashes are loaded, rather than in
the regexp engine, wouldn’t the localisation take place less often?
Doing the localisation in a different place perhaps should way, though,
till after 5.22. After all, we *know* it works in the current spot.

Note that it isn't just swash code. Any code that "unexpectedly" calls
into Perl from C can have similar side effects, for example ties​:

  sub TIESCALAR { bless [] }
  sub FETCH {
  "xyz" =~ /(...)/ or die;
  my $x = $1;
  "foo";
  }
  my $s;
  tie $s, 'main';
  "abc" =~ /(...)/ or die;
  my $x = $1 . $s;
  print "x=[$x]\n";

This outputs "xyzfoo" rather than "abcfoo".

Note also that save_re_context() isn't called just when loading swatches,
but at other places, such as calling PL_warnhook/diehook. So just testing
utf8_heavy.pl (and the modules it loads) in porting isn't sufficient as
an interim measure (But save_re_context() is *not* called before calling
tie methods).

Perhaps​:

* we restore the original behaviour of save_re_context(), that does
  local ${$_} for 1..RX_NPARENS()
  This makes all all the other callers of save_re_context() no worse than
  5.20.
* In addition for the case where PL_curpm is null, rather than skipping
  the localisation, we localise $1..$3. This fixes the 5.21.x regression.
  We then add a porting test that checks that utf8.pm, utf8_heavy.pl and
  dependences don't contain any punctuation vars in their source apart
  from $1,$2,$3.

* Post 5.22, we try and do something more clever.

Then pp_lc() calls a function to convert the string to upper case,

Surely it converts it to lowercase.

Damn, no wonder my code has been failing for all these years ;-)

--
"I do not resent criticism, even when, for the sake of emphasis,
it parts for the time with reality".
  -- Winston Churchill, House of Commons, 22nd Jan 1941.

@p5pRT
Copy link
Author

p5pRT commented Mar 25, 2015

From @cpansprout

On Wed Mar 25 05​:11​:38 2015, davem wrote​:

Perhaps​:

* we restore the original behaviour of save_re_context(), that does
local ${$_} for 1..RX_NPARENS()
This makes all all the other callers of save_re_context() no worse than
5.20.
* In addition for the case where PL_curpm is null, rather than skipping
the localisation, we localise $1..$3. This fixes the 5.21.x regression.
We then add a porting test that checks that utf8.pm, utf8_heavy.pl and
dependences don't contain any punctuation vars in their source apart
from $1,$2,$3.

* Post 5.22, we try and do something more clever.

I think that suggestion is good.

Then pp_lc() calls a function to convert the string to upper case,

Surely it converts it to lowercase.

Damn, no wonder my code has been failing for all these years ;-)

:-)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Mar 25, 2015

From @iabyn

On Wed, Mar 25, 2015 at 08​:45​:57AM -0700, Father Chrysostomos via RT wrote​:

On Wed Mar 25 05​:11​:38 2015, davem wrote​:

Perhaps​:

* we restore the original behaviour of save_re_context(), that does
local ${$_} for 1..RX_NPARENS()
This makes all all the other callers of save_re_context() no worse than
5.20.
* In addition for the case where PL_curpm is null, rather than skipping
the localisation, we localise $1..$3. This fixes the 5.21.x regression.
We then add a porting test that checks that utf8.pm, utf8_heavy.pl and
dependences don't contain any punctuation vars in their source apart
from $1,$2,$3.

* Post 5.22, we try and do something more clever.

I think that suggestion is good.

Now done as smoke-me/davem/save_re2

--
Music lesson​: a symbiotic relationship whereby a pupil's embellishments
concerning the amount of practice performed since the last lesson are
rewarded with embellishments from the teacher concerning the pupil's
progress over the corresponding period.

@p5pRT
Copy link
Author

p5pRT commented Mar 30, 2015

From @iabyn

On Wed, Mar 25, 2015 at 05​:21​:56PM +0000, Dave Mitchell wrote​:

On Wed, Mar 25, 2015 at 08​:45​:57AM -0700, Father Chrysostomos via RT wrote​:

On Wed Mar 25 05​:11​:38 2015, davem wrote​:

Perhaps​:

* we restore the original behaviour of save_re_context(), that does
local ${$_} for 1..RX_NPARENS()
This makes all all the other callers of save_re_context() no worse than
5.20.
* In addition for the case where PL_curpm is null, rather than skipping
the localisation, we localise $1..$3. This fixes the 5.21.x regression.
We then add a porting test that checks that utf8.pm, utf8_heavy.pl and
dependences don't contain any punctuation vars in their source apart
from $1,$2,$3.

* Post 5.22, we try and do something more clever.

I think that suggestion is good.

Now done as smoke-me/davem/save_re2

Now merged into blead as

  4e0341d Perl_save_re_context()​: re-indent after last commit
  3553f4f save_re_context()​: do "local $n" with no PL_curpm
  e8d8f80 Revert "Gut Perl_save_re_context"
  2782061 Revert "Don’t call save_re_context"
  7c6e85a Revert "Mathomise save_re_context"

--
Never do today what you can put off till tomorrow.

@p5pRT
Copy link
Author

p5pRT commented Mar 30, 2015

@iabyn - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented Mar 30, 2015

From @iabyn

On Mon, Mar 30, 2015 at 12​:15​:42PM +0100, Dave Mitchell wrote​:

Now merged into blead as

4e0341d Perl\_save\_re\_context\(\)&#8203;: re\-indent after last commit
3553f4f save\_re\_context\(\)&#8203;: do "local $n" with no PL\_curpm
e8d8f80 Revert "Gut Perl\_save\_re\_context"
2782061 Revert "Don’t call save\_re\_context"
7c6e85a Revert "Mathomise save\_re\_context"

3553f4f (and 4e0341d) will need cherry-picking for maint-5.20

--
The Enterprise is captured by a vastly superior alien intelligence which
does not put them on trial.
  -- Things That Never Happen in "Star Trek" #10

@p5pRT
Copy link
Author

p5pRT commented Jun 2, 2015

From @khwilliamson

Thank you for submitting this ticket.

The issue should now be resolved with the release today of Perl v5.22, which is available at http​://www.perl.org/get.html
--
Karl Williamson for the Perl 5 team

@p5pRT
Copy link
Author

p5pRT commented Jun 2, 2015

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant