Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bleadperl v5.22.0-131-gaa8f6ce breaks GFUJI/App-test-travis-v0.9.7.tar.gz #15035

Closed
p5pRT opened this issue Nov 9, 2015 · 12 comments
Closed

Comments

@p5pRT
Copy link

p5pRT commented Nov 9, 2015

Migrated from rt.perl.org#126593 (status was 'resolved')

Searchable as RT126593$

@p5pRT
Copy link
Author

p5pRT commented Nov 9, 2015

From @andk

bisect


commit aa8f6ce
Author​: Rafael Garcia-Suarez <rgs@​consttype.org>
Date​: Wed Jun 17 13​:18​:59 2015 +0200

  Microoptimize some matches in utf8_heavy.pl

cpantesters


http​://www.cpantesters.org/cpan/report/a496ba7c-357b-11e5-be3b-567b92740143

discovered by slaven​: gfx/App-test-travis#6

perl -V


Summary of my perl5 (revision 5 version 23 subversion 1) configuration​:
 
  Platform​:
  osname=solaris, osvers=2.11, archname=i86pc-solaris-ld
  uname='sunos solaris.antelope.net 5.11 11.2 i86pc i386 i86pc '
  config_args='-de -Dprefix=/export/home/jmaslak/perl5/perlbrew/perls/perl-5.23.1-ld -Duselongdouble -Dusedevel -Aeval​:scriptdir=/export/home/jmaslak/perl5/perlbrew/perls/perl-5.23.1-ld/bin'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=undef, usemultiplicity=undef
  use64bitint=undef, use64bitall=undef, uselongdouble=define
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='/opt/solarisstudio12.4/bin/cc', ccflags ='-I/usr/gnu/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DPERL_USE_SAFE_PUTENV',
  optimize='-O',
  cppflags='-I/usr/gnu/include'
  ccversion='Sun C 5.13 SunOS_i386 2014/10/20', gccversion='', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234, doublekind=3
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12, longdblkind=3
  ivtype='long', ivsize=4, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
  alignbytes=4, prototype=define
  Linker and Libraries​:
  ld='/opt/solarisstudio12.4/bin/cc', ldflags =' -L/usr/lib -L/usr/ccs/lib -L/opt/solarisstudio12.4/lib/compilers/staticlib -L/opt/solarisstudio12.4/lib/compilers/sse2 -L/opt/solarisstudio12.4/lib/compilers -L/lib -L/usr/gnu/lib '
  libpth=/usr/lib /usr/ccs/lib /opt/solarisstudio12.4/lib/compilers/staticlib /opt/solarisstudio12.4/lib/compilers/sse2 /opt/solarisstudio12.4/lib/compilers /lib /usr/gnu/lib
  libs=-lpthread -lsocket -lnsl -lgdbm -ldb -ldl -lm -lc -lsunmath
  perllibs=-lpthread -lsocket -lnsl -ldl -lm -lc -lsunmath
  libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version=''
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
  cccdlflags='-KPIC', lddlflags='-G -L/usr/lib -L/usr/ccs/lib -L/opt/solarisstudio12.4/lib/compilers/staticlib -L/opt/solarisstudio12.4/lib/compilers/sse2 -L/opt/solarisstudio12.4/lib/compilers -L/lib -L/usr/gnu/lib'

Characteristics of this binary (from libperl)​:
  Compile-time options​: HAS_TIMES PERLIO_LAYERS PERL_COPY_ON_WRITE
  PERL_DONT_CREATE_GVSV
  PERL_HASH_FUNC_ONE_AT_A_TIME_HARD PERL_MALLOC_WRAP
  PERL_PRESERVE_IVUV PERL_USE_DEVEL
  PERL_USE_SAFE_PUTENV USE_LARGE_FILES USE_LOCALE
  USE_LOCALE_COLLATE USE_LOCALE_CTYPE
  USE_LOCALE_NUMERIC USE_LOCALE_TIME USE_LONG_DOUBLE
  USE_PERLIO USE_PERL_ATOF
  Locally applied patches​:
  Devel​::PatchPerl 1.38
  Built under solaris
  Compiled at Jul 25 2015 00​:29​:56
  %ENV​:
  PERL5LIB=""
  PERL5OPT=""
  PERL5_CPANPLUS_IS_RUNNING="7313"
  PERL5_CPAN_IS_RUNNING="7313"
  PERL5_CPAN_IS_RUNNING_IN_RECURSION="28602,7313"
  PERLBREW_BASHRC_VERSION="0.73"
  PERLBREW_HOME="/export/home/jmaslak/.perlbrew"
  PERLBREW_LIB=""
  PERLBREW_MANPATH="/export/home/jmaslak/perl5/perlbrew/perls/perl-5.23.1-ld/man"
  PERLBREW_PATH="/export/home/jmaslak/perl5/perlbrew/bin​:/export/home/jmaslak/perl5/perlbrew/perls/perl-5.23.1-ld/bin"
  PERLBREW_PERL="perl-5.23.1-ld"
  PERLBREW_ROOT="/export/home/jmaslak/perl5/perlbrew"
  PERLBREW_VERSION="0.73"
  PERL_CR_SMOKER_CURRENT="App-test-travis-v0.9.7"
  PERL_EXTUTILS_AUTOINSTALL="--defaultdeps"
  PERL_LOCAL_LIB_ROOT=""
  PERL_MM_USE_DEFAULT="1"
  @​INC​:
  /export/home/jmaslak/perl5/perlbrew/perls/perl-5.23.1-ld/lib/site_perl/5.23.1/i86pc-solaris-ld
  /export/home/jmaslak/perl5/perlbrew/perls/perl-5.23.1-ld/lib/site_perl/5.23.1
  /export/home/jmaslak/perl5/perlbrew/perls/perl-5.23.1-ld/lib/5.23.1/i86pc-solaris-ld
  /export/home/jmaslak/perl5/perlbrew/perls/perl-5.23.1-ld/lib/5.23.1
  .

--
andreas

@p5pRT
Copy link
Author

p5pRT commented Nov 12, 2015

From @tonycoz

On Sun Nov 08 22​:27​:21 2015, andreas.koenig.7os6VVqR@​franz.ak.mind.de wrote​:

bisect
------
commit aa8f6ce
Author​: Rafael Garcia-Suarez <rgs@​consttype.org>
Date​: Wed Jun 17 13​:18​:59 2015 +0200

Microoptimize some matches in utf8_heavy.pl

cpantesters
-----------
http​://www.cpantesters.org/cpan/report/a496ba7c-357b-11e5-be3b-
567b92740143

discovered by slaven​: gfx/App-test-travis#6

App​::test​::travis loads encoding​::warnings which sets ${^ENCODING}.

When ${^ENCODING} is set S_scan_str() (toke.c) treats the match and replacement strings as unicode for this line​:

$VERSION =~ tr/_//d;

in Carp.pm, S_pmtrans() (op.c) sees that they're SvUTF8() and rather than using the simple lookup table you'd expect for such a simple tr/// it builds a swash definition and attempts to create a new swash by calling utf8​::SWASHNEW.

This in turn calls the utf8​::AUTOLOAD, which loads utf8_heavy.pl, which compiles until this line​:

  (my $loose = $_[0]) =~ tr/-_ \t//d;

At this point both Carp.pm and utf8_heavy.pl are only partly loaded, and again S_pmtrans sees UTF8 marked match and replacement strings and again attempts to all utf8​::SWASHNEW, again falling back to utf8​::AUTOLOAD.

This time around utf_heavy.pl and Carp.pm are already marked as loading, so the requires​:

sub AUTOLOAD {
  require "utf8_heavy.pl";
  goto &$AUTOLOAD if defined &$AUTOLOAD;
  require Carp;
  Carp​::croak("Undefined subroutine $AUTOLOAD called");
}

are skipped and we see an error for the missing croak() definition.

Possible fixes​:

1) modify S_scan_str() to attempt to downgrade the string when IN_ENCODING

This is probably the correct fix, if ${^ENCODING} wasn't deprecated.

2) add C<< local ${^ENCODING}; >> to the top of utf8​::AUTOLOAD

A simple workaround.

3) Finally remove the deprecated ${^ENCODING}.

4) Partly revert aa8f6ce (leaving a mine for someone else to step on in the future)

Tony

@p5pRT
Copy link
Author

p5pRT commented Nov 12, 2015

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Nov 12, 2015

From @khwilliamson

aa8f6ce
is the correct commit number
--
Karl Williamson

@p5pRT
Copy link
Author

p5pRT commented Nov 12, 2015

From @khwilliamson

On 11/11/2015 10​:22 PM, Tony Cook via RT wrote​:

On Sun Nov 08 22​:27​:21 2015, andreas.koenig.7os6VVqR@​franz.ak.mind.de wrote​:

bisect
------
commit aa8f6ce
Author​: Rafael Garcia-Suarez <rgs@​consttype.org>
Date​: Wed Jun 17 13​:18​:59 2015 +0200

Microoptimize some matches in utf8_heavy.pl

cpantesters
-----------
http​://www.cpantesters.org/cpan/report/a496ba7c-357b-11e5-be3b-
567b92740143

discovered by slaven​: gfx/App-test-travis#6

App​::test​::travis loads encoding​::warnings which sets ${^ENCODING}.

When ${^ENCODING} is set S_scan_str() (toke.c) treats the match and replacement strings as unicode for this line​:

$VERSION =~ tr/_//d;

in Carp.pm, S_pmtrans() (op.c) sees that they're SvUTF8() and rather than using the simple lookup table you'd expect for such a simple tr/// it builds a swash definition and attempts to create a new swash by calling utf8​::SWASHNEW.

This in turn calls the utf8​::AUTOLOAD, which loads utf8_heavy.pl, which compiles until this line​:

\(my $loose = $\_\[0\]\) =~ tr/\-\_ \\t//d;

At this point both Carp.pm and utf8_heavy.pl are only partly loaded, and again S_pmtrans sees UTF8 marked match and replacement strings and again attempts to all utf8​::SWASHNEW, again falling back to utf8​::AUTOLOAD.

This time around utf_heavy.pl and Carp.pm are already marked as loading, so the requires​:

sub AUTOLOAD {
require "utf8_heavy.pl";
goto &$AUTOLOAD if defined &$AUTOLOAD;
require Carp;
Carp​::croak("Undefined subroutine $AUTOLOAD called");
}

are skipped and we see an error for the missing croak() definition.

Possible fixes​:

1) modify S_scan_str() to attempt to downgrade the string when IN_ENCODING

This is probably the correct fix, if ${^ENCODING} wasn't deprecated.

2) add C<< local ${^ENCODING}; >> to the top of utf8​::AUTOLOAD

A simple workaround.

3) Finally remove the deprecated ${^ENCODING}.

4) Partly revert aa8f6ce (leaving a mine for someone else to step on in the future)

Tony

I haven't looked at this in detail, but this seems to me to likely be
just one possibility for breakage, that this could occur outside of
${^ENCODING}.

I think that the problem is that tr/// can load utf8_heavy.pl which
since the blamed commit also uses tr///. We've had this problem in
regards to regular expression patterns before, and had to make work
arounds. I believe that save_re_context() was written so that
utf8_heavy.pl can use regular expressions, since it gets called from
within regular expressions. It is a similar sort of recursive behavior.

We could write perhaps a save_tr_context(), but save_re_context() is
problematic, and likely so would be save_tr_context()

It seems to me to be the best short term thing to forbid tr/// in
utf8_heavy.pl. This was a micro optimization, after all. I find it
hard to believe that using tr/// over s/// makes a significant time
difference. It would be easy to write a fast porting test that makes
sure there aren't tr's in utf8_heavy.pl, so that the minefield is avoided.

Longer term, it would be better to not have tr/// use swashes at all,
avoiding the problem. I had started looking into that, but have put it
aside for higher priority issues. It is very likely that inversion
lists (or maps) would be faster and use less memory than swashes for
this purpose.

Even longer term is getting rid of utf8_heavy altogether. Aaron Crane
has suggested pre-compiling all Unicode properties into the perl
executable, increasing its text space. Again, that's something I
started looking at, but have put aside for now. It would increase the
text segment size of the interpreter.

I looked last year at the amount of work involved in getting rid of 'use
encoding', and it seemed like it would be hard to get all the cpan
authors to cooperate. Karen Etheridge was looking into it.

---
via perlbug​: queue​: perl5 status​: new
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=126593

@p5pRT
Copy link
Author

p5pRT commented Nov 16, 2015

From @rgarcia

On 12 November 2015 at 19​:16, Karl Williamson <public@​khwilliamson.com> wrote​:

On 11/11/2015 10​:22 PM, Tony Cook via RT wrote​:

On Sun Nov 08 22​:27​:21 2015, andreas.koenig.7os6VVqR@​franz.ak.mind.de
wrote​:

bisect
------
commit aa8f6ce
Author​: Rafael Garcia-Suarez <rgs@​consttype.org>
Date​: Wed Jun 17 13​:18​:59 2015 +0200

Microoptimize some matches in utf8_heavy.pl

cpantesters
-----------
http​://www.cpantesters.org/cpan/report/a496ba7c-357b-11e5-be3b-
567b92740143

discovered by slaven​: gfx/App-test-travis#6

App​::test​::travis loads encoding​::warnings which sets ${^ENCODING}.

When ${^ENCODING} is set S_scan_str() (toke.c) treats the match and
replacement strings as unicode for this line​:

$VERSION =~ tr/_//d;

in Carp.pm, S_pmtrans() (op.c) sees that they're SvUTF8() and rather than
using the simple lookup table you'd expect for such a simple tr/// it builds
a swash definition and attempts to create a new swash by calling
utf8​::SWASHNEW.

This in turn calls the utf8​::AUTOLOAD, which loads utf8_heavy.pl, which
compiles until this line​:

\(my $loose = $\_\[0\]\) =~ tr/\-\_ \\t//d;

At this point both Carp.pm and utf8_heavy.pl are only partly loaded, and
again S_pmtrans sees UTF8 marked match and replacement strings and again
attempts to all utf8​::SWASHNEW, again falling back to utf8​::AUTOLOAD.

This time around utf_heavy.pl and Carp.pm are already marked as loading,
so the requires​:

sub AUTOLOAD {
require "utf8_heavy.pl";
goto &$AUTOLOAD if defined &$AUTOLOAD;
require Carp;
Carp​::croak("Undefined subroutine $AUTOLOAD called");
}

are skipped and we see an error for the missing croak() definition.

Possible fixes​:

1) modify S_scan_str() to attempt to downgrade the string when IN_ENCODING

This is probably the correct fix, if ${^ENCODING} wasn't deprecated.

2) add C<< local ${^ENCODING}; >> to the top of utf8​::AUTOLOAD

A simple workaround.

3) Finally remove the deprecated ${^ENCODING}.

4) Partly revert aa8f6ce (leaving a mine
for someone else to step on in the future)

Tony

I haven't looked at this in detail, but this seems to me to likely be just
one possibility for breakage, that this could occur outside of ${^ENCODING}.

I think that the problem is that tr/// can load utf8_heavy.pl which since
the blamed commit also uses tr///. We've had this problem in regards to
regular expression patterns before, and had to make work arounds. I believe
that save_re_context() was written so that utf8_heavy.pl can use regular
expressions, since it gets called from within regular expressions. It is a
similar sort of recursive behavior.

We could write perhaps a save_tr_context(), but save_re_context() is
problematic, and likely so would be save_tr_context()

It seems to me to be the best short term thing to forbid tr/// in
utf8_heavy.pl. This was a micro optimization, after all. I find it hard to
believe that using tr/// over s/// makes a significant time difference. It
would be easy to write a fast porting test that makes sure there aren't tr's
in utf8_heavy.pl, so that the minefield is avoided.

I have hard evidence of the time difference, but it's relatively tiny. I'm
fine with your suggested short and long term approaches.

Longer term, it would be better to not have tr/// use swashes at all,
avoiding the problem. I had started looking into that, but have put it
aside for higher priority issues. It is very likely that inversion lists
(or maps) would be faster and use less memory than swashes for this purpose.

Even longer term is getting rid of utf8_heavy altogether. Aaron Crane has
suggested pre-compiling all Unicode properties into the perl executable,
increasing its text space. Again, that's something I started looking at,
but have put aside for now. It would increase the text segment size of the
interpreter.

I looked last year at the amount of work involved in getting rid of 'use
encoding', and it seemed like it would be hard to get all the cpan authors
to cooperate. Karen Etheridge was looking into it.

I suspect lots of cargo culting in usage of encoding.pm.

@p5pRT
Copy link
Author

p5pRT commented Nov 26, 2015

From @tonycoz

On Thu Nov 12 10​:25​:43 2015, public@​khwilliamson.com wrote​:

It seems to me to be the best short term thing to forbid tr/// in
utf8_heavy.pl. This was a micro optimization, after all. I find it
hard to believe that using tr/// over s/// makes a significant time
difference. It would be easy to write a fast porting test that makes
sure there aren't tr's in utf8_heavy.pl, so that the minefield is
avoided.

Perhaps something like the attached.

Tony

@p5pRT
Copy link
Author

p5pRT commented Nov 26, 2015

From @tonycoz

0001-perl-126593-make-sure-utf8_heavy.pl-doesn-t-depend-o.patch
From ca0f8d3b24deeaf0fb8e0297ec69a3614a0e2143 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Thu, 26 Nov 2015 16:22:04 +1100
Subject: [perl #126593] make sure utf8_heavy.pl doesn't depend on itself

With ${^ENCODING} set, it did.

Partly reverts:

commit aa8f6cef961dc2009604f7464c66106421c3ae81
Author: Rafael Garcia-Suarez <rgs@consttype.org>
Date:   Wed Jun 17 13:18:59 2015 +0200

    Microoptimize some matches in utf8_heavy.pl
---
 MANIFEST          |  1 +
 lib/utf8_heavy.pl |  2 +-
 t/uni/heavy.t     | 40 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 t/uni/heavy.t

diff --git a/MANIFEST b/MANIFEST
index ca2c455..b10f84f 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -5550,6 +5550,7 @@ t/uni/fold.t			See if Unicode folding works
 t/uni/goto.t			See if Unicode goto &sub works
 t/uni/greek.t			See if Unicode in greek works
 t/uni/gv.t			See if Unicode GVs work.
+t/uni/heavy.t			See if utf8_heavy.pl uses perl that depends on it
 t/uni/labels.t			See if Unicode labels work
 t/uni/latin2.t			See if Unicode in latin2 works
 t/uni/lex_utf8.t		See if Unicode in lexer works
diff --git a/lib/utf8_heavy.pl b/lib/utf8_heavy.pl
index 0d2e662..872704a 100644
--- a/lib/utf8_heavy.pl
+++ b/lib/utf8_heavy.pl
@@ -20,7 +20,7 @@ sub _loose_name ($) {
     # out blanks, underscores and dashes.  The complication stems from the
     # grandfathered-in 'L_', which retains a single trailing underscore.
 
-    (my $loose = $_[0]) =~ tr/-_ \t//d;
+    (my $loose = $_[0]) =~ s/[-_ \t]//g;
 
     return $loose if $loose !~ / ^ (?: is | to )? l $/x;
     return 'l_' if $_[0] =~ / l .* _ /x;    # If original had a trailing '_'
diff --git a/t/uni/heavy.t b/t/uni/heavy.t
new file mode 100644
index 0000000..c257dbc
--- /dev/null
+++ b/t/uni/heavy.t
@@ -0,0 +1,40 @@
+#!./perl -w
+# tests that utf8_heavy.pl doesn't use anything that prevents it loading
+BEGIN {
+    chdir 't' if -d 't';
+    @INC = '../lib';
+    require './test.pl';
+}
+
+plan tests => 1;
+
+# see [perl #126593]
+fresh_perl_is(<<'EOP', "", { stderr => 1 }, "doesn't break with \${^ENCODING}");
+no warnings qw(deprecated);
+package Foo;
+sub cat_decode {
+    # stolen from Encode.pm
+    my ( undef, undef, undef, $pos, $trm ) = @_;
+    my ( $rdst, $rsrc, $rpos ) = \@_[ 1, 2, 3 ];
+    use bytes;
+    if ( ( my $npos = index( $$rsrc, $trm, $pos ) ) >= 0 ) {
+        $$rdst .=
+          substr( $$rsrc, $pos, $npos - $pos + length($trm) );
+        $$rpos = $npos + length($trm);
+        return 1;
+    }
+    $$rdst .= substr( $$rsrc, $pos );
+    $$rpos = length($$rsrc);
+    return q();
+}
+
+sub decode {
+   my (undef, $tmp) = @_;
+   utf8::decode($tmp);
+   $tmp;
+}
+
+BEGIN { ${^ENCODING} = bless [], q(Foo) };
+
+(my $tmp = q(abc)) =~ tr/abc/123/;
+EOP
-- 
2.1.4

@p5pRT
Copy link
Author

p5pRT commented Dec 9, 2015

From @tonycoz

On Wed Nov 25 21​:25​:25 2015, tonyc wrote​:

On Thu Nov 12 10​:25​:43 2015, public@​khwilliamson.com wrote​:

It seems to me to be the best short term thing to forbid tr/// in
utf8_heavy.pl. This was a micro optimization, after all. I find it
hard to believe that using tr/// over s/// makes a significant time
difference. It would be easy to write a fast porting test that makes
sure there aren't tr's in utf8_heavy.pl, so that the minefield is
avoided.

Perhaps something like the attached.

Pushed as 0fd86aa.

Tony

@p5pRT
Copy link
Author

p5pRT commented Dec 9, 2015

@tonycoz - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented May 13, 2016

From @khwilliamson

Thank you for submitting this report. You have helped make Perl better.
 
With the release of Perl 5.24.0 on May 9, 2016, this and 149 other issues have been resolved.

Perl 5.24.0 may be downloaded via https://metacpan.org/release/RJBS/perl-5.24.0

@p5pRT
Copy link
Author

p5pRT commented May 13, 2016

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant