Skip Menu |
Report information
Id: 132800
Status: open
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: randir <sergey.aleynikov [at] gmail.com>
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: core
Perl Version: 5.27.9
Fixed In: (no value)



To: perlbug [...] perl.org
From: Sergey Aleynikov <sergey.aleynikov [...] gmail.com>
Date: Fri, 2 Feb 2018 00:21:52 +0300
Subject: lib/unicore/mktables takes too long
Download (untitled) / with headers
text/plain 4.4k
This is a bug report for perl from sergey.aleynikov@gmail.com, generated with the help of perlbug 1.41 running under perl 5.27.9. ----------------------------------------------------------------- [Please describe your issue here] High-parallel builds of perl are currently stranded by the lib/unicore/mktables step. It takes ~60% of the total build time: git clean -dfx && ./Configure -de -Dusedevel && time make -j20 test_prep make -j20 test_prep 162.93s user 9.08s system 670% cpu 25.648 total ./miniperl -Ilib lib/unicore/mktables -C lib/unicore -P pod -maketest -p -w 15.82s user 0.23s system 99% cpu 16.057 total If I include configure step in the total time measurement, it's still ~27% bash -c './Configure -de -Dusedevel && make -j20 test_prep' 185.76s user 14.36s system 340% cpu 58.733 total This happened only during this development cycle, this step has been taking around 12-16 seconds on 5.20-5.26 with ~second growth each release. Can this be addressed in some way? This makes bisecting through 5.27.x and, later, all future perls, much slower. [Please do not change anything below this line] ----------------------------------------------------------------- --- Flags: category=core severity=low --- Site configuration information for perl 5.27.9: Configured by dur-randir at Wed Jan 31 10:14:37 MSK 2018. Summary of my perl5 (revision 5 version 27 subversion 9) configuration: Commit id: 577d3e04be845580196418dd9df1575e2cb4c0b6 Platform: osname=darwin osvers=13.4.0 archname=darwin-2level uname='darwin isengard.local 13.4.0 darwin kernel version 13.4.0: mon jan 11 18:17:34 pst 2016; root:xnu-2422.115.15~1release_x86_64 x86_64 ' config_args='-de -Dusedevel' hint=recommended useposix=true d_sigaction=define useithreads=undef usemultiplicity=undef use64bitint=define use64bitall=define uselongdouble=undef usemymalloc=n default_inc_excludes_dot=define bincompat5005=undef Compiler: cc='cc' ccflags ='-fno-common -DPERL_DARWIN -mmacosx-version-min=10.9 -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -DPERL_USE_SAFE_PUTENV' optimize='-O3' cppflags='-fno-common -DPERL_DARWIN -mmacosx-version-min=10.9 -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion='' gccversion='4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)' gccosandvers='' intsize=4 longsize=8 ptrsize=8 doublesize=8 byteorder=12345678 doublekind=3 d_longlong=define longlongsize=8 d_longdbl=define longdblsize=16 longdblkind=3 ivtype='long' ivsize=8 nvtype='double' nvsize=8 Off_t='off_t' lseeksize=8 alignbytes=8 prototype=define Linker and Libraries: ld='cc' ldflags =' -mmacosx-version-min=10.9 -fstack-protector -L/usr/local/lib' libpth=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.0/lib /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk/usr/lib /usr/local/lib /usr/lib libs=-lpthread -lgdbm -ldbm -ldl -lm -lutil -lc perllibs=-lpthread -ldl -lm -lutil -lc libc= so=dylib useshrplib=false libperl=libperl.a gnulibc_version='' Dynamic Linking: dlsrc=dl_dlopen.xs dlext=bundle d_dlsymun=undef ccdlflags=' ' cccdlflags=' ' lddlflags=' -mmacosx-version-min=10.9 -bundle -undefined dynamic_lookup -L/usr/local/lib -fstack-protector' --- @INC for perl 5.27.9: lib /usr/local/lib/perl5/site_perl/5.27.9/darwin-2level /usr/local/lib/perl5/site_perl/5.27.9 /usr/local/lib/perl5/5.27.9/darwin-2level /usr/local/lib/perl5/5.27.9 --- Environment for perl 5.27.9: DYLD_LIBRARY_PATH (unset) HOME=/Users/dur-randir LANG=en_US.UTF-8 LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/Users/dur-randir/perlbrew/bin:/Users/dur-randir/perlbrew/perls/perl-5.22.1/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/texbin PERLBREW_BASHRC_VERSION=0.80 PERLBREW_HOME=/Users/dur-randir/.perlbrew PERLBREW_MANPATH=/Users/dur-randir/perlbrew/perls/perl-5.22.1/man PERLBREW_PATH=/Users/dur-randir/perlbrew/bin:/Users/dur-randir/perlbrew/perls/perl-5.22.1/bin PERLBREW_PERL=perl-5.22.1 PERLBREW_ROOT=/Users/dur-randir/perlbrew PERLBREW_VERSION=0.80 PERL_BADLANG (unset) SHELL=/usr/local/bin/zsh
RT-Send-CC: perl5-porters [...] perl.org
On Thu, 01 Feb 2018 21:21:59 GMT, randir wrote: Show quoted text
> This is a bug report for perl from sergey.aleynikov@gmail.com, > generated with the help of perlbug 1.41 running under perl 5.27.9. > > > ----------------------------------------------------------------- > [Please describe your issue here] > > High-parallel builds of perl are currently stranded by the > lib/unicore/mktables step. It takes ~60% of the total build time: > > git clean -dfx && ./Configure -de -Dusedevel && time make -j20 > test_prep > make -j20 test_prep 162.93s user 9.08s system 670% cpu 25.648 total > > ./miniperl -Ilib lib/unicore/mktables -C lib/unicore -P pod -maketest > -p -w 15.82s user 0.23s system 99% cpu 16.057 total > > If I include configure step in the total time measurement, it's still > ~27% > > bash -c './Configure -de -Dusedevel && make -j20 test_prep' 185.76s > user 14.36s system 340% cpu 58.733 total > > This happened only during this development cycle, this step has been > taking around 12-16 seconds on 5.20-5.26 with ~second growth each > release. Can this be addressed in some way? This makes bisecting > through 5.27.x and, later, all future perls, much slower. >
I don't find compelling evidence of this trend. I built perl at v5.24.0, v5.26.0 and HEAD twice in each of three environments. See the attachment for results. I focus on 'real' time because with bisection the clock time measures the time a human is waiting for results. The data is obviously not statistically significant, but my impressions are that: * there is more variation in timings between different 'make' runs for the same Perl version than there is between different Perl versions; * if one machine is inherently faster than another (typically, more cores), then the percentage of total clock time taken up by ./Configure is greater on that machine than the other; ('make' flies like a rocket on dromedary where 'nproc -all' returns 24); * if you're running Porting/bisect.pl with the '--module' switch, the total time spent in building and testing prerequisites swamps that taken during 'make'. -- James E Keenan (jkeenan@cpan.org)
Subject: make-test-prep-timings.txt
$ git clean -dfx && regular_configure && /usr/bin/time -f "\t%e real,\t%U user,\t%S sys" make -j${TEST_JOBS} test_prep # zareason # v5.24.0 (be2c0c650b028f54e427f2469a59942edfdff8a9) 67.48 real, 219.17 user, 8.35 sys 71.22 real, 226.06 user, 8.65 sys # v5.26.0 (95388f2eb27e74cdbfb715c0097f16aeba4e6e4e) 72.21 real, 232.49 user, 8.97 sys 92.11 real, 249.32 user, 10.05 sys # HEAD (1654584e05038fee2cc4307f292f18e445d0e50f) 72.38 real, 237.66 user, 9.01 sys 69.66 real, 232.01 user, 8.06 sys ##### # Linode # v5.24.0 (be2c0c650b028f54e427f2469a59942edfdff8a9) 116.54 real, 212.20 user, 11.78 sys 112.83 real, 203.69 user, 11.34 sys # v5.26.0 (95388f2eb27e74cdbfb715c0097f16aeba4e6e4e) 123.03 real, 222.28 user, 12.84 sys 128.87 real, 234.61 user, 13.41 sys # HEAD (1654584e05038fee2cc4307f292f18e445d0e50f) 124.46 real, 227.50 user, 12.93 sys 120.40 real, 218.99 user, 11.85 sys ##### # dromedary # v5.24.0 (be2c0c650b028f54e427f2469a59942edfdff8a9) 40.89 real, 182.25 user, 13.89 sys 32.69 real, 90.58 user, 9.01 sys # v5.26.0 (95388f2eb27e74cdbfb715c0097f16aeba4e6e4e) 34.62 real, 90.69 user, 8.91 sys 34.43 real, 92.46 user, 9.18 sys # HEAD (1654584e05038fee2cc4307f292f18e445d0e50f) 44.22 real, 186.02 user, 13.55 sys 35.09 real, 91.11 user, 8.91 sys
To: perlbug-followup [...] perl.org
Date: Thu, 1 Feb 2018 22:36:50 -0700
Subject: Re: [perl #132800] lib/unicore/mktables takes too long
CC: perl5-porters [...] perl.org
From: Karl Williamson <public [...] khwilliamson.com>
Download (untitled) / with headers
text/plain 3.6k
On 02/01/2018 06:20 PM, James E Keenan via RT wrote: Show quoted text
> On Thu, 01 Feb 2018 21:21:59 GMT, randir wrote:
>> This is a bug report for perl from sergey.aleynikov@gmail.com, >> generated with the help of perlbug 1.41 running under perl 5.27.9. >> >> >> ----------------------------------------------------------------- >> [Please describe your issue here] >> >> High-parallel builds of perl are currently stranded by the >> lib/unicore/mktables step. It takes ~60% of the total build time: >> >> git clean -dfx && ./Configure -de -Dusedevel && time make -j20 >> test_prep >> make -j20 test_prep 162.93s user 9.08s system 670% cpu 25.648 total >> >> ./miniperl -Ilib lib/unicore/mktables -C lib/unicore -P pod -maketest >> -p -w 15.82s user 0.23s system 99% cpu 16.057 total >> >> If I include configure step in the total time measurement, it's still >> ~27% >> >> bash -c './Configure -de -Dusedevel && make -j20 test_prep' 185.76s >> user 14.36s system 340% cpu 58.733 total >> >> This happened only during this development cycle, this step has been >> taking around 12-16 seconds on 5.20-5.26 with ~second growth each >> release. Can this be addressed in some way? This makes bisecting >> through 5.27.x and, later, all future perls, much slower. >>
> > I don't find compelling evidence of this trend. I built perl at v5.24.0, v5.26.0 and HEAD twice in each of three environments. See the attachment for results. I focus on 'real' time because with bisection the clock time measures the time a human is waiting for results. The data is obviously not statistically significant, but my impressions are that: > > * there is more variation in timings between different 'make' runs for the same Perl version than there is between different Perl versions; > > * if one machine is inherently faster than another (typically, more cores), then the percentage of total clock time taken up by ./Configure is greater on that machine than the other; ('make' flies like a rocket on dromedary where 'nproc -all' returns 24); > > * if you're running Porting/bisect.pl with the '--module' switch, the total time spent in building and testing prerequisites swamps that taken during 'make'. >
I also haven't anecdotally noticed any marked decrease in speed. I looked through the list of commits in 5.27 to see what might be causing it, and nothing stands out. Each release of Unicode brings more data that has to be processed, including whole new files and properties. There was some of that in Unicode 10. Note that my goal is to make mktables not run very frequently. It really only needs to be run when a yearly new Unicode release comes along, or it itself changes. Although, suppose there is an underlying bug in the core that causes it to generate defective tables, and that gets fixed. Then we would need to run it, but we wouldn't know that we do. So it does get complicated. There in fact may still be a bug which affects the outputs, but it generally only causes comment lines to get garbaged, so there's no real harm. I actually haven't noticed this bug in a long time, and I don't know if it still happens. (Once Tux and I were trying to debug a mktables problem. It was slow going, and then he rebased and the problem was suddenly gone. Independently Dave Mitchell had fixed a problem in blead, and that was our underlying cause. So this scenario has happened.) And I have a question for you git people. I was looking at the blame history https://perl5.git.perl.org/perl.git/history/HEAD:/lib/unicore/mktables It includes https://perl5.git.perl.org/perl.git/commit/da4e040f42421764ef069371d77c008e6b801f45 in the list of recent changes. But I don't see that that commit actually modified mktables. What's going on?
CC: Perl RT Bug Tracker <perlbug-followup [...] perl.org>, Perl5 Porteros <perl5-porters [...] perl.org>
To: Karl Williamson <public [...] khwilliamson.com>
Date: Fri, 2 Feb 2018 10:24:15 +0100
Subject: Re: [perl #132800] lib/unicore/mktables takes too long
From: demerphq <demerphq [...] gmail.com>
Download (untitled) / with headers
text/plain 4.2k
On 2 February 2018 at 06:36, Karl Williamson <public@khwilliamson.com> wrote: Show quoted text
> On 02/01/2018 06:20 PM, James E Keenan via RT wrote:
>> >> On Thu, 01 Feb 2018 21:21:59 GMT, randir wrote:
>>> >>> This is a bug report for perl from sergey.aleynikov@gmail.com, >>> generated with the help of perlbug 1.41 running under perl 5.27.9. >>> >>> >>> ----------------------------------------------------------------- >>> [Please describe your issue here] >>> >>> High-parallel builds of perl are currently stranded by the >>> lib/unicore/mktables step. It takes ~60% of the total build time: >>> >>> git clean -dfx && ./Configure -de -Dusedevel && time make -j20 >>> test_prep >>> make -j20 test_prep 162.93s user 9.08s system 670% cpu 25.648 total >>> >>> ./miniperl -Ilib lib/unicore/mktables -C lib/unicore -P pod -maketest >>> -p -w 15.82s user 0.23s system 99% cpu 16.057 total >>> >>> If I include configure step in the total time measurement, it's still >>> ~27% >>> >>> bash -c './Configure -de -Dusedevel && make -j20 test_prep' 185.76s >>> user 14.36s system 340% cpu 58.733 total >>> >>> This happened only during this development cycle, this step has been >>> taking around 12-16 seconds on 5.20-5.26 with ~second growth each >>> release. Can this be addressed in some way? This makes bisecting >>> through 5.27.x and, later, all future perls, much slower. >>>
>> >> I don't find compelling evidence of this trend. I built perl at v5.24.0, >> v5.26.0 and HEAD twice in each of three environments. See the attachment >> for results. I focus on 'real' time because with bisection the clock time >> measures the time a human is waiting for results. The data is obviously not >> statistically significant, but my impressions are that: >> >> * there is more variation in timings between different 'make' runs for the >> same Perl version than there is between different Perl versions; >> >> * if one machine is inherently faster than another (typically, more >> cores), then the percentage of total clock time taken up by ./Configure is >> greater on that machine than the other; ('make' flies like a rocket on >> dromedary where 'nproc -all' returns 24); >> >> * if you're running Porting/bisect.pl with the '--module' switch, the >> total time spent in building and testing prerequisites swamps that taken >> during 'make'. >>
> > I also haven't anecdotally noticed any marked decrease in speed. I looked > through the list of commits in 5.27 to see what might be causing it, and > nothing stands out. Each release of Unicode brings more data that has to be > processed, including whole new files and properties. There was some of that > in Unicode 10. > > Note that my goal is to make mktables not run very frequently. It really > only needs to be run when a yearly new Unicode release comes along, or it > itself changes. Although, suppose there is an underlying bug in the core > that causes it to generate defective tables, and that gets fixed. Then we > would need to run it, but we wouldn't know that we do. So it does get > complicated. There in fact may still be a bug which affects the outputs, > but it generally only causes comment lines to get garbaged, so there's no > real harm. I actually haven't noticed this bug in a long time, and I don't > know if it still happens. (Once Tux and I were trying to debug a mktables > problem. It was slow going, and then he rebased and the problem was > suddenly gone. Independently Dave Mitchell had fixed a problem in blead, > and that was our underlying cause. So this scenario has happened.) > > And I have a question for you git people. I was looking at the blame > history > > https://perl5.git.perl.org/perl.git/history/HEAD:/lib/unicore/mktables > > It includes > > https://perl5.git.perl.org/perl.git/commit/da4e040f42421764ef069371d77c008e6b801f45 > > in the list of recent changes. But I don't see that that commit actually > modified mktables. What's going on?
Personally I have never understood why we treat the output of mktables any differently than we treat the output of say, Porting/regcharclass.pl or Porting/regen_headers.pl IMO we should just check the output files into the repo, and only run mktables when the input data is updated. I see no point in regenerating the tables every time someone does a clean build. Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.5k
On Thu, 01 Feb 2018 17:20:00 -0800, jkeenan wrote: Show quoted text
> * there is more variation in timings between different 'make' runs for > the same Perl version than there is between different Perl versions;
Yes, this depends on a machine load. My measurements have been taken during low-load times, so the were repeatable. What you've measured is only total build time, not what it constitutes of, but I'm focusing on the latter here. Show quoted text
>I focus on 'real' time because with bisection the clock time measures the time? I focus on 'real' time because with bisection the clock time measures the time a human is waiting for results.
For me, as a human, only real-world waiting time matters - as it's about how fast can I move on with doing different things. Show quoted text
> * if one machine is inherently faster than another (typically, more > cores), then the percentage of total clock time taken up by
But why should we make building perl artificially slower? As of now, mktables step already taken half as much as Configure step, - I've shown this in my timings. Show quoted text
> * if you're running Porting/bisect.pl with the '--module' switch, the > total time spent in building and testing prerequisites swamps that > taken during 'make'.
That's not about --module, that's for example about bisecting fuzzer findings. And, again, not making perl build artificially slower. 2018-02-02 8:36 GMT+03:00 Karl Williamson <public@khwilliamson.com> Show quoted text
>There was some of that in Unicode 10.
I don't want to be in a situation that, after some years and Unicode 20, mktables takes ages to complete due to standard being bloated. So I've raised this question sooner.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 744b
On Thu, 01 Feb 2018 21:37:27 -0800, public@khwilliamson.com wrote: Show quoted text
> And I have a question for you git people. I was looking at the blame > history > > It includes > https://perl5.git.perl.org/perl.git/commit/da4e040f42421764ef069371d77c008e6b801f45 > > in the list of recent changes. But I don't see that that commit > actually modified mktables. What's going on?
Each merge-commit in git (and mercurial) have at least two parents. So, for da4e040f4242, they are commit da4e040f42421764ef069371d77c008e6b801f45 Merge: b2cd5cb1d8 d6374f3d79 and now we can look at separate diffs between merge-commit and each of it's parents: git diff b2cd5cb1d8 da4e040f4242 git diff d6374f3d79 da4e040f4242 And the second diff does touch mktables.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 171b
I ran across this thread which happened in a previous incarnation of this issue. It may be useful http://nntp.perl.org/group/perl.perl5.porters/168880 -- Karl Williamson
From: "Craig A. Berry" <craig.a.berry [...] gmail.com>
To: Karl Williamson <public [...] khwilliamson.com>
Subject: merge showing up in history of every file (Re: [perl #132800] lib/unicore/mktables takes too long)
Date: Fri, 2 Mar 2018 11:59:32 -0600
CC: "Perl5 Porters (E-mail)" <perl5-porters [...] perl.org>
Download (untitled) / with headers
text/plain 939b
On Thu, Feb 1, 2018 at 11:36 PM, Karl Williamson <public@khwilliamson.com> wrote: Show quoted text
> And I have a question for you git people. I was looking at the blame > history > > https://perl5.git.perl.org/perl.git/history/HEAD:/lib/unicore/mktables > > It includes > > https://perl5.git.perl.org/perl.git/commit/da4e040f42421764ef069371d77c008e6b801f45 > > in the list of recent changes. But I don't see that that commit actually > modified mktables. What's going on?
I believe it's because that merge was not a fast-forward merge so it altered the state of the entire repository and now the history of every file includes that commit. Someone with more git fu can probably explain it better. This IMO is reason enough not to do merges that way and there was some discussion at the time about that. IIRC there is some way to exclude such commits with git log when working from the command line, but that doesn't do you any good with gitweb.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 410b
On Thu, 01 Feb 2018 21:37:27 -0800, public@khwilliamson.com wrote: Show quoted text
> I also haven't anecdotally noticed any marked decrease in speed.
I tried running it under NYTProf. I managed to find a roughly 10% speed up by re-implementing get_invalid_code_point() to avoid inverting the range list each time. You can see the profile results after that change at: http://files.develop-help.com/nytprof/index.html Tony
Subject: 0001-perl-132800-slightly-speed-up-mktables.patch
From 1eac41dd9489898aadbc753de0bb63b4c6621ac7 Mon Sep 17 00:00:00 2001 From: Tony Cook <tony@develop-help.com> Date: Thu, 8 Mar 2018 10:42:59 +1100 Subject: [PATCH] (perl #132800) slightly speed up mktables get_invalid_code_point() inverted the supplied range list and then called get_valid_code_point() on the result. Inverting is cheap for simple range lists, but it's more expensive for more complex lists, and in most cases only first few entries (or only the first) ranges need to be checked for a code point. --- lib/unicore/mktables | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/lib/unicore/mktables b/lib/unicore/mktables index 88d9c036a3..81914f0600 100644 --- a/lib/unicore/mktables +++ b/lib/unicore/mktables @@ -5297,8 +5297,31 @@ sub trace { return main::trace(@_); } my $self = shift; Carp::carp_extra_args(\@_) if main::DEBUG && @_; - # Just find a valid code point of the inverse, if any. - return Range_List->new(Initialize => ~ $self)->get_valid_code_point; + # On first pass, don't choose less desirable code points; if no good + # one is found, repeat, allowing a less desirable one to be selected. + for my $try_hard (0, 1) { + my $end = $try_hard ? $MAX_WORKING_CODEPOINT : $MAX_UNICODE_CODEPOINT; + + # Look through all the gaps between ranges for a usable code point. + for my $set (reverse $self->ranges) { + # Try the edge cases first, starting with the end point of the + # gap. + my $start = $set->end+1; + return $end if $end >= $start && is_code_point_usable($end, $try_hard); + + # End point didn't, work. Start at the beginning and try + # every one until find one that does work. + for my $trial ($start .. $end - 1) { + return $trial if is_code_point_usable($trial, $try_hard); + } + $end = $set->start-1; + } + my $start = 0; + for my $trial ($start .. $end) { + return $trial if is_code_point_usable($trial, $try_hard); + } + } + return (); # If none found, give up. } } # end closure for Range_List -- 2.11.0


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org