perl5.10.0: pos function is much slower with "progressive match" and unicode #9207
Comments
From @hknutzen: Hello, the parser of netspoc (http://netspoc.berlios.de) runs 4 times slower
I tracked this down to a small test case:
perl -e 'print "x"x200000'>large
On my computer with AMD Athlon BE-2350 I get these results:
The problem doesn't appear
Attached you will find the output of "perl -V" for both versions of perl.

From @hknutzen: Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
Characteristics of this binary (from libperl):

From @hknutzen: Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
Characteristics of this binary (from libperl):
From @iabyn: On Fri, Jan 25, 2008 at 02:20:23PM -0800, Heinz Knutzen wrote:
Profiling shows that 99.5% of the time is spent in Perl_utf8_length();
The RT System itself - Status changed from 'new' to 'open'

From p5p@spam.wizbit.be: Wanted to do a binary search on this but I'm currently unable to
The difference between my perl-5.10.0 -V and the one in the bug report:
My perl: ccversion='', gccversion='3.4.6', gccosandvers=''
Bug report:
Note that I do not know if the difference is relevant... it might be

From alex@chmrr.net: On Thu May 28 10:39:27 2009, animator wrote:
My binary search points to ec07b5e as

From alex@chmrr.net: At Sat Jan 26 09:56:39 -0500 2008, Dave Mitchell wrote:
Perl_utf8_length is only called 3 times per loop if perl is compiled

$ time perl5.8.8 -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large

$ time perl-ab455f6 -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large

$ time perl-6448472 -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large

$ time bleadperl -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large

(for reference, with debugging on, the performance is _abysmal_:)

Revision 6448472 changed the subroutine that calculates length to

$ time bleadperl-patched -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large

Revision ab455f6 added an extra cache entry, which causes the

Unfortunately, and I don't know enough about the problem it intends to

- Alex
From alex@chmrr.net: 0001-Faster-utf8_length-method-fixes-RT-50250.patch
From 3555d40ff5fd0ef7a4adde39caf181e0974932c5 Mon Sep 17 00:00:00 2001
From: Alex Vandiver <alex@chmrr.net>
Date: Sat, 30 May 2009 12:38:28 -0400
Subject: [PATCH] Faster utf8_length method -- fixes [RT#50250]
UTF8SKIP appears to be a rather slow call; use UTF8_IS_INVARIANT to
skip it whenever possible. We also move the malformed utf8 check
until after the loop, since it can be checked after the termination
condition, instead of at every pass through the loop.
---
utf8.c | 28 +++++++++++++++-------------
1 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/utf8.c b/utf8.c
index 4f4c3ea..b5a3809 100644
--- a/utf8.c
+++ b/utf8.c
@@ -682,7 +682,6 @@ Perl_utf8_length(pTHX_ const U8 *s, const U8 *e)
{
dVAR;
STRLEN len = 0;
- U8 t = 0;
PERL_ARGS_ASSERT_UTF8_LENGTH;
@@ -693,20 +692,23 @@ Perl_utf8_length(pTHX_ const U8 *s, const U8 *e)
if (e < s)
goto warn_and_return;
while (s < e) {
- t = UTF8SKIP(s);
- if (e - s < t) {
- warn_and_return:
- if (ckWARN_d(WARN_UTF8)) {
- if (PL_op)
- Perl_warner(aTHX_ packWARN(WARN_UTF8),
+ if (!UTF8_IS_INVARIANT(*s))
+ s += UTF8SKIP(s);
+ else
+ s++;
+ len++;
+ }
+
+ if (e != s) {
+ len--;
+ warn_and_return:
+ if (ckWARN_d(WARN_UTF8)) {
+ if (PL_op)
+ Perl_warner(aTHX_ packWARN(WARN_UTF8),
"%s in %s", unees, OP_DESC(PL_op));
- else
- Perl_warner(aTHX_ packWARN(WARN_UTF8), unees);
- }
- return len;
+ else
+ Perl_warner(aTHX_ packWARN(WARN_UTF8), unees);
}
- s += t;
- len++;
}
return len;
--
1.6.3.204.g8c948
From @greerga: On Sat, 30 May 2009, Alex Vandiver wrote:
Note that at least for Fedora this is the default build because they pass
Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
Characteristics of this binary (from libperl):
Granted, their default helped highlight RT #60508, so it isn't all bad.
-George Greer

From alex@chmrr.net: At Mon Jun 01 19:52:12 -0400 2009, George Greer wrote:
Hm. That's their call; short of writing a
From @rgs: 2009/5/31 Alex Vandiver <alex@chmrr.net>:
Looks good enough to be applied, thanks, as

@rgs - Status changed from 'open' to 'resolved'

From @nwc10: On Sat, May 30, 2009 at 07:16:46PM -0400, Alex Vandiver wrote:
Do you have speed data to hand for ab455f6 and its immediate parent, 8f230aa?
Ugly doesn't necessarily mean slow:
#define THREEWAY_SQUARE(a,b,c,d) \
3 subtractions, 3 multiplies, 2 adds. Nothing fancy.
The THREEWAY_SQUARE path is where the cache code knows that it now has
I don't have any insight above what's in the comments of the code I wrote
I had assumed that having 2 cache positions would be better than one.
Nicholas Clark
From alex@chmrr.net: At Sat Jun 06 08:17:50 -0400 2009, Nicholas Clark wrote:
Here is a chart of various perls compiled with -Doptimize='-ggdb -g3':
[Chart columns: length, 5.8.8, 8f230aa, ab455f6, 6448472, f699e95, 8e91ec7; values not preserved]
However, the macro is called three times per cache miss, which is
As far as I can tell, it thrashes on _every_ call to pos in this

- Alex
Migrated from rt.perl.org#50250 (status was 'resolved')
Searchable as RT50250