Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perl5.10.0: pos function is much slower with "progressive match" and unicode #9207

Closed
p5pRT opened this issue Jan 25, 2008 · 15 comments
Closed

Comments

@p5pRT
Copy link

p5pRT commented Jan 25, 2008

Migrated from rt.perl.org#50250 (status was 'resolved')

Searchable as RT50250$

@p5pRT
Copy link
Author

p5pRT commented Jan 25, 2008

From @hknutzen

Hello,

the parser of netspoc (http​://netspoc.berlios.de) runs 4 times slower
with perl5.10.0 than with perl5.8.8.

I tracked this down to a small test case​:

perl -e 'print "x"x200000'>large
time perl5.8.8 -C -e '$_=<>;while(1){/\G./g;pos||last;}' large
time perl5.10.0 -C -e '$_=<>;while(1){/\G./g;pos||last;}' large

On my computer with AMD Athlon BE-2350 I get these results​:
perl5.8.8​: 0.25 seconds
perl5.10.0​: 490.00 seconds
--> a factor of nearly 2000!

The problem doesn't appear
- if unicode isn't enabled or
- if pos() is run without progessive matching.

Attached you will find the output of "perl -V" for both versions of perl.

--
Heinz Knutzen

@p5pRT
Copy link
Author

p5pRT commented Jan 25, 2008

From @hknutzen

Summary of my perl5 (revision 5 version 8 subversion 8) configuration​:
  Platform​:
  osname=linux, osvers=2.6.15.7, archname=i486-linux-gnu-thread-multi
  uname='linux terranova 2.6.15.7 #1 smp thu jul 12 14​:27​:56 utc 2007 i686 gnulinux '
  config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.8 -Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
  hint=recommended, useposix=true, d_sigaction=define
  usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
  useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
  use64bitint=undef use64bitall=undef uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-O2',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
  ccversion='', gccversion='4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=4, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /lib /usr/lib
  libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
  perllibs=-ldl -lm -lpthread -lc -lcrypt
  libc=/lib/libc-2.6.1.so, so=so, useshrplib=true, libperl=libperl.so.5.8.8
  gnulibc_version='2.6.1'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Characteristics of this binary (from libperl)​:
  Compile-time options​: MULTIPLICITY PERL_IMPLICIT_CONTEXT
  PERL_MALLOC_WRAP THREADS_HAVE_PIDS USE_ITHREADS
  USE_LARGE_FILES USE_PERLIO USE_REENTRANT_API
  Built under linux
  Compiled at Dec 4 2007 08​:56​:39
  @​INC​:
  /etc/perl
  /usr/local/lib/perl/5.8.8
  /usr/local/share/perl/5.8.8
  /usr/lib/perl5
  /usr/share/perl5
  /usr/lib/perl/5.8
  /usr/share/perl/5.8
  /usr/local/lib/site_perl
  .

@p5pRT
Copy link
Author

p5pRT commented Jan 25, 2008

From @hknutzen

Summary of my perl5 (revision 5 version 10 subversion 0) configuration​:
  Platform​:
  osname=linux, osvers=2.6.22-14-generic, archname=i686-linux
  uname='linux home 2.6.22-14-generic #1 smp tue dec 18 08​:02​:57 utc 2007 i686 gnulinux '
  config_args=''
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=undef, usemultiplicity=undef
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=undef, use64bitall=undef, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='gcc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-O2',
  cppflags='-fno-strict-aliasing -pipe -I/usr/local/include'
  ccversion='', gccversion='4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=4, prototype=define
  Linker and Libraries​:
  ld='gcc', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /lib /usr/lib
  libs=-lnsl -ldl -lm -lcrypt -lutil -lc
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
  libc=/lib/libc-2.6.1.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.6.1'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib'

Characteristics of this binary (from libperl)​:
  Compile-time options​: PERL_DONT_CREATE_GVSV PERL_MALLOC_WRAP
  USE_LARGE_FILES USE_PERLIO
  Built under linux
  Compiled at Jan 5 2008 14​:51​:10
  @​INC​:
  /usr/local/lib/perl5/5.10.0/i686-linux
  /usr/local/lib/perl5/5.10.0
  /usr/local/lib/perl5/site_perl/5.10.0/i686-linux
  /usr/local/lib/perl5/site_perl/5.10.0
  .

@p5pRT
Copy link
Author

p5pRT commented Jan 26, 2008

From @iabyn

On Fri, Jan 25, 2008 at 02​:20​:23PM -0800, Heinz Knutzen wrote​:

I tracked this down to a small test case​:

perl -e 'print "x"x200000'>large
time perl5.8.8 -C -e '$_=<>;while(1){/\G./g;pos||last;}' large
time perl5.10.0 -C -e '$_=<>;while(1){/\G./g;pos||last;}' large

On my computer with AMD Athlon BE-2350 I get these results​:
perl5.8.8​: 0.25 seconds
perl5.10.0​: 490.00 seconds
--> a factor of nearly 2000!

Profiling shows that 99.5% of the time is spent in Perl_utf8_length();
which is called 3 times per loop. I guess there's something up with the
utf8 length caching.

--
There's a traditional definition of a shyster​: a lawyer who, when the law
is against him, pounds on the facts; when the facts are against him,
pounds on the law; and when both the facts and the law are against him,
pounds on the table.
  -- Eben Moglen referring to SCO

@p5pRT
Copy link
Author

p5pRT commented Jan 26, 2008

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented May 28, 2009

From p5p@spam.wizbit.be

Wanted to do a binary search on this but I'm currently unable to
reproduce it...

The difference between my perl-5.10.0 -V and the one in the bug report​:

My perl​:

  ccversion='', gccversion='3.4.6', gccosandvers=''
  libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
  libc=/lib/libc-2.3.6.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.3.6'

Bug report​:
  ccversion='', gccversion='4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-
16ubuntu2)', gccosandvers=''
  libs=-lnsl -ldl -lm -lcrypt -lutil -lc
  libc=/lib/libc-2.6.1.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.6.1'

Note that I do not know if the difference is relevant... it might be
something else in the environment...

@p5pRT
Copy link
Author

p5pRT commented May 29, 2009

From alex@chmrr.net

On Thu May 28 10​:39​:27 2009, animator wrote​:

Wanted to do a binary search on this but I'm currently unable to
reproduce it...

My binary search points to ec07b5e as
the culprit.
- Alex

@p5pRT
Copy link
Author

p5pRT commented Jun 1, 2009

From alex@chmrr.net

At Sat Jan 26 09​:56​:39 -0500 2008, Dave Mitchell wrote​:

Profiling shows that 99.5% of the time is spent in Perl_utf8_length();
which is called 3 times per loop. I guess there's something up with the
utf8 length caching.

Perl_utf8_length is only called 3 times per loop if perl is compiled
with -DDEBUGGING, which disables the utf8 length cache -- this caught
me, too, leading me to initially think ec07b5e was at fault, because
it added a PL_utf8cache check. I suspect the original reporter ran
one of his tests on a debugging build, which led to such horrid
performance (though the perl -V he posted was not -DDEBUGGING,
strangely). With debugging off, I see​:

$ time perl5.8.8 -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large
real 0m0.273s
user 0m0.220s
sys 0m0.003s

$ time perl-ab455f6 -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large
real 0m3.664s
user 0m3.373s
sys 0m0.003s

$ time perl-6448472 -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large
real 0m15.475s
user 0m13.696s
sys 0m0.017s

$ time bleadperl -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large
real 0m14.475s
user 0m13.586s
sys 0m0.017s

(for reference, with debugging on, the performance is _abysmal_​:)
$ time bleadperl-DEBUGGING -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large
real 23m6.296s
user 20m48.512s
sys 0m7.713s

Revision 6448472 changed the subroutine that calculates length to
Perl_utf8_length, which the attached patch refactors. I don't exactly
understand why UTF8SKIP is so slow (if I'm chasing macros right, it
should just be an array index), but avoidint it with UTF8_IS_INVARIANT
is a large speed win. This returns it to ab455f6-level speeds​:

$ time bleadperl-patched -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large
real 0m3.487s
user 0m3.386s
sys 0m0.013s

Revision ab455f6 added an extra cache entry, which causes the
remainder of the slowdown -- the code repeatedly ends up going down
the THREEWAY_SQUARE path of S_utf8_mg_pos_cache_update, which accounts
for the rest of the speed loss (with three guaranteed ugly float
computations per pass).

Unfortunately, and I don't know enough about the problem it intends to
solve to know if this is fixable. Nick, that code is originally
yours -- any insight?

- Alex
--
Networking -- only one letter away from not working

@p5pRT
Copy link
Author

p5pRT commented Jun 1, 2009

From alex@chmrr.net

0001-Faster-utf8_length-method-fixes-RT-50250.patch
From 3555d40ff5fd0ef7a4adde39caf181e0974932c5 Mon Sep 17 00:00:00 2001
From: Alex Vandiver <alex@chmrr.net>
Date: Sat, 30 May 2009 12:38:28 -0400
Subject: [PATCH] Faster utf8_length method -- fixes [RT#50250]

UTF8SKIP appears to be a rather slow call; use UTF8_IS_INVARIANT to
skip it whenever possible.  We also move the malformed utf8 check
until after the loop, since it can be checked after the termination
condition, instead of at every pass through the loop.

---
 utf8.c |   28 +++++++++++++++-------------
 1 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/utf8.c b/utf8.c
index 4f4c3ea..b5a3809 100644
--- a/utf8.c
+++ b/utf8.c
@@ -682,7 +682,6 @@ Perl_utf8_length(pTHX_ const U8 *s, const U8 *e)
 {
     dVAR;
     STRLEN len = 0;
-    U8 t = 0;
 
     PERL_ARGS_ASSERT_UTF8_LENGTH;
 
@@ -693,20 +692,23 @@ Perl_utf8_length(pTHX_ const U8 *s, const U8 *e)
     if (e < s)
 	goto warn_and_return;
     while (s < e) {
-	t = UTF8SKIP(s);
-	if (e - s < t) {
-	    warn_and_return:
-	    if (ckWARN_d(WARN_UTF8)) {
-	        if (PL_op)
-		    Perl_warner(aTHX_ packWARN(WARN_UTF8),
+	if (!UTF8_IS_INVARIANT(*s))
+	    s += UTF8SKIP(s);
+	else
+	    s++;
+	len++;
+    }
+
+    if (e != s) {
+	len--;
+        warn_and_return:
+	if (ckWARN_d(WARN_UTF8)) {
+	    if (PL_op)
+		Perl_warner(aTHX_ packWARN(WARN_UTF8),
 			    "%s in %s", unees, OP_DESC(PL_op));
-		else
-		    Perl_warner(aTHX_ packWARN(WARN_UTF8), unees);
-	    }
-	    return len;
+	    else
+		Perl_warner(aTHX_ packWARN(WARN_UTF8), unees);
 	}
-	s += t;
-	len++;
     }
 
     return len;
-- 
1.6.3.204.g8c948

@p5pRT
Copy link
Author

p5pRT commented Jun 1, 2009

From @greerga

On Sat, 30 May 2009, Alex Vandiver wrote​:

At Sat Jan 26 09​:56​:39 -0500 2008, Dave Mitchell wrote​:

Profiling shows that 99.5% of the time is spent in Perl_utf8_length();
which is called 3 times per loop. I guess there's something up with the
utf8 length caching.

Perl_utf8_length is only called 3 times per loop if perl is compiled
with -DDEBUGGING, which disables the utf8 length cache [...]

Note that at least for Fedora this is the default build because they pass
"-g" to -Doptimize​:

Summary of my perl5 (revision 5 version 10 subversion 0) configuration​:
  Platform​:
  osname=linux, osvers=2.6.18-128.1.6.el5xen,
archname=i386-linux-thread-multi
  uname='linux xenbuilder4.fedora.phx.redhat.com 2.6.18-128.1.6.el5xen
#1 smp tue mar 24 12​:28​:27 edt 2009 i686 i686 i386 gnulinux '
  config_args='-des -Doptimize=-O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic
-fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV -Dversion=5.10.0
-Dmyhostname=localhost -Dperladmin=root@​localhost -Dcc=gcc -Dcf_by=Red
Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local
-Dprivlib=/usr/lib/perl5/5.10.0
-Dsitelib=/usr/local/lib/perl5/site_perl/5.10.0
-Dvendorlib=/usr/lib/perl5/vendor_perl/5.10.0
-Darchlib=/usr/lib/perl5/5.10.0/i386-linux-thread-multi
-Dsitearch=/usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
-Dvendorarch=/usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
-Dinc_version_list=none -Darchname=i386-linux-thread-multi -Duseshrplib
-Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun
-Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio
-Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less
-isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto
-Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto
-Ud_setservent_r_proto -Dscriptdir=/usr/bin
-Dotherlibdirs=/usr/lib/perl5/site_perl'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=undef, use64bitall=undef, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
  optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386
-mtune=generic -fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing
-pipe -I/usr/local/include -I/usr/include/gdbm'
  ccversion='', gccversion='4.3.2 20081105 (Red Hat 4.3.2-7)',
gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
  alignbytes=4, prototype=define
  Linker and Libraries​:
  ld='gcc', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /lib /usr/lib
  libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
  perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
  libc=/lib/libc-2.9.so, so=so, useshrplib=true, libperl=libperl.so
  gnulibc_version='2.9'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
-Wl,-rpath,/usr/lib/perl5/5.10.0/i386-linux-thread-multi/CORE'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic
-fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV -L/usr/local/lib'

Characteristics of this binary (from libperl)​:
  Compile-time options​: DEBUGGING MULTIPLICITY PERL_DONT_CREATE_GVSV
  PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
  PERL_TRACK_MEMPOOL PERL_USE_SAFE_PUTENV
USE_ITHREADS
  USE_LARGE_FILES USE_PERLIO USE_REENTRANT_API
  Built under linux
  Compiled at Apr 14 2009 07​:24​:03
  @​INC​:
  /usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
  /usr/local/lib/perl5/site_perl/5.10.0
  /usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
  /usr/lib/perl5/vendor_perl/5.10.0
  /usr/lib/perl5/vendor_perl
  /usr/lib/perl5/5.10.0/i386-linux-thread-multi
  /usr/lib/perl5/5.10.0
  /usr/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
  /usr/lib/perl5/site_perl/5.10.0
  /usr/lib/perl5/site_perl

Granted, their default helped highlight RT #60508, so it isn't all bad.

-George Greer

@p5pRT
Copy link
Author

p5pRT commented Jun 3, 2009

From alex@chmrr.net

At Mon Jun 01 19​:52​:12 -0400 2009, George Greer wrote​:

Note that at least for Fedora this is the default build because they pass
"-g" to -Doptimize​:
[snip]
Granted, their default helped highlight RT #60508, so it isn't all bad.

Hm. That's their call; short of writing a
-DONLY_CHEAP_DEBUGGING_PLEASE which re-enables the cache, they're
getting exactly what they're asking for if they're compiling with -g.
- Alex
--
Networking -- only one letter away from not working

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2009

From @rgs

2009/5/31 Alex Vandiver <alex@​chmrr.net>​:

At Sat Jan 26 09​:56​:39 -0500 2008, Dave Mitchell wrote​:

Profiling shows that 99.5% of the time is spent in Perl_utf8_length();
which is called 3 times per loop. I guess there's something up with the
utf8 length caching.

Perl_utf8_length is only called 3 times per loop if perl is compiled
with -DDEBUGGING, which disables the utf8 length cache -- this caught
me, too, leading me to initially think ec07b5e was at fault, because
it added a PL_utf8cache check.  I suspect the original reporter ran
one of his tests on a debugging build, which led to such horrid
performance (though the perl -V he posted was not -DDEBUGGING,
strangely).  With debugging off, I see​:

$ time perl5.8.8 -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large
real   0m0.273s
user   0m0.220s
sys    0m0.003s

$ time perl-ab455f6 -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large
real   0m3.664s
user   0m3.373s
sys    0m0.003s

$ time perl-6448472 -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large
real   0m15.475s
user   0m13.696s
sys    0m0.017s

$ time bleadperl -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large
real   0m14.475s
user   0m13.586s
sys    0m0.017s

(for reference, with debugging on, the performance is _abysmal_​:)
$ time bleadperl-DEBUGGING -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large
real   23m6.296s
user   20m48.512s
sys    0m7.713s

Revision 6448472 changed the subroutine that calculates length to
Perl_utf8_length, which the attached patch refactors.  I don't exactly
understand why UTF8SKIP is so slow (if I'm chasing macros right, it
should just be an array index), but avoidint it with UTF8_IS_INVARIANT
is a large speed win.  This returns it to ab455f6-level speeds​:

$ time bleadperl-patched -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large
real   0m3.487s
user   0m3.386s
sys    0m0.013s

Revision ab455f6 added an extra cache entry, which causes the
remainder of the slowdown -- the code repeatedly ends up going down
the THREEWAY_SQUARE path of S_utf8_mg_pos_cache_update, which accounts
for the rest of the speed loss (with three guaranteed ugly float
computations per pass).

Unfortunately, and I don't know enough about the problem it intends to
solve to know if this is fixable.  Nick, that code is originally
yours -- any insight?

Looks good enough to be applied, thanks, as
8e91ec7

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2009

@rgs - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Jun 6, 2009
@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2009

From @nwc10

On Sat, May 30, 2009 at 07​:16​:46PM -0400, Alex Vandiver wrote​:

Revision 6448472 changed the subroutine that calculates length to
Perl_utf8_length, which the attached patch refactors. I don't exactly
understand why UTF8SKIP is so slow (if I'm chasing macros right, it
should just be an array index), but avoidint it with UTF8_IS_INVARIANT
is a large speed win. This returns it to ab455f6-level speeds​:

$ time bleadperl-patched -w -C -e '$_=<>;while(1){/\G./g;pos||last;}' ../large
real 0m3.487s
user 0m3.386s
sys 0m0.013s

Revision ab455f6 added an extra cache entry, which causes the
remainder of the slowdown -- the code repeatedly ends up going down

You have to hand speed data from ab455f6 and its immediate parent 8f230aa?

the THREEWAY_SQUARE path of S_utf8_mg_pos_cache_update, which accounts
for the rest of the speed loss (with three guaranteed ugly float
computations per pass).

ugly doesn't necessarily mean slow​:

#define THREEWAY_SQUARE(a,b,c,d) \
  ((float)((d) - (c))) * ((float)((d) - (c))) \
  + ((float)((c) - (b))) * ((float)((c) - (b))) \
  + ((float)((b) - (a))) * ((float)((b) - (a)))

3 subtractions, 3 multiplies, 2 adds. Nothing fancy.

Unfortunately, and I don't know enough about the problem it intends to
solve to know if this is fixable. Nick, that code is originally
yours -- any insight?

The THREEWAY_SQUARE path is where the cache code knows that it has now has
3 position pairs, but there are only two slots in the cache, so it's trying
to guess what will be most useful to remember.

I don't have any insight above what's in the comments of the code I wrote
quite a while ago. What you're seeing is an interaction between something
I wrote, and something SADAHIRO Tomoyuki wrote.

I had assumed that having 2 cache positions would be better than one.
Is this a particular pathological case where the cache is thrashing?
Or does it thrash quite a lot, and so we'd actually be better with one
cached pair?

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Jun 8, 2009

From alex@chmrr.net

At Sat Jun 06 08​:17​:50 -0400 2009, Nicholas Clark wrote​:

You have to hand speed data from ab455f6 and its immediate parent 8f230aa?

Here is a chart of various perls compiled with -Doptimize='-ggdb -g3'
-DEBUGGING=none (which accomplishes the recently-added TODO of
splitting perl debugging and C debugging), at various string lengths​:

length 5.8.8 8f230aa ab455f6 6448472 f699e95 8e91ec7
10000 0.02240 0.02372 0.06486 0.11841 0.13055 0.06473
15000 0.03023 0.03209 0.11864 0.24367 0.25378 0.11810
20000 0.03385 0.03877 0.19152 0.42500 0.43528 0.20740
25000 0.04073 0.04530 0.30155 0.64568 0.63335 0.27076
50000 0.07052 0.08146 1.06689 2.38080 2.38303 0.96817
150000 0.19605 0.24511 8.34487 21.0554 21.3973 6.61086
200000 0.24872 0.32866 14.7482 37.4100 39.8097 11.1907
250000 0.30480 0.40801 22.8625 58.1171 61.2601 17.2045
300000 0.36523 0.44259 32.9970 83.9910 83.9212 24.4791
350000 0.42694 0.52846 44.6930 113.299 114.409 32.9523
400000 0.50617 0.59822 58.7526 149.829 149.186 43.0359

3 subtractions, 3 multiplies, 2 adds. Nothing fancy.

However, the macro is called three time per cache miss, which is
called once per iteration. At least in this tight loop, it is a hot
spot.

I had assumed that having 2 cache positions would be better than one.
Is this a particular pathological case where the cache is thrashing?
Or does it thrash quite a lot, and so we'd actually be better with one
cached pair?

As far as I can tell, it thrashes on _every_ call to pos in this
context, which leads to the slowdown. I have a couple guesses on how
to speed it up while keeping the same number of cache entries, but I
won't have tuits to really beat on it until I next week sometime.

- Alex
--
Networking -- only one letter away from not working

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant