regex and utf-8 performance problem #5684

Closed
p5pRT opened this issue Jun 27, 2002 · 3 comments

@p5pRT
p5pRT commented Jun 27, 2002

Migrated from rt.perl.org#10000 (status was 'resolved')

Searchable as RT10000$

@p5pRT
p5pRT commented Jun 27, 2002

From stefan@hello-penguin.com

Created by stefan@hello-penguin.com

Some regexes are extremely slow on 5.8.0pre1 when
using UTF-8...

----------------------------------------
use Convert::Scalar;
use Benchmark;

sub wrap_text {
  my $x;
  for (split /\n/, $_[0]) {
    s/\G(.{1,$_[1]})(?:\s+|$)/$1\n/gm;
    $x .= $_;
  }
  $x =~ s/[ \t\015]+$//g;
  $x;
}
my ($x,$y);

$x = "hello\t" x 30;
$x = "$x\n" x 250;

$y = $x;
Convert::Scalar::utf8_upgrade($y);

timethese(300, { "ascii" => sub { wrap_text $x, 80 },
                 "utf-8" => sub { wrap_text $y, 80 }
});

----------------------------------------

Benchmark: timing 300 iterations of ascii, utf-8...
  ascii:  3 wallclock secs ( 2.91 usr + 0.00 sys = 2.91 CPU) @ 103.09/s (n=300)
  utf-8: 26 wallclock secs (26.24 usr + 0.00 sys = 26.24 CPU) @ 11.43/s (n=300)
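
The script above depends on Convert::Scalar from CPAN. The following is a minimal sketch of the same setup using only core Perl, assuming the built-in `utf8::upgrade` is an acceptable stand-in for `Convert::Scalar::utf8_upgrade`. It checks that the two inputs hold identical characters and differ only in their internal representation, which suggests that any timing gap comes from the regex engine's UTF-8 code path rather than from the data. The reduced input size and the names `$ascii` and `$upgraded` are illustrative choices, not part of the original report.

```perl
use strict;
use warnings;

# Same wrapping logic as in the report, with named parameters.
sub wrap_text {
    my ($text, $width) = @_;
    my $out = '';
    for my $line (split /\n/, $text) {
        $line =~ s/\G(.{1,$width})(?:\s+|$)/$1\n/gm;
        $out .= $line;
    }
    $out =~ s/[ \t\015]+$//g;
    return $out;
}

# Smaller input than the report (25 lines instead of 250) to keep this quick.
my $row   = "hello\t" x 30;
my $ascii = "$row\n" x 25;

# Assumption: utf8::upgrade (core) replaces Convert::Scalar::utf8_upgrade.
my $upgraded = $ascii;
utf8::upgrade($upgraded);

printf "same characters:     %s\n", $ascii eq $upgraded ? "yes" : "no";
printf "utf8 flag, ascii:    %s\n", utf8::is_utf8($ascii)    ? "on" : "off";
printf "utf8 flag, upgraded: %s\n", utf8::is_utf8($upgraded) ? "on" : "off";
printf "same wrapped output: %s\n",
    wrap_text($ascii, 80) eq wrap_text($upgraded, 80) ? "yes" : "no";
```

If the two wrapped outputs ever differ, the problem would be one of correctness rather than performance and would warrant a separate report.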

 

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl v5.8.0:

Configured by root at Fri Jun 14 22:39:17 MEST 2002.

Summary of my perl5 (revision 5.0 version 8 subversion 0 patch 17247) configuration:
  Platform:
    osname=linux, osvers=2.4.19-pre9, archname=i686-linux
    uname='linux stefan 2.4.19-pre9 #1 sam jun 1 12:43:24 mest 2002 i686 unknown '
    config_args='-Dprefix=/usr/websys -Doptimize=-O3 -march=i686 -momit-leaf-frame-pointer -Duseperlio=true -Dusethreads=undef -Dperladmin=stefan@hello-penguin.com -Dusemultiplicity=undef -Dusedevel -Dusemymalloc=true -Duselargefiles=true -Duseposix=true -Dlocincpth=/usr/websys/include /opt/include /usr/local/include -Dloclibpth=/usr/websys/lib /opt/lib /usr/local/lib -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -I/usr/websys/include -I/opt/include -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O3 -march=i686 -momit-leaf-frame-pointer',
    cppflags='-fno-strict-aliasing -I/usr/websys/include -I/opt/include -I/usr/local/include'
    ccversion='', gccversion='2.95.4 20011002 (Debian prerelease)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/websys/lib -L/opt/lib -L/usr/local/lib'
    libpth=/usr/websys/lib /opt/lib /usr/local/lib /lib /usr/lib
    libs=-lnsl -ldb -ldl -lm -lc -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil
    libc=/lib/libc-2.2.5.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.2.5'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/websys/lib -L/opt/lib -L/usr/local/lib'

Locally applied patches:
    DEVEL17237


@INC for perl v5.8.0:
    /usr/websys/lib/perl5/5.8.0/i686-linux
    /usr/websys/lib/perl5/5.8.0
    /usr/websys/lib/perl5/site_perl/5.8.0/i686-linux
    /usr/websys/lib/perl5/site_perl/5.8.0
    /usr/websys/lib/perl5/site_perl
    .


Environment for perl v5.8.0:
    HOME=/.localvol000/home/stefan
    LANG (unset)
    LANGUAGE (unset)
    LC_CTYPE=de_DE.ISO-8859-1
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/websys/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/bin:/usr/X11R6/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/usr/bin/X11
    PERL_BADLANG (unset)
    SHELL=/bin/bash


@p5pRT
p5pRT commented Dec 24, 2002

From @jhi

Hi, I have now submitted a patch that brings the ASCII and UTF-8
cases within 30-60% of each other (though even before this patch
there had been a considerable speedup since 5.8.0). I was unable
to reproduce quite as dramatic a speed difference as you saw;
I could see only a factor of 2 to 3, perhaps because of a difference in glibc versions, CPUs, or something else. I have further ideas for how
to speed up the Unicode regexes even more, but they need more
work. I'm marking this ticket as resolved (no need to
reply). The patch will be in Perl 5.8.1, whenever that happens.
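
For anyone wanting to check how far apart the two cases are on their own perl build, the rates can be compared directly. This is a rough sketch, assuming the core Benchmark module's `timethese` and `cmpthese` and, again, core `utf8::upgrade` in place of Convert::Scalar. The one-second time budget per case is an arbitrary choice. The last lines only redo the arithmetic on the CPU times quoted in the original report, for comparison with the factor of 2 to 3 mentioned above.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Same wrapping logic as in the report, with named parameters.
sub wrap_text {
    my ($text, $width) = @_;
    my $out = '';
    for my $line (split /\n/, $text) {
        $line =~ s/\G(.{1,$width})(?:\s+|$)/$1\n/gm;
        $out .= $line;
    }
    $out =~ s/[ \t\015]+$//g;
    return $out;
}

my $row = "hello\t" x 30;
my $x   = "$row\n" x 250;

# Assumption: utf8::upgrade (core) replaces Convert::Scalar::utf8_upgrade.
my $y = $x;
utf8::upgrade($y);

# A negative count asks Benchmark to run each case for at least that many
# CPU seconds; 'none' suppresses the per-case lines so only the table prints.
my $results = timethese(-1, {
    "ascii" => sub { wrap_text($x, 80) },
    "utf-8" => sub { wrap_text($y, 80) },
}, 'none');

# Prints a table of rates and relative percentage differences.
cmpthese($results);

# Slowdown implied by the CPU times quoted in the original report.
my $reported = 26.24 / 2.91;
printf "slowdown in the original report: %.1fx\n", $reported;
```

The measured figures will vary with perl version, C library, and CPU, so they are best read as a relative comparison on one machine.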

@p5pRT
p5pRT commented Dec 24, 2002

@jhi - Status changed from 'open' to 'resolved'
