regex optimization affected by threads #9373

Closed
p5pRT opened this issue Jun 10, 2008 · 6 comments · Fixed by #20656

p5pRT commented Jun 10, 2008

Migrated from rt.perl.org#55600 (status was 'open')

Searchable as RT55600$

p5pRT commented Jun 10, 2008

From jgmyers@proofpoint.com

Created by jgmyers@pong.us.proofpoint.com

The following test program shows a regular expression running quickly
in the main thread, but extremely slowly when run in a thread. When
run in a thread, it for some reason doesn't follow the find_byclass()
path in the regular expression engine, backtracking like mad.

This reproduces under 5.10.0 as well.

use threads;

sub start_thread {
  split /[.;]+[\'\"]+/, $_[0];
}

my $buffer = '.' x 15000;

if ($ARGV[0]) {
  my $thr = threads->create('start_thread', $buffer);
  $thr->join();
} else {
  start_thread $buffer;
}

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.8:

Configured by jgmyers at Tue Feb 13 10:14:49 PST 2007.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.9-42.0.8.elsmp, 
archname=i686-linux-thread-multi
    uname='linux pong 2.6.9-42.0.8.elsmp #1 smp tue jan 30 12:33:47 est 
2007 i686 i686 i386 gnulinux '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define 
usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc-4.1', ccflags ='-D_REENTRANT -D_GNU_SOURCE 
-DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe 
-Wdeclaration-after-statement -I/usr/local/include -D_LARGEFILE_SOURCE 
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING 
-fno-strict-aliasing -pipe -Wdeclaration-after-statement 
-I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='4.1.1', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', 
lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc-4.1', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.3.4.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.4'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:



@INC for perl v5.8.8:
    /u/jgmyers/perl/lib/5.8.8/i686-linux-thread-multi
    /u/jgmyers/perl/lib/5.8.8
    /u/jgmyers/perl/lib/site_perl/5.8.8/i686-linux-thread-multi
    /u/jgmyers/perl/lib/site_perl/5.8.8
    /u/jgmyers/perl/lib/site_perl
    .


Environment for perl v5.8.8:
    HOME=/u/jgmyers
    LANG=en_US.utf8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    
PATH=/tools/x/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/u/jgmyers/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash


p5pRT commented Jun 18, 2008

From @iabyn

On Tue, Jun 10, 2008 at 03:22:40PM -0700, John Gardiner Myers wrote:

> The following test program shows a regular expression running quickly
> in the main thread, but extremely slowly when run in a thread. When
> run in a thread, it for some reason doesn't follow the find_byclass()
> path in the regular expression engine, backtracking like mad.
>
> This reproduces under 5.10.0 as well.
>
> use threads;
>
> sub start_thread {
>   split /[.;]+[\'\"]+/, $_[0];
> }
>
> my $buffer = '.' x 15000;
>
> if ($ARGV[0]) {
>   my $thr = threads->create('start_thread', $buffer);
>   $thr->join();
> } else {
>   start_thread $buffer;
> }

When the thread is created and the regex is dup'ed, the
rx->pprivate->regstclass field of the duped regex is set to NULL, which
disables the optimisation.

The place where it might get set to something, Perl_regdupe_internal(),
doesn't because data->what[] only contains 's' and 's', which don't
qualify for getting regstclass copied.

This is a bit outside my area of expertise.

--
I before E. Except when it isn't.

p5pRT commented Jun 18, 2008

The RT System itself - Status changed from 'new' to 'open'

@xenu xenu removed the affects-5.8 label Nov 19, 2021
demerphq added a commit that referenced this issue Dec 30, 2022
We were only handling "synthetic start classes", not ones that are in
the program itself, because those don't have an entry in the data array.
So, after copying the program and ruling out that the regstclass is
synthetic, we can assume that if it is non-null it points into the program
itself, and simply set it up in the copy accordingly.

Fixes #9373
@demerphq
Collaborator

I believe that this is fixed by #20656

In short, we weren't setting up the stclass entries that are actually pointers into the compiled program, only the synthetic start classes that create a data entry. Thanks to @iabyn for the pointer on what was wrong here; it made it easy to fix.
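
For readers unfamiliar with this class of bug, here is a minimal sketch in C of the general technique: when a structure is duplicated, a pointer into its own buffer has to be re-based onto the copy's buffer. The names `toy_regex`, `toy_regex_dup` and `toy_regex_free` are hypothetical stand-ins invented for illustration. This is not perl's actual `regexp_internal` layout or the code in `Perl_regdupe_internal()`, and it assumes the start class, when non-NULL, always points inside the program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a compiled regex.
 * Not perl's real data structure. */
typedef struct {
    size_t len;       /* number of nodes in program (assumed > 0) */
    int *program;     /* the compiled program */
    int *regstclass;  /* NULL, or a pointer to a node inside program */
} toy_regex;

/* Duplicate r, re-basing the interior pointer onto the new buffer.
 * Returns NULL on allocation failure. */
static toy_regex *toy_regex_dup(const toy_regex *r)
{
    toy_regex *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;

    copy->len = r->len;
    copy->program = malloc(r->len * sizeof *r->program);
    if (copy->program == NULL) {
        free(copy);
        return NULL;
    }
    memcpy(copy->program, r->program, r->len * sizeof *r->program);

    if (r->regstclass != NULL) {
        /* Keep the same offset, but relative to the copy's program.
         * Leaving this NULL would still be "correct", yet it would
         * silently drop whatever optimisation relies on the pointer. */
        ptrdiff_t off = r->regstclass - r->program;
        assert(off >= 0 && (size_t)off < r->len);
        copy->regstclass = copy->program + off;
    } else {
        copy->regstclass = NULL;
    }
    return copy;
}

/* Release a structure returned by toy_regex_dup(). */
static void toy_regex_free(toy_regex *r)
{
    if (r != NULL) {
        free(r->program);
        free(r);
    }
}
```

Copying the pointer verbatim would leave the duplicate pointing into the original's memory, and setting it to NULL loses the optimisation, which is the symptom reported in this ticket. Re-basing by offset avoids both.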

@todd-richmond

thx! We get burned by this about once a year: some feature that works perfectly single-threaded blows up and takes up to several minutes when multi-threaded, causing our watchdog to kill the service. It's completely non-intuitive, and it often takes quite a while to realize it is the same bug, since our app can run in both modes.
We build our own perl, so we'll apply a patch.

demerphq added a commit that referenced this issue Jan 6, 2023
demerphq commented Jan 6, 2023

@todd-richmond #20656 should backport nicely IMO. It looks like this goes back to the beginning of threads. :-( Sorry it took so long to realize it was easily fixable.

pjacklam pushed a commit to pjacklam/perl5 that referenced this issue May 20, 2023
khwilliamson pushed a commit to khwilliamson/perl5 that referenced this issue Jul 10, 2023