Report information
Id: 73464
Status: open
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: nicholas <nick [at] ccl4.org>
talby <talby [at] trap.mtview.ca.us>
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: regex
Perl Version: (no value)
Fixed In: (no value)



Subject: Safe signals mean that regexps can't be interupted
Date: Tue, 9 Mar 2010 15:30:04 +0000
To: perlbug [...] perl.org
From: Nicholas Clark <nick [...] ccl4.org>
As of (I think) perl 5.8.0, we're using "safe signals", where the actual C signal handler only sets a flag, and the signal is actually handled at the "next convenient moment" when it is safe, typically in the run loop, but (currently) also in many IO operations, and in one part of the signal handler code in mg.c.

This has the side effect that alarm signals can no longer be used to interrupt the regexp engine. Something like this will run "forever":

$ time ./perl -we '$SIG{ALRM} = sub {die "Timeout"}; alarm 3; $_ = "a" x 1000 . "b" x 1000 . "c" x 1000; /.*a.*b.*c.*d.*/;'

It seems that a viable solution to this is to dispatch signals at convenient points within the regexp engine. That way, if an alarm signal fires, the handler can run, and if *it* calls die, the regexp will be aborted.

My assumption is that "forever" regexps are only slow because they backtrack repeatedly - the actual matching is not. Hence, I think that the best fix is actually pretty small:

diff --git a/regexec.c b/regexec.c
index 17a0dc6..bca2e65 100644
--- a/regexec.c
+++ b/regexec.c
@@ -5521,6 +5521,7 @@ no_silent:
             yes_state = st->u.yes.prev_yes_state;
 
         state_num = st->resume_state + 1; /* failure = success + 1 */
+        PERL_ASYNC_CHECK();
         goto reenter_switch;
     }
     result = 0;

With that patch I see this:

$ time ./perl -we '$SIG{ALRM} = sub {die "Timeout"}; alarm 3; $_ = "a" x 1000 . "b" x 1000 . "c" x 1000; /.*a.*b.*c.*d.*/;'
Timeout at -e line 1.

real 0m3.002s
user 0m3.000s
sys 0m0.004s

All tests pass, and (it happens that) perlbench can't spot a difference.

Nicholas Clark
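The deferred-dispatch idea described above can be sketched in plain C, outside of perl. This is only an illustration under stated assumptions, not the code in regexec.c: async_check() is a hypothetical stand-in for PERL_ASYNC_CHECK(), siglongjmp() stands in for a Perl-level die, and match_with_timeout() with its bounded loop is an invented model of a backtracking match.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the C-level handler; examined later at a safe point. */
static volatile sig_atomic_t pending_alarm = 0;

/* Where the deferred handler "dies" to (stand-in for die inside eval). */
static sigjmp_buf timeout_env;

/* The real signal handler only records that the signal arrived. */
static void on_alarm(int signo)
{
    (void)signo;
    pending_alarm = 1;
}

/* Hypothetical analogue of PERL_ASYNC_CHECK(): if a signal is pending,
 * run its deferred handler now. Here the handler always "dies". */
static void async_check(void)
{
    if (pending_alarm) {
        pending_alarm = 0;
        siglongjmp(timeout_env, 1);
    }
}

/* Model of a match that backtracks max_backtracks times.
 * Returns 0 if it finished by itself, 1 if the alarm aborted it,
 * -1 if the handler could not be installed. */
int match_with_timeout(unsigned seconds, unsigned long max_backtracks)
{
    struct sigaction sa;
    unsigned long i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    if (sigsetjmp(timeout_env, 1) != 0)
        return 1;               /* arrived here from async_check() */

    pending_alarm = 0;
    alarm(seconds);
    for (i = 0; i < max_backtracks; i++) {
        /* ... one failed alternative would be retried here ... */
        async_check();          /* dispatch point on every backtrack */
    }
    alarm(0);                   /* finished in time: cancel the alarm */
    return 0;
}
```

With a small max_backtracks the function returns 0 almost immediately; with a value too large to finish within the timeout it returns 1 once the alarm has fired. The jump happens from ordinary code rather than from inside the signal handler, which is the property that makes this style of dispatch "safe".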
Subject: Re: [perl #73464] Safe signals mean that regexps can't be interupted
Date: Wed, 10 Mar 2010 10:35:01 +0000
To: perl5-porters [...] perl.org
From: Nicholas Clark <nick [...] ccl4.org>
On Tue, Mar 09, 2010 at 07:30:54AM -0800, Nicholas Clark wrote:
> My assumption is that "forever" regexps are only slow because they backtrack
> repeatedly - the actual matching is not. Hence, I think that the best fix is
> actually pretty small:
>
> diff --git a/regexec.c b/regexec.c
> index 17a0dc6..bca2e65 100644
> --- a/regexec.c
> +++ b/regexec.c
> @@ -5521,6 +5521,7 @@ no_silent:
>              yes_state = st->u.yes.prev_yes_state;
>
>          state_num = st->resume_state + 1; /* failure = success + 1 */
> +        PERL_ASYNC_CHECK();

Ben Morrow (correctly) observes that this would mean that signal handlers now couldn't use regexp. I'm wondering if the "save regexp engine state" code that swash init uses is good enough.

> All tests pass, and (it happens that) perlbench can't spot a difference.

Need more tests :-)

Nicholas Clark
Subject: Re: [perl #73464] Safe signals mean that regexps can't be interupted
Date: Wed, 10 Mar 2010 12:07:28 +0000
To: perl5-porters [...] perl.org
From: Dave Mitchell <davem [...] iabyn.com>
On Wed, Mar 10, 2010 at 10:35:01AM +0000, Nicholas Clark wrote:
> Ben Morrow (correctly) observes that this would mean that signal handlers
> now couldn't use regexp. I'm wondering if the "save regexp engine state"
> code that swash init uses is good enough.

Hopefully making regexes fully re-entrant will be something I'll get a chance to look at in my Summer of Code^W^W^W Spring and Summer of Bug Squashing (assuming I haven't fled to Bermuda with the loot in the meantime).

-- 
More than any other time in history, mankind faces a crossroads. One path leads to despair and utter hopelessness. The other, to total extinction. Let us pray we have the wisdom to choose correctly.
    -- Woody Allen
CC: perl5-porters [...] perl.org
Subject: Re: [perl #73464] Safe signals mean that regexps can't be interupted
Date: Wed, 10 Mar 2010 16:42:29 +0000
To: Dave Mitchell <davem [...] iabyn.com>
From: Nicholas Clark <nick [...] ccl4.org>
On Wed, Mar 10, 2010 at 12:07:28PM +0000, Dave Mitchell wrote:
> On Wed, Mar 10, 2010 at 10:35:01AM +0000, Nicholas Clark wrote:
> > Ben Morrow (correctly) observes that this would mean that signal handlers
> > now couldn't use regexp. I'm wondering if the "save regexp engine state"
> > code that swash init uses is good enough.
>
> Hopefully making regexes fully re-entrant will be something I'll get a
> chance to look at in my Summer of Code^W^W^W Spring and Summer of Bug
> Squashing (assuming I haven't fled to Bermuda with the loot in the
> meantime).

I'd recommend North Cyprus (as I often do in the office). We don't have an extradition treaty with them. The weather is probably as nice, but they don't have a cricket team capable of beating the USA. (If that matters)

Nicholas Clark
Subject: complex regexes can delay signal handlers
Date: Fri, 12 Mar 2010 14:20:55 -0800
To: perlbug [...] perl.org
From: Robert Stone <talby [...] trap.mtview.ca.us>
This is a bug report for perl from talby@trap.mtview.ca.us, generated with the help of perlbug 1.36 running under perl 5.10.0.

-----------------------------------------------------------------
Signal handling appears to be delayed during the processing of some regular expressions for what can be long periods of time. I have been unable to find any way to break out of regex processing that does not segfault perl or corrupt the stack (either immediately or during the next regex match attempt).

I originally discovered my problem in perl v5.8.8, but have reproduced it with v5.10.0 and v5.10.1.

I have a sample which seems like it should prevent the regex from spending more than roughly 10 seconds before giving up and moving on, but it takes significantly longer for me.

#!/usr/bin/perl
use strict;
use warnings;

my $s = join '', qw(xxxxxtxxxtxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxtxxtxxxxxtxxtxxxxxxxxxxtxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxtxxxtxxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxtxxxxxxxxt
xxxxxxxxxtxxxxxxxxxxtxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxtxtxxxxxxxxxxxxxtxtxxxxxxxxxxtxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxx
xtxxxxxxxxxxxxxxxxxxxtxxxxxxxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxtxxxxx
xxxxxxxxxxtxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxtxtxxxxxxxxxxxxxxxxxxxxxtxtx
xxxxxxxxxxxtxxxxxxxtxttttttxxxxtxtxxxxxxxxxxxxxxxxtxtxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxtxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxtxt
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxtxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxtxxxxxxxxxxtxxxxxxxxxxxxxxxxtxxxtxxxxxxxtxxxxxxxxxxxxx
xttxxxxxxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxtxxxxx
xxxxxxxxxtxxxxtxxxxtxxxxxtxxxxxxtxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxtxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx);
my $re = qr/(?:.{0,1000}){0,1000}tttttt/s; # hangs
#my $re = qr/(?{ 1 while 1 })/s; # segfaults
my $res;
eval {
    local $SIG{ALRM} = sub { die "got impatient" };
    alarm(10);
    $res = $s =~ $re;
    alarm(0);
};
$res = $@ ? 'failed' : int $res;
print "result: $res\n";
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=low
---
Site configuration information for perl 5.10.0:

Configured by Debian Project at Thu Oct 1 22:38:45 UTC 2009.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.24-23-server, archname=i486-linux-gnu-thread-multi
    uname='linux vernadsky 2.6.24-23-server #1 smp wed apr 1 22:22:14 utc 2009 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.0 -Dsitearch=/usr/local/lib/perl/5.10.0 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.0 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.4.1', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /usr/lib64
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.10.1.so, so=so, useshrplib=true, libperl=libperl.so.5.10.0
    gnulibc_version='2.10.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib'

Locally applied patches:

---
@INC for perl 5.10.0:
    /etc/perl
    /usr/local/lib/perl/5.10.0
    /usr/local/share/perl/5.10.0
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.10
    /usr/share/perl/5.10
    /usr/local/lib/site_perl
    .
---
Environment for perl 5.10.0:
    HOME=/home/talby
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/talby/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/talby/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash
Subject: Re: [perl #73546] complex regexes can delay signal handlers
Date: Sat, 13 Mar 2010 11:08:05 -0800
To: perl5-porters [...] perl.org
From: Michael G Schwern <schwern [...] pobox.com>
talby@trap.mtview.ca.us (via RT) wrote:
> Signal handling appears to be delayed during the processing of some
> regular expressions for what can be long periods of time. I have been
> unable to find any way to break out of regex processing that does not
> segfault perl or corrupt the stack (either immediately or during the
> next regex match attempt).
>
> I originally discovered my problem in perl v5.8.8, but have reproduced
> it with v5.10.0 and v5.10.1.
>
> I have a sample which seems like it should prevent the regex from
> spending more than roughly 10 seconds before giving up and moving on,
> but it takes significantly longer for me.

Thank you for your report. This is likely a consequence of "safe signals" which were introduced in 5.8. Signals are only handled between op codes, and a regex is a single op. The behavior you're seeing is a known consequence. Sorry for the inconvenience.

Safe signals can be disabled by setting the PERL_SIGNALS environment variable to "unsafe", but, as you may have guessed, this is not entirely safe. If the signal handler does any memory allocation you risk a segfault. Unfortunately, the signal handling policy must be set at the start of the program so it cannot be selectively turned on and off. If all you're doing is trapping an alarm you should be ok.

Please read http://perldoc.perl.org/perlipc.html#Deferred-Signals-%28Safe-Signals%29 for more information.

-- 
Reality is that which, when you stop believing in it, doesn't go away.
    -- Philip K. Dick
Subject: Re: [perl #73546] complex regexes can delay signal handlers
Date: Sat, 13 Mar 2010 20:51:59 -0800
To: Michael G Schwern via RT <perlbug-followup [...] perl.org>
From: Robert Stone <talby [...] trap.mtview.ca.us>
Your explanation fits everything I've observed. Thanks very much for the additional insight! The workaround will be difficult to employ in my application, but it's a foothold I think I can work with.

Michael G Schwern via RT wrote:
> Thank you for your report. This is likely a consequence of "safe signals"
> which were introduced in 5.8. Signals are only handled between op codes, and
> a regex is a single op. The behavior you're seeing is a known consequence.
> Sorry for the inconvenience.
RT-Send-CC: perl5-porters [...] perl.org
On Wed Mar 10 02:35:58 2010, nicholas wrote:
> On Tue, Mar 09, 2010 at 07:30:54AM -0800, Nicholas Clark wrote:
> > My assumption is that "forever" regexps are only slow because they backtrack
> > repeatedly - the actual matching is not. Hence, I think that the best fix is
> > actually pretty small:
> >
> > diff --git a/regexec.c b/regexec.c
> > index 17a0dc6..bca2e65 100644
> > --- a/regexec.c
> > +++ b/regexec.c
> > @@ -5521,6 +5521,7 @@ no_silent:
> >              yes_state = st->u.yes.prev_yes_state;
> >
> >          state_num = st->resume_state + 1; /* failure = success + 1 */
> > +        PERL_ASYNC_CHECK();
>
> Ben Morrow (correctly) observes that this would mean that signal handlers
> now couldn't use regexp.

But the regexp engine has been made reëntrant now, by commit 91332126. Does that mean your patch will work now, or does it still need a bit of work?
RT-Send-CC: perl5-porters [...] perl.org
Nicholas, any thoughts on this? It seems pretty straightforward, but I'm not familiar with the regex code at all.
CC: talby [...] trap.mtview.ca.us
Subject: Re: [perl #73464] Safe signals mean that regexps can't be interupted
Date: Tue, 3 Jul 2012 14:25:21 +0100
To: Jesse Luehrs via RT <perlbug-followup [...] perl.org>
From: Nicholas Clark <nick [...] ccl4.org>
On Mon, Jul 02, 2012 at 05:13:28PM -0700, Jesse Luehrs via RT wrote:
> Nicholas, any thoughts on this? It seems pretty straightforward, but I'm
> not familiar with the regex code at all.

I'm not sufficiently familiar with what *really* needs saving in the regex engine to get this right. I've screwed things up before there.

Also, not sure if the advice about North Cyprus is still valid, for practical purposes:

https://rt.perl.org/rt3/Ticket/Display.html?id=73464#txn-667128

Nicholas Clark

