Skip Menu |
Report information
Id: 73464
Status: open
Priority: 0/
Queue: perl5

Owner: demerphq <demerphq [at] gmail.com>
Requestors: nicholas <nick [at] ccl4.org>
talby <talby [at] trap.mtview.ca.us>
Cc:
AdminCc:

Operating System: All
PatchStatus: Applied
Severity: low
Type: regex
Perl Version: (no value)
Fixed In: 5.25.3



Subject: Safe signals mean that regexps can't be interupted
Date: Tue, 9 Mar 2010 15:30:04 +0000
To: perlbug [...] perl.org
From: Nicholas Clark <nick [...] ccl4.org>
Download (untitled) / with headers
text/plain 1.5k
As of (I think) perl 5.8.0, we're using "safe signals", where the actual C signal handler only sets a flag, and the signal is actually handled at the "next convenient moment" when it is safe, typically in the run loop, but (currently) also in many IO operations, and in one part of the signal handler code in mg.c This has the side effect that alarm signals can no longer be used to interrupt the regexp engine. Something like this will run "forever": $ time ./perl -we '$SIG{ALRM} = sub {die "Timeout"}; alarm 3; $_ = "a" x 1000 . "b" x 1000 . "c" x 1000; /.*a.*b.*c.*d.*/;' It seems that a viable solution to this is to dispatch signals at convenient points within the regexp engine. That way, if an alarm signal fires, the handler can run, and if *it* calls die, the regexp will be aborted. My assumption is that "forever" regexps are only slow because they backtrack repeatedly - the actual matching is not. Hence, I think that the best fix is actually pretty small: diff --git a/regexec.c b/regexec.c index 17a0dc6..bca2e65 100644 --- a/regexec.c +++ b/regexec.c @@ -5521,6 +5521,7 @@ no_silent: yes_state = st->u.yes.prev_yes_state; state_num = st->resume_state + 1; /* failure = success + 1 */ + PERL_ASYNC_CHECK(); goto reenter_switch; } result = 0; With that patch I see this: $ time ./perl -we '$SIG{ALRM} = sub {die "Timeout"}; alarm 3; $_ = "a" x 1000 . "b" x 1000 . "c" x 1000; /.*a.*b.*c.*d.*/;' Timeout at -e line 1. real 0m3.002s user 0m3.000s sys 0m0.004s All tests pass, and (it happens that) perlbench can't spot a difference. Nicholas Clark
Subject: Re: [perl #73464] Safe signals mean that regexps can't be interupted
Date: Wed, 10 Mar 2010 10:35:01 +0000
To: perl5-porters [...] perl.org
From: Nicholas Clark <nick [...] ccl4.org>
Download (untitled) / with headers
text/plain 835b
On Tue, Mar 09, 2010 at 07:30:54AM -0800, Nicholas Clark wrote: Show quoted text
> My assumption is that "forever" regexps are only slow because they backtrack > repeatedly - the actual matching is not. Hence, I think that the best fix is > actually pretty small: > > diff --git a/regexec.c b/regexec.c > index 17a0dc6..bca2e65 100644 > --- a/regexec.c > +++ b/regexec.c > @@ -5521,6 +5521,7 @@ no_silent: > yes_state = st->u.yes.prev_yes_state; > > state_num = st->resume_state + 1; /* failure = success + 1 */ > + PERL_ASYNC_CHECK();
Ben Morrow (correctly) observes that this would mean that signal handlers now couldn't use regexp. I'm wondering if the "save regexp engine state" code that swash init uses is good enough. Show quoted text
> All tests pass, and (it happens that) perlbench can't spot a difference.
Need more tests :-) Nicholas Clark
Subject: Re: [perl #73464] Safe signals mean that regexps can't be interupted
Date: Wed, 10 Mar 2010 12:07:28 +0000
To: perl5-porters [...] perl.org
From: Dave Mitchell <davem [...] iabyn.com>
Download (untitled) / with headers
text/plain 703b
On Wed, Mar 10, 2010 at 10:35:01AM +0000, Nicholas Clark wrote: Show quoted text
> Ben Morrow (correctly) observes that this would mean that signal handlers > now couldn't use regexp. I'm wondering if the "save regexp engine state" > code that swash init uses is good enough.
Hopefully making regexes fully re-entrant will be something I'll get a chance to look at in my Summer of Code^W^W^W Spring and Summer of Bug Squashing (assuming I haven't fled to Bermuda with the loot in the meantime). -- More than any other time in history, mankind faces a crossroads. One path leads to despair and utter hopelessness. The other, to total extinction. Let us pray we have the wisdom to choose correctly. -- Woody Allen
CC: perl5-porters [...] perl.org
Subject: Re: [perl #73464] Safe signals mean that regexps can't be interupted
Date: Wed, 10 Mar 2010 16:42:29 +0000
To: Dave Mitchell <davem [...] iabyn.com>
From: Nicholas Clark <nick [...] ccl4.org>
Download (untitled) / with headers
text/plain 797b
On Wed, Mar 10, 2010 at 12:07:28PM +0000, Dave Mitchell wrote: Show quoted text
> On Wed, Mar 10, 2010 at 10:35:01AM +0000, Nicholas Clark wrote:
> > Ben Morrow (correctly) observes that this would mean that signal handlers > > now couldn't use regexp. I'm wondering if the "save regexp engine state" > > code that swash init uses is good enough.
> > Hopefully making regexes fully re-entrant will be something I'll get a > chance to look at in my Summer of Code^W^W^W Spring and Summer of Bug > Squashing (assuming I haven't fled to Bermuda with the loot in the > meantime).
I'd recommend North Cyprus (as I often do in the office). We don't have an extradition treaty with them. The weather is probably as nice, but they don't have a cricket team capable of beating the USA. (If that matters) Nicholas Clark
Subject: complex regexes can delay signal handlers
Date: Fri, 12 Mar 2010 14:20:55 -0800
To: perlbug [...] perl.org
From: Robert Stone <talby [...] trap.mtview.ca.us>
Download (untitled) / with headers
text/plain 5.1k
This is a bug report for perl from talby@trap.mtview.ca.us, generated with the help of perlbug 1.36 running under perl 5.10.0. ----------------------------------------------------------------- Signal handling appears to be delayed during the processing of some regular expressions for what can be long periods of time. I have been unable to find any way to break out of regex processing that does not segfault perl or corrupt the stack (either immediately or during the next regex match attempt). I originally discovered my problem in perl v5.8.8, but have reproduced it with v5.10.0 and v5.10.1. I have a sample which seems like it should prevent the regex from spending more than roughly 10 seconds before giving up and moving on, but it takes significantly longer for me. #!/usr/bin/perl use strict; use warnings; my $s = join '', qw(xxxxxtxxxtxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxtxxtxxxxxtxxtxxxxxxxxxxtxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxtxxxtxxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxtxxxxxxxxt xxxxxxxxxtxxxxxxxxxxtxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxtxtxxxxxxxxxxxxxtxtxxxxxxxxxxtxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxx xtxxxxxxxxxxxxxxxxxxxtxxxxxxxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxtxxxxx xxxxxxxxxxtxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxtxtxxxxxxxxxxxxxxxxxxxxxtxtx xxxxxxxxxxxtxxxxxxxtxttttttxxxxtxtxxxxxxxxxxxxxxxxtxtxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxtxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxtxt xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxtxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxtxxxxxxxxxxtxxxxxxxxxxxxxxxxtxxxtxxxxxxxtxxxxxxxxxxxxx xttxxxxxxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxtxxxxx xxxxxxxxxtxxxxtxxxxtxxxxxtxxxxxxtxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxtxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx); my $re = qr/(?:.{0,1000}){0,1000}tttttt/s; # hangs #my $re = qr/(?{ 1 while 1 })/s; # segfaults my $res; eval { local $SIG{ALRM} = sub { die "got impatient" }; alarm(10); $res = $s =~ $re; alarm(0); }; $res = $@ ? 'failed' : int $res; print "result: $res\n"; ----------------------------------------------------------------- --- Flags: category=core severity=low --- Site configuration information for perl 5.10.0: Configured by Debian Project at Thu Oct 1 22:38:45 UTC 2009. Summary of my perl5 (revision 5 version 10 subversion 0) configuration: Platform: osname=linux, osvers=2.6.24-23-server, archname=i486-linux-gnu-thread-multi uname='linux vernadsky 2.6.24-23-server #1 smp wed apr 1 22:22:14 utc 2009 i686 gnulinux ' config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.0 -Dsitearch=/usr/local/lib/perl/5.10.0 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.0 -Dd_dosuid -des' hint=recommended, useposix=true, d_sigaction=define useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=undef, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2 -g', cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include' ccversion='', gccversion='4.4.1', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib /usr/lib64 libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt perllibs=-ldl -lm -lpthread -lc -lcrypt libc=/lib/libc-2.10.1.so, so=so, useshrplib=true, libperl=libperl.so.5.10.0 gnulibc_version='2.10.1' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib' Locally applied patches: --- @INC for perl 5.10.0: /etc/perl /usr/local/lib/perl/5.10.0 /usr/local/share/perl/5.10.0 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl . --- Environment for perl 5.10.0: HOME=/home/talby LANG=en_US.UTF-8 LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/talby/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/talby/bin PERL_BADLANG (unset) SHELL=/bin/bash
Subject: Re: [perl #73546] complex regexes can delay signal handlers
Date: Sat, 13 Mar 2010 11:08:05 -0800
To: perl5-porters [...] perl.org
From: Michael G Schwern <schwern [...] pobox.com>
Download (untitled) / with headers
text/plain 1.5k
talby@trap.mtview.ca.us (via RT) wrote: Show quoted text
> Signal handling appears to be delayed during the processing of some > regular expressions for what can be long periods of time. I have been > unable to find any way to break out of regex processing that does not > segfault perl or corrupt the stack (either immediately or during the > next regex match attempt). > > I originally discovered my problem in perl v5.8.8, but have reproduced > it with v5.10.0 and v5.10.1. > > I have a sample which seems like it should prevent the regex from > spending more than roughly 10 seconds before giving up and moving on, > but it takes significantly longer for me.
Thank you for your report. This is likely a consequence of "safe signals" which were introduced in 5.8. Signals are only handled between op codes, and a regex is a single op. The behavior you're seeing is a known consequence. Sorry for the inconvenience. Safe signals can be disabled by setting the PERL_SIGNALS environment variable to "unsafe", but, as you may have guessed, this is not entirely safe. If the signal handler does any memory allocation you risk a segfault. Unfortunately, the signal handling policy must be set at the start of the program so it cannot be selectively turned on an off. If all you're doing is trapping an alarm you should be ok. Please read http://perldoc.perl.org/perlipc.html#Deferred-Signals-%28Safe-Signals%29 for more information. -- Reality is that which, when you stop believing in it, doesn't go away. -- Phillip K. Dick
Subject: Re: [perl #73546] complex regexes can delay signal handlers
Date: Sat, 13 Mar 2010 20:51:59 -0800
To: Michael G Schwern via RT <perlbug-followup [...] perl.org>
From: Robert Stone <talby [...] trap.mtview.ca.us>
Download (untitled) / with headers
text/plain 511b
Your explanation fits everything I've observed. Thanks very much for the additional insight! The workaround will be difficult to employ in my application, but it's a foothold I think I can work with. Michael G Schwern via RT wrote: Show quoted text
> > Thank you for your report. This is likely a consequence of "safe signals" > which were introduced in 5.8. Signals are only handled between op codes, and > a regex is a single op. The behavior you're seeing is a known consequence. > Sorry for the inconvenience. >
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 867b
On Wed Mar 10 02:35:58 2010, nicholas wrote: Show quoted text
> On Tue, Mar 09, 2010 at 07:30:54AM -0800, Nicholas Clark wrote: >
> > My assumption is that "forever" regexps are only slow because they
> backtrack
> > repeatedly - the actual matching is not. Hence, I think that the
> best fix is
> > actually pretty small: > > > > diff --git a/regexec.c b/regexec.c > > index 17a0dc6..bca2e65 100644 > > --- a/regexec.c > > +++ b/regexec.c > > @@ -5521,6 +5521,7 @@ no_silent: > > yes_state = st->u.yes.prev_yes_state; > > > > state_num = st->resume_state + 1; /* failure = success + 1 */ > > + PERL_ASYNC_CHECK();
> > Ben Morrow (correctly) observes that this would mean that signal > handlers > now couldn't use regexp.
But the regexp engine has been made re├źntrant now, by commit 91332126. Does that mean your patch will work now, or does it still need a bit of work?
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 113b
Nicholas, any thoughts on this? It seems pretty straightforward, but I'm not familiar with the regex code at all.
CC: talby [...] trap.mtview.ca.us
Subject: Re: [perl #73464] Safe signals mean that regexps can't be interupted
Date: Tue, 3 Jul 2012 14:25:21 +0100
To: Jesse Luehrs via RT <perlbug-followup [...] perl.org>
From: Nicholas Clark <nick [...] ccl4.org>
Download (untitled) / with headers
text/plain 496b
On Mon, Jul 02, 2012 at 05:13:28PM -0700, Jesse Luehrs via RT wrote: Show quoted text
> Nicholas, any thoughts on this? It seems pretty straightforward, but I'm > not familiar with the regex code at all.
I'm not sufficiently familiar with what *really* needs saving in the regex engine to get this right. I've screwed things up before there. Also, not sure if the advice about North Cyprus is still valid, for practical purposes: https://rt.perl.org/rt3/Ticket/Display.html?id=73464#txn-667128 Nicholas Clark
CC: Jesse Luehrs via RT <perlbug-followup [...] perl.org>, talby [...] trap.mtview.ca.us
Subject: Re: [perl #73464] Safe signals mean that regexps can't be interupted
Date: Tue, 15 Mar 2016 20:56:29 +0100
From: demerphq <demerphq [...] gmail.com>
To: Nicholas Clark <nick [...] ccl4.org>
Download (untitled) / with headers
text/plain 861b
On 3 July 2012 at 15:25, Nicholas Clark <nick@ccl4.org> wrote: Show quoted text
> On Mon, Jul 02, 2012 at 05:13:28PM -0700, Jesse Luehrs via RT wrote:
>> Nicholas, any thoughts on this? It seems pretty straightforward, but I'm >> not familiar with the regex code at all.
> > I'm not sufficiently familiar with what *really* needs saving in the regex > engine to get this right. I've screwed things up before there. > > Also, not sure if the advice about North Cyprus is still valid, for practical > purposes: https://rt.perl.org/rt3/Ticket/Display.html?id=73464#txn-667128
As far as I know we are properly re-entrant now, so I think I will apply this patch assuming it passes tests. I played around with trying to trigger a problem by using the regex engine from a signal handler and I couldnt get anything to fail. Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 421b
On Tue Mar 15 12:56:55 2016, demerphq wrote: Show quoted text
> As far as I know we are properly re-entrant now, so I think I will > apply this patch assuming it passes tests. I played around with trying > to trigger a problem by using the regex engine from a signal handler > and I couldnt get anything to fail. > > Yves
It appears that Yves applied this patch in 8df928d23. Is there any more to be done? -- Respectfully, Dan Collins


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org