Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe signals mean that regexps can't be interrupted #10220

Closed
p5pRT opened this issue Mar 9, 2010 · 14 comments
Closed

Safe signals mean that regexps can't be interrupted #10220

p5pRT opened this issue Mar 9, 2010 · 14 comments

Comments

@p5pRT
Copy link

p5pRT commented Mar 9, 2010

Migrated from rt.perl.org#73464 (status was 'open')

Searchable as RT73464$

@p5pRT
Copy link
Author

p5pRT commented Mar 9, 2010

From @nwc10

As of (I think) perl 5.8.0, we're using "safe signals", where the actual C
signal handler only sets a flag, and the signal is actually handled at the
"next convenient moment" when it is safe, typically in the run loop, but
(currently) also in many IO operations, and in one part of the signal handler
code in mg.c

This has the side effect that alarm signals can no longer be used to interrupt
the regexp engine.

Something like this will run "forever"​:

$ time ./perl -we '$SIG{ALRM} = sub {die "Timeout"}; alarm 3; $_ = "a" x 1000 . "b" x 1000 . "c" x 1000; /.*a.*b.*c.*d.*/;'

It seems that a viable solution to this is to dispatch signals at convenient
points within the regexp engine. That way, if an alarm signal fires, the
handler can run, and if *it* calls die, the regexp will be aborted.

My assumption is that "forever" regexps are only slow because they backtrack
repeatedly - the actual matching is not. Hence, I think that the best fix is
actually pretty small​:

Inline Patch
diff --git a/regexec.c b/regexec.c
index 17a0dc6..bca2e65 100644
--- a/regexec.c
+++ b/regexec.c
@@ -5521,6 +5521,7 @@ no_silent:
 	    yes_state = st->u.yes.prev_yes_state;
 
 	state_num = st->resume_state + 1; /* failure = success + 1 */
+	PERL_ASYNC_CHECK();
 	goto reenter_switch;
     }
     result = 0;


With that patch I see this:

$ time ./perl -we '$SIG{ALRM} = sub {die "Timeout"}; alarm 3; $_ = "a" x 1000 . "b" x 1000 . "c" x 1000; /.*a.*b.*c.*d.*/;'
Timeout at -e line 1.

real 0m3.002s
user 0m3.000s
sys 0m0.004s

All tests pass, and (it happens that) perlbench can't spot a difference.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Mar 10, 2010

From @nwc10

On Tue, Mar 09, 2010 at 07​:30​:54AM -0800, Nicholas Clark wrote​:

My assumption is that "forever" regexps are only slow because they backtrack
repeatedly - the actual matching is not. Hence, I think that the best fix is
actually pretty small​:

diff --git a/regexec.c b/regexec.c
index 17a0dc6..bca2e65 100644
--- a/regexec.c
+++ b/regexec.c
@​@​ -5521,6 +5521,7 @​@​ no_silent​:
yes_state = st->u.yes.prev_yes_state;

 state\_num = st\->resume\_state \+ 1; /\* failure = success \+ 1 \*/

+ PERL_ASYNC_CHECK();

Ben Morrow (correctly) observes that this would mean that signal handlers
now couldn't use regexp. I'm wondering if the "save regexp engine state"
code that swash init uses is good enough.

All tests pass, and (it happens that) perlbench can't spot a difference.

Need more tests :-)

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Mar 10, 2010

From @iabyn

On Wed, Mar 10, 2010 at 10​:35​:01AM +0000, Nicholas Clark wrote​:

Ben Morrow (correctly) observes that this would mean that signal handlers
now couldn't use regexp. I'm wondering if the "save regexp engine state"
code that swash init uses is good enough.

Hopefully making regexes fully re-entrant will be something I'll get a
chance to look at in my Summer of Code^W^W^W Spring and Summer of Bug
Squashing (assuming I haven't fled to Bermuda with the loot in the
meantime).

--
More than any other time in history, mankind faces a crossroads. One path
leads to despair and utter hopelessness. The other, to total extinction.
Let us pray we have the wisdom to choose correctly.
  -- Woody Allen

@p5pRT
Copy link
Author

p5pRT commented Mar 10, 2010

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Mar 10, 2010

From @nwc10

On Wed, Mar 10, 2010 at 12​:07​:28PM +0000, Dave Mitchell wrote​:

On Wed, Mar 10, 2010 at 10​:35​:01AM +0000, Nicholas Clark wrote​:

Ben Morrow (correctly) observes that this would mean that signal handlers
now couldn't use regexp. I'm wondering if the "save regexp engine state"
code that swash init uses is good enough.

Hopefully making regexes fully re-entrant will be something I'll get a
chance to look at in my Summer of Code^W^W^W Spring and Summer of Bug
Squashing (assuming I haven't fled to Bermuda with the loot in the
meantime).

I'd recommend North Cyprus (as I often do in the office). We don't have an
extradition treaty with them. The weather is probably as nice, but they don't
have a cricket team capable of beating the USA. (If that matters)

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Mar 12, 2010

From talby@trap.mtview.ca.us

This is a bug report for perl from talby@​trap.mtview.ca.us,
generated with the help of perlbug 1.36 running under perl 5.10.0.


Signal handling appears to be delayed during the processing of some
regular expressions for what can be long periods of time. I have been
unable to find any way to break out of regex processing that does not
segfault perl or corrupt the stack (either immediately or during the
next regex match attempt).

I originally discovered my problem in perl v5.8.8, but have reproduced
it with v5.10.0 and v5.10.1.

I have a sample which seems like it should prevent the regex from
spending more than roughly 10 seconds before giving up and moving on,
but it takes significantly longer for me.

#!/usr/bin/perl
use strict;
use warnings;
my $s = join '', qw(xxxxxtxxxtxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  xxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxtxxtxxxxxtxxtxxxxxxxxxxtxxxxxxxxxxxxxxx
  xxxxxxxxxxxxxxxxxxxxxxxtxxxtxxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxtxxxxxxxxt
  xxxxxxxxxtxxxxxxxxxxtxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  xxxxtxtxxxxxxxxxxxxxtxtxxxxxxxxxxtxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxx
  xtxxxxxxxxxxxxxxxxxxxtxxxxxxxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxtxxxxx
  xxxxxxxxxxtxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  xxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxtxtxxxxxxxxxxxxxxxxxxxxxtxtx
  xxxxxxxxxxxtxxxxxxxtxttttttxxxxtxtxxxxxxxxxxxxxxxxtxtxxxxxxxxxxxxxxxxx
  xxxxxxxxxxxxxxxxxxxxxxxxtxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxtxt
  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxtxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  xxxxxxxxxxxxxxxxtxxxxxxxxxxtxxxxxxxxxxxxxxxxtxxxtxxxxxxxtxxxxxxxxxxxxx
  xttxxxxxxxxxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxtxxxxx
  xxxxxxxxxtxxxxtxxxxtxxxxxtxxxxxxtxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  xxxxxxxxxxxxxxxxxxxxxxxxxxxtxxxxxxxxtxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx);
my $re = qr/(?​:.{0,1000}){0,1000}tttttt/s; # hangs
#my $re = qr/(?{ 1 while 1 })/s; # segfaults
my $res;
eval {
  local $SIG{ALRM} = sub { die "got impatient" };
  alarm(10);
  $res = $s =~ $re;
  alarm(0);
};
$res = $@​ ? 'failed' : int $res;
print "result​: $res\n";



Flags​:
  category=core
  severity=low


Site configuration information for perl 5.10.0​:

Configured by Debian Project at Thu Oct 1 22​:38​:45 UTC 2009.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration​:
  Platform​:
  osname=linux, osvers=2.6.24-23-server, archname=i486-linux-gnu-thread-multi
  uname='linux vernadsky 2.6.24-23-server #1 smp wed apr 1 22​:22​:14 utc 2009 i686 gnulinux '
  config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.0 -Dsitearch=/usr/local/lib/perl/5.10.0 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.0 -Dd_dosuid -des'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=undef, use64bitall=undef, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-O2 -g',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
  ccversion='', gccversion='4.4.1', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=4, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /lib /usr/lib /usr/lib64
  libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
  perllibs=-ldl -lm -lpthread -lc -lcrypt
  libc=/lib/libc-2.10.1.so, so=so, useshrplib=true, libperl=libperl.so.5.10.0
  gnulibc_version='2.10.1'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib'

Locally applied patches​:
 


@​INC for perl 5.10.0​:
  /etc/perl
  /usr/local/lib/perl/5.10.0
  /usr/local/share/perl/5.10.0
  /usr/lib/perl5
  /usr/share/perl5
  /usr/lib/perl/5.10
  /usr/share/perl/5.10
  /usr/local/lib/site_perl
  .


Environment for perl 5.10.0​:
  HOME=/home/talby
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/home/talby/bin​:/usr/local/sbin​:/usr/local/bin​:/usr/sbin​:/usr/bin​:/sbin​:/bin​:/usr/games​:/home/talby/bin
  PERL_BADLANG (unset)
  SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Mar 13, 2010

From @schwern

talby@​trap.mtview.ca.us (via RT) wrote​:

Signal handling appears to be delayed during the processing of some
regular expressions for what can be long periods of time. I have been
unable to find any way to break out of regex processing that does not
segfault perl or corrupt the stack (either immediately or during the
next regex match attempt).

I originally discovered my problem in perl v5.8.8, but have reproduced
it with v5.10.0 and v5.10.1.

I have a sample which seems like it should prevent the regex from
spending more than roughly 10 seconds before giving up and moving on,
but it takes significantly longer for me.

Thank you for your report. This is likely a consequence of "safe signals"
which were introduced in 5.8. Signals are only handled between op codes, and
a regex is a single op. The behavior you're seeing is a known consequence.
Sorry for the inconvenience.

Safe signals can be disabled by setting the PERL_SIGNALS environment variable
to "unsafe", but, as you may have guessed, this is not entirely safe. If the
signal handler does any memory allocation you risk a segfault. Unfortunately,
the signal handling policy must be set at the start of the program so it
cannot be selectively turned on an off. If all you're doing is trapping an
alarm you should be ok.

Please read
http​://perldoc.perl.org/perlipc.html#Deferred-Signals-%28Safe-Signals%29 for
more information.

--
Reality is that which, when you stop believing in it, doesn't go away.
  -- Phillip K. Dick

@p5pRT
Copy link
Author

p5pRT commented Mar 13, 2010

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Mar 14, 2010

From talby@trap.mtview.ca.us

Your explanation fits everything I've observed. Thanks very much for
the additional insight! The workaround will be difficult to employ in
my application, but it's a foothold I think I can work with.

Michael G Schwern via RT wrote​:

Thank you for your report. This is likely a consequence of "safe signals"
which were introduced in 5.8. Signals are only handled between op codes, and
a regex is a single op. The behavior you're seeing is a known consequence.
Sorry for the inconvenience.

@p5pRT
Copy link
Author

p5pRT commented Dec 12, 2010

From @cpansprout

On Wed Mar 10 02​:35​:58 2010, nicholas wrote​:

On Tue, Mar 09, 2010 at 07​:30​:54AM -0800, Nicholas Clark wrote​:

My assumption is that "forever" regexps are only slow because they
backtrack
repeatedly - the actual matching is not. Hence, I think that the
best fix is
actually pretty small​:

diff --git a/regexec.c b/regexec.c
index 17a0dc6..bca2e65 100644
--- a/regexec.c
+++ b/regexec.c
@​@​ -5521,6 +5521,7 @​@​ no_silent​:
yes_state = st->u.yes.prev_yes_state;

 state\_num = st\->resume\_state \+ 1; /\* failure = success \+ 1 \*/

+ PERL_ASYNC_CHECK();

Ben Morrow (correctly) observes that this would mean that signal
handlers
now couldn't use regexp.

But the regexp engine has been made reëntrant now, by commit 9133212.

Does that mean your patch will work now, or does it still need a bit of
work?

@p5pRT
Copy link
Author

p5pRT commented Jul 3, 2012

From @doy

Nicholas, any thoughts on this? It seems pretty straightforward, but I'm
not familiar with the regex code at all.

@p5pRT
Copy link
Author

p5pRT commented Jul 3, 2012

From @nwc10

On Mon, Jul 02, 2012 at 05​:13​:28PM -0700, Jesse Luehrs via RT wrote​:

Nicholas, any thoughts on this? It seems pretty straightforward, but I'm
not familiar with the regex code at all.

I'm not sufficiently familiar with what *really* needs saving in the regex
engine to get this right. I've screwed things up before there.

Also, not sure if the advice about North Cyprus is still valid, for practical
purposes​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=73464#txn-667128

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Mar 15, 2016

From @demerphq

On 3 July 2012 at 15​:25, Nicholas Clark <nick@​ccl4.org> wrote​:

On Mon, Jul 02, 2012 at 05​:13​:28PM -0700, Jesse Luehrs via RT wrote​:

Nicholas, any thoughts on this? It seems pretty straightforward, but I'm
not familiar with the regex code at all.

I'm not sufficiently familiar with what *really* needs saving in the regex
engine to get this right. I've screwed things up before there.

Also, not sure if the advice about North Cyprus is still valid, for practical
purposes​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=73464#txn-667128

As far as I know we are properly re-entrant now, so I think I will
apply this patch assuming it passes tests. I played around with trying
to trigger a problem by using the regex engine from a signal handler
and I couldnt get anything to fail.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jul 26, 2016

From @dcollinsn

On Tue Mar 15 12​:56​:55 2016, demerphq wrote​:

As far as I know we are properly re-entrant now, so I think I will
apply this patch assuming it passes tests. I played around with trying
to trigger a problem by using the regex engine from a signal handler
and I couldnt get anything to fail.

Yves

It appears that Yves applied this patch in 8df928d. Is there any more to be done?

--
Respectfully,
Dan Collins

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants