Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

no protection against potentially infinite quantification on zero-width assertion #1806

Open
p6rt opened this issue Jun 7, 2010 · 21 comments
Labels
LTA Less Than Awesome; typically an error message that could be better regex Regular expressions, pattern matching, user-defined grammars, tokens and rules

Comments

@p6rt
Copy link

p6rt commented Jun 7, 2010

Migrated from rt.perl.org#75586 (status was 'open')

Searchable as RT75586$

@p6rt
Copy link
Author

p6rt commented Mar 22, 2009

From @mathw

Run this​:

perl6 -e "my $x = 'wibble'; $x.subst(/<ws>+ $$/, '');"

Watch your processor usage go through the roof. It doesn't seem to
terminate (as far as I can tell without solving the halting problem),
but memory usage remains stable.

@p6rt
Copy link
Author

p6rt commented Mar 23, 2009

From @mathw

Turns out that the loop is actually caused by the <ws>+, not the subst,
as I had the same inside an ordinary match. jnthn tells me that the + is
a bit redundant, but it shouldn't cause a loop.

<ws>* also succumbs.

@p6rt
Copy link
Author

p6rt commented Jun 7, 2010

From @cognominal

To reproduce :

say 'a' ~~ m/''*/

and wait until the end of times.

--
cognominal stef

@p6rt
Copy link
Author

p6rt commented Jun 9, 2010

From @bbkr

[16​:25] <bbkr> rakudo​: grammar X { token TOP { <ws>+ } }; X.parse(" "); # why
it goes into infinite loop (despite the fact that + is not necessary) ?
[16​:25] <p6eval> rakudo a54677​: ( no output )
[16​:25] <bbkr> known bug?
[16​:26] <masak> haven't seen it before, no.
[16​:26] <jnthn> Me either.
[16​:26] * bbkr reports
[16​:26] <jnthn> I guess since <ws> can match nothing, it can also match
nothing a lot of times. ;-)
[16​:27] <masak> that might explain it.
[16​:27] <masak> so maybe it's expected behavior.
[16​:27] <masak> maybe even something that merits a warning from the compiler.
[16​:27] <masak> if someone does <ws>+ with the standard <ws> rule, they get
a "you probably don't mean that" warning.
[16​:27] <bbkr> rakudo​: grammar X { token TOP { ""+ } }; X.parse(" "); # looks
like jnthn i right
[16​:28] <p6eval> rakudo a54677​: ( no output )
[16​:28] <-- jaldhar_ has left this server (Remote host closed the connection).
[16​:28] <masak> rakudo​: " " ~~ / [ '' ]+ /; say 'alive'
[16​:28] <p6eval> rakudo a54677​: ( no output )
[16​:28] <jnthn> Ah, *any* zero-width match seems to exhibit it.
[16​:29] <jnthn> That probably is a bug.
[16​:29] <masak> no, it's not.
[16​:29] <masak> think about it. it's expected behavior.
[16​:29] <jnthn> rakudo​: " " ~~ / [ x? ]+ /; say 'alive'
[16​:29] <masak> regexes are *expected* to do this.
[16​:29] <p6eval> rakudo a54677​: ( no output )
[16​:29] <masak> I have a plan for letting GGE detect these things, though.
[16​:29] <masak> it tends to bite people.
[16​:30] <masak> I also seem to recall that with a Thompson engine, you can't
even think that kind of wrong thought. ;)
[16​:30] <jnthn> masak​: It's inconsistent with Perl 5.
[16​:30] <jnthn> C​:\>perl -e "' ' =~ /(x?)+/; print 42;"
[16​:30] <jnthn> 42
[16​:30] <masak> oh?
[16​:30] <masak> then it probably merits pmichaud's and TimToady's attention.

@p6rt
Copy link
Author

p6rt commented Jun 9, 2010

From @bbkr

another part of discussion

[16​:36] <jnthn> mathw​: I'm trying to work out why it might not be so
simple as "check if we advanced by any characters"
[16​:37] <mathw> jnthn​: it'd be nice if it is that simple
[16​:39] <jnthn> rakudo​: for 1..5 { my $x = 0; " " ~~ /(<?{ $x++; rand()

0.5 }> x?)+/; say $x; }
[16​:39] <p6eval> rakudo a54677​: OUTPUT«===SORRY!===â�¤Unsupported use of
rand(); in Perl 6 please use rand at line 11, near "() > 0.5 }"�»
[16​:40] <jnthn> rakudo​: for 1..5 { my $x = 0; " " ~~ /(<?{ $x++; rand >
0.5 }> x?)+/; say $x; }
[16​:40] <p6eval> rakudo a54677​: OUTPUT«2â�¤2â�¤2â�¤5â�¤4â�¤Â»
[16​:40] <jnthn> Well, we'd break that. ;-)
[16​:40] <mathw> oh boo
[16​:40] <mathw> I'm conflicted
[16​:40] <mathw> on the one hand is the side of me which says "people who
rely on that kind of trick can figure out another way"
[16​:41] <mathw> it also says "we shouldn't make it trivial to introduce
infinite loops into regexps via a common mistake"
[16​:41] <mathw> however the other side of me says "Perl gives you enough
rope to hang yourself"
[16​:41] <mathw> but I can imagine being in here for years answering
about why <ws>+ causes infinite loops
[16​:44] <bbkr> <ws>+ causing infinite loop is not DWIM-style, i think
most people understan it as "\s+". beside​: i cannot imagine where this
"bug" can be used for purpose

@p6rt
Copy link
Author

p6rt commented Jul 27, 2010

@p6rt
Copy link
Author

p6rt commented May 29, 2012

From @diakopter

On Wed Jun 09 07​:49​:39 2010, bbkr wrote​:

another part of discussion

[16​:36] <jnthn> mathw​: I'm trying to work out why it might not be so
simple as "check if we advanced by any characters"
[16​:37] <mathw> jnthn​: it'd be nice if it is that simple

14​:21 < diakopter> (that's what I did in the javascript peg for MGrammar -
check if it advanced; seemed to work great)

@p6rt
Copy link
Author

p6rt commented May 29, 2012

The RT System itself - Status changed from 'new' to 'open'

@p6rt
Copy link
Author

p6rt commented Mar 27, 2015

From @drforr

OS​: Ubuntu 14.04 LTS on VirtualBox
Host​: Windows 8, dual Core i5

Rakudo version​: current as of 3/25/2015

This edge case invokes the OOM killer on my test machine. It requires at
least one level of nesting, and is probably analogous to the exponential
backtracking in the '(a(a(a(...)*)*)*)' regular expression, although
that expression will terminate. I don't think this one does :) It is a
fairly extreme edge case, but if I did this by accident I'm sure someone
else will. It also feels like something not resetting pos() after
backtracking, but I don't claim to know the new regex's internals.

The code is here​:
--cut here--
grammar Bug {
  token blank { \s* }
  token TOP { <blank>* }
}
Bug.parse('');
--cut here--

@p6rt
Copy link
Author

p6rt commented Sep 20, 2016

From tomentiruran@gmail.com

This is a simple two-regex grammar. It is as if the �?� modifier is set on the wildcards, and matching nothing doesn�t stop the parser.
--
Andrew Buchanan
tomentiruran@​gmail.com <mailto​:tomentiruran@​gmail.com>

@p6rt
Copy link
Author

p6rt commented Sep 20, 2016

From tomentiruran@gmail.com

use v6;

# This Grammar goes into an infinite loop
grammar G {
  regex Body { <Text>+ }
  # With this, only one character is chosen at a time, except when there is a \v
  regex Text { \V*​: % \v+ }
  # With this, the input is found, and then it goes into an infinite loop
  #regex Text { \V* }
  # This works
  # regex Text { \V+ }
}
class A {
  method FALLBACK($fn, $match) {
  say "<$fn> matched '$match'";
  }
}

my $g = G.new;
my $actions = A.new;
my $rule = 'Body';
say "Calling parse";
$g.parse("Line 1", :$rule, :$actions);

=finish
moar​::dcbrule=@​​:
moar​::ar=ar
moar​::mastdir=/Users/andrewbuchanan/.rakudobrew/moar-nom/install/share/nqp/lib/MAST
moar​::tomrule=$(AR) $(ARFLAGS) $@​ 3rdparty/libtommath/*.o
moar​::shaobjects=3rdparty/sha1/sha1.o
moar​::ccshared=
moar​::be=0
moar​::dynasmlua=./3rdparty/dynasm/dynasm.lua
moar​::ld=clang
moar​::mtlib=3rdparty/tinymt/libtinymt.a
moar​::mtobjects=3rdparty/tinymt/tinymt64.o
moar​::noreturnspecifier=
moar​::perl=perl
moar​::shaclean=$(RM) 3rdparty/sha1/libsha1.a 3rdparty/sha1/*.o
moar​::static_inline=static __inline__
moar​::ccdebugflags=-g3
moar​::auxclean=@​​:
moar​::thirdpartylibs=3rdparty/dyncall/dyncall/libdyncall_s.a 3rdparty/dyncall/dyncallback/libdyncallback_s.a 3rdparty/dyncall/dynload/libdynload_s.a 3rdparty/libatomic_ops/src/libatomic_ops.a 3rdparty/tinymt/libtinymt.a 3rdparty/sha1/libsha1.a 3rdparty/libtommath/libtommath.a 3rdparty/libuv/libuv.a
moar​::config=--optimize --prefix=/Users/andrewbuchanan/.rakudobrew/moar-nom/install --make-install
moar​::ccinc=-I
moar​::booltype=_Bool
moar​::ldout=-o
moar​::moarlib=libmoar.a
moar​::mtrule=$(AR) $(ARFLAGS) $@​ 3rdparty/tinymt/*.o
moar​::dcrule=cd 3rdparty/dyncall && ./configure && CC='$(CC)' CFLAGS='$(CFLAGS)' $(MAKE) -f Makefile
moar​::dcbobjects=
moar​::cflags=-fno-omit-frame-pointer -fno-optimize-sibling-calls -O3 -DNDEBUG -Wno-logical-op-parentheses -D_DARWIN_USE_64_BIT_INODE=1
moar​::laolib=3rdparty/libatomic_ops/src/libatomic_ops.a
moar​::libdir=/Users/andrewbuchanan/.rakudobrew/moar-nom/install/lib
moar​::nul=/dev/null
moar​::osvers=15.0
moar​::translate_newline_output=0
moar​::dlrule=@​​:
moar​::cat=cat
moar​::can_unaligned_int32=1
moar​::ldimp=
moar​::pkgconfig=/usr/bin/pkg-config
moar​::ccdef=-D
moar​::lib=lib%s.a
moar​::tomclean=$(RM) 3rdparty/libtommath/libtommath.a 3rdparty/libtommath/*.o
moar​::dlllocal=__attribute__ ((visibility ("hidden")))
moar​::dcbclean=$(RM) 3rdparty/dyncall/dyncallback/libdyncallback_s.a
moar​::noreturnattribute=__attribute__((noreturn))
moar​::ptr_size=8
moar​::dllimport=__attribute__ ((visibility ("default")))
moar​::ccout=-o
moar​::can_unaligned_num64=1
moar​::has_pthread_yield=0
moar​::ldrpath=-Wl,-rpath,"//Users/andrewbuchanan/.rakudobrew/moar-nom/install/lib" -Wl,-rpath,"/Users/andrewbuchanan/.rakudobrew/moar-nom/install/share/perl6/site/lib"
moar​::ldusr=-l%s
moar​::mainflags=-DMVM_SHARED
moar​::name=moar
moar​::dll=lib%s.dylib
moar​::cincludes= -I3rdparty/libuv/include -I3rdparty/libuv/src -I3rdparty/libatomic_ops/src -I3rdparty/libtommath -I3rdparty/dynasm -I3rdparty/dyncall/dynload -I3rdparty/dyncall/dyncall -I3rdparty/dyncall/dyncallback
moar​::uvrule=$(AR) $(ARFLAGS) $@​ $(UV_DARWIN)
moar​::tomlib=3rdparty/libtommath/libtommath.a
moar​::staticlib=
moar​::dlclean=$(RM) 3rdparty/dyncall/dynload/libdynload_s.a
moar​::make=make
moar​::ccinstflags=-fsanitize=address
moar​::ldflags= -O3 -DNDEBUG -Wl,-rpath,"//Users/andrewbuchanan/.rakudobrew/moar-nom/install/lib" -Wl,-rpath,"/Users/andrewbuchanan/.rakudobrew/moar-nom/install/share/perl6/site/lib"
moar​::crossconf=
moar​::obj=.o
moar​::usrlibs[0]=pthread
moar​::dlobjects=
moar​::defs[0]=_DARWIN_USE_64_BIT_INODE=1
moar​::cppout=>
moar​::ccoptiflags=-O3 -DNDEBUG
moar​::asmswitch=-S
moar​::ldoptiflags=-O3 -DNDEBUG
moar​::mkflags=
moar​::moar=libmoar.dylib
moar​::version=2016.09
moar​::dcclean=cd 3rdparty/dyncall && $(MAKE) -f Makefile clean
moar​::moarshared=-install_name "/Users/andrewbuchanan/.rakudobrew/moar-nom/install/lib/libmoar.dylib"
moar​::sharule=$(AR) $(ARFLAGS) $@​ 3rdparty/sha1/*.o
moar​::ccwarnflags=-Wno-logical-op-parentheses
moar​::jit=$(JIT_POSIX_X64)
moar​::moardll=libmoar.dylib
moar​::asm=.s
moar​::mknoisy=ifneq ($(NOISY), 1)MSG = @​echoCMD = @​NOOUT = > /dev/nullNOERR = 2> /dev/nullendif
moar​::dclib=3rdparty/dyncall/dyncall/libdyncall_s.a
moar​::ccdefflags=-D_DARWIN_USE_64_BIT_INODE=1
moar​::cc=clang
moar​::exe=
moar​::lddir=-L
moar​::uvlib=3rdparty/libuv/libuv.a
moar​::dllib=3rdparty/dyncall/dynload/libdynload_s.a
moar​::shaincludedir=3rdparty/sha1
moar​::versionmajor=2016
moar​::cppswitch=-E
moar​::can_unaligned_int64=1
moar​::lua=./3rdparty/dynasm/minilua
moar​::dcobjects=
moar​::cancgoto=1
moar​::ldlibs=-lpthread
moar​::mainlibs=-L. -lmoar
moar​::nativecall_backend=dyncall
moar​::objflags=-DMVM_BUILD_SHARED
moar​::versionpatch=0
moar​::laoclean=cd 3rdparty/libatomic_ops/src && $(MAKE) distclean
moar​::versionminor=09
moar​::ccmiscflags=-fno-omit-frame-pointer -fno-optimize-sibling-calls
moar​::install= $(MKPATH) $(DESTDIR)$(PREFIX)/include/libuv $(CP) 3rdparty/libuv/include/*.h $(DESTDIR)$(PREFIX)/include/libuv $(MKPATH) $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/armcc $(MKPATH) $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/gcc $(MKPATH) $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/hpc $(MKPATH) $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/ibmc $(MKPATH) $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/icc $(MKPATH) $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/loadstore $(MKPATH) $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/msftc $(MKPATH) $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/sunc $(CP) 3rdparty/libatomic_ops/src/*.h $(DESTDIR)$(PREFIX)/include/libatomic_ops $(CP) 3rdparty/libatomic_ops/src/atomic_ops/*.h $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops $(CP) 3rdparty/libatomic_ops/src/atomic_ops/sysdeps/*.h $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps $(CP) 3rdparty/libatomic_ops/src/atomic_ops/sysdeps/armcc/*.h $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/armcc $(CP) 3rdparty/libatomic_ops/src/atomic_ops/sysdeps/gcc/*.h $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/gcc $(CP) 3rdparty/libatomic_ops/src/atomic_ops/sysdeps/hpc/*.h $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/hpc $(CP) 3rdparty/libatomic_ops/src/atomic_ops/sysdeps/ibmc/*.h $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/ibmc $(CP) 3rdparty/libatomic_ops/src/atomic_ops/sysdeps/icc/*.h $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/icc $(CP) 3rdparty/libatomic_ops/src/atomic_ops/sysdeps/loadstore/*.h $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/loadstore $(CP) 3rdparty/libatomic_ops/src/atomic_ops/sysdeps/msftc/*.h $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/msftc $(CP) 3rdparty/libatomic_ops/src/atomic_ops/sysdeps/sunc/*.h $(DESTDIR)$(PREFIX)/include/libatomic_ops/atomic_ops/sysdeps/sunc $(MKPATH) $(DESTDIR)$(PREFIX)/include/libtommath $(CP) 3rdparty/libtommath/*.h $(DESTDIR)$(PREFIX)/include/libtommath $(MKPATH) $(DESTDIR)$(PREFIX)/include/dyncall $(CP) 3rdparty/dyncall/dynload/*.h $(DESTDIR)$(PREFIX)/include/dyncall $(CP) 3rdparty/dyncall/dyncall/*.h $(DESTDIR)$(PREFIX)/include/dyncall $(CP) 3rdparty/dyncall/dyncallback/*.h $(DESTDIR)$(PREFIX)/include/dyncall
moar​::laoobjects=
moar​::mtclean=$(RM) 3rdparty/tinymt/libtinymt.a 3rdparty/tinymt/*.o
moar​::rm=rm -f
moar​::uvclean=$(RM) 3rdparty/libuv/libuv.a $(UV_DARWIN)
moar​::sharedlib=libmoar.dylib
moar​::dllexport=__attribute__ ((visibility ("default")))
moar​::shalib=3rdparty/sha1/libsha1.a
moar​::arout=
moar​::lddebugflags=-g3
moar​::osname=darwin
moar​::prefix=/Users/andrewbuchanan/.rakudobrew/moar-nom/install
moar​::uvobjects=$(UV_DARWIN)
moar​::dcblib=3rdparty/dyncall/dyncallback/libdyncallback_s.a
moar​::bindir=/Users/andrewbuchanan/.rakudobrew/moar-nom/install/bin
moar​::asmout=-o
moar​::havebooltype=1
moar​::laorule=cd 3rdparty/libatomic_ops && CC='$(CC)' CFLAGS='$(CFLAGS)' ./configure && cd src && $(MAKE) && cd ..
moar​::os=darwin
moar​::sh=/bin/sh
moar​::arflags=rcs
moar​::ldshared=-dynamiclib
moar​::impinst=libmoar.dylib
moar​::ldinstflags=-fsanitize=address
moar​::ldmiscflags=
moar​::ldsys=-l%s
moar​::platform=$(PLATFORM_POSIX)
moar​::ccswitch=-c
moar​::formatattribute=__attribute__((format(X, Y, Z)))
moar​::tomobjects=3rdparty/libtommath/bn_error.o 3rdparty/libtommath/bn_fast_mp_invmod.o 3rdparty/libtommath/bn_fast_mp_montgomery_reduce.o 3rdparty/libtommath/bn_fast_s_mp_mul_digs.o 3rdparty/libtommath/bn_fast_s_mp_mul_high_digs.o 3rdparty/libtommath/bn_fast_s_mp_sqr.o 3rdparty/libtommath/bn_mp_2expt.o 3rdparty/libtommath/bn_mp_abs.o 3rdparty/libtommath/bn_mp_add.o 3rdparty/libtommath/bn_mp_add_d.o 3rdparty/libtommath/bn_mp_addmod.o 3rdparty/libtommath/bn_mp_and.o 3rdparty/libtommath/bn_mp_clamp.o 3rdparty/libtommath/bn_mp_clear.o 3rdparty/libtommath/bn_mp_clear_multi.o 3rdparty/libtommath/bn_mp_cmp.o 3rdparty/libtommath/bn_mp_cmp_d.o 3rdparty/libtommath/bn_mp_cmp_mag.o 3rdparty/libtommath/bn_mp_cnt_lsb.o 3rdparty/libtommath/bn_mp_copy.o 3rdparty/libtommath/bn_mp_count_bits.o 3rdparty/libtommath/bn_mp_div.o 3rdparty/libtommath/bn_mp_div_2.o 3rdparty/libtommath/bn_mp_div_2d.o 3rdparty/libtommath/bn_mp_div_3.o 3rdparty/libtommath/bn_mp_div_d.o 3rdparty/libtommath/bn_mp_dr_is_modulus.o 3rdparty/libtommath/bn_mp_dr_reduce.o 3rdparty/libtommath/bn_mp_dr_setup.o 3rdparty/libtommath/bn_mp_exch.o 3rdparty/libtommath/bn_mp_expt_d.o 3rdparty/libtommath/bn_mp_exptmod.o 3rdparty/libtommath/bn_mp_exptmod_fast.o 3rdparty/libtommath/bn_mp_exteuclid.o 3rdparty/libtommath/bn_mp_fread.o 3rdparty/libtommath/bn_mp_fwrite.o 3rdparty/libtommath/bn_mp_gcd.o 3rdparty/libtommath/bn_mp_get_int.o 3rdparty/libtommath/bn_mp_get_long.o 3rdparty/libtommath/bn_mp_grow.o 3rdparty/libtommath/bn_mp_init.o 3rdparty/libtommath/bn_mp_init_copy.o 3rdparty/libtommath/bn_mp_init_multi.o 3rdparty/libtommath/bn_mp_init_set.o 3rdparty/libtommath/bn_mp_init_set_int.o 3rdparty/libtommath/bn_mp_init_size.o 3rdparty/libtommath/bn_mp_invmod.o 3rdparty/libtommath/bn_mp_invmod_slow.o 3rdparty/libtommath/bn_mp_is_square.o 3rdparty/libtommath/bn_mp_jacobi.o 3rdparty/libtommath/bn_mp_karatsuba_mul.o 3rdparty/libtommath/bn_mp_karatsuba_sqr.o 3rdparty/libtommath/bn_mp_lcm.o 3rdparty/libtommath/bn_mp_lshd.o 3rdparty/libtommath/bn_mp_mod.o 3rdparty/libtommath/bn_mp_mod_2d.o 3rdparty/libtommath/bn_mp_mod_d.o 3rdparty/libtommath/bn_mp_montgomery_calc_normalization.o 3rdparty/libtommath/bn_mp_montgomery_reduce.o 3rdparty/libtommath/bn_mp_montgomery_setup.o 3rdparty/libtommath/bn_mp_mul.o 3rdparty/libtommath/bn_mp_mul_2.o 3rdparty/libtommath/bn_mp_mul_2d.o 3rdparty/libtommath/bn_mp_mul_d.o 3rdparty/libtommath/bn_mp_mulmod.o 3rdparty/libtommath/bn_mp_n_root.o 3rdparty/libtommath/bn_mp_neg.o 3rdparty/libtommath/bn_mp_or.o 3rdparty/libtommath/bn_mp_prime_fermat.o 3rdparty/libtommath/bn_mp_prime_is_divisible.o 3rdparty/libtommath/bn_mp_prime_is_prime.o 3rdparty/libtommath/bn_mp_prime_miller_rabin.o 3rdparty/libtommath/bn_mp_prime_next_prime.o 3rdparty/libtommath/bn_mp_prime_rabin_miller_trials.o 3rdparty/libtommath/bn_mp_prime_random_ex.o 3rdparty/libtommath/bn_mp_radix_size.o 3rdparty/libtommath/bn_mp_radix_smap.o 3rdparty/libtommath/bn_mp_rand.o 3rdparty/libtommath/bn_mp_read_radix.o 3rdparty/libtommath/bn_mp_read_signed_bin.o 3rdparty/libtommath/bn_mp_read_unsigned_bin.o 3rdparty/libtommath/bn_mp_reduce.o 3rdparty/libtommath/bn_mp_reduce_2k.o 3rdparty/libtommath/bn_mp_reduce_2k_l.o 3rdparty/libtommath/bn_mp_reduce_2k_setup.o 3rdparty/libtommath/bn_mp_reduce_2k_setup_l.o 3rdparty/libtommath/bn_mp_reduce_is_2k.o 3rdparty/libtommath/bn_mp_reduce_is_2k_l.o 3rdparty/libtommath/bn_mp_reduce_setup.o 3rdparty/libtommath/bn_mp_rshd.o 3rdparty/libtommath/bn_mp_set.o 3rdparty/libtommath/bn_mp_set_int.o 3rdparty/libtommath/bn_mp_set_long.o 3rdparty/libtommath/bn_mp_shrink.o 3rdparty/libtommath/bn_mp_signed_bin_size.o 3rdparty/libtommath/bn_mp_sqr.o 3rdparty/libtommath/bn_mp_sqrmod.o 3rdparty/libtommath/bn_mp_sqrt.o 3rdparty/libtommath/bn_mp_sub.o 3rdparty/libtommath/bn_mp_sub_d.o 3rdparty/libtommath/bn_mp_submod.o 3rdparty/libtommath/bn_mp_to_signed_bin.o 3rdparty/libtommath/bn_mp_to_signed_bin_n.o 3rdparty/libtommath/bn_mp_to_unsigned_bin.o 3rdparty/libtommath/bn_mp_to_unsigned_bin_n.o 3rdparty/libtommath/bn_mp_toom_mul.o 3rdparty/libtommath/bn_mp_toom_sqr.o 3rdparty/libtommath/bn_mp_toradix.o 3rdparty/libtommath/bn_mp_toradix_n.o 3rdparty/libtommath/bn_mp_unsigned_bin_size.o 3rdparty/libtommath/bn_mp_xor.o 3rdparty/libtommath/bn_mp_zero.o 3rdparty/libtommath/bn_prime_tab.o 3rdparty/libtommath/bn_reverse.o 3rdparty/libtommath/bn_s_mp_add.o 3rdparty/libtommath/bn_s_mp_exptmod.o 3rdparty/libtommath/bn_s_mp_mul_digs.o 3rdparty/libtommath/bn_s_mp_mul_high_digs.o 3rdparty/libtommath/bn_s_mp_sqr.o 3rdparty/libtommath/bn_s_mp_sub.o 3rdparty/libtommath/bncore.o
perl6​::language_version=6.c
perl6​::codename=
perl6​::release-number=
perl6​::build-date=2016-09-19T13​:43​:11Z
perl6​::version=2016.09-15-g4b1864b
perl6​::implementation=Rakudo

@p6rt
Copy link
Author

p6rt commented Sep 20, 2016

From @zoffixznet

I'm not seeing the bug here, to be honest.

The `Body` is asking for one or more tokens `Text`, *nothing* is a valid match for those tokens, so after matching the provided text, your grammar continues to match nothing infinite number of times.

@p6rt
Copy link
Author

p6rt commented Sep 20, 2016

The RT System itself - Status changed from 'new' to 'open'

@p6rt
Copy link
Author

p6rt commented Sep 20, 2016

From @zoffixznet

Here's a much shorter way to reproduce it​:

  perl6 -e '"foo" ~~ /(.*)+/' # hangs

While my previous explanation for why this occurs makes sense, it's worth noting this behaviour is not observed in Perl 5, for example​:

  perl -e '"foo" =~ /(.*)+/' # does not hang

@p6rt
Copy link
Author

p6rt commented Sep 20, 2016

From @coke

On Tue Sep 20 06​:06​:59 2016, cpan@​zoffix.com wrote​:

Here's a much shorter way to reproduce it​:

perl6 -e '"foo" ~~ /(.*)+/' # hangs

While my previous explanation for why this occurs makes sense, it's
worth noting this behaviour is not observed in Perl 5, for example​:

perl -e '"foo" =~ /(.*)+/' # does not hang

This sounds like a dupe of #​75586

--
Will "Coke" Coleda

@p6rt
Copy link
Author

p6rt commented Sep 20, 2016

From tomentiruran@gmail.com

Is this also technically correct, even though it clearly shouldn't match?

perl6 -e '"foo" ~~ /(.*)+\​:/' # hangs

In either case, going into an infinite loop is not exactly DWIM.

On 20 Sep 2016, at 9​:12 PM, Will Coleda via RT <perl6-bugs-followup@​perl.org> wrote​:

On Tue Sep 20 06​:06​:59 2016, cpan@​zoffix.com wrote​:

Here's a much shorter way to reproduce it​:

perl6 -e '"foo" ~~ /(.*)+/' # hangs

While my previous explanation for why this occurs makes sense, it's
worth noting this behaviour is not observed in Perl 5, for example​:

perl -e '"foo" =~ /(.*)+/' # does not hang

This sounds like a dupe of #​75586

--
Will "Coke" Coleda

@p6rt
Copy link
Author

p6rt commented Sep 21, 2016

From @smls

Is this also technically correct, even though it clearly shouldn't
match?

perl6 -e '"foo" ~~ /(.*)+\​:/' # hangs

Looks right to me. You're asking for "Capture zero or more characters, as often as possible". So you get​:

1) three characters
2) zero characters
3) zero characters
4) zero characters
...and so on.

In either case, going into an infinite loop is not exactly DWIM.

Regexes are code, and when you write an infinite loop, it hangs - just like when you write an infinite loop in normal Perl 6 code.

Perl 5 has a special fallback mechanism to force-advance the regex cursor by one character when it sees that it has not moved for two iterations of the same quantifier, and that makes some things "DWIM", but also causes its fair share of confusion.

In Perl 6, where regex code and normal code can be intermixed easily and safely, Perl 5's special rule could mess up people's expectations for the execution order of embedded code blocks, and it's not even that crazy to imagine situations where you *want* an alternation to keep looping on the spot (e.g. until an embedded code block sets some condition to move one).

Not to mention that Perl 6 doubled down on the whole "regexes are code, not magic strings" principle.

So I'm not surprised that infinite loops behave exactly as they would in normal code, and the auto-advance mechanism from Perl 5 was not ported over.

@p6rt
Copy link
Author

p6rt commented Sep 21, 2016

From @smls

For the record the problems the auto-advance feature causes even in Perl 5 (where embedded code blocks are experimental and rarely used), are twofold​:

1) Writing an infinite loop can be indicative of a programmer mistake, and the auto-advance feature hides it by making the regex "do something" (which may or may not be what the programmer intended) instead of hanging (which would have caused the programmer to re-write the regex).

2) It can cause mysterious-seeming double captures. When there is a capture group in a quantified zero-width assertion, it will capture the same thing twice, because auto-advance kicks in when the cursor has not advanced after *two* iterations. Now, Perl 5 lets repeated captures of the same capture group overwrite each other, so this does not normally show up other than as a performance degradation (whereas in Perl 6 it would cause the same sub-matches to appear twice in $/). But it can show up in even Perl 5 with `while(m//g) { }` global matching. I remember a discussion on perlmonks a few years ago, where even experienced Perl hackers were tearing their hair out trying to figure out why such a while loop iterated over the same match twice.

@p6rt
Copy link
Author

p6rt commented Sep 22, 2016

From tomentiruran@gmail.com

Ok, that clarifies things. Now that I understand what is happening, it is straightforward to recognise and fix the problem. A sentence in the documentation might help other perl 5 transitioners from getting bitten, perhaps at the explanation of the * quantifier.

On 21 Sep 2016, at 6​:35 PM, Sam S. via RT <perl6-bugs-followup@​perl.org> wrote​:

For the record the problems the auto-advance feature causes even in Perl 5 (where embedded code blocks are experimental and rarely used), are twofold​:

1) Writing an infinite loop can be indicative of a programmer mistake, and the auto-advance feature hides it by making the regex "do something" (which may or may not be what the programmer intended) instead of hanging (which would have caused the programmer to re-write the regex).

2) It can cause mysterious-seeming double captures. When there is a capture group in a quantified zero-width assertion, it will capture the same thing twice, because auto-advance kicks in when the cursor has not advanced after *two* iterations. Now, Perl 5 lets repeated captures of the same capture group overwrite each other, so this does not normally show up other than as a performance degradation (whereas in Perl 6 it would cause the same sub-matches to appear twice in $/). But it can show up in even Perl 5 with `while(m//g) { }` global matching. I remember a discussion on perlmonks a few years ago, where even experienced Perl hackers were tearing their hair out trying to figure out why such a while loop iterated over the same match twice.

@p6rt
Copy link
Author

p6rt commented Oct 2, 2017

From @AlexDaniel

This reminds me of https://rt-archive.perl.org/perl6/Ticket/Display.html?id=132004

On 2015-03-27 15​:01​:38, drforr@​pobox.com wrote​:

OS​: Ubuntu 14.04 LTS on VirtualBox
Host​: Windows 8, dual Core i5

Rakudo version​: current as of 3/25/2015

This edge case invokes the OOM killer on my test machine. It requires at
least one level of nesting, and is probably analogous to the exponential
backtracking in the '(a(a(a(...)*)*)*)' regular expression, although
that expression will terminate. I don't think this one does :) It is a
fairly extreme edge case, but if I did this by accident I'm sure someone
else will. It also feels like something not resetting pos() after
backtracking, but I don't claim to know the new regex's internals.

The code is here​:
--cut here--
grammar Bug {
token blank { \s* }
token TOP { <blank>* }
}
Bug.parse('');
--cut here--

@p6rt
Copy link
Author

p6rt commented Oct 2, 2017

The RT System itself - Status changed from 'new' to 'open'

@p6rt p6rt added LTA Less Than Awesome; typically an error message that could be better regex Regular expressions, pattern matching, user-defined grammars, tokens and rules labels Jan 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LTA Less Than Awesome; typically an error message that could be better regex Regular expressions, pattern matching, user-defined grammars, tokens and rules
Projects
None yet
Development

No branches or pull requests

1 participant