(^|something) to left of \G breaks zero-length-match rule, causing looping #11640

Open
p5pRT opened this issue Sep 8, 2011 · 9 comments

p5pRT commented Sep 8, 2011

Migrated from rt.perl.org#98716 (status was 'open')

Searchable as RT98716$


p5pRT commented Sep 8, 2011

From @jimav

This is a bug report for perl from james_avera@yahoo.com,
generated with the help of perlbug 1.39 running under perl 5.10.1.


A compound expression like (^|.) before \G seems to break Perl's
zero-length-match mechanism, allowing the RE to match forever
without advancing.

As stated in the perlre man page, 'contents to the left of "\G" is not
counted when determining the length of the match'. The example in
the man page is:

  $_ = 'ABC';
  pos($_) = 1;
  while (/.\G/g) {
      print $&;
  }

which prints 'A' and then terminates because it considers the match
to be zero-width, and thus will not match at the same position twice in
a row.

However, if the "." is replaced by "(.|^)" then the example loops forever.

In summary:

  /(.)\G/g fails on the second try (won't match zero-width twice)
  /(^)\G/g fails on the first try (because pos==1)
  /(^|.)\G/g succeeds forever (this may be a bug)
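
The three cases in the summary can be compared side by side with a small guarded script. This is only a sketch: `count_matches` and the iteration limit of 10 are hypothetical names and values introduced here so that a non-advancing pattern cannot hang the script. On the perls discussed in this ticket, the third pattern would be expected to hit the limit.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the report): run a /g match loop on
# 'ABC' starting at pos 1, but stop after $limit iterations so that a
# pattern which never advances cannot loop forever.
sub count_matches {
    my ($re, $limit) = @_;
    my $str = 'ABC';
    pos($str) = 1;
    my $n = 0;
    while ($str =~ /$re/g) {
        last if ++$n >= $limit;
    }
    return $n;    # a result equal to $limit means the loop was cut short
}

my $limit = 10;
for my $src ('(.)\G', '(^)\G', '(^|.)\G') {
    my $n = count_matches(qr/$src/, $limit);
    printf "%-10s %s\n", $src,
        $n >= $limit ? "stopped after $limit matches (looping)"
                     : "$n match(es)";
}
```

A count equal to the limit is the signal that the zero-length-match protection did not kick in for that pattern.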



Flags:
  category=core
  severity=medium


Site configuration information for perl 5.10.1:

Configured by Debian Project at Fri Apr 22 19:20:15 UTC 2011.

Summary of my perl5 (revision 5 version 10 subversion 1) configuration:

  Platform:
    osname=linux, osvers=2.6.24-28-server, archname=x86_64-linux-gnu-thread-multi
    uname='linux allspice 2.6.24-28-server #1 smp wed aug 18 21:17:51 utc 2010 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1 -Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.1 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.4.5', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.12.1.so, so=so, useshrplib=true, libperl=libperl.so.5.10.1
    gnulibc_version='2.12.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'

Locally applied patches:
  DEBPKG:debian/arm_thread_stress_timeout - http://bugs.debian.org/501970 Raise the timeout of ext/threads/shared/t/stress.t to accommodate slower build hosts
  DEBPKG:debian/cpan_config_path - Set location of CPAN::Config to /etc/perl as /usr may not be writable.
  DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
  DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
  DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
  DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
  DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
  DEBPKG:debian/extutils_hacks - Various debian-specific ExtUtils changes
  DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
  DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
  DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
  DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
  DEBPKG:debian/m68k_thread_stress - http://bugs.debian.org/495826 Disable some threads tests on m68k for now due to missing TLS.
  DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
  DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
  DEBPKG:debian/perl_synopsis - http://bugs.debian.org/278323 Rearrange perl.pod
  DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
  DEBPKG:debian/use_gdbm - Explicitly link against -lgdbm_compat in ODBM_File/NDBM_File.
  DEBPKG:fixes/assorted_docs - http://bugs.debian.org/443733 [384f06a] Math::BigInt::CalcEmu documentation grammar fix
  DEBPKG:fixes/net_smtp_docs - http://bugs.debian.org/100195 [rt.cpan.org #36038] Document the Net::SMTP 'Port' option
  DEBPKG:fixes/processPL - http://bugs.debian.org/357264 [rt.cpan.org #17224] Always use PERLRUNINST when building perl modules.
  DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
  DEBPKG:fixes/pod2man-index-backslash - http://bugs.debian.org/521256 Escape backslashes in .IX entries
  DEBPKG:debian/disable-zlib-bundling - Disable zlib bundling in Compress::Raw::Zlib
  DEBPKG:fixes/kfreebsd_cppsymbols - http://bugs.debian.org/533098 [3b910a0] Add gcc predefined macros to $Config{cppsymbols} on GNU/kFreeBSD.
  DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default.
  DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl.
  DEBPKG:fixes/kfreebsd-filecopy-pipes - http://bugs.debian.org/537555 [16f708c] Fix File::Copy::copy with pipes on GNU/kFreeBSD
  DEBPKG:fixes/anon-tmpfile-dir - http://bugs.debian.org/528544 [perl #66452] Honor TMPDIR when open()ing an anonymous temporary file
  DEBPKG:fixes/abstract-sockets - http://bugs.debian.org/329291 [89904c0] Add support for Abstract namespace sockets.
  DEBPKG:fixes/hurd_cppsymbols - http://bugs.debian.org/544307 [eeb92b7] Add gcc predefined macros to $Config{cppsymbols} on GNU/Hurd.
  DEBPKG:fixes/autodie-flock - http://bugs.debian.org/543731 Allow for flock returning EAGAIN instead of EWOULDBLOCK on linux/parisc
  DEBPKG:fixes/archive-tar-instance-error - http://bugs.debian.org/539355 [rt.cpan.org #48879] Separate Archive::Tar instance error strings from each other
  DEBPKG:fixes/positive-gpos - http://bugs.debian.org/545234 [perl #69056] [c584a96] Fix \\G crash on first match
  DEBPKG:debian/devel-ppport-ia64-optim - http://bugs.debian.org/548943 Work around an ICE on ia64
  DEBPKG:fixes/trie-logic-match - http://bugs.debian.org/552291 [perl #69973] [0abd0d7] Fix a DoS in Unicode processing [CVE-2009-3626]
  DEBPKG:fixes/hppa-thread-eagain - http://bugs.debian.org/554218 make the threads-shared test suite more robust, fixing failures on hppa
  DEBPKG:fixes/crash-on-undefined-destroy - http://bugs.debian.org/564074 [perl #71952] [1f15e67] Fix a NULL pointer dereference when looking for a DESTROY method
  DEBPKG:fixes/tainted-errno - http://bugs.debian.org/574129 [perl #61976] [be1cf43] fix an errno stringification bug in taint mode
  DEBPKG:patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.10.1-12 in patchlevel.h


@INC for perl 5.10.1:
  /home/jima/local/lib/perl5/x86_64-linux-gnu-thread-multi
  /home/jima/local/lib/perl5
  /etc/perl
  /usr/local/lib/perl/5.10.1
  /usr/local/share/perl/5.10.1
  /usr/lib/perl5
  /usr/share/perl5
  /usr/lib/perl/5.10
  /usr/share/perl/5.10
  /usr/local/lib/site_perl
  .


Environment for perl 5.10.1:
  HOME=/home/jima
  LANG=en_US.utf8
  LANGUAGE (unset)
  LD_LIBRARY_PATH=/home/jima/local/lib
  LOGDIR (unset)
  PATH=/home/jima/bin:/home/jima/local/bin:/home/jima/jima_tools/linux86_64/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/bin/X11:/usr/local/bin:/usr/local/sbin:/usr/games:.
  PERL5LIB=/home/jima/local/lib/perl5
  PERL_BADLANG (unset)
  SHELL=/bin/bash


p5pRT commented Jan 1, 2017

From @jkeenan

On Thu, 08 Sep 2011 16:57:01 GMT, jimav wrote:

> This is a bug report for perl from james_avera@yahoo.com,
> generated with the help of perlbug 1.39 running under perl 5.10.1.
>
> -----------------------------------------------------------------
> A compound expression like (^|.) before \G seems to break Perl's
> zero-length-match mechanism, allowing the RE to match forever
> without advancing.
>
> As stated in the perlre man page, 'contents to the left of "\G" is not
> counted when determining the length of the match'. The example in
> the man page is:
>
>   $_ = 'ABC';
>   pos($_) = 1;
>   while (/.\G/g) {
>       print $&;
>   }
>
> which prints 'A' and then terminates because it considers the match
> to be zero-width, and thus will not match at the same position twice
> in a row.
>
> However, if the "." is replaced by "(.|^)" then the example loops
> forever.
>
> In summary:
>
>   /(.)\G/g fails on the second try (won't match zero-width twice)
>   /(^)\G/g fails on the first try (because pos==1)
>   /(^|.)\G/g succeeds forever (this may be a bug)

Still present in 5.24.0.

--
James E Keenan (jkeenan@cpan.org)


p5pRT commented Jan 1, 2017

The RT System itself - Status changed from 'new' to 'open'

@khwilliamson
Contributor

The example in perlre now loops forever

@khwilliamson
Contributor

Thus this is a 5.36 blocker


demerphq commented Apr 9, 2022

I can take a look tomorrow maybe.


demerphq commented Apr 9, 2022

Did I do this wrong?

$ cat t.pl
$_ = 'ABC';
pos($_) = 1;
while (/.\G/g) {
    print $&, "\n";
}
$ ./perl t.pl
A

That isn't the right output, but it doesn't loop forever.


demerphq commented Apr 9, 2022

Oh wait, that is the right output, but the (.|^) case still loops forever.

@demerphq
Collaborator

@khwilliamson when I execute the example from perlre.pod at line 1119 (modified to add a newline when it prints) I do not see an infinite loop:

$ cat ~/scratch/t_11640.pl
my $string = 'ABC';
pos($string) = 1;
while ($string =~ /(.\G)/g) {
    print $1,"\n";
}

$ ./perl -Ilib ~/scratch/t_11640.pl
A

Also, the docs in perlop.pod at line 2131 say this:

Note also
that, currently, C<\G> is only properly supported when anchored at the
very beginning of the pattern.

so I do not think this ticket should be a blocker. Note most of the docs for /g and \G are actually in perlop.pod.
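
For contrast, here is a minimal sketch of the form that perlop describes as properly supported, with \G at the very beginning of each pattern. The input string and the token labels are made up for illustration; the `/gc` flags keep `pos` where it was when an alternative fails, so the next pattern can try at the same offset.

```perl
use strict;
use warnings;

# Illustrative input and token names; only the \G-at-start idiom matters.
my $input = 'abc 123 def';
my @tokens;

pos($input) = 0;
while (pos($input) < length $input) {
    if    ($input =~ /\G\s+/gc)      { next }                       # skip whitespace
    elsif ($input =~ /\G([a-z]+)/gc) { push @tokens, "WORD:$1" }
    elsif ($input =~ /\G(\d+)/gc)    { push @tokens, "NUM:$1" }
    else  { die "unexpected character at offset " . pos($input) }
}

print join(', ', @tokens), "\n";    # WORD:abc, NUM:123, WORD:def
```

Every successful match here consumes at least one character, so the loop always advances and the zero-length-match rule never comes into play.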

It is a bit weird that perlre.pod has an example that is specifically NOT supported by the docs in perlop.pod. I am not sure why so much of the documentation of the /g mode lives in perlop.pod and not in perlre.pod.

It seems to me that if we don't expect \G anywhere other than the start of a pattern to work properly, then maybe we should forbid it from being used at any other position and remove this example. (Probably not a good idea for 5.36.)

Also I did find the cause of this fwiw, regcomp.c line 6549:

        else if (OP(scan) == GPOS) {
            if (!(RExC_rx->intflags & PREGf_GPOS_FLOAT) &&
                !(delta || is_inf || (data && data->pos_delta)))
            {
                if (!(RExC_rx->intflags & PREGf_ANCH) && (flags & SCF_DO_SUBSTR))
                    RExC_rx->intflags |= PREGf_ANCH_GPOS;
                if (RExC_rx->gofs < (STRLEN)min)
                    RExC_rx->gofs = min;
            } else {
                RExC_rx->intflags |= PREGf_GPOS_FLOAT;
                RExC_rx->gofs = 0;
            }
        }

Basically it treats any variable-length prefix to \G as equivalent to an infinitely long prefix like \w+ or \w*. Arguably this is wrong, but I am also a bit doubtful we can get this fixed in time for 5.36. I have added looking into it to my todo list. If I change the clause that says !(delta || is_inf || (data && data->pos_delta)) to say !is_inf, and change the logic related to min to use delta+min instead, then the example of /(.|^)\G/g does not loop infinitely, but I have a feeling it will break other things.
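
The decision described above can be illustrated with a toy model. This is a sketch only: `classify_gpos`, its hash keys, and the min/delta values assigned to each prefix are assumptions made for illustration, and the model ignores the `data->pos_delta` term and the "only raise gofs" comparison in the real branch. It is not the regcomp.c implementation.

```perl
use strict;
use warnings;

# Toy model of the GPOS branch quoted above (hypothetical, simplified).
# min    = minimum length of the prefix to the left of \G
# delta  = how much longer than min the prefix may be
# is_inf = prefix is unbounded (modelled as a flag; delta is left at 0)
# proposed => 1 selects the tweak suggested in this comment.
sub classify_gpos {
    my (%p) = @_;
    my $fixed_prefix = $p{proposed}
        ? !$p{is_inf}
        : !($p{delta} || $p{is_inf});
    return { kind => 'float', gofs => 0 } unless $fixed_prefix;
    my $gofs = $p{proposed} ? $p{min} + $p{delta} : $p{min};
    return { kind => 'anchored', gofs => $gofs };
}

my @prefixes = (
    [ '.',     { min => 1, delta => 0, is_inf => 0 } ],
    [ '(.|^)', { min => 0, delta => 1, is_inf => 0 } ],
    [ '\w+',   { min => 1, delta => 0, is_inf => 1 } ],
);

for my $mode (0, 1) {
    print $mode ? "proposed:\n" : "current:\n";
    for my $entry (@prefixes) {
        my ($label, $args) = @$entry;
        my $r = classify_gpos(%$args, proposed => $mode);
        printf "  %-6s -> %s, gofs=%d\n", $label, $r->{kind}, $r->{gofs};
    }
}
```

In this model the `(.|^)` prefix moves from the floating case to the anchored case with an offset of 1 under the suggested change, which matches the description above. Whether that is safe for other patterns is exactly the open question.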
