Regex bug: Getting two consecutive global matches to match with zero length #5773

Closed
p5pRT opened this issue Jul 23, 2002 · 5 comments

p5pRT commented Jul 23, 2002

Migrated from rt.perl.org#15431 (status was 'resolved')

Searchable as RT15431$


p5pRT commented Jul 23, 2002

From luisivan.tubert-brohman@yale.edu

This is a bug report for perl from luisivan.tubert-brohman@yale.edu,
generated with the help of perlbug 1.26 running under perl 5.00503.

Regex bug: Getting two consecutive global matches to match with zero length

Global matches can match zero characters, leaving "pos" unchanged.
However, I have trouble getting two successful zero-character matches
in a row.

I have the following test program:

  $s = 'abcde';
  print "a?: ";
  print $s =~ /\G(a?)/g ? "Matched '$1'\n" : "Failed!\n";
  print "b?: ";
  print $s =~ /\G(b?)/g ? "Matched '$1'\n" : "Failed!\n";
  print "c?: ";
  print $s =~ /\G(c?)/g ? "Matched '$1'\n" : "Failed!\n";

which gives the following output, as expected:

  a?: Matched 'a'
  b?: Matched 'b'
  c?: Matched 'c'

Using $s = 'bcde' gives the expected result again:

  a?: Matched ''
  b?: Matched 'b'
  c?: Matched 'c'

Same for $s = 'acde':

  a?: Matched 'a'
  b?: Matched ''
  c?: Matched 'c'

But $s = 'cde' doesn't do what I expect:

  a?: Matched ''
  b?: Failed!
  c?: Matched 'c'

The same kind of problem happens with $s = 'ade':

  a?: Matched 'a'
  b?: Matched ''
  c?: Failed!

Running the $s = 'cde' case with "use re 'debug'" gives the following:
(...)
  Compiling REx `\G(b?)'
  size 10 first at 2
  1: GPOS(2)
  2: OPEN1(4)
  4: CURLY {0,1}(8)
  6: EXACT <b>(0)
  8: CLOSE1(10)
  10: END(0)
  anchored(GPOS) GPOS minlen 0
(...)
  Matching REx `\G(a?)' against `cde'
  Setting an EVAL scope, savestack=3
  0 <> <cde> | 1: GPOS
  0 <> <cde> | 2: OPEN1
  0 <> <cde> | 4: CURLY {0,1}
  EXACT <a> can match 0 times out of 1...
  Setting an EVAL scope, savestack=3
  0 <> <cde> | 8: CLOSE1
  0 <> <cde> | 10: END
  Match successful!
  Matching REx `\G(b?)' against `cde'
  Setting an EVAL scope, savestack=3
  0 <> <cde> | 1: GPOS
  0 <> <cde> | 2: OPEN1
  0 <> <cde> | 4: CURLY {0,1}
  EXACT <b> can match 0 times out of 1...
  Setting an EVAL scope, savestack=3
  0 <> <cde> | 8: CLOSE1
  0 <> <cde> | 10: END
  Match possible, but length=0 is smaller than requested=1, failing!
  failed...
  Match failed
(...)

The first regular expression matches zero characters as expected. The second
regular expression also matches zero characters, but the engine then complains
that the length is smaller than requested=1. I don't think anyone requested that
length, and "minlen" was zero when the regex was compiled!

The same thing happens with perl 5.0, 5.6, and 5.8.


Site configuration information for perl 5.00503:

Configured by root at Mon Aug 30 23:08:56 EDT 1999.

Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
  Platform:
  osname=linux, osvers=2.2.5-22smp, archname=i386-linux
  uname='linux porky.devel.redhat.com 2.2.5-22smp #1 smp wed jun 2 09:11:51 edt 1999 i686 unknown '
  hint=recommended, useposix=true, d_sigaction=define
  usethreads=undef useperlio=undef d_sfio=undef
  Compiler:
  cc='cc', optimize='-O2', gccversion=egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
  cppflags='-Dbool=char -DHAS_BOOL -I/usr/local/include'
  ccflags ='-Dbool=char -DHAS_BOOL -I/usr/local/include'
  stdchar='char', d_stdstdio=undef, usevfork=false
  intsize=4, longsize=4, ptrsize=4, doublesize=8
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
  ld='cc', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /lib /usr/lib
  libs=-lnsl -ldl -lm -lc -lposix -lcrypt
  libc=, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
  cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
 


@INC for perl 5.00503:
  /home/ivan/perl/i386-linux
  /home/ivan/perl
  /home/ivan/perl/lib/site_perl/5.005/i386-linux
  /home/ivan/perl/lib/site_perl/5.005
  /usr/lib/perl5/5.00503/i386-linux
  /usr/lib/perl5/5.00503
  /usr/lib/perl5/site_perl/5.005/i386-linux
  /usr/lib/perl5/site_perl/5.005
  .


Environment for perl 5.00503:
  HOME=/home/ivan
  LANG=en_US
  LANGUAGE (unset)
  LC_ALL=en_US
  LD_LIBRARY_PATH=/usr/local/g98/bsd:/usr/local/g98
  LOGDIR (unset)
  PATH=/usr/bin:/usr/bin:/bin:/home/ivan/bin:/usr/X11R6/bin:/usr/local/g98/bsd:/usr/local/g98:/home/ivan/bin
  PERL5LIB=/home/ivan/perl:/home/ivan/perl/lib/site_perl/5.005
  PERL_BADLANG (unset)
  SHELL=/bin/tcsh


p5pRT commented Jul 23, 2002

From @tamias

On Tue, Jul 23, 2002 at 11:10:03PM -0000, Ivan Tubert wrote:

> # New Ticket Created by Ivan Tubert
> # Please include the string: [perl #15431]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt2/Ticket/Display.html?id=15431 >
>
> This is a bug report for perl from luisivan.tubert-brohman@yale.edu,
> generated with the help of perlbug 1.26 running under perl 5.00503.
>
> Regex bug: Getting two consecutive global matches to match with zero length
>
> Global matches can match zero characters, leaving "pos" unchanged.
> However, I have trouble getting two successful zero-character matches
> in a row.

This is by design, and as documented in perlre. In order to prevent
infinite loops when matching zero-length expressions, the match after a
zero-length match is prohibited from also being a zero-length match.

For example, without this rule,

s/\w??/<$&>/g;

would match zero characters at the start of the string over and over.

This endless loop would not occur in your test program, since you only
apply each regex once, but the regex engine is not smart enough to make
that distinction.
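
To see the rule in action, here is a small sketch built on the example that perlre itself uses in its section on repeated zero-length matches. The variable names `$text` and `$marked` are only illustrative. Because `\w??` prefers the empty match, each position first yields an empty match, and the engine then has to consume one character before it may match empty again:

```perl
use strict;
use warnings;

# Non-greedy \w?? prefers to match nothing. After each empty match the
# engine refuses a second empty match at the same position, so it falls
# back to matching one word character and moves on.
my $text = 'bar';
(my $marked = $text) =~ s/\w??/<$&>/g;

print "$marked\n";   # <><b><><a><><r><>
```

The alternating empty and one-character matches are what keep the substitution from looping forever at offset 0.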

Ronald


p5pRT commented Jul 24, 2002

From ivan@ramana.chem.yale.edu

> This is by design, and as documented in perlre.

Thank you very much! I'm sorry I didn't notice that part of the
documentation. I looked, but not hard enough, it seems. I should have
suspected that something like that could only happen by design.

Ivan


p5pRT commented Jul 24, 2002

From @hvds

Ronald J Kimball <rjk@linguist.Thayer.dartmouth.edu> wrote:
:> However, I have trouble getting two successful zero-character matches
:> in a row.
:
:This is by design, and as documented in perlre. In order to prevent
:infinite loops when matching zero-length expressions, the match after a
:zero-length match is prohibited from also being a zero-length match.

Note that the manner in which the information "I've already matched
a zero-width at this position" is stored means that it is lost on
assignment. This is probably a bug, and may even get fixed one day
(though it is difficult), but in the meantime you can probably work
around the feature by introducing something like C< pos($s) = pos($s) >
between matches.
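
A minimal sketch of that workaround, applied to the 'cde' case from the original report. The loop and the names `$letter` and `@results` are not from the report, they are just a compact way to run the three matches. It relies on the behaviour described in perlfunc under pos: assigning to pos() keeps the offset but clears the "matched with zero length" flag.

```perl
use strict;
use warnings;

my $s = 'cde';
my @results;

for my $letter (qw(a b c)) {
    # Same matches as in the report: \G(a?), \G(b?), \G(c?) in scalar context.
    my $result = $s =~ /\G($letter?)/g ? "Matched '$1'" : "Failed!";
    push @results, $result;
    print "$letter?: $result\n";

    # Workaround: re-assign pos() to itself. The offset is unchanged, but the
    # zero-length flag is reset, so the next match may be empty at this position.
    pos($s) = pos($s) if defined pos($s);
}
```

With the re-assignment in place, the `b?` match should succeed with an empty string instead of failing, and `c?` then matches 'c' at offset 0.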

Hugo


p5pRT commented Aug 6, 2002

@hvds - Status changed from 'new' to 'resolved'
