minimal matching discrepancy found by pcre author #834

Closed
p5pRT opened this issue Nov 10, 1999 · 4 comments


p5pRT commented Nov 10, 1999

Migrated from rt.perl.org#1762 (status was 'resolved')

Searchable as RT1762$


p5pRT commented Nov 10, 1999

From @jhi

The author of the PCRE (Perl-Compatible Regular Expressions)
(ftp://ftp.cus.cam.ac.uk/pub/software/programs/pcre/), Philip Hazel,
has noticed the following oddity in Perl (back in 5.005_02, but it
still seems to be present). Surely the below examples should work
similarly (the second one should print "bb")?

./perl -wle '"aba" =~ /^(a(b)?)+$/;print $2'
b

./perl -wle '"aabbaa" =~ /^(aa(bb)?)+$/;print $2'
Use of uninitialized value at -e line 1.

Perl Info


Site configuration information for perl 5.00562:

Configured by jhi at Wed Nov 10 01:04:17 EET 1999.

Summary of my perl5 (revision 5.0 version 5 subversion 62) configuration:
  Platform:
    osname=solaris, osvers=2.7, archname=sun4-solaris
    uname='sunos mimosa.hut.fi 5.7 generic_106541-05 sun4u sparc '
    config_args='-ders -Dcc=gcc'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef useperlio=undef d_sfio=undef
    use64bits=undef usemultiplicity=undef
  Compiler:
    cc='gcc', optimize='-O', gccversion=2.8.1
    cppflags='-DUSE_LARGE_FILES -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    ccflags ='-DUSE_LARGE_FILES -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    alignbytes=8, usemymalloc=y, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' '
    libpth=/lib /usr/lib /usr/ccs/lib
    libs=-lsocket -lnsl -ldl -lm -lc -lcrypt -lsec
    libc=, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-fPIC', lddlflags='-G'

Locally applied patches:
    


@INC for perl 5.00562:
    lib
    /u/vieraat/vieraat/jhi/Perl/lib
    /opt/lib/perl5/5.00562/sun4-solaris
    /opt/lib/perl5/5.00562
    /opt/lib/site_perl/5.00562/sun4-solaris
    /opt/lib/site_perl
    .


Environment for perl 5.00562:
    HOME=/u/vieraat/vieraat/jhi
    LANG=C
    LANGUAGE (unset)
    LC_CTYPE=iso_8859_1
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/u/vieraat/vieraat/jhi/.s:/p/bin:/p/adm/bin:/usr/bin:/usr/sbin:/sbin:/bin:/usr/ccs/bin:/usr/lib:/etc:/lib:/p/X6/bin:/usr/bin/X11:/usr/lib/acct:/usr/5bin:/u/vieraat/vieraat/jhi
    PERLLIB=/u/vieraat/vieraat/jhi/Perl/lib
    PERL_BADLANG (unset)
    SHELL=/bin/zsh


p5pRT commented Nov 10, 1999

From [Unknown Contact. See original ticket]

In message <199911101255.OAA16865@mimosa.hut.fi>,
  Jarkko Hietaniemi writes:

: ./perl -wle '"aabbaa" =~ /^(aa(bb)?)+$/;print $2'
: Use of uninitialized value at -e line 1.

That's because there's no bb at the end:

  #! /usr/bin/perl -w

  use strict;

  print q{"aabbaa" =~ /^(aa(bb)?)+$/}, ":\n";
  if ("aabbaa" =~ /^(aa(bb)?)+$/) {
      print "\$1 = `", defined $1 ? $1 : "<undef>", "'\n",
            "\$2 = `", defined $2 ? $2 : "<undef>", "'\n";
  }
  else {
      print "No match.\n";
  }

  print q{"aabbcc" =~ /^((?:aa|cc)(bb)?)+$/}, ":\n";
  if ("aabbcc" =~ /^((?:aa|cc)(bb)?)+$/) {
      print "\$1 = `", defined $1 ? $1 : "<undef>", "'\n",
            "\$2 = `", defined $2 ? $2 : "<undef>", "'\n";
  }
  else {
      print "No match.\n";
  }

  [8:57] ruby% ./try
  "aabbaa" =~ /^(aa(bb)?)+$/:
  $1 = `aa'
  $2 = `<undef>'
  "aabbcc" =~ /^((?:aa|cc)(bb)?)+$/:
  $1 = `cc'
  $2 = `<undef>'

Greg
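
One way to see the reading given above (the last pass through the group has no "bb") is to walk the string one repetition at a time, instead of letting `+` do the looping. The following is only a sketch: the helper name `iterations` and the use of `\G` with `/g` to emulate the repetitions are assumptions made for illustration, not how the regex engine tracks captures internally, and it does not settle which value `$2` ought to hold after a quantified group.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not from the thread): match $unit repeatedly,
# each time starting where the previous match ended, and record
# [$1, $2] for every repetition.
sub iterations {
    my ($string, $unit) = @_;
    my @seen;
    while ($string =~ /\G$unit/g) {
        push @seen, [$1, $2];
    }
    return @seen;
}

my $n = 0;
for my $it (iterations('aabbaa', '(aa(bb)?)')) {
    my ($one, $two) = @$it;
    printf "iteration %d: \$1 = `%s', \$2 = `%s'\n",
        ++$n, $one, defined $two ? $two : '<undef>';
}
```

Because no group is quantified here, the result does not depend on how the engine treats captures across repetitions: the first repetition sets `$2` to "bb" and the second leaves it undefined.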


p5pRT commented Nov 10, 1999

From [Unknown Contact. See original ticket]

The author of the PCRE (Perl-Compatible Regular Expressions)
(ftp://ftp.cus.cam.ac.uk/pub/software/programs/pcre/), Philip Hazel,
has noticed the following oddity in Perl (back in 5.005_02, but it
still seems to be present). Surely the below examples should work
similarly (the second one should print "bb")?

./perl -wle '"aba" =~ /^(a(b)?)+$/;print $2'
b

./perl -wle '"aabbaa" =~ /^(aa(bb)?)+$/;print $2'
Use of uninitialized value at -e line 1.

But not in 5.004_02, it prints
bb
in that case!

Good luck,
Vadim
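
Since the result reportedly differs between releases, a small script run under each interpreter makes the comparison easier to repeat. This is a minimal sketch: the helper name `last_captures` is hypothetical, and the script deliberately avoids `qr//` and `use warnings` on the assumption that it may need to run on interpreters as old as the ones mentioned in this thread. It makes no claim about what any particular version prints.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not from the thread): match $string against the
# pattern text in $pattern and return the final $1 and $2, leaving undef
# intact so "unset" can be told apart from an empty string.
sub last_captures {
    my ($string, $pattern) = @_;
    return undef unless $string =~ /$pattern/;
    return { one => $1, two => $2 };
}

# The two cases from the report; $] identifies the interpreter.
for my $case (['aba', '^(a(b)?)+$'], ['aabbaa', '^(aa(bb)?)+$']) {
    my ($string, $pattern) = @$case;
    my $caps = last_captures($string, $pattern);
    my $two  = $caps && defined $caps->{two} ? $caps->{two} : '<undef>';
    printf "perl %s: \"%s\" =~ /%s/ leaves \$2 = %s\n",
        $], $string, $pattern, $two;
}
```

Running the same file under each installed perl and comparing the lines shows whether the two cases agree on a given build.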


p5pRT commented Nov 10, 1999

From [Unknown Contact. See original ticket]

Jarkko Hietaniemi writes:

The author of the PCRE (Perl-Compatible Regular Expressions)
(ftp://ftp.cus.cam.ac.uk/pub/software/programs/pcre/), Philip Hazel,
has noticed the following oddity in Perl (back in 5.005_02, but it
still seems to be present).

All his remarks are correct, and I have more or less tracked them down to
the level of a handful of C statements. *Fixing* them is a different
topic: I might introduce one-shot fixups, each of which fixes one of
them, but it is the general architecture of the REx engine which needs to
be fixed.

Say, I have no idea why the REx engine works cleanly in so many cases. I
know of no reason why the multiple backtracking hacks in the C code lead to
more or less "intuitive" behaviour.

It looks like every bug report led to a fix or two, and the
total collection of these special cases more or less holds water, but
no more. I have in mind several bulletproof implementations of
backtracking, but they would be significantly slower than the current
hacks. I'm still thinking about this.

Hope this helps,
Ilya
