bug in perl 5.005_61 #699

Closed
p5pRT opened this issue Oct 11, 1999 · 30 comments

@p5pRT

p5pRT commented Oct 11, 1999

Migrated from rt.perl.org#1601 (status was 'resolved')

Searchable as RT1601$

@p5pRT

p5pRT commented Oct 11, 1999

From swiftkid@bigfoot.com

Created by swiftkid@bigfoot.com

Under Perl 5.005_61, this line produces an error:

$email !~ /^[\w-.]+\@[\w-]+\.[\w-.]+$/

Unmatched regexp [] ...

So I reinstalled Perl 5.005_3, and it works fine with it!

Something is wrong in the regexp lib.

Perl Info


Site configuration information for perl 5.00503:

Configured by faisal at Mon Oct 11 16:08:02 PKT 1999.

Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
  Platform:
    osname=linux, osvers=2.2.5-15, archname=i686-linux
    uname='linux localhost.localdomain 2.2.5-15 #1 mon apr 19 23:00:46 edt 1999 i686 unknown '
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef useperlio=undef d_sfio=undef
  Compiler:
    cc='cc', optimize='-O2', gccversion=egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
    cppflags='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    ccflags ='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    stdchar='char', d_stdstdio=undef, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt
    libc=, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:



@INC for perl 5.00503:
    /usr/lib/perl5/5.00503/i686-linux
    /usr/lib/perl5/5.00503
    /usr/lib/perl5/site_perl/5.005/i686-linux
    /usr/lib/perl5/site_perl/5.005
    .


Environment for perl 5.00503:
    HOME=/root
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)

    PATH=/usr/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/faisal/bin:/usr/local/bin/perl
    PERL_BADLANG (unset)
    SHELL=/bin/bash

--
Faisal Nasim (the Whiz Kid)
Web: http://wss.hypermart.net/
FAX: (815) 846-2877

@p5pRT

p5pRT commented Oct 11, 1999

From @RandalSchwartz

"Faisal" == Faisal Nasim <swiftkid@bigfoot.com> writes:

Faisal> Under Perl 5.005_61, this line produces an error:

Faisal> $email !~ /^[\w-.]+\@[\w-]+\.[\w-.]+$/

Not that your bug report isn't perhaps valid, you should also know
that this is an evil line of code that cannot be used in real life, if
you are really matching "valid email addresses".

I won't quote the FAQ here. Please read it.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

@p5pRT

p5pRT commented Oct 11, 1999

From [Unknown Contact. See original ticket]

> Under Perl 5.005_61, this line produces an error:

> $email !~ /^[\w-.]+\@[\w-]+\.[\w-.]+$/

> Unmatched regexp [] ...

> So I reinstalled Perl 5.005_3, and it works fine with it!

> Something is wrong in the regexp lib.

The '-' inside [...] is considered as a range. Try '\-' instead.

Good luck,
Vadim.

@p5pRT

p5pRT commented Oct 11, 1999

From [Unknown Contact. See original ticket]

Konovalov, Vadim writes:

> > $email !~ /^[\w-.]+\@[\w-]+\.[\w-.]+$/

> > Unmatched regexp [] ...

> > So I reinstalled Perl 5.005_3, and it works fine with it!

> > Something is wrong in the regexp lib.

> The '-' inside [...] is considered as a range. Try '\-' instead.

After \w?! It should not. I wonder whether this is POSIX-classes
slip... Jarkko?

Ilya

@p5pRT

p5pRT commented Oct 12, 1999

From @jhi

Ilya Zakharevich writes:

> Konovalov, Vadim writes:

> > > $email !~ /^[\w-.]+\@[\w-]+\.[\w-.]+$/

> > > Unmatched regexp [] ...

> > > So I reinstalled Perl 5.005_3, and it works fine with it!

> > > Something is wrong in the regexp lib.

> > The '-' inside [...] is considered as a range. Try '\-' instead.

> After \w?! It should not. I wonder whether this is POSIX-classes
> slip... Jarkko?

In 62-to-be I get

/^[\w-.]+\@[\w-]+\.[\w-.]+$/: invalid [] range in regexp at -e line 1.

which is what I would have expected. A range like \w-. makes no sense.
Try

/^[\w.]+\@\w+\.[\w.]+$/

I plead guilty of putting in a stricter check for bogus ranges.

[ 3926] By: jhi on 1999/08/05 17:25:19
  Log: Fix regex charclass parsing so that bogus ranges
  like [0-\d] and [[:word:]-z] are no more allowed.
  The anomaly was noticed by Guy Decoux.
  Branch: cfgperl
  ! pod/perldiag.pod pod/perlre.pod regcomp.c t/op/re_tests

If anybody can explain to me what [\w-.] would mean, I will rethink.

--
$jhi++; # http://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Oct 12, 1999

From @jhi

Oops, I take my bitching about a range ending into "nothing" back:

perlre of both 5.005_04 and 5.005_03:

  If you want "-" itself to be a member of a class, put it
  at the start or end of the list, or escape it with a backslash.

But the other issue remains:

  Note that \w matches a single alphanumeric character, not a whole word. To
  match a word you'd need to say \w+. You may use \w, \W, \s, \S, \d and \D
  within character classes (though not as either end of a range).
                            ***********************************

--
$jhi++; # http://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen
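
The perlre rule quoted above (a hyphen is literal at the start or end of a class, or when backslashed, and forms a range between two literal characters) can be checked with a minimal sketch. The labels and the `%result` hash are illustrative names, not anything from the thread, and the behavior shown is that of a current perl rather than the 5.005 releases under discussion.

```perl
use strict;
use warnings;

# Each entry pairs an illustrative label with a character class to probe.
my @cases = (
    [ 'range a..c',     qr/^[a-c]$/  ],  # hyphen between two literals: a range
    [ 'hyphen first',   qr/^[-ac]$/  ],  # hyphen at the start: literal
    [ 'hyphen last',    qr/^[ac-]$/  ],  # hyphen at the end: literal
    [ 'hyphen escaped', qr/^[a\-c]$/ ],  # backslashed hyphen: literal
);

my %result;
for my $case (@cases) {
    my ($label, $re) = @$case;
    # Record whether the class accepts a literal hyphen and the letter "b".
    $result{$label} = {
        hyphen => ( '-' =~ $re ? 1 : 0 ),
        b      => ( 'b' =~ $re ? 1 : 0 ),
    };
    printf "%-15s matches '-': %d  matches 'b': %d\n",
        $label, $result{$label}{hyphen}, $result{$label}{b};
}
```

Only the first class accepts "b"; the other three accept "-" instead, which is the distinction the documentation draws.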

@p5pRT

p5pRT commented Oct 12, 1999

From [Unknown Contact. See original ticket]

On Tue, Oct 12, 1999 at 11:12:54AM +0300, Jarkko Hietaniemi wrote:

> Oops, I take my bitching about a range ending into "nothing" back:

> perlre of both 5.005_04 and 5.005_03:

>   If you want "-" itself to be a member of a class, put it
>   at the start or end of the list, or escape it with a backslash.

> But the other issue remains:

>   Note that \w matches a single alphanumeric character, not a whole word. To
>   match a word you'd need to say \w+. You may use \w, \W, \s, \S, \d and \D
>   within character classes (though not as either end of a range).
>                             ***********************************

This is still ambiguous: \w-. may be either \w\-., or a syntax error.
I was thinking about the first case, but maybe it is good to have it
as error too...

Ilya

@p5pRT

p5pRT commented Oct 12, 1999

From @jhi

> > match a word you'd need to say \w+. You may use \w, \W, \s, \S, \d and \D
> > within character classes (though not as either end of a range).
> > ***********************************

> This is still ambiguous: \w-. may be either \w\-., or a syntax error.
> I was thinking about the first case, but maybe it is good to have it
> as error too...

Hmmm. On the other hand, it seems that constructs like [\w-.] have
worked in the past as [\w\-.], so naughty people who have not read the
documentation and just used such constructs, will have their scripts broken.

Either we take out my patch and fix the documentation to comply,
or leave in my patch and face the bug reports.

--
$jhi++; # http://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Oct 12, 1999

From [Unknown Contact. See original ticket]

: > > match a word you'd need to say \w+. You may use \w, \W, \s, \S, \d and \D
: > > within character classes (though not as either end of a range).
: > > ***********************************
: >
: > This is still ambiguous: \w-. may be either \w\-., or a syntax error.
: > I was thinking about the first case, but maybe it is good to have it
: > as error too...
:
: Hmmm. On the other hand, it seems that constructs like [\w-.] have
: worked in the past as [\w\-.], so naughty people who have not read the
: documentation and just used such constructs, will have their scripts broken.
:
: Either we take out my patch and fix the documentation to comply,
: or leave in my patch and face the bug reports.

That is what I said in my initial message.... it worked in Perl 5.005_3.

I vote it to be automatically escaped.... so \w-. would still be possible.

I wonder if my vote matters! :-)

@p5pRT

p5pRT commented Oct 12, 1999

From @jhi

Faisal Nasim writes:

> That is what I said in my initial message.... it worked in Perl 5.005_3.

But you still didn't read the documentation :-)

--
$jhi++; # http://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Oct 12, 1999

From [Unknown Contact. See original ticket]

Ilya Zakharevich <ilya@math.ohio-state.edu> wrote

> This is still ambiguous: \w-. may be either \w\-., or a syntax error.
> I was thinking about the first case, but maybe it is good to have it
> as error too...

Since it currently works, this is a bug in the documentation. :-)

The documentation should be fixed rather than breaking currently
working programs.

But if we were designing this from scratch, I'd vote for an error.

Hmm.. After writing the above, I actually tried some experiments.

  [\w-.] works (i.e. treats - as escaped)
  [.-\w] doesn't (equivalent to \w )
  [\s-\w] works

So I also recommend fixing [.-\w] , for consistency.

Mike Guy
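
The three experiments above can be repeated on whatever perl is at hand with a small sketch like the following. It assumes a current perl, where each of these "false ranges" compiles with the dash taken literally; the results Mike Guy reports for the release he tested may differ. The names `@classes` and `%dash_is_literal` are illustrative.

```perl
use strict;
use warnings;

# The three classes from the experiment, kept as strings so that each
# pattern is compiled at run time inside the loop.
my @classes = ( '[\w-.]', '[.-\w]', '[\s-\w]' );

my %dash_is_literal;
for my $class (@classes) {
    # Current perls warn about false ranges; silence that for this probe.
    no warnings 'regexp';
    my $re = qr/^$class$/;
    # If a lone "-" matches, the dash was treated as a literal character.
    $dash_is_literal{$class} = ( '-' =~ $re ) ? 1 : 0;
    printf "%-8s dash literal: %d\n", $class, $dash_is_literal{$class};
}
```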

@p5pRT

p5pRT commented Oct 12, 1999

From @jhi

M.J.T. Guy writes:

> Hmm.. After writing the above, I actually tried some experiments.

>   [\w-.] works (i.e. treats - as escaped)
>   [.-\w] doesn't (equivalent to \w )
>   [\s-\w] works

> So I also recommend fixing [.-\w] , for consistency.

It seems that at least my patch consistently made all the above illegal.
Oh well, I'll un-fix this and fix the docs.

--
$jhi++; # http://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Oct 12, 1999

From @tamias

On Tue, Oct 12, 1999 at 02:49:38PM +0100, M.J.T. Guy wrote:

> Ilya Zakharevich <ilya@math.ohio-state.edu> wrote

> > This is still ambiguous: \w-. may be either \w\-., or a syntax error.
> > I was thinking about the first case, but maybe it is good to have it
> > as error too...

> Since it currently works, this is a bug in the documentation. :-)

> The documentation should be fixed rather than breaking currently
> working programs.

I strongly disagree. Since this is documented not to work, it is an error
in the implementation. :-)

This is a clear case of programs taking advantage of an obscure bug in
Perl. That bug should be fixed. [\w-.] has no clear meaning and should be
a syntax error.

Ronald

@p5pRT

p5pRT commented Oct 12, 1999

From @jhi

Ronald J Kimball writes:

> On Tue, Oct 12, 1999 at 02:49:38PM +0100, M.J.T. Guy wrote:

> > Ilya Zakharevich <ilya@math.ohio-state.edu> wrote

> > > This is still ambiguous: \w-. may be either \w\-., or a syntax error.
> > > I was thinking about the first case, but maybe it is good to have it
> > > as error too...

> > Since it currently works, this is a bug in the documentation. :-)

> > The documentation should be fixed rather than breaking currently
> > working programs.

> I strongly disagree. Since this is documented not to work, it is an error
> in the implementation. :-)

> This is a clear case of programs taking advantage of an obscure bug in
> Perl. That bug should be fixed. [\w-.] has no clear meaning and should be
> a syntax error.

I can fix it both ways, let me know when the hydra called p5p has
agreed on something. Or on anything. :-)

--
$jhi++; # http://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Oct 12, 1999

From @jhi

Jarkko Hietaniemi writes:

> I can fix it both ways, let me know when the hydra called p5p has

Errr, "either way", methinks...

--
$jhi++; # http://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Oct 12, 1999

From @jhi

Ronald J Kimball writes:

> On Tue, Oct 12, 1999 at 02:49:38PM +0100, M.J.T. Guy wrote:

> > Ilya Zakharevich <ilya@math.ohio-state.edu> wrote

> > > This is still ambiguous: \w-. may be either \w\-., or a syntax error.
> > > I was thinking about the first case, but maybe it is good to have it
> > > as error too...

> > Since it currently works, this is a bug in the documentation. :-)

> > The documentation should be fixed rather than breaking currently
> > working programs.

> I strongly disagree. Since this is documented not to work, it is an error
> in the implementation. :-)

> This is a clear case of programs taking advantage of an obscure bug in
> Perl. That bug should be fixed. [\w-.] has no clear meaning and should be
> a syntax error.

As I just happened to finish testing the patch to return to the
"understand-the-dash-as-literal" behavior, I think I will now check
it in to the repository, this way we are at least backward compatible.
If the collective we later agrees to strictly classify the discussed
range constructs as fatal errors, the patch is easily revertible.

--
$jhi++; # http://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Oct 12, 1999

From [Unknown Contact. See original ticket]

: I strongly disagree. Since this is documented not to work, it is an error
: in the implementation. :-)
:
: This is a clear case of programs taking advantage of an obscure bug in
: Perl. That bug should be fixed. [\w-.] has no clear meaning and should be
: a syntax error.

And can't you consider it as a feature?

if "." is followed by a multiple pattern match regexp, it's considered
to be as a '-' and not range.... I would pronounce it as a feature :-)

How can I contribute to Perl? (is there any age requirement? :P)
I know C and C++. (Don't just tell me to look into the tons of
the files.... :P) ... is there a proper documentation about the source
code written? any docs, pointers etc?

@p5pRT

p5pRT commented Oct 12, 1999

From @TimToady

jhi@iki.fi writes:
: As I just happened to finish testing the patch to return to the
: "understand-the-dash-as-literal" behavior, I think I will now check
: it in to the repository, this way we are at least backward compatible.
: If the collective we later agrees to strictly classify the discussed
: range constructs as fatal errors, the patch is easily revertible.

Nobody's mentioned the option of issuing a warning. I think I'd prefer
to nudge people towards quoting the dash if that's what they mean.
Otherwise we'll never help the people who think \d-\w produces a range,
for some definition of think.

Larry

@p5pRT

p5pRT commented Oct 12, 1999

From @tamias

On Tue, Oct 12, 1999 at 09:39:27PM +0500, Faisal Nasim wrote:

> : I strongly disagree. Since this is documented not to work, it is an error
> : in the implementation. :-)
> :
> : This is a clear case of programs taking advantage of an obscure bug in
> : Perl. That bug should be fixed. [\w-.] has no clear meaning and should be
> : a syntax error.

> And can't you consider it as a feature?

> if "." is followed by a multiple pattern match regexp, it's considered
> to be as a '-' and not range.... I would pronounce it as a feature :-)

If you want the character class [\w\-.], I think it's just as easy to write
it that way, or as [\w.-], thus avoiding the contradiction of an inner
hyphen which does not specify a range.

The documentation makes it clear that, just like in regex syntaxes of most
other languages and tools, a hyphen is a literal hyphen if it is at the
beginning or end of the character class, and a range if it is in the
middle. I don't see any benefit to complicating that.

Ronald
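
As a minimal sketch of that suggestion, here is the reported pattern with each hyphen moved to the end of its class, so that no false range arises. The sample strings and the `%ok` hash are made up for illustration, and, as noted earlier in the thread, a pattern of this shape is not a real validator of email addresses.

```perl
use strict;
use warnings;

# Same shape as the reported pattern, with the hyphen placed last in each
# class so it is unambiguously a literal character.
my $re = qr/^[\w.-]+\@[\w-]+\.[\w.-]+$/;

# Hypothetical sample inputs: one with an "@" and a dotted host, one without.
my @samples = ( 'first.last@my-host.example.com', 'no-at-sign.example.com' );

my %ok = map { $_ => ( $_ =~ $re ? 1 : 0 ) } @samples;
print "$_: ", ( $ok{$_} ? "matches" : "no match" ), "\n" for @samples;
```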

@p5pRT

p5pRT commented Oct 12, 1999

From [Unknown Contact. See original ticket]

: Nobody's mentioned the option of issuing a warning. I think I'd prefer
: to nudge people towards quoting the dash if that's what they mean.
: Otherwise we'll never help the people who think \d-\w produces a range,
: for some definition of think.
:
: Larry

How about issuing a warning? :-)

Why are there qq/q/qx? When you could simply:

print "blah is \"cool\" and foobar is \"not\"cool";
print 'blah is \'cool\' and foobar is \'not\'cool';
print `hostname`;

?

@p5pRT

p5pRT commented Oct 12, 1999

From [Unknown Contact. See original ticket]

: Why are there qq/q/qx? When you could simply:
:
: print "blah is \"cool\" and foobar is \"not\"cool";
: print 'blah is \'cool\' and foobar is \'not\'cool';
: print `hostname`;
:

and don't forget qw.

@p5pRT

p5pRT commented Oct 12, 1999

From [Unknown Contact. See original ticket]

Larry Wall <larry@wall.org> writes:

> Nobody's mentioned the option of issuing a warning. I think I'd prefer
> to nudge people towards quoting the dash if that's what they mean.
> Otherwise we'll never help the people who think \d-\w produces a range,
> for some definition of think.

But it is "obvious" that \w-\d is a "range" of alphabetics and '_' ;-)

--
Nick Ing-Simmons <nik@tiuk.ti.com>
Via, but not speaking for: Texas Instruments Ltd.

@p5pRT

p5pRT commented Oct 12, 1999

From [Unknown Contact. See original ticket]

At 06:05 PM 10/12/99 +0100, Nick Ing-Simmons wrote:

> But it is "obvious" that \w-\d is a "range" of alphabetics and '_' ;-)

Hey, yeah! Set operations in character classes! Unicode has the set
operators as individual characters, right? :-)

  Dan

----------------------------------------"it's like this"-------------------
Dan Sugalski                            even samurai
dan@sidhe.org                           have teddy bears and even
                                        teddy bears get drunk

@p5pRT

p5pRT commented Oct 12, 1999

From @jhi

Dan Sugalski writes:

> At 06:05 PM 10/12/99 +0100, Nick Ing-Simmons wrote:

> > But it is "obvious" that \w-\d is a "range" of alphabetics and '_' ;-)

> Hey, yeah! Set operations in character classes! Unicode has the set
> operators as individual characters, right? :-)

Yes, it has.

<...> Ummm, it seems the "+" is kind of taken... :-)

--
$jhi++; # http://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Oct 12, 1999

From @jhi

Larry Wall writes:

> jhi@iki.fi writes:
> : As I just happened to finish testing the patch to return to the
> : "understand-the-dash-as-literal" behavior, I think I will now check
> : it in to the repository, this way we are at least backward compatible.
> : If the collective we later agrees to strictly classify the discussed
> : range constructs as fatal errors, the patch is easily revertible.

> Nobody's mentioned the option of issuing a warning. I think I'd prefer
> to nudge people towards quoting the dash if that's what they mean.
> Otherwise we'll never help the people who think \d-\w produces a range,
> for some definition of think.

Aargh, a third option! :-) But an option I find awfully tempting...

--
$jhi++; # http://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Oct 12, 1999

From [Unknown Contact. See original ticket]

At 09:07 PM 10/12/99 +0300, Jarkko Hietaniemi wrote:

> Dan Sugalski writes:

> > At 06:05 PM 10/12/99 +0100, Nick Ing-Simmons wrote:

> > > But it is "obvious" that \w-\d is a "range" of alphabetics and '_' ;-)

> > Hey, yeah! Set operations in character classes! Unicode has the set
> > operators as individual characters, right? :-)

> Yes, it has.

> <...> Ummm, it seems the "+" is kind of taken... :-)

I was thinking of the horseshoes (or whatever they're called. Union and
intersection, I suppose) and the funky not character. (I still think we
should reserve unicode characters for all the current perl keywords, but
that's another topic altogether.)

  Dan

----------------------------------------"it's like this"-------------------
Dan Sugalski                            even samurai
dan@sidhe.org                           have teddy bears and even
                                        teddy bears get drunk

@p5pRT

p5pRT commented Oct 14, 1999

From @jhi

Now 62-to-be is rigged to accept dashes as literals in "false ranges"
but it moans about them under -w:

/akjhfd[xa-\dz]/: false [] range "a-\d" in regexp at -e line 1.

The perldiag entry says:

(W) A character class range must start and end at a literal character, not
another character class like C<\d> or C<[:alpha:]>. The "-" in your false
range is interpreted as a literal "-". Consider quoting the "-", "\-".
See L<perlre>.

--
$jhi++; # http://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen
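
A minimal sketch of observing this behavior follows, assuming a current perl, where the diagnostic wording differs slightly from the 62-to-be message quoted above. The pattern is built from a string so that it compiles at run time, after the warning handler is installed; `@warnings` and `$dash_matches` are illustrative names.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Interpolating the class forces compilation at run time, so the handler
# above is already in place when the regex compiler complains.
my $class = '[xa-\dz]';
my $re    = qr/akjhfd$class/;

# With the dash taken literally, "akjhfd-" should match.
my $dash_matches = ( 'akjhfd-' =~ $re ) ? 1 : 0;

print "dash matched literally: $dash_matches\n";
print "warning: $_" for @warnings;
```

Escaping the dash, as the perldiag entry suggests, should give the same match with no warning.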

@p5pRT

p5pRT commented Oct 14, 1999

From @jhi

While staring at regclass() I noticed a possible additional source
of -w noise: inside character classes we seem to pass assertions
like \A, \z through without a peep (they are parsed as A, z, ...).

On the other hand, in 5.005_5X Ilya's patch that complains about
completely unknown backslash stuff, like, say, \q or \y, was added:
"Unrecognized escape \y passed through".

Should I add a similar warning for character classes,
"Unrecognized escape \y passed through in character class"?

--
$jhi++; # http://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Oct 14, 1999

From [Unknown Contact. See original ticket]

On Fri, Oct 15, 1999 at 12:33:40AM +0300, Jarkko Hietaniemi wrote:

> While staring at regclass() I noticed a possible additional source
> of -w noise: inside character classes we seem to pass assertions
> like \A, \z through without a peep (they are parsed as A, z, ...).

> On the other hand, in 5.005_5X Ilya's patch that complains about
> completely unknown backslash stuff, like, say, \q or \y, was added:
> "Unrecognized escape \y passed through".

> Should I add a similar warning for character classes,
> "Unrecognized escape \y passed through in character class"?

Thanks, yes. I missed this case...

Ilya

@p5pRT

p5pRT commented Oct 14, 1999

From @jhi

> > Should I add a similar warning for character classes,
> > "Unrecognized escape \y passed through in character class"?

> Thanks, yes. I missed this case...

Done. Now e.g. [a\zb] (with -w) triggers such a warning.

--
$jhi++; # http://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen
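
The same kind of probe can be sketched for [a\zb], again assuming a current perl, where the warning text is worded a little differently from the message proposed above. The variable names are illustrative.

```perl
use strict;
use warnings;

# Collect warnings so they can be inspected afterwards.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Compile at run time so the handler is active during regex compilation.
my $class = '[a\zb]';
my $re    = qr/^$class$/;

# Inside a class, \z is not an assertion; it passes through as plain "z".
my $z_matches = ( 'z' =~ $re ) ? 1 : 0;

print "z matched: $z_matches\n";
print "warning: $_" for @warnings;
```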
