/(? :o)/: Sequence (? ...) not recognized, when using /x. #2368

Closed
p5pRT opened this issue Aug 15, 2000 · 9 comments · Fixed by #20790

Comments

@p5pRT

p5pRT commented Aug 15, 2000

Migrated from rt.perl.org#3697 (status was 'open')

Searchable as RT3697$

@p5pRT
Author

p5pRT commented Aug 15, 2000

From @Abigail

Created by @Abigail

  $ perl -wle '"foo" =~ /(? :o)/x'
  /(? :o)/: Sequence (? ...) not recognized
  $

But 'man perlre' writes:

  The `/x' modifier itself needs a little more explanation.
  It tells the regular expression parser to ignore whitespace
  that is neither backslashed nor within a character class.
  You can use this to break up your regular expression into
  (slightly) more readable parts.

Either the doc is wrong, or the regular expression parser is. Fixing
the docs is easier, but fixing the parser is more useful.
(It's useful if you have a 10k regex you want to print out with
some line breaks in it.)

The Camel III doesn't mention the restriction on placing whitespace either.
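To make the gap between the documented rule and the parser's behavior concrete, here is a hypothetical Python sketch (not Perl's regcomp.c) that applies the perlre sentence quoted above literally: drop whitespace unless it is backslashed or inside a `[...]` character class. The function name `strip_x_whitespace` is invented for illustration, and the sketch deliberately ignores `#` comments, POSIX bracket expressions such as `[:alpha:]`, and everything else the real parser does.

```python
WHITESPACE = " \t\n\r\f\v"


def strip_x_whitespace(pattern: str) -> str:
    """Hypothetical literal reading of the perlre /x rule: remove whitespace
    that is neither backslashed nor inside a [...] character class.
    Does NOT handle # comments or POSIX classes like [:alpha:]."""
    out = []
    i = 0
    n = len(pattern)
    in_class = False
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            # Keep a backslash together with the character it protects, e.g. "\ ".
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            # Inside [...] every character, including whitespace, is kept.
            out.append(c)
            if c == "]":
                in_class = False
            i += 1
            continue
        if c == "[":
            in_class = True
            out.append(c)
            i += 1
            # A ']' right after '[' or '[^' is a literal, not the end of the class.
            if i < n and pattern[i] == "^":
                out.append("^")
                i += 1
            if i < n and pattern[i] == "]":
                out.append("]")
                i += 1
            continue
        if c in WHITESPACE:
            # Unescaped whitespace outside a class is ignored.
            i += 1
            continue
        out.append(c)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for pat in ["(? :o)", "x? ?", "[a b]", r"a\ b", "a{1 0}", r"\1 0"]:
        print(f"{pat!r} -> {strip_x_whitespace(pat)!r}")
```

Read literally, the rule turns `(? :o)` into `(?:o)` and `x? ?` into `x??`, while leaving `[a b]` and `a\ b` alone. It also merges `a{1 0}` into `a{10}` and `\1 0` into `\10`, which are exactly the kinds of cases the later comments in this thread worry about.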

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl v5.6.0:

Configured by abigail at Wed Jun 14 21:00:02 EDT 2000.

Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.2.13, archname=i686-linux-64int
    uname='linux alexandra 2.2.13 #5 tue feb 8 15:37:54 est 2000 i686 unknown '
    config_args='-Dprefix=/opt/perl -d -Uinstallusrbinperl -Doptimize=-g -Dusemorebits'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define 
    use64bitint=define use64bitall=undef uselongdouble=define usesocks=undef
  Compiler:
    cc='cc', optimize='-g', gccversion=2.95.2 19991024 (release)
    cppflags='-DDEBUGGING -fno-strict-aliasing -I/usr/local/include'
    ccflags ='-DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldbm -ldb -ldl -lm -lc -lposix -lcrypt
    libc=/lib/libc-2.1.2.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.6.0:
    /home/abigail/Perl
    /home/abigail/Sybase
    /opt/perl/lib/5.6.0/i686-linux-64int
    /opt/perl/lib/5.6.0
    /opt/perl/lib/site_perl/5.6.0/i686-linux-64int
    /opt/perl/lib/site_perl/5.6.0
    /opt/perl/lib/site_perl/5.005
    /opt/perl/lib/site_perl
    .


Environment for perl v5.6.0:
    HOME=/home/abigail
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/home/abigail/Lib:/usr/local/lib:/usr/lib:/lib:/usr/X11R6/lib:/opt/tcl/lib:/opt/tk/lib/tk8.0
    LOGDIR (unset)
    PATH=/home/abigail/Bin:/opt/perl/bin:/opt/tcl/bin:/opt/tk/bin:/usr/local/bin:/usr/local/X11/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/games:/opt/povray/bin:/opt/teTeX/bin/i686-pc-linux-gnu:/opt/python/bin
    PERL5LIB=/home/abigail/Perl:/home/abigail/Sybase
    PERLDIR=/opt/perl
    PERL_BADLANG (unset)
    SHELL=/bin/bash


@p5pRT
Author

p5pRT commented Aug 22, 2000

From @vanstyn

In <20000815160207.13763.qmail@foad.org>, abigail@foad.org writes:
: $ perl -wle '"foo" =~ /(? :o)/x'
: /(? :o)/: Sequence (? ...) not recognized
: $

Hmm, how far do we want to go with this? Anything in regcomp.c that
directly increments PL_regcomp_parse rather than calling nextchar(),
or reads beyond the character it currently points to could be
perceived as a bug here. That means all the (? parsing, and others
such as /a{1 }/x. It also begs questions around '.{1 0}' and '\1 0'.

Hugo

@p5pRT
Author

p5pRT commented Aug 23, 2000

From [Unknown Contact. See original ticket]

Hugo <hv@crypt.compulink.co.uk> wrote

> In <20000815160207.13763.qmail@foad.org>, abigail@foad.org writes:
> : $ perl -wle '"foo" =~ /(? :o)/x'
> : /(? :o)/: Sequence (? ...) not recognized
> : $
>
> Hmm, how far do we want to go with this? Anything in regcomp.c that
> directly increments PL_regcomp_parse rather than calling nextchar(),
> or reads beyond the character it currently points to could be
> perceived as a bug here. That means all the (? parsing, and others
> such as /a{1 }/x. It also begs questions around '.{1 0}' and '\1 0'.

It should skip whitespace almost everywhere. I'd suggest the
exceptions as

  within numbers (i.e. your two cases '.{1 0}' and '\1 0')

  within character classes

  after \

Mike Guy

@p5pRT
Author

p5pRT commented Aug 23, 2000

From [Unknown Contact. See original ticket]

Hugo wrote:
|> In <20000815160207.13763.qmail@foad.org>, abigail@foad.org writes:
|> : $ perl -wle '"foo" =~ /(? :o)/x'
|> : /(? :o)/: Sequence (? ...) not recognized
|> : $
|>
|> Hmm, how far do we want to go with this? Anything in regcomp.c that
|> directly increments PL_regcomp_parse rather than calling nextchar(),
|> or reads beyond the character it currently points to could be
|> perceived as a bug here. That means all the (? parsing, and others
|> such as /a{1 }/x. It also begs questions around '.{1 0}' and '\1 0'.

Using /x turns non-class whitespace that would otherwise match itself into
metacharacters that are then ignored. Whitespace within
  /(? :o)/
or
  /x{1 }/
were never part of that, so it seems consistent to me that these examples
remain syntax errors.

But then, by that logic, it would seem consistent for /x? ?/x to remain
a syntax error, but it's currently taken as /x??/.
  Jeffrey

@p5pRT
Author

p5pRT commented Aug 25, 2000

From [Unknown Contact. See original ticket]

Jeffrey Friedl <jfriedl@yahoo-inc.com> wrote

> Using /x turns non-class whitespace that would otherwise match itself into
> metacharacters that are then ignored. Whitespace within
> /(? :o)/
> or
> /x{1 }/
> were never part of that, so it seems consistent to me that these examples
> remain syntax errors.

That's thinking in terms of the implementation rather than the intent
of /x. It's certainly not how it's documented.

/x was intended to mean and should mean "ignore whitespace and # comments".

Mike Guy

@demerphq
Collaborator

I can't decide what to do about this. On one hand, I personally see metapatterns like '(?>' as indivisible symbols, much as '++' is indivisible in Perl ($x + +; is a syntax error). On the other hand, I would say the same thing about quantifiers, but the precedent has been set that we ignore spaces, so x + + is the same as x++ (which I personally think is a bit daft; I can kind of understand allowing a space between the x and the ++, but not spaces between the '+' symbols themselves). I can also /sort of/ understand wanting to put spaces inside option specifiers like (? i :, but what about other multi-character symbols? Should ( ? > be treated the same as (?> as well? We have a LOT of these. Should ( ? & foo ) be treated as (?&foo) also?

So we have an awkward set of precedents here, and some, IMO, difficult decisions. It is likely relatively simple to support whitespace in more places, but I really question whether we should. Regardless, if we don't, we should fix the docs to explain the subtleties here.

@khwilliamson
Contributor

I have a bias about this issue from my experience in Fortran. Not requiring white space between tokens was a serious mistake there. It's been a long time since I looked at this, but a classic example was fori1ton. or something like that, which could have been a for loop or a variable name. I thought language design had learned from that lesson. '+ +' is more obscure than ++. I don't believe we should take any extra steps toward enabling obscure expressions of intent.

@demerphq
Collaborator

Closing as not-a-bug.

@demerphq
Collaborator

Note, I have filed a doc fix related to this as #20790 and I am reopening until it is applied.
