Generate error for invalid curly quantifier in regex #12539

Closed
p5pRT opened this issue Nov 9, 2012 · 9 comments

Comments

@p5pRT

p5pRT commented Nov 9, 2012

Migrated from rt.perl.org#115652 (status was 'open')

Searchable as RT115652$

@p5pRT

p5pRT commented Nov 9, 2012

From jrw32982@yahoo.com

Created by jrw32982@yahoo.com

Enhancement request.

Please tighten up the syntax for regexes, perhaps by adding
  use feature 'strict_quantifiers'
 
The effect would be that the presence of left curly in a regex would be
valid only if it was used in a syntactically correct quantifier
(i.e. {n}, {n,}, {n,m}). Of course, this shouldn't affect the use
of various interpolations, such as $hash{key}, into a regex.
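
For illustration, the rule requested above can be approximated in user code. This is only a sketch under stated assumptions: `stray_left_braces` is a hypothetical helper, not something Perl provides; it inspects the pattern text after interpolation, does not parse bracketed character classes, and recognises only the brace-taking escapes listed in its first rule.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offsets of every unescaped '{' in
# $pattern that does not begin {n}, {n,} or {n,m}.
sub stray_left_braces {
    my ($pattern) = @_;
    my @offsets;
    pos($pattern) = 0;
    while (pos($pattern) < length $pattern) {
        next if $pattern =~ /\G\\[xNpPogkbB]\{[^}]*\}/gc;   # \x{FF}, \p{L}, ...
        next if $pattern =~ /\G\\./gcs;                     # any other escaped character
        next if $pattern =~ /\G\{\d+(?:,\d*)?\}/gc;         # well-formed quantifier
        if ($pattern =~ /\G\{/gc) {                         # anything else is a stray '{'
            push @offsets, pos($pattern) - 1;
            next;
        }
        $pattern =~ /\G./gcs or last;                       # ordinary character
    }
    return @offsets;
}

# The regex from this report, with $max already interpolated as 5.
print join(',', stray_left_braces('\G(.{1,5)')), "\n";      # prints 4
```

A check like this would have to run on the interpolated pattern, so that `$hash{key}` in the source is not flagged.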

Currently, if the use of a left curly doesn't match a valid quantifier
pattern, the left curly is treated as a literal character. I just spent
several minutes puzzling my way through the output of a program which had this
regex: /\G(.{1,$max)/ until I finally noticed the mismatched left curly. When
I finally found it I was quite surprised that the unbalanced curly did not
generate an error. After the fact, when I checked the perlre doc, I see that
this behavior is documented, but I can't imagine why it would be a good idea
to make the syntax and semantics of curlies so loose.
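
How a given perl treats the pattern above can be checked directly. The following is a minimal sketch, assuming `$max` is 5; it asserts nothing about the outcome, which depends on the perl version running it, and only reports whether the pattern compiled, any warnings raised, and whether the brace was taken literally.

```perl
use strict;
use warnings;

my $max     = 5;                          # assumed value, for illustration only
my $pattern = '\G(.{1,' . $max . ')';     # the regex from this report, interpolated

# Compile at run time so that a fatal error or a warning can be captured.
my @warnings;
my $compiled = do {
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    eval { qr/$pattern/ };
};

if (!defined $compiled) {
    print "perl $] rejected the pattern: $@";
}
else {
    print "perl $] compiled the pattern\n";
    print "  warning: $_" for @warnings;
    # If the '{' was taken literally, the pattern matches this text.
    print "  matches the literal text 'x{1,5'\n" if 'x{1,5' =~ $compiled;
}
```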

Of course, any such change would not be backwards compatible, so maybe a
feature pragma would be the best way to add it into new code. I wonder how
many other similar gotchas there are in the definition of the way the regex
engine works. Maybe other "misfeatures" could/should be tightened down at
the same time?

Thanks!

Perl Info

Flags:
    category=core
    severity=low

This perlbug was built using Perl v5.6.1 - Wed May  2 17:14:53 DFT 2001
It is being executed now by  Perl v5.8.8 - Tue Jun  2 16:08:32 PAKDT 2009.

Site configuration information for perl v5.8.8:

Configured by root at Tue Jun  2 16:08:32 PAKDT 2009.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=aix, osvers=5.3.0.0, archname=aix-thread-multi
    uname='aix akash79 3 5 00011a85d600 '
    config_args='-desr -Dinstallprefix=/usr/opt/perl5 -Dprefix=/usr/opt/perl5 -Dcc=xlc_r -Duseshrplib -Dusethreads'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc_r', ccflags ='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=-1 -qnoansialias -DUSE_NATIVE_DLOPEN -DNEED_PTHREAD_INIT -q32 -D_LARGE_FILES -qlonglong',
    optimize='-O',
    cppflags='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=-1 -qnoansialias -DUSE_NATIVE_DLOPEN -DNEED_PTHREAD_INIT'
    ccversion='9.0.0.2', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='ld', ldflags =' -brtl -bdynamic -b32'
    libpth=/lib /usr/lib /usr/ccs/lib
    libs=-lbind -lnsl -lgdbm -ldbm -ldb -ldl -lld -lm -lcrypt -lpthreads -lc -lbsd
    perllibs=-lbind -lnsl -ldl -lld -lm -lcrypt -lpthreads -lc -lbsd
    libc=, so=a, useshrplib=true, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_aix.xs, dlext=so, d_dlsymun=undef, ccdlflags='  -bE:/usr/opt/perl5/lib/5.8.8/aix-thread-multi/CORE/perl.exp'
    cccdlflags=' ', lddlflags='-bhalt:4 -bexpall -G -bnoentry -lpthreads -lc'

Locally applied patches:
    


@INC for perl v5.8.8:
    /usr/opt/perl5/lib/5.8.8/aix-thread-multi
    /usr/opt/perl5/lib/5.8.8
    /usr/opt/perl5/lib/site_perl/5.8.8/aix-thread-multi
    /usr/opt/perl5/lib/site_perl/5.8.8
    /usr/opt/perl5/lib/site_perl
    .


Environment for perl v5.8.8:
    HOME=/home/jrw32982
    LANG=C
    LANGUAGE (unset)
    LC_ALL=C
    LC__FASTMSG=true
    LD_LIBRARY_PATH (unset)
    LIBPATH (unset)
    LOGDIR (unset)
    PATH=/home/jrw32982/std/bin:/home/jrw32982/sh:/home/jrw32982/bin:/home/jrw32982/vim72/bin:/usr/local/bin:/bin:/usr/bin:/usr/ucb:/usr/bin/X11:/etc:/sbin:/usr/sbin:/home/jrw32982/local:.
    PERL_BADLANG (unset)
    SHELL=/usr/bin/ksh

@p5pRT

p5pRT commented Nov 10, 2012

From @jkeenan

On Fri Nov 09 09:01:52 2012, jrw32982@yahoo.com wrote:

> This is a bug report for perl from jrw32982@yahoo.com,
> generated with the help of perlbug 1.33 running under perl v5.8.8.
>
> -----------------------------------------------------------------
> [Please enter your report here]
>
> Enhancement request.
>
> Please tighten up the syntax for regexes, perhaps by adding
> use feature 'strict_quantifiers'
>
> The effect would be that the presence of left curly in a regex would be
> valid only if it was used in a syntactically correct quantifier
> (i.e. {n}, {n,}, {n,m}). Of course, this shouldn't affect the use
> of various interpolations, such as $hash{key}, into a regex.
>
> Currently, if the use of a left curly doesn't match a valid quantifier
> pattern, the left curly is treated as a literal character. I just spent
> several minutes puzzling my way through the output of a program which had this
> regex: /\G(.{1,$max)/ until I finally noticed the mismatched left curly. When
> I finally found it I was quite surprised that the unbalanced curly did not
> generate an error. After the fact, when I checked the perlre doc, I see that
> this behavior is documented, but I can't imagine why it would be a good idea
> to make the syntax and semantics of curlies so loose.

Disclaimer: I am not a regex-guts expert; just another regex user.

That being said, I doubt that the benefit which Perl users might gain
from this requested feature would be worth the effort needed to
implement and -- more importantly -- maintain the functionality. As I
noted in the original RT (see Links), there's no error to report here.

> Of course, any such change would not be backwards compatible, so maybe a
> feature pragma would be the best way to add it into new code. I wonder how
> many other similar gotchas there are in the definition of the way the regex
> engine works. Maybe other "misfeatures" could/should be tightened down at
> the same time?

I'll leave it to others to determine whether we would need to create a
new feature pragma for this. But the fact that we even have to consider
this suggests to me that the implementation/maintenance burden will
indeed be high. What you are suggesting seems to be: "Enforce a
compile-time check solely to permit me to receive warnings about
something the docs already warned me about."

Regexes are not easy. They do require study and practice. And regex
users benefit strongly from discussion on mailing lists such as
perl-help or on sites like Perlmonks. But I don't think they require
additional warnings or pragmata.

So -1 from me ... but this will inevitably be a community discussion.

Thank you for posting the RT.

Jim Keenan

@p5pRT

p5pRT commented Nov 10, 2012

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Nov 21, 2012

From @ap

Please refer to
http://www.nntp.perl.org/group/perl.perl5.porters/;msgid=50A5AF43.4040109@khwilliamson.com

@p5pRT

p5pRT commented Dec 14, 2013

From @jkeenan

On Wed Nov 21 02:00:20 2012, aristotle wrote:

> Please refer to
> http://www.nntp.perl.org/group/perl.perl5.porters/;msgid=50A5AF43.4040109@khwilliamson.com

Aristotle, Karl:

Can we get an update on the status of this ticket?

Thank you very much.
Jim Keenan

@p5pRT

p5pRT commented Dec 14, 2013

From @ap

* James E Keenan via RT <perlbug-followup@perl.org> [2013-12-14 01:25]:

> On Wed Nov 21 02:00:20 2012, aristotle wrote:
>
>> Please refer to
>> http://www.nntp.perl.org/group/perl.perl5.porters/;msgid=50A5AF43.4040109@khwilliamson.com
>
> Aristotle, Karl:
>
> Can we get an update on the status of this ticket?

Purely in terms of bug tracker concerns I *believe* that this ticket is
redundant, i.e. the same issue has to have been filed elsewhere as well.
But I’m not sure. It definitely has already been discussed on p5p and
has already been attempted – per the link I give, at minimum. As far as
I can tell about the effort as such, it is currently stalled, pending an
idea for how to cut the knot, but the desire to make it happen remains.

@p5pRT

p5pRT commented Dec 15, 2013

From @khwilliamson

On 12/13/2013 09:37 PM, Aristotle Pagaltzis wrote:

> * James E Keenan via RT <perlbug-followup@perl.org> [2013-12-14 01:25]:
>
>> On Wed Nov 21 02:00:20 2012, aristotle wrote:
>>
>>> Please refer to
>>> http://www.nntp.perl.org/group/perl.perl5.porters/;msgid=50A5AF43.4040109@khwilliamson.com
>>
>> Aristotle, Karl:
>>
>> Can we get an update on the status of this ticket?
>
> Purely in terms of bug tracker concerns I *believe* that this ticket is
> redundant, i.e. the same issue has to have been filed elsewhere as well.
> But I’m not sure. It definitely has already been discussed on p5p and
> has already been attempted – per the link I give, at minimum. As far as
> I can tell about the effort as such, it is currently stalled, pending an
> idea for how to cut the knot, but the desire to make it happen remains.

It is not stalled. What happened in 5.18 is that we now have


=item Useless use of '\'; doesn't escape metacharacter '%c'

(D deprecated) You wrote a regular expression pattern something like
one of these:

  m{ \x\{FF\} }x
  m{foo\{1,3\}}
  qr(foo\(bar\))
  s[foo\[a-z\]bar][baz]

The interior braces, square brackets, and parentheses are treated as
metacharacters even though they are backslashed; instead write:

  m{ \x{FF} }x


This message will be in 5.18 and 5.20 so that everyone relying on the
old behavior will be warned to convert. (This message in 5.18 has found
bugs in existing code.)

Basically, the backslashes don't have any effect currently. This
deprecation will allow us in 5.22 to change so that the backslashes do
have an effect, and that will allow us to have m{...\{...} which has a
literal backslash. That will in turn allow us to deprecate unescaped
braces as literals, which will allow us in 5.26 to have all unescaped
braces be metacharacters, which will then allow us to fix this ticket
and allow spaces and other extensions in quantifiers. It's a slow
process, but it had to start somewhere; and it has.
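
While that transition plays out, a literal left brace can be written so that it never looks like the start of a quantifier. The following is a minimal sketch, assuming `/` as the pattern delimiter (which sidesteps the brace-delimiter interaction described above); the sample text is made up for illustration.

```perl
use strict;
use warnings;

my $text = 'limit{1,5';    # hypothetical sample text containing a literal '{'

# Three spellings of a literal left brace that cannot be read as a quantifier.
my @patterns = (
    qr/limit\{1,5/,      # backslash-escaped, with '/' as the delimiter
    qr/limit[{]1,5/,     # one-character bracketed class
    qr/limit\x7B1,5/,    # two-digit hex escape for '{'
);

for my $re (@patterns) {
    print "$re ", ($text =~ $re ? "matches" : "does not match"), "\n";
}
```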

@p5pRT

p5pRT commented Jan 30, 2017

From @khwilliamson

This is actually already fixed according to the OP's wishes. But I'm not going to resolve it until the final step is made, some releases in the future. The use of such a left brace, as in the example in the ticket, now raises a deprecation warning. I'll wait to resolve this until that warning becomes fatal, rather than write a new ticket for that.
--
Karl Williamson

@khwilliamson

I believe that a warning is now generated for all occurrences where a left brace is taken as a literal when it might not have been meant to be. That means that if it immediately follows a parenthesis, no warning is generated, as it can only ever be a literal there. There is nothing to quantify. Raising the warning for these contexts broke too much code, so we had to back off.
