New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No deprecation warning on literal left curly bracket in /.{/ etc #15349
Comments
From @trwyantCreated by @trwyantAs I understand it, literal left curly brackets in regular expressions /.{/ Compilation of these under 'use re debug' appears to show that the Is all this a misunderstanding of the deprecation on my part? Or an Appended example: $ perl -c -Mre=debug -e '/.{/' $ # but $ perl -c -Mre=debug -e '/x{/' Perl Info
|
From @jkeenanOn Sun May 22 13:14:54 2016, wyant wrote:
I can confirm your observations that no exceptions are thrown in these instances in perl 5.24.0. I can also confirm your observations re "perl -c -Mre=debug -e ...". I am not a regex expert and didn't follow the discussion of this feature. But I suspect this may be a case where the error message is not sufficiently precise as to the error condition. If we look at the source code we see this starting at line 13190 of 'regcomp.c': ##### The term 'construct' is used (as both noun and verb) throughout the inline comments in this file but I didn't see a precise definition of its noun version. I'm also not entirely clear as to what 'literal string' and 'something like "\b"' mean. But I would hazard a guess that in the examples you cite, the characters preceding the unescaped left brace, i.e., ##### ... because they are all regex-y things, disqualify these from being 'literal strings' and thus do not qualify as an error conditions. Pinging our regex experts! Thank you very much. -- |
The RT System itself - Status changed from 'new' to 'open' |
From @demerphqThis was a change made by Karl, but personally i consider this discrepancy Yves On Sun May 22 13:14:54 2016, wyant wrote:
I can confirm your observations that no exceptions are thrown in these I am not a regex expert and didn't follow the discussion of this feature. If we look at the source code we see this starting at line 13190 of ##### The term 'construct' is used (as both noun and verb) throughout the inline ##### ... because they are all regex-y things, disqualify these from being Pinging our regex experts! Thank you very much. -- via perlbug: queue: perl5 status: new |
From @trwyantFWIW: $ perl -c -Mre=debug -e '/\d{/' and similarly for \D, \s, \S, \v, \V, \N, \h. Which is why I expected a warning (5.24.0) or error (5.25.1) on /.{/. On the other hand, the comment seems to say none of these should warn. The 'Quantifiers' section of the Perl 5.24.0 and 5.25.1 perlre seems to say that all left curlys not part of a quantifier are literals and deprecated, but omits the "first character in the pattern" exception that appears in the 5.25.1 perldelta. None of which seems to help, thoug. :-( |
From @khwilliamsonOn 05/23/2016 04:51 AM, Tom Wyant via RT wrote:
Thank you for finding this. If you hadn't found this now, there would This situation is the result of oversights on my part. The goal of the deprecation is two-fold: As an example of item 2), before 5.22, if you say qr/a{3, 4}/, the blank All the other extensions that are envisioned involve using backslash However, { is often used in patterns to mean a literal left brace. But another goal would be to minimize this disruption. So I tried to But what you have found is that I missed some cases where it should have We can't just forbid things like that without a deprecation cycle. I I think that 5.26 should go out with the current fatal error for the And, I want to make the rule simple to follow. I think it's better to And then where's the cut off? qr/.{/ could not have been meant to be a |
From zefram@fysh.orgKarl Williamson wrote:
In that case, *every* unescaped brace should be subject to the -zefram |
From @trwyantOn Mon May 23 16:09:05 2016, zefram@fysh.org wrote:
Boy, when I turned back the corner of this carpet I had no idea how many creepy-crawlies would come swarming out. And I appreciate very much Karl's fulsome explanation of what is going on, and support the reasons for doing the work. The problem with applying Occam's razor is that which is the simpler alternative can depend on how you look at them. I personally am happy with something like "un-escaped literal '{' is deprecated/removed where and only where it can introduce a quantifier." That statement means to me to mean "not at the beginning of a group (or the regex as a whole)" for sure. Karl's addition of "... and not after an assertion" makes sense to me on the face of it, since it makes no sense (to me) to quantify an assertion. If ^ matches once at a given location, it will match any number of times. Given that, I personally, for whatever little that is worth, would look on the current parse of /^{3}/ as a bug (or at least a misfeature), and would be happy to see that sort of thing deprecated and eliminated. But that's just me, and far more Perl-savvy people than me have been wrong about such apparently-obvious things. The thing is, this could be looked at as the thin edge of the wedge, since it means a deprecation warning for '/^ {/' but not '/^ {/x'. The funny thing is, all this was started by my desire to release a PPIx::Regexp that shows literal '{' removed in 5.25.1 (at least, in those cases where it is in fact removed), and realizing that I did not understand either the documentation or what Perl was actually doing. My motivation may be important in evaluating my opinions, because they will almost surely be influenced by the ease or difficulty of the implementation I face. "Always removed" is certainly trivial, but "removed except at the beginning of a group and after an assertion" is not much harder. The tricky thing may be to implement what looks like the actual removal, with some things disappearing now, and some not disappearing until 5.29.x. So if someone could let me know exactly what deprecations are being deferred, I would greatly appreciate it. Or at least, whether there is anything besides the dot, bracketed classes (including extended, which I did not check before but which also do not issue a warning) and the POSIX classes \p{...} and \P{...}. And just as a side observation, /\b{/ is not a good example of what we're talking about, because that parse fails for a completely different reason -- after the '{' Perl is trying to complete something like /\b{wb}/ and failing. Thank you all very much. |
From @trwyantIn my next-to-last paragraph, s/POSIX/Unicode/g. Sorry about the proofreading. |
From zefram@fysh.orgTom Wyant via RT wrote:
That would make some sense if such a rule were being added, but it's not. If assertions were a consistent exception, I'd still be opposed to that.
The reality is harder than either. Implementing whatever it is isn't -zefram |
From @demerphqOn 24 May 2016 at 21:30, Zefram <zefram@fysh.org> wrote:
While I generally agree with what you said in this post this point is $ perl -wle'qr/\A{10}/' Quantifier unexpected on zero-length expression in regex m/\A{10}/ at -e line 1. We DO consider a quantifier on an assertion to be warnable, and IMO we So /\A{10}/ should be the same as /\A/ and /\A{0,10}/ should be the same as /\A?/ Obviously we would want to propagate quantifier modifiers through as well. cheers, -- |
From zefram@fysh.orgdemerphq wrote:
That's just a warning. It does actually parse as I stated.
That would be a sensible optimisation, but has essentially nothing to -zefram |
From @demerphqOn 25 May 2016 at 01:06, Zefram <zefram@fysh.org> wrote:
Yes, no argument there. My point was more that IMO it would not be IOW, I do consider use of a {min,max} quantifier on an assertion to be Yves -- |
From zefram@fysh.orgdemerphq wrote:
I'd be OK with syntactically forbidding quantifiers on assertions. -zefram |
From @demerphqOn 25 May 2016 at 01:36, Zefram <zefram@fysh.org> wrote:
Umm. I dont follow that last point... I can understand forbidding quantifiers on an assertion, but not Why do you think we need to support that? Yves -- |
From zefram@fysh.orgdemerphq wrote:
My point is that any forbidding of quantifiers on assertions should be a -zefram |
From @demerphqOn 25 May 2016 at 02:02, Zefram <zefram@fysh.org> wrote:
My first reaction to this is "Ok, I get it, and oh BTW, bonus points My second reaction to this is that without a quantifier or alternation Eg: $ perl -Mre=debug -wle'qr/(?:a)bcd/' So I would expect /^*/ and /(?:^)*/ to produce the same compiled But what you are saying is you would expect them to differ. Which I But the difference makes me wonder a bit, because of things like this: $ perl -Mre=debug -wle'qr/(?:a) */x' IOW, under /x we allow comments and whitespace to be inserted between So if we determine this at compile time I guess we have to do it Historically I have done everything I could to get us out of special Yves -- |
From zefram@fysh.orgdemerphq wrote:
It should indeed be semantically invisible, but it is *syntactically*
If they're syntactically legal (as they currently are), they should
No, I would expect /(?:^){4}/ not to error at all. The optimiser would
There are some things that should be based on the parse tree of the -zefram |
From @demerphqOn 25 May 2016 at 05:31, Zefram <zefram@fysh.org> wrote:
Yes I know. (Hence why I said "without a quantifier or alternation")
No its not essential. But IMO it is prima facie *saner*.
Well, I dunno. This is isnt as obvious to me as it seems to be to you.
I think that "No" is superfluous. :-) My point is that if we error at Perhaps I should mention that with the way the regex engine is Note that some of your arguments seem to predicated on the assumption
There is no parse tree in the regex engine. :-( Adding one is on my todo list. :-)
The opcodes produced do not include the whitespace.
Again there is no optree in the regex engine. Not in the classic sense Anyway, I will have to think about this a bit, I think I am generally No doubt in a day or two I will say something like "I get Zefram's Cheers! -- |
From @epaFWIW, I can imagine cases where you build up a regexp from smaller ones, which /(?:$RE{whatever}){1,2}/; and if $RE{whatever} changes under the covers to be just /^/ then this breaks. -- |
From @trwyant
The thing is, the more I think about it, the more I think the simple-to-follow argument says un-escaped literal '{' should not become fatal until _all_ cases of interest can be made fatal. Doing it in two (currently) stages means, at least, explaining to people that they are allowed (with a warning) to write /^{/, but /\A{/ is fatal -- and the same for however many other cases there are. Or have I misunderstood? |
From @khwilliamsonOn 05/25/2016 12:00 PM, Tom Wyant via RT wrote:
I don't think you've misunderstood. But I am loathe to back off the It would actually be best to make the new warning fatal for regexes I'm attaching a patch to look at which changes the current error's text It adds a new warning in the contexts where it should have warned but And I added explanatory text to perldiag explaining the situation
|
From @khwilliamson0007-Add-missing-deprecation-message-for-unescaped-in-reg.patchFrom da814ee9bcf58f82c698f30800dca313a0e0a0fb Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Thu, 9 Jun 2016 21:25:46 -0600
Subject: [PATCH 7/7] Add missing deprecation message for unescaped '{' in
regexes
The use of literal '{' without being escaped has been deprecated since
5.16, and warned out since 5.20. In 5.24, this has been made illegal,
with a bunch of CPAN modules broken by it, in spite of the long
deprecation period. See
https://rt.perl.org/Ticket/Display.html?id=128139
Unfortunately, I overlooked a code path, and not all instances that
should have warned did so in fact. This was spotted by Tom Wyant in
https://rt.perl.org/Ticket/Display.html?id=128213
This commit adds that warning, and rewords the fatal one slightly, and
clarifies the whole thing in perldiag.
---
pod/perldiag.pod | 64 +++++++++++++++++++++++++++++++++++++++++++++++---------
regcomp.c | 11 +++++++---
t/re/reg_mesg.t | 19 +++++++++++++----
3 files changed, 77 insertions(+), 17 deletions(-)
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index dae7c26..247af33 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -6123,20 +6123,64 @@ C<undef *foo>.
(A) You've accidentally run your script through B<csh> instead of Perl.
Check the #! line, or manually feed your script into Perl yourself.
-=item Unescaped left brace in regex is illegal in regex;
+=item Unescaped left brace in regex is deprecated here, passed through in
+regex; marked by S<<-- HERE> in m/%s/
+
+(D deprecated, regexp) See the message immediately following this one.
+
+=item Unescaped left brace in regex is illegal here in regex;
marked by S<<-- HERE> in m/%s/
-(F) You used a literal C<"{"> character in a regular
-expression pattern. You should change to use C<"\{"> or C<[{]> instead.
-If the pattern delimiters are also braces, any matching
-right brace (C<"}">) should also be escaped to avoid confusing the parser,
-for example,
+(F)
+
+The simple rule to remember if you want to match a literal C<"{">
+character (U+007B C<LEFT CURLY BRACKET>) in a regular expression pattern
+is to escape each literal instance of it in some way. Generally easiest is
+to precede it with a backslash, like C<"\{"> or enclose it in square
+brackets (C<"[{]">). If the pattern delimiters are also braces, any
+matching right brace (C<"}">) should also be escaped to avoid confusing
+the parser, for example,
+
+ qr{abc\{def\}ghi}
+
+Forcing literal C<"{"> characters to be escaped will enable the Perl
+language to be extended in various ways in future releases. To avoid
+needlessly breaking existing code, the restriction is is not enforced in
+contexts where there are unlikely to ever be extensions that could
+conflict with the use there of C<"{"> as a literal. The reason that
+there are both fatal and non-fatal warnings about this is that, because
+of an oversight, some uses of a literal C<"{"> that should have raised a
+deprecation warning starting in v5.20 did not warn until v5.26. By
+making the already-warned uses fatal now, some of the planned extensions
+can be made to the language sooner.
+
+The contexts where no warnings or errors are raised are:
+
+=over 4
+
+=item *
- qr{abc\{def\}ghi}
+as the first character in a pattern, or following C<"^"> indicating to
+anchor the match to the beginning of a line.
-This restriction is not enforced if the C<"{"> is the first character in
-the pattern; nor is a warning generated for this case, as there are no
-current plans to forbid it.
+=item *
+
+as the first character following a C<"|"> indicating alternation.
+
+=item *
+
+as the first character in a parenthesized grouping like
+
+ /foo({bar)/
+ /foo(:?{bar)/
+
+=item *
+
+as the first character following a quantifier
+
+ /\s*{/
+
+=back
=item unexec of %s into %s failed!
diff --git a/regcomp.c b/regcomp.c
index 8562b8f..e431576 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -13256,7 +13256,7 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth)
* something like "\b" */
if (len || (p > RExC_start && isALPHA_A(*(p -1)))) {
RExC_parse = p + 1;
- vFAIL("Unescaped left brace in regex is illegal");
+ vFAIL("Unescaped left brace in regex is illegal here");
}
/*FALLTHROUGH*/
default: /* A literal character */
@@ -13661,8 +13661,6 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth)
RExC_parse = p - 1;
Set_Node_Cur_Length(ret, parse_start);
RExC_parse = p;
- skip_to_be_ignored_text(pRExC_state, &RExC_parse,
- FALSE /* Don't force to /x */ );
{
/* len is STRLEN which is unsigned, need to copy to signed */
IV iv = len;
@@ -13674,6 +13672,13 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth)
break;
} /* End of giant switch on input character */
+ /* Position parse to next real character */
+ skip_to_be_ignored_text(pRExC_state, &RExC_parse,
+ FALSE /* Don't force to /x */ );
+ if (PASS2 && *RExC_parse == '{' && OP(ret) != SBOL && ! regcurly(RExC_parse)) {
+ ckWARNregdep(RExC_parse + 1, "Unescaped left brace in regex is deprecated here, passed through");
+ }
+
return(ret);
}
diff --git a/t/re/reg_mesg.t b/t/re/reg_mesg.t
index ff20083..e3c11ba 100644
--- a/t/re/reg_mesg.t
+++ b/t/re/reg_mesg.t
@@ -268,10 +268,11 @@ my @death =
'/(?[\ |!])/' => 'Incomplete expression within \'(?[ ])\' {#} m/(?[\ |!{#}])/', # [perl #126180]
'/(?[()-!])/' => 'Incomplete expression within \'(?[ ])\' {#} m/(?[()-!{#}])/', # [perl #126204]
'/(?[!()])/' => 'Incomplete expression within \'(?[ ])\' {#} m/(?[!(){#}])/', # [perl #126404]
- '/\w{/' => 'Unescaped left brace in regex is illegal {#} m/\w{{#}/',
- '/\q{/' => 'Unescaped left brace in regex is illegal {#} m/\q{{#}/',
- '/:{4,a}/' => 'Unescaped left brace in regex is illegal {#} m/:{{#}4,a}/',
- '/xa{3\,4}y/' => 'Unescaped left brace in regex is illegal {#} m/xa{{#}3\,4}y/',
+ '/\w{/' => 'Unescaped left brace in regex is illegal here {#} m/\w{{#}/',
+ '/\q{/' => 'Unescaped left brace in regex is illegal here {#} m/\q{{#}/',
+ '/\A{/' => 'Unescaped left brace in regex is illegal here {#} m/\A{{#}/',
+ '/:{4,a}/' => 'Unescaped left brace in regex is illegal here {#} m/:{{#}4,a}/',
+ '/xa{3\,4}y/' => 'Unescaped left brace in regex is illegal here {#} m/xa{{#}3\,4}y/',
'/abc/xix' => 'Only one /x regex modifier is allowed',
'/(?xmsixp:abc)/' => 'Only one /x regex modifier is allowed {#} m/(?xmsixp{#}:abc)/',
'/(?xmsixp)abc/' => 'Only one /x regex modifier is allowed {#} m/(?xmsixp{#})abc/',
@@ -621,6 +622,16 @@ my @experimental_regex_sets = (
);
my @deprecated = (
+ '/^{/' => "",
+ '/foo|{/' => "",
+ '/foo|^{/' => "",
+ '/foo({bar)/' => "",
+ '/foo(:?{bar)/' => "",
+ '/\s*{/' => "",
+ '/a{3,4}{/' => "",
+ '/.{/' => 'Unescaped left brace in regex is deprecated here, passed through {#} m/.{{#}/',
+ '/[x]{/' => 'Unescaped left brace in regex is deprecated here, passed through {#} m/[x]{{#}/',
+ '/\p{Latin}{/' => 'Unescaped left brace in regex is deprecated here, passed through {#} m/\p{Latin}{{#}/',
);
for my $strict ("", "use re 'strict';") {
--
2.5.0
|
From zefram@fysh.orgKarl Williamson wrote:
Presumably should be /foo(?:{bar)/. -zefram |
From @khwilliamsonI have now committed the patch to blead as 8e84dec and closing this ticket |
@khwilliamson - Status changed from 'open' to 'pending release' |
From wolf-dietrich_moeller@t-online.deCreated by wolf-dietrich_moeller@t-online.deSeeing bug report 130497 on deprecation of "Unescaped left brace in regex", I noticed that the current Perl (5.24.1) does not always give the warning The test program below only warns in line 2 (regex 1), I think it should also warn in line 3. Or would this still be allowed in #### test program #### #### output test program #### Perl Info
|
From @xenuOn Tue, 7 Mar 2017, at 17:52, Wolf-Dietrich Moeller wrote:
It's a duplicate of |
The RT System itself - Status changed from 'new' to 'open' |
From @khwilliamsonOn 03/07/2017 01:31 PM, Tomasz Konojacki wrote:
Merged. |
From @khwilliamsonThank you for filing this report. You have helped make Perl better. With the release today of Perl 5.26.0, this and 210 other issues have been Perl 5.26.0 may be downloaded via: If you find that the problem persists, feel free to reopen this ticket. |
@khwilliamson - Status changed from 'pending release' to 'resolved' |
Migrated from rt.perl.org#128213 (status was 'resolved')
Searchable as RT128213$
The text was updated successfully, but these errors were encountered: