/n regexp modifier and backreferences to previous groups #15199
From @epa
Created by @epa
The /n regexp modifier, according to perlvar, 'will stop $1, $2, [...]
So for example, with the current behaviour:
    % perl -E '$_ = "aa"; /(a)(?-1)/ or die; say $1 // "undef"'
This applies too if the modifier is set within a part of the regexp:
    % perl -E '$_ = "aa"; /(?n:(a)(?-1))/ or die; say $1 // "undef"'
I would prefer it to still allow referring to the group within the [...]
    % perl -E '$_ = "aa"; /(?n:(a)(?-1))/ or die; say $1 // "undef"'
Although this would be a change to the current semantics, it is more [...]
Now I will give a bit of background about why I think this would be useful.
    $lang_re = qr/a+/;
I may define this regexp in a library and then use it in client code [...]
    /\A ($lang_re) ([0-9]) \z/x or die;
Now suppose I change the definition of the language so that valid [...]
    $lang_re = qr/ ( a+ | < (?-1) > ) /x;
(For this trivially simple language there may be other ways to do it [...]
To avoid adding a new externally visible capturing group I would like [...]
    $lang_re = qr/ (?n: ( a+ | < (?-1) > ) ) /x;
The intention is that while $lang_re may use a recursive subpattern [...]
Although using named captures everywhere mitigates the problem it does [...]
I think that changing the semantics of /n, so that it stops [...]
(There may also be room for a regexp modifier X which hides groups [...]
(FWIW, the real code which prompted this is a regexp library to match a [...]
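A minimal Perl sketch of the composability problem described above, reusing the two `$lang_re` definitions from the report. The `parse_client` sub and the sample inputs are hypothetical. The client pattern stays the same, but once the library regexp gains a capturing group, `$2` no longer holds the digit:

```perl
use strict;
use warnings;

# Library regexp, version 1 from the report: one or more 'a's.
my $lang_re_v1 = qr/a+/;

# Version 2 from the report: adds bracketed nesting via a recursive group.
my $lang_re_v2 = qr/ ( a+ | < (?-1) > ) /x;

# Hypothetical client code, unchanged between library versions.
# It expects $1 to be the word and $2 to be the digit.
sub parse_client {
    my ($lang_re, $input) = @_;
    $input =~ /\A ($lang_re) ([0-9]) \z/x or return;
    return ($1, $2);
}

my @v1 = parse_client($lang_re_v1, 'aa5');
my @v2 = parse_client($lang_re_v2, '<<a>>5');

print "v1: \$1=$v1[0] \$2=$v1[1]\n";
print "v2: \$1=$v2[0] \$2=$v2[1]\n";
```

With the second definition, `$2` is the library's own group (`<<a>>`) and the digit has moved to `$3`, even though the client code did not change.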
From @hvds
"Ed Avis" (via RT) <perlbug-followup@perl.org> wrote:
In the original discussion, AFAIR, the intent was solely to provide [...]
I suspect the reinterpretation you propose would be problematic, in [...]
It may be that we need a distinct new feature that delineates some [...]
(I can imagine value, at least for named captures, in modelling this [...]
Hugo
The RT System itself - Status changed from 'new' to 'open'
From @tamias
On Fri, Feb 26, 2016 at 04:13:42AM -0800, Ed Avis wrote:
    n  Prevent the grouping metacharacters () from capturing. This modifier, new [...]
"Prevent the grouping metacharacters () from capturing."
It seems to me this is working as intended.
Ronald
From @epa
Ronald J Kimball <rjk <at> tamias.net> writes:
This bug report is about pattern references, not capturing. The syntax [...]
    $_ = 'ab';
    [...]
This succeeds, with the (?1) call matching the string 'b', even though the [...]
I understand that /n turns off capturing. That is what is documented.
Actually, I see that the documentation of recursive subpatterns in perlre [...]
FWIW, there is also an interesting effect where adding /n to an existing [...]
    $_ = 'aa';
    /(.)(?1)/n; # fails at compile time with a useful error
Again, this breaks composability of regexps. Suppose the implementation of [...]
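The compile-time failure mentioned above can be reproduced in a small sketch, assuming Perl 5.22 or later (where /n exists). A literal pattern such as `/(.)(?1)/n` would abort the whole script at compile time, so the hypothetical helper `try_compile` builds each pattern from a string at run time, letting `eval` trap the error:

```perl
use strict;
use warnings;

# Compile a pattern from a source string at run time, optionally with /n,
# so that a compile error is trapped by eval instead of aborting the script.
sub try_compile {
    my ($src, $use_n) = @_;
    my $re = eval { $use_n ? qr/$src/n : qr/$src/ };
    return ($re, $@);
}

# Without /n, (.) is group 1 and (?1) re-runs it as a subpattern.
my ($plain) = try_compile('(.)(?1)', 0);
my $plain_ok = defined $plain && 'aa' =~ $plain;

# With /n, (.) is no longer a group, so (?1) has nothing to call.
my ($with_n, $err_n) = try_compile('(.)(?1)', 1);

# Setting n on only part of the pattern has the same effect on (?-1).
my ($inline_n, $err_inline) = try_compile('(?n:(a)(?-1))', 0);

print 'plain: ', ($plain_ok ? 'matches' : 'fails'), "\n";
print "/n: $err_n";
print "(?n:...): $err_inline";
```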
From @epa
<hv <at> crypt.org> writes:
Yes, I can see that is the current behaviour of /n. If you want to keep [...]
As I mentioned in another message, this behaviour of /n is a bit dangerous.
    use Some::Regexp::Library;
    [...]
where the programmer has used /n as a shortcut to avoid writing (?:...) [...]
If you do want /n as a syntactic convenience to save on writing ?: then [...]
(By the way, this all applies to backreferences like \1 as well as [...]
I think it would be more useful to guarantee that /n does not change the [...]
Alternatively, create a new regexp syntax (?;...) which stops external [...]
I agree but I'm trying to avoid getting into that, and trying to keep [...]
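The point about numbered backreferences such as `\1` can be sketched the same way; the sample patterns are hypothetical. Under /n a numbered backreference loses its group, while a named group and `\k<...>` keep working, because /n leaves named groups capturing:

```perl
use strict;
use warnings;

# Numbered backreference: under /n, (.) is no longer group 1, so \1 has
# nothing to refer to. The pattern is built from a string at run time so
# that eval can trap the compile error.
my $numbered_src = '(.)\1';
my $numbered     = eval { qr/$numbered_src/n };
my $numbered_err = $@;

# Named group plus named backreference: /n does not affect named groups,
# so \k<c> still refers to the character matched by (?<c>.).
my $named         = qr/(?<c>.)\k<c>/n;
my $named_matches = ('aa' =~ $named) ? 1 : 0;
my $named_rejects = ('ab' =~ $named) ? 0 : 1;

print(defined $numbered ? "numbered: compiled\n" : "numbered: $numbered_err");
print "named: matches 'aa'=$named_matches, rejects 'ab'=$named_rejects\n";
```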
From zefram@fysh.org
Ed Avis wrote:
Only if $RE{foo} is a string of regexp source and doesn't set the flags [...]
-zefram
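Zefram's point, that only plain source strings pick up the outer flags, can be checked directly. In this sketch (sample pattern hypothetical), a qr// object stringifies with a leading `(?^:`, which resets the flags, so an outer /n does not reach the group inside it:

```perl
use strict;
use warnings;

my $src    = '(a)';     # hypothetical library pattern kept as source text
my $re_obj = qr/(a)/;   # the same pattern compiled as a qr// object

# Interpolating source text: the outer /n applies to its group.
my $from_string = ('a' =~ /$src/n) ? ($1 // 'undef') : 'no match';

# Interpolating the qr// object: its own flags govern, so the outer /n
# does not turn its group into a non-capturing one.
my $from_object = ('a' =~ /$re_obj/n) ? ($1 // 'undef') : 'no match';

print "qr object stringifies as: $re_obj\n";
print "source string under /n: \$1 = $from_string\n";
print "qr object under /n:     \$1 = $from_object\n";
```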
From @epa
Zefram <zefram <at> fysh.org> writes:
There is some distinction to be made between /i and more wacky modifiers like [...]
Apologies, this is true. I had somehow got the idea that even qr// would be [...]
I think there is still an unhappy semantics for /n currently and it is this.
    my $library_re = qr/a+/;
    # Client code
    [...]
Now in the new version
    my $library_re = qr/(a+|<(?-1)>)/;
But this breaks the client code since $1 is no longer the same. I would [...]
Suppose a new syntax (?;...) disables capturing, but still allows relative [...]
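The named-capture mitigation mentioned in the original report can be sketched against the two `$library_re` versions above; the `parse_named` sub and the sample inputs are hypothetical. Only the client's use of names is insulated: the library's group still takes a number and shifts any later numbered groups.

```perl
use strict;
use warnings;

# The two library versions from the thread.
my $library_v1 = qr/a+/;
my $library_v2 = qr/(a+|<(?-1)>)/;

# Hypothetical client that names its own groups instead of using $1, $2.
sub parse_named {
    my ($lang_re, $input) = @_;
    $input =~ /\A (?<word>$lang_re) (?<digit>[0-9]) \z/x or return;
    return ($+{word}, $+{digit});
}

my @v1 = parse_named($library_v1, 'aa5');
my @v2 = parse_named($library_v2, '<<a>>5');

print "v1: word=$v1[0] digit=$v1[1]\n";
print "v2: word=$v2[0] digit=$v2[1]\n";
```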
From zefram@fysh.org
Ed Avis wrote:
No, that's a half-assed approach, with composability problems as [...]
    (?;~NAME,NAME,...;PATTERN)
    (?;=NAME;PATTERN)
    (?;&NAME)
    (?;*NAME)
The group names used here do not interact with the names used by [...]
-zefram
From @epa
Now that {} are special characters in regexps, can they be used to introduce [...]
I agree with your proposal, just wonder if a more readable syntax is possible.
The other thing I might suggest is that perhaps 'goto' is a more flexible [...]
From zefram@fysh.org
Ed Avis wrote:
I'd rather not overload them further. But in any case it wouldn't get [...]
That would be a really bad idea. /x has a well-defined behaviour that [...]
The exact character sequence introducing each item is up for tweaking, [...]
    (*groupscope:NAME,NAME,...:PATTERN)
Less flexible in general, I'd say, because you can't build recursion [...]
-zefram
From @demerphq
On 27 February 2016 at 22:30, Zefram <zefram@fysh.org> wrote:
I just want to point out that there are two problems here. First is the [...]
Yves
From @wolfsage
On Sat, Feb 27, 2016 at 2:31 AM, Ed Avis <eda@waniasset.com> wrote:
From perldoc perlre:
    n  Prevent the grouping metacharacters "()" from capturing. This [...]
         "hello" =~ /(hi|hello)/;   # $1 is "hello"
       This is equivalent to putting "?:" at the beginning of every capturing [...]
         "hello" =~ /(?:hi|hello)/; # $1 is undef
Are there other places that this note needs to be added?
Matthew Horsfall (alh)
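A runnable version of the documentation example above, assuming Perl 5.22 or later. The last match adds a named group, which /n leaves capturing:

```perl
use strict;
use warnings;

# Plain group: captures.
my $plain = ('hello' =~ /(hi|hello)/) ? $1 : 'no match';

# Same group under /n: the match succeeds but $1 is not set,
# just as with /(?:hi|hello)/.
my $with_n = ('hello' =~ /(hi|hello)/n) ? ($1 // 'undef') : 'no match';

# Named groups still capture under /n.
my $named = ('hello' =~ /(?<greet>hi|hello)/n) ? $+{greet} : 'no match';

print "plain: $plain, /n: $with_n, named /n: $named\n";
```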
From @epa
Note that .NET regular expressions have a possible solution to the [...]
Is this something that would be easy to implement in Perl? The compatibility [...]
From @demerphq
On 6 June 2016 at 14:15, Ed Avis <eda@waniasset.com> wrote:
We don't keep a stack of matches for things like /(a)+/. If we were to start doing so, IMO it would have to be via a new modifier
Yves
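As noted above, Perl keeps only the last iteration of a quantified group. A common userland workaround, not a proposal from this thread and not the .NET feature, records each iteration from an embedded `(?{ })` code block; `local` makes the record backtracking-safe, following the localized-counter example for code blocks in perlre. The names `@stack` and `@all` are hypothetical:

```perl
use strict;
use warnings;

our @stack;    # scratch list, localized per iteration so backtracking undoes pushes
my @all;       # final copy, written only once the whole match has succeeded

my $matched = 'abc1' =~ m{
    \A
    (?:
        (\w)                                 # one iteration of the group
        (?{ local @stack = (@stack, $^N) })  # record what it just captured
    )+
    1 \z                                     # forces the + to give back one iteration
    (?{ @all = @stack })                     # copy out on success
}x;

print "last iteration only: $1\n";
print "every iteration: @all\n";
```

The input ends in `1`, so the greedy `+` first consumes it as a fourth iteration and then backtracks; the localized push for that iteration is undone, leaving `a b c` in `@all`.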
From @iabyn
On Tue, Jun 07, 2016 at 12:38:40PM +0200, demerphq wrote:
Although sometimes we do! CURLYX quantifiers push a WHILEM entry for every match, so at the end of [...]
    CURLYX,WHILEM,WHILEM,WHILEM,WHILEM
Each whilem struct could be made to contain the last start and end [...]
IIRC, all quantifiers start off as CURLYX, and then are optimised down to [...]
Of course it will make that part of the regex execution slower, but only [...]
From @demerphq
On 7 June 2016 at 15:08, Dave Mitchell <davem@iabyn.com> wrote:
Yes, true.
I guess this is a simplification?
    perl -Mre=Debug,ALL -e'"abcd"=~/(\w+?)+/'
produces this backtracking stack:
    #9 WHILEM_B_max yes
    [...]
+ and * on simple objects do not go through CURLYX.
    $ perl -Mre=Debug,ALL -e'"abcd"=~/a+/'
    [...]
I fear this sounds easier than it is.
Yves
From @iabyn
On Tue, Jun 07, 2016 at 03:40:13PM +0200, demerphq wrote:
Yes, sorry, I was oversimplifying; I was only listing the contribution [...]
But we're only referring to quantifiers for (?<foo>....), and in [...]
    $ p -Mre=Debug,ALL -e'qr/(?<foo>a)+/'
    [...]
I'm sure you're right, and I'm certainly not volunteering!
Migrated from rt.perl.org#127617 (status was 'open')
Searchable as RT127617$