New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compile-time check for $1, $2 on RHS of s/// operator #15604
Comments
From @epaCreated by @epaCurrently this gives a run-time warning, but no compile-time error: use 5.022; If the regexp on the LHS of s/// is known at compile time, then the Perl Info
|
From @AbigailOn Fri, Sep 16, 2016 at 04:15:13AM -0700, Ed Avis wrote:
"use strict" never promotes a 'Use of uninitialized value' to an Abigail |
The RT System itself - Status changed from 'new' to 'open' |
From @maukeAm 16.09.2016 um 23:36 schrieb Abigail:
I agree. It's not that $2 doesn't exist there; it just contains undef. 'strict' I would like to see it as a warning, but I don't know how hard it would -- |
From @cpansproutOn Fri Sep 16 20:20:57 2016, plokinom@gmail.com wrote:
You would also have to take into account that a regexp match might occur within the replacement code; e.g., s/()/ m|()()|; $2/e. Not likely, but something similar could happen in complex code. (I’ve written s/// thingies that go on for fifty lines!) -- Father Chrysostomos |
From @epaTo be clear, I was referring only to literal regexp matches where both the LHS and RHS are known at compile time, that is, without /e. There may be complex cases where even a fixed regexp doesn't leave it entirely clear how many capturing buffers will exist, and it makes sense to ignore those cases (they can continue to warn at runtime). -- On 17 Sep 2016, at 06:12, Father Chrysostomos via RT <perlbug-followup@perl.org<mailto:perlbug-followup@perl.org>> wrote: On Fri Sep 16 20:20:57 2016, plokinom@gmail.com<mailto:plokinom@gmail.com> wrote: It's not that $2 doesn't exist there; it just contains undef. 'strict' I would like to see it as a warning, but I don't know how hard it would You would also have to take into account that a regexp match might occur within the replacement code; e.g., s/()/ m|()()|; $2/e. Not likely, but something similar could happen in complex code. (I've written s/// thingies that go on for fifty lines!) -- Father Chrysostomos ______________________________________________________________________ |
From @epaLukas M, Thanks for your reply. It is true that the $2 variable can always be read, even if no regexp match has happened so far. I guess the point is that the capture buffer number 2 does not exist. Note that sed, for example, does give a hard error if you try to refer to a capturing group that doesn't exist: % sed -E 's/(a)/\1\2/' (at least for GNU sed). But I don't want to get bogged down in the theological question of whether $2 in the RHS can be said to 'exist', whether such a check would be an extension of 'use strict', or indeed whether it should be a warning or an error. Simple substitutions such as s/a(b)c/$1/ are a commonly used part of Perl, and it would be great to give the programmer more help in avoiding mistakes in them, as sed and other tools do. -- This email is intended only for the person to whom it is addressed and may contain confidential information. Any retransmission, copying, disclosure or other use of, this information by persons other than the intended recipient is prohibited. If you received this email in error, please contact the sender and delete the material. This email is for information only and is not intended as an offer or solicitation for the purchase or sale of any financial instrument. Wadhwani Asset Management LLP is a Limited Liability Partnership registered in England (OC303168) with registered office at 9th Floor Orion House, 5 Upper St Martin’s Lane, London, WC2H 9EA. It is authorised and regulated by the Financial Conduct Authority. |
From @demerphqOn 17 Sep 2016 3:54 a.m., "Ed Avis" <eda@waniasset.com> wrote:
I am less convinced by this line of reasoning than others.
But I think that is precisely the discussion we should have. Currently we cannot distinguish $2 is undef because it did not match and $2 In theory we can distinguish these cases and throw an additional warning. But we can't do this at compile time, as we cannot tell in advance of regex s/$qr/$1 ? $3 : $2 /e We have no way to know if we will inappropriately access a missing $3 until
Agreed. It would actually be trivial to make us generate an additional Yves |
From @maukeAm 17.09.2016 um 09:00 schrieb Ed Avis:
Even with /e the RHS is known at compile time. But /e is a redundant s/.../${ m|()()|; \"it's $2" }/ That's why I think detecting the error case reliably may be harder than
-- |
From @demerphqOn 17 September 2016 at 18:04, Lukas Mai <plokinom@gmail.com> wrote:
I agree. IMO detecting in advance cannot be done. s/$qr/$1 ? $2 : $3/e there is no way for us to know at Perl compile time what $qr will However we do know this at fetch time, and could easily throw a So for instance for code like defined $5 we could throw a warning like "use of capture buffer not in scope" and for code like print "$5"; we might throw both that warning and the fact that the resulting value Yves |
From @cpansproutOn Sat Sep 17 10:00:12 2016, demerphq wrote:
That would, unfortunately, give false positives. If the pattern is supplied by the user, or the calling code, then ‘defined $1‘ is a perfectly reasonable way to ask whether the pattern had a () and whether it participated in the match. While I agree that a warning would be helpful, I generally find false positives to be so annoying that I slap a ‘no warnings’ on my code in the affected area, just because I’m too frustrated to look up the category that needs to be disabled. I can’t imagine I’m the only one who feels that way. (Or am I unique? :-) The result is that other problems with the code will not be flagged by perl. -- Father Chrysostomos |
From @demerphqOn 17 September 2016 at 19:11, Father Chrysostomos via RT
Yes, I agree. I said could/might more or less for these reasons. It is OTOH I think it would be cool if we returned an "undef" that somehow
Yeah i think this is all too common. Warning/alarm fatigue. Yves -- |
From @AbigailOn Sat, Sep 17, 2016 at 10:11:30AM -0700, Father Chrysostomos via RT wrote:
No, you are certainly not unique in that aspect. False positives are bad. It makes people turn off warnings. Abigail |
From @epaLukas M. pointed out that even without the /e flag, a regexp match can evaulate code on the RHS, using the ${ ... } construct. So could I refine or clarify the proposal a bit. In a regular expression match of the form s/lhs/rhs/, where - lhs is a fixed string known at compile time (so it does not contain interpolated variables), the parts of rhs which are known can be checked to see if they are the capture variables $1, $2 and so on, and a use of one of these variables which is greater than the upper limit of the numbered capture buffers can raise an error at compile time (or a warning - it doesn't much matter which). For example, if you have s/hello (\w+)/hello, Mr $1, my name is $me/g; the right hand side is a doublequoted string equivalent to the concatenation of these four expressions 'hello, Mr ' The left hand side, when parsed, has at most one numbered capture. So at the end of the regexp you can imagine that adding (?1) would be legal but (?2) would raise a 'Reference to nonexistent group' error. Now all the four components of the RHS can be checked. Of them, only $1 is a numbered capture variable, and it's allowed. By contrast using $2 in the RHS would be an error or warning, just as (?2) in the LHS would be an error. Explicitly not checked are cases such as s/$my_regexp/foo/. I suggest that the same rule the regexp parser currently uses to number the capture buffers, and so raise the 'Reference to nonexistent group' error, could be applied to decide whether a capture variable in the replacement text is legal. I hope that this would be conservative, and not give false positives. An unwanted diagnostic would arise only in cases where existing code used a nonexistent capture group and relied on the old behaviour of getting an empty string - but this currently warns at runtime. -- This email is intended only for the person to whom it is addressed and may contain confidential information. Any retransmission, copying, disclosure or other use of, this information by persons other than the intended recipient is prohibited. If you received this email in error, please contact the sender and delete the material. This email is for information only and is not intended as an offer or solicitation for the purchase or sale of any financial instrument. Wadhwani Asset Management LLP is a Limited Liability Partnership registered in England (OC303168) with registered office at 9th Floor Orion House, 5 Upper St Martin’s Lane, London, WC2H 9EA. It is authorised and regulated by the Financial Conduct Authority. |
Migrated from rt.perl.org#129283 (status was 'open')
Searchable as RT129283$
The text was updated successfully, but these errors were encountered: