New type of 'word boundary' - true when not in the middle of a word #15219
Comments
From @epa
Created by @epa
When doing a search-and-replace you may wrap the regular expression in
However, you may not know ahead of time whether your source regexp is
/\b x[(][)] \b/x # will fail to match 'x()-1' or 'x()'
What you need to do instead is something like
say 'please enter source and replacement strings:';
These (?:\\b|(?!\\w)) and (?:\\b|(?<!\\w)) incantations are useful
- at start of string
(Subjective experience: this has come up a couple of times, and the
FWIW, the different definition \b{wb} works means that it does not
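A minimal sketch of the failure and the workaround described in this report. The helper name `whole_unit` and the sample strings are assumptions made for illustration; only the two `(?:\b|...)` incantations and the `x()` example come from the report itself. The backslashes are single here because the pattern is built with `qr//` rather than a quoted string.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a pattern so that neither end of the match
# may fall in the middle of a word.
sub whole_unit {
    my ($pat) = @_;
    return qr/(?:\b|(?<!\w))(?:$pat)(?:\b|(?!\w))/;
}

my $src   = quotemeta 'x()';     # literal source string, escaped
my $naive = qr/\b$src\b/;        # plain \b on both sides
my $safe  = whole_unit($src);    # the workaround from the report

for my $text ('x()-1', 'x()', 'max()-1') {
    printf "%-8s naive=%d safe=%d\n", $text,
        ($text =~ $naive ? 1 : 0),
        ($text =~ $safe  ? 1 : 0);
}
# The naive pattern matches none of the three strings: after ')' there is
# no word/non-word transition, so the trailing \b cannot succeed.
# The wrapped pattern matches 'x()-1' and 'x()' but still rejects
# 'max()-1', where the 'x' sits in the middle of a word.
```

Either incantation is true exactly when the position does not have a word character on both sides, which is why the same wrapper works whether or not the source pattern starts or ends with a word character.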
From @khwilliamson
On Mon Mar 07 10:16:47 2016, eda@waniasset.com wrote:
One type of bound that was considered but hasn't been implemented because it was too close to the wb one is to break at \s/\S boundaries. This is trivial to implement, and since wb is more suitable for natural language processing, this might not be as close to it as we thought. That is, it might actually be useful for processing things like code
The RT System itself - Status changed from 'new' to 'open'
From @epa
I agree that \s\S boundaries might be a useful thing to support. However it
If a 'not in the middle of a word' anchor is thought too obscure to go into
/\bhello\b/ # OK
If \y is an anchor which matches at start, end, or next to any \W, then
/\yhello\y/ # also OK
So in the common case of matching a literal word and protecting its start or
(Above I am assuming that 'what you want' is to avoid 'the Scunthorpe problem'
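To make the difference between the two anchors concrete, here is a small sketch that lists the offsets at which each one matches. Perl has no `\y` escape; the `$y` pattern below is an assumed emulation of the proposal ("matches unless there is a word character on both sides"), and the helper `offsets` is hypothetical.

```perl
use strict;
use warnings;

# Assumed emulation of the proposed \y: succeed unless the position has
# a word character immediately before it and immediately after it.
my $y = qr/(?!(?<=\w)(?=\w))/;

# Return every offset in $text at which the zero-width pattern $re matches.
sub offsets {
    my ($text, $re) = @_;
    my @found;
    push @found, pos($text) while $text =~ /$re/g;
    return @found;
}

my $text = 'a, b';
my @b = offsets($text, qr/\b/);
my @y = offsets($text, $y);
print "\\b matches at: @b\n";   # 0 1 3 4
print "\\y matches at: @y\n";   # 0 1 2 3 4
```

The only offset where the two disagree in this string is 2, between the comma and the space. Everywhere that `\b` matches, the emulated `\y` matches too, which is why the literal-word case `/\yhello\y/` behaves like `/\bhello\b/`.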
From @demerphq
On 7 March 2016 at 19:16, Ed Avis <perlbug-followup@perl.org> wrote:
I don't know about that. It is not clear to me that that *is* actually
On the other hand a better solution for this would be useful.
That is interesting. I would like to hear more opinions from people on
Yves
From @arc
demerphq <demerphq@gmail.com> wrote:
FWIW, I note that the reporter of #127436 (which was rejected because
That said, I don't feel in a position to judge how commonplace this need is.
From @epa
The way I see it, it is the current behaviour of \b which is quite unusual and unlikely to be needed. /\yhello\y/ and there are also the more subtle uses where the regexp is not known in advance, or doesn't end in a word character (as #127436 which you mentioned), which can be handled correctly with \y but are difficult to do or buggy with \b. In other words, to turn it the other way around, if perl5 already had a \y anchor and somebody proposed adding \b with its current semantics, I think it would be judged not commonly used enough to be worth adding. Of course that doesn't decide the issue of where to go from here, since \b is already well established, but it can be another way of looking at the question. \b{wb}, on the other hand, does do what you want in most situations so perhaps it can take most of the work of the proposed \y. One question is whether all the Unicode stuff it does will slow down matching in the ASCII-only or 8-bit-only case.
From @demerphq
On 13 March 2016 at 21:07, Ed Avis <eda@waniasset.com> wrote:
Just curious, what does Larry say about this in regard to Perl6?
Yves
From @epa
demerphq <demerphq <at> gmail.com> writes:
I don't know Perl 6 but this is what I found:
<< matches a left word boundary: it matches positions where there is a
From the sound of it, this reproduces the problem with \b, in that you cannot
/<<$pattern>>/
and get 'whole word' matching that works for patterns like /hello3/ or /123/.
From @Abigail
On Sun, Mar 13, 2016 at 08:07:34PM +0000, Ed Avis wrote:
I feel "not between word-characters" is a bit specific. Next there will
If this were implemented, I'd rather see some way of taking an argument
For instance:
\y{w} # Match between word characters
etc.
Abigail
From @demerphq
On 17 March 2016 at 17:29, Abigail <abigail@abigail.be> wrote:
If it was \y{PAT} then we could make it turn into (?(?=\w)(?<!\w)) (or one of various other possible implementations of this)
So instead of being \y{w} it would be \y{\w}. Which might not be too
But again, is \y{} worth this? You could do:
my $y= qr/ /(&y)foo(&y)$y/
And we don't need to burn the \y.
Anyway, I'm fine with looking into an implementation of this, but I do
Yves
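The snippet in the comment above did not survive intact, so the following is only a guess at the kind of thing it describes: defining the check once as a named subpattern and calling it with `(?&y)`. The `(?(DEFINE)...)` wrapper and the sample strings are assumptions; the conditional `(?(?=\w)(?<!\w))` is the one quoted in the comment.

```perl
use strict;
use warnings;

# Assumed reconstruction: a DEFINE block that names the check "y".
# The conditional reads: if a word character follows, require that
# no word character precedes; otherwise succeed.
my $y = qr/(?(DEFINE)(?<y>(?(?=\w)(?<!\w))))/;

# Call the named check on both sides of the literal, then interpolate
# the definition so that (?&y) can be resolved.
my $re = qr/(?&y)foo(?&y)$y/;

for my $text ('a foo.', 'foobar', 'xfoo') {
    printf "%-7s %s\n", $text, ($text =~ $re ? 'match' : 'no match');
}
# 'a foo.' matches; 'foobar' and 'xfoo' do not, because one end of
# 'foo' would sit in the middle of a word.
```

This keeps the behaviour in user code without claiming a new escape letter, at the cost of having to interpolate the definition into every pattern that uses it.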
From gm@cruft.de
From the keyboard of demerphq [17.03.16,19:08]:
Note that "\x{F00}" is the holy syllable OM in Tibetan writing. There
SCNR,
Migrated from rt.perl.org#127670 (status was 'open')
Searchable as RT127670$