Allow variable length lookbehind for folded #16212
Comments
From @khwilliamson
Created by @khwilliamson
This is a bug report for perl from khw@cpan.org.

From jsailor@techhouse.org
Created by jsailor@techhouse.org
Running

    perl -le 'print if /h(?<!ssh)base/i'

produces the error

    Variable length lookbehind not implemented in regex m/h(?<!ssh)base/ at -e line 1.

which is odd, because this works fine:

    perl -le 'print if /h(?<!xxh)base/i'

The bug report is from a Debian system running 5.20, but it also reproduces on 5.16.

Summary of my perl5 (revision 5 version 16 subversion 3) configuration: Characteristics of this binary (from libperl):
From @tonycoz
On Tue, 30 Oct 2018 14:12:05 -0700, jsailor@techhouse.org wrote:

I believe it's the "ss", which can match ß (\xdf, LATIN SMALL LETTER SHARP S), which makes it variable length.

Some ligatures can cause the same problem:

    $ perl -E 'qr/h(?<!ff)base/i'

If you only need to match ASCII case-insensitively you can use the /aa flag:

    $ perl -E 'qr/h(?<!ssh)base/aai'

If your perl is too old for the /aa flag, I think you're stuck with using character classes and no /i flag.

Tony
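The point about "ss" and the ligatures can be checked outside Perl. The following is a minimal sketch in Python (not Perl, and not the regex engine's own logic) that uses Unicode full case folding via `str.casefold()` to show that strings of different lengths fold to the same text. The helper name `fold_sources` and the candidate list are illustrative assumptions, not anything from the Perl source.

```python
def fold_sources(target, candidates):
    """Return the candidates whose full Unicode case fold equals `target`."""
    return [c for c in candidates if c.casefold() == target]


# Hypothetical candidate list: ASCII spellings plus the two sharp-s characters.
candidates = ["ss", "SS", "sS", "\u00df", "\u1e9e"]

matches = fold_sources("ss", candidates)
lengths = sorted({len(m) for m in matches})

for m in matches:
    print(repr(m), "length", len(m))
print("distinct lengths:", lengths)

# The ff ligature (U+FB00) behaves the same way for "ff".
print(repr("\ufb00"), "folds to", repr("\ufb00".casefold()))
```

Because a case-insensitive "ss" can correspond to either one or two characters of input, a lookbehind containing it no longer has a single fixed width, which is what triggered the error on the perl versions reported here.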
The RT System itself - Status changed from 'new' to 'open' |
From @khwilliamson
#133630 is a duplicate of #132367. I have merged them together.

I would like to fix this in 5.30. But I need some guidance and/or collaboration from someone like Yves or Hugo to get started.
The RT System itself - Status changed from 'new' to 'open' |
From @hvds
On Wed, 31 Oct 2018 10:07:14 -0700, khw wrote:

I'd be happy to collaborate, but I'm not confident I currently know any viable approach (and in particular I don't know what algorithm Yves had in mind in

The rough classes of direction I can think of are:

FWIW, I had another use-case a couple of days ago where an elegant solution to a problem would have required a different special case, a lookbehind that was variable-width but anchored to start. Things like that, and cases where you want literal match of something unknown at compile time (e.g. a capture), would probably be simple extensions of what we have now, and easily fit in the B-3b approach. I'd be tempted to try such an extension first as an exploratory measure, to get a better feel for what might be possible when the pattern is truly variable.

Hugo
From jsailor@techhouse.org
On Tue, 30 Oct 2018, Tony Cook via RT wrote:

Yup, that's it all right. It's even listed as an example in perldiag,

Though, I'm a little surprised that LANG=C LC_ALL=C perl -e 'qr/(?<!ss)/di' hits this. My expectation would be that, since /d means "to use the

Feel free to close as RTFM I guess.

~jon.
From @khwilliamson
On 11/1/18 2:07 AM, Hugo van der Sanden via RT wrote:

Yves is apparently too busy to reply.

I looked at this yesterday, and came up with a scheme that I believe

The code that raises the variable-length lookbehind error is in the

The two regnode types that deal with lookbehind are IFMATCH and UNLESSM.

This field could be used to store the delta.

At execution, regexec.c could use this delta to calculate minlen..maxlen

I don't think B-4 is viable as I understand it. It doesn't work to

I don't like option A. I'd rather not increase the cognitive load on

B-2 seems better than B-1 to me, and apparently easy to implement.

B-3 doesn't work, as the start point does vary, unless I misunderstand
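The idea of storing a delta and checking every width from minlen to maxlen at execution time can be illustrated with a small sketch. This is Python, not the code in regexec.c, and it only handles a set of literal alternatives; the function name `negative_lookbehind_ok` and its parameters are hypothetical and chosen purely for illustration.

```python
def negative_lookbehind_ok(text, pos, forbidden, minlen, delta):
    """Return True if no forbidden string ends exactly at `pos`.

    `minlen` is the shortest width the lookbehind can match and `delta`
    is the extra width it may take, so maxlen = minlen + delta.
    """
    maxlen = minlen + delta
    for length in range(minlen, maxlen + 1):
        start = pos - length
        if start < 0:
            break  # not enough text behind pos for this width or any longer one
        if text[start:pos] in forbidden:
            return False
    return True


# Assumed stand-in for a case-folded "ss": either two s characters or one sharp s.
forbidden = {"ss", "\u00df"}

print(negative_lookbehind_ok("massx", 4, forbidden, minlen=1, delta=1))
print(negative_lookbehind_ok("ma\u00dfx", 3, forbidden, minlen=1, delta=1))
print(negative_lookbehind_ok("maxx", 3, forbidden, minlen=1, delta=1))
```

The cost of this approach grows with delta, since each candidate width is tried in turn, which is presumably why it is attractive for the small, bounded deltas that case folding produces rather than for arbitrary patterns.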
From @demerphq
On Mon, 12 Nov 2018 at 18:17, Karl Williamson <public@khwilliamson.com> wrote:
Guilt is a powerful motivator. :-)
This problem is definitely more tractable than supporting arbitrary
Yes, this is basically what I would do.
I think B-4 *is* viable, it's just that it is something like a PhD
Agreed.
Agreed.
Agreed. And B-0 is a performance nightmare waiting to happen.

I fully support B-2. I suspect there are some interesting question

Good luck!

Yves
From @khwilliamson
On 11/13/18 4:10 AM, demerphq wrote:

Having looked at the code some, my changes don't affect any area with
From @demerphq
It's just a hazy memory that capture buffers inside of look around

Merry Christmas!

On Thu, 27 Dec 2018, 00:50 Karl Williamson <public@khwilliamson.com> wrote:
From wagnerc@plebeian.com
On Tue, 30 Oct 2018 15:53:01 -0700, tonyc wrote:

I think that a simple fix to this problem would be to require that the single Unicode character be used to trigger variable length case folding. That way the appearance of a lower case ASCII sequence is always only interpreted as those ASCII characters.

To get variable length case folding you must write e.g.

Putting use re "/aa"; is another option.

Thanks
From @khwilliamson
Top posted; too late, blead now has variable length lookbehind

On Mon, 18 Mar 2019 11:59:45 -0700, wagnerc@plebeian.com wrote:
From @khwilliamson
Fixed by commit
@khwilliamson - Status changed from 'open' to 'pending release' |
From @khwilliamson
Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.30.0, this and 160 other issues have been

Perl 5.30.0 may be downloaded via:

If you find that the problem persists, feel free to reopen this ticket.
@khwilliamson - Status changed from 'pending release' to 'resolved' |
Migrated from rt.perl.org#132367 (status was 'resolved')
Searchable as RT132367$