assert in /\p^ / #14544
Comments
From @hvds

AFL (<http://lcamtuf.coredump.cx/afl/>) finds this:

% ./perl -Ilib -ce '/\p^ /'

(I haven't yet looked at the utf8_heavy warnings, only the coredump.)

I'm not sure if (for example) /\p^L/ is intended to be supported, but the length handling in regcomp.c is suspect for this case: after \p if we see '{', we set the length (UV n) to be the number of characters to the matching close brace, else we set it to 1 (regcomp.c:14177). If we then see '^' we set a flag and decrement n, and skip past additional whitespace further decrementing n as we go. If we then get an error, we can end up passing a negative (wrapped to large unsigned) n as the length.

My guess is we want to support /\p^L/ but not /\p^ L/; the diff below is a start towards that, but it's not sufficient - I think we need to move the parsing out of the !SIZE_ONLY guard, or we can't be sure to continue at the right point.

Hugo

/* We will handle any undefined properties ourselves */
if (UCHARAT(RExC_parse) == '^') {
+ if (braced) {
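The wraparound described above can be illustrated with a minimal C sketch. This is not the code in regcomp.c: `remaining_unchecked` and `remaining_checked` are hypothetical helpers, and `size_t` stands in for the unsigned `UV n`. The unchecked version decrements the length for the caret and for each whitespace character it skips, so a braceless `\p^ ` (length 1) ends up below zero and wraps; the checked version refuses to decrement past zero and reports that no property name is left.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical model of the bookkeeping: p points just after "\p" (or
 * after the opening brace) and n is the length still believed to be
 * available. Returns n after consuming '^' and any whitespace. */
static size_t remaining_unchecked(const char *p, size_t n)
{
    if (*p == '^') {
        p++;
        n--;
        while (isspace((unsigned char)*p)) {
            p++;
            n--;            /* may pass zero and wrap to a huge value */
        }
    }
    return n;
}

/* Same walk, but never decrements past zero. Returns 1 and stores the
 * remaining length in *out if a property name is left, else returns 0. */
static int remaining_checked(const char *p, size_t n, size_t *out)
{
    if (n > 0 && *p == '^') {
        p++;
        n--;
        while (n > 0 && isspace((unsigned char)*p)) {
            p++;
            n--;
        }
    }
    if (n == 0)
        return 0;           /* nothing left to name a property */
    *out = n;
    return 1;
}
```

With the input `"^ "` and an initial length of 1, the unchecked helper returns the largest `size_t` value, which is the kind of bogus length the report says can reach the error path.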
From @hvds

On Thu Feb 26 17:52:20 2015, hv wrote:

I now looked at the warnings: in SWASHNEW, we check $type is non-empty (in fact TRUE) before stripping whitespace; if it consists only of whitespace, we later end up with undef $property and $table.

Either of the following one-line insertions seem to be enough to avoid the problem, but I'm not sure if either is actually the right thing - Karl, do you have an opinion on this?

In the above case, the initial $type is " \n".

Hugo

--- a/lib/utf8_heavy.pl
$type =~ s/^\s+//;
# regcomp.c surrounds the property name with '__" and '_i' if this
--- a/lib/utf8_heavy.pl
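The ordering problem described above (testing for emptiness before stripping whitespace) can be sketched in C. This is only an illustration of the idea, not a translation of SWASHNEW: `strip_and_check` is a hypothetical helper, and stripping trailing whitespace as well as leading is an assumption made for the example. The point is that the emptiness test runs after the strip, so a whitespace-only string such as `" \n"` is rejected instead of slipping through.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: strips leading and trailing whitespace from s in
 * place, then reports whether anything is left. Returns 1 if the
 * stripped string is non-empty, 0 otherwise. */
static int strip_and_check(char *s)
{
    size_t len = strlen(s);
    size_t start = 0;

    /* Skip leading whitespace. */
    while (start < len && isspace((unsigned char)s[start]))
        start++;
    /* Drop trailing whitespace. */
    while (len > start && isspace((unsigned char)s[len - 1]))
        len--;

    /* Shift the surviving characters to the front and terminate. */
    memmove(s, s + start, len - start);
    s[len - start] = '\0';

    /* Only now decide whether the name is usable. */
    return s[0] != '\0';
}
```

A caller would bail out early when this returns 0, rather than carrying an empty name into the later property lookup.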
From @hvds

On Thu Feb 26 17:52:20 2015, hv wrote:

I'm not at all sure about that, and the docs are coy - the only mention I can find of using a caret to invert a property is in perlunicode:

A CPAN grep shows braceless \p^x being tested by ShiftJIS::Regexp, and documented in passing by its pod examples, but didn't show any other uses.

Permitting it, but not skipping whitespace after the caret, results in behaviour I don't understand - I think it's successfully matching some kind of null property, so that /\p^ / ends up roughly equivalent to /./s.

So maybe it is better to skip whitespace after all; on that assumption I've pushed the branch hv/braceless-property for review, with one commit for the utf8_heavy warnings and a second for the parse issues.

Hugo
From @khwilliamson

On 02/27/2015 06:48 AM, Hugo van der Sanden via RT wrote:

I had noticed lately various issues with \p parsing, but thought it too

First, white space is supposed to be significant in patterns except

Also, the white space accepted includes vertical space. I don't think

\p ^ L

with or without braces. I think those isSPACE calls should be isBLANK

Another issue is probably endemic through perl, and that is the use of

/\p{foo # This was supposed to be a comment

This actually compiles, and when you match against it, you get

It thought this was a user-defined property (with a multi-line name,

More reasonable things are even problematic: \p{Any=Y} might make sense

strchr() leads to this kind of problem which exists in other constructs

To go back to your original question. We need to decide exactly what
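The distinction between accepting any whitespace and accepting only horizontal blanks can be shown with the standard C library. This is a sketch by analogy: Perl's isSPACE and isBLANK are its own macros, and the standard `isspace()` and `isblank()` from `<ctype.h>` are used here only as stand-ins; `skip_space` and `skip_blank` are hypothetical helper names. In the "C" locale, `isblank()` accepts space and tab, while `isspace()` also accepts newline, vertical tab, form feed, and carriage return.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts leading characters accepted by isspace(), which includes
 * vertical whitespace such as '\n'. */
static size_t skip_space(const char *p)
{
    size_t n = 0;
    while (isspace((unsigned char)p[n]))
        n++;
    return n;
}

/* Counts leading characters accepted by isblank(): in the "C" locale,
 * only space and horizontal tab. */
static size_t skip_blank(const char *p)
{
    size_t n = 0;
    while (isblank((unsigned char)p[n]))
        n++;
    return n;
}
```

Given the input `" \n\tL"`, the first helper skips the newline and reaches `L`, while the second stops at the newline, so a parser built on it would not silently join a property name across lines.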
The RT System itself - Status changed from 'new' to 'open'

From @khwilliamson

The main issue was fixed by

--

@khwilliamson - Status changed from 'open' to 'pending release'

From @khwilliamson

Thank you for submitting this report. You have helped make Perl better.

Perl 5.24.0 may be downloaded via https://metacpan.org/release/RJBS/perl-5.24.0

@khwilliamson - Status changed from 'pending release' to 'resolved'
Migrated from rt.perl.org#123946 (status was 'resolved')
Searchable as RT123946$