Failing regex test that dies with "panic: pp_match start/end pointers" #10194
From @avar
Created by @avar
It's possible to make Perl die in pp_match by manually setting the See this message on perl5-porters for the full report: This is just an RT copy, as requested. Perl Info |
From marvin@rectangular.com
What's the proposed remedy? The behavior we have now is the behavior I Munging the internal data of SVf_UTF8 strings isn't an easy mistake to I don't know about this specific case, but in general, always being The time to validate UTF-8 data is on input. |
The RT System itself - Status changed from 'new' to 'open' |
From @avar
On Tue, Feb 23, 2010 at 05:19, Marvin Humphrey via RT
The core issue here seems to be that when this data gets passed into As noted in the bug this initially happened on a program that wasn't sub irc_to_utf8 { Unfortunately I haven't been able to reproduce that again.. |
From marvin@rectangular.com
On Tue, Feb 23, 2010 at 05:31:47AM +0000, Ævar Arnfjörð Bjarmason wrote:
But that's because the scalar is in an invalid state. It's not something
I submit that that's where the fixable bug lies, and that efforts should go There's room for one possible improvement within Perl itself: the "panic" Otherwise, I vote to close this bug as a "won't fix". Marvin Humphrey |
From @avar
On Tue, Feb 23, 2010 at 20:53, Marvin Humphrey <marvin@rectangular.com> wrote:
I'm not familiar with the UTF-8 bits of the regex engine but why is
Here's the short testcase that doesn't do anything too evil that I was $ perl -e 'print "\xe2\x90"' | perl -MDevel::Peek -CI -E 'my $in = The problem with saying that you shouldn't turn on the UTF-8 flag on I don't think it's acceptable that perl die this way. |
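For reference, the two bytes in the test case, `\xe2\x90`, form a truncated sequence: 0xE2 is the lead byte of a three-byte UTF-8 character, but only one continuation byte follows. The C sketch below is not Perl's code; `implied_len` and `trusting_hop` are hypothetical names, and the lead-byte table is simplified. It only illustrates how a walker that trusts the lead byte can end up with a byte offset past the end of the buffer, which is the kind of inconsistency a start/end sanity check would be expected to catch.

```c
#include <assert.h>
#include <stddef.h>

/* Sequence length implied by a lead byte (simplified: anything that is
 * not a 2-, 3- or 4-byte lead is treated as a single byte). */
static size_t implied_len(unsigned char lead)
{
    if (lead >= 0xF0) return 4;
    if (lead >= 0xE0) return 3;
    if (lead >= 0xC0) return 2;
    return 1;
}

/* Hypothetical helper: advance over up to `nchars` characters, trusting
 * each lead byte without looking at the continuation bytes. Only bytes
 * inside the buffer are read, but the returned offset may exceed `len`
 * when the last sequence is truncated. */
size_t trusting_hop(const unsigned char *buf, size_t len, size_t nchars)
{
    size_t off = 0;

    assert(buf != NULL || len == 0);
    while (nchars-- > 0 && off < len)
        off += implied_len(buf[off]);
    return off;
}
```

With the two-byte input `{0xE2, 0x90}`, hopping one character yields offset 3 for a buffer of length 2; with the complete sequence `{0xE2, 0x90, 0x80}` the offset is 3 and stays in bounds.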
From ben@morrow.me.uk
Quoth avarab@gmail.com (Ævar Arnfjörð Bjarmason):
As currently implemented pushing a :utf8 layer (as opposed to an Ben |
From marvin@rectangular.com
On Wed, Feb 24, 2010 at 12:27:38AM +0000, Ævar Arnfjörð Bjarmason wrote:
I don't know the regex engine well myself, but I have written a certain amount However, if you don't do that and you blithely keep on going, it's easy to get Here's the code in pp_hot.c that dies: if (RX_OFFS(rx)[i].end < 0 || RX_OFFS(rx)[i].start < 0 || That's a post-operation sanity check to ensure that the match points are
It's not safe to use -CI with data that is not known to be valid UTF-8. To mark FILEHANDLE as UTF-8, use :utf8 or :encoding(utf8) . :utf8 just
I don't think what you want can be achieved without slowing Perl unicode string Instead, validate on input. Marvin Humphrey |
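A minimal sketch of what "validate on input" could look like at the byte level, written in C. It assumes strict UTF-8 as defined by RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF), which is narrower than what Perl's internal format accepts; `utf8_is_well_formed` is a hypothetical name, not a function from the Perl core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if s[0..len) is well-formed UTF-8 per RFC 3629. */
bool utf8_is_well_formed(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char b = s[i];
        size_t need;            /* continuation bytes expected */
        unsigned long cp, min;  /* decoded value, smallest legal value */

        if (b < 0x80) { i++; continue; }             /* ASCII */
        else if (b >= 0xC2 && b <= 0xDF) { need = 1; cp = b & 0x1F; min = 0x80; }
        else if (b >= 0xE0 && b <= 0xEF) { need = 2; cp = b & 0x0F; min = 0x800; }
        else if (b >= 0xF0 && b <= 0xF4) { need = 3; cp = b & 0x07; min = 0x10000; }
        else return false;      /* stray continuation or invalid lead byte */

        if (len - i - 1 < need) return false;        /* truncated sequence */
        for (size_t k = 1; k <= need; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80) return false;    /* not a continuation */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min) return false;                      /* overlong form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  /* surrogate */
        if (cp > 0x10FFFF) return false;                 /* out of range */
        i += need + 1;
    }
    return true;
}
```

Under these assumptions the two bytes `\xe2\x90` from the test case are rejected as a truncated sequence, so a caller could refuse the input or fall back to another encoding before any string is marked as UTF-8.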
From @ikegami
On Tue, Feb 23, 2010 at 8:49 PM, Marvin Humphrey <marvin@rectangular.com> wrote:
That sounds like a bug. How much of a penalty is it to do a basic check on |
From @ikegami
On Wed, Feb 24, 2010 at 1:37 PM, Eric Brine <ikegami@adaelis.com> wrote:
Note that IO stream checks could take advantage of algorithms that work with |
From ben@morrow.me.uk
Quoth ikegami@adaelis.com (Eric Brine):
Not if you're expecting to be able to read the output of perl -CO you Ben |
From @avar
On Wed, Feb 24, 2010 at 22:57, Ben Morrow <ben@morrow.me.uk> wrote:
Internally processing the stream as UTF-32 doesn't mean you'd output as UTF-32. |
From ben@morrow.me.uk
At 11PM +0000 on 24/02/10 Ævar Arnfjörð Bjarmason wrote:
I wasn't clear. Since perl can represent codepoints greater than 2**32 Ben |
From @ikegami
On Wed, Feb 24, 2010 at 6:21 PM, Ben Morrow <ben@morrow.me.uk> wrote:
First, I didn't mention :utf8. :utf8 is an interface between a PerlIO layer
The purpose of IO is to interact, and outputting illegal UTF-8 doesn't In fact, we already have warnings for some illegal characters, so we're not $ perl -C -we'print chr 0xFFFF' | od -c But we're not outputting UTF-8 either: $ perl -C -we'print chr 0x200_000' | od -c |
From @khwilliamson
Eric Brine wrote:
I feel the need to point out that this character and similar are no A set of cooperating applications can accept these characters in I/O,
When I do an od -x instead, I get I think this is the correct utf8-like byte sequence for this code point. |
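As an illustration of what a "utf8-like" sequence for a code point above U+10FFFF looks like, here is a C sketch of the original UTF-8 bit layout extended to six bytes (code points up to 0x7FFFFFFF). It assumes Perl's output for such code points follows that layout; `encode_extended` is a hypothetical name, and the byte values in the comments come from this sketch, not from the `od` output quoted in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Encode cp with the pre-RFC 3629 UTF-8 layout (1 to 6 bytes).
 * `out` must have room for 6 bytes. Returns the number of bytes written. */
size_t encode_extended(unsigned long cp, unsigned char out[6])
{
    size_t n, i;

    assert(cp <= 0x7FFFFFFFUL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) n = 2;
    else if (cp < 0x10000UL) n = 3;
    else if (cp < 0x200000UL) n = 4;
    else if (cp < 0x4000000UL) n = 5;
    else n = 6;

    /* Fill continuation bytes from the end, six payload bits each. */
    for (i = n - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    /* Lead byte: n high bits set, a zero bit, then the remaining payload. */
    out[0] = (unsigned char)(((0xFFu << (8 - n)) & 0xFFu) | cp);
    return n;
}
```

In this sketch 0xFFFF encodes as the three bytes EF BF BF, which is well-formed UTF-8 for a noncharacter, while 0x200000 needs five bytes (F8 88 80 80 80), a form that RFC 3629 UTF-8 does not allow.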
From ben@morrow.me.uk
Quoth public@khwilliamson.com (karl williamson):
In perl Encode terms, 'UTF-8' means the encoding as defined by the Ben |
This should be automatically fixed when https://github.com/Perl/perl5/pull/19121 is finally merged |
Migrated from rt.perl.org#73018 (status was 'open')
Searchable as RT73018$