crash with unicode characters in regex comment #8656
Comments
From h.f.s.@web.de
This is a bug report for perl from h.f.s.@web.de,
Hi, perl crashes when I use an umlaut in a comment to a regex in a
This is the smallest example I was able to create:
#!/usr/bin/perl
The version of Parse::RecDescent is 1.94 from CPAN.
The error message is:
The stacktrace is:
This was with my system perl version. Continue below...
Flags:
Site configuration information for perl v5.8.8:
Configured by Debian Project at Sun Aug 6 15:47:28 UTC 2006.
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
Locally applied patches:
@INC for perl v5.8.8:
Environment for perl v5.8.8:
I locally compiled perl v5.9.4 and was able to reproduce the problem
The stacktrace:
Program received signal SIGSEGV, Segmentation fault.
Regards,
Site configuration information for perl 5.9.4:
Configured by hermi at Wed Nov 1 18:16:47 CET 2006.
Summary of my perl5 (revision 5 version 9 subversion 4) configuration:
Locally applied patches:
@INC for perl 5.9.4:
Environment for perl 5.9.4:
From guest@guest.guest.xxxxxxxx
I investigated a bit further. With both perl 5.8.8 and 5.9.4
I guess the problems are related.
The backtrace is:
Program received signal SIGABRT, Aborted.
The RT System itself - Status changed from 'new' to 'open'
From guest@guest.guest.xxxxxxxx
Whoops, I must have things seriously mixed up to think that -u had anything to
perl -e 'eval "m/ää/;"'
Program received signal SIGABRT, Aborted.
Please excuse the noise.
From @andk
> use encoding 'utf8';
There are known bugs in C< use encoding 'utf8'; > which usually go
That said, I cannot reproduce your bug with any of 10 different
--
Summary of my perl5 (revision 5 version 9 subversion 5) configuration:
Characteristics of this binary (from libperl):
From @demerphq
On 11/2/06, Andreas J. Koenig <andreas.koenig.gmwojprw@franz.ak.mind.de> wrote:
This rings a bell for me, didn't we have an issue related to this
Note that backtrack stack, the error is NOT being emitted by the regex
cheers,
From BQW10602@nifty.com
On Thu, 02 Nov 2006 03:45:34 +0100, (Andreas J. Koenig) wrote
The following causes a crash on perl 5.8.1 and later
#!perl
... and I have often (but not always) encountered such a message:
Regards,
From BQW10602@nifty.com
On Thu, 2 Nov 2006 09:23:53 +0100, demerphq wrote
I think a chunk (see below) at the bottom of S_regatom()
6381: if (ret && PL_encoding && PL_regkind[OP(ret)] == EXACT) {
ä encoded in 'utf8' is downgraded by sv_utf8_downgrade() at line 6387,
I don't see what this chunk is for.
Regards,
From BQW10602@nifty.com
On Thu, 02 Nov 2006 22:55:23 +0900, SADAHIRO Tomoyuki wrote
I think the example below should clarify the circumstances.
#!perl
use encoding 'utf8';
use Devel::Peek;
While sv_utf8_upgrade() is encoding-aware, sv_utf8_downgrade() isn't
Regards,
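For readers unfamiliar with the upgrade/downgrade distinction discussed above, here is a minimal sketch, not taken from the thread. It uses the Perl-level functions utf8::upgrade and utf8::downgrade, which, as far as I know, wrap the C-level sv_utf8_upgrade() and sv_utf8_downgrade(). It deliberately does not load the encoding pragma, so it shows only the normal behaviour (the internal representation changes, the string value does not) and does not reproduce the crash. The helper name show is hypothetical.

```perl
use strict;
use warnings;

# "Gebäude" with the umlaut written as a single Latin-1 byte escape.
my $word = "Geb\xE4ude";

# Hypothetical helper: print the internal UTF-8 flag and the character length.
sub show {
    my ($label, $str) = @_;
    printf "%-10s utf8 flag=%d length=%d\n",
        $label, (utf8::is_utf8($str) ? 1 : 0), length($str);
}

show('original', $word);

# Switch the internal representation to UTF-8; the characters stay the same.
utf8::upgrade($word);
show('upgraded', $word);

# Switch back to the byte representation; again the characters stay the same.
utf8::downgrade($word);
show('downgraded', $word);
```

In all three states the string has seven characters; only the internal flag differs.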
From shouldbedomo@mac.com
On 2006–11–02, at 13:57, SADAHIRO Tomoyuki wrote:
That's interesting: the original bug report was using an extended
On 2006–11–01, at 21:40, Hermann Schwarting (via RT) wrote:
Firstly, with 32-bit debugging bleadperl@29183 on Mac OS X, Hermann's
32-bit_perl-current$ ./perl -Ilib -I/usr/local/lib/perl5/site_perl/
my $parser = testParser();
sub testParser {
test : /\. # Gebäude
my $parser = testParser();
sub testParser {
test : /\. # Gebäude
This may be due to architectural differences between Mac OS X on PPC
Secondly, making your regex extended stops perl misbehaving for me,
32-bit_perl-current$ ./perl -w -Ilib
(gdb) b szone_error
Breakpoint 1, 0x90114024 in szone_error ()
I hope that the gdb output means something to somebody. Please let me
From BQW10602@nifty.com
On Thu, 2 Nov 2006 18:24:54 +0100, Dominic Dunlop wrote
No, I don't think it is different. Actually Hermann's code
I think another bug in Parse::RecDescent is involved in this.
#!perl
This causes weird messages:
Variable "$errortext" is not available at Parse/RecDescent.pm line 2917.
The reason is that the invalid pattern in the comment is
Parse::RecDescent 1082-1091
'' =~ m$ldel$pattern$rdel" and $@)
If Parse::RecDescent::_warn works well, the warning should be
Token pattern "m/\. # [c-a]
The modifiers are wrongly omitted from the evalled code.
diff -urN Parse-RecDescent-1.94/lib/Parse/RecDescent.pm Parse-RecDescent-new/lib/Parse/RecDescent.pm
--- Parse-RecDescent-1.94/lib/Parse/RecDescent.pm Wed Apr 09 17:29:37 2003
+++ Parse-RecDescent-new/lib/Parse/RecDescent.pm Fri Nov 03 12:27:33 2006
@@ -1081,9 +1081,9 @@
if (!eval "no strict;
local \$SIG{__WARN__} = sub {0};
- '' =~ m$ldel$pattern$rdel" and $@)
+ '' =~ m$ldel$pattern$rdel$mod" and $@)
{
- Parse::RecDescent::_warn(3, "Token pattern \"m$ldel$pattern$rdel\"
+ Parse::RecDescent::_warn(3, "Token pattern \"m$ldel$pattern$rdel$mod\"
may not be a valid regular expression",
$_[5]);
$@ =~ s/ at \(eval.*/./;
Regards,
SADAHIRO Tomoyuki
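To illustrate why the modifiers matter in that validity check, here is a small standalone sketch. It is not code from Parse::RecDescent; the helper pattern_is_valid and the fixed / delimiter are assumptions made for the example. It evals a match against the empty string, as the patched check does, once without and once with the x modifier. The pattern is the one quoted in the warning above, with a trailing newline added so that the /x comment has an end.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::RecDescent): report whether a
# pattern compiles when the given modifiers are appended to the match.
sub pattern_is_valid {
    my ($pattern, $mod) = @_;
    local $@;
    # String eval, so that $mod becomes part of the match operator itself.
    my $ok = eval "no warnings; '' =~ m/$pattern/$mod; 1";
    return $ok ? 1 : 0;
}

# Under /x everything after '#' up to the newline is a comment, so the
# bogus range [c-a] is never compiled. Without /x it is a real (and
# invalid) character class.
my $pattern = '\. # [c-a]' . "\n";

print "without /x: ", (pattern_is_valid($pattern, '')  ? "valid" : "invalid"), "\n";
print "with /x:    ", (pattern_is_valid($pattern, 'x') ? "valid" : "invalid"), "\n";
```

A check that drops the modifiers therefore rejects a pattern that is valid as written in the grammar, which is the situation the patch addresses.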
From @andk
(TheDamian added to CC)
> On Thu, 2 Nov 2006 18:24:54 +0100, Dominic Dunlop wrote
> diff -urN Parse-RecDescent-1.94/lib/Parse/RecDescent.pm Parse-RecDescent-new/lib/Parse/RecDescent.pm
I've also put the patch on CPAN as file:
$CPAN/authors/id/A/AN/ANDK/patches/Parse-RecDescent-1.94-SADAHIRO-01.patch.gz
This is not to steal the show from you, just a test bed for the upcoming
--
From BQW10602@nifty.com
On Thu, 02 Nov 2006 22:55:23 +0900, SADAHIRO Tomoyuki wrote
How to crash perl in this case:
1. When SIZE_ONLY is true,
regcomp.c, 6379-6404
When the above chunk is removed, the following tests in ext/Encode/t/encoding.t
print "not " unless "\x{3AF}" =~ /\xDF/;
print "not " unless "\xDF" =~ /\xDF/;
Perhaps it is necessary to interpret \xHH in regex as encoded
Note: real literals like /Gebäude/ are converted to unicode
P.S. though there is no fix yet, a test suite for this report
Regards,
From BQW10602@nifty.com
On Fri, 03 Nov 2006 06:45:04 +0100, Andreas J. Koenig wrote
Thank you. I'm wondering how this module is maintained currently.
Regards,
From BQW10602@nifty.com
On Fri, 03 Nov 2006 16:23:13 +0900, SADAHIRO Tomoyuki wrote
Here is a patch (encoding.patch.gz).
- if PL_encoding, regatom() recodes only octal and hexadecimal escapes.
Though these changes concern the functionality of encoding.pm
Regards,
From BQW10602@nifty.com
On Sat, 04 Nov 2006 21:02:10 +0900, SADAHIRO Tomoyuki wrote
Then the patch is renewed, encoding2.gz
- Test suites in the above patch emit messages including non-ASCII
Therefore the difference is only in messages from test suites.
Regards,
From @demerphq
On 11/4/06, SADAHIRO Tomoyuki <bqw10602@nifty.com> wrote:
Sadahiro San++
Yves
--
From @Tux
On Sat, 04 Nov 2006 21:53:50 +0900, SADAHIRO Tomoyuki <bqw10602@nifty.com>
All tests successful (1 subtest UNEXPECTEDLY SUCCEEDED), 39 tests and 340 subtests skipped.
--
@rgs - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#40641 (status was 'resolved')
Searchable as RT40641$