Report information
Id: 75680
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: BKB <benkasminbullock [at] gmail.com>
doug [at] ablegrape.com
hector [at] debian.org
Cc:
AdminCc:

Operating System: darwin
PatchStatus: (no value)
Severity: High
Type:
  • core
  • regex
  • Unicode
Perl Version: 5.8.9
Fixed In: (no value)



Subject: Malformed UTF-8 after Encode::decode (utf8, and regex $2, $3
Date: Thu, 24 Jul 2008 13:32:45 +0900
To: perlbug [...] perl.org
From: "Ben Bullock" <benkasminbullock [...] gmail.com>

Message body is not shown because it is too large.

This is a very much simplified version of the script which tripped the bug (five lines). I've also simplified the regex drastically until it trips the bug. Shortening the regex from this makes it print "OK" but as it stands the "Malformed UTF-8 character (fatal)" message appears.
tinytest.pl:

#! perl
use utf8;
if ('•¶' =~ /(.*?)\s*([A-Z])/s) {
    print "OK";
}
Subject: Regression on regexp from 5.8 to 5.10
Date: Mon, 22 Mar 2010 11:11:40 +0100
To: perlbug [...] perl.org
From: Hector Garcia <hector [...] debian.org>
This is a bug report for perl from hector@debian.org,
generated with the help of perlbug 1.39 running under perl 5.10.1.

-----------------------------------------------------------------
[Please describe your issue here]

Executing this (which works correctly on perl 5.8) gives an error:

#!/usr/bin/perl -w

use utf8;
use encoding 'utf8';

my $p = 'á d</p>';
#my $p = 'す d</p>';

print "$p\n";

if ($p =~ m#(.*?)[-]?EFE\s*</p>$#gsm) {
    print "yes $1\n";
}else{
    print "no\n";
}

hector@baloo:/tmp$ ./kk.pl
á d</p>
Malformed UTF-8 character (fatal) at ./kk.pl line 11.

The script fails for any utf8 definition of $p.

This regression has also been tested on a vanilla perl build on another server.

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=critical
---
Site configuration information for perl 5.10.1:

Configured by Debian Project at Sun Feb 7 16:19:05 UTC 2010.

Summary of my perl5 (revision 5 version 10 subversion 1) configuration:
  Platform:
    osname=linux, osvers=2.6.26-2-amd64, archname=i486-linux-gnu-thread-multi
    uname='linux biber 2.6.26-2-amd64 #1 smp tue jan 12 22:12:20 utc 2010 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1 -Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.1 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define,
d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=undef, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2 -g', cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion='', gccversion='4.4.3 20100108 (prerelease)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='cc', ldflags =' -fstack-protector -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib /usr/lib64 libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt perllibs=-ldl -lm -lpthread -lc -lcrypt libc=/lib/libc-2.10.2.so, so=so, useshrplib=true, libperl=libperl.so.5.10.1 gnulibc_version='2.10.2' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector' Locally applied patches: DEBPKG:debian/arm_thread_stress_timeout - http://bugs.debian.org/501970 Raise the timeout of ext/threads/shared/t/stress.t to accommodate slower build hosts DEBPKG:debian/cpan_config_path - Set location of CPAN::Config to /etc/perl as /usr may not be writable. DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN. DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check. DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information. DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories. 
DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes. DEBPKG:debian/extutils_hacks - Various debian-specific ExtUtils changes DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets. DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor. DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy. DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable. DEBPKG:debian/m68k_thread_stress - http://bugs.debian.org/495826 Disable some threads tests on m68k for now due to missing TLS. DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy DEBPKG:debian/perl_synopsis - http://bugs.debian.org/278323 Rearrange perl.pod DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need. DEBPKG:debian/use_gdbm - Explicitly link against -lgdbm_compat in ODBM_File/NDBM_File. DEBPKG:fixes/assorted_docs - http://bugs.debian.org/443733 [384f06a] Math::BigInt::CalcEmu documentation grammar fix DEBPKG:fixes/net_smtp_docs - http://bugs.debian.org/100195 [rt.cpan.org #36038] Document the Net::SMTP 'Port' option DEBPKG:fixes/processPL - http://bugs.debian.org/357264 [rt.cpan.org #17224] Always use PERLRUNINST when building perl modules. DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local DEBPKG:fixes/pod2man-index-backslash - http://bugs.debian.org/521256 Escape backslashes in .IX entries DEBPKG:debian/disable-zlib-bundling - Disable zlib bundling in Compress::Raw::Zlib DEBPKG:fixes/kfreebsd_cppsymbols - http://bugs.debian.org/533098 [3b910a0] Add gcc predefined macros to $Config{cppsymbols} on GNU/kFreeBSD. 
DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default. DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl. DEBPKG:fixes/kfreebsd-filecopy-pipes - http://bugs.debian.org/537555 [16f708c] Fix File::Copy::copy with pipes on GNU/kFreeBSD DEBPKG:fixes/anon-tmpfile-dir - http://bugs.debian.org/528544 [perl #66452] Honor TMPDIR when open()ing an anonymous temporary file DEBPKG:fixes/abstract-sockets - http://bugs.debian.org/329291 [89904c0] Add support for Abstract namespace sockets. DEBPKG:fixes/hurd_cppsymbols - http://bugs.debian.org/544307 [eeb92b7] Add gcc predefined macros to $Config{cppsymbols} on GNU/Hurd. DEBPKG:fixes/autodie-flock - http://bugs.debian.org/543731 Allow for flock returning EAGAIN instead of EWOULDBLOCK on linux/parisc DEBPKG:fixes/archive-tar-instance-error - http://bugs.debian.org/539355 [rt.cpan.org #48879] Separate Archive::Tar instance error strings from each other DEBPKG:fixes/positive-gpos - http://bugs.debian.org/545234 [perl #69056] [c584a96] Fix \\G crash on first match DEBPKG:debian/devel-ppport-ia64-optim - http://bugs.debian.org/548943 Work around an ICE on ia64 DEBPKG:debian/dynaloader-config - http://bugs.debian.org/549170 Make DynaLoader work without Config_heavy.pl again DEBPKG:fixes/trie-logic-match - http://bugs.debian.org/552291 [perl #69973] [0abd0d7] Fix a DoS in Unicode processing [CVE-2009-3626] DEBPKG:fixes/hppa-thread-eagain - http://bugs.debian.org/554218 make the threads-shared test suite more robust, fixing failures on hppa DEBPKG:fixes/crash-on-undefined-destroy - http://bugs.debian.org/564074 [perl #71952] [1f15e67] Fix a NULL pointer dereference when looking for a DESTROY method DEBPKG:patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.10.1-11 in patchlevel.h --- @INC for perl 5.10.1: /etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 
/usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl . --- Environment for perl 5.10.1: HOME=/home/hector LANG=es_ES.UTF-8 LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/opt/drbl/sbin:/opt/drbl/bin:/home/hector/bin:/opt/drbl/sbin:/opt/drbl/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games PERL_BADLANG (unset) SHELL=/bin/bash
CC: bugs-bitbucket [...] rt.perl.org
Subject: Re: [perl #73732] Regression on regexp from 5.8 to 5.10
Date: Mon, 22 Mar 2010 18:51:40 -0400
To: perl5-porters [...] perl.org
From: Eric Brine <ikegami [...] adaelis.com>
On Mon, Mar 22, 2010 at 6:13 AM, Hector Garcia <perlbug-followup@perl.org> wrote:
> executing this (which works correctly on perl 5.8 gives an error
> [...]
> if ($p =~ m#(.*?)[-]?EFE\s*</p>$#gsm) {
> [...]
> Malformed UTF-8 character (fatal) at ./kk.pl line 11.
Thanks for the report.

Workaround until this is fixed:

if ($p =~ m#(?:|(?!)\x{2660})(.*?)[-]?EFE\s*</p>$#sm) {

Note that I removed the /g. "if (/.../g)" rarely makes any sense and can
produce undesirable results.
CC: perl5-porters [...] perl.org, bugs-bitbucket [...] netlabs.develooper.com
Subject: Re: [perl #73732] Regression on regexp from 5.8 to 5.10
Date: Mon, 22 Mar 2010 21:47:07 -0600
To: Eric Brine <ikegami [...] adaelis.com>
From: karl williamson <public [...] khwilliamson.com>
Eric Brine wrote:
> Workaround until this is fixed:
>
> if ($p =~ m#(?:|(?!)\x{2660})(.*?)[-]?EFE\s*</p>$#sm) {
>
> Note that I removed the /g. "if (/.../g)" rarely makes any sense and can
> produce undesirable results.
I wonder if this is related to #46563: g suffix on string search (/.../g)
can cause string corruption

which is a won't fix
CC: perl5-porters [...] perl.org, bugs-bitbucket [...] netlabs.develooper.com
Subject: Re: [perl #73732] Regression on regexp from 5.8 to 5.10
Date: Tue, 23 Mar 2010 11:02:46 -0400
To: karl williamson <public [...] khwilliamson.com>
From: Eric Brine <ikegami [...] adaelis.com>
On Mon, Mar 22, 2010 at 11:47 PM, karl williamson <public@khwilliamson.com> wrote:
> I wonder if this is related to #46563: g suffix on string search (/.../g)
> can cause string corruption
>
> which is a won't fix
The /g is not germane to the bug. The workaround wasn't the removal of the /g; it was the addition of a >8-bit character to the pattern.
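The point about the workaround prefix can be illustrated with a small comparison. This is only a sketch: first_capture is a made-up helper, the inputs are plain ASCII so the check does not depend on the bug itself, and brace delimiters are used instead of the report's m#...#.

```perl
use strict;
use warnings;

# The report's pattern, and the same pattern with the workaround prefix.
# (?!) can never match, so the (?:|...) group always takes its empty branch;
# the \x{2660} is never matched against the input, it only puts a wide
# character into the pattern.
my $plain  = qr{(.*?)[-]?EFE\s*</p>$}sm;
my $padded = qr{(?:|(?!)\x{2660})(.*?)[-]?EFE\s*</p>$}sm;

# first_capture is a hypothetical helper: $1 on a match, undef otherwise.
sub first_capture {
    my ($str, $re) = @_;
    return $str =~ $re ? $1 : undef;
}

for my $input ("a d</p>", "x EFE </p>", "-EFE</p>") {
    my $a = first_capture($input, $plain);
    my $b = first_capture($input, $padded);
    my $same = (defined $a ? "[$a]" : "undef") eq (defined $b ? "[$b]" : "undef");
    printf "%-12s %s\n", $input, $same ? "same result" : "DIFFERENT";
}
```

If the two patterns ever disagreed on an input, the prefix would be changing what is matched rather than only how the pattern is stored.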
CC: Eric Brine <ikegami [...] adaelis.com>, perl5-porters [...] perl.org, bugs-bitbucket [...] netlabs.develooper.com
Subject: Re: [perl #73732] Regression on regexp from 5.8 to 5.10
Date: Tue, 23 Mar 2010 20:41:24 +0000
To: karl williamson <public [...] khwilliamson.com>
From: Nicholas Clark <nick [...] ccl4.org>
On Mon, Mar 22, 2010 at 09:47:07PM -0600, karl williamson wrote:
> I wonder if this is related to #46563: g suffix on string search
> (/.../g) can cause string corruption
>
> which is a won't fix

http://rt.perl.org/rt3/Ticket/Display.html?id=46563

For now and for older perls this bug is firmly in the "wont fix"
category. Sorry.

It wasn't yet described as a "won't" fix if it's still in current blead.
(I couldn't seem to replicate it even on 5.10.0, so I'm not sure what the
state of the bug is)

Nicholas Clark
Subject: Re: [perl #73732] Regression on regexp from 5.8 to 5.10
Date: Tue, 23 Mar 2010 14:58:58 -0600
To: karl williamson <public [...] khwilliamson.com>, Eric Brine <ikegami [...] adaelis.com>, perl5-porters [...] perl.org, bugs-bitbucket [...] netlabs.develooper.com
From: karl williamson <public [...] khwilliamson.com>
Nicholas Clark wrote:
> On Mon, Mar 22, 2010 at 09:47:07PM -0600, karl williamson wrote:
>> I wonder if this is related to #46563: g suffix on string search
>> (/.../g) can cause string corruption
>>
>> which is a won't fix
>
> http://rt.perl.org/rt3/Ticket/Display.html?id=46563
>
> For now and for older perls this bug is firmly in the "wont fix"
> category. Sorry.
>
> It wasn't yet described as a "won't" fix if it's still in current blead.
> (I couldn't seem to replicate it even on 5.10.0, so I'm not sure what the
> state of the bug is)
>
> Nicholas Clark
I just tried it, and it is still a bug in 5.12RC0.
On Mon. Mar. 22 20:47:43 2010, public@khwilliamson.com wrote:
> I wonder if this is related to #46563: g suffix on string search
> (/.../g) can cause string corruption
>
> which is a won't fix
The /g isn't the problem:

---------------------------------------------
#!/usr/bin/perl -w

use utf8;
use encoding 'utf8';

my $p = 'á d</p>';
#my $p = 'す d</p>';

print "$p\n";

if ($p =~ m#(.*?)[-]?EFE\s*</p>$#sm) {
    print "yes $1\n";
}else{
    print "no\n";
}
---------------------------------------------

$ perl problem.pl
á d</p>
Malformed UTF-8 character (fatal) at kk.pl line 11.

And "m#(?:|(?!)\x{2660})(.*?)[-]?EFE\s*</p>$#sm" isn't a real workaround. This was only an example of the problem.

If we change the (.*) to (\X*), it works. So we think there is some problem with wide characters and the '.' in regular expressions. Surprisingly, it works with 5.8.

We could fix it with this patch:

--- regcomp.c.OLD 2010-03-24 10:15:59.381767760 +0100
+++ regcomp.c 2010-03-24 10:17:03.068877134 +0100
@@ -6932,7 +6932,7 @@
 	    ret = reg_node(pRExC_state, SANY);
 	else
 	    ret = reg_node(pRExC_state, REG_ANY);
-	*flagp |= HASWIDTH|SIMPLE;
+	*flagp |= HASWIDTH;
 	RExC_naughty++;
 	Set_Node_Length(ret, 1); /* MJD */
 	break;

Any idea?

Cheers,
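The '.' versus \X comparison described in this message can be scripted as a small regression check. This is a sketch under stated assumptions: try_match is a made-up helper, utf8::upgrade stands in for "use encoding 'utf8'" as a way of getting a UTF-8-flagged string, and brace delimiters replace m#...#. On a perl where the bug is fixed, neither pattern should die.

```perl
use strict;
use warnings;

# try_match is a hypothetical helper: it runs one pattern against one string
# and reports "match", "no match", or the error text if the match died.
sub try_match {
    my ($str, $re) = @_;
    my $ok = eval { $str =~ $re ? 1 : 0 };
    return "died: $@" unless defined $ok;
    return $ok ? "match" : "no match";
}

# Same characters as the report's $p; utf8::upgrade forces the internal
# UTF-8 representation (an assumption, not what the original script did).
my $p = "\x{e1} d</p>";
utf8::upgrade($p);

my @patterns = (
    [ '(.*?)' => qr{(.*?)[-]?EFE\s*</p>$}sm ],   # pattern from the report
    [ '(\X*)' => qr{(\X*)[-]?EFE\s*</p>$}sm ],   # variant said to work
);

for my $entry (@patterns) {
    my ($label, $re) = @$entry;
    printf "%-6s %s\n", $label, try_match($p, $re);
}
```

Wrapping the match in eval keeps a fatal "Malformed UTF-8 character" error from ending the script, so both patterns are always reported.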
Subject: Re: [perl #73732] perlbug AutoReply: Regression on regexp from 5.8 to 5.10
Date: Wed, 24 Mar 2010 09:31:16 +0100
To: perlbug-followup [...] perl.org
From: Hector Garcia <hector [...] debian.org>
This bug has nothing to do with bug 46563. If you take out the /g from the example I originally sent, you'll see the bug is still there. Thanks
CC: Eric Brine <ikegami [...] adaelis.com>, perl5-porters [...] perl.org, bugs-bitbucket [...] netlabs.develooper.com
Subject: Re: [perl #73732] Regression on regexp from 5.8 to 5.10
Date: Wed, 24 Mar 2010 21:48:41 +0000
To: karl williamson <public [...] khwilliamson.com>
From: Dave Mitchell <davem [...] iabyn.com>
On Tue, Mar 23, 2010 at 02:58:58PM -0600, karl williamson wrote:
> I just tried it, and it is still a bug in 5.12RC0.
And here is a minimal(ish) case that triggers a 'Malformed UTF-8 character'
warning:

    $_ = "\x{e1} d</p>\x{100}";
    chop $_;
    print "match\n" if m{(.*?)-\s</p>$};

--
You're only as old as you look.
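The append-then-chop step in this test case leaves the string in Perl's internal UTF-8 form even though every remaining character fits in 8 bits. A minimal sketch of that effect, using only core utf8:: functions (the variable names are illustrative, and the failing match itself is not run):

```perl
use strict;
use warnings;

# Build the string the same way as the test case: appending U+0100 forces
# the internal UTF-8 representation, and chop removes that character again.
my $chopped = "\x{e1} d</p>\x{100}";
chop $chopped;

# The same characters written directly stay in the 8-bit representation.
my $plain = "\x{e1} d</p>";

printf "chopped: utf8 flag %s, length %d\n",
    (utf8::is_utf8($chopped) ? "on" : "off"), length($chopped);
printf "plain:   utf8 flag %s, length %d\n",
    (utf8::is_utf8($plain) ? "on" : "off"), length($plain);

# Both compare equal as character strings; only the storage differs.
print "equal as strings\n" if $chopped eq $plain;
```

This is why the test case needs no pragma: the storage form alone decides which regex code path the string takes.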
Subject: Certain regex patterns cause fatal errors with valid UTF-8
Date: Fri, 11 Jun 2010 12:07:04 -0700
To: perlbug [...] perl.org
From: Doug Cook <doug [...] ablegrape.com>
This is a bug report for perl from doug@ablegrape.com,
generated with the help of perlbug 1.39 running under perl v5.8.9.

-----------------------------------------------------------------
My program worked fine under previous versions of Perl on MacOS (prior to Snow Leopard).

Now it dies under 5.8.9, 5.10.0 and 5.12.1, with "Malformed UTF-8 character (fatal)" - but the input data is the same, and is, as far as I can tell, perfectly valid UTF-8.

I've isolated the failure to a test case, included here, which shows a simple expression that works, two (very) slightly more complex expressions that fail, and the original complex expression from my code. As far as I can tell, all of these should work. Oddly, if I add "use encoding 'utf8'" even the simple regex fails.

My best guess is that perhaps for some reason the regex engine is backing up by bytes within my string, and starting in the middle of a character. The string itself is perfectly valid.

#!/usr/bin/perl

use strict vars;
use utf8;
binmode STDOUT, ":utf8";

my $e = "Böck";

if (utf8::is_utf8($e)) { print "yep, is UTF8: $e\n"; }

# this succeeds (failed before with use encoding 'utf8', unknown why)
if ($e=~ m/.*?[x]$/) { print "matched simple\n"; }
print "success with simple\n";

# these die
if ($e=~ m/.*?\p{Space}$/i) { print "matched medium\n"; }
print "success with medium\n";
if ($e=~ m/.*?[xyz]$/) { print "matched medium\n"; }
print "success with medium\n";

# the original, full expression.
if ($e =~ m/(.*?)[,\p{isSpace}]+((?:\p{isAlpha}[\p{isSpace}\.]{1,2})+)\p{isSpace}*$/) { print "matched complex\n"; }
print "success with complex\n";

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=critical
---
Site configuration information for perl v5.8.9:

Configured by _postfix at Wed Jun 24 00:32:40 PDT 2009.
Summary of my perl5 (revision 5 version 8 subversion 9) configuration: Platform: osname=darwin, osvers=10.0, archname=darwin-thread-multi-2level uname='darwin neige.apple.com 10.0 darwin kernel version 10.0.0d8: tue may 5 19:29:59 pdt 2009; root:xnu-1437.2~2release_i386 i386 ' config_args='-ds -e -Dprefix=/usr -Dccflags=-g -pipe -Dldflags= -Dman3ext=3pm -Duseithreads -Duseshrplib -Dinc_version_list=none -Dcc=gcc-4.2' hint=recommended, useposix=true, d_sigaction=define usethreads=define use5005threads=undef useithreads=define usemultiplicity=define useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=define use64bitall=define uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='gcc-4.2', ccflags ='-arch i386 -arch ppc -g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -I/usr/local/include', optimize='-Os', cppflags='-g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -I/usr/local/include' ccversion='', gccversion='4.2.1 (Apple Inc. build 5646)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='gcc-4.2 -mmacosx-version-min=10.6', ldflags ='-arch i386 -arch ppc -L/usr/local/lib' libpth=/usr/local/lib /usr/lib libs=-ldbm -ldl -lm -lutil -lc perllibs=-ldl -lm -lutil -lc libc=/usr/lib/libc.dylib, so=dylib, useshrplib=true, libperl=libperl.dylib gnulibc_version='' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' ' cccdlflags=' ', lddlflags='-arch i386 -arch ppc -bundle -undefined dynamic_lookup -L/usr/local/lib' Locally applied patches: /Library/Perl/Updates/<version> comes before system perl directories installprivlib and installarchlib points to the Updates directory 6576362: fixed 5.8.9 binary compatibility issue: perlio mutex not initialized --- @INC for perl 
v5.8.9: /Library/Perl/Updates/5.8.9 /System/Library/Perl/5.8.9/darwin-thread-multi-2level /System/Library/Perl/5.8.9 /Library/Perl/5.8.9/darwin-thread-multi-2level /Library/Perl/5.8.9 /Network/Library/Perl/5.8.9/darwin-thread-multi-2level /Network/Library/Perl/5.8.9 /Network/Library/Perl /System/Library/Perl/Extras/5.8.9/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.9 /Library/Perl/5.8.8 /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6 /Library/Perl/5.8.1 . --- Environment for perl v5.8.9: DYLD_LIBRARY_PATH (unset) HOME=/Users/cook LANG=en_US.UTF-8 LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin:/opt/subversion/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/local/mysql/bin:/sw/bin:/Volumes/SEA_DISC/NutchStuff/nutch//my_scripts:/opt/local/bin PERL_BADLANG (unset) SHELL=/bin/bash
FYI, discussion of this bug on Perlmonks: http://www.perlmonks.org/?node_id=843208
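The guess in the report, that the engine steps back by bytes and lands in the middle of a character, is easier to picture with the byte layout of the test string. The sketch below is illustrative only: the variable names are made up, and it says nothing about what the regex engine actually does internally. It encodes "Böck" to UTF-8 bytes and checks whether byte sequences starting at offsets 1 and 2 are well-formed.

```perl
use strict;
use warnings;

my $e = "B\x{f6}ck";    # "Böck" as four characters

# Encode a copy to its UTF-8 bytes: B, 0xC3 0xB6 (the two bytes of ö), c, k.
my $bytes = $e;
utf8::encode($bytes);
printf "%d characters, %d bytes: %s\n",
    length($e), length($bytes),
    join(' ', map { sprintf('%02X', ord($_)) } split(//, $bytes));

# Byte offset 1 is a character boundary (the lead byte of ö)...
my $on_boundary = substr($bytes, 1);
# ...but byte offset 2 is the continuation byte 0xB6, i.e. mid-character.
my $mid_char = substr($bytes, 2);

# utf8::decode returns false when the bytes are not well-formed UTF-8.
printf "offset 1 decodes: %s\n", utf8::decode($on_boundary) ? "yes" : "no";
printf "offset 2 decodes: %s\n", utf8::decode($mid_char)    ? "yes" : "no";
```

A scan that resumed at offset 2 would therefore see a stray continuation byte, which is the kind of input that produces a "Malformed UTF-8 character" complaint even though the full string is valid.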
CC: bugs-bitbucket [...] rt.perl.org
Subject: Re: [perl #75680] Certain regex patterns cause fatal errors with valid UTF-8
Date: Sat, 12 Jun 2010 09:14:44 -0400
To: perl5-porters [...] perl.org
From: "Chas. Owens" <chas.owens [...] gmail.com>
As a workaround, I suggest you use the \x{} literal escape:

my $e = "B\x{f6}ck";

It seems to work on my OS X machines.
-- Chas. Owens wonkden.net The most important skill a programmer can have is the ability to read.
CC: perl5-porters [...] perl.org, bugs-bitbucket [...] netlabs.develooper.com
Subject: Re: [perl #75680] Certain regex patterns cause fatal errors with valid UTF-8
Date: Sun, 13 Jun 2010 09:22:57 -0600
To: "Chas. Owens" <chas.owens [...] gmail.com>
From: karl williamson <public [...] khwilliamson.com>
Chas. Owens wrote:
> As a work around, I suggest you use the \x{} literal escape:
>
> my $e = "B\x{f6}ck";
>
> It seems to work on my OS X machines.
Unfortunately, this workaround only works because it avoids upgrading $e to utf8. If you use "B\x{101}ck" instead, the malformed-character error remains. Also, because of an unrelated bug, /i matching will not work properly for \x{f6}.
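The difference between a literal that stays in the 8-bit representation and one that is upgraded can be checked with utf8::is_utf8. A small sketch, assuming that neither "use utf8" nor "use encoding" is in effect (the variable names are illustrative):

```perl
use strict;
use warnings;

my $latin1 = "B\x{f6}ck";     # all code points <= 0xFF: not upgraded
my $wide   = "B\x{101}ck";    # U+0101 does not fit in a byte: upgraded

for my $pair ([ '\x{f6}' => $latin1 ], [ '\x{101}' => $wide ]) {
    my ($label, $str) = @$pair;
    printf "%-8s utf8 flag is %s\n", $label,
        (utf8::is_utf8($str) ? "on" : "off");
}

# Forcing the upgrade keeps the same characters but switches the storage
# to UTF-8, the form a "Böck" literal has under "use utf8".
my $upgraded = $latin1;
utf8::upgrade($upgraded);
print "upgraded copy: flag on, still equal\n"
    if utf8::is_utf8($upgraded) && $upgraded eq $latin1;
```

So the \x{f6} form sidesteps the failing code path rather than fixing anything, and any string that ends up upgraded can still hit the bug on an affected perl.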
> On Fri, Jun 11, 2010 at 15:15, Doug Cook <perlbug-followup@perl.org> wrote:
>> # New Ticket Created by Doug Cook
>> # Please include the string: [perl #75680]
>> # in the subject line of all future correspondence about this issue.
>> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=75680 >
>>
>> This is a bug report for perl from doug@ablegrape.com,
>> generated with the help of perlbug 1.39 running under perl v5.8.9.
>>
>> -----------------------------------------------------------------
>> My program worked fine under previous versions of Perl on MacOS (prior to Snow Leopard).
>>
>> Now it dies under 5.8.9, 5.10.0 and 5.12.1, with "Malformed UTF-8 character (fatal)" - but the input data is the same, and is, as far as I can tell, perfectly valid UTF-8.
>>
>> I've isolated the failure to a test case, included here, which shows a simple expression that works, two (very) slightly more complex expressions that fail, and the original complex expression from my code. As far as I can tell, all of these should work. Oddly, if I add "use encoding 'utf8'" even the simple regex fails.
>>
>> My best guess is that perhaps for some reason the regex engine is backing up by bytes within my string, and starting in the middle of a character. The string itself is perfectly valid.
>>
>> #!/usr/bin/perl
>>
>> use strict vars;
>> use utf8;
>> binmode STDOUT, ":utf8";
>>
>> my $e = "Böck";
>>
>> if (utf8::is_utf8($e)) { print "yep, is UTF8: $e\n"; }
>>
>> # this succeeds (failed before with use encoding 'utf8', unknown why)
>> if ($e=~ m/.*?[x]$/) { print "matched simple\n"; }
>> print "success with simple\n";
>>
>> # these die
>> if ($e=~ m/.*?\p{Space}$/i) { print "matched medium\n"; }
>> print "success with medium\n";
>> if ($e=~ m/.*?[xyz]$/) { print "matched medium\n"; }
>> print "success with medium\n";
>>
>> # the original, full expression.
>> if ($e =~ m/(.*?)[,\p{isSpace}]+((?:\p{isAlpha}[\p{isSpace}\.]{1,2})+)\p{isSpace}*$/) { print "matched complex\n"; }
>> print "success with complex\n";
>>
>> [Please do not change anything below this line]
>> -----------------------------------------------------------------
>> ---
>> Flags:
>>     category=core
>>     severity=critical
>> ---
>> Site configuration information for perl v5.8.9:
>>
>> Configured by _postfix at Wed Jun 24 00:32:40 PDT 2009.
>>
>> Summary of my perl5 (revision 5 version 8 subversion 9) configuration:
>>   Platform:
>>     osname=darwin, osvers=10.0, archname=darwin-thread-multi-2level
>>     uname='darwin neige.apple.com 10.0 darwin kernel version 10.0.0d8: tue may 5 19:29:59 pdt 2009; root:xnu-1437.2~2release_i386 i386 '
>>     config_args='-ds -e -Dprefix=/usr -Dccflags=-g -pipe -Dldflags= -Dman3ext=3pm -Duseithreads -Duseshrplib -Dinc_version_list=none -Dcc=gcc-4.2'
>>     hint=recommended, useposix=true, d_sigaction=define
>>     usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
>>     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
>>     use64bitint=define use64bitall=define uselongdouble=undef
>>     usemymalloc=n, bincompat5005=undef
>>   Compiler:
>>     cc='gcc-4.2', ccflags ='-arch i386 -arch ppc -g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -I/usr/local/include',
>>     optimize='-Os',
>>     cppflags='-g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -I/usr/local/include'
>>     ccversion='', gccversion='4.2.1 (Apple Inc. build 5646)', gccosandvers=''
>>     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
>>     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
>>     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
>>     alignbytes=8, prototype=define
>>   Linker and Libraries:
>>     ld='gcc-4.2 -mmacosx-version-min=10.6', ldflags ='-arch i386 -arch ppc -L/usr/local/lib'
>>     libpth=/usr/local/lib /usr/lib
>>     libs=-ldbm -ldl -lm -lutil -lc
>>     perllibs=-ldl -lm -lutil -lc
>>     libc=/usr/lib/libc.dylib, so=dylib, useshrplib=true, libperl=libperl.dylib
>>     gnulibc_version=''
>>   Dynamic Linking:
>>     dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
>>     cccdlflags=' ', lddlflags='-arch i386 -arch ppc -bundle -undefined dynamic_lookup -L/usr/local/lib'
>>
>> Locally applied patches:
>>     /Library/Perl/Updates/<version> comes before system perl directories
>>     installprivlib and installarchlib points to the Updates directory
>>     6576362: fixed 5.8.9 binary compatibility issue: perlio mutex not initialized
>>
>> ---
>> @INC for perl v5.8.9:
>>     /Library/Perl/Updates/5.8.9
>>     /System/Library/Perl/5.8.9/darwin-thread-multi-2level
>>     /System/Library/Perl/5.8.9
>>     /Library/Perl/5.8.9/darwin-thread-multi-2level
>>     /Library/Perl/5.8.9
>>     /Network/Library/Perl/5.8.9/darwin-thread-multi-2level
>>     /Network/Library/Perl/5.8.9
>>     /Network/Library/Perl
>>     /System/Library/Perl/Extras/5.8.9/darwin-thread-multi-2level
>>     /System/Library/Perl/Extras/5.8.9
>>     /Library/Perl/5.8.8
>>     /Library/Perl/5.8.6/darwin-thread-multi-2level
>>     /Library/Perl/5.8.6
>>     /Library/Perl/5.8.1
>>     .
>>
>> ---
>> Environment for perl v5.8.9:
>>     DYLD_LIBRARY_PATH (unset)
>>     HOME=/Users/cook
>>     LANG=en_US.UTF-8
>>     LANGUAGE (unset)
>>     LD_LIBRARY_PATH (unset)
>>     LOGDIR (unset)
>>     PATH=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin:/opt/subversion/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/local/mysql/bin:/sw/bin:/Volumes/SEA_DISC/NutchStuff/nutch//my_scripts:/opt/local/bin
>>     PERL_BADLANG (unset)
>>     SHELL=/bin/bash
Subject: Re: [perl #75680] Certain regex patterns cause fatal errors with valid UTF-8
Date: Sun, 13 Jun 2010 11:45:36 +0200
To: perl5-porters [...] perl.org
From: "Dr.Ruud" <rvtol+usenet [...] isolution.nl>
Doug Cook wrote:
> My program worked fine under previous versions of Perl on MacOS (prior to Snow Leopard).
>
> Now it dies under 5.8.9, 5.10.0 and 5.12.1, with "Malformed UTF-8 character (fatal)" - but the input data is the same, and is, as far as I can tell, perfectly valid UTF-8.
It could well be that your editor saves the source as either UTF-8 or ISO-8859-1. Did you check the input data at the byte level?

--
Ruud
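One way to act on this suggestion is to hex-dump the raw octets and ask Encode whether they decode as strict UTF-8. This is a hedged sketch rather than anything posted in the thread: the helper names is_valid_utf8 and hex_dump are made up for illustration, and the two sample byte strings are simply "Böck" as it would be saved in each of the two encodings.

```perl
use strict;
use warnings;
use Encode ();

# Returns 1 if the octet string is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($octets) = @_;
    my $copy = $octets;    # decode() with a CHECK value may modify its argument
    return eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Renders each octet as two hex digits, separated by spaces.
sub hex_dump {
    my ($octets) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $octets;
}

my $utf8_bytes   = "B\xC3\xB6ck";    # "Böck" saved as UTF-8
my $latin1_bytes = "B\xF6ck";        # "Böck" saved as ISO-8859-1

for my $octets ($utf8_bytes, $latin1_bytes) {
    printf "%-16s %s\n", hex_dump($octets),
        is_valid_utf8($octets) ? "valid UTF-8" : "not valid UTF-8";
}
```

To check a real file, the same two helpers could be applied to data read through a filehandle opened with the :raw layer, so that no decoding happens before the check.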
According to Yves, this was fixed by commit v5.13.4-25-g92f3d48. --Steffen
Subject: Re: [perl #73732] Regression on regexp from 5.8 to 5.10
Date: Sun, 5 Sep 2010 14:52:14 -0700
To: perl5-porters [...] perl.org
From: Father Chrysostomos <sprout [...] cpan.org>
This appears to have been fixed. It may be the same bug as #75680.
On Sun Sep 05 14:52:42 2010, sprout wrote:
> This appears to have been fixed. It may be the same bug as #75680.
Yes, it is the same. I’m marking this as resolved.
On Tue Jul 29 19:46:08 2008, BKB wrote:
> This is a very much simplified version of the script which tripped the
> bug (five lines). I've also simplified the regex drastically until it
> trips the bug. Shortening the regex from this makes it print "OK" but as
> it stands the "Malformed UTF-8 character (fatal)" message appears.
Thank you for your report.

You have ‘use utf8’ in your script, which signals to perl that your source code is in UTF-8.

But then you have a string containing the octets 95 B6, which is not valid UTF-8. This results in an invalid scalar, so perl croaks. This behaviour is correct.

You do not need ‘use utf8’ if you are just *using* Unicode strings.
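Taking the octets 95 B6 quoted above at face value, the claim that they are not valid UTF-8 can be checked independently: both bytes have the bit pattern 10xxxxxx, which UTF-8 reserves for continuation bytes, so neither can begin a character. A small sketch follows; the variable names and the helper is_continuation_byte are illustrative only.

```perl
use strict;
use warnings;

# The two octets named in the reply above.
my $octets = "\x95\xB6";

# In UTF-8, a byte of the form 10xxxxxx can only continue a multi-byte
# sequence; it can never be the first byte of a character.
sub is_continuation_byte {
    my ($byte) = @_;
    return (($byte & 0xC0) == 0x80) ? 1 : 0;
}

for my $byte (unpack 'C*', $octets) {
    printf "%02X is %s\n", $byte,
        is_continuation_byte($byte) ? "a continuation byte" : "a possible lead byte";
}

# utf8::decode returns false when its argument is not well-formed UTF-8.
my $copy = $octets;
my $well_formed = utf8::decode($copy) ? 1 : 0;
print "utf8::decode accepted the octets: ", ($well_formed ? "yes" : "no"), "\n";
```

This says nothing about how those octets ended up in the downloaded test file, only that a source file containing them would not be well-formed UTF-8 under ‘use utf8’.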
Subject: Re: [perl #57234] Malformed UTF-8 after Encode::decode (utf8, and regex $2, $3
Date: Mon, 20 Sep 2010 07:42:38 +0900
To: perlbug-followup [...] perl.org
From: Ben Bullock <benkasminbullock [...] gmail.com>
I'm pretty sure I filed a very much simpler example of this bug after that one (it was more than two years ago).

I don't think there was anything wrong with the utf8 etc., that is just smoke-blowing.

On 20 September 2010 05:48, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:
> On Tue Jul 29 19:46:08 2008, BKB wrote:
>> This is a very much simplified version of the script which tripped the
>> bug (five lines). I've also simplified the regex drastically until it
>> trips the bug. Shortening the regex from this makes it print "OK" but as
>> it stands the "Malformed UTF-8 character (fatal)" message appears.
>
> Thank you for your report.
>
> You have ‘use utf8’ in your script, which signals to perl that your
> source code is in UTF-8.
>
> But then you have a string containing the octets 95 B6, which is not
> valid UTF-8. This results in an invalid scalar, so perl croaks. This
> behaviour is correct.
>
> You do not need ‘use utf8’ if you are just *using* Unicode strings.
RT-Send-CC: perl5-porters [...] perl.org
On Sun Sep 19 21:21:17 2010, BKB wrote:
> I'm pretty sure I filed a very much simpler example of this bug after
> that one (it was more than two years ago).
>
> I don't think there was anything wrong with the utf8 etc., that is
> just smoke-blowing.
I only looked at your reduced case at first. It was failing for the reason I mentioned.

Your original script can be reduced to:

perl -le' "(n) (See \x{7a93}\x{8ca9}) over the counter sales (often of financial packages)" =~ /(.*?)\s*([A-Z]{2}[12]?)\s*$/s'

It is the same as 75680 and 73732, which were fixed by 92f3d4829170316374b610b3fc665389803d93f8.
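The reduced case above can be wrapped in an eval so that a script reports the fatal error instead of dying, which is convenient when comparing several perl builds. This is a sketch under the assumption that the string and pattern are exactly those of the one-liner; the variable names are illustrative.

```perl
use strict;
use warnings;

# String and pattern copied from the reduced one-liner.
my $text = "(n) (See \x{7a93}\x{8ca9}) over the counter sales (often of financial packages)";

my $matched  = 0;
my $survived = eval {
    $matched = ($text =~ /(.*?)\s*([A-Z]{2}[12]?)\s*$/s) ? 1 : 0;
    1;    # reached only if the match did not die
};

if ($survived) {
    print "no fatal error; matched=$matched\n";
}
else {
    print "match died: $@";
}
```

On a perl that contains the fix, the match should simply fail without an error, since the string does not end in two capital letters; an affected perl would be expected to take the "match died" branch with the "Malformed UTF-8 character (fatal)" message.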
CC: "OtherRecipients of perl Ticket #57234:;" [...] smtp.indra.com, perl5-porters [...] perl.org
Subject: Re: [perl #57234] Malformed UTF-8 after Encode::decode (utf8, and regex $2, $3
Date: Mon, 20 Sep 2010 09:49:18 -0600
To: perlbug-followup [...] perl.org
From: karl williamson <public [...] khwilliamson.com>
Father Chrysostomos via RT wrote:
> On Sun Sep 19 21:21:17 2010, BKB wrote:
>> I'm pretty sure I filed a very much simpler example of this bug after
>> that one (it was more than two years ago).
>>
>> I don't think there was anything wrong with the utf8 etc., that is
>> just smoke-blowing.
>
> I only looked at your reduced case at first. It was failing for the
> reason I mentioned.
>
> Your original script can be reduced to:
>
> perl -le' "(n) (See \x{7a93}\x{8ca9}) over the counter sales (often of
> financial packages)" =~ /(.*?)\s*([A-Z]{2}[12]?)\s*$/s'
>
> It is the same as 75680 and 73732, which were fixed by
> 92f3d4829170316374b610b3fc665389803d93f8.
And this fix made it into 5.12.2, which is now an official Perl release available at http://search.cpan.org/~jesse/perl-5.12.2/
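For code that has to run on older installations, the version information in this ticket can be turned into a runtime check against $]. This is a rough sketch only. The function name has_regex_fix is hypothetical, and the 5.13.x boundary is an assumption inferred from the commit description v5.13.4-25-g92f3d48 quoted earlier in the ticket (that is, that 5.13.5 is the first development release containing the commit); only the 5.12.2 boundary is stated outright in the thread.

```perl
use strict;
use warnings;

# Takes a numeric perl version in the format of $] (e.g. 5.012002).
# Returns 1 if that version is expected to contain the fix, 0 otherwise.
sub has_regex_fix {
    my ($version) = @_;
    return 0 if $version < 5.012002;    # before 5.12.2: not fixed
    # Assumption: development releases 5.13.0 through 5.13.4 predate the commit.
    return 0 if $version >= 5.013000 && $version < 5.013005;
    return 1;
}

printf "perl %s: %s\n", $],
    has_regex_fix($]) ? "expected to contain the fix" : "may be affected";
```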

