Certain regex patterns cause fatal errors with valid UTF-8 #10434

Closed
p5pRT opened this issue Jun 11, 2010 · 31 comments
Comments

p5pRT commented Jun 11, 2010

Migrated from rt.perl.org#75680 (status was 'resolved')

Searchable as RT75680$

p5pRT commented Jul 24, 2008

From @benkasminbullock

This is a bug report for perl from benkasminbullock@​gmail.com,
generated with the help of perlbug 1.36 running under perl 5.10.0.

The following script, run on Cygwin, prints this error message:

Malformed UTF-8 character (fatal) at ./wwwjdicbug.pl line 75.

However, the UTF-8 character which is claimed to be malformed comes
from an Encode::decode ('utf8',...) call and is then captured by a
regular expression match ($3), so this seems to be a bug in Perl.
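
The claim above, that a string produced by Encode::decode should already be well-formed, can be checked independently of the regex engine. The following is a minimal sketch, not part of the original report: the octets are the UTF-8 encoding of the search key 窓口, and the strict 'UTF-8' decoder with FB_CROAK is an assumption chosen so that malformed input would die at decode time rather than later.

```perl
use strict;
use warnings;
use Encode ();

# UTF-8 octets for 窓口 (U+7A93 U+53E3), as they might arrive from the network.
my $octets = "\xE7\xAA\x93\xE5\x8F\xA3";

# 'UTF-8' is Encode's strict decoder. FB_CROAK makes malformed input die
# instead of being silently replaced; LEAVE_SRC keeps $octets unmodified.
my $text = Encode::decode('UTF-8', $octets,
                          Encode::FB_CROAK() | Encode::LEAVE_SRC());

# utf8::is_utf8 reports the internal UTF-8 flag, utf8::valid checks that the
# internal representation is consistent.
my $is_character_string = utf8::is_utf8($text) ? 1 : 0;
my $is_well_formed      = utf8::valid($text)   ? 1 : 0;
my $length              = length $text;

print "flagged=$is_character_string valid=$is_well_formed length=$length\n";
```

If a string passes these checks and a later match against it still dies with "Malformed UTF-8 character (fatal)", the input is unlikely to be at fault.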

######### wwwjdicbug.pl

#! perl
use warnings;
use strict;
use URI::Escape 'uri_escape_utf8';
use Encode qw/encode decode/;

package WWWJDIC;
use LWP::UserAgent;
use HTML::TreeBuilder;
use Encode qw/encode decode/;
use URI::Escape;
use utf8;

my %mirrors = (
  japan => 'http://www.aa.tufs.ac.jp/~jwb/cgi-bin/wwwjdic.cgi',
);
my %dictionaries = ();
my %codes = ();

sub new
{
  my %options = @_;
  my $wwwjdic = {};
  if ($options{mirror}) {
    my $mirror = lc $options{mirror};
    if ($mirrors{$mirror}) {
      $wwwjdic->{site} = $mirrors{$mirror};
    } else {
      print STDERR __PACKAGE__,": unknown mirror '$options{mirror}': using Australian site\n";
    }
  } else {
    $wwwjdic->{site} = $mirrors{australia};
  }
  $wwwjdic->{user_agent} = LWP::UserAgent->new;
  $wwwjdic->{user_agent}->agent(__PACKAGE__);
  bless $wwwjdic;
  return $wwwjdic;
}

# Parse a page of results from WWWJDIC

sub parse_results
{
  my ($wwwjdic, $contents) = @_;
  $contents = decode ('utf8', $contents);
  print $contents;
  my $tree = HTML::TreeBuilder->new();
  $tree->parse ($contents);

  my @labels = $tree->look_down ('_tag', 'label');
  my @inputs = $tree->look_down ('_tag', 'input');
  my %fors;
  my @valid;
  for my $input (@inputs) {
    if ($input->attr('name') && $input->attr('name') eq 'jukugosel'
        && $input->attr('id')) {
      $fors{$input->attr('id')} = $input;
    }
  }
  @valid = grep {$fors{$_->attr('for')}} @labels;
  for my $line (@valid) {
    my %results;
    $results{wwwjdic_id} = $line->attr('id');
    my $text = $line->as_text;
    print $text,"\n";
    $results{text} = $text;
    if ($text =~ /^(.*?)\s*【\s*(.*?)\s*】\s*(.*?)\s*$/) {
      $results{kanji} = $1;
      $results{reading} = $2;
      $results{meaning} = $3;
    } else {
      print "Unreadable line '$text'\n";
    }
    # Get the dictionary from the end of the string.
    if ($results{meaning} &&
        $results{meaning} =~ /(.*?)\s*([A-Z]{2}[12]?)\s*$/s) {
      $results{meaning} = $1;
      $results{dictionary} = $2;
    }
  }
}

sub lookup_url
{
  my ($wwwjdic, $search_key, $search_type) = @_;
  my %type;
  for (@$search_type) {
    $type{max} = $_ if /^\d+$/;
  }
  my $url = $wwwjdic->{site};
  $url .= "?MMUJ";
  my $search_key_encoded = URI::Escape::uri_escape_utf8 ($search_key);
  $url .= $search_key_encoded;
  $url .= "_3";
  $url .= '_' . $type{max} if $type{max};
  return $url;
}

sub lookup
{
  my ($wwwjdic, $search_key, $search_type) = @_;
  my $search_string = $wwwjdic->lookup_url ($search_key, $search_type);
  return if !$search_string;
  my $response = $wwwjdic->{user_agent}->get ($search_string);
  if ($response->is_success) {
    return $wwwjdic->parse_results ($response->content);
  }
}

sub lookup_kanji
{
  my ($wwwjdic, $search_key, $search_type) = @_;
  my $search_string = $wwwjdic->lookup_url ($search_key, $search_type);

}

1;

package main;

my $wwwjdic = WWWJDIC::new(mirror => 'japan');
binmode STDOUT, ":encoding(cp932)";
my $arg = '窓口';
$arg =~ s/^\s+|\s+$//g;
print "Looking up $arg in WWWJDIC:\n";
$wwwjdic->lookup ($arg,[20]);

#### Output of ./wwwjdicbug.pl > bug.txt 2>&1

Looking up 窓口 in WWWJDIC:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD><META http-equiv="Content-Type" content="text/html;
charset=UTF-8"><TITLE>WWWJDIC​: Word Display</TITLE>
<SCRIPT type="text/javascript">
<!--
function sf(){document.inp.dsrchkey.focus();}
-->
</SCRIPT>
<style type="text/css">
<!-- .pu{CURSOR​: help;}
-->
</style>
<link rel="icon" href="http​://www.csse.monash.edu.au/~jwb/wwwjdic.ico"
type="image/x-icon">
</HEAD><BODY onLoad=sf()
BGCOLOR="ivory" TEXT="black">
<style type="text/css">
<!-- table.hdr { border-collapse​: collapse; border​: 2px solid #0000cc;}
td.hdr { background​: #0000cc; text-align​: center; vertical-align​:
middle; font-weight​: bold; border​: 1px solid #0066CC; padding​: 2px; }
td.hdr a​:link, td.hdr a​:visited, td.hdr a​:hover, td.hdr a​:active {
height​: 2em; color​: white; vertical-align​: middle; font-size​: 1.0em;
font-family​: Helvetica, sans-serif; text-decoration​: none;
font-weight​: bold; }
td.hdr a​:hover { text-decoration​: underline; }
--></style>
<table class="hdr" width="100%"><tr><td class="hdr" rowspan="2"><img
src="http​://www.aa.tufs.ac.jp/~jwb/jim_th.jpg" align="left"><span
style="font-size​: 9pt; font-family​: Helvetica, sans-serif; color​:
#FFFFFF">Jim Breen's </span><br>
<b><span style="font-size​: 14pt; font-family​: Helvetica, sans-serif;
color​: #FFFFFF">WWWJDIC</span></b></td>
<td class="hdr"><a
href="http​://www.aa.tufs.ac.jp/~jwb/cgi-bin/wwwjdic.cgi?1C_3_20">Word
Search/Home</a>
</td><td class="hdr"><a
href="http​://www.aa.tufs.ac.jp/~jwb/cgi-bin/wwwjdic.cgi?9T_3_20">Translate
Words</a>
</td><td class="hdr"><a
href="http​://www.aa.tufs.ac.jp/~jwb/cgi-bin/wwwjdic.cgi?1B_3_20">Kanji
Lookup</a>
</td><td class="hdr"><a
href="http​://www.aa.tufs.ac.jp/~jwb/cgi-bin/wwwjdic.cgi?1R_3_20">Multi-Radical
Kanji</a>
</td><td class="hdr"><a
href="http​://www.aa.tufs.ac.jp/~jwb/wwwjdicinf.html">User Guide</a>
</td><td class="hdr"><a
href="http​://www.aa.tufs.ac.jp/~jwb/wwwjdicinf.html#dicfil_tag">Dictionaries</a>
</td>
</tr><tr>
</td><td class="hdr"><a
href="http​://www.aa.tufs.ac.jp/~jwb/cgi-bin/wwwjdic.cgi?10">Example
Search</a>
</td><td class="hdr"><a
href="http​://www.aa.tufs.ac.jp/~jwb/cgi-bin/wwwjdic.cgi?17_3_20">New
Entry/Amendment</a>
<td class="hdr"><a
href="http​://www.aa.tufs.ac.jp/~jwb/cgi-bin/wwwjdic.cgi?14">New
Examples</a>
</td><td class="hdr"><a
href="http​://www.aa.tufs.ac.jp/~jwb/cgi-bin/wwwjdic.cgi?19B">Customize</a>
</td><td class="hdr"><a
href="http​://www.aa.tufs.ac.jp/~jwb/wwwjdicinf.html#code_tag">Dictionary
Codes</a>
</td><td class="hdr"><a
href="http​://www.aa.tufs.ac.jp/~jwb/wwwjdicinf.html#don_tag">Donations</a>
</td></table>
<FORM NAME="inp" ID="inp"
ACTION="http​://www.aa.tufs.ac.jp/~jwb/cgi-bin/wwwjdic.cgi?MF_3_20"
METHOD="POST" >
<font size="-3">  </font><br>
Search Key: <font color="blue">窓口</font> Current Dictionary:
<font color="blue">Combined Jpn-Eng
</font><br>
<font size="-1">Options​:[G]oogle search, [GI] Google images,
[S]anseido dictionary, [A]LC dictionary (Eijiro), [Ex]ample sentences,
[V]erb conjugations, [F] Feedback, Japanese[W]ikipedia.
</font>
<p>
<INPUT TYPE="radio" NAME="jukugosel" VALUE="5562616" CHECKED
ID="5562616"><label for="5562616"><font size="+1">窓口 【まどぐち】</font>
(n) (1) ticket window; teller window; counter; (2) contact person;
point of contact; (P) </label><a
href="http​://www.google.com/search?q=%22%C1%EB%B8%FD%22&hl=en&lr=lang_ja&ie=euc-jp">[G]</a><a
href="http​://images.google.com/images?q=%22%C1%EB%B8%FD%22&hl=en&ie=euc-jp">[GI]</a><a
href="http​://dictionary.goo.ne.jp/search.php?MT=%C1%EB%B8%FD&kind=je&mode=1">[S]</a><a
href="http​://eow.alc.co.jp/%C1%EB%B8%FD/EUC-JP/">[A]</a>
<br>
<INPUT TYPE="radio" NAME="jukugosel" VALUE="5562620"
ID="5562620"><label for="5562620"><font size="+1">窓口の 【まどぐち】</font>
(?) UNKNOWN; RH
</label><a href="http​://www.google.com/search?q=%22%C1%EB%B8%FD%A4%CE%22&hl=en&lr=lang_ja&ie=euc-jp">[G]</a><a
href="http​://images.google.com/images?q=%22%C1%EB%B8%FD%A4%CE%22&hl=en&ie=euc-jp">[GI]</a><a
href="http​://dictionary.goo.ne.jp/search.php?MT=%C1%EB%B8%FD%A4%CE&kind=je&mode=1">[S]</a><a
href="http​://eow.alc.co.jp/%C1%EB%B8%FD%A4%CE/EUC-JP/">[A]</a>
<br>
<INPUT TYPE="radio" NAME="jukugosel" VALUE="5562621"
ID="5562621"><label for="5562621"><font size="+1">窓口の係員 【まどぐち】</font>
(?) UNKNOWN; RH
</label><a href="http​://www.google.com/search?q=%22%C1%EB%B8%FD%A4%CE%B7%B8%B0%F7%22&hl=en&lr=lang_ja&ie=euc-jp">[G]</a><a
href="http​://images.google.com/images?q=%22%C1%EB%B8%FD%A4%CE%B7%B8%B0%F7%22&hl=en&ie=euc-jp">[GI]</a><a
href="http​://dictionary.goo.ne.jp/search.php?MT=%C1%EB%B8%FD%A4%CE%B7%B8%B0%F7&kind=je&mode=1">[S]</a><a
href="http​://eow.alc.co.jp/%C1%EB%B8%FD%A4%CE%B7%B8%B0%F7/EUC-JP/">[A]</a>
<br>
<INPUT TYPE="radio" NAME="jukugosel" VALUE="5562622"
ID="5562622"><label for="5562622"><font size="+1">窓口奥 【まどぐちおく】</font>
(?) UNKNOWN; RH
</label><a href="http​://www.google.com/search?q=%22%C1%EB%B8%FD%B1%FC%22&hl=en&lr=lang_ja&ie=euc-jp">[G]</a><a
href="http​://images.google.com/images?q=%22%C1%EB%B8%FD%B1%FC%22&hl=en&ie=euc-jp">[GI]</a><a
href="http​://dictionary.goo.ne.jp/search.php?MT=%C1%EB%B8%FD%B1%FC&kind=je&mode=1">[S]</a><a
href="http​://eow.alc.co.jp/%C1%EB%B8%FD%B1%FC/EUC-JP/">[A]</a>
<br>
<INPUT TYPE="radio" NAME="jukugosel" VALUE="5562623"
ID="5562623"><label for="5562623"><font size="+1">窓口規制
【まどぐちきせい】</font> (?) UNKNOWN; RH
</label><a href="http​://www.google.com/search?q=%22%C1%EB%B8%FD%B5%AC%C0%A9%22&hl=en&lr=lang_ja&ie=euc-jp">[G]</a><a
href="http​://images.google.com/images?q=%22%C1%EB%B8%FD%B5%AC%C0%A9%22&hl=en&ie=euc-jp">[GI]</a><a
href="http​://dictionary.goo.ne.jp/search.php?MT=%C1%EB%B8%FD%B5%AC%C0%A9&kind=je&mode=1">[S]</a><a
href="http​://eow.alc.co.jp/%C1%EB%B8%FD%B5%AC%C0%A9/EUC-JP/">[A]</a><a
href="http​://ja.wikipedia.org/wiki/%E7%AA%93%E5%8F%A3%E8%A6%8F%E5%88%B6">[W]</a>

<br>
<INPUT TYPE="radio" NAME="jukugosel" VALUE="5562624"
ID="5562624"><label for="5562624"><font size="+1">窓口業務
【まどぐちぎょうむ】</font> (?) UNKNOWN; RH
</label><a href="http​://www.google.com/search?q=%22%C1%EB%B8%FD%B6%C8%CC%B3%22&hl=en&lr=lang_ja&ie=euc-jp">[G]</a><a
href="http​://images.google.com/images?q=%22%C1%EB%B8%FD%B6%C8%CC%B3%22&hl=en&ie=euc-jp">[GI]</a><a
href="http​://dictionary.goo.ne.jp/search.php?MT=%C1%EB%B8%FD%B6%C8%CC%B3&kind=je&mode=1">[S]</a><a
href="http​://eow.alc.co.jp/%C1%EB%B8%FD%B6%C8%CC%B3/EUC-JP/">[A]</a>
<br>
<INPUT TYPE="radio" NAME="jukugosel" VALUE="5562625"
ID="5562625"><label for="5562625"><font size="+1">窓口職員
【まどぐちしょくいん】</font> (?) UNKNOWN; RH
</label><a href="http​://www.google.com/search?q=%22%C1%EB%B8%FD%BF%A6%B0%F7%22&hl=en&lr=lang_ja&ie=euc-jp">[G]</a><a
href="http​://images.google.com/images?q=%22%C1%EB%B8%FD%BF%A6%B0%F7%22&hl=en&ie=euc-jp">[GI]</a><a
href="http​://dictionary.goo.ne.jp/search.php?MT=%C1%EB%B8%FD%BF%A6%B0%F7&kind=je&mode=1">[S]</a><a
href="http​://eow.alc.co.jp/%C1%EB%B8%FD%BF%A6%B0%F7/EUC-JP/">[A]</a>
<br>
<INPUT TYPE="radio" NAME="jukugosel" VALUE="5562626"
ID="5562626"><label for="5562626"><font size="+1">窓口数 【まどぐちすう】</font>
(?) UNKNOWN; RH
</label><a href="http​://www.google.com/search?q=%22%C1%EB%B8%FD%BF%F4%22&hl=en&lr=lang_ja&ie=euc-jp">[G]</a><a
href="http​://images.google.com/images?q=%22%C1%EB%B8%FD%BF%F4%22&hl=en&ie=euc-jp">[GI]</a><a
href="http​://dictionary.goo.ne.jp/search.php?MT=%C1%EB%B8%FD%BF%F4&kind=je&mode=1">[S]</a><a
href="http​://eow.alc.co.jp/%C1%EB%B8%FD%BF%F4/EUC-JP/">[A]</a>
<br>
<INPUT TYPE="radio" NAME="jukugosel" VALUE="5562627"
ID="5562627"><label for="5562627"><font size="+1">窓口前 【まどぐちまえ】</font>
(?) UNKNOWN; RH
</label><a href="http​://www.google.com/search?q=%22%C1%EB%B8%FD%C1%B0%22&hl=en&lr=lang_ja&ie=euc-jp">[G]</a><a
href="http​://images.google.com/images?q=%22%C1%EB%B8%FD%C1%B0%22&hl=en&ie=euc-jp">[GI]</a><a
href="http​://dictionary.goo.ne.jp/search.php?MT=%C1%EB%B8%FD%C1%B0&kind=je&mode=1">[S]</a><a
href="http​://eow.alc.co.jp/%C1%EB%B8%FD%C1%B0/EUC-JP/">[A]</a>
<br>
<INPUT TYPE="radio" NAME="jukugosel" VALUE="5562628"
ID="5562628"><label for="5562628"><font size="+1">窓口内 【まどぐちない】</font>
(?) UNKNOWN; RH
</label><a href="http​://www.google.com/search?q=%22%C1%EB%B8%FD%C6%E2%22&hl=en&lr=lang_ja&ie=euc-jp">[G]</a><a
href="http​://images.google.com/images?q=%22%C1%EB%B8%FD%C6%E2%22&hl=en&ie=euc-jp">[GI]</a><a
href="http​://dictionary.goo.ne.jp/search.php?MT=%C1%EB%B8%FD%C6%E2&kind=je&mode=1">[S]</a><a
href="http​://eow.alc.co.jp/%C1%EB%B8%FD%C6%E2/EUC-JP/">[A]</a>
<br>
<INPUT TYPE="radio" NAME="jukugosel" VALUE="5562629"
ID="5562629"><label for="5562629"><font size="+1">窓口販売
【まどぐちはんばい】</font> (n) (See <a
href="http://www.aa.tufs.ac.jp/~jwb/cgi-bin/wwwjdic.cgi?1MDJ%C1%EB%C8%CE">窓販</a>)
over the counter sales (often of financial packages) </label><a
href="http​://www.google.com/search?q=%22%C1%EB%B8%FD%C8%CE%C7%E4%22&hl=en&lr=lang_ja&ie=euc-jp">[G]</a><a
href="http​://images.google.com/images?q=%22%C1%EB%B8%FD%C8%CE%C7%E4%22&hl=en&ie=euc-jp">[GI]</a><a
href="http​://dictionary.goo.ne.jp/search.php?MT=%C1%EB%B8%FD%C8%CE%C7%E4&kind=je&mode=1">[S]</a><a
href="http​://eow.alc.co.jp/%C1%EB%B8%FD%C8%CE%C7%E4/EUC-JP/">[A]</a>
<br>
<INPUT TYPE="radio" NAME="jukugosel" VALUE="5562630"
ID="5562630"><label for="5562630"><font size="+1">窓口役 【まどぐちやく】</font>
(?) UNKNOWN; RH
</label><a href="http​://www.google.com/search?q=%22%C1%EB%B8%FD%CC%F2%22&hl=en&lr=lang_ja&ie=euc-jp">[G]</a><a
href="http​://images.google.com/images?q=%22%C1%EB%B8%FD%CC%F2%22&hl=en&ie=euc-jp">[GI]</a><a
href="http​://dictionary.goo.ne.jp/search.php?MT=%C1%EB%B8%FD%CC%F2&kind=je&mode=1">[S]</a><a
href="http​://eow.alc.co.jp/%C1%EB%B8%FD%CC%F2/EUC-JP/">[A]</a>
<br>
<input type="hidden" name="actionparam" value="_0_%C1%EB%B8%FD_1_">
<hr size="1">
<input type="hidden" name="originalkey" value="%C1%EB%B8%FD">
<INPUT TYPE="submit" NAME="Action" VALUE="Search " ID="ftrlabel1">
<label for="ftrlabel1"> </label><font size="-1"> for
<INPUT TYPE="text" NAME="dsrchkey" SIZE=30 VALUE=""> Dictionary​:
<SELECT NAME="dicsel"><OPTION VALUE="1" > Jpn-Eng General (EDICT)
</OPTION>
<OPTION VALUE="2" > Japanese Names (ENAMDICT)
</OPTION>
<OPTION VALUE="3" > Computing/Telecomms
</OPTION>
<OPTION VALUE="4" > Life Sciences/Bio-Med
</OPTION>
<OPTION VALUE="5" > Legal Terms
</OPTION>
<OPTION VALUE="6" > Finance/Marketing
</OPTION>
<OPTION VALUE="7" > Buddhism
</OPTION>
<OPTION VALUE="8" > Miscellaneous
</OPTION>
<OPTION VALUE="9" > Special Text-glossing
</OPTION>
<OPTION VALUE="A" > Engineering/Science
</OPTION>
<OPTION VALUE="B" > Linguistics
</OPTION>
<OPTION VALUE="C" > River & Water Systems
</OPTION>
<OPTION VALUE="D" > Automobile Industry
</OPTION>
<OPTION VALUE="E" > Extra Japanese Loanwords
</OPTION>
<OPTION VALUE="F" > Japanese-German (WaDoku)
</OPTION>
<OPTION VALUE="G" > Japanese-French
</OPTION>
<OPTION VALUE="H" > Japanese-Russian
</OPTION>
<OPTION VALUE="I" > Japanese-Swedish
</OPTION>
<OPTION VALUE="J" > Japanese-Hungarian
</OPTION>
<OPTION VALUE="K" > Japanese-Spanish
</OPTION>
<OPTION VALUE="L" > Untranslated
</OPTION>
<OPTION SELECTED VALUE="M" > Combined Jpn-Eng
</OPTION>
</SELECT> <INPUT TYPE="reset" VALUE="Reset">
<br><b>Key Type​:</b> <INPUT TYPE="radio" NAME="dsrchtype" VALUE="E"
ID="ftrlabel2" CHECKED> <label for="ftrlabel2">Text(J/E)</label>
<INPUT TYPE="radio" NAME="dsrchtype" VALUE="J"
ID="ftrlabel3"> <label for="ftrlabel3">Romaji</label>
  <b>Options​:</b> <INPUT TYPE="checkbox" NAME="firstkanj"
ID="ftrlabel4" VALUE="X"> <label for="ftrlabel4">Starting
Kanji</label> <INPUT TYPE="checkbox" NAME="engpri" ID="ftrlabel5"
VALUE="X"> <label for="ftrlabel5">Common words </label>
<INPUT TYPE="checkbox" NAME="exactm" ID="ftrlabel6"
VALUE="X"> <label for="ftrlabel6">Exact
word-match</label></font><br>
<INPUT TYPE="submit" NAME="Action" VALUE="Examine"> the kanji in a
selected compound (check the compound you wish to examine)<br>
<INPUT TYPE="submit" NAME="Action" VALUE="Suggest"> a new EDICT entry
based on the selected entry<br>
<INPUT TYPE="submit" NAME="Action" VALUE="Repeat "> this search
(choose another Dictionary above)<br>
</form>
<hr>
<center><font size="-1"> WWWJDIC site​: Japan [TUFS/RILCAA]  
  &#169; Copyright 2008, <a
href="http​://www.edrdg.org/">Electronic Dictionary Research and
Development Group</a>. (<a
href="http​://www.csse.monash.edu.au/~jwb/wwwjdicinf.html#copyr_tag">Details</a>)</font></center>
</BODY>
</HTML>
窓口 【まどぐち】 (n) (1) ticket window; teller window; counter; (2) contact
person; point of contact; (P)
窓口の 【まどぐち】 (?) UNKNOWN; RH
窓口の係員 【まどぐち】 (?) UNKNOWN; RH
窓口奥 【まどぐちおく】 (?) UNKNOWN; RH
窓口規制 【まどぐちきせい】 (?) UNKNOWN; RH
窓口業務 【まどぐちぎょうむ】 (?) UNKNOWN; RH
窓口職員 【まどぐちしょくいん】 (?) UNKNOWN; RH
窓口数 【まどぐちすう】 (?) UNKNOWN; RH
窓口前 【まどぐちまえ】 (?) UNKNOWN; RH
窓口内 【まどぐちない】 (?) UNKNOWN; RH
窓口販売 【まどぐちはMalformed UTF-8 character (fatal) at ./wwwjdicbug.pl line 75.
んばい】 (n) (See 窓販) over the counter sales (often of financial packages)

############## End---
Flags​:
  category=core
  severity=low


Site configuration information for perl 5.10.0​:

Configured by rurban at Mon Jun 30 16​:03​:19 GMT 2008.

Summary of my perl5 (revision 5 version 10 subversion 0 patch 34065)
configuration​:
  Platform​:
  osname=cygwin, osvers=1.5.25(0.15642), archname=cygwin-thread-multi-64int
  uname='cygwin_nt-5.1 reini 1.5.25(0.15642) 2008-06-12 19​:34 i686 cygwin '
  config_args='-de -Dmksymlinks -Dusethreads -Dmad=y -Dusedevel'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=define, use64bitall=undef, uselongdouble=undef
  usemymalloc=y, bincompat5005=undef
  Compiler​:
  cc='gcc', ccflags ='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__
-fno-strict-aliasing -pipe -I/usr/local/include',
  optimize='-O3',
  cppflags='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__
-fno-strict-aliasing -pipe -I/usr/local/include'
  ccversion='', gccversion='3.4.4 (cygming special, gdc 0.12, using
dmd 0.125)', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long long', ivsize=8, nvtype='double', nvsize=8,
Off_t='off_t', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='g++', ldflags =' -Wl,--enable-auto-import
-Wl,--export-all-symbols -Wl,--stack,8388608
-Wl,--enable-auto-image-base -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib /lib
  libs=-lgdbm -ldb -ldl -lcrypt -lgdbm_compat
  perllibs=-ldl -lcrypt
  libc=/usr/lib/libc.a, so=dll, useshrplib=true, libperl=libperl.a
  gnulibc_version=''
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
  cccdlflags=' ', lddlflags=' --shared -Wl,--enable-auto-import
-Wl,--export-all-symbols -Wl,--stack,8388608
-Wl,--enable-auto-image-base -L/usr/local/lib'

Locally applied patches​:
  MAINT34065
  CYG11 no-bs
  CYG12 no archlib in otherlibdirs
  CYG14 Dynaloader
  CYG15 static-Win32CORE
  Bug#55162 File​::Spec​::case_tolerant performance


@​INC for perl 5.10.0​:
  /usr/lib/perl5/5.10/i686-cygwin
  /usr/lib/perl5/5.10
  /usr/lib/perl5/site_perl/5.10/i686-cygwin
  /usr/lib/perl5/site_perl/5.10
  /usr/lib/perl5/vendor_perl/5.10/i686-cygwin
  /usr/lib/perl5/vendor_perl/5.10
  /usr/lib/perl5/vendor_perl/5.10
  /usr/lib/perl5/site_perl/5.8
  /usr/lib/perl5/vendor_perl/5.8
  .


Environment for perl 5.10.0​:
  HOME=/cygdrive/c/Documents and Settings/bkb
  LANG (unset)
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/usr/local/bin​:/usr/bin​:/bin​:/usr/X11R6/bin​:/cygdrive/c/Program
Files/Perl/site/bin​:/cygdrive/c/Program
Files/Perl/bin​:/cygdrive/c/WINDOWS/system32​:/cygdrive/c/WINDOWS​:/cygdrive/c/WINDOWS/System32/Wbem​:/cygdrive/c/Program
Files/MySQL/MySQL Server 5.0/bin​:/cygdrive/c/Documents and
Settings/bkb/My Documents/scripts/bin​:
  PERL_BADLANG (unset)
  SHELL (unset)

p5pRT commented Jul 30, 2008

From @benkasminbullock

This is a much simplified version (five lines) of the script which tripped
the bug. I've also simplified the regex as far as I could while it still
trips the bug. Shortening the regex any further makes the script print "OK",
but as it stands the "Malformed UTF-8 character (fatal)" message appears.

p5pRT commented Jul 30, 2008

From @benkasminbullock

tinytest.pl

p5pRT commented Jul 30, 2008

@benkasminbullock - Status changed from 'new' to 'open'

p5pRT commented Mar 22, 2010

From hector@debian.org

Created by hector@debian.org

Executing this (which works correctly on perl 5.8) gives an error:

#!/usr/bin/perl -w

use utf8;
use encoding 'utf8';

my $p = 'á d</p>';
#my $p = 'す d</p>';

print "$p\n";

if ($p =~ m#(.*?)[-]?EFE\s*&lt;/p&gt;$#gsm) {
  print "yes $1\n";
}else{
  print "no\n";
}

hector@​baloo​:/tmp$ ./kk.pl
á d</p>
Malformed UTF-8 character (fatal) at ./kk.pl line 11.

The script fails for any UTF-8 definition of $p.

This regression has also been reproduced with a vanilla perl build on another server.
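
To compare perl builds, the failing match can be wrapped in eval so that the fatal error is captured instead of ending the script. This is a hypothetical harness, not the reporter's code: it uses \x{...} escapes plus utf8::upgrade in place of `use utf8` and `use encoding`, so that it does not depend on the encoding of the source file, and it drops the /g flag.

```perl
use strict;
use warnings;

# The two values of $p from the report, written with escapes:
# U+00E1 (a with acute) and U+3059 (hiragana su).
my @subjects = ("\x{e1} d</p>", "\x{3059} d</p>");

# Make sure both strings use the internal UTF-8 representation,
# as they would under "use utf8".
utf8::upgrade($_) for @subjects;

my @results;
for my $p (@subjects) {
    my $matched;
    # "Malformed UTF-8 character (fatal)" is an ordinary die, so eval traps it.
    my $ok = eval {
        $matched = ($p =~ m{(.*?)[-]?EFE\s*&lt;/p&gt;$}sm) ? 1 : 0;
        1;
    };
    push @results, $ok ? ($matched ? 'match' : 'no match') : "died: $@";
}

print "perl $]: $_\n" for @results;
```

On a perl without the bug both lines should report "no match"; a "died:" line would indicate that the build is affected.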

Perl Info

Flags:
    category=core
    severity=critical

Site configuration information for perl 5.10.1:

Configured by Debian Project at Sun Feb  7 16:19:05 UTC 2010.
Summary of my perl5 (revision 5 version 10 subversion 1) configuration:
   
  Platform:
    osname=linux, osvers=2.6.26-2-amd64, archname=i486-linux-gnu-thread-multi
    uname='linux biber 2.6.26-2-amd64 #1 smp tue jan 12 22:12:20 utc 2010 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1 -Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.1 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.4.3 20100108 (prerelease)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /usr/lib64
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.10.2.so, so=so, useshrplib=true, libperl=libperl.so.5.10.1
    gnulibc_version='2.10.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'

Locally applied patches:
    DEBPKG:debian/arm_thread_stress_timeout - http://bugs.debian.org/501970 Raise the timeout of ext/threads/shared/t/stress.t to accommodate slower build hosts
    DEBPKG:debian/cpan_config_path - Set location of CPAN::Config to /etc/perl as /usr may not be writable.
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
    DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
    DEBPKG:debian/extutils_hacks - Various debian-specific ExtUtils changes
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/m68k_thread_stress - http://bugs.debian.org/495826 Disable some threads tests on m68k for now due to missing TLS.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
    DEBPKG:debian/perl_synopsis - http://bugs.debian.org/278323 Rearrange perl.pod
    DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
    DEBPKG:debian/use_gdbm - Explicitly link against -lgdbm_compat in ODBM_File/NDBM_File. 
    DEBPKG:fixes/assorted_docs - http://bugs.debian.org/443733 [384f06a] Math::BigInt::CalcEmu documentation grammar fix
    DEBPKG:fixes/net_smtp_docs - http://bugs.debian.org/100195 [rt.cpan.org #36038] Document the Net::SMTP 'Port' option
    DEBPKG:fixes/processPL - http://bugs.debian.org/357264 [rt.cpan.org #17224] Always use PERLRUNINST when building perl modules.
    DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
    DEBPKG:fixes/pod2man-index-backslash - http://bugs.debian.org/521256 Escape backslashes in .IX entries
    DEBPKG:debian/disable-zlib-bundling - Disable zlib bundling in Compress::Raw::Zlib
    DEBPKG:fixes/kfreebsd_cppsymbols - http://bugs.debian.org/533098 [3b910a0] Add gcc predefined macros to $Config{cppsymbols} on GNU/kFreeBSD.
    DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default.
    DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl.
    DEBPKG:fixes/kfreebsd-filecopy-pipes - http://bugs.debian.org/537555 [16f708c] Fix File::Copy::copy with pipes on GNU/kFreeBSD
    DEBPKG:fixes/anon-tmpfile-dir - http://bugs.debian.org/528544 [perl #66452] Honor TMPDIR when open()ing an anonymous temporary file
    DEBPKG:fixes/abstract-sockets - http://bugs.debian.org/329291 [89904c0] Add support for Abstract namespace sockets.
    DEBPKG:fixes/hurd_cppsymbols - http://bugs.debian.org/544307 [eeb92b7] Add gcc predefined macros to $Config{cppsymbols} on GNU/Hurd.
    DEBPKG:fixes/autodie-flock - http://bugs.debian.org/543731 Allow for flock returning EAGAIN instead of EWOULDBLOCK on linux/parisc
    DEBPKG:fixes/archive-tar-instance-error - http://bugs.debian.org/539355 [rt.cpan.org #48879] Separate Archive::Tar instance error strings from each other
    DEBPKG:fixes/positive-gpos - http://bugs.debian.org/545234 [perl #69056] [c584a96] Fix \\G crash on first match
    DEBPKG:debian/devel-ppport-ia64-optim - http://bugs.debian.org/548943 Work around an ICE on ia64
    DEBPKG:debian/dynaloader-config - http://bugs.debian.org/549170 Make DynaLoader work without Config_heavy.pl again
    DEBPKG:fixes/trie-logic-match - http://bugs.debian.org/552291 [perl #69973] [0abd0d7] Fix a DoS in Unicode processing [CVE-2009-3626]
    DEBPKG:fixes/hppa-thread-eagain - http://bugs.debian.org/554218 make the threads-shared test suite more robust, fixing failures on hppa
    DEBPKG:fixes/crash-on-undefined-destroy - http://bugs.debian.org/564074 [perl #71952] [1f15e67] Fix a NULL pointer dereference when looking for a DESTROY method
    DEBPKG:patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.10.1-11 in patchlevel.h


@INC for perl 5.10.1:
    /etc/perl
    /usr/local/lib/perl/5.10.1
    /usr/local/share/perl/5.10.1
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.10
    /usr/share/perl/5.10
    /usr/local/lib/site_perl
    .


Environment for perl 5.10.1:
    HOME=/home/hector
    LANG=es_ES.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/opt/drbl/sbin:/opt/drbl/bin:/home/hector/bin:/opt/drbl/sbin:/opt/drbl/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games
    PERL_BADLANG (unset)
    SHELL=/bin/bash



p5pRT commented Mar 22, 2010

From @ikegami

On Mon, Mar 22, 2010 at 6:13 AM, Hector Garcia <perlbug-followup@perl.org> wrote:

# New Ticket Created by Hector Garcia
# Please include the string​: [perl #73732]
# in the subject line of all future correspondence about this issue.
# <URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=73732 >

This is a bug report for perl from hector@​debian.org,
generated with the help of perlbug 1.39 running under perl 5.10.1.

-----------------------------------------------------------------
[Please describe your issue here]

executing this (which works correctly on perl 5.8 gives an error

#!/usr/bin/perl -w

use utf8;
use encoding 'utf8';

my $p = 'á d</p>';
#my $p = 'す d</p>';

print "$p\n";

if ($p =~ m#(.*?)[-]?EFE\s*&lt;/p&gt;$#gsm) {
print "yes $1\n";
}else{
print "no\n";
}

hector@​baloo​:/tmp$ ./kk.pl
á d</p>
Malformed UTF-8 character (fatal) at ./kk.pl line 11.

Thanks for the report.

Workaround until this is fixed​:

if ($p =~ m#(?:|(?!)\x{2660})(.*?)[-]?EFE\s*&lt;/p&gt;$#sm) {

Note that I removed the /g. "if (/.../g)" rarely makes any sense and can
produce undesirable results.
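
The remark about "if (/.../g)" refers to the way a scalar-context /g match remembers its position in the string. A small illustration, using a made-up string rather than anything from the report:

```perl
use strict;
use warnings;

my $s = "aXb";

# With /g in scalar context, a successful match records pos($s).
my $first = ($s =~ /X/g) ? 1 : 0;
my $pos   = pos($s);

# The next /g match on the same string resumes from pos($s), so it fails
# here although "X" is still in the string; the failure resets pos($s).
my $second = ($s =~ /X/g) ? 1 : 0;

# Without /g every test starts from the beginning of the string.
my $third  = ($s =~ /X/) ? 1 : 0;
my $fourth = ($s =~ /X/) ? 1 : 0;

print "with /g: $first then $second (pos after first match: $pos)\n";
print "without /g: $third then $fourth\n";
```

The first line shows a match followed by a miss on the same unchanged string, which is the kind of surprise a plain `if` test usually does not want.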

p5pRT commented Mar 22, 2010

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Mar 23, 2010

From @khwilliamson

Eric Brine wrote​:

On Mon, Mar 22, 2010 at 6:13 AM, Hector Garcia <perlbug-followup@perl.org> wrote:

# New Ticket Created by Hector Garcia
# Please include the string​: [perl #73732]
# in the subject line of all future correspondence about this issue.
# <URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=73732 >

This is a bug report for perl from hector@​debian.org,
generated with the help of perlbug 1.39 running under perl 5.10.1.

-----------------------------------------------------------------
[Please describe your issue here]

executing this (which works correctly on perl 5.8 gives an error

#!/usr/bin/perl -w

use utf8;
use encoding 'utf8';

my $p = 'á d</p>';
#my $p = 'す d</p>';

print "$p\n";

if ($p =~ m#(.*?)[-]?EFE\s*&lt;/p&gt;$#gsm) {
print "yes $1\n";
}else{
print "no\n";
}

hector@​baloo​:/tmp$ ./kk.pl
á d</p>
Malformed UTF-8 character (fatal) at ./kk.pl line 11.

Thanks for the report.

Workaround until this is fixed​:

if ($p =~ m#(?:|(?!)\x{2660})(.*?)[-]?EFE\s*&lt;/p&gt;$#sm) {

Note that I removed the /g. "if (/.../g)" rarely makes any sense and can
produce undesirable results.

I wonder if this is related to #46563​: g suffix on string search
(/.../g) can cause string corruption

which is a won't fix

@p5pRT
Author

p5pRT commented Mar 23, 2010

From @ikegami

On Mon, Mar 22, 2010 at 11​:47 PM, karl williamson
<public@​khwilliamson.com>wrote​:

I wonder if this is related to #46563​: g suffix on string search (/.../g)
can cause string corruption

which is a won't fix

The /g is not germane to the bug. The workaround wasn't the removal of the
/g; it was the addition of a >8-bit character to the pattern.
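
To see why the added prefix does not change what the pattern matches, here is a minimal sketch. The subject string and variable names are illustrative assumptions, not taken from the report, and the pattern uses a literal </p>. The group `(?:|(?!)\x{2660})` is an alternation whose first branch is empty and whose second branch begins with `(?!)`, which always fails, so the group always matches the empty string. Its only effect is to put a character above 0xFF into the pattern.

```perl
use strict;
use warnings;

# Hypothetical ASCII subject string, chosen only for this comparison.
my $subject = "abc-EFE </p>";

# The pattern without and with the always-empty prefix group.
my $plain  = qr/(.*?)[-]?EFE\s*<\/p>$/sm;
my $padded = qr/(?:|(?!)\x{2660})(.*?)[-]?EFE\s*<\/p>$/sm;

# The prefix group is non-capturing, so $1 is the same group in both.
my ($got_plain)  = $subject =~ $plain;
my ($got_padded) = $subject =~ $padded;

print "plain:  '$got_plain'\n";
print "padded: '$got_padded'\n";
print "same capture\n" if $got_plain eq $got_padded;
```

On a perl without the bug both forms capture the same text; the sketch does not attempt to reproduce the failure itself.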

@p5pRT
Author

p5pRT commented Mar 23, 2010

From @nwc10

On Mon, Mar 22, 2010 at 09​:47​:07PM -0600, karl williamson wrote​:

I wonder if this is related to #46563​: g suffix on string search
(/.../g) can cause string corruption

which is a won't fix

http://rt.perl.org/rt3/Ticket/Display.html?id=46563

  For now and for older perls this bug is firmly in the "wont fix"
  category. Sorry.

It wasn't yet described as a "won't" fix if it's still in current blead.
(I couldn't seem to replicate it even on 5.10.0, so I'm not sure what the
state of the bug is)

Nicholas Clark

@p5pRT
Author

p5pRT commented Mar 23, 2010

From @khwilliamson

Nicholas Clark wrote​:

On Mon, Mar 22, 2010 at 09​:47​:07PM -0600, karl williamson wrote​:

I wonder if this is related to #46563​: g suffix on string search
(/.../g) can cause string corruption

which is a won't fix

http://rt.perl.org/rt3/Ticket/Display.html?id=46563

For now and for older perls this bug is firmly in the "wont fix"
category. Sorry.

It wasn't yet described as a "won't" fix if it's still in current blead.
(I couldn't seem to replicate it even on 5.10.0, so I'm not sure what the
state of the bug is)

Nicholas Clark

I just tried it, and it is still a bug in 5.12RC0.

@p5pRT
Author

p5pRT commented Mar 24, 2010

From rivero@raulrivero.es

On Mon Mar 22 20:47:43 2010, public@khwilliamson.com wrote:

Eric Brine wrote​:

On Mon, Mar 22, 2010 at 6:13 AM, Hector Garcia <perlbug-followup@perl.org> wrote:

# New Ticket Created by Hector Garcia
# Please include the string​: [perl #73732]
# in the subject line of all future correspondence about this issue.
# <URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=73732 >

This is a bug report for perl from hector@​debian.org,
generated with the help of perlbug 1.39 running under perl 5.10.1.

-----------------------------------------------------------------
[Please describe your issue here]

Executing this (which works correctly on perl 5.8) gives an error:

#!/usr/bin/perl -w

use utf8;
use encoding 'utf8';

my $p = 'á d</p>';
#my $p = 'す d</p>';

print "$p\n";

if ($p =~ m#(.*?)[-]?EFE\s*&lt;/p&gt;$#gsm) {
print "yes $1\n";
}else{
print "no\n";
}

hector@​baloo​:/tmp$ ./kk.pl
á d</p>
Malformed UTF-8 character (fatal) at ./kk.pl line 11.

Thanks for the report.

Workaround until this is fixed​:

if ($p =~ m#(?:|(?!)\x{2660})(.*?)[-]?EFE\s*&lt;/p&gt;$#sm) {

Note that I removed the /g. "if (/.../g)" rarely makes any sense and can
produce undesirable results.

I wonder if this is related to #46563​: g suffix on string search
(/.../g) can cause string corruption

which is a won't fix

The /g isn't the problem​:


#!/usr/bin/perl -w

use utf8;
use encoding 'utf8';

my $p = 'á d</p>';
#my $p = 'す d</p>';

print "$p\n";

if ($p =~ m#(.*?)[-]?EFE\s*&lt;/p&gt;$#sm) {
  print "yes $1\n";
}else{
  print "no\n";
}


$ perl problem.pl
á d</p>
Malformed UTF-8 character (fatal) at kk.pl line 11.

And "m#(?:|(?!)\x{2660})(.*?)[-]?EFE\s*</p>$#sm" isn't a real
workaround. This was only an example of the problem.

If we replace the (.*) with (\X*), it works. So we think there is
a problem with wide characters and '.' in regular expressions.
Surprisingly, it works with 5.8.

We could fix it with this patch​:

Inline Patch
--- regcomp.c.OLD	2010-03-24 10:15:59.381767760 +0100
+++ regcomp.c	2010-03-24 10:17:03.068877134 +0100
@@ -6932,7 +6932,7 @@
 	    ret = reg_node(pRExC_state, SANY);
 	else
 	    ret = reg_node(pRExC_state, REG_ANY);
-	*flagp |= HASWIDTH|SIMPLE;
+	*flagp |= HASWIDTH;
 	RExC_naughty++;
         Set_Node_Length(ret, 1); /* MJD */
 	break;

Any idea?

Cheers,

@p5pRT
Author

p5pRT commented Mar 24, 2010

From hector@debian.org

This bug has nothing to do with bug 46563.
If you take out the /g from the example I originally sent, you'll see
the bug is still there.

Thanks

@p5pRT
Author

p5pRT commented Mar 24, 2010

From @iabyn

On Tue, Mar 23, 2010 at 02​:58​:58PM -0600, karl williamson wrote​:

I just tried it, and it is still a bug in 5.12RC0.

And here is a minimal(ish) case that triggers a 'Malformed UTF-8
character' warning​:

  $_ = "\x{e1} d</p>\x{100}";
  chop $_;
  print "match\n" if m{(.*?)-\s</p>$};
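
The append-then-chop step in this test case can be looked at in isolation. The following is a minimal sketch, with variable names that are assumptions of this example: appending U+0100 forces perl to store the string internally as UTF-8, and `chop` removes the character again without changing the storage back, so the remaining text is identical but UTF-8 flagged.

```perl
use strict;
use warnings;

# All code points are below 0x100, so perl stores this as plain bytes.
my $s = "\x{e1} d</p>";
print utf8::is_utf8($s) ? "utf8\n" : "bytes\n";   # bytes

# U+0100 cannot be stored in one byte, so the whole string is kept as UTF-8.
my $t = "\x{e1} d</p>\x{100}";
chop $t;    # removes U+0100; the internal UTF-8 storage remains
print utf8::is_utf8($t) ? "utf8\n" : "bytes\n";   # utf8

# String comparison works on characters, so the two are still equal.
print $s eq $t ? "equal\n" : "different\n";        # equal
```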

--
You're only as old as you look.

@p5pRT
Author

p5pRT commented Jun 11, 2010

From doug@ablegrape.com

Created by doug@ablegrape.com

This is a bug report for perl from doug@​ablegrape.com,
generated with the help of perlbug 1.39 running under perl v5.8.9.

-----------------------------------------------------------------
My program worked fine under previous versions of Perl on MacOS (prior to Snow Leopard).

Now it dies under 5.8.9, 5.10.0 and 5.12.1, with "Malformed UTF-8 character (fatal)" - but the input data is the same, and is, as far as I can tell, perfectly valid UTF-8.

I've isolated the failure to a test case, included here, which shows a simple expression that works, two (very) slightly more complex expressions that fail, and the original complex expression from my code. As far as I can tell, all of these should work. Oddly, if I add "use encoding 'utf8'" even the simple regex fails.

My best guess is that perhaps for some reason the regex engine is backing up by bytes within my string, and starting in the middle of a character. The string itself is perfectly valid.
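
As an illustration of what "starting in the middle of a character" would mean at the byte level, here is a minimal sketch. It uses the Encode module on an explicitly encoded copy of the string; it does not show what the regex engine does internally, and the variable names are assumptions of this example.

```perl
use strict;
use warnings;
use Encode ();

# "B\x{f6}ck" is "Böck"; in UTF-8 the ö becomes the two bytes C3 B6.
my $bytes = Encode::encode('UTF-8', "B\x{f6}ck");
print join(' ', unpack '(H2)*', $bytes), "\n";   # 42 c3 b6 63 6b

# Offset 1 starts on the lead byte C3; offset 2 starts on the
# continuation byte B6, i.e. in the middle of the character.
my %wellformed;
for my $offset (1, 2) {
    my $tail = substr $bytes, $offset;
    $wellformed{$offset} = eval {
        Encode::decode('UTF-8', $tail,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    } ? 1 : 0;
    printf "from byte offset %d: %s\n", $offset,
        $wellformed{$offset} ? 'well-formed' : 'malformed';
}
```

A byte sequence that begins on a continuation byte cannot be decoded, which is consistent with the kind of error message reported here.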

#!/usr/bin/perl

use strict vars;
use utf8;
binmode STDOUT, ":utf8";

my $e = "Böck";

if (utf8::is_utf8($e)) { print "yep, is UTF8: $e\n"; }

# this succeeds (failed before with use encoding 'utf8', unknown why)
if ($e=~ m/.*?[x]$/) { print "matched simple\n"; }
print "success with simple\n";

# these die
if ($e=~ m/.*?\p{Space}$/i) { print "matched medium\n"; }
print "success with medium\n";
if ($e=~ m/.*?[xyz]$/) { print "matched medium\n"; }
print "success with medium\n";

# the original, full expression.
if ($e =~ m/(.*?)[,\p{isSpace}]+((?:\p{isAlpha}[\p{isSpace}\.]{1,2})+)\p{isSpace}*$/) { print "matched complex\n"; }
print "success with complex\n";

Perl Info

Flags:
    category=core
    severity=critical

Site configuration information for perl v5.8.9:

Configured by _postfix at Wed Jun 24 00:32:40 PDT 2009.

Summary of my perl5 (revision 5 version 8 subversion 9) configuration:
  Platform:
    osname=darwin, osvers=10.0, archname=darwin-thread-multi-2level
    uname='darwin neige.apple.com 10.0 darwin kernel version 10.0.0d8: tue may 5 19:29:59 pdt 2009; root:xnu-1437.2~2release_i386 i386 '
    config_args='-ds -e -Dprefix=/usr -Dccflags=-g  -pipe  -Dldflags= -Dman3ext=3pm -Duseithreads -Duseshrplib -Dinc_version_list=none -Dcc=gcc-4.2'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=define uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc-4.2', ccflags ='-arch i386 -arch ppc -g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -I/usr/local/include',
    optimize='-Os',
    cppflags='-g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='4.2.1 (Apple Inc. build 5646)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc-4.2 -mmacosx-version-min=10.6', ldflags ='-arch i386 -arch ppc -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-ldbm -ldl -lm -lutil -lc
    perllibs=-ldl -lm -lutil -lc
    libc=/usr/lib/libc.dylib, so=dylib, useshrplib=true, libperl=libperl.dylib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-arch i386 -arch ppc -bundle -undefined dynamic_lookup -L/usr/local/lib'

Locally applied patches:
    /Library/Perl/Updates/<version> comes before system perl directories
    installprivlib and installarchlib points to the Updates directory
    6576362: fixed 5.8.9 binary compatibility issue: perlio mutex not initialized


@INC for perl v5.8.9:
    /Library/Perl/Updates/5.8.9
    /System/Library/Perl/5.8.9/darwin-thread-multi-2level
    /System/Library/Perl/5.8.9
    /Library/Perl/5.8.9/darwin-thread-multi-2level
    /Library/Perl/5.8.9
    /Network/Library/Perl/5.8.9/darwin-thread-multi-2level
    /Network/Library/Perl/5.8.9
    /Network/Library/Perl
    /System/Library/Perl/Extras/5.8.9/darwin-thread-multi-2level
    /System/Library/Perl/Extras/5.8.9
    /Library/Perl/5.8.8
    /Library/Perl/5.8.6/darwin-thread-multi-2level
    /Library/Perl/5.8.6
    /Library/Perl/5.8.1
    .


Environment for perl v5.8.9:
    DYLD_LIBRARY_PATH (unset)
    HOME=/Users/cook
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin:/opt/subversion/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/local/mysql/bin:/sw/bin:/Volumes/SEA_DISC/NutchStuff/nutch//my_scripts:/opt/local/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash


@p5pRT
Author

p5pRT commented Jun 11, 2010

From bitcard@candiru.com

FYI, discussion of this bug on Perlmonks​:

http://www.perlmonks.org/?node_id=843208

@p5pRT
Author

p5pRT commented Jun 12, 2010

From @cowens

As a workaround, I suggest you use the \x{} literal escape:

my $e = "B\x{f6}ck";

It seems to work on my OS X machines.

On Fri, Jun 11, 2010 at 15​:15, Doug Cook <perlbug-followup@​perl.org> wrote​:

# New Ticket Created by  Doug Cook
# Please include the string​:  [perl #75680]
# in the subject line of all future correspondence about this issue.
# <URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=75680 >

This is a bug report for perl from doug@​ablegrape.com,
generated with the help of perlbug 1.39 running under perl v5.8.9.

-----------------------------------------------------------------
My program worked fine under previous versions of Perl on MacOS (prior to Snow Leopard).

Now it dies under 5.8.9, 5.10.0 and 5.12.1, with "Malformed UTF-8 character (fatal)" - but the input data is the same, and is, as far as I can tell, perfectly valid UTF-8.

I've isolated the failure to a test case, included here, which shows a simple expression that works, two (very) slightly more complex expressions that fail, and the original complex expression from my code. As far as I can tell, all of these should work. Oddly, if I add "use encoding 'utf8'" even the simple regex fails.

My best guess is that perhaps for some reason the regex engine is backing up by bytes within my string, and starting in the middle of a character. The string itself is perfectly valid.

#!/usr/bin/perl

use strict vars;
use utf8;
binmode STDOUT, ":utf8";

my $e = "Böck";

if (utf8::is_utf8($e)) { print "yep, is UTF8: $e\n"; }

# this succeeds (failed before with use encoding 'utf8', unknown why)
if ($e=~ m/.*?[x]$/) { print "matched simple\n"; }
print "success with simple\n";

# these die
if ($e=~ m/.*?\p{Space}$/i) { print "matched medium\n"; }
print "success with medium\n";
if ($e=~ m/.*?[xyz]$/) { print "matched medium\n"; }
print "success with medium\n";

# the original, full expression.
if ($e =~ m/(.*?)[,\p{isSpace}]+((?:\p{isAlpha}[\p{isSpace}\.]{1,2})+)\p{isSpace}*$/) { print "matched complex\n"; }
print "success with complex\n";


--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.

@p5pRT
Author

p5pRT commented Jun 12, 2010

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Jun 13, 2010

From @khwilliamson

Chas. Owens wrote​:

As a workaround, I suggest you use the \x{} literal escape:

my $e = "B\x{f6}ck";

It seems to work on my OS X machines.

Unfortunately, this workaround works only because it avoids
upgrading $e to utf8. If you use "B\x{101}ck" instead, the "Malformed
UTF-8 character" error remains. Also, because of an unrelated bug, /i
matching will not work properly for \x{f6}.
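
The difference between the two literals can be checked directly. This is a minimal sketch, with variable names that are assumptions of this example: a literal whose escapes are all below 0x100 is stored as bytes, a literal containing U+0101 is stored as UTF-8, and `utf8::upgrade` converts the former to the latter representation without changing its characters.

```perl
use strict;
use warnings;

my $latin1 = "B\x{f6}ck";    # U+00F6 fits in a single byte
my $wide   = "B\x{101}ck";   # U+0101 does not

print "latin1 is_utf8: ", (utf8::is_utf8($latin1) ? 1 : 0), "\n";   # 0
print "wide   is_utf8: ", (utf8::is_utf8($wide)   ? 1 : 0), "\n";   # 1

# Upgrading changes only the internal storage, not the characters.
my $upgraded = $latin1;
utf8::upgrade($upgraded);
print "upgraded is_utf8: ", (utf8::is_utf8($upgraded) ? 1 : 0), "\n";   # 1
print "still equal\n" if $upgraded eq $latin1;
```

On an affected perl, the upgraded copy would be expected to hit the same error as the original "Böck" string, since it has the same internal representation.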

On Fri, Jun 11, 2010 at 15​:15, Doug Cook <perlbug-followup@​perl.org> wrote​:

# New Ticket Created by Doug Cook
# Please include the string​: [perl #75680]
# in the subject line of all future correspondence about this issue.
# <URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=75680 >

This is a bug report for perl from doug@​ablegrape.com,
generated with the help of perlbug 1.39 running under perl v5.8.9.

-----------------------------------------------------------------
My program worked fine under previous versions of Perl on MacOS (prior to Snow Leopard).

Now it dies under 5.8.9, 5.10.0 and 5.12.1, with "Malformed UTF-8 character (fatal)" - but the input data is the same, and is, as far as I can tell, perfectly valid UTF-8.

I've isolated the failure to a test case, included here, which shows a simple expression that works, two (very) slightly more complex expressions that fail, and the original complex expression from my code. As far as I can tell, all of these should work. Oddly, if I add "use encoding 'utf8'" even the simple regex fails.

My best guess is that perhaps for some reason the regex engine is backing up by bytes within my string, and starting in the middle of a character. The string itself is perfectly valid.

#!/usr/bin/perl

use strict vars;
use utf8;
binmode STDOUT, ":utf8";

my $e = "Böck";

if (utf8::is_utf8($e)) { print "yep, is UTF8: $e\n"; }

# this succeeds (failed before with use encoding 'utf8', unknown why)
if ($e=~ m/.*?[x]$/) { print "matched simple\n"; }
print "success with simple\n";

# these die
if ($e=~ m/.*?\p{Space}$/i) { print "matched medium\n"; }
print "success with medium\n";
if ($e=~ m/.*?[xyz]$/) { print "matched medium\n"; }
print "success with medium\n";

# the original, full expression.
if ($e =~ m/(.*?)[,\p{isSpace}]+((?:\p{isAlpha}[\p{isSpace}\.]{1,2})+)\p{isSpace}*$/) { print "matched complex\n"; }
print "success with complex\n";


@p5pRT
Author

p5pRT commented Jun 13, 2010

From @druud62

Doug Cook wrote​:

My program worked fine under previous versions of Perl on MacOS (prior to Snow Leopard).

Now it dies under 5.8.9, 5.10.0 and 5.12.1, with "Malformed UTF-8 character (fatal)" - but the input data is the same, and is, as far as I can tell, perfectly valid UTF-8.

It could well be that your editor saves the source as either UTF-8 or
ISO-8859-1. Did you check the input data at the byte level?
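
One way to make such a byte-level check is sketched below. The helper name `is_wellformed_utf8` and the two sample byte strings are assumptions of this example; in practice the bytes would be read from the source file in raw mode.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if the byte string decodes as strict UTF-8.
sub is_wellformed_utf8 {
    my ($bytes) = @_;
    return eval {
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    } ? 1 : 0;
}

# "Böck" as saved in UTF-8 (C3 B6) and as saved in ISO-8859-1 (F6).
for my $bytes ("B\xC3\xB6ck", "B\xF6ck") {
    printf "%s => %s\n",
        join(' ', unpack '(H2)*', $bytes),
        is_wellformed_utf8($bytes) ? 'valid UTF-8' : 'not valid UTF-8';
}
```

If a file that contains `use utf8` fails such a check, the error comes from the source encoding and not from the regex engine.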

--
Ruud

@p5pRT
Author

p5pRT commented Aug 23, 2010

From @tsee

According to Yves, this was fixed by commit v5.13.4-25-g92f3d48.

--Steffen

@p5pRT
Author

p5pRT commented Aug 23, 2010

@tsee - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Aug 23, 2010
@p5pRT
Author

p5pRT commented Sep 5, 2010

From @cpansprout

This appears to have been fixed. It may be the same bug as #75680.

@p5pRT
Author

p5pRT commented Sep 16, 2010

From @cpansprout

On Sun Sep 05 14​:52​:42 2010, sprout wrote​:

This appears to have been fixed. It may be the same bug as #75680.

Yes, it is the same. I’m marking this as resolved.

@p5pRT
Author

p5pRT commented Sep 16, 2010

@cpansprout - Status changed from 'open' to 'resolved'

@p5pRT
Author

p5pRT commented Sep 19, 2010

From @cpansprout

On Tue Jul 29 19​:46​:08 2008, BKB wrote​:

This is a very much simplified version of the script which tripped the
bug (five lines). I've also simplified the regex drastically until it
trips the bug. Shortening the regex from this makes it print "OK" but as
it stands the "Malformed UTF-8 character (fatal)" message appears.

Thank you for your report.

You have ‘use utf8’ in your script, which signals to perl that your
source code is in UTF-8.

But then you have a string containing the octets 95 B6, which is not
valid UTF-8. This results in an invalid scalar, so perl croaks. This
behaviour is correct.

You do not need ‘use utf8’ if you are just *using* Unicode strings.
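
Why the octets 95 B6 cannot be UTF-8 follows from the bit layout of the encoding. The helper below is a hypothetical sketch written for this note, not part of perl: every UTF-8 character must start with an ASCII byte or a lead byte, and both 0x95 and 0xB6 have the bit pattern 10xxxxxx that is reserved for continuation bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a single byte by its role in UTF-8.
sub utf8_byte_role {
    my ($byte) = @_;
    return 'ASCII'        if $byte < 0x80;              # 0xxxxxxx
    return 'continuation' if ($byte & 0xC0) == 0x80;    # 10xxxxxx
    return 'lead'         if $byte >= 0xC2 && $byte <= 0xF4;
    return 'invalid';     # C0, C1 and F5..FF never occur in UTF-8
}

printf "%02X: %s\n", $_, utf8_byte_role($_) for 0x95, 0xB6;
# 95: continuation
# B6: continuation
```

A string that opens with a continuation byte has no valid starting point, so a source file declared with `use utf8` that contains these octets produces an invalid scalar.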

@p5pRT
Author

p5pRT commented Sep 19, 2010

@cpansprout - Status changed from 'open' to 'rejected'

@p5pRT
Author

p5pRT commented Sep 20, 2010

From @benkasminbullock

I'm pretty sure I filed a very much simpler example of this bug after
that one (it was more than two years ago).

I don't think there was anything wrong with the utf8 etc., that is
just smoke-blowing.

On 20 September 2010 05​:48, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Tue Jul 29 19​:46​:08 2008, BKB wrote​:

This is a very much simplified version of the script which tripped the
bug (five lines). I've also simplified the regex drastically until it
trips the bug. Shortening the regex from this makes it print "OK" but as
it stands the "Malformed UTF-8 character (fatal)" message appears.

Thank you for your report.

You have ‘use utf8’ in your script, which signals to perl that your
source code is in UTF-8.

But then you have a string containing the octets 95 B6, which is not
valid UTF-8. This results in an invalid scalar, so perl croaks. This
behaviour is correct.

You do not need ‘use utf8’ if you are just *using* Unicode strings.

@p5pRT
Author

p5pRT commented Sep 20, 2010

From @cpansprout

On Sun Sep 19 21​:21​:17 2010, BKB wrote​:

I'm pretty sure I filed a very much simpler example of this bug after
that one (it was more than two years ago).

I don't think there was anything wrong with the utf8 etc., that is
just smoke-blowing.

I only looked at your reduced case at first. It was failing for the
reason I mentioned.

Your original script can be reduced to​:

perl -le' "(n) (See \x{7a93}\x{8ca9}) over the counter sales (often of
financial packages)" =~ /(.*?)\s*([A-Z]{2}[12]?)\s*$/s'

It is the same as 75680 and 73732, which were fixed by
92f3d48.

@p5pRT
Author

p5pRT commented Sep 20, 2010

@cpansprout - Status changed from 'rejected' to 'resolved'

@p5pRT
Author

p5pRT commented Sep 20, 2010

From @khwilliamson

Father Chrysostomos via RT wrote​:

On Sun Sep 19 21​:21​:17 2010, BKB wrote​:

I'm pretty sure I filed a very much simpler example of this bug after
that one (it was more than two years ago).

I don't think there was anything wrong with the utf8 etc., that is
just smoke-blowing.

I only looked at your reduced case at first. It was failing for the
reason I mentioned.

Your original script can be reduced to​:

perl -le' "(n) (See \x{7a93}\x{8ca9}) over the counter sales (often of
financial packages)" =~ /(.*?)\s*([A-Z]{2}[12]?)\s*$/s'

It is the same as 75680 and 73732, which were fixed by
92f3d48.

And this fix made it into 5.12.2, which is now an official Perl release
available at http://search.cpan.org/~jesse/perl-5.12.2/
