Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text::Wrap::wrap() generates a segfault with Cyrillic characters when the utf8 flag is turned on #9266

Closed
p5pRT opened this issue Mar 26, 2008 · 10 comments

Comments

@p5pRT
Copy link

p5pRT commented Mar 26, 2008

Migrated from rt.perl.org#52104 (status was 'resolved')

Searchable as RT52104$

@p5pRT
Copy link
Author

p5pRT commented Mar 26, 2008

From lpsolit@gmail.com

As described at https://bugzilla.mozilla.org/show_bug.cgi?id=423439,
Text​::Wrap​::wrap() generates a segfault with Cyrillic characters when
the utf8 flag is turned on. The testcase given in the bug,
https://bugzilla.mozilla.org/attachment.cgi?id=311526 (a simple Perl
script to run from the shell) shows this very clearly. Due to this bug,
all pages containing such strings are left blank, which is a real
problem for webapps such as Bugzilla.


Flags​:
  category=core
  severity=high


Site configuration information for perl v5.8.8​:

Configured by Mandriva at Mon Nov 5 17​:08​:57 EST 2007.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration​:
  Platform​:
  osname=linux, osvers=2.6.12-31mdksmp, archname=i386-linux
  uname='linux mercury.mandriva.com 2.6.12-31mdksmp #1 smp fri mar 2
08​:52​:38 mst 2007 i686 intel(r) pentium(r) 4 cpu 3.00ghz gnulinux '
  config_args='-des -Dinc_version_list=5.8.7 5.8.7/i386-linux 5.8.6
5.8.6/i386-linux 5.8.5 5.8.4 5.8.3 5.8.2 5.8.1 5.8.0 5.6.1 5.6.0
-Darchname=i386-linux -Dcc=gcc -Doptimize=-O2 -pipe
-Wp,-D_FORTIFY_SOURCE=2 -fstack-protector --param=ssp-buffer-size=4
-fexceptions -fomit-frame-pointer -march=i586 -mtune=generic
-fasynchronous-unwind-tables -Dprefix=/usr -Dvendorprefix=/usr
-Dsiteprefix=/usr -Dsitebin=/usr/local/bin
-Dsiteman1dir=/usr/local/share/man/man1
-Dsiteman3dir=/usr/local/share/man/man3 -Dman3ext=3pm -Dcf_by=Mandriva
-Dmyhostname=localhost -Dperladmin=root@​localhost
-Dcf_email=root@​localhost -Dd_dosuid -Ud_csh -Duseshrplib'
  hint=recommended, useposix=true, d_sigaction=define
  usethreads=undef use5005threads=undef useithreads=undef
usemultiplicity=undef
  useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
  use64bitint=undef use64bitall=undef uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='gcc', ccflags ='-fno-strict-aliasing -pipe
-Wdeclaration-after-statement -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
  optimize='-O2 -pipe -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector
--param=ssp-buffer-size=4 -fexceptions -fomit-frame-pointer -march=i586
-mtune=generic -fasynchronous-unwind-tables',
  cppflags='-fno-strict-aliasing -pipe -Wdeclaration-after-statement
-I/usr/local/include -I/usr/include/gdbm'
  ccversion='', gccversion='4.2.2 20070909 (prerelease)
(4.2.2-0.RC.1mdv2008.0)', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
  alignbytes=4, prototype=define
  Linker and Libraries​:
  ld='gcc', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /lib /usr/lib
  libs=-lnsl -lndbm -lgdbm -ldl -lm -lcrypt -lutil -lc
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
  libc=/lib/libc-2.6.1.so, so=so, useshrplib=true, libperl=libperl.so
  gnulibc_version='2.6.1'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
-Wl,-rpath,/usr/lib/perl5/5.8.8/i386-linux/CORE'
  cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches​:
  Mandriva Linux patches


@​INC for perl v5.8.8​:
  /usr/lib/perl5/5.8.8/i386-linux
  /usr/lib/perl5/5.8.8
  /usr/lib/perl5/site_perl/5.8.8/i386-linux
  /usr/lib/perl5/site_perl/5.8.8
  /usr/lib/perl5/site_perl
  /usr/lib/perl5/vendor_perl/5.8.8/i386-linux
  /usr/lib/perl5/vendor_perl/5.8.8
  /usr/lib/perl5/vendor_perl
  .


Environment for perl v5.8.8​:
  HOME=/root
  LANG=fr_CH.UTF-8
  LANGUAGE=fr_CH.UTF-8​:fr
  LC_ADDRESS=fr_CH.UTF-8
  LC_COLLATE=fr_CH.UTF-8
  LC_CTYPE=fr_CH.UTF-8
  LC_IDENTIFICATION=fr_CH.UTF-8
  LC_MEASUREMENT=fr_CH.UTF-8
  LC_MESSAGES=fr_CH.UTF-8
  LC_MONETARY=fr_CH.UTF-8
  LC_NAME=fr_CH.UTF-8
  LC_NUMERIC=fr_CH.UTF-8
  LC_PAPER=fr_CH.UTF-8
  LC_SOURCED=1
  LC_TELEPHONE=fr_CH.UTF-8
  LC_TIME=fr_CH.UTF-8
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)

PATH=/sbin​:/usr/sbin​:/bin​:/usr/bin​:/usr/X11R6/bin​:/usr/local/bin​:/usr/local/sbin​:/root/bin
  PERL_BADLANG (unset)
  SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Mar 26, 2008

From @nwc10

On Tue, Mar 25, 2008 at 05​:08​:10PM -0700, Frdric Buclin wrote​:

As described at https://bugzilla.mozilla.org/show_bug.cgi?id=423439,
Text​::Wrap​::wrap() generates a segfault with Cyrillic characters when
the utf8 flag is turned on. The testcase given in the bug,
https://bugzilla.mozilla.org/attachment.cgi?id=311526 (a simple Perl
script to run from the shell) shows this very clearly. Due to this bug,
all pages containing such strings are left blank, which is a real
problem for webapps such as Bugzilla.

The bug seems to be caused by a regexp using pos() inside a substitution, and
can be reduced to something like this​:

$ cat 52104.pl
use strict;
use warnings;

$_ = chr(0x410) . "N";

s/N/ pos(); "" /e;

use Devel​::Peek;
Dump ($_);

__END__
$ ./perl -Ilib 52104.pl
Malformed UTF-8 character (unexpected end of string) in match position at 52104.pl line 6.
Malformed UTF-8 character (unexpected end of string) in match position at 52104.pl line 6.
Malformed UTF-8 character (unexpected end of string) in match position at 52104.pl line 6.
SV = PVMG(0x930958) at 0x8ce940
  REFCNT = 1
  FLAGS = (SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x8fe668 "\320\220"\0 [UTF8 "\x{410}"]
  CUR = 2
  LEN = 8
  MAGIC = 0x8ebd68
  MG_VIRTUAL = &PL_vtbl_utf8
  MG_TYPE = PERL_MAGIC_utf8(w)
  MG_LEN = -1
  MAGIC = 0x8ef338
  MG_VIRTUAL = &PL_vtbl_mglob
  MG_TYPE = PERL_MAGIC_regex_global(g)
  MG_LEN = -1

It's still present in blead. It's not Cyrillic specific, but I happened to stick
to the same Cyrillic character in the test case.

I don't know what the cause is.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Mar 26, 2008

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Mar 26, 2008

From @nwc10

On Wed, Mar 26, 2008 at 12​:50​:55PM +0000, Nicholas Clark wrote​:

On Tue, Mar 25, 2008 at 05​:08​:10PM -0700, Frdric Buclin wrote​:

As described at https://bugzilla.mozilla.org/show_bug.cgi?id=423439,
Text​::Wrap​::wrap() generates a segfault with Cyrillic characters when
the utf8 flag is turned on. The testcase given in the bug,
https://bugzilla.mozilla.org/attachment.cgi?id=311526 (a simple Perl
script to run from the shell) shows this very clearly. Due to this bug,
all pages containing such strings are left blank, which is a real
problem for webapps such as Bugzilla.

The bug seems to be caused by a regexp using pos() inside a substitution, and
can be reduced to something like this​:

I don't know what the cause is.

It should all be fixed by the appended change, which I expect will be in 5.8.9
soon (RC1 within weeks).

Nicholas Clark

Change 33580 by nicholas@​nicholas-saigo on 2008/03/26 21​:05​:20

  The offset for pos is stored as bytes, and converted to (Unicode)
  character position when read, if needed. The code for setting pos
  inside subst was incorrectly converting to character position before
  storing the value. This code appears to have been buggy since it was
  added in 2000 in change 7562.

Affected files ...

... //depot/perl/pp_ctl.c#688 edit
... //depot/perl/t/op/subst.t#50 edit

Differences ...

==== //depot/perl/pp_ctl.c#688 (text) ====

@​@​ -298,7 +298,6 @​@​
  { /* Update the pos() information. */
  SV * const sv = cx->sb_targ;
  MAGIC *mg;
- I32 i;
  SvUPGRADE(sv, SVt_PVMG);
  if (!(mg = mg_find(sv, PERL_MAGIC_regex_global))) {
#ifdef PERL_OLD_COPY_ON_WRITE
@​@​ -308,10 +307,7 @​@​
  mg = sv_magicext(sv, NULL, PERL_MAGIC_regex_global, &PL_vtbl_mglob,
  NULL, 0);
  }
- i = m - orig;
- if (DO_UTF8(sv))
- sv_pos_b2u(sv, &i);
- mg->mg_len = i;
+ mg->mg_len = m - orig;
  }
  if (old != rx)
  (void)ReREFCNT_inc(rx);

==== //depot/perl/t/op/subst.t#50 (xtext) ====

@​@​ -7,7 +7,7 @​@​
}

require './test.pl';
-plan( tests => 136 );
+plan( tests => 139 );

$x = 'foo';
$_ = "x";
@​@​ -583,3 +583,11 @​@​
  is($want,$_,"RT#17542");
}

+{
+ my @​tests = ('ABC', "\xA3\xA4\xA5", "\x{410}\x{411}\x{412}");
+ foreach (@​tests) {
+ my $id = ord $_;
+ s/./pos/ge;
+ is($_, "012", "RT#52104​: $id");
+ }
+}

@p5pRT
Copy link
Author

p5pRT commented Mar 27, 2008

From lpsolit@gmail.com

For all installations which do not have the not-yet-released Perl 5.8.9,
which workaround(s) do they have? On critical installations, it's hard
to upgrade Perl as it's a pretty heavy change.

@p5pRT
Copy link
Author

p5pRT commented Mar 27, 2008

From @nwc10

On Thu, Mar 27, 2008 at 10​:41​:51PM +0100, Frdric Buclin wrote​:

For all installations which do not have the not-yet-released Perl 5.8.9,
which workaround(s) do they have? On critical installations, it's hard
to upgrade Perl as it's a pretty heavy change.

Given that the bug means that using pos inside the replacement in s///e isn't
going to work with Unicode, it unfortunately is as simple and restrictive as
"don't do that then". Which, I guess, in this case means not triggering the
replacement in Text​::Tabs​::expand(), which does just that.

For Bugzilla, as it's generating HTML for web pages, if it's not text in <pre>
tags, am I right in assuming that it doesn't matter in the HTML source whether
it's "\t" or " ", as whitespace is equivalent and folded? If so, can it be
worked round by translating all "\t" to " " before the call to Text​::Wrap?

Or is this explicitly for fixed width text, where formatting of tabs at
tabstops is an issue?

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Mar 28, 2008

From Robin.Barker@npl.co.uk

From​: Fr��d��ric Buclin [mailto​:lpsolit@​gmail.com]

For all installations which do not have the not-yet-released Perl 5.8.9,
which workaround(s) do they have? On critical installations, it's hard
to upgrade Perl as it's a pretty heavy change.

I have rewritten Text​::Tabs to avoid s//pos/
The code is at work and I haven't had a chance to test it.
From memory, the code for C<sub expand> is below.

Robin

sub expand {
  my @​l;
  for ( @​_ ) {
  my $s = '';
  for (split(/^/m, $_, -1)) {
  my @​tabs = split(/\t/, $_, -1);
  my $line = shift @​tabs;
  for (@​tabs) {
  my $pad = $tabstop - (length $line) % $tabstop;
  $line .= " " x $pad;
  $line .= $_;
  }
  $s .= $line;
  }
  push(@​l, $s);
  }
  return @​l if wantarray;
  return $l[0];
}


This e-mail and any attachments may contain confidential and/or
privileged material; it is for the intended addressee(s) only.
If you are not a named addressee, you must not use, retain or
disclose such information.

NPL Management Ltd cannot guarantee that the e-mail or any
attachments are free from viruses.

NPL Management Ltd. Registered in England and Wales. No​: 2937881
Registered Office​: Serco House, 16 Bartley Wood Business Park,
  Hook, Hampshire, United Kingdom RG27 9UY


@p5pRT
Copy link
Author

p5pRT commented Mar 29, 2008

From @nwc10

On Fri, Mar 28, 2008 at 10​:58​:25PM -0000, Robin Barker wrote​:

From​: Frédéric Buclin [mailto​:lpsolit@​gmail.com]

For all installations which do not have the not-yet-released Perl 5.8.9,
which workaround(s) do they have? On critical installations, it's hard
to upgrade Perl as it's a pretty heavy change.

I have rewritten Text​::Tabs to avoid s//pos/
The code is at work and I haven't had a chance to test it.

From memory, the code for C<sub expand> is below.

Robin

sub expand {
my @​l;
for ( @​_ ) {
my $s = '';
for (split(/^/m, $_, -1)) {
my @​tabs = split(/\t/, $_, -1);
my $line = shift @​tabs;
for (@​tabs) {
my $pad = $tabstop - (length $line) % $tabstop;
$line .= " " x $pad;
$line .= $_;
}
$s .= $line;
}
push(@​l, $s);
}
return @​l if wantarray;
return $l[0];
}

Although that's not a huge win unless the work around code goes into a
release on CPAN, which is not something that p5p controls.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Mar 29, 2008

From @ap

* Robin Barker <Robin.Barker@​npl.co.uk> [2008-03-29 00​:00]​:

From​: Frédéric Buclin [mailto​:lpsolit@​gmail.com]

For all installations which do not have the not-yet-released
Perl 5.8.9, which workaround(s) do they have? On critical
installations, it's hard to upgrade Perl as it's a pretty
heavy change.

I have rewritten Text​::Tabs to avoid s//pos/ The code is at
work and I haven't had a chance to test it. From memory, the
code for C<sub expand> is below.

I wrote the version of `expand` with s//pos/e; it is the way it
is for maximum performance. The old code ran in O(n^2) but was
fast for short strings; my goal was O(n) without significantly
higher overhead. I had some variants I liked better, code quality
wise, but they were slower.

Your version is O(n) too, but if you benchmark it you’ll find it
is measurably slower than mine in all cases, and much slower than
the old Text​::Tabs code for short strings.

There have been some fixes in Text​::Wrap since my performance
patch, but none in Text​::Tabs, so if you have a buggy version
of perl I suggest you simply downgrade Text​::Tabs to the version
from the Text-Tabs+Wrap-2001.0929 distribution.

Regards,
--
Aristotle Pagaltzis // <http​://plasmasturm.org/>

@p5pRT
Copy link
Author

p5pRT commented Apr 26, 2008

@smpeters - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant