Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTF-8 string substitution corrupts memory #7724

Closed
p5pRT opened this issue Dec 25, 2004 · 10 comments
Closed

UTF-8 string substitution corrupts memory #7724

p5pRT opened this issue Dec 25, 2004 · 10 comments

Comments

@p5pRT
Copy link

p5pRT commented Dec 25, 2004

Migrated from rt.perl.org#33185 (status was 'resolved')

Searchable as RT33185$

@p5pRT
Copy link
Author

p5pRT commented Dec 25, 2004

From sroy@search-box.com

This is a bug report for perl from sroy@​search-box.com,
generated with the help of perlbug 1.35 running under perl v5.8.4.


The following test program corrupts perl's memory​:


#!/usr/bin/perl
#
# Demonstrates memory corruption bug in 5.8.4. The 'use Benchmark'
# line does not affect the corruption, but on my system it
# moves memory allocations around enough to ensure that 'perl -d bug.pl'
# hangs, and that under 'gdb debugperl' it produces a segmentation fault.
# The exact string is also unimportant as long as it has characters
# that match the string class in the s///. The one chosen here simply
# reliably reproduces the problem on my system.
#

use Encode;
use Benchmark;

$_ = decode_utf8('title​: �Ã�¿�Â��Ã�¿�Ã�¿�Â��Â��Ã�¿�Â��Â��Ã�¿�Ã�¿�Â��Ã�¿�Â��Ã�¿�Ã�¿�Â�', 1);
$_ =~ s/[^[​:print​:]]/ /g;


Since it's a memory corruption, it may or may not crash when running
under perl, perl -d, or gdb. To reliably observe the problem under gdb​:

1. gdb debugperl (a perl compiled with debugging on)
2. b regexec.c​:4373
3. r -d bug.pl
4. c (from within the perl debugger)
5. c
6. c
7. crash (on my system)

The breakpoint stops the first time perl needs to check whether a
utf8 character is part of a string class. At this point (step #5) everything
is ok. By step #6 the value of PL_bostr (my_perl->Tbostr) is corrupted.
To see more details, instead of c at step #6 do​:

6. fin
7. s 4

Now the debugger is sitting at the line that corrupts prog->startp.
Ultimately, this corruption leads to a seg fault at pp_hot.c​:2151 when perl
tries to copy characters as part of the s/// operation.

ANALYSIS​:

In the middle of processing the regular expression, The regex library
demand-loads a bunch of stuff to create the swashes for the [​:print​:]
expression. At the end of all that PL_bostr has a completely new value.
I have no idea whether the right fix is to move away from using PL_bostr
in the regex library in favor of some local variable, or to try and
save PL_bostr and restore it before any line that might change it.

WORKAROUND​:

Adding a 'use utf8' pragma at the top of the program seems to load everything
ahead of time and avoid the problem with the demand-load. I have no real
confidence that it avoids other bugs of this sort, though. Note that
if you add 'use utf8' to the test program, you'll want to get rid of the
decode_utf8 call since now perl interprets the string directly as utf8.

Scott



Flags​:
  category=core
  severity=high


Site configuration information for perl v5.8.4​:

Configured by Debian Project at Mon Oct 25 01​:52​:37 EST 2004.

Summary of my perl5 (revision 5 version 8 subversion 4) configuration​:
  Platform​:
  osname=linux, osvers=2.4.27-ti1211, archname=i386-linux-thread-multi
  uname='linux kosh 2.4.27-ti1211 #1 sun sep 19 18​:17​:45 est 2004 i686 gnulinux '
  config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.4 -Dsitearch=/usr/local/lib/perl/5.8.4 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.4 -Dd_dosuid -des'
  hint=recommended, useposix=true, d_sigaction=define
  usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
  useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
  use64bitint=undef use64bitall=undef uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-O2',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include'
  ccversion='', gccversion='3.3.5 (Debian 1​:3.3.5-1)', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=4, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /lib /usr/lib
  libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
  perllibs=-ldl -lm -lpthread -lc -lcrypt
  libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so.5.8.4
  gnulibc_version='2.3.2'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches​:
 


@​INC for perl v5.8.4​:
  /etc/perl
  /usr/local/lib/perl/5.8.4
  /usr/local/share/perl/5.8.4
  /usr/lib/perl5
  /usr/share/perl5
  /usr/lib/perl/5.8
  /usr/share/perl/5.8
  /usr/local/lib/site_perl
  /usr/local/lib/perl/5.8.3
  /usr/local/share/perl/5.8.3
  .


Environment for perl v5.8.4​:
  HOME=/home/sroy
  LANG (unset)
  LANGUAGE (unset)
  LC_ALL=C
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/home/sroy/cvs/searchbox/server​:/usr/local/bin​:/usr/bin​:/bin​:/usr/bin/X11​:/usr/games
  PERL_BADLANG (unset)
  SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Dec 26, 2004

From hsr@cs.stanford.edu

It looks like something in the mail chain doesn't like bytes with their high order bits set. Here's a corrected test program that's ASCII.

Scott


#!/usr/bin/perl
#
# Demonstrates memory corruption bug in 5.8.4. The 'use Benchmark'
# line does not affect the corruption, but on my system it
# moves memory allocations around enough to ensure that 'perl -d bug.pl'
# hangs, and that under 'gdb debugperl' it produces a segmentation fault.
# The exact string is also unimportant as long as it has characters
# that match the string class in the s///. The one chosen here simply
# reliably reproduces the problem on my system.
#

use Encode;
use Benchmark;

$_ = decode_utf8("\164\151\164\154\145\72\40\303\277\302\200\303\277\303\277\302\202\302\222\303\277\302\217\302\220\303\277\303\277\302\210\303\277\302\201\303\277\303\277\302\202", 1);
$_ =~ s/[^[​:print​:]]/ /g;


@p5pRT
Copy link
Author

p5pRT commented Dec 26, 2004

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Dec 30, 2004

From @nwc10

On Sat, Dec 25, 2004 at 09​:50​:46PM -0000, sroy @​ search-box. com wrote​:

The breakpoint stops the first time perl needs to check whether a
utf8 character is part of a string class. At this point (step #5) everything
is ok. By step #6 the value of PL_bostr (my_perl->Tbostr) is corrupted.
To see more details, instead of c at step #6 do​:

6. fin
7. s 4

Now the debugger is sitting at the line that corrupts prog->startp.
Ultimately, this corruption leads to a seg fault at pp_hot.c​:2151 when perl
tries to copy characters as part of the s/// operation.

I can recreate this on OS X when running with the perl debugger. I can't
recreate it on FreeBSD (on a box where valgrind has been installed) and
annoyingly the x86 Linux box I usually use for this sort of thing is
currently inaccessible.

ANALYSIS​:

In the middle of processing the regular expression, The regex library
demand-loads a bunch of stuff to create the swashes for the [​:print​:]
expression. At the end of all that PL_bostr has a completely new value.
I have no idea whether the right fix is to move away from using PL_bostr
in the regex library in favor of some local variable, or to try and
save PL_bostr and restore it before any line that might change it.

Thanks for the analysis, which seems to be spot on. (Seems, because I'm no
expert on the regexp engine's guts).

Ideally we'd really like to re-write the regexp engine sufficiently to remove
all the global state, and hence make it totally re-entrant. Currently no-one
with the expertise to do this has the time.

Currently there are kludges to save enough state to theoretically make the
utf8 initialisation work​:

/* XXX Here's a total kludge. But we need to re-enter for swash routines. */

void
Perl_save_re_context(pTHX)
{
  SAVEI32(PL_reg_flags); /* from regexec.c */
  SAVEPPTR(PL_bostr);
  SAVEPPTR(PL_reginput); /* String-input pointer. */

but what doesn't make sense to me is why PL_bostr isn't being saved (or
maybe isn't being restored) via the code path that you code takes.

The realistic fix is going to be to make it save and restore correctly for
the class of operations that your code represents. I don't have the
experience with the regexp engine to know where to look to quickly find the
correct solution, but I believe that several other people on the
perl5-porters mailing list do.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Dec 30, 2004

From @nwc10

On Thu, Dec 30, 2004 at 06​:56​:49PM +0000, Nicholas Clark wrote​:

The realistic fix is going to be to make it save and restore correctly for
the class of operations that your code represents. I don't have the
experience with the regexp engine to know where to look to quickly find the
correct solution, but I believe that several other people on the
perl5-porters mailing list do.

OK. I underestimate my brute force and ignorance. With a watchpoint on
my_perl->Tbostr, a backtrace at the first point of modification is​:

0 Perl_pp_match (my_perl=0x800200) at pp_hot.c​:1288
#1 0x000cfd70 in Perl_runops_debug (my_perl=0x800200) at dump.c​:1572
#2 0x0001fde0 in Perl_call_sv (my_perl=0x800200, sv=0x4c, flags=6) at perl.c​:2134
#3 0x00024308 in S_call_list_body (my_perl=0x800200, cv=0x888224) at perl.c​:4721
#4 0x00023e34 in Perl_call_list (my_perl=0x800200, oldscope=3, paramList=0x80b278) at perl.c​:4650
#5 0x00014de4 in Perl_newATTRSUB (my_perl=0x800200, floor=3, o=0x0, proto=0x888224, attrs=0x0, block=0x111e050) at op.c​:4482
#6 0x00011228 in Perl_utilize (my_perl=0x800200, aver=1, floor=3, version=0x111e100, idop=0x63a360, arg=0x111df60) at op.c​:3039
#7 0x00011434 in Perl_vload_module (my_perl=0x800200, flags=2, name=0x1, ver=0x111df60, args=0x4599e0) at op.c​:3141
#8 0x00011284 in Perl_load_module (my_perl=0x68a7e0, flags=8430128, name=0x1, ver=0x68a7e0) at op.c​:3093
#9 0x000b139c in Perl_swash_init (my_perl=0x800200, pkg=0xeaf84 "utf8", name=0xe7ef0 "", listsv=0x32b14c, minbits=1, none=0) at utf8.c​:1587
#10 0x000df018 in Perl_regclass_swash (my_perl=0x800200, node=0x80a230, doinit=1 '\001', listsvp=0x0, altsvp=0xbfffee34) at regexec.c​:4337
#11 0x000df254 in S_reginclass (my_perl=0x800200, n=0x6e31c4, p=0x111e017 "\202", lenp=0x0, do_utf8=1 '\001') at regexec.c​:4388
#12 0x000d5854 in S_find_byclass (my_perl=0x800200, prog=0x6e3180, c=0x6e31c4, s=0x111e017 " perl.c​:1853
#17 0x0001f03c in perl_run (my_perl=0x800200) at perl.c​:1771
#18 0x0000289c in main (argc=4, argv=0xbffff730, env=0x1) at perlmain.c​:98

where the culprit is #9, which was exercising this code *before* saving the
regexp engine context​:

  if (!gv_fetchmeth(stash, "SWASHNEW", 8, -1)) { /* demand load utf8 */
  ENTER;
  errsv_save = newSVsv(ERRSV);
  Perl_load_module(aTHX_ PERL_LOADMOD_NOIMPORT, newSVpvn(pkg,pkg_len),
  Nullsv);
  if (!SvTRUE(ERRSV))
  sv_setsv(ERRSV, errsv_save);
  SvREFCNT_dec(errsv_save);
  LEAVE;
  }

The appended patch seems to cure the problem for me, but I'm not confident
that it's the correct way.

Nicholas Clark

==== //depot/perl/utf8.c#212 - /Users/nick/p4perl/perl/utf8.c ====

Inline Patch
--- /tmp/tmp.26186.0    Thu Dec 30 19:46:18 2004
+++ /Users/nick/p4perl/perl/utf8.c      Thu Dec 30 19:24:46 2004
@@ -1581,6 +1581,8 @@ Perl_swash_init(pTHX_ char* pkg, char* n
     HV *stash = gv_stashpvn(pkg, pkg_len, FALSE);
     SV* errsv_save;

+    ENTER;
+    save_re_context();
     if (!gv_fetchmeth(stash, "SWASHNEW", 8, -1)) {     /* demand load utf8 */
        ENTER;
        errsv_save = newSVsv(ERRSV);
@@ -1601,10 +1603,8 @@ Perl_swash_init(pTHX_ char* pkg, char* n
     PUSHs(sv_2mortal(newSViv(minbits)));
     PUSHs(sv_2mortal(newSViv(none)));
     PUTBACK;
-    ENTER;
     SAVEI32(PL_hints);
     PL_hints = 0;
-    save_re_context();
     if (IN_PERL_COMPILETIME) {
        /* XXX ought to be handled by lex_start */
        SAVEI32(PL_in_my);

@p5pRT
Copy link
Author

p5pRT commented Dec 31, 2004

From @hvds

Nicholas Clark <nick@​ccl4.org> wrote​:
:The appended patch seems to cure the problem for me, but I'm not confident
:that it's the correct way.

I notice that this changes the order things are stacked; I'm not sure if
that's ever going to be relevant. In particularly this swaps the first
two of the sequence​:
  PUSHSTACKi; ENTER; LEAVE; POPSTACK
.. but I'm not sure what they all do without expanding a lot of macros.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Dec 31, 2004

From @nwc10

On Fri, Dec 31, 2004 at 03​:04​:49PM +0000, hv@​crypt.org wrote​:

Nicholas Clark <nick@​ccl4.org> wrote​:
:The appended patch seems to cure the problem for me, but I'm not confident
:that it's the correct way.

I notice that this changes the order things are stacked; I'm not sure if
that's ever going to be relevant. In particularly this swaps the first
two of the sequence​:
PUSHSTACKi; ENTER; LEAVE; POPSTACK
.. but I'm not sure what they all do without expanding a lot of macros.

Aha. I was rather hoping that someone would be able to tell me if the changes
were going to be relevant. The "simplest" fix appears to be add
save_re_context(); after the ENTER in

  if (!gv_fetchmeth(stash, "SWASHNEW", 8, -1)) { /* demand load utf8 */
  ENTER;
  errsv_save = newSVsv(ERRSV);
  Perl_load_module(aTHX_ PERL_LOADMOD_NOIMPORT, newSVpvn(pkg,pkg_len),
  Nullsv);
  if (!SvTRUE(ERRSV))
  sv_setsv(ERRSV, errsv_save);
  SvREFCNT_dec(errsv_save);
  LEAVE;
  }

except that seems to be wasteful, as it would mean doing all the save work
twice, because of the existing save_re_context() later in Perl_swatch_init.
I was hoping that there was a good way to save the regexp engine's context
just once for the entire function.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Dec 31, 2004

From @iabyn

On Fri, Dec 31, 2004 at 03​:04​:49PM +0000, hv@​crypt.org wrote​:

Nicholas Clark <nick@​ccl4.org> wrote​:
:The appended patch seems to cure the problem for me, but I'm not confident
:that it's the correct way.

I notice that this changes the order things are stacked; I'm not sure if
that's ever going to be relevant. In particularly this swaps the first
two of the sequence​:
PUSHSTACKi; ENTER; LEAVE; POPSTACK
.. but I'm not sure what they all do without expanding a lot of macros.

I'd have thought that the better approach would be to move the

  if (!gv_fetchmeth(stash, "SWASHNEW", 8, -1)) { /* demand load utf8 */

block further down to just above the line

  if (call_method("SWASHNEW", G_SCALAR))

then the call to Perl_load_module is protected by the PUSHSTACKi. Note
that PUSHSTACKi is needed any place where the caller isn't prepared for
the stack to get extended and thus possibly reallocated (thus invalidating
SP etc). It gives you a brand new stack to play with.

--
In England there is a special word which means the last sunshine
of the summer. That word is "spring".

@p5pRT
Copy link
Author

p5pRT commented Mar 26, 2005

From @iabyn

On Fri, Dec 31, 2004 at 04​:11​:16PM +0000, Dave Mitchell wrote​:

On Fri, Dec 31, 2004 at 03​:04​:49PM +0000, hv@​crypt.org wrote​:

Nicholas Clark <nick@​ccl4.org> wrote​:
:The appended patch seems to cure the problem for me, but I'm not confident
:that it's the correct way.

I notice that this changes the order things are stacked; I'm not sure if
that's ever going to be relevant. In particularly this swaps the first
two of the sequence​:
PUSHSTACKi; ENTER; LEAVE; POPSTACK
.. but I'm not sure what they all do without expanding a lot of macros.

I'd have thought that the better approach would be to move the

if \(\!gv\_fetchmeth\(stash\, "SWASHNEW"\, 8\, \-1\)\) \{    /\* demand load utf8 \*/

block further down to just above the line

if \(call\_method\("SWASHNEW"\, G\_SCALAR\)\)

then the call to Perl_load_module is protected by the PUSHSTACKi. Note
that PUSHSTACKi is needed any place where the caller isn't prepared for
the stack to get extended and thus possibly reallocated (thus invalidating
SP etc). It gives you a brand new stack to play with.

Nicolas just reminded of this outstanding issue from December, so I've
committed the following change. It (hopefully) does all the state saving
needed *before* doing the require.

Dave.

--
Spock (or Data) is fired from his high-ranking position for not being able
to understand the most basic nuances of about one in three sentences that
anyone says to him.
  -- Things That Never Happen in "Star Trek" #19

Change 24084 by davem@​davem-splatty on 2005/03/26 21​:25​:47

  [perl #33185] UTF-8 string substitution corrupts memory
  The implicit call of 'require utf8' triggered by code like
  "\x{100}" =~ /[[​:print​:]]/
  wasn't saving state correctly first.

Affected files ...

... //depot/perl/utf8.c#223 edit

Differences ...

==== //depot/perl/utf8.c#223 (text) ====

@​@​ -1578,6 +1578,11 @​@​
  HV *stash = gv_stashpvn(pkg, pkg_len, FALSE);
  SV* errsv_save;

+ PUSHSTACKi(PERLSI_MAGIC);
+ ENTER;
+ SAVEI32(PL_hints);
+ PL_hints = 0;
+ save_re_context();
  if (!gv_fetchmeth(stash, "SWASHNEW", 8, -1)) { /* demand load utf8 */
  ENTER;
  errsv_save = newSVsv(ERRSV);
@​@​ -1589,7 +1594,6 @​@​
  LEAVE;
  }
  SPAGAIN;
- PUSHSTACKi(PERLSI_MAGIC);
  PUSHMARK(SP);
  EXTEND(SP,5);
  PUSHs(sv_2mortal(newSVpvn(pkg, pkg_len)));
@​@​ -1598,10 +1602,6 @​@​
  PUSHs(sv_2mortal(newSViv(minbits)));
  PUSHs(sv_2mortal(newSViv(none)));
  PUTBACK;
- ENTER;
- SAVEI32(PL_hints);
- PL_hints = 0;
- save_re_context();
  if (IN_PERL_COMPILETIME) {
  /* XXX ought to be handled by lex_start */
  SAVEI32(PL_in_my);

@p5pRT p5pRT closed this as completed Mar 26, 2005
@p5pRT
Copy link
Author

p5pRT commented Mar 26, 2005

@iabyn - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant