g suffix on string search (/.../g) can cause string corruption #9073

Closed
p5pRT opened this issue Oct 19, 2007 · 8 comments

p5pRT commented Oct 19, 2007

Migrated from rt.perl.org#46563 (status was 'resolved')

Searchable as RT46563$


p5pRT commented Oct 19, 2007

From owl@barnowl.research.intel-research.net

Created by dgay42@gmail.com

The following code prints "a" rather than "z"​:
$z = "z";
$z = sprintf "aaa" if $z =~ /(.)/g;
printf "$1\n";

The problem does not occur if the g suffix is removed from the search.

The problem is that $1 is aliased with $z, and $z gets subsequently
overwritten. The following *might* be a fix, to the extent that I've
understood pp_match (...)​:

Inline Patch
--- pp_hot.c    2006-09-29 17:28:06.000000000 -0700
+++ pp_hot_fix.c        2007-10-19 16:11:28.000000000 -0700
@@ -1303,7 +1303,7 @@
            }
        }
     }
-    if ((!global && rx->nparens)
+    if ((rx->nparens)
            || SvTEMP(TARG) || PL_sawampersand)
        r_flags |= REXEC_COPY_STR;
     if (SvSCREAM(TARG))

One "minor" drawback is that making this fix causes SPEC2006's perl test to consume very large amounts of memory (and fail when used with a 32-bit address space, at least...).
Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.8:

Configured by owl at Tue Sep 25 23:53:19 PDT 2007.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=darwin, osvers=8.10.1, archname=darwin-2level
    uname='darwin barnowl.research.intel-research.net 8.10.1 darwin kernel version 8.10.1: wed may 23 16:33:00 pdt 2007; root:xnu-792.22.5~1release_i386 i386 i386 '
    config_args='-des -Dprefix=/opt/local -Dccflags=-I'/opt/local/include' -Dldflags=-L/opt/local/lib -Dvendorprefix=/opt/local -Dcc=/usr/bin/gcc-4.0'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='/usr/bin/gcc-4.0', ccflags ='-I/opt/local/include -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/opt/local/include',
    optimize='-O3',
    cppflags='-no-cpp-precomp -I/opt/local/include -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/opt/local/include'
    ccversion='', gccversion='4.0.1 (Apple Computer, Inc. build 5363)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags ='-L/opt/local/lib -L/usr/local/lib'
    libpth=/usr/local/lib /opt/local/lib /usr/lib
    libs=-ldbm -ldl -lm -lc
    perllibs=-ldl -lm -lc
    libc=/usr/lib/libc.dylib, so=dylib, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-L/opt/local/lib -bundle -undefined dynamic_lookup -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.8:
    /opt/local/lib/perl5/5.8.8/darwin-2level
    /opt/local/lib/perl5/5.8.8
    /opt/local/lib/perl5/site_perl/5.8.8/darwin-2level
    /opt/local/lib/perl5/site_perl/5.8.8
    /opt/local/lib/perl5/site_perl
    /opt/local/lib/perl5/vendor_perl/5.8.8/darwin-2level
    /opt/local/lib/perl5/vendor_perl/5.8.8
    /opt/local/lib/perl5/vendor_perl
    .


Environment for perl v5.8.8:
    DYLD_LIBRARY_PATH (unset)
    HOME=/Users/owl
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/Users/owl/bin:/usr/X11R6/bin:/usr/local/bin:/opt/local/bin:/opt/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash


p5pRT commented Oct 20, 2007

From @demerphq

On 10/20/07, via RT owl@barnowl.research.intel-research.net
<perlbug-followup@perl.org> wrote:

> # New Ticket Created by owl@barnowl.research.intel-research.net
> # Please include the string: [perl #46563]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=46563 >
>
> This is a bug report for perl from dgay42@gmail.com,
> generated with the help of perlbug 1.35 running under perl v5.8.8.
>
> -----------------------------------------------------------------
> [Please enter your report here]
>
> The following code prints "a" rather than "z":
> $z = "z";
> $z = sprintf "aaa" if $z =~ /(.)/g;
> printf "$1\n";
>
> The problem does not occur if the g suffix is removed from the search.
>
> The problem is that $1 is aliased with $z, and $z gets subsequently
> overwritten. The following *might* be a fix, to the extent that I've
> understood pp_match (...):
>
> --- pp_hot.c 2006-09-29 17:28:06.000000000 -0700
> +++ pp_hot_fix.c 2007-10-19 16:11:28.000000000 -0700
> @@ -1303,7 +1303,7 @@
>             }
>         }
>      }
> -    if ((!global && rx->nparens)
> +    if ((rx->nparens)
>             || SvTEMP(TARG) || PL_sawampersand)
>         r_flags |= REXEC_COPY_STR;
>      if (SvSCREAM(TARG))
>
> One "minor" drawback is that making this fix causes SPEC2006's perl
> test to consume very large amounts of memory (and fail when used with
> a 32-bit address space, at least...).

This is a known, "won't fix" (at least for now) bug in the regex engine.

I made the same change as your patch over a year ago and then retracted
it, because it makes scalar-context /g matches in a loop go quadratic,
which can have punishing consequences for code that uses /g on long
strings in constructs such as while (/.../g) {...}. The workaround is to
ensure that you do not modify the target of a /g match between a
successful match and accessing the magic variables. Since this is
considered a rare situation compared with using /g on long strings, we
have decided to accept the lesser of two weevils (bugs).

A proper solution will hopefully be forthcoming in 5.12, where I hope
to find the time to completely redesign the way regex match results
are stored, how regex magic is applied to SVs, and how target string
copying occurs.

Basically, what we need is a flag on the SV that says "I've been
copied for /g", which is reset if the SV is modified. We then need to
change the logic for scalar /g matches so that it checks the flag,
makes the copy if the flag is not set, and reuses the copy thereafter.
I tried to get something along these lines working, but my attempts
were crude and met with failure, and lack of tuits prevented me from
going the long and hard route of dealing with magic on the SV and so
on.

So for the time being the solution to this problem is "don't do that".

However, we should probably document this issue with scalar-context /g
matches. Currently I don't think it is documented anywhere.

For now, and for older perls, this bug is firmly in the "won't fix"
category. Sorry.

Thanks for your report however. :-)

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


p5pRT commented Oct 20, 2007

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented Aug 20, 2011

From @shlomif

This is a bug report for perl from shlomif@iglu.org.il,
generated with the help of perlbug 1.39 running under perl 5.14.1.

The following command prints '\0' instead of 'a':

perl -e '$_ = "a"; while (/(.)/g) { chop; print $1 }' | xxd

So does the following command:

perl -e '$_ = "a"; while (/(.)/g) { $_=""; print $1 }' | xxd

Removing the /g fixes it:

perl -e '$_ = "a"; while (/(.)/) { $_=""; print $1 }' | xxd

This was reported by subichan on #perl on Freenode. I've reported it on their behalf
before, but am reporting it again, because it seems my bug report was not sent due to
the *.perl.org outage.

This happens on perl-5.14.1 on Linux and with all perls we tried back to perl-5.8.0.


Flags​:
  category=core
  severity=low


Site configuration information for perl 5.14.1​:

Configured by Mageia at Sun Jun 19 13​:46​:32 UTC 2011.

Summary of my perl5 (revision 5 version 14 subversion 1) configuration​:
 
  Platform​:
  osname=linux, osvers=2.6.33.7-server-2.2mnb, archname=x86_64-linux-thread-multi
  uname='linux jonund.mageia.org 2.6.33.7-server-2.2mnb #1 smp fri dec 10 00​:37​:20 eet 2010 x86_64 x86_64 x86_64 gnulinux '
  config_args='-des -Dinc_version_list=5.14.1 5.14.1/x86_64-linux-thread-multi 5.14.0 5.14.0/x86_64-linux-thread-multi 5.12.3 5.12.2 5.12.1 5.12.0 5.10.1 5.10.0 5.8.8 5.8.7 5.8.6 5.8.5 5.8.4 5.8.3 5.8.2 5.8.1 5.8.0 5.6.1 5.6.0 -Darchname=x86_64-linux -Dcc=x86_64-mageia-linux-gnu-gcc -Doptimize=-O2 -g -pipe -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector --param=ssp-buffer-size=4 -DDEBUGGING=-g -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr -Dsitebin=/usr/local/bin -Dsiteman1dir=/usr/local/share/man/man1 -Dsiteman3dir=/usr/local/share/man/man3 -Dman3dir=/usr/share/man/man3pm -Dvendorman3dir=/usr/share/man/man3 -Dman3ext=3pm -Dcf_by=Mageia -Dmyhostname=localhost -Dperladmin=root@​localhost -Dcf_email=root@​localhost -Ud_csh -Duseshrplib -Duseithreads -Di_db -Di_ndbm -Di_gdbm'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='x86_64-mageia-linux-gnu-gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-O2 -g -pipe -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector --param=ssp-buffer-size=4',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.5.2', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='x86_64-mageia-linux-gnu-gcc', ldflags =' -fstack-protector -L/usr/local/lib64'
  libpth=/usr/local/lib64 /lib/../lib64 /usr/lib/../lib64 /lib /usr/lib /lib64 /usr/lib64
  libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
  libc=/lib/libc-2.12.1.so, so=so, useshrplib=true, libperl=libperl.so
  gnulibc_version='2.12.1'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.14.1/x86_64-linux-thread-multi/CORE'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector --param=ssp-buffer-size=4 -L/usr/local/lib64'

Locally applied patches​:
  Mageia patches


@​INC for perl 5.14.1​:
  /home/shlomif/apps/perl/modules/lib/perl5/site_perl/5.14.1
  /home/shlomif/apps/perl/modules/lib/perl5/site_perl/5.12.3/x86_64-linux-thread-multi
  /home/shlomif/apps/perl/modules/lib/perl5/site_perl/5.12.3
  /home/shlomif/apps/perl/modules/lib/site_perl/5.14.1
  /home/shlomif/apps/perl/modules/lib/site_perl/5.12.3
  /home/shlomif/apps/perl/modules/lib/perl5/5.14.1
  /home/shlomif/apps/perl/modules/lib/perl5/5.12.3
  /usr/lib/perl5/site_perl/5.14.1/x86_64-linux-thread-multi
  /usr/lib/perl5/site_perl/5.14.1
  /usr/lib/perl5/vendor_perl/5.14.1/x86_64-linux-thread-multi
  /usr/lib/perl5/vendor_perl/5.14.1
  /usr/lib/perl5/5.14.1/x86_64-linux-thread-multi
  /usr/lib/perl5/5.14.1
  /usr/lib/perl5/site_perl/5.14.1
  /usr/lib/perl5/site_perl/5.14.1/x86_64-linux-thread-multi
  /usr/lib/perl5/site_perl
  /usr/lib/perl5/vendor_perl/5.14.1
  /usr/lib/perl5/vendor_perl/5.14.1/x86_64-linux-thread-multi
  /usr/lib/perl5/vendor_perl/5.14.0
  /usr/lib/perl5/vendor_perl/5.14.0/x86_64-linux-thread-multi
  /usr/lib/perl5/vendor_perl/5.12.3
  /usr/lib/perl5/vendor_perl/5.12.2
  /usr/lib/perl5/vendor_perl
  .


Environment for perl 5.14.1​:
  HOME=/home/shlomif
  LANG=en_US.UTF-8
  LANGUAGE=en_US.UTF-8​:en_US​:en
  LC_ADDRESS=en_US.UTF-8
  LC_COLLATE=en_US.UTF-8
  LC_CTYPE=en_US.UTF-8
  LC_IDENTIFICATION=en_US.UTF-8
  LC_MEASUREMENT=en_US.UTF-8
  LC_MESSAGES=en_US.UTF-8
  LC_MONETARY=en_US.UTF-8
  LC_NAME=en_US.UTF-8
  LC_NUMERIC=en_US.UTF-8
  LC_PAPER=en_US.UTF-8
  LC_SOURCED=1
  LC_TELEPHONE=en_US.UTF-8
  LC_TIME=en_US.UTF-8
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/home/shlomif/apps/test/quadpres/bin/​:/home/shlomif/apps/vim/bin​:/home/shlomif/apps/git/bin​:/home/shlomif/apps/perl/bin​:/home/shlomif/apps/test/quadpres/bin/​:/home/shlomif/apps/vim/bin​:/home/shlomif/apps/git/bin​:/usr/local/bin​:/usr/bin​:/bin​:/usr/games​:/usr/lib/qt4/bin​:/home/shlomif/bin
  PERL5LIB=/home/shlomif/apps/perl/modules/lib/perl5/site_perl/5.14.1​:/home/shlomif/apps/perl/modules/lib/perl5/site_perl/5.12.3​:/home/shlomif/apps/perl/modules/lib/site_perl/5.14.1​:/home/shlomif/apps/perl/modules/lib/site_perl/5.12.3​:/home/shlomif/apps/perl/modules/lib/perl5/5.14.1​:/home/shlomif/apps/perl/modules/lib/perl5/5.12.3
  PERLBREW_PATH=/home/shlomif/apps/perl/bin
  PERLBREW_ROOT=/home/shlomif/apps/perl
  PERLBREW_VERSION=0.21
  PERL_AUTOINSTALL=--skipdeps --alldeps
  PERL_BADLANG (unset)
  PERL_MM_USE_DEFAULT=1
  SHELL=/bin/bash


p5pRT commented Aug 21, 2011

From @Hugmeir

On Sat, Aug 20, 2011 at 12:05 PM, Shlomi Fish <perlbug-followup@perl.org> wrote:

> The following command prints '\0' instead of 'a':
>
> perl -e '$_ = "a"; while (/(.)/g) { chop; print $1 }' | xxd
>
> So does the following command:
>
> perl -e '$_ = "a"; while (/(.)/g) { $_=""; print $1 }' | xxd
>
> Removing the /g fixes it:
>
> perl -e '$_ = "a"; while (/(.)/) { $_=""; print $1 }' | xxd

Strangely enough, all of those work if you force the regex to cache the
original string, either by using /p or one of the special vars.

What's the intended behavior here?


p5pRT commented Sep 19, 2012

From @cpansprout

Fixed by a41aa44.
--

Father Chrysostomos



p5pRT commented Sep 19, 2012

@cpansprout - Status changed from 'open' to 'resolved'
