Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capture corruption through self-modying regexp (?{...}) #2310

Closed
p5pRT opened this issue Aug 4, 2000 · 22 comments
Closed

Capture corruption through self-modying regexp (?{...}) #2310

p5pRT opened this issue Aug 4, 2000 · 22 comments
Labels
Closable? We might be able to close this ticket, but we need to check with the reporter distro-Linux type-core type-regex type-regex-codeblocks

Comments

@p5pRT
Copy link

p5pRT commented Aug 4, 2000

Migrated from rt.perl.org#3634 (status was 'open')

Searchable as RT3634$

@p5pRT
Copy link
Author

p5pRT commented Aug 4, 2000

From jfriedl@yahoo-inc.com

Created by jfriedl@yahoo-inc.com

With the latest bleedperl, the script

  #!/usr/local/bin/perl -w
  use strict;

  my $text = "a";
  $text =~ m/(.(?{ $text .= "x" }))*/;

  print "text is [$text]\n";
  print "length of text​: ", length($text), "\n";
  print "starts​: ", join('|', @​-), "\n";
  print "ends : ", join('|', @​-), "\n";
  printf("length of match parts​: [%d|%d|%d]\n", length($`), length($&), length($'));
  printf("match itself​: [%s|%s|%s]\n", map { defined($_) ? $_ : 'X'} $`, $&, $');
  print "\$1[$1]\n";

prints (when piped through cat -v)​:

  text is [axxxxxxxxx]
  length of text​: 10
  starts​: 0|7
  ends : 0|7
  length of match parts​: [0|8|0]
  match itself​: [|a^@​^X@​M-hd^O^H|X]
  $1[^H]

An optimization stops it from going forever, which might be considered a
bug[*], but in any case, the contents of $& and $1 are garbage.

It might make sense to turn the target of a search into a read-only
variable for the duration of the search.

Or maybe not -- self-modifying can be fun.

[*] as far as the optimizer causing side effects via (?{...}) to change,
  I think that's fine so long as it's advertised that such a thing is
  possible. I'd rather not have any regex with (?{...}) have all regex
  optimizations turned off....

Jeffrey

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.6.0:

Configured by jfriedl at Sat Jul 29 20:09:33 PDT 2000.

Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.2.15, archname=i686-linux
    uname='linux fummy.dsl.yahoo.com 2.2.16 #6 smp sun jul 23 11:26:16 pdt 2000 i686 unknown '
    config_args='-ds -e -A optimize=-g'
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define 
    use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef
  Compiler:
    cc='cc', optimize='-O2 -g', gccversion=pgcc-2.91.66 19990314 (egcs-1.1.2 release)
    cppflags='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt
    libc=/lib/libc-2.1.1.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.6.0:
    /home/jfriedl/lib/perl
    /home/jfriedl/lib/perl/yahoo
    /usr/local/lib/perl5/5.6.0/i686-linux
    /usr/local/lib/perl5/5.6.0
    /usr/local/lib/perl5/site_perl/5.6.0/i686-linux
    /usr/local/lib/perl5/site_perl/5.6.0
    /usr/local/lib/perl5/site_perl
    .


Environment for perl v5.6.0:
    HOME=/home/jfriedl
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/usr/local/pgsql/lib:/home/jfriedl/src/rvplayer5.0
    LOGDIR (unset)
    PATH=/home/jfriedl/bin:/home/jfriedl/common/bin:/usr/local/gcc-2.95.2/bin:.:/usr/local/pgsql/bin:/usr/local/bin:/usr/X11R6/bin:/bin:/usr/bin:/usr/sbin:/sbin:/home/jfriedl/src/rvplayer5.0
    PERLLIB=/home/jfriedl/lib/perl:/home/jfriedl/lib/perl/yahoo
    PERL_BADLANG (unset)
    SHELL=/bin/tcsh


@p5pRT
Copy link
Author

p5pRT commented Aug 4, 2000

From [Unknown Contact. See original ticket]

Jeffrey Friedl (lists.p5p)​:

$text =~ m/(.(?{ $text .= "x" }))*/;

Or maybe not -- self-modifying can be fun.

May demons forever fly out of your nose! What on *earth* is the correct
behaviour here?

@p5pRT
Copy link
Author

p5pRT commented Aug 4, 2000

From [Unknown Contact. See original ticket]

Jeffrey Friedl <jfriedl@​yahoo-inc.com> wrote

It might make sense to turn the target of a search into a read-only
variable for the duration of the search.

Or maybe not -- self-modifying can be fun.

Bleagh! The effects of such antics should be undefined.

But the fact that you got garbage results suggests that SEGVs are not
far away. And they certainly shouldn't be allowed to happen.

Mike Guy

@p5pRT
Copy link
Author

p5pRT commented Aug 4, 2000

From @vanstyn

In <200008040802.BAA09267@​ventrue.yahoo.com>, Jeffrey Friedl writes​:
[...]
: $text =~ m/(.(?{ $text .= "x" }))*/;
[...]
:An optimization stops it from going forever, which might be considered a
:bug[*], but in any case, the contents of $& and $1 are garbage.
:
:It might make sense to turn the target of a search into a read-only
:variable for the duration of the search.

This has been discussed before; I agree it should be done.

:Or maybe not -- self-modifying can be fun.
:
:[*] as far as the optimizer causing side effects via (?{...}) to change,
: I think that's fine so long as it's advertised that such a thing is
: possible. I'd rather not have any regex with (?{...}) have all regex
: optimizations turned off....

It might be possible to add a new regexp flag to leave the target
writable, and turn off all optimisations that assume an unchanging
string. I'm not sure though what should happen if the string is
shortened beyond where we've matched, or if text we've already
matched changes to something we wouldn't have matched. It is also
likely to be difficult to turn off all optimisations without
slowing down the normal case.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Nov 21, 2004

In 2013, @iabyn wrote:

The assertion failures stop with the following commit, according to
bisect, although I haven't looked closely to decide whether this
is actually the complete fix or whether anything still needs addresssing.

commit 7016d6e
Author​: David Mitchell <davem@​iabyn.com>
Date​: Fri Sep 21 10​:29​:04 2012 +0100

stop regex engine reading beyond end of string
[...]

I think it's time to close this ticket; if we find anything that still needs addressing it'll almost certainly happen in a new ticket in any case, but I also think there's a good chance fuzzers would already have pointed us at any remaining problems in this area if they have any reasonable chance of generating non-null-terminated strings in perl code. (Though if they don't, maybe we need to give them a way.)

@p5pRT
Copy link
Author

p5pRT commented Mar 22, 2010

From @khwilliamson

I tried this on blead essentially equivalent to 5.12 RC0, and get
instead this​:
perl​: regcomp.c​:5199​: Perl_reg_numbered_buff_fetch​: Assertion
`rx->sublen >= (s - rx->subbeg) + i' failed.

@p5pRT
Copy link
Author

p5pRT commented Jul 25, 2010

From @gannett-ggreer

On Sun Mar 21 18​:40​:19 2010, khw wrote​:

I tried this on blead essentially equivalent to 5.12 RC0, and get
instead this​:
perl​: regcomp.c​:5199​: Perl_reg_numbered_buff_fetch​: Assertion
`rx->sublen >= (s - rx->subbeg) + i' failed.

Looks like it eventually corrupts memory (as seen with -Mre=debug below)​:

- - - 8< - - - 8< - - -
whilem​: matched 22 out of 0..32767
  22 <axxxxxxxxxxxxxxxxxxxxx> <>| 3​:
  OPEN1(5)
  22 <axxxxxxxxxxxxxxxxxxxxx> <>| 5​:
  REG_ANY(6)
  23 <axxxxxxxxxxxxxxxxxxxxxx> <>| 6​:
  EVAL(8)
  23 <%240%237Z%2%0%0%0%0xxxxxxxxxxxxxxx> <>| 8​:
  CLOSE1(10)
  23 <CLOSE1%0%0xxxxxxxxxxxxxxx> <>| 10​:
  WHILEM[1/1](0)
 
  whilem​: matched 23 out of 0..32767
  23 <CLOSE1%0%0xxxxxxxxxxxxxxx> <>| 3​:
  OPEN1(5)
  23 <CLOSE1%0%0xxxxxxxxxxxxxxx> <>| 5​:
  REG_ANY(6)
 
  failed...
 
  whilem​: failed, trying continuation...
  23 <CLOSE1%0%0xxxxxxxxxxxxxxx> <>| 11​:
  NOTHING(12)
  23 <CLOSE1%0%0xxxxxxxxxxxxxxx> <>| 12​:
  END(0)
Match successful!
text is [axxxxxxxxxxxxxxxxxxxxxxx]
length of text​: 24
starts​: 0|22
ends : 0|22
length of match parts​: [0|23|0]
Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "re_comp.c",
line 5177 at /tmp/ppp line 12.
- - - 8< - - - 8< - - -

There's junk at the beginning. With a self-modifying regex I think you
get to keep both pieces though.

--
George Greer

@p5pRT
Copy link
Author

p5pRT commented Jun 30, 2011

From @nwc10

Created by @nwc10

The regex engine assumes that the scalar it's matching over can't change.
(In at least some cases)

If you use a (?{}) code block inside a regex to undefine the target scalar,
um​:

$ valgrind ./perl -Ilib -le '$a = "ydydydyd"; warn $_ foreach $a =~ /[^x]d(?{undef $a})[^x]d/g'
==46337== Memcheck, a memory error detector
==46337== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==46337== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==46337== Command​: ./perl -Ilib -le $a\ =\ "ydydydyd";\ warn\ $_\ foreach\ $a\ =~\ /[^x]d(?{undef\ $a})[^x]d/g
==46337==
--46337-- ./perl​:
--46337-- dSYM directory is missing; consider using --dsymutil=yes
==46337== Invalid read of size 1
==46337== at 0x1001AB360​: S_reginclass (in ./perl)
==46337== by 0x1001A174F​: S_regmatch (in ./perl)
==46337== by 0x10019EADA​: S_regtry (in ./perl)
==46337== by 0x1001935AA​: Perl_regexec_flags (in ./perl)
==46337== by 0x1000EE85E​: Perl_pp_match (in ./perl)
==46337== by 0x1000E5EE7​: Perl_runops_standard (in ./perl)
==46337== by 0x10002B4BC​: S_run_body (in ./perl)
==46337== by 0x10002ADEE​: perl_run (in ./perl)
==46337== by 0x1000014D4​: main (in ./perl)
==46337== Address 0x1006019b2 is 2 bytes inside a block of size 10 free'd
==46337== at 0x100280C7C​: free (vg_replace_malloc.c​:366)
==46337== by 0x1000AE8B5​: Perl_safesysfree (in ./perl)
==46337== by 0x1001240F9​: Perl_pp_undef (in ./perl)
==46337== by 0x1000E5EE7​: Perl_runops_standard (in ./perl)
==46337== by 0x1001A4793​: S_regmatch (in ./perl)
==46337== by 0x10019EADA​: S_regtry (in ./perl)
==46337== by 0x1001935AA​: Perl_regexec_flags (in ./perl)
==46337== by 0x1000EE85E​: Perl_pp_match (in ./perl)
==46337== by 0x1000E5EE7​: Perl_runops_standard (in ./perl)
==46337== by 0x10002B4BC​: S_run_body (in ./perl)
==46337== by 0x10002ADEE​: perl_run (in ./perl)
==46337== by 0x1000014D4​: main (in ./perl)
==46337==
==46337== Invalid read of size 1
==46337== at 0x1001A17CB​: S_regmatch (in ./perl)
==46337== by 0x10019EADA​: S_regtry (in ./perl)
==46337== by 0x1001935AA​: Perl_regexec_flags (in ./perl)
==46337== by 0x1000EE85E​: Perl_pp_match (in ./perl)
==46337== by 0x1000E5EE7​: Perl_runops_standard (in ./perl)
==46337== by 0x10002B4BC​: S_run_body (in ./perl)
==46337== by 0x10002ADEE​: perl_run (in ./perl)
==46337== by 0x1000014D4​: main (in ./perl)
==46337== Address 0x1006019b3 is 3 bytes inside a block of size 10 free'd
==46337== at 0x100280C7C​: free (vg_replace_malloc.c​:366)
==46337== by 0x1000AE8B5​: Perl_safesysfree (in ./perl)
==46337== by 0x1001240F9​: Perl_pp_undef (in ./perl)
==46337== by 0x1000E5EE7​: Perl_runops_standard (in ./perl)
==46337== by 0x1001A4793​: S_regmatch (in ./perl)
==46337== by 0x10019EADA​: S_regtry (in ./perl)
==46337== by 0x1001935AA​: Perl_regexec_flags (in ./perl)
==46337== by 0x1000EE85E​: Perl_pp_match (in ./perl)
==46337== by 0x1000E5EE7​: Perl_runops_standard (in ./perl)
==46337== by 0x10002B4BC​: S_run_body (in ./perl)
==46337== by 0x10002ADEE​: perl_run (in ./perl)
==46337== by 0x1000014D4​: main (in ./perl)

That would be a bad thing :-(

It's not a terrible thing, given that​:

  For reasons of security, this construct is forbidden if the regular
  expression involves run-time interpolation of variables, unless the
  perilous C<use re 'eval'> pragma has been used (see L<re>), or the
  variables contain results of the C<qr//> operator (see
  L<perlop/"qr/STRINGE<sol>msixpodual">).

My vague understanding of the engine is that there are mechanisms in place to
copy the target string. Should these also be triggered if the pattern contains
any code blocks? [or anything else that could have side effects *during* the
match, if anything else exists]

Nicholas Clark

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.15.0:

Configured by nick at Thu Jun 30 08:16:54 BST 2011.

Summary of my perl5 (revision 5 version 15 subversion 0) configuration:
  Derived from: 5ef88e32837b528ef762bb5bdc3074489cf43a85
  Platform:
    osname=darwin, osvers=10.8.0, archname=darwin-2level
    uname='darwin mouse-mill.local 10.8.0 darwin kernel version 10.8.0: tue jun 7 16:33:36 pdt 2011; root:xnu-1504.15.3~1release_i386 i386 '
    config_args='-Dusedevel=y -Dcc=ccache clang -Dld=clang -Ubincompat5005 -Uinstallusrbinperl -Dcf_email=nick@ccl4.org -Dperladmin=nick@ccl4.org -Dinc_version_list=  -Dinc_version_list_init=0 -Doptimize=-g -Uusethreads -Uuse64bitall -Uuselongdouble -Uusemymalloc -Duseperlio -Dprefix=~/Sandpit/snap5.9.x-v5.15.0-135-g5ef88e3-i -Dinstallman1dir=none -Dinstallman3dir=none -Dusevendorprefix -Dvendorprefix=~/Sandpit/vendor -Uuserelocatableinc -Ud_dosuid -Uuseshrplib -de -Accccflags=-DNO_PERL_PRESERVE_IVUV -Umad'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='ccache clang', ccflags ='-fno-common -DPERL_DARWIN -fno-strict-aliasing -pipe -fstack-protector -I/opt/local/include',
    optimize='-g',
    cppflags='-fno-common -DPERL_DARWIN -fno-strict-aliasing -pipe -fstack-protector -I/opt/local/include'
    ccversion='', gccversion='4.2.1 Compatible Clang Compiler', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -fstack-protector -L/usr/local/lib -L/opt/local/lib'
    libpth=/usr/local/lib /opt/local/lib /usr/lib
    libs=-lgdbm -ldbm -ldl -lm -lutil -lc
    perllibs=-ldl -lm -lutil -lc
    libc=/usr/lib/libc.dylib, so=dylib, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib -L/opt/local/lib -fstack-protector'

Locally applied patches:
    


@INC for perl 5.15.0:
    lib
    /Users/nick/Sandpit/snap5.9.x-v5.15.0-135-g5ef88e3-i/lib/perl5/site_perl/5.15.0/darwin-2level
    /Users/nick/Sandpit/snap5.9.x-v5.15.0-135-g5ef88e3-i/lib/perl5/site_perl/5.15.0
    /Users/nick/Sandpit/vendor/lib/perl5/vendor_perl/5.15.0/darwin-2level
    /Users/nick/Sandpit/vendor/lib/perl5/vendor_perl/5.15.0
    /Users/nick/Sandpit/snap5.9.x-v5.15.0-135-g5ef88e3-i/lib/perl5/5.15.0/darwin-2level
    /Users/nick/Sandpit/snap5.9.x-v5.15.0-135-g5ef88e3-i/lib/perl5/5.15.0
    .


Environment for perl 5.15.0:
    DYLD_LIBRARY_PATH (unset)
    HOME=/Users/nick
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/opt/local/bin:/opt/local/sbin:/Users/nick/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/local/sbin:/sbin:/usr/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Jun 30, 2011

From @cpansprout

On Thu Jun 30 04​:31​:53 2011, nicholas wrote​:

The regex engine assumes that the scalar it's matching over can't
change.
(In at least some cases)

If you use a (?{}) code block inside a regex to undefine the target
scalar,
um​:

Isn’t this the same as bug #3634?

@p5pRT
Copy link
Author

p5pRT commented Jun 30, 2011

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jun 30, 2011

From @nwc10

On Thu Jun 30 08​:37​:23 2011, sprout wrote​:

On Thu Jun 30 04​:31​:53 2011, nicholas wrote​:

The regex engine assumes that the scalar it's matching over can't
change.
(In at least some cases)

If you use a (?{}) code block inside a regex to undefine the target
scalar,
um​:

Isn’t this the same as bug #3634?

I think it might be. Sorry for the dup. I wasn't aware of the previous bug.

I guess this advice from Klortho applies​:

#11943 Ah yes, and you are the first person to have noticed this bug
since 1987. Sure.

Although I wasn't reporting that 'if' doesn't work :-)

@p5pRT
Copy link
Author

p5pRT commented Jul 29, 2011

From @Abigail

On Thu, Jun 30, 2011 at 04​:31​:54AM -0700, Nicholas Clark wrote​:

# New Ticket Created by Nicholas Clark
# Please include the string​: [perl #93824]
# in the subject line of all future correspondence about this issue.
# <URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=93824 >

This is a bug report for perl from nick@​ccl4.org,
generated with the help of perlbug 1.39 running under perl 5.15.0.

-----------------------------------------------------------------
[Please describe your issue here]

The regex engine assumes that the scalar it's matching over can't change.
(In at least some cases)

If you use a (?{}) code block inside a regex to undefine the target scalar,
um​:

$ valgrind ./perl -Ilib -le '$a = "ydydydyd"; warn $_ foreach $a =~ /[^x]d(?{undef $a})[^x]d/g'
==46337== Memcheck, a memory error detector
==46337== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==46337== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==46337== Command​: ./perl -Ilib -le $a\ =\ "ydydydyd";\ warn\ $_\ foreach\ $a\ =~\ /[^x]d(?{undef\ $a})[^x]d/g
==46337==
--46337-- ./perl​:
--46337-- dSYM directory is missing; consider using --dsymutil=yes
==46337== Invalid read of size 1
==46337== at 0x1001AB360​: S_reginclass (in ./perl)
==46337== by 0x1001A174F​: S_regmatch (in ./perl)
==46337== by 0x10019EADA​: S_regtry (in ./perl)
==46337== by 0x1001935AA​: Perl_regexec_flags (in ./perl)
==46337== by 0x1000EE85E​: Perl_pp_match (in ./perl)
==46337== by 0x1000E5EE7​: Perl_runops_standard (in ./perl)
==46337== by 0x10002B4BC​: S_run_body (in ./perl)
==46337== by 0x10002ADEE​: perl_run (in ./perl)
==46337== by 0x1000014D4​: main (in ./perl)
==46337== Address 0x1006019b2 is 2 bytes inside a block of size 10 free'd
==46337== at 0x100280C7C​: free (vg_replace_malloc.c​:366)
==46337== by 0x1000AE8B5​: Perl_safesysfree (in ./perl)
==46337== by 0x1001240F9​: Perl_pp_undef (in ./perl)
==46337== by 0x1000E5EE7​: Perl_runops_standard (in ./perl)
==46337== by 0x1001A4793​: S_regmatch (in ./perl)
==46337== by 0x10019EADA​: S_regtry (in ./perl)
==46337== by 0x1001935AA​: Perl_regexec_flags (in ./perl)
==46337== by 0x1000EE85E​: Perl_pp_match (in ./perl)
==46337== by 0x1000E5EE7​: Perl_runops_standard (in ./perl)
==46337== by 0x10002B4BC​: S_run_body (in ./perl)
==46337== by 0x10002ADEE​: perl_run (in ./perl)
==46337== by 0x1000014D4​: main (in ./perl)
==46337==
==46337== Invalid read of size 1
==46337== at 0x1001A17CB​: S_regmatch (in ./perl)
==46337== by 0x10019EADA​: S_regtry (in ./perl)
==46337== by 0x1001935AA​: Perl_regexec_flags (in ./perl)
==46337== by 0x1000EE85E​: Perl_pp_match (in ./perl)
==46337== by 0x1000E5EE7​: Perl_runops_standard (in ./perl)
==46337== by 0x10002B4BC​: S_run_body (in ./perl)
==46337== by 0x10002ADEE​: perl_run (in ./perl)
==46337== by 0x1000014D4​: main (in ./perl)
==46337== Address 0x1006019b3 is 3 bytes inside a block of size 10 free'd
==46337== at 0x100280C7C​: free (vg_replace_malloc.c​:366)
==46337== by 0x1000AE8B5​: Perl_safesysfree (in ./perl)
==46337== by 0x1001240F9​: Perl_pp_undef (in ./perl)
==46337== by 0x1000E5EE7​: Perl_runops_standard (in ./perl)
==46337== by 0x1001A4793​: S_regmatch (in ./perl)
==46337== by 0x10019EADA​: S_regtry (in ./perl)
==46337== by 0x1001935AA​: Perl_regexec_flags (in ./perl)
==46337== by 0x1000EE85E​: Perl_pp_match (in ./perl)
==46337== by 0x1000E5EE7​: Perl_runops_standard (in ./perl)
==46337== by 0x10002B4BC​: S_run_body (in ./perl)
==46337== by 0x10002ADEE​: perl_run (in ./perl)
==46337== by 0x1000014D4​: main (in ./perl)

That would be a bad thing :-(

It's not a terrible thing, given that​:

For reasons of security\, this construct is forbidden if the regular
expression involves run\-time interpolation of variables\, unless the
perilous C\<use re 'eval'> pragma has been used \(see L\<re>\)\, or the
variables contain results of the C\<qr//> operator \(see
L\<perlop/"qr/STRINGE\<sol>msixpodual">\)\.

My vague understanding of the engine is that there are mechanisms in place to
copy the target string. Should these also be triggered if the pattern contains
any code blocks? [or anything else that could have side effects *during* the
match, if anything else exists]

I'd say that given that (?{ }) is still marked experimental, and
there's enough support to keep it this way, to leave it as is. (?{ })
has been around for more than a decade, and I don't think anyone has
reported this before - so it doesn't seem to be a huge issue. Copying
the string for each pattern containing (?{ }) (or (??{ })) seem like a
large price. Perhaps document it as "don't do it".

Abigail

@p5pRT
Copy link
Author

p5pRT commented Jul 29, 2011

From @cpansprout

On Fri Jul 29 09​:09​:39 2011, abigail@​abigail.be wrote​:

I don't think anyone has
reported this before

Try looking at it in RT. :-)

@p5pRT
Copy link
Author

p5pRT commented Jul 29, 2011

From [Unknown Contact. See original ticket]

On Fri Jul 29 09​:09​:39 2011, abigail@​abigail.be wrote​:

I don't think anyone has
reported this before

Try looking at it in RT. :-)

@p5pRT
Copy link
Author

p5pRT commented Jun 14, 2012

From @cpansprout

On Thu Aug 03 18​:02​:21 2000, jfriedl@​yahoo-inc.com wrote​:

\#\!/usr/local/bin/perl \-w
use strict;

my $text = "a";
$text =~ m/\(\.\(?\{ $text \.= "x" \}\)\)\*/;

print "text is \[$text\]\\n";
print "length of text&#8203;: "\, length\($text\)\, "\\n";
print "starts&#8203;: "\, join\('|'\, @&#8203;\-\)\, "\\n";
print "ends  : "\, join\('|'\, @&#8203;\-\)\, "\\n";
printf\("length of match parts&#8203;: \[%d|%d|%d\]\\n"\, length\($\`\)\,

length($&amp;), length($'));
printf("match itself​: [%s|%s|%s]\n", map { defined($_) ? $_ : 'X'}
$`, $&amp;, $');
print "\$1[$1]\n";

prints (when piped through cat -v)​:

text is \[axxxxxxxxx\]
length of text&#8203;: 10
starts&#8203;: 0|7
ends  : 0|7
length of match parts&#8203;: \[0|8|0\]
match itself&#8203;: \[|a^@&#8203;^X@&#8203;M\-hd^O^H|X\]
$1\[^H\]

This is still a problem in bleadperl (c8d84f8), even after Dave
Mitchell’s jumbo re-eval rewrite.

$ pbpaste|./perl -Ilib
text is [axxxxxxxxx]
length of text​: 10
starts​: 0|7
ends : 0|7
Assertion failed​: (rx->sublen >= (s - rx->subbeg) + i), function
Perl_reg_numbered_buff_fetch, file regcomp.c, line 6696.
Abort trap

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Jun 14, 2012

From @iabyn

On Thu, Jun 14, 2012 at 09​:48​:34AM -0700, Father Chrysostomos via RT wrote​:

On Thu Aug 03 18​:02​:21 2000, jfriedl@​yahoo-inc.com wrote​:

\#\!/usr/local/bin/perl \-w
use strict;

my $text = "a";
$text =~ m/\(\.\(?\{ $text \.= "x" \}\)\)\*/;

print "text is \[$text\]\\n";
print "length of text&#8203;: "\, length\($text\)\, "\\n";
print "starts&#8203;: "\, join\('|'\, @&#8203;\-\)\, "\\n";
print "ends  : "\, join\('|'\, @&#8203;\-\)\, "\\n";
printf\("length of match parts&#8203;: \[%d|%d|%d\]\\n"\, length\($\`\)\,

length($&amp;), length($'));
printf("match itself​: [%s|%s|%s]\n", map { defined($_) ? $_ : 'X'}
$`, $&amp;, $');
print "\$1[$1]\n";

prints (when piped through cat -v)​:

text is \[axxxxxxxxx\]
length of text&#8203;: 10
starts&#8203;: 0|7
ends  : 0|7
length of match parts&#8203;: \[0|8|0\]
match itself&#8203;: \[|a^@&#8203;^X@&#8203;M\-hd^O^H|X\]
$1\[^H\]

This is still a problem in bleadperl (c8d84f8), even after Dave
Mitchell’s jumbo re-eval rewrite.

Yep, that's the one ticket in the metaticket that's not fixed yet.

--
Overhead, without any fuss, the stars were going out.
  -- Arthur C Clarke

@p5pRT
Copy link
Author

p5pRT commented Jul 27, 2013

From @cpansprout

On Thu Jun 14 15​:13​:18 2012, davem wrote​:

On Thu, Jun 14, 2012 at 09​:48​:34AM -0700, Father Chrysostomos via RT
wrote​:

On Thu Aug 03 18​:02​:21 2000, jfriedl@​yahoo-inc.com wrote​:

\#\!/usr/local/bin/perl \-w
use strict;

my $text = "a";
$text =~ m/\(\.\(?\{ $text \.= "x" \}\)\)\*/;

print "text is \[$text\]\\n";
print "length of text&#8203;: "\, length\($text\)\, "\\n";
print "starts&#8203;: "\, join\('|'\, @&#8203;\-\)\, "\\n";
print "ends  : "\, join\('|'\, @&#8203;\-\)\, "\\n";
printf\("length of match parts&#8203;: \[%d|%d|%d\]\\n"\, length\($\`\)\,

length($&amp;), length($'));
printf("match itself​: [%s|%s|%s]\n", map { defined($_) ? $_ : 'X'}
$`, $&amp;, $');
print "\$1[$1]\n";

prints (when piped through cat -v)​:

text is \[axxxxxxxxx\]
length of text&#8203;: 10
starts&#8203;: 0|7
ends  : 0|7
length of match parts&#8203;: \[0|8|0\]
match itself&#8203;: \[|a^@&#8203;^X@&#8203;M\-hd^O^H|X\]
$1\[^H\]

This is still a problem in bleadperl (c8d84f8), even after Dave
Mitchell’s jumbo re-eval rewrite.

Yep, that's the one ticket in the metaticket that's not fixed yet.

This appears to be fixed now, and I suspect it is because of
PERL_NEW_COPY_ON_WRITE (meaning the bug is still present under
-Accflags=-DPERL_NO_COW), but I haven’t checked.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Jul 28, 2013

From @iabyn

On Sat, Jul 27, 2013 at 07​:05​:39AM -0700, Father Chrysostomos via RT wrote​:

On Thu Jun 14 15​:13​:18 2012, davem wrote​:

On Thu, Jun 14, 2012 at 09​:48​:34AM -0700, Father Chrysostomos via RT
wrote​:

On Thu Aug 03 18​:02​:21 2000, jfriedl@​yahoo-inc.com wrote​:

\#\!/usr/local/bin/perl \-w
use strict;

my $text = "a";
$text =~ m/\(\.\(?\{ $text \.= "x" \}\)\)\*/;

print "text is \[$text\]\\n";
print "length of text&#8203;: "\, length\($text\)\, "\\n";
print "starts&#8203;: "\, join\('|'\, @&#8203;\-\)\, "\\n";
print "ends  : "\, join\('|'\, @&#8203;\-\)\, "\\n";
printf\("length of match parts&#8203;: \[%d|%d|%d\]\\n"\, length\($\`\)\,

length($&amp;), length($'));
printf("match itself​: [%s|%s|%s]\n", map { defined($_) ? $_ : 'X'}
$`, $&amp;, $');
print "\$1[$1]\n";

prints (when piped through cat -v)​:

text is \[axxxxxxxxx\]
length of text&#8203;: 10
starts&#8203;: 0|7
ends  : 0|7
length of match parts&#8203;: \[0|8|0\]
match itself&#8203;: \[|a^@&#8203;^X@&#8203;M\-hd^O^H|X\]
$1\[^H\]

This is still a problem in bleadperl (c8d84f8), even after Dave
Mitchell’s jumbo re-eval rewrite.

Yep, that's the one ticket in the metaticket that's not fixed yet.

This appears to be fixed now, and I suspect it is because of
PERL_NEW_COPY_ON_WRITE (meaning the bug is still present under
-Accflags=-DPERL_NO_COW), but I haven’t checked.

The assertion failures stop with the following commit, according to
bisect, although I haven't looked closely to decide whether this
is actually the complete fix or whether anything still needs addresssing.

commit 7016d6e
Author​: David Mitchell <davem@​iabyn.com>
Date​: Fri Sep 21 10​:29​:04 2012 +0100

  stop regex engine reading beyond end of string
 
  Historically the regex engine has assumed that any string passed to it
  will have a trailing null char. This isn't normally an issue in perl code,
  since perl strings *are* null terminated; but it could cause problems with
  strings returned by XS code, or with someone calling the regex engine
  directly from XS, with strend not pointing at a null char.
 
  The engine currently relies on there being a null char in the following
  ways.
 
  First, when at the end of string, the main loop of regmatch() still reads
  in the 'next' character (i.e. the character following the end of string)
  even if it doesn't make any use of it. This precludes using memory mapped
  files as strings for example, since the read off the end would SEGV.
 
  Second, the matching algorithm often required the trailing character to be
  \0 to work correctly​: the test for 'EOF' was "if next char is null *and*
  locinput >= PL_regeol, then stop". So a random non-null trailing char
  could cause an overshoot.
 
  Thirdly, some match ops require the trailing char to be null to operate
  correctly; for example, \b applied at the end of the string only happens
  to work because the trailing char (\0) happens to match \W.
 
  Also, some utf8 ops will try to extract the code point at the end, which
  can result in multiple bytes past the end of string being read, and
  possible problems if they don't correspond to well-formed utf8.
 
  The main fix is in S_regmatch, where the 'read next char' code has been
  updated to set it to a special value, NEXTCHR_EOS instead, if we would be
  reading past the end of the string.
 
  Lots of other random bits in the regex engine needed to be fixed up too.
 
  To track these down, I temporarily hacked regexec_flags() to make a copy
  of the string but without trailing \0, then ran all the t/re/*.t tests
  under valgrind to flush out all buffer overruns. So I think I've removed
  most of the bad code, but by no means all of it. The code within the
  various functions in regexec.c is far too complex to be able to visually
  audit the code with any confidence.

--
You live and learn (although usually you just live).

@p5pRT
Copy link
Author

p5pRT commented Jul 29, 2013

From @nwc10

On Sun, Jul 28, 2013 at 01​:21​:23AM +0100, Dave Mitchell wrote​:

On Sat, Jul 27, 2013 at 07​:05​:39AM -0700, Father Chrysostomos via RT wrote​:

On Thu Jun 14 15​:13​:18 2012, davem wrote​:

On Thu, Jun 14, 2012 at 09​:48​:34AM -0700, Father Chrysostomos via RT
wrote​:

On Thu Aug 03 18​:02​:21 2000, jfriedl@​yahoo-inc.com wrote​:

\#\!/usr/local/bin/perl \-w
use strict;

my $text = "a";
$text =~ m/\(\.\(?\{ $text \.= "x" \}\)\)\*/;

print "text is \[$text\]\\n";
print "length of text&#8203;: "\, length\($text\)\, "\\n";
print "starts&#8203;: "\, join\('|'\, @&#8203;\-\)\, "\\n";
print "ends  : "\, join\('|'\, @&#8203;\-\)\, "\\n";
printf\("length of match parts&#8203;: \[%d|%d|%d\]\\n"\, length\($\`\)\,

length($&amp;), length($'));
printf("match itself​: [%s|%s|%s]\n", map { defined($_) ? $_ : 'X'}
$`, $&amp;, $');
print "\$1[$1]\n";

prints (when piped through cat -v)​:

text is \[axxxxxxxxx\]
length of text&#8203;: 10
starts&#8203;: 0|7
ends  : 0|7
length of match parts&#8203;: \[0|8|0\]
match itself&#8203;: \[|a^@&#8203;^X@&#8203;M\-hd^O^H|X\]
$1\[^H\]

This is still a problem in bleadperl (c8d84f8), even after Dave
Mitchell's jumbo re-eval rewrite.

Yep, that's the one ticket in the metaticket that's not fixed yet.

This appears to be fixed now, and I suspect it is because of
PERL_NEW_COPY_ON_WRITE (meaning the bug is still present under
-Accflags=-DPERL_NO_COW), but I haven't checked.

The assertion failures stop with the following commit, according to
bisect, although I haven't looked closely to decide whether this
is actually the complete fix or whether anything still needs addresssing.

commit 7016d6e
Author​: David Mitchell <davem@​iabyn.com>
Date​: Fri Sep 21 10​:29​:04 2012 +0100

stop regex engine reading beyond end of string

For the given test case, the errors also stop at that commit.
(It wasn't clear to me whether you had confirmed this, as you only mentioned
the assertions)

I ran this​:

Porting/bisect.pl --expect-fail --target=miniperl -we '$_ = "a"; /(.(?{ $_ .= "x" }))*/; $_ = $&amp;; die $_ if /[^ax]/'

it predates the merge of COW, so that won't affect it, but I did also try
this, which got the same answer​:

Porting/bisect.pl -Accflags=-DPERL_NO_COW --expect-fail --target=miniperl -we '$_ = "a"; /(.(?{ $_ .= "x" }))*/; $_ = $&amp;; die $_ if /[^ax]/'

[From the commit message]

To track these down\, I temporarily hacked regexec\_flags\(\) to make a copy
of the string but without trailing \\0\, then ran all the t/re/\*\.t tests
under valgrind to flush out all buffer overruns\. So I think I've removed
most of the bad code\, but by no means all of it\. The code within the
various functions in regexec\.c is far too complex to be able to visually
audit the code with any confidence\.

I guess it would also be possible to hack this to SEGV without valgrind by
using mmap to map an anonymous block, copying the string (without
trailing \0) to abut the end, and then matching on that.

But I guess that this won't reveal any more than the failures you already
have fixed, as the limiting thing here is the testsuite, not the analysis
tools.

Would a fuzzer help?

Nicholas Clark

@hvds hvds added the Closable? We might be able to close this ticket, but we need to check with the reporter label Sep 15, 2020
@khwilliamson
Copy link
Contributor

Why isn't this closable?

@jkeenan
Copy link
Contributor

jkeenan commented Apr 14, 2021

Why isn't this closable?

No one has said that it is closable; no one has said it isn't. @hvds merely put the "closable?" label on the issue 7 months ago. The ticket had been languishing for years.

@nwc10, can you take a look at this?

Thank you very much.
Jim Keenan

@iabyn
Copy link
Contributor

iabyn commented Jul 13, 2021

Looks fixed; closing

@iabyn iabyn closed this as completed Jul 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Closable? We might be able to close this ticket, but we need to check with the reporter distro-Linux type-core type-regex type-regex-codeblocks
Projects
None yet
Development

No branches or pull requests

5 participants