regcomp.c:6195: void S_pat_upgrade_to_utf8(RExC_state_t *const, char **, STRLEN *, int): Assertion `*(d - 1) == ')'' failed #15838

p5pRT · 2017-01-26T10:19:19Z

Migrated from rt.perl.org#130648 (status was 'open')

Searchable as RT130648$

p5pRT · 2017-01-26T10:19:19Z

From @dur-randir

Created by @dur-randir

While fuzzing perl v5.25.9-35-g32207c637b built with afl and run
under libdislocator, I found the following 16-byte program

hexdump -C 0042
00000000 6d 27 5c 34 30 30 28 3f 7b 3c 3c 7d 29 0a 0a 27 |m'\400(?{<<})..'|
00000010

to cause an assertion failure. It crashes on perls dating back to at
least 5.8.8, albeit with different messages. GDB info about the crash
location:

(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1 0x00007f0dbddc740a in __GI_abort () at abort.c:89
#2 0x00007f0dbddbee47 in __assert_fail_base (fmt=<optimized out>,
assertion=assertion@entry=0x7f0dbf4ab3ac "*(d - 1) == ')'",
file=file@entry=0x7f0dbf4a9198 "regcomp.c", line=line@entry=6195,
function=function@entry=0x7f0dbf4b2500 <__PRETTY_FUNCTION__.16556>
"S_pat_upgrade_to_utf8") at assert.c:92
#3 0x00007f0dbddbeef2 in __GI___assert_fail (assertion=0x7f0dbf4ab3ac
"*(d - 1) == ')'", file=0x7f0dbf4a9198 "regcomp.c", line=6195,
function=0x7f0dbf4b2500 <__PRETTY_FUNCTION__.16556>
"S_pat_upgrade_to_utf8") at assert.c:101
#4 0x00007f0dbf1ec789 in S_pat_upgrade_to_utf8
(pRExC_state=0x7fff4cce2030, pat_p=0x7fff4cce1cb8,
plen_p=0x7fff4cce1cb0, num_code_blocks=1)
at regcomp.c:6195
#5 0x00007f0dbf1f1c7d in Perl_re_op_compile (patternp=0x0,
pat_count=3, expr=0x7f0dc1237bd8, eng=0x7f0dbf735540
<PL_core_reg_engine>, old_re=0x0,
is_bare_re=0x0, orig_rx_flags=0, pm_flags=0) at regcomp.c:7106
#6 0x00007f0dbf112952 in Perl_pmruntime (o=0x7f0dc1237c18,
expr=0x7f0dc1238078, repl=0x0, flags=1, floor=0) at op.c:5882
#7 0x00007f0dbf1c3ecc in Perl_yyparse (gramtype=258) at perly.y:1204
#8 0x00007f0dbf142b1a in S_parse_body (env=0x0, xsinit=0x7f0dbf0fdf98
<xs_init>) at perl.c:2376
#9 0x00007f0dbf140e7f in perl_parse (my_perl=0x7f0dc1215010,
xsinit=0x7f0dbf0fdf98 <xs_init>, argc=3, argv=0x7fff4cce2b88, env=0x0)
at perl.c:1691
#10 0x00007f0dbf0fded6 in main (argc=3, argv=0x7fff4cce2b88,
env=0x7fff4cce2ba8) at perlmain.c:121
(gdb) f 4
#4 0x00007f0dbf1ec789 in S_pat_upgrade_to_utf8
(pRExC_state=0x7fff4cce2030, pat_p=0x7fff4cce1cb8,
plen_p=0x7fff4cce1cb0, num_code_blocks=1)
at regcomp.c:6195
6195 assert(*(d - 1) == ')');
(gdb) p *(d-1)
$1 = 10 '\n'

Perl Info


Flags:
    category=core
    severity=medium

Site configuration information for perl 5.25.9:

Configured by root at Sat Jan 14 02:25:05 MSK 2017.

Summary of my perl5 (revision 5 version 25 subversion 9) configuration:
  Commit id: cbe2fc5001aa59cdc73e04cc35e097a2ecfbeec0
  Platform:
    osname=linux
    osvers=3.16.0-4-amd64
    archname=x86_64-linux
    uname='linux dorothy 3.16.0-4-amd64 #1 smp debian 3.16.36-1+deb8u2
(2016-10-19) x86_64 gnulinux '
    config_args='-des -Dusedevel -DDEBUGGING -Dcc=afl-clang-fast
-Doptimize=-O0 -g -ggdb3'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    bincompat5005=undef
  Compiler:
    cc='afl-clang-fast'
    ccflags ='-DDEBUGGING -fno-strict-aliasing -pipe
-fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
    optimize='-O0 -g -ggdb3'
    cppflags='-DDEBUGGING -fno-strict-aliasing -pipe
-fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='4.2.1 Compatible Clang 3.9.1 (tags/RELEASE_391/rc2)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='afl-clang-fast'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/llvm-3.9/bin/../lib/clang/3.9.1/lib
/usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu
/lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.24.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.24'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O0 -g -ggdb3 -L/usr/local/lib -fstack-protector-strong'



@INC for perl 5.25.9:
    lib
    /usr/local/lib/perl5/site_perl/5.25.9/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.25.9
    /usr/local/lib/perl5/5.25.9/x86_64-linux
    /usr/local/lib/perl5/5.25.9


Environment for perl 5.25.9:
    HOME=/home/afl
    LANG=en_US.UTF-8
    LANGUAGE=en_US:en
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/afl/perlbrew/bin:/home/afl/perlbrew/perls/perl-5.22.1/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
    PERLBREW_BASHRC_VERSION=0.78
    PERLBREW_HOME=/home/afl/.perlbrew
    PERLBREW_MANPATH=/home/afl/perlbrew/perls/perl-5.22.1/man
    PERLBREW_PATH=/home/afl/perlbrew/bin:/home/afl/perlbrew/perls/perl-5.22.1/bin
    PERLBREW_PERL=perl-5.22.1
    PERLBREW_ROOT=/home/afl/perlbrew
    PERLBREW_VERSION=0.78
    PERL_BADLANG (unset)
    SHELL=/usr/bin/zsh

p5pRT · 2017-01-26T10:19:19Z

From @dur-randir

0042
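
The attached file can be rebuilt from the hex dump in the report. A minimal Python sketch, in which the function names and the default output path are illustrative and not part of the report:

```python
# Bytes copied from the `hexdump -C 0042` output in the report.
HEX = "6d 27 5c 34 30 30 28 3f 7b 3c 3c 7d 29 0a 0a 27"


def reproducer_bytes() -> bytes:
    """Return the 16-byte test program as raw bytes."""
    return bytes.fromhex(HEX)


def write_reproducer(path: str = "0042") -> int:
    """Write the test program to `path` and return the number of bytes written."""
    data = reproducer_bytes()
    with open(path, "wb") as fh:
        fh.write(data)
    return len(data)
```

Feeding the resulting file to a perl built with -DDEBUGGING, as in the configuration above, is presumably what trips the assertion. Builds without DEBUGGING compile assertions out, which would fit the report that older perls crash with different messages.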

p5pRT · 2017-01-29T16:17:33Z

From @hvds

On Thu, 26 Jan 2017 02:19:19 -0800, randir wrote:

While fuzzing perl v5.25.9-35-g32207c637b built with afl and run
under libdislocator, I found the following 16-byte program

hexdump -C 0042
00000000 6d 27 5c 34 30 30 28 3f 7b 3c 3c 7d 29 0a 0a 27 |m'\400(?{<<})..'|
00000010

to cause an assertion failure.

We're hitting S_pat_upgrade_to_utf8() with a code block of "(?{<<})\n\n". My initial suspicion is that that's fine, and the assumption that the last char of such a code block must be ')' is wrong, but I don't know.

There's also a similar assertion in S_compile_runtime_code() line 6670:
assert(pat[src->end] == ')');
.. so if this one is wrong, that probably is too.

Hugo

p5pRT · 2017-01-29T16:17:33Z

The RT System itself - Status changed from 'new' to 'open'

p5pRT · 2017-01-30T16:42:51Z

From @iabyn

On Sun, Jan 29, 2017 at 08:17:33AM -0800, Hugo van der Sanden via RT wrote:

On Thu, 26 Jan 2017 02:19:19 -0800, randir wrote:

While fuzzing perl v5.25.9-35-g32207c637b built with afl and run
under libdislocator, I found the following 16-byte program

hexdump -C 0042
00000000 6d 27 5c 34 30 30 28 3f 7b 3c 3c 7d 29 0a 0a 27 |m'\400(?{<<})..'|
00000010

to cause an assertion failure.

We're hitting S_pat_upgrade_to_utf8() with a code block of
"(?{<<})\n\n". My initial suspicion is that that's fine, and the
assumption that the last char of such a code block must be ')' is wrong,
but I don't know.

Hmmm... the assertion is correct, the toker is very wrong.

When compile-time code is seen in a pattern, the code is parsed, so that
for

/abc(?{...})def/

the toker returns this sequence of tokens:

FUNC, '(', const("abc"), 'DO', '{', ...., '}', '(?{...})', 'def', ')'

As well as the individual parsed tokens for the code block, the text of
the code block is returned afterwards as a separate const op, which is
used by re_op_compile() to reconstruct the original text of the regex
(in case a regex is ever stringified).

The problem with

m{\x{100}(?{<<EOF})
x
EOF
}

is that the stringification of the code block is being returned by yylex()
as

"(?{<<EOF})\nx\nEOF"

rather than what I'd expect:

"(?{\"x\n\"})"

(or similar).
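
The invariant behind the assertion can be modelled in a few lines. This is a Python sketch of the check only, not Perl's C code, and the helper name is hypothetical:

```python
def ends_with_close_paren(block_text: str) -> bool:
    """Model of the asserted invariant: the recorded text of a
    code block must end with ')'."""
    return block_text.endswith(")")


# Code block text from the fuzzer case, as seen in S_pat_upgrade_to_utf8().
fuzz_case = "(?{<<})\n\n"
# Text returned by yylex() for the here-doc example, as reported above.
reported = "(?{<<EOF})\nx\nEOF"
# Roughly what the stringification was expected to be.
expected = '(?{"x\n"})'

for text in (fuzz_case, reported, expected):
    # Show the final character code alongside the verdict.
    print(repr(text), ord(text[-1]), ends_with_close_paren(text))
```

For the fuzzer case the final character is code 10 (a newline), which matches the `p *(d-1)` output in the gdb session.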

But to a certain extent it depends on how heredocs are supposed to operate
within regex codeblocks, and how such regexes are supposed to stringify.
I think FC did a lot of fixups in this area recently.

This is all too horrible to contemplate at the moment.

--
My Dad used to say 'always fight fire with fire', which is probably why
he got thrown out of the fire brigade.

p5pRT · 2017-01-30T22:39:57Z

From @cpansprout

On Mon, 30 Jan 2017 08:42:51 -0800, davem wrote:

On Sun, Jan 29, 2017 at 08:17:33AM -0800, Hugo van der Sanden via RT wrote:

On Thu, 26 Jan 2017 02:19:19 -0800, randir wrote:

While fuzzing perl v5.25.9-35-g32207c637b built with afl and run
under libdislocator, I found the following 16-byte program

hexdump -C 0042
00000000 6d 27 5c 34 30 30 28 3f 7b 3c 3c 7d 29 0a 0a 27 |m'\400(?{<<})..'|
00000010

to cause an assertion failure.

We're hitting S_pat_upgrade_to_utf8() with a code block of
"(?{<<})\n\n". My initial suspicion is that that's fine, and the
assumption that the last char of such a code block must be ')' is wrong,
but I don't know.

Hmmm... the assertion is correct, the toker is very wrong.

When compile-time code is seen in a pattern, the code is parsed, so that
for
/abc(?{...})def/
the toker returns this sequence of tokens:
FUNC, '(', const("abc"), 'DO', '{', ...., '}', '(?{...})', 'def', ')'
As well as the individual parsed tokens for the code block, the text of
the code block is returned afterwards as a separate const op, which is
used by re_op_compile() to reconstruct the original text of the regex
(in case a regex is ever stringified).

The problem with
m{\x{100}(?{<<EOF})
x
EOF
}
is that the stringification of the code block is being returned by yylex()
as
"(?{<<EOF})\nx\nEOF"
rather than what I'd expect:
"(?{\"x\n\"})"
(or similar).

But to a certain extent it depends on how heredocs are supposed to operate
within regex codeblocks, and how such regexes are supposed to stringify.
I think FC did a lot of fixups in this area recently.

I fixed up the deparsing of code blocks, by actually deparsing the code inside the regexp, instead of just stringifying it.

Prior to that, I did many fix-ups in the parsing of here-docs, but I don’t recall doing anything specific to (?{...}) blocks; in fact, I think it predated your rewrite of those blocks.

This is all too horrible to contemplate at the moment.

What’s funny is that the length of the string that is supposed to represent the stringification of the code block amounts to the length of the code block plus the length of the trailing here-doc. But the code that gets used is a string of that length taken indiscriminately from the source code, beginning at the start of the code block.

In other words,

m{\x{100}(?{<<EOF})123456789
x
EOF
}

produces the token PV("(?{<<EOF})123456")

because the here-doc is 6 characters long ("x\nEOF\n" or maybe "\nx\nEOF"--I don’t know which).

So I can get past the assertion by putting a parenthesis at the right spot:

print qr{\x{100}((?{<<EOF})12345)
x
EOF
}, "\n"

This gives me

(?^u:\x{100}((?{<<EOF})12345)12345)
)

which is completely wrong.
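
The behaviour described here can be reproduced with a small Python model. It only imitates the observed slicing (right length, wrong starting text) and is not the logic in toke.c. The function name and the fixed here-doc length of 6 are assumptions taken from the description above:

```python
def buggy_block_text(source: str, block: str, heredoc_len: int) -> str:
    """Take len(block) + heredoc_len characters from the source,
    starting at the code block, regardless of what follows it."""
    start = source.index(block)
    return source[start:start + len(block) + heredoc_len]


BLOCK = "(?{<<EOF})"
HEREDOC_LEN = 6  # per the description above

# First example: the slice swallows "123456" from the pattern text.
src1 = "m{\\x{100}(?{<<EOF})123456789\nx\nEOF\n}"
print(buggy_block_text(src1, BLOCK, HEREDOC_LEN))

# Second example: a ')' placed six characters after the block ends up
# last in the slice, so the assertion is satisfied by accident.
src2 = "qr{\\x{100}((?{<<EOF})12345)\nx\nEOF\n}"
print(buggy_block_text(src2, BLOCK, HEREDOC_LEN))
```

In this model the second slice ends in ")" only because of where the parenthesis was placed, which is consistent with the assertion passing while the stringified result is still wrong.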

Traditionally the stringification of a regular expression with a here-doc body outside the pattern has not included the here-doc body. It still behaves that way:

$ ./perl -lIlib -e 'print qr/(?{<<EOF})/' -eEOF
(?^:(?{<<EOF}))

I think that is acceptable. There is really no way to make it behave correctly when stringified and then recompiled as a regexp (which is generally true of code blocks, which may or may not work).

So can we do something similar with here-doc bodies inside the pattern? (Actually, I thought we were already doing that. Look for the ‘Paranoia’ comment in toke.c. Why is that not working?)

--

Father Chrysostomos

p5pRT added Severity Medium type-core labels Oct 19, 2019

toddr removed the khw label Oct 25, 2019

xenu removed the affects-5.25 label Nov 19, 2021

xenu removed the Severity Medium label Dec 15, 2021
