Matching fancy Unicode regex against an ASCII string leaks memory #17140

Closed
p5pRT opened this issue Aug 30, 2019 · 8 comments

Comments

@p5pRT

p5pRT commented Aug 30, 2019

Migrated from rt.perl.org#134390 (status was 'pending release')

Searchable as RT134390$

@p5pRT
Author

p5pRT commented Aug 30, 2019

From choroba@matfyz.cz

Created by choroba@matfyz.cz

If a regex contains a fancy Unicode character and the string being
matched doesn't have the UTF8 flag, matching leaks memory.

  "a" =~ /\N{U+2129}/ while 1; # Don't forget to kill the script before it eats all the memory!

Using an upgraded string doesn't leak at all:

  utf8::upgrade(my $x = 'a');
  $x =~ /\N{U+2129}/ while 1;

See https://www.perlmonks.org/?node_id=11105281 for the original report (with
slightly longer examples) and discussion.

Ch.

Perl Info

Flags:
     category=core
     severity=high

Site configuration information for perl 5.31.4:

Configured by choroba at Mon Aug 26 16:15:05 CEST 2019.

Summary of my perl5 (revision 5 version 31 subversion 4) configuration:
   Commit id: 6e404ab585deadc1c32d50513f13b50ae395c00d
   Platform:
     osname=linux
     osvers=4.12.14-lp151.28.13-default
     archname=x86_64-linux-thread-multi
     uname='linux lenonovo 4.12.14-lp151.28.13-default #1 smp wed aug 7 07:20:16 utc 2019 (0c09ad2) x86_64 x86_64 x86_64 gnulinux '
     config_args='-rdes -Dusethreads -Dpthread -Dprefix=~/blead -Dusedevel'
     hint=recommended
     useposix=true
     d_sigaction=define
     useithreads=define
     usemultiplicity=define
     use64bitint=define
     use64bitall=define
     uselongdouble=undef
     usemymalloc=n
     default_inc_excludes_dot=define
     bincompat5005=undef
   Compiler:
     cc='cc'
     ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
     optimize='-O2'
     cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
     ccversion=''
     gccversion='7.4.1 20190424 [gcc-7-branch revision 270538]'
     gccosandvers=''
     intsize=4
     longsize=8
     ptrsize=8
     doublesize=8
     byteorder=12345678
     doublekind=3
     d_longlong=define
     longlongsize=8
     d_longdbl=define
     longdblsize=16
     longdblkind=3
     ivtype='long'
     ivsize=8
     nvtype='double'
     nvsize=8
     Off_t='off_t'
     lseeksize=8
     alignbytes=8
     prototype=define
   Linker and Libraries:
     ld='cc'
     ldflags =' -fstack-protector-strong -L/usr/local/lib'
     libpth=/usr/local/lib /usr/lib64/gcc/x86_64-suse-linux/7/include-fixed /usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib /lib64 /usr/lib64 /usr/local/lib64
     libs=-lpthread -lgdbm -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
     perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
     libc=libc-2.26.so
     so=so
     useshrplib=false
     libperl=libperl.a
     gnulibc_version='2.26'
   Dynamic Linking:
     dlsrc=dl_dlopen.xs
     dlext=so
     d_dlsymun=undef
     ccdlflags='-Wl,-E'
     cccdlflags='-fPIC'
     lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'



@INC for perl 5.31.4:
     /home/choroba/blead/lib/perl5/site_perl/5.31.4/x86_64-linux-thread-multi
     /home/choroba/blead/lib/perl5/site_perl/5.31.4
     /home/choroba/blead/lib/perl5/5.31.4/x86_64-linux-thread-multi
     /home/choroba/blead/lib/perl5/5.31.4


Environment for perl 5.31.4:
     HOME=/home/choroba
     LANG=en_US.utf8
     LANGUAGE (unset)
     LC_CTYPE=en_US.UTF-8
     LD_LIBRARY_PATH (unset)
     LOGDIR (unset)
     PATH=/home/choroba/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/home/choroba/perl5/bin:/home/choroba/opensource/worktime/bin:.
     PERL_BADLANG (unset)
     SHELL=/bin/bash

@p5pRT
Author

p5pRT commented Aug 30, 2019

From @khwilliamson

What is happening here is that in re_intuit_start() at line 922 in regexec.c, it determines there is no possible match because you need the target string to be in UTF-8 to match the character in the pattern. But something is not returning memory when re_intuit_start returns failure. There are other instances of this failure return in re_intuit_start, and I suspect they leak as well.

I'm thinking someone who knows about the regex memory allocation can answer this without much effort, so I'm deferring to someone like that to step forward.
--
Karl Williamson

@p5pRT
Author

p5pRT commented Aug 30, 2019

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Aug 30, 2019

From @demerphq

I can easily imagine that SVs constructed during compilation aren't
cleaned up in this scenario.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Author

p5pRT commented Sep 2, 2019

From @tonycoz

On Fri, 30 Aug 2019 08:35:45 -0700, khw wrote:

What is happening here is that in re_intuit_start() at line 922 in
regexec.c, it determines there is no possible match because you need
the target string to be in UTF-8 to match the character in the
pattern. But something is not returning memory when re_intuit_start
returns failure. There are other instances of this failure return in
re_intuit_start, and I suspect they leak as well.

I'm thinking someone who knows about the regex memory allocation can
answer this without much effort, so I'm deferring to someone like that
to step forward

It was fairly simple to track down. I ran:

  valgrind --leak-check=full --show-leak-kinds=all ./perl -Ilib -e '"a" =~ /\N{U+2129}/ for 1 .. 1000' 2>&1 | less

The leak with 1000 entries:

==25945== 10,000 bytes in 1,000 blocks are still reachable in loss record 227 of 230
==25945== at 0x4C2BBAF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==25945== by 0x2A921D: Perl_safesysmalloc (util.c:155)
==25945== by 0x321DFC: Perl_sv_grow (sv.c:1599)
==25945== by 0x340BF0: Perl_sv_setsv_flags (sv.c:4712)
==25945== by 0x362331: Perl_newSVsv_flags (sv.c:9800)
==25945== by 0x465E41: S_to_byte_substr (regexec.c:10406)
==25945== by 0x436D53: Perl_re_intuit_start (regexec.c:921)
==25945== by 0x446B74: Perl_regexec_flags (regexec.c:3348)
==25945== by 0x306505: Perl_pp_match (pp_hot.c:3014)
==25945== by 0x2A78F5: Perl_runops_debug (dump.c:2557)
==25945== by 0x185946: S_run_body (perl.c:2712)
==25945== by 0x184EC4: perl_run (perl.c:2635)
==25945==

Fix attached.

Tony

@p5pRT
Author

p5pRT commented Sep 2, 2019

From @tonycoz

0001-perl-134390-don-t-leak-the-SV-we-just-created-on-an-.patch
From 05a03c0da6f3694904885fa1629a6e35e75d2875 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Mon, 2 Sep 2019 15:35:36 +1000
Subject: (perl #134390) don't leak the SV we just created on an early return

---
 regexec.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/regexec.c b/regexec.c
index c390bff72e..97ea458a20 100644
--- a/regexec.c
+++ b/regexec.c
@@ -10405,6 +10405,7 @@ S_to_byte_substr(pTHX_ regexp *prog)
 	    && !prog->substrs->data[i].substr) {
 	    SV* sv = newSVsv(prog->substrs->data[i].utf8_substr);
 	    if (! sv_utf8_downgrade(sv, TRUE)) {
+                SvREFCNT_dec_NN(sv);
                 return FALSE;
             }
             if (SvVALID(prog->substrs->data[i].utf8_substr)) {
-- 
2.11.0

@p5pRT
Author

p5pRT commented Sep 2, 2019

From @tonycoz

Applied as 05a03c0.

Tony

@p5pRT
Author

p5pRT commented Sep 2, 2019

@tonycoz - Status changed from 'open' to 'pending release'
