Report information
Id: 73542
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: nicholas <nick [at] ccl4.org>
Cc: LeonT <fawaka [at] gmail.com>
AdminCc:

Operating System: Linux
PatchStatus: (no value)
Severity: medium
Type:
Perl Version: 5.11.5
Fixed In: (no value)



Subject: regexp engine reads 1 beyond the string
Date: Fri, 12 Mar 2010 17:13:00 +0000
To: perlbug [...] perl.org
From: Nicholas Clark <nick [...] etla.org>
This is a bug report for perl from nick@ccl4.org,
generated with the help of perlbug 1.39 running under perl 5.11.5.

-----------------------------------------------------------------
[Please describe your issue here]

The regexp engine often reads 1 character beyond the end of the string,
before deciding that it doesn't need the value. If the byte 1 beyond the
end of the string doesn't exist, then this will SEGV.

This can be seen as a bug, or can be seen as wishlist. It's also old, and
probably dates from 5.000, if not 1.000. I know that Adrian Enache hit this,
but I don't know if there is an open bug report.

$ valgrind /home/nick/Sandpit/snap5.9.x-v5.11.5-59-g801ed99/bin/perl5.11.5 -MFile::Map -e 'File::Map::map_anonymous($a, 4096); $a =~ /\0+/'
==2541== Memcheck, a memory error detector.
==2541== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==2541== Using LibVEX rev 1854, a library for dynamic binary translation.
==2541== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==2541== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework.
==2541== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==2541== For more details, rerun with: -v
==2541==
==2541== Invalid read of size 1
==2541==    at 0x69D6DC: S_regmatch (regexec.c:5417)
==2541==    by 0x68F284: S_regtry (regexec.c:2474)
==2541==    by 0x68C67A: Perl_regexec_flags (regexec.c:2075)
==2541==    by 0x54845C: Perl_pp_match (pp_hot.c:1362)
==2541==    by 0x4FF223: Perl_runops_debug (dump.c:2049)
==2541==    by 0x44B1FF: S_run_body (perl.c:2308)
==2541==    by 0x44A6C9: perl_run (perl.c:2233)
==2541==    by 0x42007C: main (perlmain.c:117)
==2541==  Address 0x4020000 is not stack'd, malloc'd or (recently) free'd
==2541==
==2541== Process terminating with default action of signal 11 (SIGSEGV)
==2541==  Access not within mapped region at address 0x4020000
==2541==    at 0x69D6DC: S_regmatch (regexec.c:5417)
==2541==    by 0x68F284: S_regtry (regexec.c:2474)
==2541==    by 0x68C67A: Perl_regexec_flags (regexec.c:2075)
==2541==    by 0x54845C: Perl_pp_match (pp_hot.c:1362)
==2541==    by 0x4FF223: Perl_runops_debug (dump.c:2049)
==2541==    by 0x44B1FF: S_run_body (perl.c:2308)
==2541==    by 0x44A6C9: perl_run (perl.c:2233)
==2541==    by 0x42007C: main (perlmain.c:117)
==2541==
==2541== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 17 from 2)
==2541== malloc/free: in use at exit: 734,151 bytes in 8,911 blocks.
==2541== malloc/free: 16,092 allocs, 7,181 frees, 1,283,041 bytes allocated.
==2541== For counts of detected errors, rerun with: -v
==2541== searching for pointers to 8,911 not-freed blocks.
==2541== checked 1,027,248 bytes.
==2541==
==2541== LEAK SUMMARY:
==2541==    definitely lost: 2,199 bytes in 36 blocks.
==2541==      possibly lost: 0 bytes in 0 blocks.
==2541==    still reachable: 731,952 bytes in 8,875 blocks.
==2541==         suppressed: 0 bytes in 0 blocks.
==2541== Rerun with --leak-check=full to see details of leaked memory.
Segmentation fault

It's arguably "wishlist" because strictly, the scalar is not well formed,
according to the rules of the internals, because it doesn't have a '\0'
byte beyond the end. This is usually what saves us. However, it still means
that we are reading more than we need, and hence causing cache misses, and
potentially even page faults.

You can see what the structure of the SVs that File::Map produces with

$ /home/nick/Sandpit/snap5.9.x-v5.11.5-59-g801ed99/bin/perl5.11.5 -MDevel::Peek -MFile::Map -e 'File::Map::map_anonymous($a, 16); Dump($a)'
SV = PVMG(0xa0b260) at 0x9cc1f8
  REFCNT = 1
  FLAGS = (SMG,RMG,POK,pPOK)
  IV = 0
  NV = 0
  PV = 0x2ae5a1e2b000 "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
  CUR = 16
  LEN = 0
  MAGIC = 0x9cef20
    MG_VIRTUAL = 0x2ae5a1e2a480
    MG_PRIVATE = 19540
    MG_TYPE = PERL_MAGIC_uvar(U)
    MG_PTR = 0x9ced70 ""

and the "problem" again, as dump.c tries to access the byte beyond:

$ valgrind /home/nick/Sandpit/snap5.9.x-v5.11.5-59-g801ed99/bin/perl5.11.5 -MDevel::Peek -MFile::Map -e 'File::Map::map_anonymous($a, 4096); Dump($a)'
==2905== Memcheck, a memory error detector.
==2905== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==2905== Using LibVEX rev 1854, a library for dynamic binary translation.
==2905== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==2905== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework.
==2905== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==2905== For more details, rerun with: -v
==2905==
SV = PVMG(0x5e52790) at 0x5c6b380
  REFCNT = 1
  FLAGS = (SMG,RMG,POK,pPOK)
  IV = 0
  NV = 0
  PV = 0x401f000 ==2905== Invalid read of size 1
==2905==    at 0x4F12F4: Perl_pv_escape (dump.c:302)
==2905==    by 0x4F169A: Perl_pv_pretty (dump.c:383)
==2905==    by 0x4F1889: Perl_pv_display (dump.c:419)
==2905==    by 0x4FA7A4: Perl_do_sv_dump (dump.c:1655)
==2905==    by 0x60568D4: XS_Devel__Peek_Dump (Peek.xs:346)
==2905==    by 0x557C72: Perl_pp_entersub (pp_hot.c:2882)
==2905==    by 0x4FF223: Perl_runops_debug (dump.c:2049)
==2905==    by 0x44B1FF: S_run_body (perl.c:2308)
==2905==    by 0x44A6C9: perl_run (perl.c:2233)
==2905==    by 0x42007C: main (perlmain.c:117)
==2905==  Address 0x4020000 is not stack'd, malloc'd or (recently) free'd
==2905==
==2905== Process terminating with default action of signal 11 (SIGSEGV)
==2905==  Access not within mapped region at address 0x4020000
==2905==    at 0x4F12F4: Perl_pv_escape (dump.c:302)
==2905==    by 0x4F169A: Perl_pv_pretty (dump.c:383)
==2905==    by 0x4F1889: Perl_pv_display (dump.c:419)
==2905==    by 0x4FA7A4: Perl_do_sv_dump (dump.c:1655)
==2905==    by 0x60568D4: XS_Devel__Peek_Dump (Peek.xs:346)
==2905==    by 0x557C72: Perl_pp_entersub (pp_hot.c:2882)
==2905==    by 0x4FF223: Perl_runops_debug (dump.c:2049)
==2905==    by 0x44B1FF: S_run_body (perl.c:2308)
==2905==    by 0x44A6C9: perl_run (perl.c:2233)
==2905==    by 0x42007C: main (perlmain.c:117)
==2905==
==2905== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 26 from 2)
==2905== malloc/free: in use at exit: 794,203 bytes in 9,496 blocks.
==2905== malloc/free: 18,210 allocs, 8,714 frees, 5,575,543 bytes allocated.
==2905== For counts of detected errors, rerun with: -v
==2905== searching for pointers to 9,496 not-freed blocks.
==2905== checked 1,089,288 bytes.
==2905==
==2905== LEAK SUMMARY:
==2905==    definitely lost: 2,199 bytes in 36 blocks.
==2905==      possibly lost: 0 bytes in 0 blocks.
==2905==    still reachable: 792,004 bytes in 9,460 blocks.
==2905==         suppressed: 0 bytes in 0 blocks.
==2905== Rerun with --leak-check=full to see details of leaked memory.
Segmentation fault

(sort of can't fix that one).

It would be good to change the regexp code in question, which currently
looks like this:

    /* Note that nextchr is a byte even in UTF */
    nextchr = UCHARAT(locinput);
    scan = prog;
    while (scan != NULL) {

The "quicker" fix looks to be set nextchr to 0 if locinput >= PL_regeol
The more elegant fix (may not be possible) looks to be to defer reading
UCHARAT() until the later code knows that it needs it.

It looks like/I assume that the code retains the basic structure of Henry
Spencer's regexp engine, and that that was built to work on C NUL terminated
strings.

Nicholas Clark

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=low
---
Site configuration information for perl 5.11.5:

Configured by nick at Fri Mar 12 16:31:25 GMT 2010.

Summary of my perl5 (revision 5 version 11 subversion 5) configuration:
  Commit id: 801ed997c7a7937af6eb7d7e84217db79179b4f4
  Platform:
    osname=linux, osvers=2.6.18.8-xenu, archname=x86_64-linux
    uname='linux eris 2.6.18.8-xenu #1 smp sat oct 3 10:27:42 bst 2009 x86_64 gnulinux '
    config_args='-Dusedevel=y -Dcc=ccache gcc -Dld=gcc -Ubincompat5005 -Uinstallusrbinperl -Dcf_email=nick@ccl4.org -Dperladmin=nick@ccl4.org -Dinc_version_list= -Dinc_version_list_init=0 -Doptimize=-g -Uusethreads -Uuse64bitall -Uusemymalloc -Duseperlio -Dprefix=~/Sandpit/snap5.9.x-v5.11.5-59-g801ed99 -Uusevendorprefix -Uvendorprefix=~/Sandpit/snap5.9.x-v5.11.5-59-g801ed99 -Dinstallman1dir=none -Dinstallman3dir=none -Uuserelocatableinc -de'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='ccache gcc', ccflags ='-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-g',
    cppflags='-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.3.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
    libs=-lnsl -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.7.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.7'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -g -L/usr/local/lib -fstack-protector'

Locally applied patches:

---
@INC for perl 5.11.5:
    lib
    /home/nick/Sandpit/snap5.9.x-v5.11.5-59-g801ed99/lib/perl5/site_perl/5.11.5/x86_64-linux
    /home/nick/Sandpit/snap5.9.x-v5.11.5-59-g801ed99/lib/perl5/site_perl/5.11.5
    /home/nick/Sandpit/snap5.9.x-v5.11.5-59-g801ed99/lib/perl5/5.11.5/x86_64-linux
    /home/nick/Sandpit/snap5.9.x-v5.11.5-59-g801ed99/lib/perl5/5.11.5
    .

---
Environment for perl 5.11.5:
    HOME=/home/nick
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/nick/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/local/sbin:/sbin:/usr/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash
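
[Editorial sketch] The "quicker" fix suggested in the report (treat the next character as 0 once locinput has reached PL_regeol) might look roughly like the following. This is only an illustration under assumptions: next_char_or_zero and its parameter eol are hypothetical names, not functions or variables from regexec.c, and this is not claimed to be the change that was eventually committed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from regexec.c): return the byte at locinput,
 * or 0 when locinput has reached the end of the string. The byte at eol
 * itself is never dereferenced, so a buffer with no trailing '\0' (such
 * as a mapping that ends exactly on a page boundary) is safe to scan.
 */
static unsigned char next_char_or_zero(const unsigned char *locinput,
                                       const unsigned char *eol)
{
    assert(locinput <= eol);              /* caller must stay within [start, eol] */
    return (locinput < eol) ? *locinput : 0;
}
```

Note that returning 0 at the end makes "end of string" indistinguishable from a genuine NUL byte in the data, so any caller still has to compare locinput against eol before treating a 0 as input.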
Subject: Re: [perl #73542] regexp engine reads 1 beyond the string
Date: Sun, 14 Mar 2010 12:37:28 -0600
To: perl5-porters [...] perl.org
From: karl williamson <public [...] khwilliamson.com>
Nicholas Clark (via RT) wrote:
> It looks like/I assume that the code retains the basic structure of Henry
> Spencer's regexp engine, and that that was built to work on C NUL terminated
> strings.
My guess is that it won't properly match a string that contains a NULL.
CC: perl5-porters [...] perl.org
Subject: Re: [perl #73542] regexp engine reads 1 beyond the string
Date: Mon, 15 Mar 2010 13:39:27 +0000
To: karl williamson <public [...] khwilliamson.com>
From: Nicholas Clark <nick [...] ccl4.org>
On Sun, Mar 14, 2010 at 12:37:28PM -0600, karl williamson wrote:
> >It looks like/I assume that the code retains the basic structure of Henry
> >Spencer's regexp engine, and that that was built to work on C NUL
> >terminated
> >strings.

> My guess is that it won't properly match a string that contains a NULL.

That is my suspicion too, but I don't have any test cases.

Nicholas Clark
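
[Editorial sketch] One way to keep embedded NUL bytes distinct from the end of the string in a length-delimited scanner is to signal the end with a value outside the byte range. The sketch below is an assumption-laden illustration, not code from perl: AT_END, next_char_or_end and leading_nul_run are hypothetical names, and leading_nul_run is only a rough analogue of matching /^\0+/ against a buffer that has no terminating '\0'.

```c
#include <assert.h>
#include <stddef.h>

#define AT_END (-1)   /* hypothetical sentinel: outside the 0..255 byte range */

/* Return the byte at p as an int, or AT_END once p has reached end.
 * The byte at end is never read. */
static int next_char_or_end(const unsigned char *p, const unsigned char *end)
{
    assert(p <= end);
    return (p < end) ? (int)*p : AT_END;
}

/* Count the run of NUL bytes at the start of the buffer [s, s + len).
 * The loop stops at the first non-NUL byte or at the end of the buffer,
 * whichever comes first, because AT_END never compares equal to 0. */
static size_t leading_nul_run(const unsigned char *s, size_t len)
{
    const unsigned char *p = s;
    const unsigned char *end = s + len;

    while (next_char_or_end(p, end) == 0)
        p++;
    return (size_t)(p - s);
}
```

With a buffer of four NUL bytes and len 4, the scan stops at the end of the buffer rather than continuing until some non-NUL byte happens to appear in adjacent memory.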
Fixed by 7016d6ebb4
RT-Send-CC: perl5-porters [...] perl.org
On Wed Sep 26 09:07:30 2012, sprout wrote:
> Fixed by 7016d6ebb4

Thanks Dave, for fixing this. (And thanks, sprout, for being on top of which
fixes map to which tickets.)

Nicholas Clark

