Report information
Id: 129897
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: jorma.laaksonen [at] aalto.fi
Cc:
AdminCc:

Operating System: Linux
PatchStatus: (no value)
Severity: High
Type: core
Perl Version: 5.22.1
Fixed In: (no value)



Date: Sun, 16 Oct 2016 22:04:42 +0300
From: Jorma Laaksonen <jorma.laaksonen [...] aalto.fi>
Subject: Unexpected behavior with a regular expression
To: perlbug [...] perl.org
This is a bug report for perl from jorma.laaksonen@aalto.fi,
generated with the help of perlbug 1.40 running under perl 5.22.1.

-----------------------------------------------------------------
[Please describe your issue here]

#! /usr/bin/perl

use strict;
use diagnostics;

my @a = "riiaan" =~ /.*?(xs|p)*(a(a)|i(i))n/;
my @b = ($&, @a);

for my $i ( 0 .. $#b ) {
    if (defined $b[$i]) {
        print "$i $-[$i] $+[$i] [$b[$i]]\n";
    } else {
        print "$i\n";
    }
}

Running it shows me:

0 0 6 [riiaan]
1
2 3 5 [aa]
3 4 5 [a]
4 2 3 [i]

so the last regexp group (i) seems to have been matched as "ri(i)aan"
even though it should not have matched at all.

The match can be avoided eg. by removing "?", "x" or the first "|" from
the regexp. Then the output is correct:

0 0 6 [riiaan]
1
2 3 5 [aa]
3 4 5 [a]
4

Any hint if I'm doing something wrong or not doing something I should
do?

Yours,
Jorma

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=high
---
Site configuration information for perl 5.22.1:

Configured by Debian Project at Sun Mar 13 11:54:18 UTC 2016.
Summary of my perl5 (revision 5 version 22 subversion 1) configuration: Platform: osname=linux, osvers=3.16.0, archname=x86_64-linux-gnu-thread-multi uname='linux localhost 3.16.0 #1 smp debian 3.16.0 x86_64 gnulinux ' config_args='-Dusethreads -Duselargefiles -Dcc=x86_64-linux-gnu-gcc -Dcpp=x86_64-linux-gnu-cpp -Dld=x86_64-linux-gnu-gcc -Dccflags=-DDEBIAN -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions -Wl,-z,relro -Dlddlflags=-shared -Wl,-Bsymbolic-functions -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.22 -Darchlib=/usr/lib/x86_64-linux-gnu/perl/5.22 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/x86_64-linux-gnu/perl5/5.22 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.22.1 -Dsitearch=/usr/local/lib/x86_64-linux-gnu/perl/5.22.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -dEs -Duseshrplib -Dlibperl=libperl.so.5.22.1' hint=recommended, useposix=true, d_sigaction=define useithreads=define, usemultiplicity=define use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='x86_64-linux-gnu-gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2 -g', cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include' ccversion='', gccversion='5.3.1 20160311', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3 ivtype='long', 
ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='x86_64-linux-gnu-gcc', ldflags =' -fstack-protector-strong -L/usr/local/lib' libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/5/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt perllibs=-ldl -lm -lpthread -lc -lcrypt libc=libc-2.21.so, so=so, useshrplib=true, libperl=libperl.so.5.22 gnulibc_version='2.21' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib -fstack-protector-strong' Locally applied patches: DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN. DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check. DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information. DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories. DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes. DEBPKG:debian/libperl_embed_doc - http://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking DEBPKG:fixes/respect_umask - Respect umask during installation DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories DEBPKG:debian/extutils_set_libperl_path - EU:MM: set location of libperl.a under /usr/lib DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets. DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor. 
DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy. DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable. DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need. DEBPKG:fixes/net_smtp_docs - [rt.cpan.org #36038] http://bugs.debian.org/100195 Document the Net::SMTP 'Port' option DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local DEBPKG:debian/deprecate-with-apt - http://bugs.debian.org/747628 Point users to Debian packages of deprecated core modules DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository DEBPKG:debian/patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.22.1-9 in patchlevel.h DEBPKG:debian/skip-kfreebsd-crash - http://bugs.debian.org/628493 [perl #96272] Skip a crashing test case in t/op/threads.t on GNU/kFreeBSD DEBPKG:fixes/document_makemaker_ccflags - http://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags} DEBPKG:debian/find_html2text - http://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text DEBPKG:debian/perl5db-x-terminal-emulator.patch - http://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl DEBPKG:debian/cpan-missing-site-dirs - http://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] http://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected DEBPKG:debian/regen-skip - Skip a regeneration check in unrelated git repositories 
DEBPKG:debian/makemaker-pasthru - http://bugs.debian.org/758471 Pass LD settings through to subdirectories DEBPKG:fixes/pod_man_reproducible_date - http://bugs.debian.org/759405 Support POD_MAN_DATE in Pod::Man for the left-hand footer DEBPKG:debian/locale-robustness - http://bugs.debian.org/782068 [perl #124310] Make t/run/locale.t survive missing locales masked by LC_ALL DEBPKG:fixes/podman-utc - http://bugs.debian.org/780259 Make the embedded date from Pod::Man reproducible DEBPKG:fixes/podman-utc-docs - http://bugs.debian.org/780259 Documentation and test suite updates for UTC fix DEBPKG:fixes/podman-empty-date - http://bugs.debian.org/780259 Support an empty POD_MAN_DATE environment variable DEBPKG:fixes/podman-pipe - http://bugs.debian.org/777405 Better errors for man pages from standard input DEBPKG:debian/pod2man-customized - Update porting/customized.dat for pod2man modifications DEBPKG:debian/makemaker-manext - http://bugs.debian.org/247370 Make EU::MakeMaker honour MANnEXT settings in generated manpage headers DEBPKG:debian/makemaker_customized - Update t/porting/customized.dat for files patched in Debian DEBPKG:debian/do-not-record-build-date - [6baa8db] http://bugs.debian.org/774422 [perl #125830] Allow overriding the compile time in "perl -V" output DEBPKG:fixes/podman-source-date-epoch - http://bugs.debian.org/801621 Make Pod::Man honor the SOURCE_DATE_EPOCH environment variable DEBPKG:fixes/podman-source-date-epoch-cleanups - http://bugs.debian.org/801621 Coding style and documentation for SOURCE_EPOCH_DATE DEBPKG:fixes/podman-source-date-epoch-testfix - http://bugs.debian.org/807086 Guard for building with SOURCE_DATE_EPOCH or POD_MAN_DATE set DEBPKG:debian/devel-ppport-reproducibility - http://bugs.debian.org/801523 Sort the list of XS code files when generating RealPPPort.xs DEBPKG:fixes/encode-unicode-bom - http://bugs.debian.org/798727 [rt.cpan.org #107043] Address https://rt.cpan.org/Public/Bug/Display.html?id=107043 
DEBPKG:debian/encode-unicode-bom-doc - http://bugs.debian.org/798727 Document Debian backport of Encode::Unicode fix DEBPKG:debian/kfreebsd-softupdates - http://bugs.debian.org/796798 Work around Debian Bug#796798 DEBPKG:fixes/autodie-scope - http://bugs.debian.org/798096 Fix a scoping issue with "no autodie" and the "system" sub DEBPKG:debian/debugperl-compat-fix - [perl #127212] http://bugs.debian.org/810326 Disable PERL_TRACK_MEMPOOL for debugging builds DEBPKG:fixes/CVE-2015-8607_file_spec_taint_fix - http://bugs.debian.org/810719 [perl #126862] ensure File::Spec::canonpath() preserves taint DEBPKG:fixes/mkstemp-umask - http://bugs.debian.org/810924 [perl #127322] [e57270b] Fix umask for mkstemp(3) calls DEBPKG:fixes/crosscompile-no-targethost - [perl #127234] Fix the Configure escape with usecrosscompile but no targethost DEBPKG:fixes/podlators-no-encode - [rt.cpan.org #111156] Degrade gracefully if utf8 is requested but Encode is not available DEBPKG:debian/cross-time-hires - [rt.cpan.org #111391] Add an environment variable to skip running configuration probes DEBPKG:fixes/encode-unicode-pod - Unicode.pm: Fix POD error DEBPKG:fixes/memoize-pod - [rt.cpan.org #89441] Fix POD errors in Memoize DEBPKG:fixes/ok-pod - Added encoding for pod. DEBPKG:fixes/CVE-2016-2381_duplicate_env - remove duplicate environment variables from environ --- @INC for perl 5.22.1: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 /usr/local/share/perl/5.22.1 /usr/lib/x86_64-linux-gnu/perl5/5.22 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.22 /usr/share/perl/5.22 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base . 
---
Environment for perl 5.22.1:
    HOME=/home/jorma
    LANG=fi_FI.UTF-8
    LANGUAGE=en_US
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/jorma/bin:/home/jorma/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/sbin:/usr/sbin:/home/jorma/hosts
    PERL_BADLANG (unset)
    SHELL=/bin/tcsh

--
Jorma Laaksonen                     jorma.laaksonen@aalto.fi
Teaching Researcher                 http://users.ics.aalto.fi/jorma/
Dr. of Science in Technology        mob. +358-50-3058719
Department of Computer Science      fax. +358-9-47023277
Aalto University School of Science
Konemiehentie 2, PO Box 15400, FI-00076 Aalto, Finland
From: Zefram <zefram [...] fysh.org>
To: perl5-porters [...] perl.org
Subject: Re: [perl #129897] Unexpected behavior with a regular expression
Date: Mon, 17 Oct 2016 03:50:44 +0100
Jorma Laaksonen wrote:
>Any hint if I'm doing something wrong or not doing something I should
>do?

No, that's all supported usage. You're quite right about the behaviour
being erroneous.

-zefram
Date: Mon, 17 Oct 2016 18:37:17 +0200
CC: Perl5 Porteros <perl5-porters [...] perl.org>
To: Zefram <zefram [...] fysh.org>, Dave Mitchell <davem [...] iabyn.com>
Subject: Re: [perl #129897] Unexpected behavior with a regular expression
From: demerphq <demerphq [...] gmail.com>
On 17 October 2016 at 04:50, Zefram <zefram@fysh.org> wrote:
> Jorma Laaksonen wrote:
>>Any hint if I'm doing something wrong or not doing something I should
>>do?
>
> No, that's all supported usage. You're quite right about the behaviour
> being erroneous.
Agreed.

It seems to be a bug about unwinding .*? although it also interacts
with TRIE code in ways I don't entirely understand. (Making the code
not produce a TRIE fixes the bug, but on the other hand, so does
removing the .*?)

Nevertheless I can fix the bug (while possibly introducing new bugs)
with the code in yves/fix_129897
c09f087940c61f3b6e57e7cf5e5b7a4faa683420

I would prefer that Dave have a look into this, as I don't entirely
understand why my patch fixes things for this case, while in most
other cases it is not needed.

The key point is that when we fail a .*? match we should unwind and
reset any buffers we matched after our current point. But STAR and
PLUS do not initialize the proper member fields so that we can do this
unwinding properly.

I have to admit that this bug is quite surprising. I would have
thought that if we had a bug like this we would fail our regex tests
completely, but apparently not.

Of course, it may have to do with the fact that the form of this bug
is incredibly horrible. Having an unanchored .* at the beginning of a
pattern is a good way to make your regex quadratic on failure. (We may
trigger an optimisation that automagically adds the anchor, and we may
not....)

So it may simply be that most times we don't trigger this bug, but I
admit it's not obvious to me why not.
Yves

commit c09f087940c61f3b6e57e7cf5e5b7a4faa683420
Author: Yves Orton <demerphq@gmail.com>
Date:   Mon Oct 17 18:29:43 2016 +0200

    provisional patch to fix [perl #129897]

diff --git a/regexec.c b/regexec.c
index e9e23f2..0cde487 100644
--- a/regexec.c
+++ b/regexec.c
@@ -7868,6 +7868,8 @@ NULL
     case STAR: /* /A*B/ where A is width 1 char */
         ST.paren = 0;
+        ST.lastparen = rex->lastparen;
+        ST.lastcloseparen = rex->lastcloseparen;
         ST.min = 0;
         ST.max = REG_INFTY;
         scan = NEXTOPER(scan);
@@ -7875,6 +7877,8 @@ NULL
     case PLUS: /* /A+B/ where A is width 1 char */
         ST.paren = 0;
+        ST.lastparen = rex->lastparen;
+        ST.lastcloseparen = rex->lastcloseparen;
         ST.min = 1;
         ST.max = REG_INFTY;
         scan = NEXTOPER(scan);
@@ -7900,6 +7904,8 @@ NULL
         ST.paren = 0;
         ST.min = ARG1(scan); /* min to match */
         ST.max = ARG2(scan); /* max to match */
+        ST.lastparen = rex->lastparen;
+        ST.lastcloseparen = rex->lastcloseparen;
         scan = NEXTOPER(scan) + NODE_STEP_REGNODE;
       repeat:
         /*
@@ -8013,7 +8019,7 @@ NULL
         /* failed to find B in a non-greedy match where c1,c2 valid */
         REGCP_UNWIND(ST.cp);
-        if (ST.paren) {
+        if ( 1 || ST.paren ) {
             UNWIND_PAREN(ST.lastparen, ST.lastcloseparen);
         }
         /* Couldn't or didn't -- move forward. */
@@ -8086,7 +8092,7 @@ NULL
         /* failed to find B in a non-greedy match where c1,c2 invalid */
         REGCP_UNWIND(ST.cp);
-        if (ST.paren) {
+        if ( 1 || ST.paren ) {
             UNWIND_PAREN(ST.lastparen, ST.lastcloseparen);
         }
         /* failed -- move forward one */
@@ -8147,7 +8153,7 @@ NULL
         /* failed to find B in a greedy match */
         REGCP_UNWIND(ST.cp);
-        if (ST.paren) {
+        if ( 1 || ST.paren ) {
             UNWIND_PAREN(ST.lastparen, ST.lastcloseparen);
         }
         /* back up. */

--
perl -Mre=debug -e "/just|another|perl|hacker/"
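The rule stated above, that a capture buffer filled during a failed attempt must be reset when the engine backs out, can be seen from Perl itself in a much simpler case. The following is only an illustrative sketch with a made-up pattern; it does not exercise the TRIE code path or the STAR/PLUS states touched by the provisional patch.

```perl
use strict;
use warnings;

# Hypothetical illustration (pattern invented for this example):
# the first alternative captures "a" into group 1 and then fails on "x".
# The engine backtracks into the second alternative, which contains no
# capture group, so group 1 should be left unset after the overall match.
my $matched = "ab" =~ /(?:(a)x|a)b/;
my $group1  = $1;

printf "matched: %s, group 1: %s\n",
    ( $matched        ? "yes"       : "no" ),
    ( defined $group1 ? "[$group1]" : "undef" );
```

If group 1 were reported as `[a]` here, that would be the same class of stale-capture problem as in the report: a buffer surviving a branch that did not take part in the final match.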
Date: Mon, 17 Oct 2016 23:40:15 +0200
From: demerphq <demerphq [...] gmail.com>
CC: Perl5 Porteros <perl5-porters [...] perl.org>
To: Zefram <zefram [...] fysh.org>, Dave Mitchell <davem [...] iabyn.com>
Subject: Re: [perl #129897] Unexpected behavior with a regular expression
On 17 October 2016 at 18:37, demerphq <demerphq@gmail.com> wrote:
> [...]
> So it may simply be that most times we don't trigger this bug, but I
> admit it's not obvious to me why not.
Cause my analysis was wrong... Dave, forget it, nothing you need to
poke into.

Put simply, the "short-circuit" logic in the TRIE code should not
trigger when there is a jump table.

I have a patch ready, but I am having issues talking to the master
repo right now.

Yves
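Since the thread notes that making the pattern not produce a TRIE avoids the bug, one possible workaround on an affected perl is to compile the pattern with the trie optimisation switched off. The sketch below assumes the behaviour documented in perlvar for `${^RE_TRIE_MAXBUF}` (a negative value prevents the optimisation); the variable names are hypothetical, and this is not something proposed in the ticket.

```perl
use strict;
use warnings;

# Pattern from the report, kept as a string so that it is compiled at
# run time, while the localized setting below is in effect.
my $source = '.*?(xs|p)*(a(a)|i(i))n';

my $no_trie = do {
    # Assumption (per perlvar): a negative value disables the trie
    # optimisation for patterns compiled while it is set.
    local ${^RE_TRIE_MAXBUF} = -1;
    qr/$source/;
};

# In list context the match returns the four capture groups.
my @groups = "riiaan" =~ $no_trie;

for my $i ( 0 .. $#groups ) {
    printf "group %d: %s\n", $i + 1,
        defined $groups[$i] ? "[$groups[$i]]" : "undef";
}
```

Running the one-liner from Yves's signature style, `perl -Mre=debug -e '...'`, on the same pattern is a way to check whether TRIE nodes still appear in the compiled program.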
On Mon Oct 17 14:40:47 2016, demerphq wrote:
> [...]
> Put simply, the "short-circuit" logic in the TRIE code should not
> trigger when there is a jump table.
>
> I have a patch ready, but i am having issues talking to the master
> repo right now.
>
> Yves
Thank you for your rapid responses and the patch. I'm happy to confirm
that the fix has removed all problems I had associated with this
behavior of perl.

Thanks,
Jorma
RT-Send-CC: perl5-porters [...] perl.org
On Mon, 17 Oct 2016 14:40:47 -0700, demerphq wrote:
> I have a patch ready, but i am having issues talking to the master
> repo right now.

It appears this did eventually go in as cfe04db5:

    regexec.c: fix perl #129897: trie short circuit breaks capture buffers

    There is an optimisation when a trie matches only one thing which
    causes it to fall through to the following code without setting up
    a stack unwind frame. This breaks if we are using a trie jump table
    where we might change state that will need to be unwound on failure.

.. with a followup to fix the test in ac2365fd.

I'm setting it to 'pending release' - Yves, please correct it if that
was inappropriate.

Hugo
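For anyone wanting to check a given perl against this ticket, a small regression check along the following lines may help. It is a hypothetical sketch built from the pattern in the original report, not the test that was actually committed (the followup in ac2365fd is not reproduced here).

```perl
use strict;
use warnings;
use Test::More;

# Pattern and input taken from the original report.
my @groups = "riiaan" =~ /.*?(xs|p)*(a(a)|i(i))n/;

is scalar(@groups), 4, 'pattern matched and returned four groups';
ok !defined $groups[0], 'group 1 (xs|p) did not participate';
is $groups[1], 'aa',    'group 2 captured "aa"';
is $groups[2], 'a',     'group 3 captured "a"';

# On an affected perl (such as the reporter's 5.22.1) this is the
# assertion expected to fail, with group 4 holding a stale "i".
ok !defined $groups[3], 'group 4 (i) did not participate';

done_testing;
```

On a perl that includes the fix, all five assertions are expected to pass.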
Thank you for filing this report. You have helped make Perl better. With the release today of Perl 5.26.0, this and 210 other issues have been resolved. Perl 5.26.0 may be downloaded via: https://metacpan.org/release/XSAWYERX/perl-5.26.0 If you find that the problem persists, feel free to reopen this ticket.

