Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex stores wrong value in $1, $^N etc, thoroughly corrupting parsers #17052

Open
p5pRT opened this issue Jun 18, 2019 · 4 comments
Open

Regex stores wrong value in $1, $^N etc, thoroughly corrupting parsers #17052

p5pRT opened this issue Jun 18, 2019 · 4 comments

Comments

@p5pRT
Copy link

p5pRT commented Jun 18, 2019

Migrated from rt.perl.org#134209 (status was 'open')

Searchable as RT134209$

@p5pRT
Copy link
Author

p5pRT commented Jun 18, 2019

From @jlokier

Created by @jlokier

Hi,

The following short Perl program shows the error​:

  sub S { "A" =~ /(.)(?{})/; }
  "xyz" =~ /(?​:(.)(?{say $1;S()}))*/;

Output should be​:

  x
  y
  z

Actual output is​:

  x
  A
  A

This is because the value stored in $1 is incorrect.

It's not even part of the matched string!

If we change the $1 to $^N, as we might use in better quality code,
the output is wrong in the same way.

As you can imagine, breaking $1 and $^N has a severe impact on any
regex-driven parser which uses code expressions in a regex to collect
data, if the code expressions call anything that happens to do any
pattern matching itself.

Removing the (?{}) from the inner regex in S() stops this bug from
happening.

I've verified this in Perl 5.22.1, Perl 5.26.1 and Perl 5.28.1, hoping
that such a severe error would have been noticed and fixed by a later
version.

Now I am struggling to figure out how to rewrite all my parsers,
knowing that $1 and $^N are subtly unreliable in every Perl version :-/

Any advice on a clean workaround would be welcome.

(The obvious advice to use //g instead of /(...)*/ does work for
simple lists, but thhat doesn't compose as part of a larger regex
where a list of things is one part of a pattern.)

Thanks,
-- Jamie Lokier

Perl Info

Flags:
    category=core
    severity=high

Site configuration information for perl 5.28.1:

Configured by Ubuntu at Sun Mar 31 11:51:22 UTC 2019.

Summary of my perl5 (revision 5 version 28 subversion 1) configuration:
   
  Platform:
    osname=linux
    osvers=4.9.0
    archname=x86_64-linux-gnu-thread-multi
    uname='linux localhost 4.9.0 #1 smp debian 4.9.0 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dcc=x86_64-linux-gnu-gcc -Dcpp=x86_64-linux-gnu-cpp -Dld=x86_64-linux-gnu-gcc -Dccflags=-DDEBIAN -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fdebug-prefix-map=/build/perl-IfZKzZ/perl-5.28.1=. -fstack-protector-strong -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions -Wl,-z,relro -Dlddlflags=-shared -Wl,-Bsymbolic-functions -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.28 -Darchlib=/usr/lib/x86_64-linux-gnu/perl/5.28 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/x86_64-linux-gnu/perl5/5.28 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.28.1 -Dsitearch=/usr/local/lib/x86_64-linux-gnu/perl/5.28.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint
-Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Ui_xlocale -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -dEs -Duseshrplib -Dlibperl=libperl.so.5.28.1'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='x86_64-linux-gnu-gcc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-O2 -g'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion=''
    gccversion='8.3.0'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='x86_64-linux-gnu-gcc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=libc-2.29.so
    so=so
    useshrplib=true
    libperl=libperl.so.5.28
    gnulibc_version='2.29'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -L/usr/local/lib -fstack-protector-strong'

Locally applied patches:
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - https://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
    DEBPKG:debian/enc2xs_inc - https://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - https://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
    DEBPKG:debian/libperl_embed_doc - https://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
    DEBPKG:fixes/respect_umask - Respect umask during installation
    DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
    DEBPKG:debian/extutils_set_libperl_path - EU:MM: set location of libperl.a under /usr/lib
    DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/perlivp - https://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
    DEBPKG:debian/squelch-locale-warnings - https://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
    DEBPKG:debian/patchlevel - https://bugs.debian.org/567489 List packaged patches for 5.28.1-6 in patchlevel.h
    DEBPKG:fixes/document_makemaker_ccflags - https://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
    DEBPKG:debian/find_html2text - https://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
    DEBPKG:debian/perl5db-x-terminal-emulator.patch - https://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl
    DEBPKG:debian/cpan-missing-site-dirs - https://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
    DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] https://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected
    DEBPKG:debian/makemaker-pasthru - https://bugs.debian.org/758471 Pass LD settings through to subdirectories
    DEBPKG:debian/makemaker-manext - https://bugs.debian.org/247370 Make EU::MakeMaker honour MANnEXT settings in generated manpage headers
    DEBPKG:debian/kfreebsd-softupdates - https://bugs.debian.org/796798 Work around Debian Bug#796798
    DEBPKG:fixes/autodie-scope - https://bugs.debian.org/798096 Fix a scoping issue with "no autodie" and the "system" sub
    DEBPKG:fixes/memoize-pod - [rt.cpan.org #89441] Fix POD errors in Memoize
    DEBPKG:debian/hurd-softupdates - https://bugs.debian.org/822735 Fix t/op/stat.t failures on hurd
    DEBPKG:fixes/math_complex_doc_great_circle - https://bugs.debian.org/697567 [rt.cpan.org #114104] Math::Trig: clarify definition of great_circle_midpoint
    DEBPKG:fixes/math_complex_doc_see_also - https://bugs.debian.org/697568 [rt.cpan.org #114105] Math::Trig: add missing SEE ALSO
    DEBPKG:fixes/math_complex_doc_angle_units - https://bugs.debian.org/731505 [rt.cpan.org #114106] Math::Trig: document angle units
    DEBPKG:fixes/cpan_web_link - https://bugs.debian.org/367291 CPAN: Add link to main CPAN web site
    DEBPKG:debian/hppa_op_optimize_workaround - https://bugs.debian.org/838613 Temporarily lower the optimization of op.c on hppa due to gcc-6 problems
    DEBPKG:debian/installman-utf8 - https://bugs.debian.org/840211 Generate man pages with UTF-8 characters
    DEBPKG:fixes/getopt-long-4 - https://bugs.debian.org/864544 [rt.cpan.org #122068] Fix issue #122068.
    DEBPKG:debian/hppa_opmini_optimize_workaround - https://bugs.debian.org/869122 Lower the optimization level of opmini.c on hppa
    DEBPKG:debian/sh4_op_optimize_workaround - https://bugs.debian.org/869373 Also lower the optimization level of op.c and opmini.c on sh4
    DEBPKG:debian/perldoc-pager - https://bugs.debian.org/870340 [rt.cpan.org #120229] Fix perldoc terminal escapes when sensible-pager is less
    DEBPKG:debian/prune_libs - https://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/configure-regen - https://bugs.debian.org/762638 Regenerate Configure et al. after probe unit changes
    DEBPKG:debian/deprecate-with-apt - https://bugs.debian.org/747628 Point users to Debian packages of deprecated core modules
    DEBPKG:debian/disable-stack-check - https://bugs.debian.org/902779 [perl #133327] Disable debugperl stack extension checks for binary compatibility with perl
    DEBPKG:debian/gdbm-fatal - [perl #133295] https://bugs.debian.org/904005 Temporarily skip GDBM_File fatal.t for gdbm >= 1.15 compatibility
    DEBPKG:fixes/storable-recursion - https://bugs.debian.org/912900 [perl #133326] [120060c] (perl #133326) fix and clarify handling of recurs_sv.
    DEBPKG:fixes/caretx-fallback - https://bugs.debian.org/913347 [perl #133573] [03b94aa] RT#133573: $^X fallback when platform-specific technique fails
    DEBPKG:fixes/eumm-usrmerge - https://bugs.debian.org/913637 Avoid mangling /bin non-perl shebangs on merged-/usr systems
    DEBPKG:fixes/errno-include-path - [6c5080f] [perl #133662] https://bugs.debian.org/875921 Make Errno_pm.PL compatible with /usr/include/<ARCH>/errno.h
    DEBPKG:fixes/kfreebsd-renameat - [a3c63a9] https://bugs.debian.org/912521 [perl #133668] Also work around renameat() kernel bug on GNU/kFreeBSD
    DEBPKG:fixes/time-local-2020 - https://bugs.debian.org/915209 [rt.cpan.org #124787] Fix Time::Local tests
    DEBPKG:fixes/inplace-editing-bugfix/part1 - https://bugs.debian.org/914651 (perl #133659) move argvout cleanup to a new function
    DEBPKG:fixes/inplace-editing-bugfix/part2 - https://bugs.debian.org/914651 (perl #133659) tests for global destruction handling of inplace editing
    DEBPKG:fixes/inplace-editing-bugfix/part3 - https://bugs.debian.org/914651 (perl #133659) make an in-place edit successful if the exit status is zero
    DEBPKG:fixes/fix-manifest-failures - https://bugs.debian.org/914962 Fix t/porting/manifest.t failures when run in a foreign git checkout
    DEBPKG:fixes/pipe-open-bugfix/part1 - [perl #133726] https://bugs.debian.org/916313 Always mark pipe in pipe-open as inherit-on-exec
    DEBPKG:fixes/pipe-open-bugfix/part2 - [perl #133726] https://bugs.debian.org/916313 Always mark pipe in list pipe-open as inherit-on-exec
    DEBPKG:fixes/storable-probing/prereq1 - [3f4cad1] Storable: fix for strawberry build failures:
    DEBPKG:fixes/storable-probing/prereq2 - [perl #133411] [edf639f] (perl #133411) don't try to load Storable with -Dusecrosscompile
    DEBPKG:fixes/storable-probing/disable-probing - https://bugs.debian.org/914133 [perl #133708] [2a0bbd3] (perl #133708) remove build-time probing for stack limits for Storable
    DEBPKG:debian/perlbug-editor - https://bugs.debian.org/922609 Use "editor" as the default perlbug editor, as per Debian policy
    DEBPKG:fixes/posix-mbrlen - [25d7b7a] https://bugs.debian.org/924517 [perl #133928] Fix POSIX::mblen mbstate_t initialization on threaded perls with glibc


@INC for perl 5.28.1:
    /etc/perl
    /usr/local/lib/x86_64-linux-gnu/perl/5.28.1
    /usr/local/share/perl/5.28.1
    /usr/lib/x86_64-linux-gnu/perl5/5.28
    /usr/share/perl5
    /usr/lib/x86_64-linux-gnu/perl/5.28
    /usr/share/perl/5.28
    /usr/local/lib/site_perl
    /usr/lib/x86_64-linux-gnu/perl-base


Environment for perl 5.28.1:
    HOME=/root
    LANG=en_GB.UTF-8
    LANGUAGE=en_GB:en_US:en
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Jun 19, 2019

From @tonycoz

On Tue, 18 Jun 2019 13​:37​:38 -0700, jamie wrote​:

[Please describe your issue here]

Hi,

The following short Perl program shows the error​:

sub S { "A" =~ /(.)(?{})/; }
"xyz" =~ /(?​:(.)(?{say $1;S()}))*/;

Output should be​:

x
y
z

Actual output is​:

x
A
A

This is because the value stored in $1 is incorrect.

I'm not suggesting this isn't a bug, but has it ever worked the way you expect?

I tried perls back to 5.10​:

$ ~/perl/5.10.0-debug/bin/perl ../134209.pl
x
Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c", line 5098 at (re_eval 2) line 1.

A non-debug perl 5.10 produced corrupted output and an out of memory error.

$ ~/perl/5.12.0/bin/perl ../134209.pl
x
/
x

$ ~/perl/5.14.2-debug/bin/perl ../134209.pl
x
A
A

$ ~/perl/5.16.3/bin/perl ../134209.pl
x
A
A

Tony

@p5pRT
Copy link
Author

p5pRT commented Jun 19, 2019

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jun 20, 2019

From @jlokier

Tony Cook via RT wrote​:

On Tue, 18 Jun 2019 13​:37​:38 -0700, jamie wrote​:

The following short Perl program shows the error​:

sub S { "A" =~ /(.)(?{})/; }
"xyz" =~ /(?​:(.)(?{say $1;S()}))*/;

Output should be​:

x
y
z

Actual output is​:

x
A
A

This is because the value stored in $1 is incorrect.

I'm not suggesting this isn't a bug, but has it ever worked the way you expect?

No, I wouldn't have expected everything to be fine prior to Perl 5.18.

I had thought regex variables and (?{...}) blocks in general to be
largely reliable since the big regex engine update of Perl 5.18, and
quite buggy prior to that. Before that we couldn't trust lexicals
inside (?{...}) at all, and I used to use only global variables and
very simple code fragments within those scopes. Certainly no calling
subroutines in those days, let alone pattern matching on $^N.

From perl5180delta(1)​:

  - The implementation of code blocks in regular expressions, such as
  "(?{})" and "(??{})", has been heavily reworked to eliminate a
  whole slew of bugs.

  - Lexical variables are now sane as regards scope, recursion and
  closure behavior.

So I've always thought lexical variables and scopes generally were
fixed in 5.18. I haven't noticed other problems with them, and I use
them quite a bit. (This made Perl 5.18 the minimum version my code
requires, until I adopted subroutines signatures.)

Variables like $^N and $1 are so much more "straightforward" and
"core" than closures, that with someone having put in the effort to do
closures and (?{...}) correctly, I'm really surprised to recently
notice bogus values in them.

So surprised, in fact, that I didn't notice with one parser alone,
until I started finding errors in the data, when multiple parsers are
used with one calling another - years after getting used to $1, @​-, @​+
and $^N "doing what they say on the tin".

Best,
-- Jamie

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants