
use re 'debug' ignored for uniprop embedded regexps #17026

Closed
p5pRT opened this issue May 30, 2019 · 2 comments

Comments


p5pRT commented May 30, 2019

Migrated from rt.perl.org#134150 (status was 'new')

Searchable as RT134150$


p5pRT commented May 30, 2019

From @tonycoz

Created by @tonycoz

use re 'debug' doesn't enable debug output for regexps embedded in
Unicode properties, for example:

$ ./perl -Ilib -Mre=debug -e 'qr!\p{numeric_value=/\A[0-5]\z/}!'
Compiling REx "\p{numeric_value=/\A[0-5]\z/}"
The Unicode property wildcards feature is experimental at -e line 1.
~ tying lastbr ANYOF[0-5\xB2\xB3\xB9][0660-0665 06F0-06F5 07C0-07C5 0966-096B 09E6-09EB 0A66-0A6B 0AE6-0AEB 0B66-0B6B 0BE6-0BEB 0C66-0C6B 0C78-0C7E 0CE6-0CEB 0D66-0D6B 0DE6-0DEB 0E50-0E55 0ED0-0ED5 0F20-0F25 1040-1045 1090-1095 1369-136D 17E0-17E5 17F0-17F5 1810-1815 1946-194B 19D0-19D5 19DA...] (1) to ender END (11) offset 10
Final program:
  1: ANYOF[0-5\xB2\xB3\xB9][0660-0665 06F0-06F5 07C0-07C5 0966-096B 09E6-09EB 0A66-0A6B 0AE6-0AEB 0B66-0B6B 0BE6-0BEB 0C66-0C6B 0C78-0C7E 0CE6-0CEB 0D66-0D6B 0DE6-0DEB 0E50-0E55 0ED0-0ED5 0F20-0F25 1040-1045 1090-1095 1369-136D 17E0-17E5 17F0-17F5 1810-1815 1946-194B 19D0-19D5 19DA...] (11)
  11: END (0)
stclass ANYOF[0-5\xB2\xB3\xB9][0660-0665 06F0-06F5 07C0-07C5 0966-096B 09E6-09EB 0A66-0A6B 0AE6-0AEB 0B66-0B6B 0BE6-0BEB 0C66-0C6B 0C78-0C7E 0CE6-0CEB 0D66-0D6B 0DE6-0DEB 0E50-0E55 0ED0-0ED5 0F20-0F25 1040-1045 1090-1095 1369-136D 17E0-17E5 17F0-17F5 1810-1815 1946-194B 19D0-19D5 19DA...] minlen 1
Freeing REx: "\p{numeric_value=/\A[0-5]\z/}"

But -Dr does produce such output:

$ ./perl -Ilib -Dr -e 'qr!\p{numeric_value=/\A[0-5]\z/}!' 2>&1 | head -15
Compiling REx "\p{numeric_value=/\A[0-5]\z/}"
The Unicode property wildcards feature is experimental at -e line 1.
Compiling REx "(?iaa:\A[0-5]\z)"
~ tying lastbr SBOL /\A/ (1) to ender END (13) offset 12
rarest char
at 0
Final program:
  1: SBOL /\A/ (2)
  2: ANYOF[0-5] (12)
  12: EOS (13)
  13: END (0)
anchored ""$ at 1..1 stclass ANYOF[0-5] anchored(SBOL) minlen 1
Matching REx "(?iaa:\A[0-5]\z)" against "-1/2"
  0 <> <-1/2> | 0| 1:SBOL /\A/(2)
  0 <> <-1/2> | 0| 2:ANYOF[0-5](12)

The missing output under use re 'debug' doesn't match the documentation.

It also means that a user with only a non-DEBUGGING perl (as is
typical) can't use C<use re 'debug';> to trace compilation and
execution of the embedded regular expression.

I mentioned this to Karl, who suggested a ticket for further
discussion.

Patch with fix attached.

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.31.1:

Configured by tony at Thu May 30 09:51:48 AEST 2019.

Summary of my perl5 (revision 5 version 31 subversion 1) configuration:
  Derived from: fd0b39e2c8e08d180653e64e96276216d2b606ac
  Platform:
    osname=linux
    osvers=4.9.0-8-amd64
    archname=x86_64-linux
    uname='linux mars 4.9.0-8-amd64 #1 smp debian 4.9.130-2 (2018-10-27) x86_64 gnulinux '
    config_args='-des -Dusedevel -DDEBUGGING -Doptimize=-O0 -g'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-fwrapv -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
    optimize='-O0 -g'
    cppflags='-fwrapv -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='6.3.0 20170516'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/6/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /lib64 /usr/lib64
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.24.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.24'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O0 -g -L/usr/local/lib -fstack-protector-strong'

Locally applied patches:
    uncommitted-changes


@INC for perl 5.31.1:
    lib
    /usr/local/lib/perl5/site_perl/5.31.1/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.31.1
    /usr/local/lib/perl5/5.31.1/x86_64-linux
    /usr/local/lib/perl5/5.31.1


Environment for perl 5.31.1:
    HOME=/home/tony
    LANG=en_AU.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/tony/perl5/perlbrew/bin:/home/tony/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
    PERLBREW_BASHRC_VERSION=0.43
    PERLBREW_HOME=/home/tony/.perlbrew
    PERLBREW_MANPATH=
    PERLBREW_PATH=/home/tony/perl5/perlbrew/bin
    PERLBREW_ROOT=/home/tony/perl5/perlbrew
    PERLBREW_VERSION=0.67
    PERL_BADLANG (unset)
    SHELL=/bin/bash


p5pRT commented May 30, 2019

From @tonycoz

0001-allow-use-re-debug-to-work-on-p-embedded-regexps.patch
From f996d1d38beb139cf8f0d00a989a4bd3ca73b7af Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Thu, 30 May 2019 14:24:28 +1000
Subject: allow use re 'debug' to work on \p{} embedded regexps

---
 regcomp.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/regcomp.c b/regcomp.c
index 36f5afff71..677e5b8049 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -22759,7 +22759,7 @@ Perl_parse_uniprop_string(pTHX_
                                               (unsigned) subpattern_len,
                                               name + i);
                 subpattern = sv_2mortal(subpattern);
-                subpattern_re = re_compile(subpattern, 0);
+                subpattern_re = pregcomp(subpattern, 0);
                 assert(subpattern_re);  /* Should have died if didn't compile
                                          successfully */
 
@@ -22770,12 +22770,13 @@ Perl_parse_uniprop_string(pTHX_
                     const Size_t len = strlen(entry);
                     SV* entry_sv = newSVpvn_flags(entry, len, SVs_TEMP);
 
-                    if (pregexec(subpattern_re,
-                                 (char *) entry,
-                                 (char *) entry + len,
-                                 (char *) entry, 0,
-                                 entry_sv,
-                                 0))
+                    if (CALLREGEXEC(subpattern_re,
+                                    (char *) entry,
+                                    (char *) entry + len,
+                                    (char *) entry, 0,
+                                    entry_sv,
+                                    NULL,
+                                    0))
                     { /* Here, matched.  Add to the returned list */
                         Size_t total_len = j + len;
                         SV * sub_invlist = NULL;
-- 
2.11.0
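The patch switches the wildcard code from calling the core engine directly (re_compile(), pregexec()) to calls that go through the engine currently in effect: pregcomp() compiles with the engine selected for the current scope, and CALLREGEXEC() executes with the engine recorded in the compiled pattern. As we understand it, use re 'debug' takes effect by installing the debugging regex engine built as part of the re extension, so the direct calls never reach it. The C below is a minimal, hypothetical model of that difference only; engine_t, model_re, count_direct(), count_dispatch() and the substring "matcher" are invented for illustration and are not Perl's internal API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct engine engine_t;

/* A compiled pattern remembers which engine compiled it (hypothetical). */
typedef struct {
    const engine_t *engine;
    const char *pattern;
} model_re;

/* Hypothetical pluggable engine: a table of function pointers. */
struct engine {
    const char *name;
    model_re (*comp)(const engine_t *self, const char *pattern);
    int (*exec)(const model_re *re, const char *subject);
};

static int trace_lines = 0;   /* debug lines printed so far */

static model_re plain_comp(const engine_t *self, const char *pattern) {
    model_re re = { self, pattern };
    return re;
}

/* Stand-in matcher: substring search, not real regex matching. */
static int plain_exec(const model_re *re, const char *subject) {
    return strstr(subject, re->pattern) != NULL;
}

/* Debugging engine: same behaviour, plus a trace line per call. */
static model_re debug_comp(const engine_t *self, const char *pattern) {
    printf("Compiling REx \"%s\"\n", pattern);
    trace_lines++;
    return plain_comp(self, pattern);
}

static int debug_exec(const model_re *re, const char *subject) {
    printf("Matching REx \"%s\" against \"%s\"\n", re->pattern, subject);
    trace_lines++;
    return plain_exec(re, subject);
}

static const engine_t core_engine  = { "core",  plain_comp, plain_exec };
static const engine_t debug_engine = { "debug", debug_comp, debug_exec };

/* Engine chosen by the (simulated) lexical pragma; core unless changed. */
static const engine_t *current_engine = &core_engine;

/* Before the patch: always compile and execute with the core engine. */
static int count_direct(const char *sub, const char *const *vals, size_t n) {
    model_re re = core_engine.comp(&core_engine, sub);
    int hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += core_engine.exec(&re, vals[i]);
    return hits;
}

/* After the patch: compile with the current engine, then execute through
 * the engine recorded in the compiled pattern. */
static int count_dispatch(const char *sub, const char *const *vals, size_t n) {
    model_re re = current_engine->comp(current_engine, sub);
    assert(re.engine != NULL);  /* compilation is expected to succeed */
    int hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += re.engine->exec(&re, vals[i]);
    return hits;
}
```

With current_engine pointed at debug_engine, count_direct() prints nothing, while count_dispatch() prints one compile line plus one line per candidate value, which is the behaviour the patch is after.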

toddr removed the khw label Oct 25, 2019
khwilliamson added a commit that referenced this issue Feb 27, 2020
This fixes #17026

Patterns can now have subpatterns.  Prior to this commit debugging
info was only available under DEBUGGING builds with -Drv.  This commit
adds a new keyword, WILDCARD, that use re qw(Debug ...) can use so that
whatever other debugging options have been turned on will show up when a
wildcard subpattern is compiled.

The output of this may be voluminous, which is why you have to ask for
it specifically.  Or, the EXTRA option turns it on with several other
things.
khwilliamson added a commit that referenced this issue Mar 2, 2020
This fixes #17026

Patterns can now have subpatterns, called wildcards by Unicode.  Prior
to this commit debugging info was only available under DEBUGGING builds
with -Drv.  This commit adds a new keyword, WILDCARD for
'use re qw(Debug ...)' so that whatever other debugging options have
been turned on will show up when a wildcard subpattern is compiled.

The output of this may be voluminous, which is why you have to ask for
it specifically.  Or, the EXTRA option turns it on along with several
other things.
khwilliamson added a commit that referenced this issue Mar 3, 2020
This fixes #17026

Patterns can have subpatterns since 5.30.  These are processed when
encountered, by suspending the main pattern compilation, compiling the
subpattern, and then matching that against the set of all legal
possibilities, which Perl knows about.

Prior to this commit, debugging info was not available for that matching
portion of the compilation, except under DEBUGGING builds, with -Drv.
This commit adds a new option to 'use re qw(Debug ...)', WILDCARD, to
enable subpattern match debugging.  Whatever other match debugging
options have been turned on will show up when a wildcard subpattern is
compiled iff WILDCARD is specified.

The output of this may be voluminous, which is why you have to ask for
it specifically.  Or, the EXTRA option turns it on, along with several
other things.
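The commit message singles out the step where the compiled subpattern is matched against every legal property value: match debugging for that step is shown only when WILDCARD (or EXTRA, which turns it on) is requested. The C below is a minimal, hypothetical model of that gating. The option names echo re.pm's Debug options, but the bit values, the DBG_OTHER_EXTRA placeholder, wildcard_match_flags(), expand_wildcard() and the hand-written stand-in for the report's \A[0-5]\z subpattern are all invented for illustration; this is not how re.pm or regcomp.c implement it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical debug-option bits. Names echo re.pm's Debug options; the
 * values and the makeup of DBG_EXTRA are invented for this sketch. */
enum {
    DBG_COMPILE     = 1 << 0,
    DBG_EXECUTE     = 1 << 1,
    DBG_WILDCARD    = 1 << 2,
    DBG_OTHER_EXTRA = 1 << 3,   /* placeholder for EXTRA's other options */
    DBG_EXTRA       = DBG_WILDCARD | DBG_OTHER_EXTRA
};

/* Flags in effect while a wildcard subpattern is matched against the legal
 * property values: match tracing survives only if WILDCARD was requested. */
static unsigned wildcard_match_flags(unsigned flags) {
    if (flags & DBG_WILDCARD)
        return flags;
    return flags & ~(unsigned)DBG_EXECUTE;
}

/* Hand-written stand-in for the report's subpattern \A[0-5]\z:
 * exactly one character, '0' through '5'. */
static int matches_0_to_5(const char *s) {
    return strlen(s) == 1 && s[0] >= '0' && s[0] <= '5';
}

/* Try every candidate value, tracing each attempt if match debugging is in
 * effect. Returns the number of trace lines; *hits gets the match count. */
static int expand_wildcard(unsigned flags, const char *const *values,
                           size_t n, size_t *hits) {
    unsigned eff = wildcard_match_flags(flags);
    int traced = 0;

    assert(hits != NULL);
    *hits = 0;
    for (size_t i = 0; i < n; i++) {
        int ok = matches_0_to_5(values[i]);
        if (eff & DBG_EXECUTE) {
            printf("Matching \"%s\": %s\n", values[i], ok ? "match" : "no match");
            traced++;
        }
        if (ok)
            (*hits)++;
    }
    return traced;
}
```

With only COMPILE and EXECUTE set, all six sample values in the test are still tried and three match, but nothing is printed for the wildcard step; adding WILDCARD, or EXTRA, which contains WILDCARD in this model, prints one line per candidate.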
khwilliamson added a commit that referenced this issue Mar 5, 2020
This fixes #17026
