*COMMIT etc in subroutines #16645

Open
p5pRT opened this issue Jul 24, 2018 · 5 comments

p5pRT commented Jul 24, 2018

Migrated from rt.perl.org#133405 (status was 'open')

Searchable as RT133405$

p5pRT commented Jul 24, 2018

From ph10@hermes.cam.ac.uk

Created by ph10@cam.ac.uk

The Perl documentation says these things:

(1) When discussing subroutine calls such as (?1): "Treat the contents of a
given capture buffer in the current pattern as an independent subpattern and
attempt to match it at the current position in the string."

(2) When discussing (*ACCEPT): "When inside of a nested pattern, such as
recursion, or in a subpattern dynamically generated via "(??{})", only the
innermost pattern is ended immediately." In other words, the effect of
(*ACCEPT) is confined to the subroutine/recursion. This is indeed how it works:

$ perl -e 'if (ab =~ /(?1)b(?(DEFINE)(a(*ACCEPT)z))/) { print "yes >$&<\n"; } else { print "no \n"; }'
yes >ab<

In this example (?1) is successful, so it goes on to match "b"; it does not
terminate the whole match after (?1) matches "a". So far, so good.
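
To make the scoping of (*ACCEPT) concrete, here is a minimal sketch that runs the same verb once at the top level of a pattern and once inside a group called via (?1). The helper name `matched_text` is an assumption introduced only for this illustration; the second pattern is the one from the command above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the matched text ($&), or undef on no match.
sub matched_text {
    my ($subject, $pattern) = @_;
    return $subject =~ $pattern ? $& : undef;
}

# (*ACCEPT) at the top level ends the whole match right after "a".
my $top_level = matched_text('ab', qr/a(*ACCEPT)z/);

# (*ACCEPT) inside group 1, called via (?1): only the call ends early,
# and matching carries on with the "b" that follows the call.
my $in_call = matched_text('ab', qr/(?1)b(?(DEFINE)(a(*ACCEPT)z))/);

printf "top level:   %s\n", $top_level // '(no match)';   # a
printf "inside (?1): %s\n", $in_call   // '(no match)';   # ab
```

The top-level case stops at "a", while the called group lets the outer pattern continue and consume "b" as well.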

I couldn't find any statement about what happens when (*COMMIT), (*PRUNE),
(*SKIP), or (*THEN) are triggered inside a recursion or subroutine call and
backtrack to the outer level. Experiments seem to indicate that these verbs are
*not* confined to within a subroutine call:

$ perl -e 'if (ac =~ /(?1)(a(*COMMIT)b)|ac/) { print "yes >$&<\n"; } else { print "no \n"; }'
no

If (*COMMIT) had just caused (?1) to fail, there should have been a backtrack
that would enable the "ac" branch to match. It appears that (*COMMIT) has
caused the entire pattern match to fail. This seems wrong to me, and IMHO it
contradicts statement (1) above. And it seems inconsistent that (*ACCEPT) is
treated differently.
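
One way to check that the failure really comes from (*COMMIT), and not from the forward subroutine call itself, is to run a control pattern with the verb removed. The sketch below does that; `matched_text` is a hypothetical helper introduced for this illustration, and the outcome noted for the (*COMMIT) variant is the one reported above for perl 5.26.2, not a guarantee for other versions.

```perl
use strict;
use warnings;

# Hypothetical helper: return the matched text ($&), or undef on no match.
sub matched_text {
    my ($subject, $pattern) = @_;
    return $subject =~ $pattern ? $& : undef;
}

# Control: no verb. (?1) fails on "ac", the engine backtracks at the
# outer level, and the second alternative matches.
my $control = matched_text('ac', qr/(?1)(ab)|ac/);

# Same shape with (*COMMIT) inside the called group. The report above
# observed no match here on perl 5.26.2.
my $with_commit = matched_text('ac', qr/(?1)(a(*COMMIT)b)|ac/);

printf "without (*COMMIT): %s\n", $control     // '(no match)';   # ac
printf "with (*COMMIT):    %s\n", $with_commit // '(no match)';
```

If (*COMMIT) were confined to the call, both lines would print "ac".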

FYI: PCRE does restrict (*COMMIT), (*PRUNE), (*SKIP), and (*THEN) to act only
within a subroutine call (as well as (*ACCEPT)). This was originally because
subroutine calls were atomic in PCRE. From release 10.30 of PCRE, however,
subroutine (or recursive) calls are no longer atomic, but I kept the
restriction on the backtracking verbs for backwards compatibility. The Perl
documentation mentions that PCRE and Python have atomic subroutine calls; that
now needs updating for PCRE.

If the current behaviour of (*COMMIT) etc. is what is intended, it would be
useful for it to be documented.

I hope that is all clear. Thanks for your attention.

Regards,
Philip Hazel

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.26.2:

Configured by builduser at Thu Jun 28 12:11:25 CEST 2018.

Summary of my perl5 (revision 5 version 26 subversion 2) configuration:
   
  Platform:
    osname=linux
    osvers=4.17.3-1-arch
    archname=x86_64-linux-thread-multi
    uname='linux flo-64s 4.17.3-1-arch #1 smp preempt tue jun 26 04:42:36 utc 2018 x86_64 gnulinux '
    config_args='-des -Dusethreads -Duseshrplib -Doptimize=-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt -Dprefix=/usr -Dvendorprefix=/usr -Dprivlib=/usr/share/perl5/core_perl -Darchlib=/usr/lib/perl5/5.26/core_perl -Dsitelib=/usr/share/perl5/site_perl -Dsitearch=/usr/lib/perl5/5.26/site_perl -Dvendorlib=/usr/share/perl5/vendor_perl -Dvendorarch=/usr/lib/perl5/5.26/vendor_perl -Dscriptdir=/usr/bin/core_perl -Dsitescript=/usr/bin/site_perl -Dvendorscript=/usr/bin/vendor_perl -Dinc_version_list=none -Dman1ext=1perl -Dman3ext=3perl -Dcccdlflags='-fPIC' -Dlddlflags=-shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -Dldflags=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='8.1.1 20180531'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags ='-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-pc-linux-gnu/8.1.1/include-fixed /usr/lib /lib/../lib /usr/lib/../lib /lib /lib64 /usr/lib64
    libs=-lpthread -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.27.so
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version='2.27'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.26/core_perl/CORE'
    cccdlflags='-fPIC'
    lddlflags='-shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -L/usr/local/lib -fstack-protector-strong'



@INC for perl 5.26.2:
    /usr/lib/perl5/5.26/site_perl
    /usr/share/perl5/site_perl
    /usr/lib/perl5/5.26/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib/perl5/5.26/core_perl
    /usr/share/perl5/core_perl


Environment for perl 5.26.2:
    HOME=/home/ph10
    LANG=en_GB.utf8
    LANGUAGE=en_GB.utf8
    LC_ALL=C
    LC_COLLATE=C
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/ph10/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/opt/android-sdk/platform-tools:/opt/android-sdk/tools:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/usr/sbin:.:/opt/android-sdk/platform-tools:/opt/android-sdk/tools:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
    PERL_BADLANG (unset)
    SHELL=/bin/bash

p5pRT commented Sep 3, 2018

From @jkeenan

On Tue, 24 Jul 2018 15:43:57 GMT, ph10@hermes.cam.ac.uk wrote:

> FYI: PCRE does restrict (*COMMIT), (*PRUNE), (*SKIP), and (*THEN) to act only
> within a subroutine call (as well as (*ACCEPT)). This was originally because
> subroutine calls were atomic in PCRE. From release 10.30 of PCRE, however,
> subroutine (or recursive) calls are no longer atomic, but I kept the
> restriction on the backtracking verbs for backwards compatibility. The Perl
> documentation mentions that PCRE and Python have atomic subroutine calls; that
> now needs updating for PCRE.

To focus in on just this documentation piece​:

Could you provide a patch for the Perl documentation?

Also, could you provide a link to the PCRE documentation and, perhaps, an illustration of how this works in PCRE?

Thank you very much.
--
James E Keenan (jkeenan@​cpan.org)

p5pRT commented Sep 3, 2018

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Sep 4, 2018

From ph10@hermes.cam.ac.uk

On Mon, 3 Sep 2018, James E Keenan via RT wrote:

> To focus in on just this documentation piece:
>
> Could you provide a patch for the Perl documentation?
>
> Also, could you provide a link to the PCRE documentation and, perhaps, an illustration of how this works in PCRE?

I don't know what form would be best for sending patches to Perl
documentation. Currently the perlre man page says this:

  Note that this pattern does not behave the same way as the equivalent PCRE or
  Python construct of the same form. In Perl you can backtrack into a recursed
  group, in PCRE and Python the recursed into group is treated as atomic.

It would now be more accurate to say this:

  Note that this pattern does not behave the same way as the equivalent Python
  construct of the same form. In Perl you can backtrack into a recursed group,
  but in Python the recursed group is treated as atomic. This is also true of
  earlier PCRE releases, but from PCRE 10.30 onwards backtracking into recursed
  groups is implemented as it is for Perl.
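
The difference between backtracking into a recursed group and treating it as atomic can be sketched within Perl alone. In the example below, wrapping the call in an atomic group `(?>...)` stands in for an engine with atomic subroutine calls; that emulation, the sample subject "aab", and the helper name `matched_text` are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the matched text ($&), or undef on no match.
sub matched_text {
    my ($subject, $pattern) = @_;
    return $subject =~ $pattern ? $& : undef;
}

# Group 1 is (a*). Called on "aab" it first takes "aa", after which
# "ab" cannot match. Perl backtracks into the called group, lets it
# give back one "a", and the rest of the pattern then succeeds.
my $backtracking = matched_text('aab', qr/^(?1)ab\z(?(DEFINE)(a*))/);

# Emulating an atomic call: once (?1) has taken "aa" it cannot give
# anything back, so the overall match fails.
my $atomic = matched_text('aab', qr/^(?>(?1))ab\z(?(DEFINE)(a*))/);

printf "backtracking call: %s\n", $backtracking // '(no match)';   # aab
printf "atomic call:       %s\n", $atomic       // '(no match)';   # (no match)
```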

The documentation for the current PCRE release is here:

http://www.pcre.org/current/doc/html/

I have done a bit of editing on the documentation for the next release,
to try to make it as clear as I can. This is how the relevant section
now reads​:


"Backtracking verbs in subroutines"

These behaviours occur whether or not the subpattern is called recursively.

(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
succeed without any further processing. Matching then continues after the
subroutine call. Perl documents this behaviour. Perl's treatment of the other
verbs in subroutines is different in some cases.

(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
an immediate backtrack.

(*COMMIT), (*SKIP), and (*PRUNE) cause the subroutine match to fail when
triggered by being backtracked to in a subpattern called as a subroutine. There
is then a backtrack at the outer level.

(*THEN), when triggered, skips to the next alternative in the innermost
enclosing group within the subpattern that has alternatives (its normal
behaviour). However, if there is no such group within the subroutine
subpattern, the subroutine match fails and there is a backtrack at the outer
level.
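
Of the verbs listed above, (*FAIL) is the simplest to illustrate, since it only forces an ordinary backtrack. The following sketch uses Perl rather than PCRE. The patterns and the helper name `matched_text` are assumptions chosen for illustration. It shows (*FAIL) inside a called group causing a backtrack to the next alternative within that group.

```perl
use strict;
use warnings;

# Hypothetical helper: return the matched text ($&), or undef on no match.
sub matched_text {
    my ($subject, $pattern) = @_;
    return $subject =~ $pattern ? $& : undef;
}

# Group 1 has two alternatives. The first hits (*FAIL) after "a" and
# backtracks; the second alternative then matches "a", the call
# succeeds, and the outer pattern goes on to match "b".
my $with_alternative = matched_text('ab', qr/^(?1)b(?(DEFINE)(a(*FAIL)|a))/);

# With no alternative to fall back on, the call fails, and so does the
# whole match.
my $without_alternative = matched_text('ab', qr/^(?1)b(?(DEFINE)(a(*FAIL)))/);

printf "with alternative:    %s\n", $with_alternative    // '(no match)';   # ab
printf "without alternative: %s\n", $without_alternative // '(no match)';   # (no match)
```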


I hope this helps.

Regards,
Philip

--
Philip Hazel

p5pRT commented Sep 5, 2018

From @demerphq

This is on my to-do list to review, but I am very busy.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
