Regex bug: auto-reference to a capture used in a conditional expression #7754

Closed
p5pRT opened this issue Jan 17, 2005 · 14 comments

p5pRT commented Jan 17, 2005

Migrated from rt.perl.org#33820 (status was 'open')

Searchable as RT33820$

p5pRT commented Jan 17, 2005

From philippe.verdret@xps-pro.com

Created by pverdret@xps-pro.com

Hi,

In a capture, a reference to that same capture is perfectly possible:

  "baaa" =~ (\1|a)+

the result of this match is b<aaa>.
If I replace the back-reference \1 with a conditional expression like this:

  "baaa" =~ ((?(1)a|a))+

the result is the same. The tested condition is
independent of the capture content, so if I change the regex like this:

  "baaa" =~ ((?(1)a|b))+

the result should be "<baaa>". But the matched string is <b>aaa.

Introducing a (?{}) after "b" makes the result right:

  "baaa" =~ ((?(1)a|b(?{})))+

use re 'debug' indicates that the empty code block changes the
generated opcodes.

Philippe Verdret

Perl Info

Flags:
    category=core
    severity=medium

This perlbug was built using Perl v5.8.6 - Mon Dec 13 09:51:32 2004
It is being executed now by  Perl v5.8.4 - Mon Jun  7 08:31:41 2004.

Site configuration information for perl v5.8.4:

Configured by CRB at Mon Jun  7 08:31:41 2004.

Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
    osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=undef use5005threads=undef useithreads=define 
usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -Gf -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 
-D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT  -DPERL_IMPLICIT_CONTEXT 
-DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', 
lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf  
-libpath:"c:\perl\lib\CORE"  -machine:x86'
    libpth=\lib
    libs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  
netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib  version.lib 
odbc32.lib odbccp32.lib msvcrt.lib
    perllibs=  oldnames.lib kernel32.lib user32.lib gdi32.lib 
winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib 
oleaut32.lib  netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib  
version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
    gnulibc_version='undef'
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug 
-opt:ref,icf  -libpath:"c:\perl\lib\CORE"  -machine:x86'

Locally applied patches:
    ACTIVEPERL_LOCAL_PATCHES_ENTRY
    21540 Fix backward-compatibility issues in if.pm
    23565 Wrong MANIFEST.SKIP


@INC for perl v5.8.4:
    e:/macro4/Columbus/Bin/perl/lib
    e:/macro4/Columbus/Bin/perl/site/lib
    .


Environment for perl v5.8.4:
    HOME=E:\cygwin\home\pverdret
    LANG=enus1252
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    
PATH=E:\cygwin\usr\local\bin;E:\cygwin\bin;E:\cygwin\bin;E:\cygwin\usr\X11R6\bin;e:\oracle\ora92\bin;e:\macro4\Columbus\Bin\Perl\bin;c:\Program 
Files\Oracle\jre\1.3.1\bin;c:\Program 
Files\Oracle\jre\1.1.8\bin;e:\Macro4\ColumbusCentral\Bin\Perl\bin;e:\Program 
Files\Adobe\Document Server 
6.0\bin;e:\macro4\IBM\LDAP\bin;c:\Perl\bin\;c:\WINNT\system32;c:\WINNT;c:\WINNT\System32\Wbem;c:\Program 
Files\ATI Technologies\ATI Control 
Panel;e:\macro4\IBM\SQLLIB\BIN;e:\macro4\IBM\SQLLIB\FUNCTION;e:\macro4\IBM\SQLLIB\SAMPLES\REPL;.\;c:\Program 
Files\Symantec\pcAnywhere\;e:\macro4\Columbus\Bin\rexxdb2;e:\macro4\Columbus\Bin;e:\niPerl\bin;e:\Program 
Files\Microsoft Visual Studio\Common\Tools\WinNT;e:\Program 
Files\Microsoft Visual Studio\Common\MSDev98\Bin;e:\Program 
Files\Microsoft Visual Studio\Common\Tools;e:\Program Files\Microsoft 
Visual 
Studio\VC98\bin;e:\macro4\ColumbusCentral\Bin;e:\macro4\Columbus\Bin;.\
    PERL5_INCLUDE=E:\macro4\Columbus\Bin\perl\lib\CORE
    
PERL5_LIB=E:\macro4\Columbus\Bin\perl\lib;E:\macro4\Columbus\Bin\perl\site\lib
    PERL_BADLANG (unset)
    SHELL (unset)


p5pRT commented Jan 20, 2012

From @jkeenan

On Mon Jan 17 08:49:39 2005, philippe.verdret@xps-pro.com wrote:

> This is a bug report for perl from pverdret@xps-pro.com,
> generated with the help of perlbug 1.35 running under perl v5.8.4.
>
> In a capture a reference to this capture is perfectly possible:
>
> "baaa" =~ (\1|a)+
>
> the result of this match is b<aaa>.

When I try out this code (or any of the subsequent examples) *as is*, I
get syntax errors.

#####
$ perl -e '"baaa" =~ (\1|a)+'
syntax error at -e line 1, at EOF
Execution of -e aborted due to compilation errors.
#####

But when I modify each of the code samples to make a proper regular
expression, I do not get the output the OP described:

> If I replace the back-reference \1 by a conditional expression like
> this:
>
> "baaa" =~ ((?(1)a|a))+
>
> the result is the same. The tested condition is
> independent of the capture content so if I change the regex like this:
>
> "baaa" =~ ((?(1)a|b))+
>
> the result should be "<baaa>". But the matched string is <b>aaa.
>
> By introducing a (?{}) behind "b" the result is now right:
>
> "baaa" =~ ((?(1)a|b(?{})))+
>
> use re 'debug' indicates that the empty code block changes the
> generated opcodes.
>
> Philippe Verdret

#####
$ perl -e '"baaa" =~ m/(\1|a)+/;print "$1\n"'
a
$ perl -e '"baaa" =~ m/((?(1)a|a))+/;print "$1\n"'
a
$ perl -e '"baaa" =~ m/"baaa" =~ ((?(1)a|b))+/;print "$1\n"'

$ perl -e '"baaa" =~ m/"baaa" =~ ((?(1)a|b(?{})))+/;print "$1\n"'

#####
So either I'm misinterpreting the OP, or his original premise was
incorrect, or Perl has changed since this was posted seven years ago.

Which?

Thank you very much.
Jim Keenan

p5pRT commented Jan 20, 2012

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Jan 20, 2012

From @cpansprout

On Thu Jan 19 18​:54​:33 2012, jkeenan wrote​:

> So either I'm misinterpreting the OP, or his original premise was
> incorrect, or Perl has changed since this was posted seven years ago.
>
> Which?

I think you made a copy-and-paste error there. You have "baaa" =~
inside your pattern. I get this with bleadperl​:

$ ./perl -Ilib -e '"baaa" =~ /((?(1)a|b))+/;print "$1\n"'
b
$ ./perl -Ilib -e '"baaa" =~ /((?(1)a|b(?{})))+/;print "$1\n"'
a

--

Father Chrysostomos

p5pRT commented Jan 20, 2012

From @tamias

On Thu, Jan 19, 2012 at 06:54:34PM -0800, James E Keenan via RT wrote:

> On Mon Jan 17 08:49:39 2005, philippe.verdret@xps-pro.com wrote:
>
> > In a capture a reference to this capture is perfectly possible:
> >
> > "baaa" =~ (\1|a)+
> >
> > the result of this match is b<aaa>.
>
> #####
> $ perl -e '"baaa" =~ m/(\1|a)+/;print "$1\n"'
> a
> $ perl -e '"baaa" =~ m/((?(1)a|a))+/;print "$1\n"'
> a
> $ perl -e '"baaa" =~ m/"baaa" =~ ((?(1)a|b))+/;print "$1\n"'
>
> $ perl -e '"baaa" =~ m/"baaa" =~ ((?(1)a|b(?{})))+/;print "$1\n"'
>
> #####
> So either I'm misinterpreting the OP, or his original premise was
> incorrect, or Perl has changed since this was posted seven years ago.

I think you are misinterpreting the OP. You should be printing $&, not $1.
(In addition to the copy-paste errors in the last two tries.)

Ronald

p5pRT commented Oct 23, 2012

From @khwilliamson

And for those of you following along at home, if you do this, you get
the results that the OP said you would
--
Karl Williamson

toddr commented Feb 13, 2020

@khwilliamson This is the one liner we're describing?

$>perl -e '"baaa" =~ m/(\1|a)+/;print "$&\n"'
aaa
$>perl -e '"baaa" =~ m/((?(1)a|a))+/;print "$&\n"'
aaa

@khwilliamson

What he is saying is that perl -Dr -le ' "baaa" =~ /((?(1)a|b))+/; print $&' should print "baaa" but instead prints just "b".

But if he adds what effectively should be a no-op, like so
perl -Dr -le ' "baaa" =~ /((?(1)a|b(?{})))+/; print $&'
it prints "baaa"

The current code generated in the first case is
Compiling REx "((?(1)a|b))+"
Final program:
1: CURLYM[1]{1,INFTY} (22)
5: GROUPP1 (7)
7: IFTHEN (13)
9: EXACT (20)
11: LONGJMP (17)
13: IFTHEN (17)
15: EXACT (20)
17: TAIL (20)
20: SUCCEED (0)
21: NOTHING (22)
22: END (0)
minlen 1
Enabling $` $& $' support (0x7).

And in the second it is
Compiling REx "((?(1)a|b(?{})))+"
Final program:
1: CURLYX[0]{1,INFTY} (24)
3: OPEN1 (5)
5: GROUPP1 (7)
7: IFTHEN (13)
9: EXACT (21)
11: LONGJMP (20)
13: IFTHEN (20)
15: EXACT (17)
17: EVAL (21)
20: TAIL (21)
21: CLOSE1 (23)
23: WHILEM[1/1] (0)
24: NOTHING (25)
25: END (0)
minlen 1 with eval
Enabling $` $& $' support (0x7).

My suspicions fall on the difference between CURLYM and CURLYX

@khwilliamson

And I'm right (probably). It is getting compiled to CURLYM, which regexec.c describes as:
/* /A{m,n}B/ where A is fixed-length */
And the (?(1) should tell it that A isn't necessarily fixed length.

@demerphq

@khwilliamson in this case it is fixed width tho. Both branches of the condition are a single char. It looks to me like CURLYM is broken somehow.

./perl -Ilib -le ' "baaaa" =~ /((?(1)aa|b))+/; print $&'
baaaa

I assume that CURLYM is doing something wrong.

demerphq added a commit that referenced this issue Dec 29, 2022
We have to close the paren immediately after each time we
match A, or conditional patterns will break.

See #7754
@demerphq

Should be fixed by #20654

@demerphq

CURLYM fakes an OPEN/CLOSE paren, and was deferring updating its close paren data until after it finished looping. This meant that tests against the capture buffer would not work. Changed it to close the parens each time. An alternative solution would be to exclude cases containing conditional logic from the CURLYX->CURLYM conversion, just like EVAL is excluded.

toddr commented Dec 29, 2022

@demerphq if the commit message in your PR contains the string Fixes #7754, it will auto-close this issue.

https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests

demerphq added a commit that referenced this issue Dec 30, 2022
We have to close the paren immediately after each time we
match A, or conditional patterns will break.

An alternative and possibly more efficient solution would be to block
CURLYX -> CURLYM conversion when the inside contains a conditional check
just like we do with EVAL.

Fixes #7754 aka #7754
@demerphq

Thanks @toddr I added one.
