Some bugs in Perl regexp (core Perl issues) #9408

Closed
p5pRT opened this issue Jul 8, 2008 · 23 comments

p5pRT commented Jul 8, 2008

Migrated from rt.perl.org#56690 (status was 'resolved')

Searchable as RT56690$

p5pRT commented May 12, 2004

From til@schubbe.org

Created by til@schubbe.org

til@debian:~ - perl -e 'if ("a1" =~ m/(^|\D)(?=\d)1/) {print "1\n"} else {print "0\n"}'
0
til@debian:~ - perl -e 'if ("a1" =~ m/(^|\D)(?=\d)1/i) {print "1\n"} else {print "0\n"}'
1

The first expression should be true, but it isn't. The behaviour changes
when I add the /i modifier.

The bug exists in v5.8.3 and v5.8.2, too.

Regards
Til

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.4:

Configured by inst at Wed May 12 21:53:09 CEST 2004.

Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
    osname=linux, osvers=2.4.26, archname=i686-linux
    uname='linux debian 2.4.26 #23 fr apr 30 19:16:33 cest 2004 i686 gnulinux '
    config_args='-de'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='3.3.2 (Debian)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.4:
    /usr/local/lib/perl5/5.8.4/i686-linux
    /usr/local/lib/perl5/5.8.4
    /usr/local/lib/perl5/site_perl/5.8.4/i686-linux
    /usr/local/lib/perl5/site_perl/5.8.4
    /usr/local/lib/perl5/site_perl
    .


Environment for perl v5.8.4:
    HOME=/home/til
    LANG=de_DE@euro
    LANGUAGE (unset)
    LC_CTYPE=de_DE@euro
    LC_MESSAGES=C
    LD_LIBRARY_PATH=/usr/local/qt/lib:/usr/local/lib
    LOGDIR (unset)
    PATH=.:/home/til/bin:/usr/local/qt/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games
    PERL_BADLANG (unset)
    SHELL=/bin/bash

p5pRT commented May 13, 2004

From japhy@pobox.com

On May 12, Til Schubbe said:

# New Ticket Created by Til Schubbe
# Please include the string: [perl #29538]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=29538 >

til@debian:~ - perl -e 'if ("a1" =~ m/(^|\D)(?=\d)1/) {print "1\n"} else {print "0\n"}'
0
til@debian:~ - perl -e 'if ("a1" =~ m/(^|\D)(?=\d)1/i) {print "1\n"} else {print "0\n"}'
1

Here's the re=debug output for a simpler case​:

perlmonk:~ 1779:$ perl -mre=debug -e '"a1" =~ /(a|^)(?=1)1/'
Freeing REx: `","'
Compiling REx `(a|^)(?=1)1'
size 18 Got 148 bytes for offset annotations.
first at 3
synthetic stclass `ANYOF[1]'.
  1: OPEN1(3)
  3: BRANCH(6)
  4: EXACT <a>(8)
  6: BRANCH(8)
  7: BOL(8)
  8: CLOSE1(10)
  10: IFMATCH[-0](16)
  12: EXACT <1>(14)
  14: SUCCEED(0)
  15: TAIL(16)
  16: EXACT <1>(18)
  18: END(0)
floating `1' at 0..1 (checking floating) stclass `ANYOF[1]' minlen 1
Offsets: [18]
  1[1] 0[0] 1[1] 2[1] 0[0] 3[1] 4[1] 5[1] 0[0] 9[1] 0[0] 9[1] 0[0]
9[0] 9[0] 11[1] 0[0] 12[0]
Guessing start of match, REx `(a|^)(?=1)1' against `a1'...
Found floating substr `1' at offset 1...
By STCLASS: moving 0 --> 1
Guessed: match at offset 1
Matching REx `(a|^)(?=1)1' against `1'
Matching stclass `ANYOF[1]' against `1'
  Setting an EVAL scope, savestack=3
  1 <a> <1> | 1: OPEN1
  1 <a> <1> | 3: BRANCH
  Setting an EVAL scope, savestack=13
  1 <a> <1> | 4: EXACT <a>
  failed...
  1 <a> <1> | 7: BOL
  failed...
  Clearing an EVAL scope, savestack=3..13
Contradicts stclass...
Match failed
Freeing REx: `"(a|^)(?=1)1"'

This:

  Guessing start of match, REx `(a|^)(?=1)1' against `a1'...
  Found floating substr `1' at offset 1...
  By STCLASS: moving 0 --> 1
  Guessed: match at offset 1
  Matching REx `(a|^)(?=1)1' against `1'
  Matching stclass `ANYOF[1]' against `1'

appears to be the culprit. This guessing and moving doesn't occur if /i
is turned on.
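
A minimal sketch for re-running the failing case on any given perl build, using only core Perl. The labels and the %matched hash are illustrative names and not part of the original report. All three patterns ought to match "a1"; on an affected build the variants without /i may report NO MATCH instead.

```perl
use strict;
use warnings;

# Patterns taken from the report above; the labels are only for display.
my @cases = (
    [ 'original',     qr/(^|\D)(?=\d)1/  ],
    [ 'original, /i', qr/(^|\D)(?=\d)1/i ],
    [ 'simplified',   qr/(a|^)(?=1)1/    ],
);

my %matched;
for my $case (@cases) {
    my ($label, $re) = @$case;
    # Record 1 for a match, 0 otherwise, then report it.
    $matched{$label} = ("a1" =~ $re) ? 1 : 0;
    printf "%-13s %s\n", $label, $matched{$label} ? "match" : "NO MATCH";
}
```

Running the same one-liner with -Mre=debug, as above, shows whether the "Guessing start of match" step still moves the start position past the first character.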

--
Jeff "japhy" Pinyan japhy@pobox.com http://www.pobox.com/~japhy/
RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
CPAN ID: PINYAN [Need a programmer? If you like my work, let me know.]
<stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.

p5pRT commented May 13, 2004

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Jul 8, 2008

From g+i@gameintellect.com

Hello,

I have found some simple and unpleasant bugs in Perl regexps:

print "Match" if 'ab' =~ /^a?(?=b)b/; # Does not match, but should

The same happens if you replace ^ with \A, or ? with *.

Here are bugs similar to the above:

print $& if 'ab' =~ /a?(?=b)b/;
print $& if 'ab' =~ /a*(?=b)b/;

Both statements print b, but should print ab.

Here is my bug report at ActiveState:
http://bugs.activestate.com/show_bug.cgi?id=78536

--
Sincerely yours,
Serge
http://www.cronc.com
http://www.gameintellect.com

p5pRT commented Jul 8, 2008

From mail@der-pepe.de

I checked some old perl installations, and the bug seems to have been
introduced somewhere between 5.005_03 and 5.6.1. On 5.005_03 all three
snippets behave as they should, whereas on 5.6.1 the behaviour is the
same as in the original message.
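
The three snippets from the original message can be bundled into one script so that several perl builds can be compared side by side. This is only a sketch: %got and its key names are made up for illustration, and $& is used because the report uses it.

```perl
use strict;
use warnings;

my %got;

# Anchored variant: only whether it matches is of interest.
$got{anchored} = ('ab' =~ /^a?(?=b)b/) ? 'match' : 'no match';

# Unanchored variants: record the matched text ($&), or undef on failure.
$got{question} = ('ab' =~ /a?(?=b)b/) ? $& : undef;
$got{star}     = ('ab' =~ /a*(?=b)b/) ? $& : undef;

for my $name (sort keys %got) {
    my $value = defined $got{$name} ? $got{$name} : '(no match)';
    printf "%-9s %s\n", $name, $value;
}

# Print the interpreter version so results from different builds can be told apart.
print "perl version: ", $], "\n";
```

According to the report, a correct build should show a match for the anchored pattern and ab for the other two, while an affected build shows no match and b.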

p5pRT commented Jul 8, 2008

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Jul 8, 2008

From @Abigail

On Mon, Jul 07, 2008 at 11:30:05PM -0700, Serge wrote:

# New Ticket Created by Serge
# Please include the string: [perl #56690]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=56690 >

Hello,

I have found some simple and unpleasant bugs in Perl regexp​:

print "Match" if 'ab' =~ /^a?(?=b)b/; # Not match, but must...

Also you can replace ^ with \A, and ? with *.

Here are bugs similar to the above​:

print $& if 'ab' =~ /a?(?=b)b/;
print $& if 'ab' =~ /a*(?=b)b/;

Both operators print b, but must print ab.

Here are some tests for this bug​:

Inline Patch
--- t/op/re_tests.orig	2008-04-11 14:20:20.000000000 +0200
+++ t/op/re_tests	2008-07-08 18:43:39.000000000 +0200
@@ -1344,4 +1344,7 @@
 .*?(?:(\w)|(\w))x	abx	y	$1-$2	b-
 
 0{50}	000000000000000000000000000000000000000000000000000	y	-	-
+# Bug #56690
+^a?(?=b)b	ab	y	$&	ab
+^a*(?=b)b	ab	y	$&	ab
 

p5pRT commented Jul 9, 2008

From g+i@gameintellect.com

Hello Christoph Bussenius,

this is another bug in regexps: the special variable $^N does not work when
a capturing group has a quantifier. Example:

print "\$1=$1, \$^N=$^N, \$+=$+" if 'ab' =~ /(\w)/;

Output:
$1=a, $^N=a, $+=a

This works well.

Now with a quantifier:

print "\$1=$1, \$^N=$^N, \$+=$+" if 'ab' =~ /(\w)+/;

Output:
Use of uninitialized value in concatenation (.) or string at test.pl line ...
$1=b, $^N=, $+=b

$^N is undefined! This is a bug.
If we write

my $a='bbb';
print "\$a=$a" if 'ab' =~ /(\w)+(?{ $a=$^N })/;

$a is undefined.

I hope both bugs I found will be fixed in the next Perl build, thanks.

--
Sincerely yours,
Serge

p5pRT commented Jul 9, 2008

From @moritz

On Wed Jul 09 07:50:31 2008, g+i@gameintellect.com wrote:

this is another bug in regexp: the special variable $^N does not work, when
a captured parenthesis has a quantifier. Example:

print "\$1=$1, \$^N=$^N, \$+=$+" if 'ab' =~ /(\w)/;

Output​:
$1=a, $^N=a, $+=a

This work well.

Now with a quantifier​:

print "\$1=$1, \$^N=$^N, \$+=$+" if 'ab' =~ /(\w)+/;

Output​:
Use of uninitialized value in concatenation (.) or string at test.pl
line ...
$1=b, $^N=, $+=b

$^N is undefined! It is a bug.

It says
$1=b, $^N=b, $+=b
for me under perl 5.10.0.

In case we write

my $a='bbb';
print "\$a=$a" if 'ab' =~ /(\w)+(?{ $a=$^N })/;

$a is undefined.

It says
$a=b
for me on perl 5.10.0

I hope the both bugs I found will be fixed in the next Perl build,
thanks.

These two are fixed in 5.10.0 already.

Cheers,
Moritz

p5pRT commented Jul 9, 2008

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Jul 9, 2008

From @smpeters

On Tue, Jul 8, 2008 at 11:48 AM, Abigail <abigail@abigail.be> wrote:

On Mon, Jul 07, 2008 at 11:30:05PM -0700, Serge wrote:

# New Ticket Created by Serge
# Please include the string: [perl #56690]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=56690 >

Hello,

I have found some simple and unpleasant bugs in Perl regexp​:

print "Match" if 'ab' =~ /^a?(?=b)b/; # Not match, but must...

Also you can replace ^ with \A, and ? with *.

Here are bugs similar to the above​:

print $& if 'ab' =~ /a?(?=b)b/;
print $& if 'ab' =~ /a*(?=b)b/;

Both operators print b, but must print ab.

Here are some tests for this bug​:

--- t/op/re_tests.orig	2008-04-11 14:20:20.000000000 +0200
+++ t/op/re_tests	2008-07-08 18:43:39.000000000 +0200
@@ -1344,4 +1344,7 @@
 .*?(?:(\w)|(\w))x	abx	y	$1-$2	b-
 
 0{50}	000000000000000000000000000000000000000000000000000	y	-	-
+# Bug #56690
+^a?(?=b)b	ab	y	$&	ab
+^a*(?=b)b	ab	y	$&	ab

This patch (skipping the tests for now) has been added as change #34116.

Thanks,

Steve Peters
steve@​fisharerojo.org

p5pRT commented Jul 10, 2008

From @demerphq

2008/7/9 via RT Serge <perlbug-followup@perl.org>:

# New Ticket Created by Serge
# Please include the string: [perl #56738]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=56738 >

Hello Christoph Bussenius,

this is another bug in regexp​: the special variable $^N does not work, when
a captured parenthesis has a quantifier. Example​:

print "\$1=$1, \$^N=$^N, \$+=$+" if 'ab' =~ /(\w)/;

Output​:
$1=a, $^N=a, $+=a

This work well.

Now with a quantifier​:

print "\$1=$1, \$^N=$^N, \$+=$+" if 'ab' =~ /(\w)+/;

Output​:
Use of uninitialized value in concatenation (.) or string at test.pl line ...
$1=b, $^N=, $+=b

$^N is undefined! It is a bug.

Actually, in my book the result of a capture buffer with a quantifier
is undefined.

What should it hold? The content of the last thing matched?

Or the content of the thing it didn't match, which stopped the
quantifier from repeating?

Currently, /in this pattern/, it's the latter, which to me is just as
logical as the former. And in some cases it *will* be the former. Thus
I consider it undefined.

In case we write

my $a='bbb';
print "\$a=$a" if 'ab' =~ /(\w)+(?{ $a=$^N })/;

$a is undefined.

I hope the both bugs I found will be fixed in the next Perl build, thanks.

I doubt this will be fixed, for the reasons mentioned above and because
it is a much more serious change than it looks. IIRC it requires an
almost total rewrite of how capture buffers are stored.

Luckily, in most of these situations there is a workaround: you can
rewrite the pattern so there is no quantifier on the capture buffer.

'ab'=~/\w*(\w)(?{ $a=$^N})/
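
A sketch comparing the quantified group with the rewritten pattern, assuming a perl new enough to accept code blocks in a literal pattern. The names $seen, show and the $q*/$w* variables are illustrative only; $seen is a package variable simply to keep the (?{ ... }) block free of lexical closures.

```perl
use strict;
use warnings;

our $seen;   # package variable set from inside the (?{ ... }) block

# Render undef visibly instead of triggering an "uninitialized" warning.
sub show { my ($v) = @_; return defined $v ? $v : '(undef)' }

my ($q1, $qN, $w1, $wN);

# Quantified capture group, as in the report.
($q1, $qN) = ($1, $^N) if 'ab' =~ /(\w)+/;

# Rewritten pattern: \w* consumes the repeats, the group itself is not quantified.
($w1, $wN) = ($1, $^N) if 'ab' =~ /\w*(\w)(?{ $seen = $^N })/;

printf "quantified: \$1=%s \$^N=%s\n", show($q1), show($qN);
printf "rewritten:  \$1=%s \$^N=%s \$seen=%s\n", show($w1), show($wN), show($seen);
```

With the rewritten pattern the group matches exactly once, so $1 and $^N cannot disagree about which iteration they refer to.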

Cheers,
yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Jul 10, 2008

From @demerphq

2008/7/11 Moritz Lenz <moritz@casella.verplant.org>:

demerphq wrote:

2008/7/9 via RT Serge <perlbug-followup@perl.org>:

# New Ticket Created by Serge
# Please include the string: [perl #56738]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=56738 >

Hello Christoph Bussenius,

this is another bug in regexp​: the special variable $^N does not work, when
a captured parenthesis has a quantifier. Example​:

print "\$1=$1, \$^N=$^N, \$+=$+" if 'ab' =~ /(\w)/;

Output​:
$1=a, $^N=a, $+=a

This work well.

Now with a quantifier​:

print "\$1=$1, \$^N=$^N, \$+=$+" if 'ab' =~ /(\w)+/;

Output​:
Use of uninitialized value in concatenation (.) or string at test.pl line ...
$1=b, $^N=, $+=b

$^N is undefined! It is a bug.

Actually in my book the result of a capture buffer with a quantifier
is undefined.

What should it hold? The content of the last matched thing?

Yes. Just like $1. Why should it behave any different than $1 if there's
only one positional capturing group?

It might seem weird in retrospect to implement $1, $2... this way, but I
think that once we chose that path, we should stick to it.

My point is that both are acceptable behaviours given that the
construct is undefined.

If you poke around, I'm pretty sure you'll find that the behaviour of
capture buffers with quantifiers varies depending on the pattern. It's
not guaranteed that $1 will be set.

I did look into all of this at one time and it was not fun.

But OK, in this context, yes, I'll grant that it's a bug that $^N and $1
don't agree. Sorry, I should have realized that was the point in the
first place.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Jul 11, 2008

From @moritz

demerphq wrote:

2008/7/9 via RT Serge <perlbug-followup@perl.org>:

# New Ticket Created by Serge
# Please include the string: [perl #56738]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=56738 >

Hello Christoph Bussenius,

this is another bug in regexp​: the special variable $^N does not work, when
a captured parenthesis has a quantifier. Example​:

print "\$1=$1, \$^N=$^N, \$+=$+" if 'ab' =~ /(\w)/;

Output​:
$1=a, $^N=a, $+=a

This work well.

Now with a quantifier​:

print "\$1=$1, \$^N=$^N, \$+=$+" if 'ab' =~ /(\w)+/;

Output​:
Use of uninitialized value in concatenation (.) or string at test.pl line ...
$1=b, $^N=, $+=b

$^N is undefined! It is a bug.

Actually in my book the result of a capture buffer with a quantifier
is undefined.

What should it hold? The content of the last matched thing?

Yes. Just like $1. Why should it behave any different than $1 if there's
only one positional capturing group?

It might seem weird in retrospect to implement $1, $2... this way, but I
think that once we chose that path, we should stick to it.

Or the content of the thing it didnt match which stopped the
quantifier from repeating?

Curently /in this pattern/ its the latter. Which to me is just as
logical as the former. And in some cases it *will* be the former. Thus
i consider it undefined.

In case we write

my $a='bbb';
print "\$a=$a" if 'ab' =~ /(\w)+(?{ $a=$^N })/;

$a is undefined.

I hope the both bugs I found will be fixed in the next Perl build, thanks.

I doubt this will be fixed for the above mentioned reasons and because
it is a much more serious change than it looks. IIRC it requires an
almost total rewrite of how capture buffers are stored.

Luckily in most of these situations there is a workaround, you can
redefine the pattern so there is no quantifier on the capture buffer.

'ab'=~/\w*(\w)(?{ $a=$^N})/

Cheers,
yves

p5pRT commented May 28, 2009

From @nwc10

Dave notes:

actually exists in 5.8.x too, but looks like a good one to fix for 5.10.1

p5pRT commented Jun 26, 2009

From @hvds

On Tue Jul 08 14:49:43 2008, abigail@abigail.be wrote:

Here are some tests for this bug:

--- t/op/re_tests.orig	2008-04-11 14:20:20.000000000 +0200
+++ t/op/re_tests	2008-07-08 18:43:39.000000000 +0200
@@ -1344,4 +1344,7 @@
 .*?(?:(\w)|(\w))x	abx	y	$1-$2	b-
 
 0{50}	000000000000000000000000000000000000000000000000000	y	-	-
+# Bug #56690
+^a?(?=b)b	ab	y	$&	ab
+^a*(?=b)b	ab	y	$&	ab

This is caused by a failure of the start_class optimization in the case
of lookahead, as per the attached comment.

In more detail: at the point study_chunk() attempts to deal with the
start_class discovered for the lookahead chunk, we have
SCF_DO_STCLASS_OR set, and_withp has the starting value of ANYOF_EOS |
ANYOF_UNICODE_ALL, and data->start_class has [a] | ANYOF_EOS.

So given:
  start = ANYOF_EOS | ANYOF_UNICODE_ALL
  pre = [a] | ANYOF_EOS
  lookahead = [b]
  post = [b]
what we should be getting is:
  start_class = start & (pre | (lookahead & post))
  = start & (pre | [b])
  = start & [ab]
  = [ab]
but what we are getting is:
  start_class = start & ((pre & lookahead) | post)
  = start & (ANYOF_EOS | post)
  = start & [b]
  = [b]

In other words, we need to stack an alternation of ANDs and ORs to cope
with this situation, and we don't have a mechanism to do that except to
recurse into study_chunk() some more.
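
The two groupings can be replayed with a toy set model. This is only a sketch of the algebra above and not how regcomp.c represents character classes: a class is modelled as a hash of accepted characters, ANYOF_EOS as an ordinary member named EOS that is hidden when printing, and the "anything" class as a small fixed alphabet. The helper names (make_class, class_or, class_and, show) are invented for the illustration.

```perl
use strict;
use warnings;

# Toy model: a class is a hash ref whose keys are the members it accepts.
sub make_class { my %set = map { $_ => 1 } @_; return \%set }
sub class_or   { my ($x, $y) = @_; return make_class(keys %$x, keys %$y) }
sub class_and  { my ($x, $y) = @_; return make_class(grep { $y->{$_} } keys %$x) }

# Print only the character members; the EOS marker is left out of the display.
sub show {
    my ($c) = @_;
    return '[' . join('', grep { $_ ne 'EOS' } sort keys %$c) . ']';
}

my $start     = make_class(qw(a b c EOS));   # stand-in for "anything"
my $pre       = make_class(qw(a EOS));       # a? : an 'a', or nothing at all
my $lookahead = make_class('b');             # (?=b)
my $post      = make_class('b');             # b

# Intended grouping: start & (pre | (lookahead & post))
my $expected = class_and($start, class_or($pre, class_and($lookahead, $post)));

# Grouping actually applied: start & ((pre & lookahead) | post)
my $observed = class_and($start, class_or(class_and($pre, $lookahead), $post));

print "expected start_class: ", show($expected), "\n";   # [ab]
print "observed start_class: ", show($observed), "\n";   # [b]
```

With [b] as the start class, the optimizer concludes that a match cannot begin at the 'a' in "ab", which agrees with the reported symptoms: $& comes out as b, and the anchored variant fails.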

A simpler short-term fix is instead to throw up our hands in this
situation, and just nullify start_class. I'm not sure exactly how to do
that, but it seems the more likely to be achievable for 5.10.1.

Hugo

p5pRT commented Jun 26, 2009

From @hvds

study_class_comment

p5pRT commented Jun 26, 2009

From p5p@spam.wizbit.be

Binary search:

----Program----
#!/usr/bin/perl -l

print "ok" if "ab" =~ m/^a?(?=b)b/

----Output of ...IZtxUq/perl-5.005_62@4668/bin/perl----
ok

----EOF ($?='0')----
----Output of ...FhbW01/perl-5.005_62@4669/bin/perl----

----EOF ($?='0')----
Need a perl between 4668 and 4669

http://perl5.git.perl.org/perl.git/commit/653099f
author Gurusamy Sarathy <gsar@cpan.org>
  Wed, 8 Dec 1999 19:09:27 +0000 (19:09 +0000)
committer Gurusamy Sarathy <gsar@cpan.org>
  Wed, 8 Dec 1999 19:09:27 +0000 (19:09 +0000)
commit 653099f
tree 705c2971e242d8197c594afa4dd284aa81eb3122
parent 9059aa1

apply change#4618 again along with Ilya's patch to fix bugs
in it (see change#4622)

p4raw-link: @4622 on //depot/perl: 34baa6c
p4raw-link: @4618 on //depot/perl: f9d9cdc

p4raw-id: //depot/perl@4669

p5pRT commented Jul 2, 2009

From @hvds

"Hugo van der Sanden via RT" <perlbug-followup@​perl.org> wrote​:
:This is caused by a failure of the start_class optimization in the case
:of lookahead, as per the attached comment.
:
:In more detail: at the point study_chunk() attempts to deal with the
:start_class discovered for the lookahead chunk, we have
:SCF_DO_STCLASS_OR set, and_withp has the starting value of ANYOF_EOS |
:ANYOF_UNICODE_ALL, and data->start_class has [a] | ANYOF_EOS.
[...]
:In other words, we need to stack an alternation of ANDs and ORs to cope
:with this situation, and we don't have a mechanism to do that except to
:recurse into study_chunk() some more.
:
:A simpler short-term fix is instead to throw up our hands in this
:situation, and just nullify start_class. I'm not sure exactly how to do
:that, but it seems the more likely to be achievable for 5.10.1.

This patch implements the simple fix, and passes all tests including
Abigail's test cases for the bug.

Yves: note that I've preserved the 'was' code in this chunk, introduced
by you in the patch [1], discussed in the thread [2]. As far as I can
see the 3 lines propagating ANYOF_EOS via 'was' (and the copy of those
3 lines a little later) are simply doing the wrong thing - they seem
to be saying "when we combine two start classes using SCF_DO_STCLASS_AND,
claim that end-of-string is valid if the first class says it would be
even though the second says it wouldn't be". Removing those lines doesn't
cause any test failures - can you remember why you introduced those lines,
and maybe add a test case that fails without them?

Hugo

[1] http://perl5.git.perl.org/perl.git/commit/b515a41db88584b4fd1c30cf890c92d3f9697760
[2] http://groups.google.co.uk/group/perl.perl5.porters/browse_thread/thread/436187077ef96918/f11c3268394abf89

Inline Patch
--- regcomp.c.old	2009-06-18 10:21:11.000000000 +0100
+++ regcomp.c	2009-07-02 11:16:29.000000000 +0100
@@ -3727,11 +3727,22 @@
                     data->whilem_c = data_fake.whilem_c;
                 }
                 if (f & SCF_DO_STCLASS_AND) {
-                    const int was = (data->start_class->flags & ANYOF_EOS);
-
-                    cl_and(data->start_class, &intrnl);
-                    if (was)
-                        data->start_class->flags |= ANYOF_EOS;
+		    if (flags & SCF_DO_STCLASS_OR) {
+			/* OR before, AND after: ideally we would recurse with
+			 * data_fake to get the AND applied by study of the
+			 * remainder of the pattern, and then derecurse;
+			 * *** HACK *** for now just treat as "no information".
+			 * See [perl #56690].
+			 */
+			cl_init(pRExC_state, data->start_class);
+		    }  else {
+			/* AND before and after: combine and continue */
+			const int was = (data->start_class->flags & ANYOF_EOS);
+
+			cl_and(data->start_class, &intrnl);
+			if (was)
+			    data->start_class->flags |= ANYOF_EOS;
+		    }
                 }
 	    }
 #if PERL_ENABLE_POSITIVE_ASSERTION_STUDY
--- t/op/re_tests.old	2009-06-18 10:21:11.000000000 +0100
+++ t/op/re_tests	2009-07-02 11:21:31.000000000 +0100
@@ -1365,8 +1365,8 @@
 .*?(?:(\w)|(\w))x	abx	y	$1-$2	b-
 
 0{50}	000000000000000000000000000000000000000000000000000	y	-	-
-^a?(?=b)b	ab	B	$&	ab	# Bug #56690
-^a*(?=b)b	ab	B	$&	ab	# Bug #56690
+^a?(?=b)b	ab	y	$&	ab	# Bug #56690
+^a*(?=b)b	ab	y	$&	ab	# Bug #56690
 />\d+$ \n/ix	>10\n	y	$&	>10
 />\d+$ \n/ix	>1\n	y	$&	>1
 /\d+$ \n/ix	>10\n	y	$&	10

p5pRT commented Jul 2, 2009

From @craigberry

On Thu, Jul 2, 2009 at 5:36 AM, <hv@crypt.org> wrote:

This patch implements the simple fix, and passes all tests including
Abigail's test cases for the bug.

Thanks, applied here:

<http://perl5.git.perl.org/perl.git/commitdiff/906cdd2>

p5pRT commented Jul 6, 2009

@hvds - Status changed from 'open' to 'resolved'

p5pRT closed this as completed Jul 6, 2009

p5pRT commented Apr 29, 2011

From @cpansprout

This bug (the same as #56690) was fixed by commit 906cdd2.

p5pRT commented Apr 29, 2011

@cpansprout - Status changed from 'open' to 'resolved'
