regex captures not reset in negative lookahead #14387

Open
p5pRT opened this issue Jan 3, 2015 · 7 comments

p5pRT commented Jan 3, 2015

Migrated from rt.perl.org#123537 (status was 'open')

Searchable as RT123537$

p5pRT commented Jan 3, 2015

From @mauke

Created by @mauke

$ perl -we 'use Data::Dumper; print Dumper "ab" =~ /(a)x|a/'
$VAR1 = undef;
$ perl -we 'use Data::Dumper; print Dumper "ab" =~ /(?!(a)x)a/'
$VAR1 = 'a';

I don't understand why these two have different output. I think the second
example should also give $VAR1 = undef; because after '(a)' captures the "a",
'x' should make the match fail and backtrack through the '()' group (resetting
\1 to its previous value, undef) until it hits '(?!', which catches and inverts
the match failure.

Instead it seems like $1 gets its value from a branch that never succeeded.
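
The comparison above can be rerun as a small script rather than two one-liners. This is only a sketch: the helper name `describe` is made up for illustration, and the second result is deliberately not stated in the comments because it depends on the perl being used (the report above observed 'a' on 5.20.1).

```perl
use strict;
use warnings;

# Hypothetical helper: format a list of captures, showing undef explicitly.
sub describe {
    my ($label, @captures) = @_;
    my @shown = map { defined $_ ? "'" . $_ . "'" : 'undef' } @captures;
    return $label . ': (' . join(', ', @shown) . ')';
}

# Alternation: (a) captures, 'x' fails, the engine backtracks out of the
# group and the second branch matches, so the list-context result is (undef).
my @alt = "ab" =~ /(a)x|a/;
print describe('alternation', @alt), "\n";

# Negative lookahead: the body (a)x fails, so (?!...) succeeds and 'a' matches.
# Whether the capture is undef here is exactly what this report is about.
my @neg = "ab" =~ /(?!(a)x)a/;
print describe('negative lookahead', @neg), "\n";
```

If both lines print `(undef)`, the capture was reset on the way out of the failed lookahead body; if the second prints `('a')`, the behaviour described in this report is present.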

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.20.1:

Configured by mauke at Thu Nov  6 21:08:16 CET 2014.

Summary of my perl5 (revision 5 version 20 subversion 1) configuration:
   
  Platform:
    osname=linux, osvers=3.17.1-1-arch, archname=i686-linux
    uname='linux simplicio 3.17.1-1-arch #1 smp preempt wed oct 15 15:36:07 cest 2014 i686 gnulinux '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.9.1 20140903 (prerelease)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/i686-pc-linux-gnu/4.9.1/include-fixed /usr/lib /lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.20.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.20'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'



@INC for perl 5.20.1:
    /home/mauke/usr/lib/perl5/site_perl/5.20.1/i686-linux
    /home/mauke/usr/lib/perl5/site_perl/5.20.1
    /home/mauke/usr/lib/perl5/5.20.1/i686-linux
    /home/mauke/usr/lib/perl5/5.20.1
    .


Environment for perl 5.20.1:
    HOME=/home/mauke
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=POSIX
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/mauke/perl5/perlbrew/bin:/home/mauke/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
    PERLBREW_BASHRC_VERSION=0.69
    PERLBREW_HOME=/home/mauke/.perlbrew
    PERLBREW_ROOT=/home/mauke/perl5/perlbrew
    PERL_BADLANG (unset)
    SHELL=/bin/bash

p5pRT commented Jan 3, 2015

From @jkeenan

Could you explain why you believe that the inner parentheses in:

(?!(a)x)

... should *capture* the 'a' rather than merely *grouping* it.

I don't see any evidence in the 'perldoc perlre' discussion of negative-look-ahead to support that, and in examples I can construct there is no capturing, only grouping.

Thank you very much.

--
James E Keenan (jkeenan@cpan.org)

p5pRT commented Jan 3, 2015

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Jan 3, 2015

From @mauke

On 03.01.2015 at 15:49, James E Keenan via RT wrote:

Could you explain why you believe that the inner parentheses in:

(?!(a)x)

... should *capture* the 'a' rather than merely *grouping* it.

I don't see any evidence in the 'perldoc perlre' discussion of negative-look-ahead to support that, and in examples I can construct there is no capturing, only grouping.

This seems somewhat off-topic, but OK ... ?

() should capture because that's what () does in a regex. Why would it
suddenly work differently in lookahead?

Consider /(?!(["'])foo\1).../. This does something sensible, I think.

If you're looking for an example, see this bug report. A successful
match of a regex without captures would have printed $VAR1 = 1;.
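
Both points in this reply can be checked with a short script. It is a sketch only; the two sample strings and the `@results` bookkeeping are assumptions chosen for illustration, not taken from the thread.

```perl
use strict;
use warnings;

# A successful list-context match with no capture groups returns (1).
my @plain = "ab" =~ /a/;
print "no captures: @plain\n";

# Inside the lookahead, \1 refers to the quote captured in that same
# lookahead, so only a balanced 'foo' or "foo" is rejected at a position.
my @results;
for my $text (q{'foo'bar}, q{'foo"bar}) {
    if ($text =~ /(?!(["'])foo\1).../) {
        my ($matched, $offset) = ($&, $-[0]);
        push @results, [$text, $matched, $offset];
        print $text, ' matched ', $matched, ' at offset ', $offset, "\n";
    }
}
```

For the first string the balanced `'foo'` is refused at offset 0 and the match starts one character later; for the second, the mismatched quotes let the match start at offset 0.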

--
Lukas Mai <plokinom@gmail.com>

p5pRT commented Jan 3, 2015

From @cpansprout

Trying to break my JAPHs, are you? :-)

$_ = "JJJuussttt aannotherrr PPerrlll hhhhaackkeerrrr,,\n";
s/(?!(.)(?!))\1+/$1/g;
print;

I have a hideous workaround for this bug in JE. It would be nice to see it fixed. This may be related to #38133.

--

Father Chrysostomos

p5pRT commented Jan 3, 2015

From @mauke

On 03.01.2015 at 16:07, Father Chrysostomos via RT wrote:

Trying to break my JAPHs, are you? :-)

$_ = "JJJuussttt aannotherrr PPerrlll hhhhaackkeerrrr,,\n";
s/(?!(.)(?!))\1+/$1/g;
print;

print "JJJuussttt aannotherrr PPerrlll hhhhaackkeerrrr,,\n" =~ y)))csr;

FTFY :-)
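
For comparison, here is a sketch of two ways to squeeze the repeated characters without relying on a capture that survives a failed lookahead body. The variable names are illustrative, and the substitution is a swapped-in alternative rather than the JAPH's own technique; the `tr` form is the same operation as the `y)))csr` line above, written with conventional delimiters.

```perl
use strict;
use warnings;

my $text = "JJJuussttt aannotherrr PPerrlll hhhhaackkeerrrr,,\n";

# Capture one character outside any lookahead, then consume its repeats.
(my $by_regex = $text) =~ s/(.)\1+/$1/g;

# tr with an empty search list plus /c and /s squeezes every repeated run;
# /r (perl 5.14 and later) returns the result instead of modifying $text.
my $by_tr = $text =~ tr///csr;

print $by_regex;
print $by_tr;
```

Neither form depends on the value of $1 after a negative lookahead, so both should behave the same whether or not this bug is fixed.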

--
Lukas Mai <plokinom@gmail.com>

demerphq commented Jan 6, 2023

see also #19615
