perl5 regexp: wrong content in capture buffer #16250

Open
p5pRT opened this issue Nov 16, 2017 · 5 comments

@p5pRT

p5pRT commented Nov 16, 2017

Migrated from rt.perl.org#132453 (status was 'open')

Searchable as RT132453$

@p5pRT

p5pRT commented Nov 16, 2017

From wolf-dietrich_moeller@t-online.de

The following test program shows that a capture group can end up
with the wrong content, depending on the string being matched.
4 test strings are matched. Case 2 yields a wrong result,
as $1 should be undefined in this case. Cases 1, 3 and 4 with
different string lengths yield the correct result.
It seems to me that capture buffer $1 is not cleared
correctly on backtracking.

BTW, the same error occurs also under Perl 5.16.3, so the bug
may be old.

    for ( 'abcd5678',       # ok
          'abcde5678',      # error
          'abcdef5678',     # ok
          'abcdefg5678' ) { # ok
        m'b(?=.*(?<=(?<=(.{4}))?(.{5})).$)';
        print '$1: "', $1 // 'undef', '" $2: "', $2 // 'undef', "\"\n";
    }

Perl Info
-----------------------------------------------------------------
---
Flags:
category=core
severity=medium
---
Site configuration information for perl 5.26.1:

Configured by strawberry-perl at Sat Sep 23 23:10:19 2017.

Summary of my perl5 (revision 5 version 26 subversion 1) configuration:

Platform:
osname=MSWin32
osvers=6.3
archname=MSWin32-x86-multi-thread-64int
uname='Win32 strawberry-perl 5.26.1.1 #1 Sat Sep 23 23:07:28 2017 i386'
config_args='undef'
hint=recommended
useposix=true
d_sigaction=undef
useithreads=define
usemultiplicity=define
use64bitint=define
use64bitall=undef
uselongdouble=undef
usemymalloc=n
default_inc_excludes_dot=define
bincompat5005=undef
Compiler:
cc='gcc'
ccflags =' -s -O2 -DWIN32 -D__USE_MINGW_ANSI_STDIO
-DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS
-DUSE_PERLIO -fwrapv -fno-strict-aliasing -mms-bitfields'
optimize='-s -O2'
cppflags='-DWIN32'
ccversion=''
gccversion='7.1.0'
gccosandvers=''
intsize=4
longsize=4
ptrsize=4
doublesize=8
byteorder=12345678
doublekind=3
d_longlong=define
longlongsize=8
d_longdbl=define
longdblsize=12
longdblkind=3
ivtype='long long'
ivsize=8
nvtype='double'
nvsize=8
Off_t='long long'
lseeksize=8
alignbytes=8
prototype=define
Linker and Libraries:
ld='g++'
ldflags ='-s -L"C:\Perl\perl\lib\CORE" -L"C:\Perl\c\lib"'
libpth=C:\Perl\c\lib C:\Perl\c\i686-w64-mingw32\lib
C:\Perl\c\lib\gcc\i686-w64-mingw32\7.1.0
libs= -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32
-ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr
-lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
perllibs= -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32
-ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr
-lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
libc=
so=dll
useshrplib=true
libperl=libperl526.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_win32.xs
dlext=xs.dll
d_dlsymun=undef
ccdlflags=' '
cccdlflags=' '
lddlflags='-mdll -s -L"C:\Perl\perl\lib\CORE" -L"C:\Perl\c\lib"'


---
@INC for perl 5.26.1:
C:/Perl/perl/site/lib
C:/Perl/perl/vendor/lib
C:/Perl/perl/lib

---
Environment for perl 5.26.1:
HOME (unset)
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=<content cleared>;C:\Perl\c\bin;C:\Perl\perl\site\bin;C:\Perl\perl\bin
PERL_BADLANG (unset)
SHELL (unset)

@p5pRT

p5pRT commented Nov 16, 2017

From zefram@fysh.org

Wolf-Dietrich Moeller wrote:

> 4 test strings are matched. Case 2 yields a wrong result,
> as $1 should be undefined in this case.

I concur with your analysis. The output that I get is

$1: "undef" $2: "cd567"
$1: "abcd" $2: "de567"
$1: "abcd" $2: "ef567"
$1: "bcde" $2: "fg567"

The nature of the regexp is such that where $1 is captured it should
contain the four characters that immediately precede the five characters
captured in $2. In the second case above, the two captures overlap,
which is incorrect; $1 should be undef instead. The other cases yield
correct results.
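
The expected captures can be worked out without the regexp engine, which gives something to compare the output against. The following is only a sketch: `expected_captures` is a hypothetical helper, not part of the report, and it assumes single-line input of at least six characters with a 'b' somewhere before the final character. It relies on the pattern forcing $2 to be the five characters just before the last character, and $1 to be the four characters before those, if there are that many.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the report): derive what $1 and $2 should
# be for m'b(?=.*(?<=(?<=(.{4}))?(.{5})).$)' using substr only.
sub expected_captures {
    my ($s) = @_;
    my $len  = length $s;
    my $b_at = index($s, 'b');

    # Assumption: the pattern can only match if there is a 'b' before the
    # final character and at least 5 characters precede that final character.
    return if $len < 6 || $b_at < 0 || $b_at > $len - 2;

    my $start2 = $len - 6;                  # $2 ends just before the last character
    my $cap2   = substr($s, $start2, 5);
    # $1 is defined only when 4 characters exist in front of $2.
    my $cap1   = $start2 >= 4 ? substr($s, $start2 - 4, 4) : undef;
    return ($cap1, $cap2);
}

for my $s ('abcd5678', 'abcde5678', 'abcdef5678', 'abcdefg5678') {
    my ($c1, $c2) = expected_captures($s);
    printf qq{%-12s \$1: "%s" \$2: "%s"\n}, $s, $c1 // 'undef', $c2 // 'undef';
}
```

For the four test strings this agrees with the output above in cases 1, 3 and 4, and gives undef for $1 in case 2.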

> BTW, the same error occurs also under Perl 5.16.3, so the bug
> may be old.

Much older than that. The same thing happens on 5.005_03, which is the
oldest that I have handy.

-zefram

@p5pRT

p5pRT commented Nov 16, 2017

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Nov 16, 2017

From @demerphq

I have a bit of familiarity with this, but the prospect of dealing
with nested lookaheads and lookbehinds fills me with horror, and it does
not surprise me that it does not work well.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Nov 16, 2017

From @Smylers

demerphq wrote:

> I have a bit of familiarity with this, but the prospect of dealing
> with nested lookaheads and lookbehinds fills me with horror, and it does
> not surprise me that it does not work well.

The lookahead isn't required to trigger the bug; this still includes 'd'
in both $1 and $2:

'abcde5678' =~ /b .* (?<= (?<=(.{4}))? (.{5}) ) .$/x;
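
Rather than reading the captures by eye, the overlap can be checked from the match offsets in `@-` and `@+`. This is a rough sketch only: `spans_overlap` is a hypothetical helper, and what it prints depends on whether the perl being run has the bug, so no particular output is claimed here.

```perl
use strict;
use warnings;

# Hypothetical helper: true if two half-open [start, end) spans overlap.
# Returns false if either span is undefined (group did not participate).
sub spans_overlap {
    my ($s1, $e1, $s2, $e2) = @_;
    return 0 unless defined $s1 && defined $s2;
    return ($s1 < $e2 && $s2 < $e1) ? 1 : 0;
}

if ('abcde5678' =~ /b .* (?<= (?<=(.{4}))? (.{5}) ) .$/x) {
    # $-[N] and $+[N] hold the start and end offsets of capture group N.
    my $overlap = spans_overlap($-[1], $+[1], $-[2], $+[2]);
    print $overlap
        ? "captures overlap (bug reproduced)\n"
        : "captures do not overlap\n";
}
```

Since $1 is meant to hold the four characters immediately before $2, the two spans should never overlap in a correct match.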

Smylers
