Report information
Id: 133027
Status: open
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: david [at] cantrell.org.uk
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: medium
Type: core
Perl Version: 5.27.11
Fixed In: (no value)



From: David Cantrell <david [...] cantrell.org.uk>
Subject: Regexes which interpolate list elements incorrectly parsed
To: perlbug [...] perl.org
Date: Mon, 26 Mar 2018 16:56:52 +0100
This is a bug report for perl from david@cantrell.org.uk,
generated with the help of perlbug 1.41 running under perl 5.27.11.

-----------------------------------------------------------------
[Please describe your issue here]

If you interpolate a list element - ie $x[...] into a regex,
it appears to not always be parsed correctly. I would expect
this code to print "bar" four times. Instead it prints:

  bar
  bar
  foo
  bar

$x[1] = "foo"; $_ = "foo"; s/$x[1]/bar/; print "$_\n";
$x[10] = "foo"; $_ = "foo"; s/$x[10]/bar/; print "$_\n";
$x[100] = "foo"; $_ = "foo"; s/$x[100]/bar/; print "$_\n";
$x[1000] = "foo"; $_ = "foo"; s/$x[1000]/bar/; print "$_\n";

NB I didn't discover this myself. User 'ernix' on stackoverflow
did, but I couldn't see anything for it in rt.perl.org. Here's his
original message:

https://stackoverflow.com/questions/49492181/regex-substitution-using-a-list-element-with-just-3-digits-index-doesnt-work-as

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
Site configuration information for perl 5.27.11:

Configured by dc at Mon Mar 26 16:20:20 BST 2018.

Summary of my perl5 (revision 5 version 27 subversion 11) configuration:
  Commit id: 811612a11efb1ebc131370e8238d3512779354f8
  Platform:
    osname=linux
    osvers=4.9.0-6-amd64
    archname=x86_64-linux-thread-multi
    uname='linux chimera-dev-vm 4.9.0-6-amd64 #1 smp debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 gnulinux '
    config_args='-de -Duse64bitall -Dusethreads -Dprefix=/home/dc/perl-blead -Dusedevel'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
    optimize='-O2'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='6.3.0 20170516'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/6/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.24.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.24'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'

---
@INC for perl 5.27.11:
    /home/dc/perl-blead/lib/site_perl/5.27.11/x86_64-linux-thread-multi
    /home/dc/perl-blead/lib/site_perl/5.27.11
    /home/dc/perl-blead/lib/5.27.11/x86_64-linux-thread-multi
    /home/dc/perl-blead/lib/5.27.11

---
Environment for perl 5.27.11:
    HOME=/home/dc
    LANG=en_GB.UTF-8
    LANGUAGE=en_GB:en
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/dc/bin:/opt/perlbrew/bin:/opt/perlbrew/perls/perl-5.22.3/bin:/home/dc/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
    PERLBREW_BASHRC_VERSION=0.39
    PERLBREW_HOME=/home/dc/.perlbrew
    PERLBREW_MANPATH=/opt/perlbrew/perls/perl-5.22.3/man
    PERLBREW_PATH=/opt/perlbrew/bin:/opt/perlbrew/perls/perl-5.22.3/bin
    PERLBREW_PERL=perl-5.22.3
    PERLBREW_ROOT=/opt/perlbrew
    PERLBREW_VERSION=0.39
    PERL_BADLANG (unset)
    PERL_UNICODE=AS
    SHELL=/bin/bash

To: perl5-porters [...] perl.org
Subject: Re: [perl #133027] Regexes which interpolate list elements incorrectly parsed
From: Abigail <abigail [...] abigail.be>
Date: Mon, 26 Mar 2018 18:44:01 +0200
On Mon, Mar 26, 2018 at 09:14:26AM -0700, David Cantrell wrote:
> # New Ticket Created by David Cantrell
> # Please include the string: [perl #133027]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org/Ticket/Display.html?id=133027 >
>
>
> This is a bug report for perl from david@cantrell.org.uk,
> generated with the help of perlbug 1.41 running under perl 5.27.11.
>
>
> -----------------------------------------------------------------
> [Please describe your issue here]
>
> If you interpolate a list element - ie $x[...] into a regex,
> It appears to not always be parsed correctly. I would expect
> this code to print "bar" four times. Instead it prints:
>
>   bar
>   bar
>   foo
>   bar
>
> $x[1] = "foo"; $_ = "foo"; s/$x[1]/bar/; print "$_\n";
> $x[10] = "foo"; $_ = "foo"; s/$x[10]/bar/; print "$_\n";
> $x[100] = "foo"; $_ = "foo"; s/$x[100]/bar/; print "$_\n";
> $x[1000] = "foo"; $_ = "foo"; s/$x[1000]/bar/; print "$_\n";
>
> NB I didn't discover this myself. User 'ernix' on stackoverflow
> did, but I couldn't see anything for it in rt.perl.org. Here's his
> original message:
>
> https://stackoverflow.com/questions/49492181/regex-substitution-using-a-list-element-with-just-3-digits-index-doesnt-work-as
>

If one uses Deparse, it becomes clear where the difference is:

$x[1] = 'foo'; $_ = 'foo'; s/$x[1]/bar/; print "$_\n";
$x[10] = 'foo'; $_ = 'foo'; s/$x[10]/bar/; print "$_\n";
$x[100] = 'foo'; $_ = 'foo'; s/${x}[100]/bar/; print "$_\n";
$x[1000] = 'foo'; $_ = 'foo'; s/$x[1000]/bar/; print "$_\n";

No idea why this happens though.

Abigail

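To make the Deparse output concrete: inside a pattern, `${x}[100]` is the scalar `$x` followed by the character class `[100]`, which matches a single "1" or "0". The following is a minimal sketch of that reading; the helper name `matches_scalar_then_class` and the sample strings are made up for illustration and do not come from the ticket.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the ticket): match $text against the
# pattern shape that Deparse printed for the failing case, i.e. the
# scalar $x followed by the character class [100].
sub matches_scalar_then_class {
    my ($x, $text) = @_;
    # The braces end the variable name, so [100] is pattern text here,
    # not a subscript into an array.
    return $text =~ /^${x}[100]$/ ? 1 : 0;
}

print matches_scalar_then_class("a", "a1"),   "\n";  # "a" then one char from the class
print matches_scalar_then_class("a", "a0"),   "\n";  # "0" is in the class too
print matches_scalar_then_class("a", "a100"), "\n";  # a class matches only one character
```

In the original report the scalar `$x` is undefined, so the pattern reduces to the class alone, and "foo" contains neither "1" nor "0". That is consistent with the third line of output staying "foo".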
To: perl5-porters [...] perl.org
Subject: Re: [perl #133027] Regexes which interpolate list elements incorrectly parsed
Date: Mon, 26 Mar 2018 18:08:26 +0100
From: ilmari [...] ilmari.org (Dagfinn Ilmari Mannsåker)
Abigail <abigail@abigail.be> writes:
> If one uses Deparse, it becomes clear where the difference is:
>
> $x[1] = 'foo'; $_ = 'foo'; s/$x[1]/bar/; print "$_\n";
> $x[10] = 'foo'; $_ = 'foo'; s/$x[10]/bar/; print "$_\n";
> $x[100] = 'foo'; $_ = 'foo'; s/${x}[100]/bar/; print "$_\n";
> $x[1000] = 'foo'; $_ = 'foo'; s/$x[1000]/bar/; print "$_\n";
>
> No idea why this happens though.

Additionally, if all three digits are the same, it doesn't happen:

$ perl -MO=Deparse -e '$x[666] = "foo"; $_ = "foo"; s/$x[666]/bar/; print "$_\n";'
$x[666] = 'foo';
$_ = 'foo';
s/$x[666]/bar/;
print "$_\n";

from #london.pm:

<@mst> oh, there's a thingy in the parser where it tries to work out what [] means that adds up a bunch of things to get a score to decide which way to break
<@ilmari> huh, $x[111] works
<@ilmari> and 222, 333 etc
<@ilmari> it deparses as s/${x}[...]/bar/ when it fails
<@mst> my guess would be that 'all the same character inside the []' changes the score
<@ilmari> ô_Ô
<@ilmari> oh, it's trying to guess if it's an array subscript or a character class
<@ilmari> but only if it's three characters...
<@mst> that heuristic scares me more than S_intuit_method
<@DrHyde> i'm kinda surprised that the parser doesn't first try as hard as it can to find variables to interpolate, and only treat $x[...] as normal regex line-noise if it can't
<@mst> the treatment of [] is *special*

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."                              -- Skud's Meta-Law

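Since the parser is guessing between a subscript and a character class, one way to sidestep the guess is to spell the interpolation so that no guess is needed. perldata describes wrapping the whole element in braces (`${x[100]}`) for this purpose. The sketch below tries that and two other common spellings; the sub name `substitute_all_ways` and the lexical array are made up for illustration, and nobody in this thread proposed these as the fix.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the ticket): store $text in element 100
# of a throwaway array, then replace it with "bar" using three spellings
# that do not rely on the parser guessing what [100] means.
sub substitute_all_ways {
    my ($text) = @_;
    my @x;
    $x[100] = $text;
    my @results;

    # 1. Braces around the whole element force the subscript reading.
    my $first = $text;
    $first =~ s/${x[100]}/bar/;
    push @results, $first;

    # 2. Copy the element into a plain scalar before interpolating.
    my $pat    = $x[100];
    my $second = $text;
    $second =~ s/$pat/bar/;
    push @results, $second;

    # 3. Interpolate a dereferenced reference; the block holds ordinary
    #    Perl code, so the subscript is parsed as code.
    my $third = $text;
    $third =~ s/${\ $x[100]}/bar/;
    push @results, $third;

    return @results;
}

print join(",", substitute_all_ways("foo")), "\n";
```

All three spellings take the decision away from the heuristic, so they should behave the same for any index, including the three-digit ones that fail in the report.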
CC: perl5-porters [...] perl.org
From: Dave Mitchell <davem [...] iabyn.com>
Date: Tue, 27 Mar 2018 12:03:24 +0100
To: Dagfinn Ilmari Mannsåker <ilmari [...] ilmari.org>
Subject: Re: [perl #133027] Regexes which interpolate list elements incorrectly parsed
On Mon, Mar 26, 2018 at 06:08:26PM +0100, Dagfinn Ilmari Mannsåker wrote:
> <@mst> oh, there's a thingy in the parser where it tries to work
>        out what [] means that adds up a bunch of things to get a
>        score to decide which way to break

The code is in S_intuit_more(). From the code comments:

 * Returns TRUE if there's more to the expression (e.g., a subscript),
 * FALSE otherwise.
 ...
 * if we're in a pattern and the first char is a [
 *   [] returns FALSE
 *   [SOMETHING] has a funky algorithm to decide whether it's a
 *      character class or not.  It has to deal with things like
 *      /$foo[-3]/ and /$foo[$bar]/ as well as /$foo[$\d]+/
 ....
    /* This is the one truly awful dwimmer necessary to conflate C and sed. */
 ....
    /* this is terrifying, and it works */

That last comment is followed by 100 lines of code which calculates a
weight heuristic, and which I haven't even attempted to understand.

-- 
Spock (or Data) is fired from his high-ranking position for not being able
to understand the most basic nuances of about one in three sentences that
anyone says to him.
    -- Things That Never Happen in "Star Trek" #19
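To show how a weight heuristic of this general kind could produce the pattern seen in this thread, here is a toy scoring function. It is not the code in S_intuit_more(): the sub name `looks_like_char_class`, the rules and every constant are assumptions, chosen only so that the toy agrees with the five cases reported above (1, 10, 1000 and 666 read as subscripts, 100 read as a character class). It handles digit-only input and ignores everything else the real code has to deal with.

```perl
use strict;
use warnings;

# Toy model only. The rules and constants are assumptions made for
# illustration; they are NOT taken from perl's source.
# Input: the text between [ and ]. Returns 1 for "character class",
# 0 for "array subscript".
sub looks_like_char_class {
    my ($inside) = @_;
    my $weight = 2;                      # assumed starting bias towards "class"

    # Assumed rule: one or two digits look strongly like a subscript.
    if    ($inside =~ /^\d$/)   { $weight -= 100 }
    elsif ($inside =~ /^\d\d$/) { $weight -= 10 }

    my %seen;
    my $prev;
    for my $ch (split //, $inside) {
        # Assumed rule: consecutive characters (e.g. "12") suggest a class.
        $weight += 5 if defined $prev && ord($ch) == ord($prev) + 1;
        # Assumed rule: each repeat of a character counts against "class".
        $weight -= $seen{$ch} // 0;
        $seen{$ch}++;
        $prev = $ch;
    }
    return $weight >= 0 ? 1 : 0;
}

for my $index (qw(1 10 100 1000 666)) {
    printf "%-5s %s\n", $index,
        looks_like_char_class($index) ? "character class" : "subscript";
}
```

In this toy, "100" repeats a character only once, which is not enough to push the score below zero, while "666" and "1000" repeat characters often enough to do so. Whether the real code behaves this way for the same reason is not established anywhere in this thread.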

