Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variable in postponed regex only really evaluated once #6270

Closed
p5pRT opened this issue Feb 4, 2003 · 16 comments
Closed

Variable in postponed regex only really evaluated once #6270

p5pRT opened this issue Feb 4, 2003 · 16 comments

Comments

@p5pRT
Copy link

p5pRT commented Feb 4, 2003

Migrated from rt.perl.org#20683 (status was 'resolved')

Searchable as RT20683$

@p5pRT
Copy link
Author

p5pRT commented Feb 4, 2003

From perl-5.8.0@ton.iguana.be

Created by perl-5.8.0@ton.iguana.be

When trying to remove consecutive letters in a string I tried
basically this​:

perl -wle '$z="A"; $_="QDADEBCAV"; s/(??{print"z=$z";$z})/$z++;"*"/eg; print'

giving​:

z=A
z=A
z=A
z=B
z=B
z=B
z=B
z=B
z=C
z=C
QD*DEBC*V

(the prints are added to show the code is executed)

So even though $z changes, it still only matches on the A. However, if I do
This​:
perl -wle '$z="A"; $_="QDADEBCAV"; s/(??{print"z=$z";~~$z})/$z++;"*"/eg; print'

giving​:

z=A
z=A
z=A
z=B
z=B
z=B
z=C
z=D
z=D
z=D
QD*DE**AV

Which is as expected.

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl v5.8.0:

Configured by ton at Tue Nov 12 01:56:18 CET 2002.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.4.19, archname=i686-linux-thread-multi-64int-ld
    uname='linux quasar 2.4.19 #5 wed oct 2 02:34:25 cest 2002 i686 unknown '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=undef uselongdouble=define
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -fomit-frame-pointer',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='2.95.3 20010315 (release)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -ldb -ldl -lm -lpthread -lc -lposix -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lpthread -lc -lposix -lcrypt -lutil
    libc=/lib/libc-2.2.4.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.2.4'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.0:
    /usr/lib/perl5/5.8.0/i686-linux-thread-multi-64int-ld
    /usr/lib/perl5/5.8.0
    /usr/lib/perl5/site_perl/5.8.0/i686-linux-thread-multi-64int-ld
    /usr/lib/perl5/site_perl/5.8.0
    /usr/lib/perl5/site_perl
    .


Environment for perl v5.8.0:
    HOME=/home/ton
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/ton/bin.Linux:/home/ton/bin:/home/ton/bin.SampleSetup:/usr/local/bin:/usr/local/sbin:/usr/local/jre/bin:/home/oracle/product/9.0.1/bin:/usr/local/ar/bin:/usr/games/bin:/usr/X11R6/bin:/usr/share/bin:/usr/bin:/usr/sbin:/bin:/sbin:.
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Feb 22, 2003

From enache@rdslink.ro

$ perl -le '$_ ="1234"; $x=1; s/(??{$x})/$x++,z/ge; print $_'

that should print zzzz but prints z234.

The return value of a (??{ .. }) construct is compiled to a regexp before
being matched. In order to avoid re-compiling it every time, it is
cached by assigning 'r' magic to the variable which holds the return value
- with the mg_obj field pointing to the compiled regular expression.
(see regexec.c​:2864)

When the value of that variable is changed in any way, constructs like
(??{ ..;$var}) will still use the stale regular expression.

The example could be rewritten​:

#! /usr/bin/perl
$p=1;
foreach (1,2,3,4) {
  $p++ if /(??{ $p })/;
}
print $p;
__END__

this prints 2, not 5.

The only solution I found is wiring the get & set methods of 'r' magic
variables to a Perl_getsetregexp() function : that simply calls sv_unmagic(),
forcing recompilation of the regexp if the variable is assigned or its
value is fetched ( notice that $p may be tied or have other kind of magic
which change SvPVX, etc ).

This doesn't affect
  $p = qr/.../
  ... =~ $p
since $p isn't a 'r' magic variable, but a reference.

Please try the patch below ( you'll have to run make regen_headers too)

Regards

Adi


Inline Patch
diff -rupb /arc/perl-current/embed.fnc perl-current/embed.fnc
--- /arc/perl-current/embed.fnc	2003-02-16 16:24:51.000000000 +0200
+++ perl-current/embed.fnc	2003-02-22 19:20:36.000000000 +0200
@@ -381,6 +381,7 @@ p	|int	|magic_getglob	|SV* sv|MAGIC* mg
 p	|int	|magic_getnkeys	|SV* sv|MAGIC* mg
 p	|int	|magic_getpack	|SV* sv|MAGIC* mg
 p	|int	|magic_getpos	|SV* sv|MAGIC* mg
+p	|int	|magic_getsetregexp|SV* sv|MAGIC* mg
 p	|int	|magic_getsig	|SV* sv|MAGIC* mg
 p	|int	|magic_getsubstr|SV* sv|MAGIC* mg
 p	|int	|magic_gettaint	|SV* sv|MAGIC* mg
diff -rupb /arc/perl-current/ext/Devel/Peek/Peek.t perl-current/ext/Devel/Peek/Peek.t
--- /arc/perl-current/ext/Devel/Peek/Peek.t	2003-02-20 20:42:20.000000000 +0200
+++ perl-current/ext/Devel/Peek/Peek.t	2003-02-22 19:45:19.000000000 +0200
@@ -264,7 +264,7 @@ do_test(15,
   RV = $ADDR
   SV = PVMG\\($ADDR\\) at $ADDR
     REFCNT = 1
-    FLAGS = \\(OBJECT,RMG\\)
+    FLAGS = \\(OBJECT,GMG,SMG\\)
     IV = 0
     NV = 0
     PV = 0
diff -rupb /arc/perl-current/mg.c perl-current/mg.c
--- /arc/perl-current/mg.c	2003-02-16 00:33:27.000000000 +0200
+++ perl-current/mg.c	2003-02-22 19:20:50.000000000 +0200
@@ -1823,6 +1823,14 @@ Perl_magic_freeregexp(pTHX_ SV *sv, MAGI
     return 0;
 }
 
+int
+Perl_magic_getsetregexp(pTHX_ SV *sv, MAGIC *mg)
+{
+    sv_unmagic(sv, PERL_MAGIC_qr);
+    return 0;
+}
+
 #ifdef USE_LOCALE_COLLATE
 int
 Perl_magic_setcollxfrm(pTHX_ SV *sv, MAGIC *mg)
diff -rupb /arc/perl-current/perl.h perl-current/perl.h
--- /arc/perl-current/perl.h	2003-02-16 17:05:00.000000000 +0200
+++ perl-current/perl.h	2003-02-22 19:21:43.000000000 +0200
@@ -3485,7 +3485,7 @@ EXT MGVTBL PL_vtbl_defelem = {MEMBER_TO_
     					MEMBER_TO_FPTR(Perl_magic_setdefelem),
 					0,	0,	0};
 
-EXT MGVTBL PL_vtbl_regexp = {0,0,0,0, MEMBER_TO_FPTR(Perl_magic_freeregexp)};
+EXT MGVTBL PL_vtbl_regexp = {MEMBER_TO_FPTR(Perl_magic_getsetregexp),MEMBER_TO_FPTR(Perl_magic_getsetregexp),0,0, MEMBER_TO_FPTR(Perl_magic_freeregexp)};
 EXT MGVTBL PL_vtbl_regdata = {0, 0, MEMBER_TO_FPTR(Perl_magic_regdata_cnt), 0, 0};
 EXT MGVTBL PL_vtbl_regdatum = {MEMBER_TO_FPTR(Perl_magic_regdatum_get),
 			       MEMBER_TO_FPTR(Perl_magic_regdatum_set), 0, 0, 0};
diff -rupb /arc/perl-current/sv.c perl-current/sv.c
--- /arc/perl-current/sv.c	2003-02-18 04:01:51.000000000 +0200
+++ perl-current/sv.c	2003-02-22 19:29:45.000000000 +0200
@@ -2964,9 +2964,7 @@ Perl_sv_2pv_flags(pTHX_ register SV *sv,
 		
 		switch (SvTYPE(sv)) {
 		case SVt_PVMG:
-		    if ( ((SvFLAGS(sv) &
-			   (SVs_OBJECT|SVf_OK|SVs_GMG|SVs_SMG|SVs_RMG))
-			  == (SVs_OBJECT|SVs_RMG))
+		    if (SvFLAGS(sv) & SVs_OBJECT
 			 && (mg = mg_find(sv, PERL_MAGIC_qr))) {
 			regexp *re = (regexp *)mg->mg_obj;
 

@p5pRT
Copy link
Author

p5pRT commented Feb 22, 2003

From perl5-porters@ton.iguana.be

In article <20030222192221.GA10431@​ratsnest.hole>,
  Enache Adrian <enache@​rdslink.ro> writes​:

$ perl -le '$_ ="1234"; $x=1; s/(??{$x})/$x++,z/ge; print $_'

that should print zzzz but prints z234.

The return value of a (??{ .. }) construct is compiled to a regexp before
being matched. In order to avoid re-compiling it every time, it is
cached by assigning 'r' magic to the variable which holds the return value
- with the mg_obj field pointing to the compiled regular expression.
(see regexec.c​:2864)

When the value of that variable is changed in any way, constructs like
(??{ ..;$var}) will still use the stale regular expression.

This seems a case where the problem can be solved by simply dropping
this optimization. If the value wasn't supposed to change all the
time, the user would almost certainly have written s/$p// instead of
s/(??{$p})//, so i think this optimization is not worth it (ok, someone
might do it to set $^R and $^N)

@p5pRT
Copy link
Author

p5pRT commented Feb 23, 2003

From enache@rdslink.ro

On Sat, Feb 22, 2003 at 09​:22​:21PM +0200, Enache Adrian wrote​:

Please try the patch below ( you'll have to run make regen_headers too)

Please add this too to the previous patch - I deleted it by error from the
diff -rup output.
(without it, I suspect that the regexp caching will be defeated in
_all_ cases).


Inline Patch
--- /arc/perl-current/regexec.c	2003-02-16 16:24:51.000000000 +0200
+++ perl-current/regexec.c	2003-02-23 01:43:04.000000000 +0200
@@ -2863,18 +2863,12 @@ S_regmatch(pTHX_ regnode *prog)
 	    if (logical) {
 		if (logical == 2) {	/* Postponed subexpression. */
 		    regexp *re;
-		    MAGIC *mg = Null(MAGIC*);
+		    MAGIC *mg;
 		    re_cc_state state;
 		    CHECKPOINT cp, lastcp;
                     int toggleutf;
-
-		    if(SvROK(ret) || SvRMAGICAL(ret)) {
-			SV *sv = SvROK(ret) ? SvRV(ret) : ret;
-
-			if(SvMAGICAL(sv))
-			    mg = mg_find(sv, PERL_MAGIC_qr);
-		    }
-		    if (mg) {
+		    SV *sv = SvROK(ret) ? SvRV(ret) : ret;
+		    if(SvMAGICAL(sv) && (mg = mg_find(sv, PERL_MAGIC_qr))) {
 			re = (regexp *)mg->mg_obj;
 			(void)ReREFCNT_inc(re);
 		    }
------------------------------------------------------------------------

Regards
Adi

@p5pRT
Copy link
Author

p5pRT commented Feb 23, 2003

From enache@rdslink.ro

On Sat, Feb 22, 2003 at 09​:11​:03PM +0000, Ton Hospel wrote​:

This seems a case where the problem can be solved by simply dropping
this optimization. If the value wasn't supposed to change all the
time, the user would almost certainly have written s/$p// instead of
s/(??{$p})//, so i think this optimization is not worth it (ok, someone
might do it to set $^R and $^N)

IMHO that optimization is worth keeping in.
(??{ ... }) 's may return (by different paths!) complicated regular
expressions or long chunks of text.
Caching them dramatically improves the performance.

Regards

Adi

@p5pRT
Copy link
Author

p5pRT commented Feb 23, 2003

From perl5-porters@ton.iguana.be

In article <20030223004840.GA1442@​ratsnest.hole>,
  Enache Adrian <enache@​rdslink.ro> writes​:

On Sat, Feb 22, 2003 at 09​:11​:03PM +0000, Ton Hospel wrote​:

This seems a case where the problem can be solved by simply dropping
this optimization. If the value wasn't supposed to change all the
time, the user would almost certainly have written s/$p// instead of
s/(??{$p})//, so i think this optimization is not worth it (ok, someone
might do it to set $^R and $^N)

IMHO that optimization is worth keeping in.
(??{ ... }) 's may return (by different paths!) complicated regular
expressions or long chunks of text.
Caching them dramatically improves the performance.

I don't get it.
- if the value isn't going to change (much), why did the user use a (??{$p}) ?
  What is the scenario in which you expect the cache to be hit ?
- why is the r-magic not invalidated when the variable gets assigned ?

@p5pRT
Copy link
Author

p5pRT commented Feb 23, 2003

From enache@rdslink.ro

$ perl -le '$_ ="1234"; $x=1; s/(??{$x})/$x++,z/ge; print $_'

that should print zzzz but prints z234.
...
The example could be rewritten​:

#! /usr/bin/perl
$p=1;
foreach (1,2,3,4) {
$p++ if /(??{ $p })/;
}
print $p;
__END__

this prints 2, not 5.

My previous patch has the drawback of dropping the optimization even
in simple cases where $p is a regular variable, and it's just accessed,
not modified.

The modified version works as follows​:
  - only a magic_setregexp() method is added to 'r'-magical variables,
  method which calls sv_unmagic(sv,'r'), forcing recompilation if
  the variable is set.
  - if the variable has some 'get' magic methods, it isn't made
  'r'-magical in the first place (regexec.c ~ 2900)
  - if the variable gets some 'get' magic methods since it was cached,
  (ex. it is tied) its 'r' magic is dropped (regexec.c ~2876)

I added 2 two tests to op/pat.t, too.

Regards

Adi


Inline Patch
diff -rup /arc/perl-current/embed.fnc perl-current/embed.fnc
--- /arc/perl-current/embed.fnc	2003-02-16 16:24:51.000000000 +0200
+++ perl-current/embed.fnc	2003-02-23 19:37:55.000000000 +0200
@@ -408,6 +408,7 @@ p	|int	|magic_setmglob	|SV* sv|MAGIC* mg
 p	|int	|magic_setnkeys	|SV* sv|MAGIC* mg
 p	|int	|magic_setpack	|SV* sv|MAGIC* mg
 p	|int	|magic_setpos	|SV* sv|MAGIC* mg
+p	|int	|magic_setregexp|SV* sv|MAGIC* mg
 p	|int	|magic_setsig	|SV* sv|MAGIC* mg
 p	|int	|magic_setsubstr|SV* sv|MAGIC* mg
 p	|int	|magic_settaint	|SV* sv|MAGIC* mg
diff -rup /arc/perl-current/ext/Devel/Peek/Peek.t perl-current/ext/Devel/Peek/Peek.t
--- /arc/perl-current/ext/Devel/Peek/Peek.t	2003-02-20 20:42:20.000000000 +0200
+++ perl-current/ext/Devel/Peek/Peek.t	2003-02-23 19:37:55.000000000 +0200
@@ -264,7 +264,7 @@ do_test(15,
   RV = $ADDR
   SV = PVMG\\($ADDR\\) at $ADDR
     REFCNT = 1
-    FLAGS = \\(OBJECT,RMG\\)
+    FLAGS = \\(OBJECT,SMG\\)
     IV = 0
     NV = 0
     PV = 0
diff -rup /arc/perl-current/mg.c perl-current/mg.c
--- /arc/perl-current/mg.c	2003-02-16 00:33:27.000000000 +0200
+++ perl-current/mg.c	2003-02-23 19:38:52.000000000 +0200
@@ -1816,6 +1816,13 @@ Perl_magic_setuvar(pTHX_ SV *sv, MAGIC *
 }
 
 int
+Perl_magic_setregexp(pTHX_ SV *sv, MAGIC *mg)
+{
+    sv_unmagic(sv, PERL_MAGIC_qr);
+    return 0;
+}
+
+int
 Perl_magic_freeregexp(pTHX_ SV *sv, MAGIC *mg)
 {
     regexp *re = (regexp *)mg->mg_obj;
diff -rup /arc/perl-current/perl.h perl-current/perl.h
--- /arc/perl-current/perl.h	2003-02-16 17:05:00.000000000 +0200
+++ perl-current/perl.h	2003-02-23 19:37:55.000000000 +0200
@@ -3485,7 +3485,7 @@ EXT MGVTBL PL_vtbl_defelem = {MEMBER_TO_
     					MEMBER_TO_FPTR(Perl_magic_setdefelem),
 					0,	0,	0};
 
-EXT MGVTBL PL_vtbl_regexp = {0,0,0,0, MEMBER_TO_FPTR(Perl_magic_freeregexp)};
+EXT MGVTBL PL_vtbl_regexp = {0, MEMBER_TO_FPTR(Perl_magic_setregexp),0,0, MEMBER_TO_FPTR(Perl_magic_freeregexp)};
 EXT MGVTBL PL_vtbl_regdata = {0, 0, MEMBER_TO_FPTR(Perl_magic_regdata_cnt), 0, 0};
 EXT MGVTBL PL_vtbl_regdatum = {MEMBER_TO_FPTR(Perl_magic_regdatum_get),
 			       MEMBER_TO_FPTR(Perl_magic_regdatum_set), 0, 0, 0};
diff -rup /arc/perl-current/regexec.c perl-current/regexec.c
--- /arc/perl-current/regexec.c	2003-02-16 16:24:51.000000000 +0200
+++ perl-current/regexec.c	2003-02-23 19:38:40.000000000 +0200
@@ -2867,13 +2867,17 @@ S_regmatch(pTHX_ regnode *prog)
 		    re_cc_state state;
 		    CHECKPOINT cp, lastcp;
                     int toggleutf;
+		    register SV *sv;
 
-		    if(SvROK(ret) || SvRMAGICAL(ret)) {
-			SV *sv = SvROK(ret) ? SvRV(ret) : ret;
-
-			if(SvMAGICAL(sv))
-			    mg = mg_find(sv, PERL_MAGIC_qr);
+		    if(SvROK(ret) && SvSMAGICAL(sv = SvRV(ret)))
+			mg = mg_find(sv, PERL_MAGIC_qr);
+		    else if (SvSMAGICAL(ret)) {
+			if (SvGMAGICAL(ret))
+			    sv_unmagic(ret, PERL_MAGIC_qr);
+			else
+			    mg = mg_find(ret, PERL_MAGIC_qr);
 		    }
+
 		    if (mg) {
 			re = (regexp *)mg->mg_obj;
 			(void)ReREFCNT_inc(re);
@@ -2890,7 +2894,8 @@ S_regmatch(pTHX_ regnode *prog)
                         if (DO_UTF8(ret)) pm.op_pmdynflags |= PMdf_DYN_UTF8;
 			re = CALLREGCOMP(aTHX_ t, t + len, &pm);
 			if (!(SvFLAGS(ret)
-			      & (SVs_TEMP | SVs_PADTMP | SVf_READONLY)))
+			      & (SVs_TEMP | SVs_PADTMP | SVf_READONLY
+				| SVs_GMG)))
 			    sv_magic(ret,(SV*)ReREFCNT_inc(re),
 					PERL_MAGIC_qr,0,0);
 			PL_regprecomp = oprecomp;
diff -rup /arc/perl-current/sv.c perl-current/sv.c
--- /arc/perl-current/sv.c	2003-02-18 04:01:51.000000000 +0200
+++ perl-current/sv.c	2003-02-23 19:37:55.000000000 +0200
@@ -2966,7 +2966,7 @@ Perl_sv_2pv_flags(pTHX_ register SV *sv,
 		case SVt_PVMG:
 		    if ( ((SvFLAGS(sv) &
 			   (SVs_OBJECT|SVf_OK|SVs_GMG|SVs_SMG|SVs_RMG))
-			  == (SVs_OBJECT|SVs_RMG))
+			  == (SVs_OBJECT|SVs_SMG))
 			 && (mg = mg_find(sv, PERL_MAGIC_qr))) {
 			regexp *re = (regexp *)mg->mg_obj;
 
diff -rup /arc/perl-current/t/op/pat.t perl-current/t/op/pat.t
--- /arc/perl-current/t/op/pat.t	2003-02-05 22:38:19.000000000 +0200
+++ perl-current/t/op/pat.t	2003-02-23 19:58:27.000000000 +0200
@@ -6,7 +6,7 @@
 
 $| = 1;
 
-print "1..988\n";
+print "1..990\n";
 
 BEGIN {
     chdir 't' if -d 't';
@@ -3108,5 +3108,20 @@ ok("bbbbac" =~ /$pattern/ && $1 eq 'a', 
     ok ( "0" =~ /\p{N}+\z/, "[perl #19767] variant test" );
 }
 
-# last test 988
+{
+
+    $p = 1;
+    foreach (1,2,3,4) {
+	    $p++ if /(??{ $p })/
+    }
+    ok ($p == 5, "[perl #20683] (??{ }) returns stale values");
+    { package P; $a=1; sub TIESCALAR { bless[] } sub FETCH { $a++ } }
+    tie $p, P;
+    foreach (1,2,3,4) {
+	    /(??{ $p })/
+    }
+    ok ( $p == 5, "(??{ }) returns stale values");
+}
+
+# last test 990
 

@p5pRT
Copy link
Author

p5pRT commented Feb 23, 2003

From enache@rdslink.ro

On Sun, Feb 23, 2003 at 03​:31​:38AM +0000, Ton Hospel wrote​:

- if the value isn't going to change (much), why did the user use a (??{$p}) ?
What is the scenario in which you expect the cache to be hit ?

Please do something like
dd if=/dev/zero of=/tmp/k count=4096
and run the script below on it with/without the optimization.
( Dropping the optimization is matter of commenting out the branch
  if (!(SvFLAGS(ret) ...
at regexec.c​:2892 - line number of the unpatched file ).

I get​:
ratsnest# time perl /tmp/x /tmp/k # without the optimization

real 0m54.794s
user 0m54.752s
sys 0m0.041s

ratsnest# time perl /tmp/x /tmp/k # with the optimization in

real 0m4.332s
user 0m4.316s
sys 0m0.016s

Notice that instead of the silly quotes there the user may want to put
some convoluted regular expressions, without putting it in qr// for
some reason or another - these may be build up by some sub, etc.

- why is the r-magic not invalidated when the variable gets assigned ?

That's just what my patch try to do​: invalidate the r-magic when
the variable gets assigned.

( I admit the test is silly - I suspect however that a more serious
investigation will give the same results).

Regards
Adi

/tmp/x


$entremets = "
  Tancowala ryba z rakiem, a pietruszka z pasternakiem.
  Cebula sie dziwowala, jak pietruszka tancowala.
  |#";
$appero = "
  Pisse une goutte dans ce vermouth,
  c'est pas beau mais c'est salaud.
  |%";

$entremets = "a";
$appero = "b";

$/=\4096;

while(<>) {
  $a = not $a;
  print if /(??{
  $a ? $entremets : $appero;
  })/
}

@p5pRT
Copy link
Author

p5pRT commented Feb 23, 2003

From enache@rdslink.ro

On Sun, Feb 23, 2003 at 09​:02​:01PM +0200, Enache Adrian wrote​:

real 0m4.332s
user 0m4.316s
sys 0m0.016s

That's with whe simple ..="a", ..="b". Removing these produces​:

$ time perl /tmp/x /tmp/k

real 1m40.802s
user 1m40.783s
sys 0m0.018s

$ time perl /tmp/x /tmp/k

real 0m5.881s
user 0m5.871s
sys 0m0.010s

Regards

Adi.

@p5pRT
Copy link
Author

p5pRT commented Feb 24, 2003

From @hvds

Enache Adrian <enache@​rdslink.ro> wrote​:
:$entremets = "a";
:$appero = "b";
:
:$/=\4096;
:
:while(<>) {
: $a = not $a;
: print if /(??{
: $a ? $entremets : $appero;
: })/
:}

For what it's worth, I've written code very like this to add
assertions to regexps, and always did it like this​:

$true = qr/(?=)/;
$false = qr/(?!)/;
...
m{
  match stuff
  (??{ ok() ? $true : $false })
  match more
}x;

The main reason for doing that was to get around reentrancy problems
(coredumps when trying to compile re's during a match); it appears
that with 5.8.0 it is now safe to do​:
  perl -wle '
  $_="";
  for $p (0,1,0,1) {
  print /(??{ $p ? qr{(?=)} : qr{(?!)} })/ ? "match" : "no match";
  }'

Precompiling the returned patterns bypasses the (obviously buggy)
optimisation altogether. I think it might make most sense to remove
the optimisation and document this as the recommended approach.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Feb 24, 2003

From enache@rdslink.ro

On Mon, Feb 24, 2003 at 02​:06​:38AM +0000, hv@​crypt.org wrote​:

Precompiling the returned patterns bypasses the (obviously buggy)
optimisation altogether. I think it might make most sense to remove
the optimisation and document this as the recommended approach.

The optimization seems to be valuable even in simple, un-branched
cases, so why not keep it in ?

$ dd if=/dev/zero count=4096 | time perl -ne '$a=1; m/(??{ $a })/'

3.96user 0.00system 0​:03.98elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
( optimized )
55.04user 0.01system 0​:55.08elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
( unoptimized )

Notice that my patch ( the second variant ) doesn't affect the rest
of the code in way​: the only drawback is the bloat of this little
magic_setregexp() function.

Of course, it's always better to precompile its regexp's and to
avoid (??{ .. }) altogether.

But why forcing a speed regression, even for weird cases ?

Regards

Adi

@p5pRT
Copy link
Author

p5pRT commented Feb 26, 2003

From @hvds

Enache Adrian <enache@​rdslink.ro> wrote​:
:> The example could be rewritten​:
:>
:> #! /usr/bin/perl
:> $p=1;
:> foreach (1,2,3,4) {
:> $p++ if /(??{ $p })/;
:> }
:> print $p;
:> __END__
:>
:> this prints 2, not 5.
:
:My previous patch has the drawback of dropping the optimization even
:in simple cases where $p is a regular variable, and it's just accessed,
:not modified.
:
:The modified version works as follows​:
: - only a magic_setregexp() method is added to 'r'-magical variables,
: method which calls sv_unmagic(sv,'r'), forcing recompilation if
: the variable is set.
: - if the variable has some 'get' magic methods, it isn't made
: 'r'-magical in the first place (regexec.c ~ 2900)
: - if the variable gets some 'get' magic methods since it was cached,
: (ex. it is tied) its 'r' magic is dropped (regexec.c ~2876)
:
:I added 2 two tests to op/pat.t, too.

Thanks, applied as #18782.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Feb 26, 2003

From @hvds

Enache Adrian <enache@​rdslink.ro> wrote​:
:On Mon, Feb 24, 2003 at 02​:06​:38AM +0000, hv@​crypt.org wrote​:
:> Precompiling the returned patterns bypasses the (obviously buggy)
:> optimisation altogether. I think it might make most sense to remove
:> the optimisation and document this as the recommended approach.
:
:The optimization seems to be valuable even in simple, un-branched
:cases, so why not keep it in ?

Well, the more types of magic we apply unasked to random SVs, the
more possible interactions between different types of magic we have
to cope with, and the more bugs we introduce. This is one type of
magic that has already been shown to be buggy, and is not of any
great benefit since the only change in behaviour it provides is an
optimisation that you can easily supply for yourself by writing your
(??{...}) constructs slightly different (and those constructs are
pretty rare in any case).

That was my original thinking. But looking over the patch again,
I notice that it now tries hard to avoid interaction with other
magic, which should help to reduce the potential for bugs, so I've
applied it.

:Of course, it's always better to precompile its regexp's and to
:avoid (??{ .. }) altogether.

Not precompiling regexps is a quite distinct thing from using
(??{...}) - the latter is a very useful construct on those rare
occasions you need it. I would rather like to elevate it out
of the "experimental" pit for the next release, though it'll
need a number of problems fixing before that is possible.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Mar 2, 2003

From @hvds

"perl-5.8.0@​ton.iguana.be (via RT)" <perlbug-followup@​perl.org> wrote​:
:When trying to remove consecutive letters in a string I tried
:basically this​:
:
:perl -wle '$z="A"; $_="QDADEBCAV"; s/(??{print"z=$z";$z})/$z++;"*"/eg; print'
:
:giving​:
[...]
:QD*DEBC*V
:
:So even though $z changes, it still only matches on the A. However, if I do
:This​:
:perl -wle '$z="A"; $_="QDADEBCAV"; s/(??{print"z=$z";~~$z})/$z++;"*"/eg; print'
:
:giving​:
[...]
:QD*DE**AV
:
:Which is as expected.

This appears to be fixed by change #18805.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Mar 2, 2003

From @hvds

hv@​crypt.org wrote​:
:"perl-5.8.0@​ton.iguana.be (via RT)" <perlbug-followup@​perl.org> wrote​:
:​:When trying to remove consecutive letters in a string I tried
:​:basically this​:
:​:
:​:perl -wle '$z="A"; $_="QDADEBCAV"; s/(??{print"z=$z";$z})/$z++;"*"/eg; print'
:​:
:​:giving​:
:[...]
:​:QD*DEBC*V
:​:
:​:So even though $z changes, it still only matches on the A. However, if I do
:​:This​:
:​:perl -wle '$z="A"; $_="QDADEBCAV"; s/(??{print"z=$z";~~$z})/$z++;"*"/eg; print'
:​:
:​:giving​:
:[...]
:​:QD*DE**AV
:​:
:​:Which is as expected.
:
:This appears to be fixed by change #18805.

Sorry, ignore that​: it had already been fixed before this change.

Hugo

@p5pRT p5pRT closed this as completed Mar 2, 2003
@p5pRT
Copy link
Author

p5pRT commented Mar 2, 2003

@rspier - Status changed from 'new' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant