Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bleadperl v5.13.10-341-gcf34198 breaks NWIGER/CGI-FormBuilder-3.0501.tgz #11189

Closed
p5pRT opened this issue Mar 12, 2011 · 11 comments
Closed

Bleadperl v5.13.10-341-gcf34198 breaks NWIGER/CGI-FormBuilder-3.0501.tgz #11189

p5pRT opened this issue Mar 12, 2011 · 11 comments

Comments

@p5pRT
Copy link

p5pRT commented Mar 12, 2011

Migrated from rt.perl.org#85964 (status was 'resolved')

Searchable as RT85964$

@p5pRT
Copy link
Author

p5pRT commented Mar 12, 2011

From @andk

git bisect​:


  cf34198 is the first bad commit
  commit cf34198
  Author​: Karl Williamson <public@​khwilliamson.com>
  Date​: Tue Mar 8 15​:28​:05 2011 -0700

  regcomp.c​: Change start class init for /l

  Before /l was added, locale only applied to regular expressions as a
  whole. Now it can be subsections, so the flag for allowing it should
  be treated as any other flag.

example fail report​:


  cpantesters do not have a fail report yet. Excerpt of test output from

  Note​: PERL_TEST_DIFF="diff -u" make test TEST_FILES=t/1a-generate.t

  #
  # diff -u test-exp-MY0f1 test-got-Eux3C
  # --- test-exp-MY0f1 2011-03-12 08​:49​:09.000000000 +0100
  # +++ test-got-Eux3C 2011-03-12 08​:49​:09.000000000 +0100
  # @​@​ -6,7 +6,7 @​@​
  #
  # // email​: standard text, hidden, password, or textarea box
  # var email = form.elements['email'].value;
  # - if (email != null && email != "" && ! email.match(/^[\w\-\+\._]+\@​[a-zA-Z0-9][-a-zA-Z0-9\.]*\.[a-zA-Z]+$/)) {
  # + if (email != null && email != "" && email /^[\w\-\+\._]+\@​[a-zA-Z0-9][-a-zA-Z0-9\.]*\.[a-zA-Z]+$/) {
  # alertstr += '- Invalid entry for the "Email" field\n';
  # invalid++;
  # invalid_fields.push('email');
  # t/1a-generate.t line 590 is​: my $ok = ok($ren, $out);

perl -V​:


  Summary of my perl5 (revision 5 version 13 subversion 10) configuration​:
  Commit id​: cf34198
  Platform​:
  osname=linux, osvers=2.6.32-5-amd64, archname=x86_64-linux
  uname='linux k81 2.6.32-5-amd64 #1 smp wed jan 12 03​:40​:32 utc 2011 x86_64 gnulinux '
  config_args='-Dprefix=/home/src/perl/repoperls/installed-perls/perl/v5.13.10-341-gcf34198 -Dinstallusrbinperl=n -Uversiononly -Dusedevel -des -Ui_db -Uuseithreads -Uuselongdouble'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=undef, usemultiplicity=undef
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-O2',
  cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.4.5', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
  libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
  libc=/lib/libc-2.11.2.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.11.2'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

  Characteristics of this binary (from libperl)​:
  Compile-time options​: PERL_DONT_CREATE_GVSV PERL_MALLOC_WRAP PERL_USE_DEVEL
  USE_64_BIT_ALL USE_64_BIT_INT USE_LARGE_FILES
  USE_PERLIO USE_PERL_ATOF
  Built under linux
  Compiled at Mar 12 2011 07​:46​:31
  @​INC​:
  /home/src/perl/repoperls/installed-perls/perl/v5.13.10-341-gcf34198/lib/site_perl/5.13.10/x86_64-linux
  /home/src/perl/repoperls/installed-perls/perl/v5.13.10-341-gcf34198/lib/site_perl/5.13.10
  /home/src/perl/repoperls/installed-perls/perl/v5.13.10-341-gcf34198/lib/5.13.10/x86_64-linux
  /home/src/perl/repoperls/installed-perls/perl/v5.13.10-341-gcf34198/lib/5.13.10
  .

--
andreas

@p5pRT
Copy link
Author

p5pRT commented Mar 15, 2011

From @andk

On Fri, 11 Mar 2011 23​:52​:36 -0800, perlbug-followup@​perl.org said​:

  > http​://rt.perl.org/rt3/Ticket/Display.html?id=85964

More candidates affected by this change​:

VOJ/DAIA-0.31.tar.gz
DROLSKY/DateTime-Format-Builder-0.80.tar.gz

--
andreas

@p5pRT
Copy link
Author

p5pRT commented Mar 15, 2011

From @demerphq

On 15 March 2011 08​:46, Andreas J. Koenig
<andreas.koenig.7os6VVqR@​franz.ak.mind.de> wrote​:

On Fri, 11 Mar 2011 23​:52​:36 -0800, perlbug-followup@​perl.org said​:

 > http​://rt.perl.org/rt3/Ticket/Display.html?id=85964

More candidates affected by this change​:

VOJ/DAIA-0.31.tar.gz
DROLSKY/DateTime-Format-Builder-0.80.tar.gz

This can be reduced to the following one liner in blead

$ ./perl -Ilib -Mre=debug -le'$s =
q{/^[\\w\\-\\+\\._]+\\@​[a-zA-Z0-9][-a-zA-Z0-9\.]*\\.[a-zA-Z]+$/};
print for "Length​:".length($s), "Matched​:".[0,1]->[$s =~
m#^m?(\S)(.*)\1$#]'
Compiling REx "^m?(\S)(.*)\1$"
synthetic stclass "ANYOF{loc}{i}[m][{non-utf8-latin1-all}]".
Final program​:
  1​: BOL (2)
  2​: CURLY {0,1} (6)
  4​: EXACT <m> (0)
  6​: OPEN1 (8)
  8​: NSPACE (9)
  9​: CLOSE1 (11)
  11​: OPEN2 (13)
  13​: STAR (15)
  14​: REG_ANY (0)
  15​: CLOSE2 (17)
  17​: REF1 (19)
  19​: EOL (20)
  20​: END (0)
floating ""$ at 1..2147483647 (checking floating) stclass
ANYOF{loc}{i}[m][{non-utf8-latin1-all}] anchored(BOL) minlen 1
Guessing start of match in sv for REx "^m?(\S)(.*)\1$" against
"/^[\w\-\+\._]+\@​[a-zA-Z0-9][-a-zA-Z0-9\.]*\.[a-zA-Z]+$/"
Found floating substr ""$ at offset 55...
start_shift​: 1 check_at​: 55 s​: 0 endpos​: 55
Could not match STCLASS...
Match rejected by optimizer
Length​:55
Matched​:0
Freeing REx​: "^m?(\S)(.*)\1$"

For some reason Karls patch has changed the STCLASS to be
ANYOF{loc}{i}[m][{non-utf8-latin1-all}] when it should be simply
NSPACE, or an anyof equivalent. (the charset for NSPACE union the
charset of [m] => NSPACE)

Also it looks like the class is somehow case insensitive, which doesnt
make sense, this pattern doesnt use the /i modifier (perhaps I am
misreading the dump tho).

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Mar 15, 2011

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Mar 15, 2011

From @demerphq

On 15 March 2011 09​:57, demerphq <demerphq@​gmail.com> wrote​:

On 15 March 2011 08​:46, Andreas J. Koenig
<andreas.koenig.7os6VVqR@​franz.ak.mind.de> wrote​:

On Fri, 11 Mar 2011 23​:52​:36 -0800, perlbug-followup@​perl.org said​:

 > http​://rt.perl.org/rt3/Ticket/Display.html?id=85964

More candidates affected by this change​:

VOJ/DAIA-0.31.tar.gz
DROLSKY/DateTime-Format-Builder-0.80.tar.gz

This can be reduced to the following one liner in blead

$ ./perl -Ilib -Mre=debug -le'$s =
q{/^[\\w\\-\\+\\._]+\\@​[a-zA-Z0-9][-a-zA-Z0-9\.]*\\.[a-zA-Z]+$/};
print for "Length​:".length($s), "Matched​:".[0,1]->[$s =~
m#^m?(\S)(.*)\1$#]'
Compiling REx "^m?(\S)(.*)\1$"
synthetic stclass "ANYOF{loc}{i}[m][{non-utf8-latin1-all}]".
Final program​:
  1​: BOL (2)
  2​: CURLY {0,1} (6)
  4​:   EXACT <m> (0)
  6​: OPEN1 (8)
  8​:   NSPACE (9)
  9​: CLOSE1 (11)
 11​: OPEN2 (13)
 13​:   STAR (15)
 14​:     REG_ANY (0)
 15​: CLOSE2 (17)
 17​: REF1 (19)
 19​: EOL (20)
 20​: END (0)
floating ""$ at 1..2147483647 (checking floating) stclass
ANYOF{loc}{i}[m][{non-utf8-latin1-all}] anchored(BOL) minlen 1
Guessing start of match in sv for REx "^m?(\S)(.*)\1$" against
"/^[\w\-\+\._]+\@​[a-zA-Z0-9][-a-zA-Z0-9\.]*\.[a-zA-Z]+$/"
Found floating substr ""$ at offset 55...
start_shift​: 1 check_at​: 55 s​: 0 endpos​: 55
Could not match STCLASS...
Match rejected by optimizer
Length​:55
Matched​:0
Freeing REx​: "^m?(\S)(.*)\1$"

For some reason Karls patch has changed the STCLASS to be
ANYOF{loc}{i}[m][{non-utf8-latin1-all}] when it should be simply
NSPACE, or an anyof equivalent. (the charset for NSPACE union the
charset of [m] => NSPACE)

Also it looks like the class is somehow case insensitive, which doesnt
make sense, this pattern doesnt use the /i modifier (perhaps I am
misreading the dump tho).

And this is what it used to look like​:

$ perl -mre=debug -le'$s =
q{/^[\\w\\-\\+\\._]+\\@​[a-zA-Z0-9][-a-zA-Z0-9\.]*\\.[a-zA-Z]+$/};
print for "Length​:".length($s), "Matched​:".[0,1]->[$s =~
m#^m?(\S)(.*)\1$#]'
Compiling REx "^m?(\S)(.*)\1$"
synthetic stclass "ANYOF[\0-\10\13\16-\37!-\377][]".
Final program​:
  1​: BOL (2)
  2​: CURLY {0,1} (6)
  4​: EXACT <m> (0)
  6​: OPEN1 (8)
  8​: NSPACE (9)
  9​: CLOSE1 (11)
  11​: OPEN2 (13)
  13​: STAR (15)
  14​: REG_ANY (0)
  15​: CLOSE2 (17)
  17​: REF1 (19)
  19​: EOL (20)
  20​: END (0)
floating ""$ at 1..2147483647 (checking floating) stclass
ANYOF[\0-\10\13\16-\37!-\377][] anchored(BOL) minlen 1
Guessing start of match in sv for REx "^m?(\S)(.*)\1$" against
"/^[\w\-\+\._]+\@​[a-zA-Z0-9][-a-zA-Z0-9\.]*\.[a-zA-Z]+$/"
Found floating substr ""$ at offset 55...
start_shift​: 1 check_at​: 55 s​: 0 endpos​: 55
Does not contradict STCLASS...
Guessed​: match at offset 0
Matching REx "^m?(\S)(.*)\1$" against
"/^[\w\-\+\._]+\@​[a-zA-Z0-9][-a-zA-Z0-9\.]*\.[a-zA-Z]+$/"
  0 <> </^[\w\-\+\> | 1​:BOL(2)
  0 <> </^[\w\-\+\> | 2​:CURLY {0,1}(6)
  EXACT <m> can match 0 times out of 1...
  0 <> </^[\w\-\+\> | 6​: OPEN1(8)
  0 <> </^[\w\-\+\> | 8​: NSPACE(9)
  1 </> <^[\w\-\+\.> | 9​: CLOSE1(11)
  1 </> <^[\w\-\+\.> | 11​: OPEN2(13)
  1 </> <^[\w\-\+\.> | 13​: STAR(15)
  REG_ANY can match 54 times out of
2147483647...
  55 <.[a-zA-Z]+$/> <> | 15​: CLOSE2(17)
  55 <.[a-zA-Z]+$/> <> | 17​: REF1(19)
  failed...
  54 <.[a-zA-Z]+$> </> | 15​: CLOSE2(17)
  54 <.[a-zA-Z]+$> </> | 17​: REF1(19)
  55 <.[a-zA-Z]+$/> <> | 19​: EOL(20)
  55 <.[a-zA-Z]+$/> <> | 20​: END(0)
Match successful!
Length​:55
Matched​:1
Freeing REx​: "^m?(\S)(.*)\1$"

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Mar 15, 2011

From @demerphq

The offending patch looks like this​:

commit cf34198
Author​: Karl Williamson <public@​khwilliamson.com>
Date​: Tue Mar 8 15​:28​:05 2011 -0700

  regcomp.c​: Change start class init for /l

  Before /l was added, locale only applied to regular expressions as a
  whole. Now it can be subsections, so the flag for allowing it
  should be treated as any other flag.

Inline Patch
diff --git a/regcomp.c b/regcomp.c
index 53047ee..18079ff 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -726,9 +726,7 @@ S_cl_anything(const RExC_state_t *pRExC_state,
struct regnode_charclass_class *c

  ANYOF_CLASS_ZERO(cl);
  ANYOF_BITMAP_SETALL(cl);
- cl->flags =
ANYOF_EOS|ANYOF_UNICODE_ALL|ANYOF_LOC_NONBITMAP_FOLD|ANYOF_NON_UTF8_LATIN1_ALL;
- if (LOC)
- cl->flags |= ANYOF_LOCALE;
+ cl->flags =
ANYOF_EOS|ANYOF_UNICODE_ALL|ANYOF_LOC_NONBITMAP_FOLD|ANYOF_NON_UTF8_LATIN1_ALL|ANYOF_LOCALE;
}

/* Can match anything (initialization) */
@​@​ -768,8 +766,6 @​@​ S_cl_init_zero(const RExC_state_t *pRExC_state,
struct regnode_charclass_class *
  Zero(cl, 1, struct regnode_charclass_class);
  cl->type = ANYOF;
  cl_anything(pRExC_state, cl);
- if (LOC)
- cl->flags |= ANYOF_LOCALE;
}

/* 'And' a given class with another one. Can create false positives */

Unfortunately however it doesnt revert cleanly, and when i try to
revert it by hand lots of stuff fails.

However some observations​: adding ANYOF_LOCALE to this bitmap doesnt
make sense to me. In fact, generating a start class at compile time
under use locale doesn't make sense to me.

Consider that the *meaning* of SPACE/NSPACE under use locale has to be
determined at RUN TIME. So we cannot do any kind of compile time
charclass set operations using them. The best we can do is turn on or
off flags like "Matches a Space", "Matches a wordchar", etc. (Which is
what i thought used to happen under use locale).

Looking into things, it seems that this code under use locale would
fail going back as far as 5.8.5. This is what I mean by bitrot of the
regex engine under use locale. Here is another case where the regex
engine has been completely broken nearly forever under use locale, and
nobody has noticed. IMO a really good argument to get rid of use
locale. Using it is dangerous.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Mar 15, 2011

From @khwilliamson

On 03/15/2011 03​:41 AM, demerphq wrote​:

The offending patch looks like this​:

commit cf34198
Author​: Karl Williamson<public@​khwilliamson.com>
Date​: Tue Mar 8 15​:28​:05 2011 -0700

 regcomp\.c&#8203;: Change start class init for /l

 Before /l was added\, locale only applied to regular expressions as a
 whole\.  Now it can be subsections\, so the flag for allowing it
 should be treated as any other flag\.

diff --git a/regcomp.c b/regcomp.c
index 53047ee..18079ff 100644
--- a/regcomp.c
+++ b/regcomp.c
@​@​ -726,9 +726,7 @​@​ S_cl_anything(const RExC_state_t *pRExC_state,
struct regnode_charclass_class *c

  ANYOF\_CLASS\_ZERO\(cl\);
  ANYOF\_BITMAP\_SETALL\(cl\);

- cl->flags =
ANYOF_EOS|ANYOF_UNICODE_ALL|ANYOF_LOC_NONBITMAP_FOLD|ANYOF_NON_UTF8_LATIN1_ALL;
- if (LOC)
- cl->flags |= ANYOF_LOCALE;
+ cl->flags =
ANYOF_EOS|ANYOF_UNICODE_ALL|ANYOF_LOC_NONBITMAP_FOLD|ANYOF_NON_UTF8_LATIN1_ALL|ANYOF_LOCALE;
}

/* Can match anything (initialization) */
@​@​ -768,8 +766,6 @​@​ S_cl_init_zero(const RExC_state_t *pRExC_state,
struct regnode_charclass_class *
Zero(cl, 1, struct regnode_charclass_class);
cl->type = ANYOF;
cl_anything(pRExC_state, cl);
- if (LOC)
- cl->flags |= ANYOF_LOCALE;
}

/* 'And' a given class with another one. Can create false positives */

Unfortunately however it doesnt revert cleanly, and when i try to
revert it by hand lots of stuff fails.

However some observations​: adding ANYOF_LOCALE to this bitmap doesnt
make sense to me. In fact, generating a start class at compile time
under use locale doesn't make sense to me.

Consider that the *meaning* of SPACE/NSPACE under use locale has to be
determined at RUN TIME. So we cannot do any kind of compile time
charclass set operations using them. The best we can do is turn on or
off flags like "Matches a Space", "Matches a wordchar", etc. (Which is
what i thought used to happen under use locale).

Looking into things, it seems that this code under use locale would
fail going back as far as 5.8.5. This is what I mean by bitrot of the
regex engine under use locale. Here is another case where the regex
engine has been completely broken nearly forever under use locale, and
nobody has noticed. IMO a really good argument to get rid of use
locale. Using it is dangerous.

cheers,
Yves

I stayed up late last night, and now have a solution to this, which I
will submit after breakfast

@p5pRT
Copy link
Author

p5pRT commented Mar 15, 2011

From @nwc10

On Tue, Mar 15, 2011 at 07​:17​:51AM -0600, Karl Williamson wrote​:

I stayed up late last night, and now have a solution to this, which I
will submit after breakfast

This makes me think of​:

  Grumble, grumble. What's with all this sleep and eat stuff? I never
  remember Jarkko having to sleep or eat.

(Yitzchak Scott-Thoennes in <20031031163808.GA708@​efn.org> and intended to
have the same backhanded compliment as his original)

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Mar 15, 2011

From @khwilliamson

On 03/15/2011 07​:53 AM, Nicholas Clark wrote​:

On Tue, Mar 15, 2011 at 07​:17​:51AM -0600, Karl Williamson wrote​:

I stayed up late last night, and now have a solution to this, which I
will submit after breakfast

This makes me think of​:

 Grumble\, grumble\.  What's with all this sleep and eat stuff?  I never
 remember Jarkko having to sleep or eat\.

(Yitzchak Scott-Thoennes in<20031031163808.GA708@​efn.org> and intended to
have the same backhanded compliment as his original)

Nicholas Clark

For me, at my age, staying up late means midnight. Anyway, I chatted
with Yves about this, and he got interrupted before we completely went
through it, and now I have to leave, so I am deferring the patch to
later in the day.

@p5pRT
Copy link
Author

p5pRT commented Mar 17, 2011

From @khwilliamson

The commit message explains this patch, which was delayed from my
earlier announced time because of a plumbing emergency.

I haven't tested the other modules that Andreas said might be broken
because of the earlier commit.

--Karl Williamson

@p5pRT
Copy link
Author

p5pRT commented Mar 17, 2011

@khwilliamson - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant