Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mixing up- and down-graded strings in regex broken in 5.18.0 #13013

Closed
p5pRT opened this issue Jun 4, 2013 · 13 comments
Closed

Mixing up- and down-graded strings in regex broken in 5.18.0 #13013

p5pRT opened this issue Jun 4, 2013 · 13 comments

Comments

@p5pRT
Copy link

p5pRT commented Jun 4, 2013

Migrated from rt.perl.org#118297 (status was 'resolved')

Searchable as RT118297$

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2013

From Ilmari.Mannsaker@net-a-porter.com

Created by ilmari.mannsaker@net-a-porter.com

$ perl -e 'utf8​::upgrade(my $u = "\x{e5}"); utf8​::downgrade(my $d =
"\x{e5}"); qr{$u $d}'
Malformed UTF-8 character (1 byte, need 3, after start byte 0xe5) in
regexp compilation at -e line 1.
Malformed UTF-8 character (1 byte, need 3, after start byte 0xe5) in
regexp compilation at -e line 1.

$ ../perl/Porting/bisect.pl -j6 --target=miniperl --start=v5.17.11
--end=v5.18.0-RC1 -e '$u = "\x{666}"; $d = "\x{e5}"; $SIG{__WARN__} =
sub { die $_[0] }; qr{$u $d}'

3573854 is the first bad commit
commit 3573854
Author​: David Mitchell <davem@​iabyn.com>
Date​: Mon Apr 15 17​:18​:30 2013 +0100

  Perl_re_op_compile()​: handle utf8 concating better

  When concatting the list of arguments together to form a final pattern
  string, the code formerly did a quick scan of all the args first, and
  if any of them were SvUTF8, it set the (empty) destination string
to UTF8
  before concatting all the individual args. This avoided the pattern
  getting upgraded to utf8 halfway through, and thus the indices for code
  blocks becoming invalid.

  However this was not 100% reliable because, as an "XXX" code comment of
  mine pointed out, when overloading is involved it is possible for
an arg
  to appear initially not to be utf8, but to be utf8 when its value is
  finally accessed. This results an obscure bug (as shown in the test
added
  for this commit), where literal /(?{code})/ still required 'use re
  "eval"'.

  The fix for this is to instead adjust the code block indices on the fly
  if the pattern string happens to get upgraded to utf8. This is easy(er)
  now that we have the new S_pat_upgrade_to_utf8() function.

  As well as fixing the bug, this also simplifies the main concat loop in
  the code, which will make it easier to handle interpolating arrays
(e.g.
  /@​foo/) when we move the interpolation from the join op into the regex
  engine itself shortly.

:100644 100644 f29284632e54afb24df68ec2d0ebfacd8eac5497
f7f309b281a6683815efa0f6d06b5661ffa41b84 M regcomp.c
:040000 040000 27e6c237516a8f9cb3caf0745da433604ab15764
e627a5a459c0bc59d1e0cd8d8f4d837e306d983f M t
bisect run success
That took 321 seconds

Perl Info

Flags:
     category=core
     severity=medium

Site configuration information for perl 5.18.0:

Configured by ilmari at Mon May 20 10:43:21 BST 2013.

Summary of my perl5 (revision 5 version 18 subversion 0) configuration:

   Platform:
     osname=linux, osvers=3.2.0-41-generic, archname=x86_64-linux
     uname='linux zarquon 3.2.0-41-generic #66-ubuntu smp thu apr 25 
03:27:11 utc 2013 x86_64 x86_64 x86_64 gnulinux '
     config_args='-de 
-Dprefix=/home/ilmari/perl5/perlbrew/perls/perl-5.18.0 
-Aeval:scriptdir=/home/ilmari/perl5/perlbrew/perls/perl-5.18.0/bin'
     hint=recommended, useposix=true, d_sigaction=define
     useithreads=undef, usemultiplicity=undef
     useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
     use64bitint=define, use64bitall=define, uselongdouble=undef
     usemymalloc=n, bincompat5005=undef
   Compiler:
     cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector 
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
     optimize='-O2',
     cppflags='-fno-strict-aliasing -pipe -fstack-protector 
-I/usr/local/include'
     ccversion='', gccversion='4.6.3', gccosandvers=''
     intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
     ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', 
lseeksize=8
     alignbytes=8, prototype=define
   Linker and Libraries:
     ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
     libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib 
/usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
     libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
     perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
     libc=, so=so, useshrplib=false, libperl=libperl.a
     gnulibc_version='2.15'
   Dynamic Linking:
     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
     cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib 
-fstack-protector'

Locally applied patches:



@INC for perl 5.18.0:
 
/home/ilmari/perl5/perlbrew/perls/perl-5.18.0/lib/site_perl/5.18.0/x86_64-linux
     /home/ilmari/perl5/perlbrew/perls/perl-5.18.0/lib/site_perl/5.18.0
     /home/ilmari/perl5/perlbrew/perls/perl-5.18.0/lib/5.18.0/x86_64-linux
     /home/ilmari/perl5/perlbrew/perls/perl-5.18.0/lib/5.18.0
     .


Environment for perl 5.18.0:
     HOME=/home/ilmari
     LANG=en_GB.UTF-8
     LANGUAGE=en_GB:en
     LD_LIBRARY_PATH (unset)
     LOGDIR (unset)
 
PATH=/home/ilmari/perl5/perlbrew/bin:/home/ilmari/perl5/perlbrew/perls/perl-5.18.0/bin:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
     PERLBREW_BASHRC_VERSION=0.61
     PERLBREW_CPAN_MIRROR=http://cpanmirror.wtf.nap/
     PERLBREW_HOME=/home/ilmari/.perlbrew
     PERLBREW_MANPATH=/home/ilmari/perl5/perlbrew/perls/perl-5.18.0/man
 
PERLBREW_PATH=/home/ilmari/perl5/perlbrew/bin:/home/ilmari/perl5/perlbrew/perls/perl-5.18.0/bin
     PERLBREW_PERL=perl-5.18.0
     PERLBREW_ROOT=/home/ilmari/perl5/perlbrew
     PERLBREW_VERSION=0.61
     PERL_BADLANG (unset)
     PERL_CPANM_OPT=--mirror=http://cpanmirror.wtf.nap/ --mirror-only
     SHELL=/bin/bash

NET-A-PORTER.COM 
Irresistible fashion at your fingertips



CONFIDENTIALITY NOTICE
The information in this email is confidential and is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, you must not read, use or disseminate the information. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Net-A-Porter Group Limited. 

The Net-A-Porter Group Limited is a company registered in England & Wales Number: 3820604 Registered Office: 1 The Village Offices, Westfield, Ariel Way, London, W12 7GF

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2013

From Ilmari.Mannsaker@net-a-porter.com

More specifically, the error occurs when a variable containing a
downgraded non-ASCII string is interpolated in a regex after an upgraded
one.

This doesn't warn​:

  $ perl -e 'utf8​::upgrade(my $u = "\x{e5}"); utf8​::downgrade(my $d =
"\x{e5}"); qr{$d $u}'

Nor does this​:

  $ perl -e 'utf8​::downgrade(my $d = "\x{e5}"); qr{\x{666} $d}'

--
ilmari

NET-A-PORTER.COM
Irresistible fashion at your fingertips

CONFIDENTIALITY NOTICE
The information in this email is confidential and is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, you must not read, use or disseminate the information. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Net-A-Porter Group Limited.

The Net-A-Porter Group Limited is a company registered in England & Wales Number​: 3820604 Registered Office​: 1 The Village Offices, Westfield, Ariel Way, London, W12 7GF

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2013

From @ilmari

"D. Ilmari Mannsåker" <Ilmari.Mannsaker@​net-a-porter.com> writes​:

More specifically, the error occurs when a variable containing a
downgraded non-ASCII string is interpolated in a regex after an upgraded
one.

Here's a patch​:

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2013

From @ilmari

0001-perl-118297-Fix-interpolating-downgraded-variables-i.patch
From 793b6ee4c6e36858a997bd7b3286b0056b265655 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Tue, 4 Jun 2013 18:15:24 +0100
Subject: [PATCH] [perl #118297] Fix interpolating downgraded variables into
 upgraded regexp

The code alredy upgraded the pattern if interpolating an upgraded
string into it, but not vice versa.  Just use sv_catsv_nomg() instead
of sv_catpvn_nomg(), so that it can upgrade as necessary.
---
 regcomp.c  |  5 ++---
 t/re/pat.t | 15 ++++++++++++++-
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/regcomp.c b/regcomp.c
index 0c0f073..6bd7efd 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -5117,16 +5117,15 @@ S_concat_pat(pTHX_ RExC_state_t * const pRExC_state,
                  *     sv_catsv_nomg(pat, msv);
                  * that allows us to adjust code block indices if
                  * needed */
-                STRLEN slen, dlen;
+                STRLEN dlen;
                 char *dst = SvPV_force_nomg(pat, dlen);
-                const char *src = SvPV_flags_const(msv, slen, 0);
                 orig_patlen = dlen;
                 if (SvUTF8(msv) && !SvUTF8(pat)) {
                     S_pat_upgrade_to_utf8(aTHX_ pRExC_state, &dst, &dlen, n);
                     sv_setpvn(pat, dst, dlen);
                     SvUTF8_on(pat);
                 }
-                sv_catpvn_nomg(pat, src, slen);
+                sv_catsv_nomg(pat, msv);
                 rx = msv;
             }
             else
diff --git a/t/re/pat.t b/t/re/pat.t
index 05bb650..bdfea87 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@@ -20,7 +20,7 @@ BEGIN {
     require './test.pl';
 }
 
-plan tests => 467;  # Update this when adding/deleting tests.
+plan tests => 470;  # Update this when adding/deleting tests.
 
 run_tests() unless caller;
 
@@ -1349,6 +1349,19 @@ EOP
         like("ab", qr/a( ?#foo)b/x);
     }
 
+    { # 118297: Mixing up- and down-graded strings in regex
+        utf8::upgrade(my $u = "\x{e5}");
+        utf8::downgrade(my $d = "\x{e5}");
+        my $warned;
+        local $SIG{__WARN__} = sub { $warned++ if $_[0] =~ /\AMalformed UTF-8/ };
+        my $re = qr/$u$d/;
+        ok(!$warned, "no warnings when interpolating mixed up-/downgraded strings in pattern");
+        my $c = "\x{e5}\x{e5}";
+        utf8::downgrade($c);
+        like($c, $re, "mixed up-/downgraded pattern matches downgraded string");
+        utf8::upgrade($c);
+        like($c, $re, "mixed up-/downgraded pattern matches upgraded string");
+    }
 
 } # End of sub run_tests
 
-- 
1.8.1.2

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2013

From @ilmari

--
"A disappointingly low fraction of the human race is,
at any given time, on fire." - Stig Sandbeck Mathisen

@p5pRT
Copy link
Author

p5pRT commented Jun 4, 2013

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2013

From @cpansprout

On Tue Jun 04 14​:04​:54 2013, ilmari wrote​:

"D. Ilmari Manns�ker" <Ilmari.Mannsaker@​net-a-porter.com> writes​:

More specifically, the error occurs when a variable containing a
downgraded non-ASCII string is interpolated in a regex after an upgraded
one.

Here's a patch​:

Thank you. Applied as b837239.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2013

@cpansprout - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Jun 5, 2013
@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2013

From @ilmari

"Father Chrysostomos via RT" <perlbug-followup@​perl.org> writes​:

On Tue Jun 04 14​:04​:54 2013, ilmari wrote​:

"D. Ilmari Manns�ker" <Ilmari.Mannsaker@​net-a-porter.com> writes​:

More specifically, the error occurs when a variable containing a
downgraded non-ASCII string is interpolated in a regex after an upgraded
one.

Here's a patch​:

Thank you. Applied as b837239.

Thanks. Since this is a regression from 5.16, I reckon it's a candidate
for backporting to maint-5.18.

--
"A disappointingly low fraction of the human race is,
at any given time, on fire." - Stig Sandbeck Mathisen

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2013

From @xdg

On Wed, Jun 5, 2013 at 4​:03 AM, Dagfinn Ilmari Mannsåker
<ilmari@​ilmari.org> wrote​:

Thanks. Since this is a regression from 5.16, I reckon it's a candidate
for backporting to maint-5.18.

+1

--
David Golden <xdg@​xdg.me>
Take back your inbox! → http​://www.bunchmail.com/
Twitter/IRC​: @​xdg

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2013

From @iabyn

On Wed, Jun 05, 2013 at 09​:53​:15AM -0400, David Golden wrote​:

On Wed, Jun 5, 2013 at 4​:03 AM, Dagfinn Ilmari Mannsåker
<ilmari@​ilmari.org> wrote​:

Thanks. Since this is a regression from 5.16, I reckon it's a candidate
for backporting to maint-5.18.

+1

+1 (I created the original bug)

--
"Emacs isn't a bad OS once you get used to it.
It just lacks a decent editor."

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2013

From @xdg

On Wed, Jun 5, 2013 at 11​:49 AM, Dave Mitchell <davem@​iabyn.com> wrote​:

+1 (I created the original bug)

In that case, would you mind doing the backports, then?

Thank you,
David

--
David Golden <xdg@​xdg.me>
Take back your inbox! → http​://www.bunchmail.com/
Twitter/IRC​: @​xdg

@p5pRT
Copy link
Author

p5pRT commented Jun 6, 2013

From @iabyn

On Wed, Jun 05, 2013 at 12​:37​:01PM -0400, David Golden wrote​:

On Wed, Jun 5, 2013 at 11​:49 AM, Dave Mitchell <davem@​iabyn.com> wrote​:

+1 (I created the original bug)

In that case, would you mind doing the backports, then?

Now cherry-picked in maint-5.18 as ae33553

--
More than any other time in history, mankind faces a crossroads. One path
leads to despair and utter hopelessness. The other, to total extinction.
Let us pray we have the wisdom to choose correctly.
  -- Woody Allen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant