Mixing up- and down-graded strings in regex broken in 5.18.0 #13013
From Ilmari.Mannsaker@net-a-porter.com
Created by ilmari.mannsaker@net-a-porter.com
$ perl -e 'utf8::upgrade(my $u = "\x{e5}"); utf8::downgrade(my $d =
$ ../perl/Porting/bisect.pl -j6 --target=miniperl --start=v5.17.11
3573854 is the first bad commit
Perl_re_op_compile(): handle utf8 concating better
When concatting the list of arguments together to form a final pattern
However this was not 100% reliable because, as an "XXX" code comment of
The fix for this is to instead adjust the code block indices on the fly
As well as fixing the bug, this also simplifies the main concat loop in
:100644 100644 f29284632e54afb24df68ec2d0ebfacd8eac5497
|
From Ilmari.Mannsaker@net-a-porter.com
More specifically, the error occurs when a variable containing a
This doesn't warn:
$ perl -e 'utf8::upgrade(my $u = "\x{e5}"); utf8::downgrade(my $d =
Nor does this:
$ perl -e 'utf8::downgrade(my $d = "\x{e5}"); qr{\x{666} $d}'
|
From @ilmari
"D. Ilmari Mannsåker" <Ilmari.Mannsaker@net-a-porter.com> writes:
Here's a patch:
|
From @ilmari
0001-perl-118297-Fix-interpolating-downgraded-variables-i.patch
From 793b6ee4c6e36858a997bd7b3286b0056b265655 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Tue, 4 Jun 2013 18:15:24 +0100
Subject: [PATCH] [perl #118297] Fix interpolating downgraded variables into
upgraded regexp
The code already upgraded the pattern if interpolating an upgraded
string into it, but not vice versa. Just use sv_catsv_nomg() instead
of sv_catpvn_nomg(), so that it can upgrade as necessary.
---
regcomp.c | 5 ++---
t/re/pat.t | 15 ++++++++++++++-
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/regcomp.c b/regcomp.c
index 0c0f073..6bd7efd 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -5117,16 +5117,15 @@ S_concat_pat(pTHX_ RExC_state_t * const pRExC_state,
* sv_catsv_nomg(pat, msv);
* that allows us to adjust code block indices if
* needed */
- STRLEN slen, dlen;
+ STRLEN dlen;
char *dst = SvPV_force_nomg(pat, dlen);
- const char *src = SvPV_flags_const(msv, slen, 0);
orig_patlen = dlen;
if (SvUTF8(msv) && !SvUTF8(pat)) {
S_pat_upgrade_to_utf8(aTHX_ pRExC_state, &dst, &dlen, n);
sv_setpvn(pat, dst, dlen);
SvUTF8_on(pat);
}
- sv_catpvn_nomg(pat, src, slen);
+ sv_catsv_nomg(pat, msv);
rx = msv;
}
else
diff --git a/t/re/pat.t b/t/re/pat.t
index 05bb650..bdfea87 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@@ -20,7 +20,7 @@ BEGIN {
require './test.pl';
}
-plan tests => 467; # Update this when adding/deleting tests.
+plan tests => 470; # Update this when adding/deleting tests.
run_tests() unless caller;
@@ -1349,6 +1349,19 @@ EOP
like("ab", qr/a( ?#foo)b/x);
}
+ { # 118297: Mixing up- and down-graded strings in regex
+ utf8::upgrade(my $u = "\x{e5}");
+ utf8::downgrade(my $d = "\x{e5}");
+ my $warned;
+ local $SIG{__WARN__} = sub { $warned++ if $_[0] =~ /\AMalformed UTF-8/ };
+ my $re = qr/$u$d/;
+ ok(!$warned, "no warnings when interpolating mixed up-/downgraded strings in pattern");
+ my $c = "\x{e5}\x{e5}";
+ utf8::downgrade($c);
+ like($c, $re, "mixed up-/downgraded pattern matches downgraded string");
+ utf8::upgrade($c);
+ like($c, $re, "mixed up-/downgraded pattern matches upgraded string");
+ }
} # End of sub run_tests
--
1.8.1.2
|
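For anyone following along without a perl build tree, here is a minimal sketch at the Perl level of what the patch description above is getting at. It is an illustration only, not the C code in regcomp.c: it imitates the pre-patch behaviour by appending the raw byte of a downgraded string to the UTF-8 bytes of an upgraded one, then compares that with ordinary concatenation, which upgrades as needed. The variable names ($raw, $good and so on) are made up for this example.

```perl
use strict;
use warnings;

# The same character (U+00E5) in Perl's two internal representations.
my $u = "\x{e5}"; utf8::upgrade($u);     # stored as bytes C3 A5, UTF8 flag on
my $d = "\x{e5}"; utf8::downgrade($d);   # stored as byte E5, UTF8 flag off

# Imitate the buggy concat: take the UTF-8 bytes of $u and append the
# raw byte of $d without converting it first.
my $raw = $u;
utf8::encode($raw);                      # $raw now holds the octets C3 A5
$raw .= $d;                              # octets C3 A5 E5
# utf8::decode returns false when its argument is not well-formed UTF-8.
my $raw_is_valid = utf8::decode(my $copy = $raw) ? 1 : 0;

# Character-aware concatenation: Perl upgrades $d as necessary, which is
# the behaviour the patch gets by calling sv_catsv_nomg().
my $good     = $u . $d;
my $good_len = length $good;

# On a perl with the fix, interpolating both into a pattern just works.
my $re      = qr/$u$d/;
my $matches = ("\x{e5}\x{e5}" =~ $re) ? 1 : 0;

print "raw append is well-formed UTF-8: ", ($raw_is_valid ? "yes" : "no"), "\n";
print "length of normal concatenation: $good_len\n";
print "pattern matches: ", ($matches ? "yes" : "no"), "\n";
```

On an affected 5.18.0 the qr// line is where the "Malformed UTF-8" warning from the report would be expected; the rest of the script does not depend on the fix.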
The RT System itself - Status changed from 'new' to 'open' |
From @cpansprout
On Tue Jun 04 14:04:54 2013, ilmari wrote:
Thank you. Applied as b837239.
--
Father Chrysostomos
|
@cpansprout - Status changed from 'open' to 'resolved' |
From @ilmari
"Father Chrysostomos via RT" <perlbug-followup@perl.org> writes:
Thanks. Since this is a regression from 5.16, I reckon it's a candidate
|
From @xdg
On Wed, Jun 5, 2013 at 4:03 AM, Dagfinn Ilmari Mannsåker
+1
|
From @iabyn
On Wed, Jun 05, 2013 at 09:53:15AM -0400, David Golden wrote:
+1 (I created the original bug)
|
From @xdg
On Wed, Jun 5, 2013 at 11:49 AM, Dave Mitchell <davem@iabyn.com> wrote:
In that case, would you mind doing the backports, then? Thank you,
|
From @iabyn
On Wed, Jun 05, 2013 at 12:37:01PM -0400, David Golden wrote:
Now cherry-picked in maint-5.18 as ae33553
|
Migrated from rt.perl.org#118297 (status was 'resolved')
Searchable as RT118297