Length-caching bug in utf8::decode #10873
From @ikegami
Created by @ikegami
There's a length caching bug in utf8::decode.
----- BEGIN TEST CODE -----
use Test::More tests => 8;
{ # Baseline.
----- BEGIN TEST OUTPUT -----
davem says it's still present in blead.
Perl Info
|
From @ikegami
On Fri, Dec 3, 2010 at 1:41 PM, Eric Brine <perlbug-followup@perl.org> wrote:
I don't know how to invalidate the cache, but it should be done after |
From @tonycoz
On Fri, Dec 03, 2010 at 07:22:34PM -0500, Eric Brine wrote:
Looks like: MAGIC *mg = NULL; except that it adds the magic if it wasn't already there. Maybe: MAGIC *mg = mg_find(sv, PERL_MAGIC_utf8); Looks like the pos() cache should also be updated/cleared. Tony |
The RT System itself - Status changed from 'new' to 'open' |
From @tonycoz
On Mon, Dec 06, 2010 at 04:50:52PM +1100, Tony Cook wrote:
Actually, looking at how this is all implemented, the solution could SvSETMAGIC(sv); since Perl_magic_setutf8() clears the saved length and pos cache. Putting that in XS_utf8_decode() in universal.c would be safest in Tony |
From @Leont
On Mon, Jan 10, 2011 at 1:54 PM, Tony Cook <tony@develop-help.com> wrote:
Adding a sv_utf8_decode_flags() may mitigate that a little, though we Leon |
From @tonycoz
On Mon, Jan 10, 2011 at 11:54:01PM +1100, Tony Cook wrote:
Unfortunately while adding SvSETMAGIC() to the XS fixes the saved Tony |
From @tonycoz
0001-perl-80190-utf8-.-calls-weren-t-calling-set-magic-co.patch
From 3949878860f4fa25898f4f5f5d2b0fb46fd4a074 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Sun, 23 Jan 2011 19:16:55 +1100
Subject: [PATCH] [perl #80190] utf8::...() calls weren't calling set magic correctly
---
MANIFEST | 1 +
t/uni/length.t | 29 +++++++++++++++++++++++++++++
universal.c | 4 ++++
3 files changed, 34 insertions(+), 0 deletions(-)
create mode 100644 t/uni/length.t
diff --git a/MANIFEST b/MANIFEST
index 05ec676..cbb24b6 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -5031,6 +5031,7 @@ t/uni/class.t See if Unicode classes work (\p)
t/uni/fold.t See if Unicode folding works
t/uni/greek.t See if Unicode in greek works
t/uni/latin2.t See if Unicode in latin2 works
+t/uni/length.t Test length caching around various utf8::*()
t/uni/lex_utf8.t See if Unicode in lexer works
t/uni/lower.t See if Unicode casing works
t/uni/overload.t See if Unicode overloading works
diff --git a/t/uni/length.t b/t/uni/length.t
new file mode 100644
index 0000000..a3bab41
--- /dev/null
+++ b/t/uni/length.t
@@ -0,0 +1,29 @@
+BEGIN {
+ chdir 't' if -d 't';
+ @INC = qw(../lib .);
+ require "test.pl";
+}
+
+$|=1;
+use Devel::Peek;
+plan tests => 14;
+# from RT #80190
+{ # Baseline.
+ my $s = "\xE8\xAB\x86\x0A";
+ utf8::downgrade($s); is(length($s), 4); is($s, "\xE8\xAB\x86\x0A");
+ utf8::decode($s); is(length($s), 2); is($s, "\x{8AC6}\n");
+}
+{ # Check for length-caching bug.
+ my $s = "\xE8\xAB\x86\x0A";
+ utf8::upgrade($s); is(length($s), 4); is($s, "\xE8\xAB\x86\x0A");
+ utf8::decode($s); is(length($s), 2); is($s, "\x{8AC6}\n");
+}
+
+# some other cases
+{
+ my $s = "\x{8AC6}\x0A";
+ is(length($s), 2); is($s, "\x{8AC6}\x0A");
+ utf8::encode($s); is(length($s), 4); is($s, "\xE8\xAB\x86\x0A");
+ utf8::upgrade($s); is(length($s), 4); is($s, "\xE8\xAB\x86\x0A");
+}
+
diff --git a/universal.c b/universal.c
index 07bbe96..f149f74 100644
--- a/universal.c
+++ b/universal.c
@@ -684,6 +684,7 @@ XS(XS_utf8_encode)
if (items != 1)
croak_xs_usage(cv, "sv");
sv_utf8_encode(ST(0));
+ SvSETMAGIC(ST(0));
XSRETURN_EMPTY;
}
@@ -696,6 +697,7 @@ XS(XS_utf8_decode)
else {
SV * const sv = ST(0);
const bool RETVAL = sv_utf8_decode(sv);
+ SvSETMAGIC(sv);
ST(0) = boolSV(RETVAL);
sv_2mortal(ST(0));
}
@@ -714,6 +716,7 @@ XS(XS_utf8_upgrade)
dXSTARG;
RETVAL = sv_utf8_upgrade(sv);
+ SvSETMAGIC(sv);
XSprePUSH; PUSHi((IV)RETVAL);
}
XSRETURN(1);
@@ -730,6 +733,7 @@ XS(XS_utf8_downgrade)
const bool failok = (items < 2) ? 0 : (int)SvIV(ST(1));
const bool RETVAL = sv_utf8_downgrade(sv, failok);
+ SvSETMAGIC(sv);
ST(0) = boolSV(RETVAL);
sv_2mortal(ST(0));
}
--
1.7.1
|
From @iabyn
On Wed, Jan 26, 2011 at 08:50:12PM +1100, Tony Cook wrote:
I've now fixed it (and pos issues too) with this commit; commit 75da9d4 reset pos and utf8 cache when de/encoding utf8 str M lib/utf8.t -- |
@iabyn - Status changed from 'open' to 'resolved' |
From @cpansprout
On Sat Mar 19 12:45:44 2011, davem wrote:
The problem with that approach is that it still leaves us with And for pos() to survive a modification to the scalar makes utf8::decode I think a more correct solution is to preserve taint magic explicitly I will do that soon if nobody objects. -- Father Chrysostomos |
From @cpansprout
On Fri Sep 28 09:49:35 2012, sprout wrote:
Done with 77fc86e. -- Father Chrysostomos |
Migrated from rt.perl.org#80190 (status was 'resolved')
Searchable as RT80190$