[PATCH] Apparent utf8 bug in join() in 5.8.[012] #7023
From @obra

Yeah, I'm still not quite sure I believe it myself, but IO::Scalar

From: Nicholas Adrian Vinen <hb@pandora.x256.com>

Hello,

When you attach a file to a ticket using RT it saves the file you

The problem occurs inside join(). join() recycles string objects

The solution is to apply the following patch to perl (tested with

diff -u perl-5.8.2/doop.c perl-5.8.2-patched/doop.c
--- perl-5.8.2/doop.c 2003-09-30 10:09:51.000000000 -0700
+++ perl-5.8.2-patched/doop.c 2004-01-05 23:23:13.000000000 -0800
@@ -647,6 +647,9 @@
register STRLEN len;
STRLEN delimlen;
STRLEN tmplen;
+ int utf8;
+
+ utf8 = (SvUTF8(del)!=0);
(void) SvPV(del, delimlen); /* stringify and get the delimlen */
/* SvCUR assumes it's SvPOK() and woe betide you if it's not. */
@@ -674,22 +677,37 @@
SvTAINTED_off(sv);
if (items-- > 0) {
- if (*mark)
+ if (*mark) {
+ utf8 += (SvUTF8(*mark)!=0);
sv_catsv(sv, *mark);
+ }
mark++;
}
if (delimlen) {
for (; items > 0; items--,mark++) {
sv_catsv(sv,del);
+ utf8 += (SvUTF8(*mark)!=0);
sv_catsv(sv,*mark);
}
}
else {
- for (; items > 0; items--,mark++)
+ for (; items > 0; items--,mark++) {
+ utf8 += (SvUTF8(*mark)!=0);
sv_catsv(sv,*mark);
+ }
}
SvSETMAGIC(sv);
+ if( utf8 )
+ {
+ if( utf8 != sp-oldmark+1 && ckWARN_d(WARN_UTF8) )
+ {
+ Perl_warner(aTHX_ packWARN(WARN_UTF8), "Joining UTF8 and ASCII strings");
+ }
+ SvUTF8_on(sv);
+ } else {
+ SvUTF8_off(sv);
+ }
}
void
There may be other perl functions with similar problems; this is
beyond the scope of my job. However, I hope that the maintainers of

I hope this helps.

Nicholas

On Tue, Jan 06, 2004 at 01:46:22PM -0500, Jesse Vincent wrote:
I wish I knew! I had a really hard time reading the perl code. Here

PP(pp_join)

PP() looks like this:

#define PP(s) OP * Perl_##s(pTHX)

pTHX looks like this:

#define pTHX register struct perl_thread *thr PERL_UNUSED_DECL

TARG is the string which is being joined into. dTARGET is a macro:

#define dTARGET SV * GETTARGET

GETTARGET looks like this:

#define GETTARGET targ = PAD_SV(PL_op->op_targ)

PAD_SV looks like this:

#define PAD_SV(po) (PL_curpad[po])

PL_curpad looks like this:

#define PL_curpad (*Perl_Tcurpad_ptr(aTHX))

aTHX is defined like this:

#define aTHX thr

I think you can see why I wasn't very specific :( it's a mess...

The simple test case is this: install RT3 on a server with a single

Nicholas

P.S. I tried to cross-post the email I sent onto the rt-devel and
From nick.ing-simmons@elixent.com

Jesse Vincent <perl5-porters@perl.org> writes:
IO::Scalar is (or should be) largely redundant in perl5.8.*

open(my $fh,"+<",\$scalar);
The RT System itself - Status changed from 'new' to 'open'
From @timbunce

On Fri, Jan 09, 2004 at 02:22:03PM +0000, Nick Ing-Simmons wrote:

IO::Scalar may be largely redundant in perl5.8.*, but join() isn't.

Tim.
From @eserte

Jesse Vincent (via RT) <perlbug-followup@perl.org> writes:

Here's a test case:

use strict;

Regards,

--
tksm - Perl/Tk program for searching and replacing in multiple files
From BQW10602@nifty.com
This is perhaps due to SvPOK_only_UTF8() in sv_setpv().

I disagree with warning when UTF8 and ASCII are mixed.

### A patch against perl-5.8.3 RC1

diff -urN perl~/doop.c perl/doop.c
--- perl~/doop.c Fri Dec 19 05:47:58 2003
+++ perl/doop.c Mon Jan 12 10:08:10 2004
@@ -668,6 +668,10 @@
}
sv_setpv(sv, "");
+ /* sv_setpv retains old UTF8ness [perl #24846] */
+ if (SvUTF8(sv))
+ SvUTF8_off(sv);
+
if (PL_tainting && SvMAGICAL(sv))
SvTAINTED_off(sv);
diff -urN perl~/t/op/join.t perl/t/op/join.t
--- perl~/t/op/join.t Sat Dec 30 16:16:18 2000
+++ perl/t/op/join.t Mon Jan 12 10:34:22 2004
@@ -1,6 +1,6 @@
#!./perl
-print "1..14\n";
+print "1..18\n";
@x = (1, 2, 3);
if (join(':',@x) eq '1:2:3') {print "ok 1\n";} else {print "not ok 1\n";}
@@ -65,3 +65,29 @@
print "ok 14\n";
}
+{ # [perl #24846] $jb2 should be in bytes, not in utf8.
+ my $b = "abc\304";
+ my $u = "abc\x{0100}";
+
+ sub join_into_my_variable {
+ my $r = join("", @_);
+ return $r;
+ }
+
+ my $jb1 = join_into_my_variable("", $b);
+ my $ju1 = join_into_my_variable("", $u);
+ my $jb2 = join_into_my_variable("", $b);
+ my $ju2 = join_into_my_variable("", $u);
+
+ print "not " unless unpack('H*', $jb1) eq unpack('H*', $b);
+ print "ok 15\n";
+
+ print "not " unless unpack('H*', $ju1) eq unpack('H*', $u);
+ print "ok 16\n";
+
+ print "not " unless unpack('H*', $jb2) eq unpack('H*', $b);
+ print "ok 17\n";
+
+ print "not " unless unpack('H*', $ju2) eq unpack('H*', $u);
+ print "ok 18\n";
+}
Regards
From @rgs

SADAHIRO Tomoyuki wrote:

So do I.

Thanks, applied to bleadperl as #22117.
@rgs - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#24846 (status was 'resolved')
Searchable as RT24846$