chop fails on decoded string with trailing nul #8021
From jonathan-hankins@mindspring.com

This is a bug report for perl from jonathan-hankins@mindspring.com,

I ran into this, and wondered if it is a bug. I have tested on perl 5.8.4 with Encode.pm version 1.99_01 (from

Basically, if I take a string with a trailing nul, encode it (to any

Am I missing something? Here is sample output from my test code below:

--
@asc2 (untouched) before chop
@asc (en/de-coded) after chop
@asc2 (untouched) after chop

And here is my code:

--
use strict;
$Data::Dumper::Useqq = 1;
my @asc = ("hello, world!\n", "goodbye, cruel world!\0");
my @utf = (encode('UTF-16LE', $asc[0]),
@asc = (decode('UTF-16LE', $utf[0]),
print "\n\n";

--
Jonathan Hankins
Homewood City Schools
jhankins@homewood.k12.al.us
Looks like a bug to me. At first glance, I'd describe it as a case where

Here is another demonstration:

#!/usr/bin/perl
use Encode;
$_ = "foo\0";
$_ = decode( 'ascii', "foo\0" );
__END__

For me (macosx 10.3.9/darwin 7.9/perl 5.8.1, and freebsd 5.4/perl 5.8.6),

David Graff
Linguistic Data Consortium

--=-=-=
Flags:
Site configuration information for perl v5.8.4:
Configured by Debian Project at Tue Mar 8 20:31:23 EST 2005.
Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
Locally applied patches:
@INC for perl v5.8.4:
Environment for perl v5.8.4:
From BQW10602@nifty.com
Thanks for the report.

utf8_to_uvchr((U8*)s, 0) used in do_chop() returns 0,

P.S. by the way, when the string in utf8 ends with malformed

SADAHIRO Tomoyuki

Inline Patch
diff -ur perl~/doop.c perl/doop.c
--- perl~/doop.c Mon Jul 11 04:49:52 2005
+++ perl/doop.c Sat Jul 16 21:53:44 2005
@@ -977,7 +977,7 @@
s = send - 1;
while (s > start && UTF8_IS_CONTINUATION(*s))
s--;
- if (utf8_to_uvchr((U8*)s, 0)) {
+ if (is_utf8_string((U8*)s, send - s)) {
sv_setpvn(astr, s, send - s);
*s = '\0';
SvCUR_set(sv, s - start);
diff -ur perl~/t/op/chop.t perl/t/op/chop.t
--- perl~/t/op/chop.t Fri Jan 23 23:19:45 2004
+++ perl/t/op/chop.t Sat Jul 16 20:59:16 2005
@@ -6,7 +6,7 @@
require './test.pl';
}
-plan tests => 133;
+plan tests => 137;
$_ = 'abc';
$c = do foo();
@@ -221,4 +221,14 @@
$a = "A$/";
$b = chomp $a;
is ($b, 2);
+}
+
+{
+ # [perl #36569] chop fails on decoded string with trailing nul
+ my $asc = "perl\0";
+ my $utf = "perl".pack('U',0); # marked as utf8
+ is(chop($asc), "\0", "chopping ascii NUL");
+ is(chop($utf), "\0", "chopping utf8 NUL");
+ is($asc, "perl", "chopped ascii NUL");
+ is($utf, "perl", "chopped utf8 NUL");
}
END OF PATCH
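
For anyone who wants to check their own perl, here is a minimal sketch along the lines of the demonstrations above. As far as the patch shows, the old test treated the return value of utf8_to_uvchr() as a success flag, but for a NUL character that value is the code point 0, so do_chop() skipped the removal. The helper name chop_report is made up for this example and is not part of the patch or of Encode.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the patch): chop a copy of the string and
# return the removed character together with what is left.
sub chop_report {
    my ($str) = @_;
    my $removed = chop $str;
    return ($removed, $str);
}

# The code point of NUL is 0, which is false in boolean context, so it
# cannot double as a "decoded successfully" flag.
printf "ord of NUL is %d, true as a flag? %s\n",
    ord("\0"), (ord("\0") ? "yes" : "no");

my %cases = (
    'byte string'    => "foo\0",                    # no UTF-8 flag
    'decoded string' => decode('ascii', "foo\0"),   # character string from Encode
);

for my $name (sort keys %cases) {
    my ($removed, $rest) = chop_report($cases{$name});
    printf "%-16s removed %d char(s), %d char(s) left\n",
        "$name:", length($removed), length($rest);
}
```

On a perl that includes the change, both cases should report one character removed and three left; on an affected perl the decoded case is the one that differs.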
The RT System itself - Status changed from 'new' to 'open'
From @Tux
On Sat, 16 Jul 2005 22:05:13 +0900, SADAHIRO Tomoyuki <bqw10602@nifty.com>
Thanks for the fast patch. Applied as change #25158
Seems reasonable, though just cutting off one byte of the string would maybe
--
From @ysth
On Sat, Jul 16, 2005 at 03:48:10PM +0200, H.Merijn Brand wrote:
Was there more to that sentence?

I'd vote for removing and returning a malformed char, from the last

That way, the data error is propagated onto the return value (as

IMO fill buffer with bytes

From @Tux
On Mon, 18 Jul 2005 20:33:28 -0700, Yitzchak Scott-Thoennes
No, I stopped after maybe. Because the more I thought about it, the less
--
p5p@spam.wizbit.be - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#36569 (status was 'resolved')
Searchable as RT36569$