Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

warn() misunderstands UTF8-encoded scalar #9033

Closed
p5pRT opened this issue Sep 19, 2007 · 9 comments
Closed

warn() misunderstands UTF8-encoded scalar #9033

p5pRT opened this issue Sep 19, 2007 · 9 comments

Comments

@p5pRT
Copy link

p5pRT commented Sep 19, 2007

Migrated from rt.perl.org#45549 (status was 'resolved')

Searchable as RT45549$

@p5pRT
Copy link
Author

p5pRT commented Sep 19, 2007

From zefram@fysh.org

Created by zefram@fysh.org

$ perl -we '$a="\xee\n"; print STDERR $a; warn $a; utf8​::upgrade($a); print STDERR $a; warn $a'

I won't attempt to reproduce the output directly here, because it
involves non-ASCII characters, and it is the fate of such characters
that they always get mangled on the net. However, here's what you get
if the output (on stderr) is piped through `od -tx1`​:

0000000 ee 0a ee 0a ee 0a c3 ae 0a
0000011

So​: print() always outputs the character/byte values that are *logically*
in the string (that is, 0xee and 0x0a), but warn() always outputs the
byte values that are *physically* used to represent the string internally.
I believe warn()'s behaviour is a bug​: the visible behaviour of a string
should not depend on its internal representation.

Emitting the byte values 0xc3 0xae 0x0a for $a is, of course, appropriate
in some circumstances​: namely, when the output stream involves an encoding
layer to represent characters as bytes using the UTF-8 encoding. But in
this case I have not set up such a layer, and if I did then I should get
the UTF-8 output from all four output commands. The results should be
consistent either way.

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl v5.8.8:

Configured by Debian Project at Wed Dec  6 23:17:41 UTC 2006.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.18.3, archname=i486-linux-gnu-thread-multi
    uname='linux saens 2.6.18.3 #1 smp sat nov 25 13:39:52 est 2006 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.8 -Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.1.2 20061115 (prerelease) (Debian 4.1.1-20)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.6.so, so=so, useshrplib=true, libperl=libperl.so.5.8.8
    gnulibc_version='2.3.6'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.8:
    /etc/perl
    /usr/local/lib/perl/5.8.8
    /usr/local/share/perl/5.8.8
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    /usr/local/lib/perl/5.8.4
    /usr/local/share/perl/5.8.4
    .


Environment for perl v5.8.8:
    HOME=/home/zefram
    LANG (unset)
    LANGUAGE (unset)
    LC_CTYPE=en_GB
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/zefram/pub/i686-pc-linux-gnu/bin:/home/zefram/pub/common/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/local/bin:/usr/games
    PERL_BADLANG (unset)
    SHELL=/usr/bin/zsh

@p5pRT
Copy link
Author

p5pRT commented Aug 8, 2010

From @cpansprout

The first patch changes warn.t to use test.pl, so I can use fresh_perl_like.

The second patch fixes warn to work with utf8 scalars and :utf8 handles. Since do_print already takes care of utf8, I’ve changed write_to_stderr to call it.

@p5pRT
Copy link
Author

p5pRT commented Aug 8, 2010

From @cpansprout

Inline Patch
diff -Nurp blead/t/op/warn.t blead-45549-warn/t/op/warn.t
--- blead/t/op/warn.t	2010-05-04 06:38:11.000000000 -0700
+++ blead-45549-warn/t/op/warn.t	2010-08-08 13:31:33.000000000 -0700
@@ -1,12 +1,14 @@
 #!./perl
 #line 3 warn.t
 
-print "1..18\n";
-my $test_num = 0;
-sub ok {
-    print $_[0] ? "" : "not ", "ok ", ++$test_num, "\n";
+BEGIN {
+    chdir 't' if -d 't';
+    @INC = '../lib';
+    require './test.pl';
 }
 
+plan 18;
+
 my @warnings;
 my $wa = []; my $ea = [];
 $SIG{__WARN__} = sub { push @warnings, $_[0] };
@@ -24,7 +26,7 @@ ok @warnings==1 && $warnings[0] eq "foob
 @warnings = ();
 $@ = "";
 warn "foo";
-ok @warnings==1 && $warnings[0] eq "foo at warn.t line 26.\n";
+ok @warnings==1 && $warnings[0] eq "foo at warn.t line 28.\n";
 
 @warnings = ();
 $@ = "";
@@ -35,13 +37,13 @@ ok @warnings==1 && ref($warnings[0]) eq 
 $@ = "";
 warn "";
 ok @warnings==1 &&
-    $warnings[0] eq "Warning: something's wrong at warn.t line 36.\n";
+    $warnings[0] eq "Warning: something's wrong at warn.t line 38.\n";
 
 @warnings = ();
 $@ = "";
 warn;
 ok @warnings==1 &&
-    $warnings[0] eq "Warning: something's wrong at warn.t line 42.\n";
+    $warnings[0] eq "Warning: something's wrong at warn.t line 44.\n";
 
 @warnings = ();
 $@ = "ERR\n";
@@ -56,7 +58,7 @@ ok @warnings==1 && $warnings[0] eq "foob
 @warnings = ();
 $@ = "ERR\n";
 warn "foo";
-ok @warnings==1 && $warnings[0] eq "foo at warn.t line 58.\n";
+ok @warnings==1 && $warnings[0] eq "foo at warn.t line 60.\n";
 
 @warnings = ();
 $@ = "ERR\n";
@@ -67,13 +69,13 @@ ok @warnings==1 && ref($warnings[0]) eq 
 $@ = "ERR\n";
 warn "";
 ok @warnings==1 &&
-    $warnings[0] eq "ERR\n\t...caught at warn.t line 68.\n";
+    $warnings[0] eq "ERR\n\t...caught at warn.t line 70.\n";
 
 @warnings = ();
 $@ = "ERR\n";
 warn;
 ok @warnings==1 &&
-    $warnings[0] eq "ERR\n\t...caught at warn.t line 74.\n";
+    $warnings[0] eq "ERR\n\t...caught at warn.t line 76.\n";
 
 @warnings = ();
 $@ = $ea;
@@ -88,7 +90,7 @@ ok @warnings==1 && $warnings[0] eq "foob
 @warnings = ();
 $@ = $ea;
 warn "foo";
-ok @warnings==1 && $warnings[0] eq "foo at warn.t line 90.\n";
+ok @warnings==1 && $warnings[0] eq "foo at warn.t line 92.\n";
 
 @warnings = ();
 $@ = $ea;

@p5pRT
Copy link
Author

p5pRT commented Aug 8, 2010

From @cpansprout

Inline Patch
diff -Nup blead-45549-warn/util.c blead-45549-warn2/util.c
--- blead-45549-warn/util.c	2010-07-27 00:51:15.000000000 -0700
+++ blead-45549-warn2/util.c	2010-08-08 14:14:33.000000000 -0700
@@ -1399,10 +1399,8 @@ Perl_write_to_stderr(pTHX_ SV* msv)
 	dSAVED_ERRNO;
 #endif
 	PerlIO * const serr = Perl_error_log;
-	STRLEN msglen;
-	const char* message = SvPVx_const(msv, msglen);
 
-	PERL_WRITE_MSG_TO_CONSOLE(serr, message, msglen);
+	do_print(msv, serr);
 	(void)PerlIO_flush(serr);
 #ifdef USE_SFIO
 	RESTORE_ERRNO;
diff -Nurp blead-45549-warn/t/op/warn.t blead-45549-warn2/t/op/warn.t
--- blead-45549-warn/t/op/warn.t	2010-08-08 13:31:33.000000000 -0700
+++ blead-45549-warn2/t/op/warn.t	2010-08-08 14:26:47.000000000 -0700
@@ -7,7 +7,7 @@ BEGIN {
     require './test.pl';
 }
 
-plan 18;
+plan 20;
 
 my @warnings;
 my $wa = []; my $ea = [];
@@ -107,4 +107,28 @@ $@ = $ea;
 warn;
 ok @warnings==1 && ref($warnings[0]) eq "ARRAY" && $warnings[0] == $ea;
 
+fresh_perl_like(
+ '
+   $a = "\xee\n";
+   print STDERR $a; warn $a;
+   utf8::upgrade($a);
+   print STDERR $a; warn $a;
+ ',
+  qr/^\xee(?:\r?\n\xee){3}/,
+  {},
+ 'warn emits logical characters, not internal bytes [perl #45549]'  
+);
+
+fresh_perl_like(
+ '
+   $a = "\xee\n";
+   print STDERR $a; warn $a;
+   utf8::upgrade($a);
+   print STDERR $a; warn $a;
+ ',
+  qr/^\xc3\xae(?:\r?\n\xc3\xae){3}/,
+  { switches => ['-CE'] },
+ 'warn respects :utf8 layer'
+);
+
 1;

@p5pRT
Copy link
Author

p5pRT commented Aug 8, 2010

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Aug 9, 2010

From @cpansprout

On Aug 8, 2010, at 2​:31 PM, Father Chrysostomos wrote​:

The first patch changes warn.t to use test.pl, so I can use fresh_perl_like.

The second patch fixes warn to work with utf8 scalars and :utf8 handles. Since do_print already takes care of utf8, I’ve changed write_to_stderr to call it.

That makes ‘warn chr 300’ produce a ‘Wide character in print’ warning on a byte-sized handle, which is a good thing, except that it should say ‘warn’. Patch attached.

@p5pRT
Copy link
Author

p5pRT commented Aug 9, 2010

From @cpansprout

Inline Patch
diff -Nup blead-45549-warn2/doio.c blead-45549-warn3/doio.c
--- blead-45549-warn2/doio.c	2010-07-03 08:47:11.000000000 -0700
+++ blead-45549-warn3/doio.c	2010-08-08 18:25:47.000000000 -0700
@@ -1241,7 +1241,7 @@ Perl_do_print(pTHX_ register SV *sv, Per
 	    else {
 		assert((char *)result == tmps);
 		Perl_ck_warner_d(aTHX_ packWARN(WARN_UTF8),
-				 "Wide character in print");
+				 "Wide character in %s", OP_DESC(PL_op));
 	    }
 	}
 	/* To detect whether the process is about to overstep its
diff -Nurp blead-45549-warn2/t/op/warn.t blead-45549-warn3/t/op/warn.t
--- blead-45549-warn2/t/op/warn.t	2010-08-08 14:26:47.000000000 -0700
+++ blead-45549-warn3/t/op/warn.t	2010-08-08 18:24:35.000000000 -0700
@@ -7,7 +7,7 @@ BEGIN {
     require './test.pl';
 }
 
-plan 20;
+plan 21;
 
 my @warnings;
 my $wa = []; my $ea = [];
@@ -131,4 +131,11 @@ fresh_perl_like(
  'warn respects :utf8 layer'
 );
 
+fresh_perl_like(
+ 'warn chr 300',
+  qr/^Wide character in warn .*\n\xc4\xac at /,
+  {},
+ 'Wide character in warn (not print)'
+);
+
 1;

@p5pRT
Copy link
Author

p5pRT commented Sep 16, 2010

From @rgarcia

On 9 August 2010 03​:30, Father Chrysostomos <sprout@​cpan.org> wrote​:

On Aug 8, 2010, at 2​:31 PM, Father Chrysostomos wrote​:

The first patch changes warn.t to use test.pl, so I can use fresh_perl_like.

The second patch fixes warn to work with utf8 scalars and :utf8 handles. Since do_print already takes care of utf8, I’ve changed write_to_stderr to call it.

That makes ‘warn chr 300’ produce a ‘Wide character in print’ warning on a byte-sized handle, which is a good thing, except that it should say ‘warn’. Patch attached.

Thanks, all 3 patches applied to bleadperl.

@p5pRT
Copy link
Author

p5pRT commented Sep 16, 2010

@rgs - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant