Report information
Id: 131760
Status: pending release
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: steve.grazzini [at] grantstreet.com
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: (no value)

Attachments
0007-PATCH-perl-131670-Document-Wide-char-msg-better.patch



From: Steve Grazzini <steve.grazzini [...] grantstreet.com>
Date: Mon, 17 Jul 2017 17:18:36 +0000
Subject: "Wide character" errors are undocumented
To: perlbug [...] perl.org
perldiag has an entry for the "Wide character in %s" warning from doio.c:

   Wide character in %s
           (S utf8) Perl met a wide character (>255) when it wasn't expecting one.  This 
           warning is by default on for I/O (like print).  The easiest way to quiet this warning
           is simply to add the ":utf8" layer to the output, e.g. "binmode STDOUT, ':utf8'".
           Another way to turn off the warning is to add "no warnings 'utf8';" but that is often
           closer to cheating.  In general, you are supposed to explicitly mark the filehandle
           with an encoding, see open and "binmode" in perlfunc.
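For readers who have not met this warning, here is a minimal sketch of the I/O case the entry covers. It assumes a stock Perl 5. It writes to in-memory filehandles instead of STDOUT so the result can be checked, and U+263A is just an arbitrary character above 255.

```perl
use strict;
use warnings;

# Capture warnings in an array instead of letting them reach STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $smiley = "\x{263A}";    # ordinal 0x263A > 255, i.e. a "wide" character

# No encoding layer: Perl warns "Wide character in print" and then
# writes the character's internal UTF-8 bytes anyway.
open my $raw, '>', \my $raw_buf or die "open: $!";
print {$raw} $smiley;
close $raw;

# With the :utf8 layer, as the entry suggests: no warning.
open my $utf8, '>:utf8', \my $utf8_buf or die "open: $!";
print {$utf8} $smiley;
close $utf8;

print scalar(@warnings), " warning(s): $warnings[0]";
```

Only the first handle produces a warning. The program keeps running either way, which is what "(S utf8)" means here: a severe warning, not a fatal error.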

But it doesn't describe the several fatal "Wide character" errors elsewhere:

   pp_sys.c:2028:  Perl_croak(aTHX_ "Wide character in %s", OP_DESC(PL_op));
   sv.c:3734:          Perl_croak(aTHX_ "Wide character in %s",
   sv.c:3737:          Perl_croak(aTHX_ "Wide character");
   sv.c:8519:          Perl_croak(aTHX_ "Wide character in $/");

Those should probably be documented, too. As it is, it's pretty confusing to users who
get a fatal error and try to interpret it with the description of the unrelated warning.
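For contrast with the warning, the sketch below triggers what is most likely the sv_utf8_downgrade() croak (the sv.c:3734 line above) through the core utf8::downgrade() function. The message text looks just like the warning's, but the program dies unless the error is trapped. This assumes a stock Perl 5. The %s is filled in from the op that was running, which here is a subroutine call.

```perl
use strict;
use warnings;

my $wide = "\x{263A}";    # any character with ordinal > 255

# utf8::downgrade() converts a string to one byte per character.
# Without its FAIL_OK argument it dies when that is impossible.
my $copy  = $wide;
my $died  = !eval { utf8::downgrade($copy); 1 };
my $error = $@;
print "died: $error";    # message begins "Wide character in subroutine entry"

# With FAIL_OK true, the same condition returns false instead of dying.
my $copy2 = $wide;
my $ok    = utf8::downgrade($copy2, 1);
print 'FAIL_OK result: ', ($ok ? 'true' : 'false'), "\n";
```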

Thanks!
RT-Send-CC: perl5-porters [...] perl.org
On Mon, 17 Jul 2017 17:29:28 GMT, steve.grazzini@grantstreet.com wrote:
Thanks for your report. Not only are these warnings under-documented, they're under-tested as well.

I created a branch in which I went to the places you cited in the source code and changed the first letter of the warning from 'W' to 'A..D' respectively. I then ran the test suite. Any test that triggered one of the 4 variant messages and tried to match the error text would then fail.

This source code point was extensively exercised in the test suite:

#####
sv.c:3734:          Perl_croak(aTHX_ "Wide character in %s",
#####

I could not find locations at which the following source code points were exercised in the test suite:

#####
pp_sys.c:2028:  Perl_croak(aTHX_ "Wide character in %s", OP_DESC(PL_op));

sv.c:3737:          Perl_croak(aTHX_ "Wide character");

sv.c:8519:          Perl_croak(aTHX_ "Wide character in $/");
#####

List: suggestions?

Thank you very much.

-- James E Keenan (jkeenan@cpan.org)
RT-Send-CC: perl5-porters [...] perl.org
On Mon, 17 Jul 2017 22:56:22 GMT, jkeenan wrote:
> Thanks for your report. Not only are these warnings under-documented,
> they're under-tested as well.
>
> I created a branch in which I went to the places you cited in the
> source code and changed the first letter of the warning from 'W' to
> 'A..D' respectively. I then ran the test suite.
That's the jkeenan/131760-wide-character branch, in case anyone else wants to play around with this.
-- James E Keenan (jkeenan@cpan.org)
RT-Send-CC: perl5-porters [...] perl.org
On Mon, 17 Jul 2017 22:56:22 GMT, jkeenan wrote:
> I could not find locations at which the following source code points
> were exercised in the test suite:
>
> pp_sys.c:2028:  Perl_croak(aTHX_ "Wide character in %s", OP_DESC(PL_op));
Appears to have been added in this commit:

c9cb0f41 (Nicholas Clark 2006-04-29 23:33:36 +0000 2028) Perl_croak(aTHX_ "Wide character in %s", OP_DES
> sv.c:3737: Perl_croak(aTHX_ "Wide character");
Appears to have been added in this commit:

fa301091a (Jarkko Hietaniemi 2000-11-30 20:41:39 +0000 3737) Perl_croak(aTHX_ "Wide character");
> sv.c:8519: Perl_croak(aTHX_ "Wide character in $/");
Appears to have been added in this commit:

4b3603a49 (Jarkko Hietaniemi 2000-10-17 14:11:31 +0000 2460) Perl_croak(aTHX_ "Wide character");

-- James E Keenan (jkeenan@cpan.org)
RT-Send-CC: perl5-porters [...] perl.org
On Mon, 17 Jul 2017 16:28:59 -0700, jkeenan wrote:
> On Mon, 17 Jul 2017 22:56:22 GMT, jkeenan wrote:
> > sv.c:8519: Perl_croak(aTHX_ "Wide character in $/");
>
> Appears to have been added in this commit:
>
> 4b3603a49 (Jarkko Hietaniemi 2000-10-17 14:11:31 +0000 2460) Perl_croak(aTHX_ "Wide character");
I looked at the code here. The one at line 8519 happens if the user doesn't specify the :utf8 layer on the filehandle and sets $/ (the input record separator) to a string containing a wide character. So the advice given in the diagnostic applies; we just don't have tests for it.

I suspect that the other ones are valid, but again don't have tests, and that they were formerly exercised, but things have changed to avoid them. I know, for example, that we no longer accept wide characters in bitwise operations like &, so the calls that might have led to these lines of code getting hit are intercepted.

In any event, attached is a generalization of the wording in perldiag that attempts to handle all the cases. If I don't hear objections by April 19, I will apply it (so that it gets into 5.28).

-- Karl Williamson
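Here is a minimal sketch of the sv.c:8519 case described above. It assumes a stock Perl 5 and uses in-memory filehandles. The same bytes are read twice with $/ set to U+0100: once through a handle with no encoding layer, and once through :encoding(UTF-8).

```perl
use strict;
use warnings;

# Raw bytes: "alpha", then U+0100 encoded as UTF-8 (0xC4 0x80), then "beta".
my $bytes = "alpha\xC4\x80beta";

# Case 1: no encoding layer on the handle, but a wide character in $/.
# Perl cannot express $/ as bytes for this handle, so readline dies.
my $raw_error;
{
    local $/ = "\x{100}";
    open my $raw, '<', \$bytes or die "open: $!";
    eval { my $line = <$raw>; 1 } or $raw_error = $@;
}
print "raw handle: $raw_error";    # starts with "Wide character in $/"

# Case 2: the same bytes read through a decoding layer.
my $record;
{
    local $/ = "\x{100}";
    open my $decoded, '<:encoding(UTF-8)', \$bytes or die "open: $!";
    $record = <$decoded>;          # "alpha\x{100}"
    chomp $record;                 # chomp strips the current $/
}
print "decoded handle: first record is '$record'\n";
```

Only the layer differs between the two reads. That is why the perldiag advice about adding an encoding layer also fits this fatal case.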
Subject: 0007-PATCH-perl-131670-Document-Wide-char-msg-better.patch
From 158bc1407a8f465dd0e1c2414b446ceb3c90f9b7 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Tue, 3 Apr 2018 11:30:16 -0600
Subject: [PATCH 7/7] PATCH: [perl #131670] Document Wide char msg better

---
 pod/perldiag.pod | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 860b049368..fc7d4e2f81 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -7606,14 +7606,21 @@ under L<perlsyn/Experimental Details on given and when>.
 
 =item Wide character in %s
 
-(S utf8) Perl met a wide character (>255) when it wasn't expecting
-one. This warning is by default on for I/O (like print). The easiest
-way to quiet this warning is simply to add the C<:utf8> layer to the
-output, e.g. C<binmode STDOUT, ':utf8'>. Another way to turn off the
-warning is to add C<no warnings 'utf8';> but that is often closer to
+(S utf8) Perl met a wide character (ordinal >255) when it wasn't
+expecting one. This warning is by default on for I/O (like print).
+
+If this warning does come from I/O, the easiest
+way to quiet it is simply to add the C<:utf8> layer, I<e.g.>,
+S<C<binmode STDOUT, ':utf8'>>. Another way to turn off the warning is
+to add S<C<no warnings 'utf8';>> but that is often closer to
 cheating. In general, you are supposed to explicitly mark the filehandle
 with an encoding, see L<open> and L<perlfunc/binmode>.
 
+If the warning comes from other than I/O, this diagnostic probably
+indicates that incorrect results are being obtained. You should examine
+your code to determine how a wide character is getting to an operation
+that doesn't handle them.
+
 =item Wide character (U+%X) in %s
 
 (W locale) While in a single-byte locale (I<i.e.>, a non-UTF-8
-- 
2.11.0
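The new last paragraph of the entry asks the reader to work out how a wide character reached an operation that cannot handle it. One way to start is to list every character above U+00FF, with its offset, before the data reaches the byte-oriented code. The sketch below does that with a helper named wide_chars. That name is hypothetical and the helper is not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): report each character whose
# ordinal is above 0xFF, with its character offset in the string.
sub wide_chars {
    my ($string) = @_;
    my @found;
    while ($string =~ /([^\x00-\xFF])/g) {
        # pos() is just past the match, so the character sits at pos - 1.
        push @found, sprintf 'U+%04X at offset %d', ord $1, pos($string) - 1;
    }
    return @found;
}

my @report = wide_chars("caf\x{E9} \x{263A} na\x{EF}ve \x{20AC}");
print "$_\n" for @report;    # U+263A at offset 5, then U+20AC at offset 13
```

Characters such as U+00E9 and U+00EF are not reported because they fit in one byte. Only characters above 255 can trigger the diagnostic.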
RT-Send-CC: perl5-porters [...] perl.org
Not having heard anything to the contrary by the deadline, I pushed my patch as 479b791bf828f8d105b334fe04ff82a4adfedcd7.

-- Karl Williamson

