"Wide character" errors are undocumented #16081

Closed
p5pRT opened this issue Jul 17, 2017 · 11 comments

Comments

p5pRT commented Jul 17, 2017

Migrated from rt.perl.org#131760 (status was 'resolved')

Searchable as RT131760$

p5pRT commented Jul 17, 2017

From steve.grazzini@grantstreet.com

perldiag has an entry for the "Wide character in %s" warning from doio.c:

  Wide character in %s
  (S utf8) Perl met a wide character (>255) when it wasn't expecting one. This
  warning is by default on for I/O (like print). The easiest way to quiet this
  warning is simply to add the ":utf8" layer to the output, e.g. "binmode
  STDOUT, ':utf8'". Another way to turn off the warning is to add "no warnings
  'utf8';" but that is often closer to cheating. In general, you are supposed
  to explicitly mark the filehandle with an encoding, see open and "binmode"
  in perlfunc.

But it doesn't describe the several fatal "Wide character" errors elsewhere:

  pp_sys.c:2028: Perl_croak(aTHX_ "Wide character in %s", OP_DESC(PL_op));
  sv.c:3734: Perl_croak(aTHX_ "Wide character in %s",
  sv.c:3737: Perl_croak(aTHX_ "Wide character");
  sv.c:8519: Perl_croak(aTHX_ "Wide character in $/");

Those should probably be documented, too. As it is, it's pretty confusing to users who get a fatal error and try to interpret it with the description of the unrelated warning.
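
To make the distinction concrete, here is a minimal sketch, not taken from perldiag or the core test suite. It assumes in-memory filehandles behave like ordinary handles for this purpose, and every variable name in it is made up for illustration. The first two steps show the documented warning and the layer that quiets it; the third shows a fatal variant, using utf8::downgrade, which dies when a string cannot be represented in 8 bits. That this reaches one of the sv.c croaks listed above is an assumption from reading the messages, not something confirmed here.

```perl
use strict;
use warnings;

my $wide = "\x{263A}";    # a single character with ordinal > 255

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# 1. The documented case: print to a handle with no encoding layer.
#    Perl warns and carries on.
my $buffer = '';
open my $plain, '>', \$buffer or die "open failed: $!";
print {$plain} $wide;
close $plain;

# 2. The same print through an explicit encoding layer: no warning expected.
my $encoded = '';
open my $layered, '>:encoding(UTF-8)', \$encoded or die "open failed: $!";
print {$layered} $wide;
close $layered;

# 3. A fatal case: utf8::downgrade dies when the string cannot be
#    represented in 8 bits (unless a true second argument is passed).
my $copy  = $wide;
my $fatal = eval { utf8::downgrade($copy); 1 } ? '' : $@;

print "warnings: @warnings";
print "fatal:    $fatal";
```

The warning in step 1 can be dealt with by following the perldiag advice. The error in step 3 cannot, which is why the existing entry is misleading for it.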

Thanks!

p5pRT commented Jul 17, 2017

From @jkeenan

Thanks for your report. Not only are these warnings under-documented, they're under-tested as well.

I created a branch in which I went to the places you cited in the source code and changed the first letter of the message from 'W' to 'A' through 'D' respectively. I then ran the test suite. Any test that reached one of the 4 variant messages and tried to match the error text would then fail.

This source code point was extensively exercised in the test suite:

#####
sv.c:3734: Perl_croak(aTHX_ "Wide character in %s",
#####

I could not find locations at which the following source code points were exercised in the test suite:

#####
pp_sys.c:2028: Perl_croak(aTHX_ "Wide character in %s", OP_DESC(PL_op));

sv.c:3737: Perl_croak(aTHX_ "Wide character");

sv.c:8519: Perl_croak(aTHX_ "Wide character in $/");
#####

List: suggestions?

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

p5pRT commented Jul 17, 2017

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Jul 17, 2017

From @jkeenan

On Mon, 17 Jul 2017 22​:56​:22 GMT, jkeenan wrote​:


I created a branch in which I went to the places you cited in the
source code and changed the first letter of the warning from 'W' to
'A..D' respectively. I then ran the test suite.

That's the jkeenan/131760-wide-character branch, in case anyone else wants to play around with this.

--
James E Keenan (jkeenan@​cpan.org)

p5pRT commented Jul 17, 2017

From @jkeenan

On Mon, 17 Jul 2017 22​:56​:22 GMT, jkeenan wrote​:

I could not find locations at which the following source code points
were exercised in the test suite:

#####
pp_sys.c:2028: Perl_croak(aTHX_ "Wide character in %s", OP_DESC(PL_op));

Appears to have been added in this commit:

c9cb0f4 (Nicholas Clark 2006-04-29 23:33:36 +0000 2028) Perl_croak(aTHX_ "Wide character in %s", OP_DES

sv.c:3737: Perl_croak(aTHX_ "Wide character");

Appears to have been added in this commit:

fa30109 (Jarkko Hietaniemi 2000-11-30 20:41:39 +0000 3737) Perl_croak(aTHX_ "Wide character");

sv.c:8519: Perl_croak(aTHX_ "Wide character in $/");

Appears to have been added in this commit:

4b3603a (Jarkko Hietaniemi 2000-10-17 14:11:31 +0000 2460) Perl_croak(aTHX_ "Wide character");

--
James E Keenan (jkeenan@​cpan.org)

p5pRT commented Apr 3, 2018

From @khwilliamson

I looked at the code here. The one at line 8519 happens if the user doesn't specify the :utf8 layer and sets the input record separator $/ to include a wide character. So the advice given in the diagnostic applies; we just don't have tests for it. I suspect that the other ones are valid but likewise untested, and that they were formerly exercised until things changed to avoid them. I know, for example, that we no longer accept wide characters in the bitwise operations like &, so the calls that might have led to these lines of code getting hit are intercepted.
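
A minimal sketch of the $/ case described above, assuming an in-memory filehandle opened without any encoding layer stands in for an ordinary byte-oriented handle. The data and variable names are hypothetical, and the exact message text is what the source line suggests rather than something checked across Perl versions.

```perl
use strict;
use warnings;

my $data = "one\ntwo\n";

# No :utf8 or :encoding layer on this handle.
open my $in, '<', \$data or die "open failed: $!";

my $error = '';
{
    # A record separator containing a character with ordinal > 255.
    local $/ = "\x{263A}";

    # readline cannot express this separator in bytes for a
    # byte-oriented handle, so it is expected to die rather than warn.
    my $record = eval { readline($in) };
    $error = $@ unless defined $record;
}
close $in;

print "error: $error";
```

Opening the handle with an encoding layer, as the diagnostic advises, for example '<:encoding(UTF-8)', should avoid the error because the separator is then compared in the stream's own encoding.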

In any event, attached is a generalization of the wording in perldiag that attempts to handle all the cases. If I don't hear objections by April 19, I will apply it (so that it gets into 5.28).
--
Karl Williamson

p5pRT commented Apr 3, 2018

From @khwilliamson

0007-PATCH-perl-131670-Document-Wide-char-msg-better.patch
From 158bc1407a8f465dd0e1c2414b446ceb3c90f9b7 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Tue, 3 Apr 2018 11:30:16 -0600
Subject: [PATCH 7/7] PATCH: [perl #131670] Document Wide char msg better

---
 pod/perldiag.pod | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 860b049368..fc7d4e2f81 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -7606,14 +7606,21 @@ under L<perlsyn/Experimental Details on given and when>.
 
 =item Wide character in %s
 
-(S utf8) Perl met a wide character (>255) when it wasn't expecting
-one.  This warning is by default on for I/O (like print).  The easiest
-way to quiet this warning is simply to add the C<:utf8> layer to the
-output, e.g. C<binmode STDOUT, ':utf8'>.  Another way to turn off the
-warning is to add C<no warnings 'utf8';> but that is often closer to
+(S utf8) Perl met a wide character (ordinal >255) when it wasn't
+expecting one.  This warning is by default on for I/O (like print).
+
+If this warning does come from I/O, the easiest
+way to quiet it is simply to add the C<:utf8> layer, I<e.g.>,
+S<C<binmode STDOUT, ':utf8'>>.  Another way to turn off the warning is
+to add S<C<no warnings 'utf8';>> but that is often closer to
 cheating.  In general, you are supposed to explicitly mark the
 filehandle with an encoding, see L<open> and L<perlfunc/binmode>.
 
+If the warning comes from other than I/O, this diagnostic probably
+indicates that incorrect results are being obtained.  You should examine
+your code to determine how a wide character is getting to an operation
+that doesn't handle them.
+
 =item Wide character (U+%X) in %s
 
 (W locale) While in a single-byte locale (I<i.e.>, a non-UTF-8
-- 
2.11.0

p5pRT commented Apr 19, 2018

From @khwilliamson

Not having heard anything to the contrary by the deadline, I pushed my patch as 479b791.
--
Karl Williamson

p5pRT commented Apr 19, 2018

@khwilliamson - Status changed from 'open' to 'pending release'

p5pRT commented Jun 23, 2018

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release yesterday of Perl 5.28.0, this and 185 other issues have been resolved.

Perl 5.28.0 may be downloaded via:
https://metacpan.org/release/XSAWYERX/perl-5.28.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT p5pRT closed this as completed Jun 23, 2018
p5pRT commented Jun 23, 2018

@khwilliamson - Status changed from 'pending release' to 'resolved'
