perldoc double-encoding bug #11255

Closed
p5pRT opened this issue Apr 13, 2011 · 6 comments

p5pRT commented Apr 13, 2011

Migrated from rt.perl.org#88488 (status was 'open')

Searchable as RT88488$

p5pRT commented Apr 13, 2011

From tchrist@perl.com

I never used perldoc, just normal tools, so I never noticed its output is
bollocksed up. Thank you, Brian, for pointing this out. The dumb thing is
committing a double-encoding faux pas:

  % perldoc pod/perlunicode.pod | grep 'Unicode support in Perl'
  perlunicode â�� Unicode support in Perl

Which is explained here:

  % env PERL_UNICODE=0 perldoc pod/perlunicode.pod | grep 'Unicode support in Perl' | uniquote -b
  perlunicode \xE2\x88\x92 Unicode support in Perl
  % env PERL_UNICODE=S perldoc pod/perlunicode.pod | grep 'Unicode support in Perl' | uniquote -b
  perlunicode \xC3\xA2\xC2\x88\xC2\x92 Unicode support in Perl

  % env PERL_UNICODE=0 perldoc pod/perlunicode.pod | grep 'Unicode support in Perl' | uniquote -x
  perlunicode \x{2212} Unicode support in Perl
  % env PERL_UNICODE=S perldoc pod/perlunicode.pod | grep 'Unicode support in Perl' | uniquote -x
  perlunicode \x{E2}\x{88}\x{92} Unicode support in Perl

  % env PERL_UNICODE=0 perldoc pod/perlunicode.pod | grep 'Unicode support in Perl' | uniquote -v
  perlunicode \N{MINUS SIGN} Unicode support in Perl
  % env PERL_UNICODE=S perldoc pod/perlunicode.pod | grep 'Unicode support in Perl' | uniquote -v
  perlunicode \N{LATIN SMALL LETTER A WITH CIRCUMFLEX}\N{CHARACTER TABULATION SET}\N{PRIVATE USE TWO} Unicode support in Perl
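
The byte sequences above are what you get when UTF-8 output is encoded a second time. Here is a minimal sketch of that mechanism in Python rather than Perl, assuming the second step treats each UTF-8 byte as a Latin-1 character. That assumption reproduces the bytes in the transcript, but nothing in this ticket identifies where perldoc actually does this.

```python
# U+2212 MINUS SIGN, the character in the perlunicode NAME line.
minus = "\u2212"

# Correct output: the character is encoded to UTF-8 exactly once.
encoded_once = minus.encode("utf-8")
print(encoded_once)   # b'\xe2\x88\x92'

# Double encoding (assumed mechanism): the three UTF-8 bytes are
# misread as three Latin-1 characters, then encoded to UTF-8 again.
misread = encoded_once.decode("latin-1")
encoded_twice = misread.encode("utf-8")
print(encoded_twice)  # b'\xc3\xa2\xc2\x88\xc2\x92'

# Reversing the extra layer recovers the original character.
repaired = encoded_twice.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == minus)  # True
```

The first printed value matches the PERL_UNICODE=0 bytes and the second matches the PERL_UNICODE=S bytes shown by `uniquote -b`.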

Now compare with (what are for me) the standard tools, which work correctly:

  % env PERL_UNICODE=0 pod2man pod/perlunicode.pod | nroff -man | grep 'Unicode support in Perl' | uniquote -x
  perlunicode \x{2212} Unicode support in Perl
  % env PERL_UNICODE=S pod2man pod/perlunicode.pod | nroff -man | grep 'Unicode support in Perl' | uniquote -x
  perlunicode \x{2212} Unicode support in Perl

  % env PERL_UNICODE=0 pod2text pod/perlunicode.pod | grep 'Unicode support in Perl' | uniquote -x
  perlunicode - Unicode support in Perl
  % env PERL_UNICODE=S pod2text pod/perlunicode.pod | grep 'Unicode support in Perl' | uniquote -x
  perlunicode - Unicode support in Perl

--tom

p5pRT commented Apr 15, 2011

From @briandfoy

On Wed Apr 13 09:54:11 2011, tom christiansen wrote:

> I never used perldoc, just normal tools, so I never noticed its output is
> bollocksed up. Thank you, Brian, for pointing this out. The dumb thing is
> committing a double-encoding faux pas:

I started to look at this and didn't immediately see where the problem could be.
I'm a bit swamped right now, but as I have time, I want to go step-by-step to
ensure that each thing is respecting the data passed to it.

Along with that, I want to make all of the core Perl docs use UTF-8 as the
encoding. If they are ASCII now, that shouldn't change anything. The trick is
then making the Pod tools recognize and respect the encoding (even if it is
something other than UTF-8).

p5pRT commented Apr 15, 2011

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Feb 16, 2013

From @jkeenan

On Thu Apr 14 21:29:50 2011, comdog wrote:

> On Wed Apr 13 09:54:11 2011, tom christiansen wrote:
>
> > I never used perldoc, just normal tools, so I never noticed its output is
> > bollocksed up. Thank you, Brian, for pointing this out. The dumb thing is
> > committing a double-encoding faux pas:
>
> I started to look at this and didn't immediately see where the problem could
> be. I'm a bit swamped right now, but as I have time, I want to go step-by-step
> to ensure that each thing is respecting the data passed to it.
>
> Along with that, I want to make all of the core Perl docs use UTF-8 as the
> encoding. If they are ASCII now, that shouldn't change anything. The trick is
> then making the Pod tools recognize and respect the encoding (even if it is
> something other than UTF-8).

Tom, brian: Would it be possible to get an update on the status of the
issues discussed in this ticket?

Thank you very much.
Jim Keenan

p5pRT commented Aug 17, 2016

From @dcollinsn

On Fri Feb 15 19:12:46 2013, jkeenan wrote:

> Tom, brian: Would it be possible to get an update on the status of the
> issues discussed in this ticket?
>
> Thank you very much.
> Jim Keenan

I'm having some trouble duplicating this. The hyphen that Tom's test case shows comes out as a plain ASCII hyphen, not a non-ASCII Unicode character, under every version of perl/perldoc I tested. In old 5.14ish versions, I found this inconsistency:

$ PERL_UNICODE=0 perl-5.14.0/bin/perldoc perlunicode | grep "returns true if the" | uniquote -v
  \N{MIDDLE DOT} "DO_UTF8(sv)" returns true if the "UTF8" flag is on and the bytes
  pragma is not in effect. "SvUTF8(sv)" returns true if the "UTF8"
  \N{MIDDLE DOT} is_utf8_char(s) returns true if the pointer points to a valid UTF-8
$ PERL_UNICODE=S perl-5.14.0/bin/perldoc perlunicode | grep "returns true if the" | uniquote -v
  \N{LATIN CAPITAL LETTER A WITH CIRCUMFLEX}\N{MIDDLE DOT} "DO_UTF8(sv)" returns true if the "UTF8" flag is on and the bytes
  pragma is not in effect. "SvUTF8(sv)" returns true if the "UTF8"
  \N{LATIN CAPITAL LETTER A WITH CIRCUMFLEX}\N{MIDDLE DOT} is_utf8_char(s) returns true if the pointer points to a valid UTF-8

But that's just a literal * in newer versions. Does anyone know if this is still a potential issue?
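
The 5.14 output above fits the same double-encoding pattern as Tom's report. As a rough check, the sketch below (Python, with hypothetical helper names `double_encode` and `char_names`) applies the assumed "UTF-8 bytes misread as Latin-1" step to MIDDLE DOT and lists the resulting character names. It only illustrates the pattern and does not show what perldoc does internally.

```python
import unicodedata

def double_encode(text):
    """Simulate the assumed bug: UTF-8 bytes misread as Latin-1 characters."""
    return text.encode("utf-8").decode("latin-1")

def char_names(text):
    """Return the Unicode name of each character, like `uniquote -v` prints."""
    return [unicodedata.name(ch) for ch in text]

middle_dot = "\u00b7"                 # the bullet character used by perldoc 5.14
garbled = double_encode(middle_dot)

print(char_names(middle_dot))  # ['MIDDLE DOT']
print(char_names(garbled))     # ['LATIN CAPITAL LETTER A WITH CIRCUMFLEX', 'MIDDLE DOT']
```

The second list matches the extra character that appears before each bullet in the PERL_UNICODE=S run.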

--
Respectfully,
Dan Collins

@khwilliamson

This no longer occurs
