perlpod not reading utf-8 on input (even when told to) #13396

Open
p5pRT opened this issue Nov 4, 2013 · 9 comments
Comments

@p5pRT

p5pRT commented Nov 4, 2013

Migrated from rt.perl.org#120451 (status was 'open')

Searchable as RT120451$

@p5pRT
Author

p5pRT commented Nov 4, 2013

From perl-diddler@tlinx.org

Created by perl-diddler@tlinx.org

Note: my perldoc version is:

perldoc -V
Perldoc v3.17, under perl v5.016002 for linux

I was looking at the output of pod documentation
where a superscript 1 ("¹", U+00B9, encoded as 0xc2 0xb9) was used.
The source file has both a "use utf8" at the top and an
"=encoding utf-8" at the beginning of its pod section.

hexdump shows it encoded as 0xc2 followed by 0xb9, but perldoc
displays it as two separate characters: Â¹ (U+00C2, U+00B9).
As a side note, I have "-CSA" in my PERL5OPT environment variable, telling
perl that the standard I/O streams and the command-line arguments are
UTF-8 encoded (but this is a file read, and wouldn't be affected by that).
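For reference, the byte-level picture described above can be checked with a short Perl sketch. It assumes only the core Encode module and models the symptom, not perldoc's actual code path: "¹" is the single character U+00B9, its UTF-8 form is the two octets 0xC2 0xB9, and treating those octets as Latin-1 (one byte per character) yields U+00C2 U+00B9, which a UTF-8 terminal shows as "Â¹".

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "¹" is the single code point U+00B9 (SUPERSCRIPT ONE).
my $char = "\x{B9}";

# Its UTF-8 encoding is two octets: 0xC2 0xB9.
my $octets = encode('UTF-8', $char);
printf "UTF-8 octets: %s\n", join ' ', map { sprintf '0x%02x', ord } split //, $octets;

# Decoding the octets as UTF-8 gives back one character.
my $ok = decode('UTF-8', $octets);

# Decoding them as Latin-1 (one byte = one character) gives two
# characters, U+00C2 and U+00B9, which a UTF-8 terminal shows as "Â¹".
my $mangled = decode('ISO-8859-1', $octets);

printf "as UTF-8:   %d char(s): %s\n", length $ok,
    join ' ', map { sprintf 'U+%04X', ord } split //, $ok;
printf "as Latin-1: %d char(s): %s\n", length $mangled,
    join ' ', map { sprintf 'U+%04X', ord } split //, $mangled;
```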

---Example---
use utf8;

=encoding utf-8

=head1 EXAMPLE

This is a test¹.

¹-a Perldoc test, that is.
======end; execution via perldoc:====

perldoc x.pm
x(3) User Contributed Perl Documentation x(3)

EXAMPLE
  This is a testÂ¹.

  Â¹-a Perldoc test, that is.

perl v5.16.2 2013-11-03 x(3)

Note the utf-8 is mangled in both places.

However, note that even without my "use utf8" and "=encoding" statements,
a dumb util like "file" can get it right.

Surely perldoc (and perl) could do at least as well:

file x.pm
x.pm: UTF-8 Unicode text

It's too bad perl has to get file encodings wrong when so many
others get it right, though perldoc seems to have extra problems,
ignoring both in-file hints that UTF-8 is used.

Perl Info

Flags:
    category=utilities
    severity=high

This perlbug was built using Perl 5.16.2 - Fri Feb 15 01:17:37 UTC 2013
It is being executed now by  Perl 5.16.2 - Fri Feb 15 01:12:05 UTC 2013.

Site configuration information for perl 5.16.2:

Configured by abuild at Fri Feb 15 01:12:05 UTC 2013.

Summary of my perl5 (revision 5 version 16 subversion 2) configuration:
   
  Platform:
    osname=linux, osvers=3.4.6-2.10-default, archname=x86_64-linux-thread-multi
    uname='linux build34 3.4.6-2.10-default #1 smp thu jul 26 09:36:26 utc 2012 (641c197) x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Dd_dbm_open -Duseshrplib=true -Doptimize=-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -pipe -Accflags=-DPERL_USE_SAFE_PUTENV -Dotherlibdirs=/usr/lib/perl5/site_perl'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -fstack-protector -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -pipe',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -fstack-protector'
    ccversion='', gccversion='4.7.2 20130108 [gcc-4_7-branch revision 195012]', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64 -fstack-protector'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/libc-2.17.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.17'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.16.2/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64 -fstack-protector'

Locally applied patches:
    


@INC for perl 5.16.2:
    /home/law/bin/lib
    /usr/lib/perl5/site_perl/5.16.2/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.16.2
    /usr/lib/perl5/vendor_perl/5.16.2/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.16.2
    /usr/lib/perl5/5.16.2/x86_64-linux-thread-multi
    /usr/lib/perl5/5.16.2
    /usr/lib/perl5/site_perl/5.16.2/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.16.2
    /usr/lib/perl5/site_perl
    .


Environment for perl 5.16.2:
    HOME=/home/law
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=C
    LC_CTYPE=en_US.UTF-8
    LD_LIBRARY_PATH=/usr/lib64/mpi/gcc/openmpi/lib64
    LOGDIR (unset)
    PATH=/home/law/bin/lib:/sbin:/usr/local/sbin:/usr/lib64/mpi/gcc/openmpi/bin:/home/law/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:.:/usr/lib/qt3/bin:/opt/dell/srvadmin/bin:/usr/sbin:/etc/local/func_lib:/home/law/lib
    PERL5OPT=-Mutf8 -CSA -I/home/law/bin/lib
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Author

p5pRT commented Nov 4, 2013

From nbtrap@nbtrap.com

Linda Walsh (via RT) <perlbug-followup@perl.org> writes:

> The source file has both a "use utf8" at the top as well as an
> "=encoding utf-8" at the beginning of it's pod section.

Shouldn't that be "=encoding utf8"?

> However, note, even w/o my use utf8, and encoding statements,
> a dumb util like "file" can get it right.

"file" is designed to get it right. Perl, on the other hand, doesn't
try to guess your file's encoding.
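Since Perl does not guess, a program that wants file(1)-like behavior has to check the bytes itself. A minimal sketch, assuming the core Encode module; the helper name is_valid_utf8 is hypothetical, and it only tests whether the bytes are well-formed UTF-8, which is weaker than the heuristics file(1) applies:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: report whether a byte string is well-formed
# UTF-8. FB_CROAK makes decode() die on the first malformed sequence;
# LEAVE_SRC keeps it from modifying the buffer it is given.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

my @samples = (
    [ 'UTF-8 superscript one'   => "test\xc2\xb9" ],
    [ 'Latin-1 superscript one' => "test\xb9" ],
);

for my $s (@samples) {
    my ($label, $bytes) = @$s;
    printf "%-24s %s\n", $label, is_valid_utf8($bytes) ? 'valid UTF-8' : 'not UTF-8';
}
```

Note that plain ASCII also passes this check, so "valid UTF-8" here means "could be UTF-8", not "is known to be UTF-8".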

@p5pRT
Copy link
Author

p5pRT commented Nov 4, 2013

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Nov 4, 2013

From @ikegami

On Mon, Nov 4, 2013 at 11:58 AM, Nathan Trapuzzano <nbtrap@nbtrap.com> wrote:

> Linda Walsh (via RT) <perlbug-followup@perl.org> writes:
>
>> The source file has both a "use utf8" at the top as well as an
>> "=encoding utf-8" at the beginning of it's pod section.
>
> Shouldn't that be "=encoding utf8"?

No. The argument is the name of an encoding. (Changing it to "utf8" has no
effect.)
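One way to see why the spelling makes no difference here: assuming the name given to =encoding is resolved through Perl's core Encode module (an assumption about the formatter, not something shown in this thread), both "utf-8" and "utf8" are known names and decode well-formed input such as 0xC2 0xB9 to the same single character. Encode does distinguish them for malformed input: "UTF-8" is strict, while "utf8" is Perl's lax variant.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

my $octets = "\xc2\xb9";    # UTF-8 encoding of "¹" (U+00B9)

for my $name ('utf-8', 'utf8') {
    # find_encoding returns undef for names Encode does not know.
    my $enc = find_encoding($name) or die "unknown encoding: $name";
    my $str = decode($name, $octets);
    printf "%-6s -> %-12s %d char(s), U+%04X\n",
        $name, $enc->name, length $str, ord $str;
}
```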

@p5pRT
Copy link
Author

p5pRT commented Nov 5, 2013

From @iabyn

(I originally sent this via email to p5p yesterday, but it hasn't appeared, so I'm resending via the RT web interface)

The double encoding is occurring due to your PERL5OPT setting:

$ perldoc -T /tmp/U.pm | hexdump -c | grep 't e'
0000020 i b u t e d P e r l D o c u
0000070 a t e s t 302 271 . \n \n
0000090 t e s t , t h a t i s . \n \n

$ PERL5OPT='-CSA' perldoc -T /tmp/U.pm | hexdump -c | grep 't e'
0000020 i b u t e d P e r l D o c u
0000070 a t e s t 303 202 302 271 . \n \n
0000090 d o c t e s t , t h a t i

Passing -CSA to a perl program which is already Unicode-aware is likely to
make it do the wrong thing.
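The octal bytes in the two dumps correspond to encoding the same character once versus twice. A minimal Perl sketch of that arithmetic (core Encode only; it reproduces the bytes, not perldoc's code path):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Octal rendering of each byte, the way hexdump -c shows non-printing bytes.
sub octal { join ' ', map { sprintf '%03o', ord } split //, shift }

my $char  = "\x{B9}";                  # "¹", U+00B9
my $once  = encode('UTF-8', $char);    # "\xc2\xb9"
# Encoding again treats each octet as a character
# (U+00C2, U+00B9) and encodes both.
my $twice = encode('UTF-8', $once);    # "\xc3\x82\xc2\xb9"

print 'encoded once:  ', octal($once),  "\n";   # 302 271
print 'encoded twice: ', octal($twice), "\n";   # 303 202 302 271
```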

@p5pRT
Author

p5pRT commented Nov 5, 2013

From @ikegami

On Tue, Nov 5, 2013 at 7:47 AM, Dave Mitchell via RT <perlbug-followup@perl.org> wrote:

> Passing -CSA to a perl program which is already Unicode-aware is likely to
> make it do the wrong thing.

Sounds like a missing :raw. More specifically, it sounds like it encodes
manually (as opposed to using :encoding layer), but didn't ensure the
handle was a byte handle using :raw.
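A minimal sketch of the mechanism being described, using in-memory filehandles in place of STDOUT. It assumes, as suggested above, that the formatter encodes text to UTF-8 by hand; the :utf8 layer stands in for what -CS applies to the standard streams. Printing already-encoded octets through that layer encodes them a second time, while a handle switched to :raw passes them through unchanged:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Text that a formatter has already encoded by hand.
my $bytes = encode('UTF-8', "test\x{B9}.\n");   # "test\xc2\xb9.\n"

# A handle with a :utf8 layer, similar to what -CS puts on STDOUT.
open my $layered, '>:utf8', \my $out_layered or die "open: $!";
print {$layered} $bytes;     # each octet is re-encoded: double encoding
close $layered;

# The same kind of handle after binmode(..., ':raw'), which removes the
# :utf8 flag, so the octets are written unchanged.
open my $raw, '>:utf8', \my $out_raw or die "open: $!";
binmode $raw, ':raw' or die "binmode: $!";
print {$raw} $bytes;
close $raw;

for my $pair ([ ':utf8' => $out_layered ], [ ':raw' => $out_raw ]) {
    my ($layer, $data) = @$pair;
    printf "%-6s %s\n", $layer, join ' ', map { sprintf '%02x', ord } split //, $data;
}
```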

@p5pRT
Author

p5pRT commented Nov 6, 2013

From perl-diddler@tlinx.org

On Tue Nov 05 06:57:00 2013, ikegami@adaelis.com wrote:

> On Tue, Nov 5, 2013 at 7:47 AM, Dave Mitchell via RT <perlbug-followup@perl.org> wrote:
>
>> Passing -CSA to a perl program which is already Unicode-aware is likely to
>> make it do the wrong thing.
>
> Sounds like a missing :raw. More specifically, it sounds like it encodes
> manually (as opposed to using :encoding layer), but didn't ensure the
> handle was a byte handle using :raw.


If pod isn't using the perlio layers, that would make sense.

However, regarding the cause being -CSA in the environment, I would
note that it is counter-intuitive that declaring STDIN and the arguments to be UTF-8
would have much to do with running perldoc on "filename". It isn't obvious that
"filename", which has to be searched for in perl's library path and have ".pm"
appended to it, would be read as STDIN anywhere. That is, I didn't specify that
the default *file* encoding was UTF-8, only that STDIO was.

It *could* be the case that declaring the source to be UTF-8 via "use utf8", and then
also telling pod that it was UTF-8, results in double jeopardy (the text being converted
twice); that would seem more likely than the environment setting, which shouldn't affect
file reading. But depending on how it is implemented, it could be either.
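On the "use utf8" question: the utf8 pragma is documented as telling the Perl parser that the program text in its lexical scope is UTF-8, so it changes how string literals in code are read. A Pod formatter that opens the .pm file itself sees the =encoding line, not the pragma. The sketch below (assuming the snippet is saved as UTF-8) shows the pragma's effect on a literal; it does not by itself show what perldoc does:

```perl
use strict;
use warnings;

# This file must be saved as UTF-8; the literal below is the two
# octets 0xC2 0xB9 on disk.
my ($without, $with);
{
    no utf8;                 # source octets are taken as-is
    $without = length "¹";   # two octets, so length 2
}
{
    use utf8;                # source is decoded as UTF-8
    $with = length "¹";      # one character, U+00B9
}
print "no utf8:  length $without\n";
print "use utf8: length $with\n";
```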

@p5pRT
Copy link
Author

p5pRT commented Nov 6, 2013

From perl-diddler@tlinx.org

As an update to this.

If the pod text is displayed as a web page, using
#!/bin/bash
( perl -Mojo -e 'plugin "PODRenderer"; a()->start' daemon -l http://localhost:3000 &)


for rendering, and Firefox as the client, the UTF-8 characters display correctly!

So whatever Mojo does, perldoc should go forth and do likewise?

(the perl server above was started with the same ENV settings)...

@p5pRT p5pRT added the khw label Oct 25, 2019
@toddr toddr removed the khw label Oct 25, 2019
@khwilliamson
Contributor

I believe this is fixed; please try it again.


3 participants