perldoc mangles dashes and quotes #8723

Open
p5pRT opened this issue Jan 3, 2007 · 13 comments

Comments

@p5pRT

p5pRT commented Jan 3, 2007

Migrated from rt.perl.org#41170 (status was 'open')

Searchable as RT41170$

@p5pRT

p5pRT commented Jan 3, 2007

From nospam-abuse@bloodgate.com

perldoc mangles some characters when displaying text on the
console. For instance, on my system dashes (-) and single quotes
(') are replaced with "fancy" looking characters.

While this might look better, it makes it impossible to
copy & paste code out of Perl documentation.

This is especially bad for verbatim paragraphs, which often
contain code examples.

Attached is a very primitive sample package with POD, and
a text file that was created by doing:

  perldoc A

on my console, then marking the displayed text with the mouse,
and then pasting the text into a new file with vi.

Some notes:

* My system is fully Unicode, including the console.
* I use Konsole, the KDE console.
* this problem has existed for a few years: it showed up on SuSE 9.0,
  and it persists on SuSE 10.1
* it is not SuSE specific, Paul Johnson said:
  "For me, on a unicode enabled xterm under ubuntu, the dashes come out
  fine, but the single quotes are dodgy, as are yours here too."

Since this problem has led to at least one embarrassing bug report from
me, I want it fixed now.

Best wishes,

Tels

Perl Info

Flags:
    category=core
    severity=medium

This perlbug was built using Perl v5.8.8 - Sat Apr 22 23:31:53 UTC 2006
It is being executed now by  Perl v5.8.8 - Sat Apr 22 23:26:49 UTC 2006.

Site configuration information for perl v5.8.8:

Configured by abuild at Sat Apr 22 23:26:49 UTC 2006.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.16, archname=x86_64-linux-thread-multi
    uname='linux dvorak 2.6.16 #1 smp mon apr 10 04:51:13 utc 2006 x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true -Doptimize=-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -g -Wall -pipe'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=define uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -Wdeclaration-after-statement -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -g -Wall -pipe',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -Wdeclaration-after-statement'
    ccversion='', gccversion='4.1.0 (SUSE Linux)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/libc-2.4.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.4'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.8/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64'

Locally applied patches:
    


@INC for perl v5.8.8:
    /usr/lib/perl5/5.8.8/x86_64-linux-thread-multi
    /usr/lib/perl5/5.8.8
    /usr/lib/perl5/site_perl/5.8.8/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.8
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.8
    /usr/lib/perl5/vendor_perl
    .


Environment for perl v5.8.8:
    HOME=/home/te
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/jvm/jre/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/home/te/.local/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

-- 
 Signed on Wed Jan  3 20:51:04 2007 with key 0x93B84C15.
 Get one of my photo posters: http://bloodgate.com/posters
 PGP key on http://bloodgate.com/tels.asc or per email.

 "Our second big loss has been the "IP" fudge, which is blurring the
 distinctions between patents, copyrights, trademarks, trade secrets,
 competative advantages, wishful thinking, bullshit, and marketing babble
 into one vague pile of lawyer poo." -- MarkusQ (450076), 2004-01-23


@p5pRT

p5pRT commented Jan 3, 2007

From nospam-abuse@bloodgate.com

A.pm

@p5pRT

p5pRT commented Jan 3, 2007

From nospam-abuse@bloodgate.com

  Normal paragraph: Code: "call−>me()". Inline: call−>me(). ’single’ "double".

  Verbatim: Code: C<< call‐>me() >>. Inline: call‐>me(). ’single’ "double".

  Normal text again.
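To pin down which characters were substituted in the pasted output, the sample can be inspected code point by code point. This is a hypothetical diagnostic sketch in Python, not part of perldoc; it assumes the look-alikes in the paste are U+2212, U+2010 and U+2019 (which is what they appear to be) and writes them as escapes so the script itself is safe to copy.

```python
import unicodedata

# The two rendered lines from the pasted A.pm output. The look-alike
# characters are written as escapes (assumed to be U+2212, U+2010 and
# U+2019, as they appear in the paste).
NORMAL = 'Code: "call\u2212>me()". Inline: call\u2212>me(). \u2019single\u2019 "double".'
VERBATIM = 'Code: C<< call\u2010>me() >>. Inline: call\u2010>me(). \u2019single\u2019 "double".'


def non_ascii_chars(text):
    """Return (code point, Unicode name) for each distinct non-ASCII
    character in text, in order of first appearance."""
    found = {}
    for ch in text:
        if ord(ch) > 127 and ch not in found:
            found[ch] = (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
    return list(found.values())


if __name__ == "__main__":
    for label, line in (("normal", NORMAL), ("verbatim", VERBATIM)):
        for code_point, name in non_ascii_chars(line):
            print(f"{label:<9} {code_point} {name}")
```

Run as a script, it reports MINUS SIGN and RIGHT SINGLE QUOTATION MARK for the normal-paragraph line and HYPHEN and RIGHT SINGLE QUOTATION MARK for the verbatim line: in both cases neither the dashes nor the apostrophes survive as ASCII.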

@p5pRT

p5pRT commented Jan 4, 2007

From @JohnPeacock

Tels (via RT) wrote:

* My system is fully Unicode, including the console.
* I use Konsole, the KDE console.

Happens with Gnome Terminal, as well.

* this problem has existed for a few years: it showed up on SuSE 9.0,
and it persists on SuSE 10.1

and SuSE 10.2 (which is *really* nice, Tels, you should upgrade!)...

SuSE 9.0 was the first time that they enabled Unicode by default, if I'm not
mistaken. I noticed the same thing when I was running Mandrake/Mandriva. My
solution to date has been to turn off UTF-8, but then I'm just some stupid
American... ;-)

John

--
John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4501 Forbes Blvd
Suite H
Lanham, MD 20706
301-459-3366 x.5010
fax 301-429-5747

@p5pRT

p5pRT commented Jan 4, 2007

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Jan 4, 2007

From rick@bort.ca

On Jan 03 2007, Tels wrote:

perldoc mangles some characters when displaying text on the
console. For instance, on my system dashes (-) and single quotes
(') are replaced with "fancy" looking characters.

Don't you see this with other manpages? I guess maybe not because they
were probably generated with options similar to what I have below.

While this might look better, it makes it impossible to
copy & paste code out of Perl documentation.

This is especially bad for verbatim paragraphs, which often
contain code examples.

Attached is a very primitive sample package with POD, and
a text file that was created by doing:

perldoc A

This should help

  perldoc -n 'nroff -Tascii' A

Then you can just alias that. Or you could just alias John's way

  alias perldoc="LANG=en_US perldoc"

I think you could also add the switch to $ENV{PERLDOC}.

TMTOWTDI

--
Rick Delaney
rick@bort.ca
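Along the same lines as these workarounds, text that has already been copied out of a rendered page could be mapped back to ASCII before it is pasted into code. This is a minimal sketch in Python; the helper name `asciify` and its mapping table are hypothetical and not part of perldoc or Pod::Man. Because nroff turns both ` and ' into curly quotes, the helper cannot tell which one the author wrote and always picks the apostrophe; it also flattens legitimately typographic characters, so it only suits code snippets.

```python
# Hypothetical stopgap: map the hyphen and quote look-alikes seen in
# perldoc's UTF-8 output back to plain ASCII.
TO_ASCII = str.maketrans({
    "\u2010": "-",   # HYPHEN
    "\u2212": "-",   # MINUS SIGN
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK (the source may have had `)
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
})


def asciify(pasted: str) -> str:
    """Undo hyphen and quote substitutions in text pasted from a terminal."""
    return pasted.translate(TO_ASCII)


if __name__ == "__main__":
    # The verbatim line from the A.pm sample, with its look-alikes escaped.
    sample = "Inline: call\u2010>me(). \u2019single\u2019"
    print(asciify(sample))  # Inline: call->me(). 'single'
```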

@p5pRT

p5pRT commented Jan 4, 2007

From nospam-abuse@bloodgate.com

Moin John et al.,

John Peacock <jpeacock@rowman.com> wrote:

Tels (via RT) wrote:

* My system is fully Unicode, including the console.
* I use Konsole, the KDE console.

Happens with Gnome Terminal, as well.

Thanx for the confirmation. :D

* this problem has existed for a few years: it showed up on SuSE 9.0,
and it persists on SuSE 10.1

and SuSE 10.2 (which is *really* nice, Tels, you should upgrade!)...

First: upgrades of SuSE are always risky, especially with encrypted file
systems (and/or "exotic" hardware like RAID, SATA, SCSI etc.). They are
simply "Not done[tm]" around here; I'd rather build a new machine and
install from scratch. That was how I got from 9.0 to 10.1 :-)
Second: after the MS stunt Novell pulled, I will have nothing more to do with
SuSE/Novell at all. Using Linux was my way of escaping from stupid
monopolies, and I ain't getting shafted through the backdoor by the
morons at Novell headquarters.

SuSE 9.0 was the first time that they enabled Unicode by default, if I'm
not mistaken. I noticed the same thing when I was running
Mandrake/Mandriva. My solution to date has been to turn off UTF-8, but
then I'm just some stupid American... ;-)

I just wish computers had been invented in Japan or China, and not in "128
letters are enough for us" Europe or America. But then, they would probably
have messed it up even more than ASCII messed us up until Unicode came along.

best wishes,

tels

@p5pRT

p5pRT commented Jan 4, 2007

From nospam-abuse@bloodgate.com

Moin,

Rick Delaney <rick@bort.ca> wrote:

On Jan 03 2007, Tels wrote:

perldoc mangles some characters when displaying text on the
console. For instance, on my system dashes (-) and single quotes
(') are replaced with "fancy" looking characters.

Don't you see this with other manpages? I guess maybe not because they
were probably generated with options similar to what I have below.

What other manpages? I am talking about perldoc, not about manpages.

While this might look better, it makes it impossible to
copy & paste code out of Perl documentation.

This is especially bad for verbatim paragraphs, which often
contain code examples.

Attached is a very primitive sample package with POD, and
a text file that was created by doing:

perldoc A

This should help

perldoc -n 'nroff -Tascii' A

You honestly expect me and the average user to:

* find that "magic" option combo out somehow
* actually remember it
* type it every time you come to a new machine, or
** set it up on every machine so you don't need to type it?

I expect perldoc to NOT modify (or prettify) the written documentation when
outputting it. If, as a POD author, I write, for instance, "$self->method()", then I
expect the output to be exactly that, and not some variant of it.

Then you can just alias that. Or you could just alias John's way

alias perldoc="LANG=en_US perldoc"

Ah, and I guess that then still works with a Chinese POD file?

I think you could also add the switch to $ENV{PERLDOC}.

I prefer not to mess with my environment to work round broken software.

TMTOWTDI

Like, fix the bug?

Sorry if I sound bitter, but I firmly believe the computer should work
for the human, not the other way around (in 2052, when the machines rise, they
will kill me for that remark, if they can actually manage to work that long
without fatally crashing...)

Best wishes,

Tels

@p5pRT

p5pRT commented Jan 5, 2007

From rick@bort.ca

On Jan 04 2007, Tels wrote:

Rick Delaney <rick@bort.ca> wrote:

On Jan 03 2007, Tels wrote:

perldoc mangles some characters when displaying text on the
console. For instance, on my system dashes (-) and single quotes
(') are replaced with "fancy" looking characters.

Don't you see this with other manpages? I guess maybe not because they
were probably generated with options similar to what I have below.

What other manpages? I am talking about perldoc, not about manpages.

According to the first paragraph of perldoc's DESCRIPTION, perldoc is
essentially

  pod2man | nroff -man | $PAGER

Which is to say that perldoc's default formatting is that of manpages.
On my system (Ubuntu 6) I see just as many fancy quotes in other
manpages as I do in perldoc. The culprit for this is nroff (or troff or
whatever). That is what is fancying up the quotes.

While this might look better, it makes it impossible to
copy & paste code out of Perl documentation.

Or manpages.

This should help

perldoc -n 'nroff -Tascii' A

You honestly expect me and the average user to:

* find that "magic" option combo out somehow
* actually remember it
* type it every time you come to a new machine, or
** set it up on every machine so you don't need to type it?

I expect nothing. I was simply suggesting a method you could use to get
output you might like. A workaround. I made and make no judgement
about the validity of this bug report. I'd be quite happy if the above
were the default behaviour but I won't lose any sleep if it stays the
same.

I expect perldoc to NOT modify (or prettify) the written documentation when
outputting it. If, as a POD author, I write, for instance, "$self->method()", then I
expect the output to be exactly that, and not some variant of it.

If you don't want the documentation prettified at all, then you might be
happy with `perldoc -t`. It should be easy enough to remember.

As for fixing this, there may be something Pod::Man could do to the
input it gives nroff, but I really don't know.

--
Rick Delaney
rick@bort.ca
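Rick's point that nroff, rather than perldoc itself, swaps in the fancy characters can be illustrated with a toy model of nroff's character translation. This is a hypothetical sketch in Python: the table assumes the glyph mappings listed in groff_char(7) for groff releases of that time (input `-` becomes U+2010 HYPHEN, `\-` becomes U+2212 MINUS SIGN, and ' and ` become curly quotes on the UTF-8 device, while the ASCII device keeps plain ASCII), and it ignores everything else roff does.

```python
# Toy model of nroff's character translation for the four characters
# discussed in this thread (assumed mappings, after groff_char(7)).
# It ignores requests, fonts, line filling and every other roff feature.
GLYPHS = {
    # roff input: (output with -Tascii, output with -Tutf8)
    "-": ("-", "\u2010"),        # input hyphen  -> HYPHEN
    "\\-": ("-", "\u2212"),      # \- escape     -> MINUS SIGN
    "'": ("'", "\u2019"),        # apostrophe    -> RIGHT SINGLE QUOTATION MARK
    "`": ("`", "\u2018"),        # grave accent  -> LEFT SINGLE QUOTATION MARK
}


def render(roff: str, device: str) -> str:
    """Translate roff input text the way the chosen nroff device would."""
    column = {"ascii": 0, "utf8": 1}[device]
    out = []
    i = 0
    while i < len(roff):
        # Try the two-character escape before the single characters.
        for size in (2, 1):
            token = roff[i:i + size]
            if token in GLYPHS:
                out.append(GLYPHS[token][column])
                i += len(token)
                break
        else:
            # Anything else passes through unchanged.
            out.append(roff[i])
            i += 1
    return "".join(out)
```

With device "ascii" every entry collapses back to the plain ASCII character, which is consistent with `perldoc -n 'nroff -Tascii'` avoiding the problem.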

@p5pRT

p5pRT commented Jan 5, 2007

From nospam-abuse@bloodgate.com

Moin Rick,

On Friday 05 January 2007 03:52, you wrote:

On Jan 04 2007, Tels wrote:

Rick Delaney <rick@bort.ca> wrote:

On Jan 03 2007, Tels wrote:

perldoc mangles some characters when displaying text on the
console. For instance, on my system dashes (-) and single quotes
(') are replaced with "fancy" looking characters.

Don't you see this with other manpages? I guess maybe not because
they were probably generated with options similar to what I have
below.

What other manpages? I am talking about perldoc, not about manpages.

According to the first paragraph of perldoc's DESCRIPTION, perldoc is
essentially

  pod2man | nroff -man | $PAGER

Ah. Should have read the doc then. *goes hiding in a corner*

[snipabit]

This should help

perldoc -n 'nroff -Tascii' A

You honestly expect me and the average user to:

* find that "magic" option combo out somehow
* actually remember it
* type it every time you come to a new machine, or
** set it up on every machine so you don't need to type it?

I expect nothing. I was simply suggesting a method you could use to get
output you might like. A workaround. I made and make no judgement
about the validity of this bug report. I'd be quite happy if the above
were the default behaviour but I won't lose any sleep if it stays the
same.

OK, I made a test C.pm with Unicode characters in it. One more problem surfaces:

  te@linux:~/perl/perldoc> perldoc -t C
  ./C.pm:23: Unknown command paragraph: =encoding utf8

Ugh. According to perldoc perlpod, "=encoding utf8" is the way to go.

Second problem​:

  perldoc C

doesn't even show the Chinese characters, but does show the umlauts (and mangled
dashes).

  perldoc -n 'nroff -Tascii' C

Doesn't work either, and it warns a lot about chars it can't find.

  perldoc -n 'nroff -TUTF8' C

Mangles the dashes and loses the Chinese, again.

Funnily enough, a plain:

  perldoc -t C

shows all the characters properly, including Chinese, dashes and umlauts.
(it warns about the =encoding directive, too, tho)

"perldoc -t" doesn't look as "nice", e.g. it doesn't have bold headers etc.,
but at least it works correctly. I think that either:

* perldoc -t should be the default, since nroff seems to be broken
* or: a workaround with nroff could be found, or nroff fixed, and then made
the default

Optionally, if people really like the bold headers etc, "-t" could
be "spiced up" a bit.

The current situation, where a plain "perldoc" is just wrong, is, well,
wrong :)

Sorry if I sounded too harsh in my first reply, and thanx a lot for your reply.

best wishes,

tels

@p5pRT

p5pRT commented Jan 5, 2007

From nospam-abuse@bloodgate.com

C.pm

@p5pRT

p5pRT commented Jan 6, 2007

From @demerphq

On 1/4/07, Tels <nospam-abuse@bloodgate.com> wrote:

I just wish computers had been invented in Japan or China, and not in "128
letters are enough for us" Europe or America. But then, they would probably
have messed it up even more than ASCII messed us up until Unicode came along.

Just be glad it wasn't the Hawaiians, they have only 12 letters. :-)

yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Jan 7, 2007

From rick@bort.ca

On Jan 05 2007, Tels wrote:

* perldoc -t should be the default, since nroff seems to be broken
* or: a workaround with nroff could be found, or nroff fixed, and then made
the default

A bit of research leads me to believe that Pod::Man should be doing some
more escaping of its roff output. In particular, I think anything
between C<> or in verbatim paragraphs should be escaped. It is already
escaping '-' but it is doing nothing for quotes or other characters.

Perhaps we'll get a fresh shipment of round tuits now that it's a
new year.

--
Rick Delaney
rick@bort.ca
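The extra escaping suggested above for text inside C<> and verbatim paragraphs could look roughly like the following. This is a hypothetical Python sketch, not Pod::Man's actual code: the function names are invented, it assumes a groff recent enough to define the \(aq and \(ga special characters, and it escapes '-' as \- only because that is what Pod::Man is said to do already. The normal-paragraph line of the A.pm sample suggests that groff's UTF-8 device of the time rendered \- as a minus sign (U+2212), so escaping alone may not make dashes copy-paste safe.

```python
# Hypothetical sketch of roff escaping for code text (C<> and verbatim).
ROFF_ESCAPES = {
    "\\": r"\e",     # literal backslash, so it cannot start a roff escape
    "-": r"\-",      # what Pod::Man reportedly does already for '-'
    "'": r"\(aq",    # groff's ASCII apostrophe instead of a closing quote
    "`": r"\(ga",    # groff's ASCII grave accent instead of an opening quote
}


def escape_code_line(line: str) -> str:
    """Escape one line of code text for roff output."""
    escaped = "".join(ROFF_ESCAPES.get(ch, ch) for ch in line)
    # A line starting with "." would be parsed as a roff request;
    # a leading "'" cannot occur any more because it was replaced above.
    if escaped.startswith("."):
        escaped = r"\&" + escaped
    return escaped


def escape_verbatim(paragraph: str) -> str:
    """Escape every line of a verbatim paragraph."""
    return "\n".join(escape_code_line(line) for line in paragraph.split("\n"))
```

For example, the verbatim line `  $self->method('x');` becomes `  $self\->method(\(aqx\(aq);`, which leaves the choice of glyph for each character to the explicit escapes rather than to nroff's typographic defaults.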
