print does not respect "use encoding 'utf8'" #7967

Closed
p5pRT opened this issue Jun 11, 2005 · 15 comments

p5pRT commented Jun 11, 2005

Migrated from rt.perl.org#36248 (status was 'rejected')

Searchable as RT36248$

p5pRT commented Jun 11, 2005

From williams@tni.com

Created by williams@tni.com

#!perl

use utf8;
use encoding 'utf8';
use Encode;

# this simulates a utf8 string without the utf8 bit set,
# such as one gets from DBD::mysql or LWP or etc etc.
$x = 'hÿpër';
Encode::_utf8_off($x);

# prints 'hÿpër' correctly
print "$x\n";

# prints doubly-encoded utf8: 'hÿpër'
print $x;
print "\n";

# prints 'hÿpër' correctly
print "$x\n";

# The point is that C< use encoding 'utf8' > did not make C< print >
# (or the IO routines) assume utf8 instead of latin1 when it decoded
# the string. It only works correctly when strings are concatenated.
# This should be regarded as a bug, IMHO.
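
For readers unfamiliar with the term, the following is a minimal sketch of what "doubly-encoded" means here. It does not use the `encoding` pragma at all; it only reproduces the effect with explicit `Encode::encode` calls, on the assumption that the output layer treats each octet of an already-encoded string as a Latin-1 character. The variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# 'hÿpër' as five characters (U+00FF and U+00EB written as escapes)
my $chars = "h\x{FF}p\x{EB}r";

# Encode once: seven UTF-8 octets, the form a driver might hand back
# without the utf8 flag set
my $octets = encode('UTF-8', $chars);

# Encode again: each octet is taken for a Latin-1 character and
# re-encoded, which is the double encoding described above
my $double = encode('UTF-8', $octets);

printf "%d characters, %d octets, %d octets when doubly encoded\n",
    length($chars), length($octets), length($double);
# prints "5 characters, 7 octets, 11 octets when doubly encoded"
```

Each of the two non-ASCII characters grows from one character to two octets, and then from two octets to four, which is why the mangled form shows two junk characters in place of each accented letter.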

Perl Info

Flags:
    category=library
    severity=medium

Site configuration information for perl v5.8.7:

Configured by williams at Sat Jun 11 11:26:38 MDT 2005.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=linux, osvers=2.6.9-1.681_fc3, archname=i686-linux
    uname='linux ip137.home 2.6.9-1.681_fc3 #1 thu nov 18 15:10:10 est 2004 i686 i686 i386 gnulinux '
    config_args='-de'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='3.4.2 20041017 (Red Hat 3.4.2-6.fc3)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.3.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.3'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:



@INC for perl v5.8.7:
    /home/williams/perl-5.8.7/lib
    /usr/local/lib/perl5/5.8.7/i686-linux
    /usr/local/lib/perl5/5.8.7
    /usr/local/lib/perl5/site_perl/5.8.7/i686-linux
    /usr/local/lib/perl5/site_perl/5.8.7
    /usr/local/lib/perl5/site_perl
    .


Environment for perl v5.8.7:
    HOME=/home/williams
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/lib/jre/bin:/home/williams/bin:/sbin/:/usr/sbin:/usr/lib/jre/bin
    PERLLIB=/home/williams/perl-5.8.7/lib
    PERL_BADLANG (unset)
    SHELL=/bin/bash


p5pRT commented Nov 20, 2011

From @jkeenan

On Sat Jun 11 11:12:41 2005, williams@tni.com wrote:

> # The point is that C< use encoding 'utf8' > did not make C< print >
> # (or the IO routines) assume utf8 instead of latin1 when it decoded
> # the string. It only works correctly when strings are concatenated.
> # This should be regarded as a bug, IMHO.

This appears to be the case with say() as well:

#####
#!/usr/local/bin/perl
use strict;
use warnings;
use feature qw( :5.10 );
use utf8;
use encoding 'utf8';
use Encode;

my $x = 'hÿpër';
Encode::_utf8_off($x);

print "$x\n";
print $x;
print "\n";
print "$x\n";
say '';
# try to say it
say $x;
say $x . '';
#####

Output:

#####
$ perl 36248.pl
hÿpër
hÿpër
hÿpër

hÿpër
hÿpër
#####
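
As a point of comparison, and not something proposed in this ticket, here is a sketch of handling the same data without the `encoding` pragma: decode the octets once at the input boundary, then write through a handle with an explicit encoding layer. Under those assumptions a plain `print` and an interpolated `print` should produce identical octets. The in-memory filehandle and the variable names are illustrative choices made so the result can be inspected.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 octets for 'hÿpër', as might arrive from a database or HTTP client
my $octets = "h\xC3\xBFp\xC3\xABr";

# Decode once on input: octets become characters
my $text = decode('UTF-8', $octets);

# Write to an in-memory handle that encodes characters as UTF-8 on output
my $out = '';
open(my $fh, '>:encoding(UTF-8)', \$out)
    or die "cannot open in-memory handle: $!";
print {$fh} $text;       # plain print
print {$fh} "$text";     # interpolated print
close($fh);

# Both writes yield the original seven octets, so $out holds them twice
printf "%d octets written\n", length($out);
```

The design choice is simply to make both conversions explicit, so that no step has to guess whether a string holds octets or characters.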

p5pRT commented Nov 20, 2011

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Aug 31, 2013

From @jkeenan

On Sat Jun 11 11:12:41 2005, williams@tni.com wrote:

> This is a bug report for perl from williams@tni.com,
> generated with the help of perlbug 1.35 running under perl v5.8.7.
>
> -----------------------------------------------------------------
> [Please enter your report here]
>
> #!perl
>
> use utf8;
> use encoding 'utf8';
> use Encode;
>
> # this simulates a utf8 string without the utf8 bit set,
> # such as one gets from DBD::mysql or LWP or etc etc.
> $x = 'hÿpër';
> Encode::_utf8_off($x);
>
> # prints 'hÿpër' correctly
> print "$x\n";
>
> # prints doubly-encoded utf8: 'hÿpër'
> print $x;
> print "\n";
>
> # prints 'hÿpër' correctly
> print "$x\n";
>
> # The point is that C< use encoding 'utf8' > did not make C< print >
> # (or the IO routines) assume utf8 instead of latin1 when it decoded
> # the string. It only works correctly when strings are concatenated.
> # This should be regarded as a bug, IMHO.

On the p5p list lately I believe there has been discussion of the
problems with 'use encoding'.

Could someone familiar with those issues review this older ticket?

Thank you very much.
Jim Keenan

p5pRT commented Aug 31, 2013

From @rjbs

I have not reviewed this, but:

1. encoding is part of the Encode dist, which is upstream CPAN; this bug *probably* belongs there

2. encoding is deprecated and will be removed, which means that it is unlikely that we'll see a fixed version in core

--
rjbs

p5pRT commented Aug 31, 2013

From @cpansprout

On Fri Aug 30 19:10:28 2013, rjbs wrote:

> I have not reviewed this, but:
>
> 1. encoding is part of the Encode dist, which is upstream CPAN; this
> bug *probably* belongs there

encoding.pm is a small wrapper around core functionality.

> 2. encoding is deprecated and will be removed, which means that it is
> unlikely that we'll see a fixed version in core

I have not reviewed the bug itself, but generally it is not clear at all
whether code using encoding.pm is behaving correctly or no, since no one
really knows how it is supposed to work. All I can say is that it is
completely broken and always has been.

I suggest we collect these in a meta ticket so that we can reject them
en masse once the core functionality that encoding.pm wraps is ripped out.

--

Father Chrysostomos

p5pRT commented Aug 31, 2013

From @Hugmeir

On Sat, Aug 31, 2013 at 2:56 AM, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:

> On Fri Aug 30 19:10:28 2013, rjbs wrote:
>
> > I have not reviewed this, but:
> >
> > 1. encoding is part of the Encode dist, which is upstream CPAN; this
> > bug *probably* belongs there
>
> encoding.pm is a small wrapper around core functionality.
>
> > 2. encoding is deprecated and will be removed, which means that it is
> > unlikely that we'll see a fixed version in core
>
> I have not reviewed the bug itself, but generally it is not clear at all
> whether code using encoding.pm is behaving correctly or no, since no one
> really knows how it is supposed to work. All I can say is that it is
> completely broken and always has been.
>
> I suggest we collect these in a meta ticket so that we can reject them
> en masse once the core functionality that encoding.pm wraps is ripped out.

+10

p5pRT commented Sep 4, 2013

From @ap

* Brian Fraser <fraserbn@gmail.com> [2013-08-31 15:10]:

> +10

One of those 10 is me.

p5pRT commented Mar 10, 2014

From @khwilliamson

Since the 'use encoding' feature is scheduled to be removed in v5.22, we likely won't fix this, but as per
http://markmail.org/message/kgbo6rasx4c7b3zw
this is being marked stalled, and a blocker for 5.22

--
Karl Williamson

p5pRT commented Mar 10, 2014

@khwilliamson - Status changed from 'open' to 'stalled'

p5pRT commented Jul 8, 2016

From @dcollinsn

On Mon Mar 10 14:05:55 2014, khw wrote:

> Since the 'use encoding' feature is scheduled to be removed in v5.22,
> we likely won't fix this, but as per
> http://markmail.org/message/kgbo6rasx4c7b3zw
> this is being marked stalled, and a blocker for 5.22

That link is, sadly, now stale.

As 5.22 has come and gone, and "encoding" is deprecated, what is the plan for tickets like this?

--
Dan Collins

p5pRT commented Jul 8, 2016

The RT System itself - Status changed from 'stalled' to 'open'

p5pRT commented Jul 8, 2016

From @cpansprout

On Thu Jul 07 17:49:07 2016, dcollinsn@gmail.com wrote:

> On Mon Mar 10 14:05:55 2014, khw wrote:
>
> > Since the 'use encoding' feature is scheduled to be removed in v5.22,
> > we likely won't fix this, but as per
> > http://markmail.org/message/kgbo6rasx4c7b3zw
> > this is being marked stalled, and a blocker for 5.22
>
> That link is, sadly, now stale.
>
> As 5.22 has come and gone, and "encoding" is deprecated, what is the
> plan for tickets like this?

I am hoping to remove the functionality that encoding.pm wraps some time soon. At that point, the tickets related to it will be rejected.

--

Father Chrysostomos

p5pRT commented Jul 15, 2016

From @cpansprout

The encoding.pm functionality was removed in the branch merged as a9cb10c.

--

Father Chrysostomos

p5pRT commented Jul 15, 2016

@cpansprout - Status changed from 'open' to 'rejected'
