printing $! when open.pm sets utf8 default on filehandles yields garbage #12035

Closed
p5pRT opened this issue Apr 3, 2012 · 18 comments
Labels
Unicode and System Calls Bad interactions of syscalls and UTF-8

Comments

@p5pRT

p5pRT commented Apr 3, 2012

Migrated from rt.perl.org#112208 (status was 'resolved')

Searchable as RT112208$

@p5pRT

p5pRT commented Apr 3, 2012

From doherty@cpan.org

Created by doherty@cpan.org

If your locale is set to something like ru_RU.UTF-8, then the following
program will output garbage:

use strict;
use open qw(:std :encoding(UTF-8);
use IO::Socket;

unless (IO::Socket::INET->new("localhost:1111")) {
  print $!, "\n";
}
__END__

Or another example: perl -CS -MErrno -le '$!=Errno::ETIMEDOUT; print $!'

This was originally reported as a bug against the utf8::all module,
which uses open.pm to set default PerlIO layers for the caller:
doherty/utf8-all#9

Perl Info

Flags:
    category=library
    severity=medium
    module=open

Site configuration information for perl 5.14.2:

Configured by mike at Fri Sep 30 15:36:02 ADT 2011.

Summary of my perl5 (revision 5 version 14 subversion 2) configuration:
  Platform:
    osname=linux, osvers=2.6.32-34-generic, archname=x86_64-linux
    uname='linux charron 2.6.32-34-generic #77-ubuntu smp tue sep 13 19:39:17 utc 2011 x86_64 gnulinux '
    config_args='-de -Dprefix=/home/mike/perl5/perlbrew/perls/perl-5.14.2'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.4.3', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/../lib /usr/lib/../lib /lib /usr/lib /usr/lib/x86_64-linux-gnu /lib64 /usr/lib64
    libs=-lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.11.1.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.11.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

Locally applied patches:


@INC for perl 5.14.2:
    /home/mike/perl5/perlbrew/perls/perl-5.14.2/lib/site_perl/5.14.2/x86_64-linux
    /home/mike/perl5/perlbrew/perls/perl-5.14.2/lib/site_perl/5.14.2
    /home/mike/perl5/perlbrew/perls/perl-5.14.2/lib/5.14.2/x86_64-linux
    /home/mike/perl5/perlbrew/perls/perl-5.14.2/lib/5.14.2
    .


Environment for perl 5.14.2:
    HOME=/home/mike
    LANG=en_CA.UTF-8
    LANGUAGE=en_CA:en
    LD_LIBRARY_PATH=/usr/lib/oracle/11.2/client64/lib
    LOGDIR (unset)
    PATH=/home/mike/.bin:/home/mike/perl5/perlbrew/bin:/home/mike/perl5/perlbrew/perls/perl-5.14.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/mike/Downloads/android-sdk-linux_x86/tools:/home/mike/Downloads/android-sdk-linux_x86/platform-tools:/usr/lib/oracle/11.2/client64/bin
    PERLBREW_BASHRC_VERSION=0.42
    PERLBREW_HOME=/home/mike/.perlbrew
    PERLBREW_MANPATH=/home/mike/perl5/perlbrew/perls/perl-5.14.2/man
    PERLBREW_PATH=/home/mike/perl5/perlbrew/bin:/home/mike/perl5/perlbrew/perls/perl-5.14.2/bin
    PERLBREW_PERL=perl-5.14.2
    PERLBREW_ROOT=/home/mike/perl5/perlbrew
    PERLBREW_VERSION=0.42
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT

p5pRT commented Apr 3, 2012

From @jkeenan

On Mon Apr 02 19:36:07 2012, doherty@cpan.org wrote:

use open qw(:std :encoding(UTF-8);

Looks like it's missing a closing parenthesis:

use open qw(:std :encoding(UTF-8));

@p5pRT

p5pRT commented Apr 3, 2012

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Apr 3, 2012

From @cpansprout

On Mon Apr 02 19:36:07 2012, doherty@cpan.org wrote:

This is a bug report for perl from doherty@cpan.org,
generated with the help of perlbug 1.39 running under perl 5.14.2.

-----------------------------------------------------------------
[Please describe your issue here]

If your locale is set to something like ru_RU.UTF-8, then the following
program will output garbage:

use strict;
use open qw(:std :encoding(UTF-8);
use IO::Socket;

unless (IO::Socket::INET->new("localhost:1111")) {
  print $!, "\n";
}
__END__

Or another example: perl -CS -MErrno -le '$!=Errno::ETIMEDOUT; print $!'

This was originally reported as a bug against the utf8::all module,
which uses open.pm to set default PerlIO layers for the caller:
doherty/utf8-all#9

There has been some discussion about making syscalls work properly with
Unicode, under a pragma. This seems like something that should be taken
into account, too, so I’m linking this to the meta ticket (#105914).

--

Father Chrysostomos

@p5pRT

p5pRT commented Apr 3, 2012

From @Leont

On Tue Apr 03 13:00:37 2012, sprout wrote:

There has been some discussion about making syscalls work properly with
Unicode, under a pragma. This seems like something that should be taken
into account, too, so I’m linking this to the meta ticket (#105914).

The problem is not in any syscall but in locale handling. The problem is
that strerror returns something appropriate for the current locale, but
perl always assumes it is in Latin-1.

Leon
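
The mechanism described above can be reproduced without any particular locale installed. The following is a minimal sketch, not perl's internal code path: the byte string is hard-coded to stand in for what strerror(3) would return in a UTF-8 locale, and the variable names ($raw, $garbled, $decoded, $correct) are illustrative assumptions.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Stand-in for the bytes strerror(3) returns in a UTF-8 locale
# (hard-coded so the sketch does not depend on installed locales).
my $raw = "Aucun fichier ou r\xC3\xA9pertoire de ce type";

# Undecoded bytes going through an :encoding(UTF-8) output layer:
# every byte is taken as a Latin-1 character and encoded a second time.
my $garbled = encode('UTF-8', $raw);

# Decoding first gives a character string, so the layer encodes only once.
my $decoded = decode('UTF-8', $raw);
my $correct = encode('UTF-8', $decoded);
```

Here $garbled holds the doubly encoded bytes that show up as mojibake on a UTF-8 terminal, while $correct is byte-for-byte identical to $raw.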

@p5pRT

p5pRT commented Apr 3, 2012

From @cpansprout

On Tue Apr 03 13:50:43 2012, LeonT wrote:

On Tue Apr 03 13:00:37 2012, sprout wrote:

There has been some discussion about making syscalls work properly with
Unicode, under a pragma. This seems like something that should be taken
into account, too, so I’m linking this to the meta ticket (#105914).

The problem is not in any syscall but in locale handling. The problem is
that strerror returns something appropriate for the current locale, but
perl always assumes it is in Latin-1.

I was just wondering whether it could be made part of the same pragma.
After all, it’s conceivably the same sort of thing: the byte sequence
coming from the OS is not Latin-1.

--

Father Chrysostomos

@p5pRT

p5pRT commented Apr 1, 2013

From @jmdh

Created by @jmdh

As reported in Debian's bugtracker at
<http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=409704>

Quoting from Joey on the bug report:

"
joey@kodama:~>LANG=fr_FR.UTF-8 perl -e 'use open qw{:utf8 :std};open(foo) || print STDERR "error: $!\n";'
error: Aucun fichier ou répertoire de ce type
                         ^^
This mojibake comes about because $! is a UTF-8 string in that locale, but it
is not decoded into perl's internal utf8 representation.

It's possible to work around the problem with the encoding pragma, but
not completely:

joey@kodama:~>LANG=fr_FR.UTF-8 perl -e 'use open qw{:utf8 :std}; use encoding 'utf8';open(foo) || print STDERR "error: $!\n";'
error: Aucun fichier ou répertoire de ce type

joey@kodama:~>LANG=fr_FR.UTF-8 perl -e 'use open qw{:utf8 :std}; use encoding 'utf8';open(foo) || print STDERR "error: ",$!,"\n";'
error: Aucun fichier ou répertoire de ce type

The first example works because the encoding pragma converts the string
to utf8 during concatenation, but the second example shows that this is
not a solution because concatenation can't be relied on for all output.

The only solution if you want to use open qw{:utf8 :std} in a program
seems to be manually using Encode::decode_utf8 on every instance of $!
and $@ in the program. Which is exactly the kind of error-prone busywork
that IO layers and perl's unicode model are supposed to avoid."

The test in question no longer works on my Debian system, because
the French error message no longer contains an accent, but the same
behaviour can be reproduced using e.g. ja_JP.UTF-8, and also on 5.17.10.
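
The manual workaround mentioned in the quoted report (decoding every use of $! by hand) could be wrapped in one place. This is a minimal sketch under stated assumptions: errno_text is a hypothetical helper name, not part of perl or any module, and it assumes the locale's messages are UTF-8, falling back to the raw bytes when they are not.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return the message for an errno value as a
# character string suitable for a filehandle with a UTF-8 layer.
sub errno_text {
    my ($errno) = @_;
    local $! = $errno;          # restored when the sub returns
    my $msg = "$!";             # stringify: the strerror(3) text

    # If perl has already decoded the message, leave it alone.
    return $msg if utf8::is_utf8($msg);

    # Assumption: the bytes are UTF-8. Croak-on-error inside eval lets us
    # fall back to the undecoded bytes for non-UTF-8 locales.
    my $decoded = eval {
        Encode::decode('UTF-8', $msg,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return defined $decoded ? $decoded : $msg;
}
```

A call such as print STDERR "error: ", errno_text($! + 0), "\n"; would then replace the bare use of $!. The utf8::is_utf8 check is there so the helper does not decode twice on a perl where $! already carries the UTF-8 flag.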

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.17.10:

Configured by dom at Sun Mar 31 23:53:26 BST 2013.

Summary of my perl5 (revision 5 version 17 subversion 10) configuration:
   
  Platform:
    osname=linux, osvers=3.2.0-4-686-pae, archname=i686-linux
    uname='linux callisto 3.2.0-4-686-pae #1 smp debian 3.2.39-2 i686 gnulinux '
    config_args='-de -Dprefix=/home/dom/perl5/perlbrew/perls/perl-5.17.10 -Dusedevel'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.7.2', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/i386-linux-gnu /lib/../lib /usr/lib/i386-linux-gnu /usr/lib/../lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.13'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

Locally applied patches:
    


@INC for perl 5.17.10:
    /home/dom/perl5/perlbrew/perls/perl-5.17.10/lib/site_perl/5.17.10/i686-linux
    /home/dom/perl5/perlbrew/perls/perl-5.17.10/lib/site_perl/5.17.10
    /home/dom/perl5/perlbrew/perls/perl-5.17.10/lib/5.17.10/i686-linux
    /home/dom/perl5/perlbrew/perls/perl-5.17.10/lib/5.17.10
    .


Environment for perl 5.17.10:
    HOME=/home/dom
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/dom/perl5/perlbrew/bin:/home/dom/perl5/perlbrew/perls/perl-5.17.10/bin:~/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
    PERLBREW_BASHRC_VERSION=0.43
    PERLBREW_HOME=/home/dom/.perlbrew
    PERLBREW_MANPATH=/home/dom/perl5/perlbrew/perls/perl-5.17.10/man
    PERLBREW_PATH=/home/dom/perl5/perlbrew/bin:/home/dom/perl5/perlbrew/perls/perl-5.17.10/bin
    PERLBREW_PERL=perl-5.17.10
    PERLBREW_ROOT=/home/dom/perl5/perlbrew
    PERLBREW_VERSION=0.43
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT

p5pRT commented Apr 1, 2013

From @Leont

On Mon Apr 01 10:42:44 2013, dom wrote:

As reported in Debian's bugtracker at
<http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=409704>

Quoting from Joey on the bug report:

"
joey@kodama:~>LANG=fr_FR.UTF-8 perl -e 'use open qw{:utf8 :std};open(foo) || print STDERR "error: $!\n";'
error: Aucun fichier ou répertoire de ce type
                         ^^
This mojibake comes about because $! is a UTF-8 string in that locale, but it
is not decoded into perl's internal utf8 representation.

It's possible to work around the problem with the encoding pragma, but
not completely:

joey@kodama:~>LANG=fr_FR.UTF-8 perl -e 'use open qw{:utf8 :std}; use encoding 'utf8';open(foo) || print STDERR "error: $!\n";'
error: Aucun fichier ou répertoire de ce type

joey@kodama:~>LANG=fr_FR.UTF-8 perl -e 'use open qw{:utf8 :std}; use encoding 'utf8';open(foo) || print STDERR "error: ",$!,"\n";'
error: Aucun fichier ou répertoire de ce type

The first example works because the encoding pragma converts the string
to utf8 during concatenation, but the second example shows that this is
not a solution because concatenation can't be relied on for all output.

The only solution if you want to use open qw{:utf8 :std} in a program
seems to be manually using Encode::decode_utf8 on every instance of $!
and $@ in the program. Which is exactly the kind of error-prone busywork
that IO layers and perl's unicode model are supposed to avoid."

The test in question no longer works on my Debian system, because
the French error message no longer contains an accent, but the same
behaviour can be reproduced using e.g. ja_JP.UTF-8, and also on 5.17.10.

This bug is a duplicate of #112208. The return value of strerror(3) is
not properly decoded in $!'s magic. I did write a proof-of-concept fix,
utf8::errno
(https://github.com/Leont/utf8-errno/blob/master/lib/utf8/errno.xs), but
a real solution would probably be different.

Leon

@p5pRT

p5pRT commented Apr 1, 2013

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Apr 1, 2013

From @jmdh

On Mon, Apr 01, 2013 at 10:54:48AM -0700, Leon Timmermans via RT wrote:

This bug is a duplicate of #112208. The return value of strerror(3) is
not properly decoded in $!'s magic. I did write a proof-of-concept fix,
utf8::errno
(https://github.com/Leont/utf8-errno/blob/master/lib/utf8/errno.xs), but
a real solution would probably be different.

Aha! Thanks, merged.

Dominic.

--
Dominic Hargreaves | http://www.larted.org.uk/~dom/
PGP key 5178E2A5 from the.earth.li (keyserver,web,email)

@p5pRT

p5pRT commented Jul 6, 2013

From @khwilliamson

Fixed by commit 1500bd9
--
Karl Williamson

@p5pRT

p5pRT commented Jul 6, 2013

@khwilliamson - Status changed from 'open' to 'resolved'

@p5pRT

p5pRT commented Apr 1, 2014

From @khwilliamson

The fix for this had to be reverted for v5.20, because it caused problems for other modules. There is a new plan to fix this for v5.22, and I'm adding this ticket to the blockers for 5.21.1.

--
Karl Williamson

@p5pRT

p5pRT commented Apr 1, 2014

@khwilliamson - Status changed from 'resolved' to 'open'

@p5pRT

p5pRT commented Jun 6, 2014

From @khwilliamson

This is now fixed again in blead via commit 2c6ee1a and its predecessor.
--
Karl Williamson

@p5pRT

p5pRT commented Jun 6, 2014

@khwilliamson - Status changed from 'open' to 'pending release'

@p5pRT

p5pRT commented Jun 2, 2015

From @khwilliamson

Thanks for submitting this ticket.

The issue should be resolved with the release today of Perl v5.22. If you find that the problem persists, feel free to reopen this ticket.

--
Karl Williamson for the Perl 5 porters team

@p5pRT

p5pRT commented Jun 2, 2015

@khwilliamson - Status changed from 'pending release' to 'resolved'

@p5pRT p5pRT closed this as completed Jun 2, 2015
@p5pRT p5pRT added the Unicode and System Calls Bad interactions of syscalls and UTF-8 label Nov 15, 2019