warnings FATAL => utf8 not working on PerlIO::encoding layer and open pragma #11972

Open
p5pRT opened this issue Feb 25, 2012 · 3 comments



p5pRT commented Feb 25, 2012

Migrated from rt.perl.org#111344 (status was 'open')

Searchable as RT111344$


p5pRT commented Feb 25, 2012

From @daxim

Created by @daxim

Consider the following programs, which should all behave the same way. The file
"broken-utf8" contains only an incomplete UTF-8 sequence, e.g. try with
any of the octets \xc0 or \xc3 or \xc9.

1;perl -Mwarnings=FATAL,utf8 -CD -E'
  open my $fh, "<", "broken-utf8"; my $foo = <$fh>; say "survived"'
2;perl -Mwarnings=FATAL,utf8 -E'
  open my $fh, "<:encoding(UTF-8)", "broken-utf8"; my $foo = <$fh>;
  say "survived"'
3;perl -Mwarnings=FATAL,utf8 -M'open=:encoding(UTF-8)' -E'
  open my $fh, "<", "broken-utf8"; my $foo = <$fh>; say "survived"'
4;perl -Mwarnings=FATAL,utf8 -MEncode=decode -E'
  open my $fh, "<", "broken-utf8"; my $foo = decode "UTF-8", <$fh>,
  Encode::FB_CROAK; say "survived"'

The problem is that programs 2 and 3 survive; I expect them to
throw an exception.
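A minimal sketch of the behavior being asked for, using program 4's explicit-decode approach instead of an I/O layer: read the file as raw octets and decode it with Encode::FB_CROAK inside eval, so malformed input always raises a catchable exception. read_utf8_strict is a hypothetical helper name (not part of any module), and the sketch assumes the whole file fits in memory.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Hypothetical helper (not part of any module): slurp a file as raw
# octets and decode it strictly, so malformed UTF-8 always dies.
sub read_utf8_strict {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $octets = do { local $/; <$fh> };
    close $fh;
    # FB_CROAK (DIE_ON_ERR) makes decode() croak on the first bad sequence.
    return decode('UTF-8', $octets, Encode::FB_CROAK());
}

# Recreate the "broken-utf8" test file: a lone \xc3 lead byte.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "\xc3";
close $tmp;

my $text  = eval { read_utf8_strict($path) };
my $error = $@;
print(defined $text ? "survived\n" : "died: $error");
```

Because this bypasses the :encoding layer entirely, the outcome does not depend on how the layer interacts with the utf8 warnings category.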

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.14.2:

Configured by daxim at Wed Feb 15 10:59:16 CET 2012.

Summary of my perl5 (revision 5 version 14 subversion 2) configuration:
   
  Platform:
    osname=linux, osvers=3.1.9-1.4-desktop, archname=x86_64-linux-thread-multi
    uname='linux blackhorse.site 3.1.9-1.4-desktop #1 smp preempt fri jan 27 08:55:10 utc 2012 (efb5ff4) x86_64 x86_64 x86_64 gnulinux '
    config_args='-de -Dprefix=/home/daxim/local/share/perlbrew/perls/perl-5.14.2-threads-debug -DDEBUGGING -Dusethreads'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.6.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/../lib64 /usr/lib/../lib64 /lib /usr/lib /lib64 /usr/lib64 /usr/local/lib64
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.14.1.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.14.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'

Locally applied patches:
    


@INC for perl 5.14.2:
    /home/daxim/local/share/perlbrew/perls/perl-5.14.2-threads-debug/lib/site_perl/5.14.2/x86_64-linux-thread-multi
    /home/daxim/local/share/perlbrew/perls/perl-5.14.2-threads-debug/lib/site_perl/5.14.2
    /home/daxim/local/share/perlbrew/perls/perl-5.14.2-threads-debug/lib/5.14.2/x86_64-linux-thread-multi
    /home/daxim/local/share/perlbrew/perls/perl-5.14.2-threads-debug/lib/5.14.2
    .


Environment for perl 5.14.2:
    HOME=/home/daxim
    LANG=de_DE.UTF-8
    LANGUAGE=
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/daxim/local/share/perlbrew/bin:/home/daxim/local/share/perlbrew/perls/perl-5.14.2-threads-debug/bin:/home/daxim/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin:/sbin:/usr/sbin:/sbin:/usr/sbin
    PERLBREW_BASHRC_VERSION=0.41
    PERLBREW_HOME=/home/daxim/.perlbrew
    PERLBREW_MANPATH=/home/daxim/local/share/perlbrew/perls/perl-5.14.2-threads-debug/man
    PERLBREW_PATH=/home/daxim/local/share/perlbrew/bin:/home/daxim/local/share/perlbrew/perls/perl-5.14.2-threads-debug/bin
    PERLBREW_PERL=perl-5.14.2-threads-debug
    PERLBREW_ROOT=/home/daxim/local/share/perlbrew
    PERLBREW_VERSION=0.36
    PERL_BADLANG (unset)
    SHELL=/bin/bash


p5pRT commented Feb 27, 2012

From tchrist@perl.com

Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯 <perlbug-followup@perl.org> wrote
  on Sat, 25 Feb 2012 15:48:10 PST:

> Consider the following programs which should do the same. The file
> "broken-utf8" contains only an incomplete UTF-8 sequence, e.g. try with
> any of the octets \xc0 or \xc3 or \xc9.
>
> 1;perl -Mwarnings=FATAL,utf8 -CD -E'
> open my $fh, "<", "broken-utf8"; my $foo = <$fh>; say "survived"'
> 2;perl -Mwarnings=FATAL,utf8 -E'
> open my $fh, "<:encoding(UTF-8)", "broken-utf8"; my $foo = <$fh>;
> say "survived"'
> 3;perl -Mwarnings=FATAL,utf8 -M'open=:encoding(UTF-8)' -E'
> open my $fh, "<", "broken-utf8"; my $foo = <$fh>; say "survived"'
> 4;perl -Mwarnings=FATAL,utf8 -MEncode=decode -E'
> open my $fh, "<", "broken-utf8"; my $foo = decode "UTF-8", <$fh>,
> Encode::FB_CROAK; say "survived"'
>
> The problem is that the programs 2 and 3 do survive, I expect them to
> throw an exception.

You're going to really hate this, but I'm afraid you've sent the bug to the
"wrong" place: the Encode module is not maintained as part of the Perl core.
(Yes, this is one of my perennial kvetchings.)

Encode does not get along well with our warnings categories. That's why I
tell people to use the first version (-CD), not programs 2 and 3: you can
depend on what the core is doing. You can't depend on a third-party module
that we have no ability to patch or fix, and Encode doesn't respect our utf8
warnings subcategories.

The problem with the whole I/O layer approach is that you cannot specify what
to do with non-core 3rd-party layers, nor can you reliably predict what
they'll do. It all needs to be carefully redesigned, which must be done in
coordination with someone who does not even read this mailing list.

I suspect we are going to have to come up with adverbial modifiers on I/O layers.

  $ perl -E 'binmode(STDOUT, "encoding(MacRoman)")|| die; say "\x{3b1}"; say "DONE"'
  "\x{03b1}" does not map to MacRoman at -e line 1.
  \x{03b1}
  DONE

  $ perl -Mwarnings=FATAL,utf8 -E 'binmode(STDOUT, "encoding(MacRoman)")|| die; say "\x{3b1}"; say "done"'
  "\x{03b1}" does not map to MacRoman at -e line 1.
  Exit 255

Notice that by default it does a PERLQQ rewrite of anything that doesn't fit.
How do you control that on a stream? You can't. The encode/decode arguments
cannot be passed into a layer's implicit transcoding.
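For contrast, the CHECK argument that a layer cannot take is easy to pass when calling Encode directly. A minimal sketch, assuming only the Encode module that ships with perl: the same unmappable character from the transcripts, encoded to MacRoman once with FB_PERLQQ and once with FB_CROAK.

```perl
use strict;
use warnings;
use Encode ();

# U+03B1 GREEK SMALL LETTER ALPHA has no MacRoman equivalent.
my $alpha = "\x{3b1}";

# FB_PERLQQ (PERLQQ | LEAVE_SRC): replace the character with a \x{....} escape.
my $qq = Encode::encode('MacRoman', $alpha, Encode::FB_PERLQQ());

# FB_CROAK (DIE_ON_ERR): the same conversion dies instead; trap it with eval.
my $ok  = eval { Encode::encode('MacRoman', $alpha, Encode::FB_CROAK()); 1 };
my $err = $@;

print "PERLQQ: $qq\n";
print 'CROAK:  ', ($ok ? "no error\n" : "died: $err");
```

The PERLQQ result is the same \x{03b1} escape that appears in the first transcript.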

Maybe we need :fatal or :perlqq pseudolayers, except that I want the
default behavior to be :fatal. I don't think the default should be anything
but raising an exception.

I would say that you want people to be able to get at the full contingent
of Encode possibilities via their I/O layers:

                       FB_DEFAULT FB_CROAK FB_QUIET FB_WARN FB_PERLQQ
  DIE_ON_ERR    0x0001            X
  WARN_ON_ERR   0x0002                              X
  RETURN_ON_ERR 0x0004                     X        X
  LEAVE_SRC     0x0008                                      X
  PERLQQ        0x0100                                      X
  HTMLCREF      0x0200
  XMLCREF       0x0400
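Each FB_* column in the table is simply a bitwise OR of the flag rows marked under it, which can be checked from Perl itself. A minimal sketch, assuming these constants are callable as Encode::NAME() functions in the installed Encode:

```perl
use strict;
use warnings;
use Encode ();

# The composite fallback modes (the table's columns).
my %mode = (
    FB_DEFAULT => Encode::FB_DEFAULT(),
    FB_CROAK   => Encode::FB_CROAK(),
    FB_QUIET   => Encode::FB_QUIET(),
    FB_WARN    => Encode::FB_WARN(),
    FB_PERLQQ  => Encode::FB_PERLQQ(),
);

# The flag rows each mode combines, as marked in the table.
my %flags = (
    FB_DEFAULT => 0,
    FB_CROAK   => Encode::DIE_ON_ERR(),
    FB_QUIET   => Encode::RETURN_ON_ERR(),
    FB_WARN    => Encode::RETURN_ON_ERR() | Encode::WARN_ON_ERR(),
    FB_PERLQQ  => Encode::PERLQQ() | Encode::LEAVE_SRC(),
);

for my $name (sort keys %mode) {
    printf "%-10s 0x%04x %s\n", $name, $mode{$name},
        $mode{$name} == $flags{$name} ? 'matches table' : 'DIFFERS';
}
```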

But that would force the core to deal with all that silliness.
So the core should simply croak on a problem, and you should have to
disable that explicitly somehow. The problem is that we do not normally
have "warnings that are fatalized by default but which can be turned off".

The whole situation with encodings and warnings and errors and silence
and scribbled alternates does not really fit into the existing system
of warnings vs exceptions per perldiag.

  (W) A warning (optional).
  (D) A deprecation (enabled by default).
  (S) A severe warning (enabled by default).
  (F) A fatal error (trappable).
  (P) An internal error you should never see (trappable).
  (X) A very fatal error (nontrappable).
  (A) An alien error message (not generated by Perl).

Too much is of type W and shouldn't be. And it certainly doesn't
play well with I/O layers.

Even the compiler puts up with bad UTF-8. It's all over the place.

--tom


p5pRT commented Feb 27, 2012

The RT System itself - Status changed from 'new' to 'open'
