Fallback (replacement char) with encode() vs binmode/open #8950

Closed
p5pRT opened this issue Jun 27, 2007 · 4 comments


p5pRT commented Jun 27, 2007

Migrated from rt.perl.org#43390 (status was 'open')

Searchable as RT43390$


p5pRT commented Jun 27, 2007

From nospam-abuse@bloodgate.com

Created by nospam-abuse@bloodgate.com

Moin,

consider the three programs below. They take a UTF-8 text containing
Russian characters and convert it to ISO-8859-1, which cannot express
these characters. This means they should get replaced by '?'. However:

* it doesn't seem possible to say which replacement char is used; '?'
  seems to be the only possible choice and the default
* only the encode() variant actually works; the other two warn and
  do not produce the right output

I believe this is a bug. All three programs should work like encode.pl:

  # perl encode.pl
  ???????
  # perl binmode.pl
  "\x{0420}" does not map to iso-8859-1 at binmode.pl line 12.
  "\x{0443}" does not map to iso-8859-1 at binmode.pl line 12.
  "\x{0441}" does not map to iso-8859-1 at binmode.pl line 12.
  "\x{0441}" does not map to iso-8859-1 at binmode.pl line 12.
  "\x{043a}" does not map to iso-8859-1 at binmode.pl line 12.
  "\x{0438}" does not map to iso-8859-1 at binmode.pl line 12.
  "\x{0439}" does not map to iso-8859-1 at binmode.pl line 12.
  \x{0420}\x{0443}\x{0441}\x{0441}\x{043a}\x{0438}\x{0439}
  # perl open.pl
  "\x{0420}" does not map to iso-8859-1.
  "\x{0443}" does not map to iso-8859-1.
  "\x{0441}" does not map to iso-8859-1.
  "\x{0441}" does not map to iso-8859-1.
  "\x{043a}" does not map to iso-8859-1.
  "\x{0438}" does not map to iso-8859-1.
  "\x{0439}" does not map to iso-8859-1.
  # cat test.txt
  \x{0420}\x{0443}\x{0441}\x{0441}\x{043a}\x{0438}\x{0439}

encode.pl:

  #####################################################################
  #!/usr/bin/perl -w
  use Encode qw/encode/;
  use utf8;
  no warnings 'utf8';
  my $russki = 'Русский';
  my $enc = ':encoding(iso-8859-1)';
  my $encoded = encode('iso-8859-1', $russki);
  binmode (STDOUT, $enc) or
  die ("Cannot do binmode(STDOUT,$enc): $!");
  print $encoded,"\n";
  #####################################################################

binmode.pl:

  #####################################################################
  use Encode qw/encode/;
  use utf8;
  no warnings 'utf8';
  my $russki = 'Русский';
  my $enc = ':encoding(iso-8859-1)';
  binmode (STDOUT, $enc) or
  die ("Cannot do binmode(STDOUT,$enc): $!");
  # print the unencoded string; the layer on STDOUT does the conversion
  print $russki,"\n";
  #####################################################################

open.pl:

  #####################################################################
  use Encode qw/encode/;
  use utf8;
  no warnings 'utf8';
  my $russki = 'Русский';
  my $enc = ':encoding(iso-8859-1)';
  open my $FILE, ">$enc", 'test.txt' or
  die ("Cannot open test.txt with $enc: $!");
  print $FILE $russki,"\n";
  #####################################################################
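
A possible workaround in the meantime, sketched under the assumption that
'?' substitution is what is wanted: encode explicitly with encode() and
write the resulting bytes through a :raw handle, so that no :encoding layer
gets a chance to apply its own fallback. The helper name write_latin1_file
is made up for this example.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Hypothetical helper: encode $text to ISO-8859-1 ourselves and write the
# bytes unchanged. encode() with its default CHECK substitutes '?' for
# characters that ISO-8859-1 cannot express.
sub write_latin1_file {
    my ($path, $text) = @_;
    my $bytes = encode('iso-8859-1', $text);
    open my $fh, '>:raw', $path or die "Cannot open $path: $!";
    print {$fh} $bytes;
    close $fh or die "Cannot close $path: $!";
    return;
}

write_latin1_file('test.txt', "Русский\n");
```

Unlike encode.pl, this does not push an :encoding layer on top of the
already encoded bytes, so the data is converted exactly once.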

All the best,

Tels

Perl Info
---
Flags:
    category=core
    severity=medium
---
This perlbug was built using Perl v5.8.8 - Sat Apr 22 23:31:53 UTC 2006
It is being executed now by  Perl v5.8.8 - Sat Apr 22 23:26:49 UTC 2006.

Site configuration information for perl v5.8.8:

Configured by abuild at Sat Apr 22 23:26:49 UTC 2006.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.16, archname=x86_64-linux-thread-multi
    uname='linux dvorak 2.6.16 #1 smp mon apr 10 04:51:13 utc 2006 x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true -Doptimize=-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -g -Wall -pipe'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=define uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -Wdeclaration-after-statement -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -g -Wall -pipe',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -Wdeclaration-after-statement'
    ccversion='', gccversion='4.1.0 (SUSE Linux)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/libc-2.4.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.4'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.8/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64'

Locally applied patches:
    

---
@INC for perl v5.8.8:
    /usr/lib/perl5/5.8.8/x86_64-linux-thread-multi
    /usr/lib/perl5/5.8.8
    /usr/lib/perl5/site_perl/5.8.8/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.8
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.8
    /usr/lib/perl5/vendor_perl
    .

---
Environment for perl v5.8.8:
    HOME=/home/te
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/home/te/.local/bin:/home/te/games/dirkdashing:/home/te/games/dirkdashing:/home/te/games/dirkdashing
    PERL_BADLANG (unset)
    SHELL=/bin/bash

- -- 
 Signed on Wed Jun 27 07:48:26 2007 with key 0x93B84C15.
 View my photo gallery: http://bloodgate.com/photos
 PGP key on http://bloodgate.com/tels.asc or per email.

 I am "Times Person of the Year 2006" 



p5pRT commented Jun 27, 2007

From @Juerd

Tels wrote 2007-06-26 22:51 (-0700):

> * it doesn't seem possible to say which replacement char is used, '?'
>   seems the only possible choice and default

It's possible, but not entirely obvious​:

  juerd@nano:~$ perl -MEncode -le'print encode latin1 => "foo\x{0420}bar\x{e9}", sub { "%" }' | hexdump -C
  00000000 66 6f 6f 25 62 61 72 e9 0a |foo%bar..|
  00000009
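
For comparison, here is a minimal sketch of how the third (CHECK) argument
of encode() changes what is substituted for an unmappable character. The
helper names to_latin1 and hex_bytes are made up for this illustration; the
Encode::FB_* constants and the code-reference form come from the Encode
documentation.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a copy of $string to ISO-8859-1, passing
# $check (a bitmask or a code reference) as encode()'s CHECK argument.
sub to_latin1 {
    my ($string, $check) = @_;
    return encode('iso-8859-1', $string, $check);
}

# Hypothetical helper: render the encoded bytes as hex for inspection.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

my $text = "foo\x{0420}bar\x{e9}";    # U+0420 has no ISO-8859-1 equivalent

my @variants = (
    [ 'FB_DEFAULT',  Encode::FB_DEFAULT()  ],    # substitution character
    [ 'FB_PERLQQ',   Encode::FB_PERLQQ()   ],    # \x{HHHH} escape
    [ 'FB_HTMLCREF', Encode::FB_HTMLCREF() ],    # &#NNNN; reference
    # A code reference receives the code point and returns the replacement.
    [ 'coderef',     sub { sprintf '<U+%04X>', shift } ],
);

for my $variant (@variants) {
    my ($name, $check) = @$variant;
    printf "%-12s %s\n", $name, hex_bytes(to_latin1($text, $check));
}
```

The code-reference form is the only one of these that allows an arbitrary
replacement string; the constants each select a fixed style.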

> * only the encode() variant actually works, the other two warn and
>   do not produce the right output

They seem to be using FB_PERLQQ for CHECK. The documentation says:

  When the layer is pushed, the current value of
  $PerlIO::encoding::fallback is saved and used as the CHECK argument
  when calling the Encode methods encode() and decode().

This doesn't seem to work.
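
The documented mechanism quoted above could be exercised roughly as follows.
This is only a sketch of what the PerlIO::encoding documentation describes,
not a claim about how perl 5.8.8 behaves. encode_via_layer is a hypothetical
helper. The sketch assumes that PerlIO::encoding must be loaded before the
variable is set, so that loading it later does not reset the value, and it
ORs in Encode::STOP_AT_PARTIAL because the module's documented default
includes that bit.

```perl
use strict;
use warnings;
use utf8;
use Encode ();
use PerlIO::encoding;    # load first, so our fallback value is not reset

# Hypothetical helper: write $text through an :encoding(iso-8859-1) layer
# onto an in-memory handle, with $check in effect as the layer's CHECK
# value, and return the bytes that were produced.
sub encode_via_layer {
    my ($text, $check) = @_;
    # The layer copies this value at the moment it is pushed (here: open).
    local $PerlIO::encoding::fallback = $check | Encode::STOP_AT_PARTIAL();
    open my $fh, '>:encoding(iso-8859-1)', \my $buf
        or die "Cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh or die "Cannot close in-memory handle: $!";
    return $buf;
}

my $russki = 'Русский';
print encode_via_layer($russki, Encode::FB_DEFAULT()), "\n";
print encode_via_layer($russki, Encode::HTMLCREF()),   "\n";
```

Because the value is captured when the layer is pushed, changing the
variable after open() or binmode() would not be expected to affect a handle
that already exists.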
--
kind regards,

  juerd waalboer: perl hacker <juerd@juerd.nl> <http://juerd.nl/sig>
  convolution: ict solutions and consultancy <sales@convolution.nl>


p5pRT commented Jun 27, 2007

The RT System itself - Status changed from 'new' to 'open'

@khwilliamson

These now print \x{...} for all characters not representable in 8859-1
