syswrite layer forgets to encode #9182

Closed
p5pRT opened this issue Jan 9, 2008 · 10 comments

p5pRT commented Jan 9, 2008

Migrated from rt.perl.org#49548 (status was 'open')

Searchable as RT49548

p5pRT commented Jan 9, 2008

From markov@earth.overmeer.net

Created by markov@earth.overmeer.net

Running this simple script on Perl 5.8.* and 5.10.0:

  my $x = "\x{fc}n";
  open OUT, '>:encoding(latin1)', '/tmp/z' or die $!;
  syswrite OUT, $x, 2;
  close OUT;

produces an output file of three bytes of UTF-8, although $x is not UTF-8.
Inside the syswrite(), the utf8 layer gets enabled but the encoding
layer apparently does not.
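
A minimal sketch for checking which bytes a given perl actually writes, assuming File::Temp supplies a scratch file in place of /tmp/z. On perls affected by this report the file would hold the UTF-8 bytes c3 bc 6e rather than the latin1 bytes fc 6e; on perl 5.30 and later, syswrite on a handle carrying a :utf8 layer (including the one :encoding implies) dies, so the sketch traps that with eval:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $x = "\x{fc}n";                   # u-umlaut followed by "n"; not UTF-8 flagged
my ($tmpfh, $path) = tempfile();     # hypothetical stand-in for /tmp/z
close $tmpfh;

open my $out, '>:encoding(latin1)', $path or die "open: $!";
# On perl 5.30 and later this syswrite dies; on the perls in this report it writes
my $ok  = eval { syswrite $out, $x, 2; 1 };
my $err = $@;
close $out;

if ($ok) {
    # Read the file back as raw bytes and show them in hex
    open my $in, '<:raw', $path or die "open: $!";
    my $data = do { local $/; <$in> };
    close $in;
    print "bytes on disk: ", unpack('H*', $data), "\n";   # fc6e would be correct latin1
} else {
    print "syswrite refused: $err";
}
```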

So, perldoc -f is correct in stating:
  The ":encoding(...)" layer implicitly introduces the ":utf8" layer.
But that is only half of the expected process.

As a question aside: $x is already in latin1. Is there no optimization
to avoid latin1 -> utf8 -> latin1 translations?

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.10.0:

Configured by markov at Tue Jan  1 23:12:26 CET 2008.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.13-15.18-bigsmp, archname=i686-linux
    uname='linux earth 2.6.13-15.18-bigsmp #1 smp tue oct 2 17:36:20 utc 2007 i686 i686 i386 gnulinux '
    config_args='-de'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.0.2 20050901 (prerelease) (SUSE Linux)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.5.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.5'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib'

Locally applied patches:
    


@INC for perl 5.10.0:
    ../LogReport/lib
    /usr/local/lib/perl5/5.10.0/i686-linux
    /usr/local/lib/perl5/5.10.0
    /usr/local/lib/perl5/site_perl/5.10.0/i686-linux
    /usr/local/lib/perl5/site_perl/5.10.0
    .


Environment for perl 5.10.0:
    HOME=/home/markov
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/markov/shared/bin:~/shared/bin:/home/markov/shared/bin:~/shared/bin:/home/markov/shared/bin:~/shared/bin:/home/markov/shared/bin:/home/markov/bin:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:./bin:./bin:./bin:./bin
    PERL5LIB=../LogReport/lib
    PERL_BADLANG (unset)
    SHELL=/bin/bash

p5pRT commented Oct 10, 2009

From @obra

> Running this simple script on Perl 5.8.* and 5.10.0:
>
>   my $x = "\x{fc}n";
>   open OUT, '>:encoding(latin1)', '/tmp/z' or die $!;
>   syswrite OUT, $x, 2;
>   close OUT;
>
> produces an output file of three bytes of UTF-8, although $x is not UTF-8.
> Inside the syswrite(), the utf8 layer gets enabled but the encoding
> layer apparently does not.

Just to verify, this is still the case in 5.11.0 and 5.10.1. Could you
do me the favor of working up a patch that adds a test to the encoding
tests for perl?

Thanks,
Jesse

p5pRT commented Oct 10, 2009

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Jun 8, 2014

From @jkeenan

On Wed Jan 09 00:52:43 2008, markov@earth.overmeer.net wrote:

> This is a bug report for perl from markov@earth.overmeer.net,
> generated with the help of perlbug 1.36 running under perl 5.10.0.
>
> -----------------------------------------------------------------
> [Please enter your report here]
>
> Running this simple script on Perl 5.8.* and 5.10.0:
>
>   my $x = "\x{fc}n";
>   open OUT, '>:encoding(latin1)', '/tmp/z' or die $!;
>   syswrite OUT, $x, 2;
>   close OUT;
>
> produces an output file of three bytes of UTF-8, although $x is not UTF-8.
> Inside the syswrite(), the utf8 layer gets enabled but the encoding
> layer apparently does not.
>
> So, perldoc -f is correct in stating:
>   The ":encoding(...)" layer implicitly introduces the ":utf8" layer.
> But that is only half of the expected process.
>
> As a question aside: $x is already in latin1. Is there no optimization
> to avoid latin1 -> utf8 -> latin1 translations?

I reviewed this older ticket today. The then pumpking requested a test case for 5.11, but none was submitted.

Is there anything here we still need to be concerned about? If not, then I would recommend the ticket be closed.

Thank you very much.
Jim Keenan

p5pRT commented Jun 8, 2014

From @Leont

On Sun, Jun 8, 2014 at 3:15 PM, James E Keenan via RT <perlbug-followup@perl.org> wrote:

>> Running this simple script on Perl 5.8.* and 5.10.0:
>>
>>   my $x = "\x{fc}n";
>>   open OUT, '>:encoding(latin1)', '/tmp/z' or die $!;
>>   syswrite OUT, $x, 2;
>>   close OUT;
>>
>> produces an output file of three bytes of UTF-8, although $x is not UTF-8.
>> Inside the syswrite(), the utf8 layer gets enabled but the encoding
>> layer apparently does not.
>>
>> So, perldoc -f is correct in stating:
>>   The ":encoding(...)" layer implicitly introduces the ":utf8" layer.
>> But that is only half of the expected process.
>>
>> As a question aside: $x is already in latin1. Is there no optimization
>> to avoid latin1 -> utf8 -> latin1 translations?
>
> I reviewed this older ticket today. The then pumpking requested a test
> case for 5.11, but none was submitted.
>
> Is there anything here we still need to be concerned about? If not, then
> I would recommend the ticket be closed.

This is a known issue for sysread, but I think it's not documented for
syswrite. They're both rather broken on UTF8 filehandles. This is
definitely a bug.

Leon

p5pRT commented Jun 8, 2014

From @jhi

On Sunday-201406-08, 10:12, Leon Timmermans wrote:

> This is a known issue for sysread, but I think it's not documented for
> syswrite. They're both rather broken on UTF8 filehandles. This is
> definitely a bug.

Unless/until we have a (planned?) fix, it'd be more honest for them to
panic with UTF-8 filehandles, instead of producing garbage.

p5pRT commented Jun 8, 2014

From @Leont

On Sun, Jun 8, 2014 at 4:16 PM, Jarkko Hietaniemi <jhi@iki.fi> wrote:

> On Sunday-201406-08, 10:12, Leon Timmermans wrote:
>
>> This is a known issue for sysread, but I think it's not documented for
>> syswrite. They're both rather broken on UTF8 filehandles. This is
>> definitely a bug.
>
> Unless/until we have a (planned?) fix, it'd be more honest for them to
> panic with UTF-8 filehandles, instead of producing garbage.

I wouldn't mind that, but I'm sure some more bugwards-compatibility
oriented people would complain.

Leon

p5pRT commented Jun 8, 2014

From @karenetheridge

On Sun, Jun 08, 2014 at 04:33:11PM +0200, Leon Timmermans wrote:

>> Unless/until we have a (planned?) fix, it'd be more honest for them to
>> panic with UTF-8 filehandles, instead of producing garbage.
>
> I wouldn't mind that, but I'm sure some more bugwards-compatibility
> oriented people would complain.

How difficult would it be to implement this, and then smoke CPAN to
gauge the effects?

Or, we could JFDI now that we're early in the next dev cycle, so we can
revert it if it proves problematic.

p5pRT commented Jun 8, 2014

From @Leont

On Sun, Jun 8, 2014 at 7:19 PM, Karen Etheridge <perl@froods.org> wrote:

> On Sun, Jun 08, 2014 at 04:33:11PM +0200, Leon Timmermans wrote:
>
>>> Unless/until we have a (planned?) fix, it'd be more honest for them to
>>> panic with UTF-8 filehandles, instead of producing garbage.
>>
>> I wouldn't mind that, but I'm sure some more bugwards-compatibility
>> oriented people would complain.
>
> How difficult would it be to implement this, and then smoke CPAN to
> gauge the effects?

Implementing it should be fairly trivial (especially compared to recreating
the current behavior).

Leon

Leont commented Mar 4, 2020

sysread and syswrite are no longer allowed on :utf8 handles, so this ticket can be closed.
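
With syswrite refusing such handles, the latin1 output the report expected can still be produced by doing the character-to-byte step by hand and writing plain bytes to a handle with no encoding layer. A minimal sketch, assuming Encode's encode() for the conversion and a File::Temp scratch file in place of /tmp/z:

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

my $x     = "\x{fc}n";
my $bytes = encode('iso-8859-1', $x);   # explicit character -> latin1 byte conversion

my ($out, $path) = tempfile();          # hypothetical stand-in for /tmp/z
binmode $out, ':raw';                   # no :utf8/:encoding layer, so syswrite is allowed
my $written = syswrite $out, $bytes;
defined $written or die "syswrite: $!";
close $out or die "close: $!";

printf "%d bytes written to %s\n", $written, $path;
```

Because the handle carries no encoding layer, syswrite sees an ordinary byte string, and the file should contain exactly the two latin1 bytes fc 6e.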

@Leont Leont closed this as completed Mar 4, 2020