PerlIO::encoding produces malformed utf8 #12493

Open
p5pRT opened this issue Oct 14, 2012 · 5 comments
Comments

p5pRT commented Oct 14, 2012

Migrated from rt.perl.org#115262 (status was 'open')

Searchable as RT115262$

p5pRT commented Oct 14, 2012

From @cpansprout

PerlIO::encoding passes invalid strings to encoding implementations.


Flags:
  category=core
  severity=low


Site configuration information for perl 5.17.5:

Configured by sprout at Sat Sep 22 18:51:23 PDT 2012.

Summary of my perl5 (revision 5 version 17 subversion 5) configuration:
  Snapshot of: 451f421
  Platform:
    osname=darwin, osvers=10.5.0, archname=darwin-2level
    uname='darwin pint.local 10.5.0 darwin kernel version 10.5.0: fri nov 5 23:20:39 pdt 2010; root:xnu-1504.9.17~1release_i386 i386 '
    config_args='-de -Dusedevel -DDEBUGGING'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-common -DPERL_DARWIN -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
    optimize='-O3 -g',
    cppflags='-fno-common -DPERL_DARWIN -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.2.1 (Apple Inc. build 5664)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-ldbm -ldl -lm -lutil -lc
    perllibs=-ldl -lm -lutil -lc
    libc=, so=dylib, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib -fstack-protector'

Locally applied patches:



@INC for perl 5.17.5:
  /usr/local/lib/perl5/site_perl/5.17.5/darwin-2level
  /usr/local/lib/perl5/site_perl/5.17.5
  /usr/local/lib/perl5/5.17.5/darwin-2level
  /usr/local/lib/perl5/5.17.5
  /usr/local/lib/perl5/site_perl
  .


Environment for perl 5.17.5:
  DYLD_LIBRARY_PATH (unset)
  HOME=/Users/sprout
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/local/bin
  PERL_BADLANG (unset)
  SHELL=/bin/bash

p5pRT commented Oct 14, 2012

From @cpansprout

On Sun Oct 14 14:49:40 2012, sprout wrote:

> PerlIO::encoding passes invalid strings to encoding implementations.

A local mail server seemed to think this was spam and refused to send
the message until I had deleted the body. Here it is in full​:

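use Encode::Encoding;
package footf8 {
    # Minimal pure-Perl encoding, registered under the name 'foo-tf8'.
    @ISA = Encode::Encoding;
    __PACKAGE__->Define('foo-tf8');
    sub encode($$;$) {
        my ($self, $buf, $chk) = @_;
        use Devel::Peek;
        Dump $buf;              # show the scalar exactly as PerlIO::encoding passed it
        undef $_[1] if $chk;    # report the whole input as consumed
        utf8::encode $buf;
        $buf
    }
}
open $fh, ">encoding(foo-tf8)", \$s;
print $fh "a"x1023 . chr 256;   # 1023 one-byte characters, then one two-byte character
__END__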

That script dumps two malformed scalars, because the output is split in
the middle of chr 256.
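The arithmetic behind the split can be checked without PerlIO::encoding at all. The following is only a sketch: the 1024-byte buffer size in `$bufsiz` is an assumption inferred from the 1023 + 1 layout of the test string, not something stated above, and the variable names are illustrative.

```perl
use strict;
use warnings;

# The string from the report: 1023 ASCII characters followed by U+0100.
my $str = ("a" x 1023) . chr(256);

# Perl's internal UTF-8 form of that string, as raw bytes.
my $bytes = $str;
utf8::encode($bytes);    # U+0100 becomes the two bytes C4 80

# Assumed buffer size (hypothetical, inferred from the test case).
my $bufsiz = 1024;

my $first  = substr($bytes, 0, $bufsiz);   # 1023 x "a" plus the lead byte C4
my $second = substr($bytes, $bufsiz);      # only the continuation byte 80

# utf8::decode returns false when its argument is not well-formed UTF-8.
my $first_ok  = utf8::decode(my $copy1 = $first)  ? 1 : 0;
my $second_ok = utf8::decode(my $copy2 = $second) ? 1 : 0;

printf "total bytes: %d\n", length $bytes;
printf "first chunk ends with %02X, well-formed: %d\n",
    ord(substr($first, -1)), $first_ok;
printf "second chunk is %02X, well-formed: %d\n",
    ord($second), $second_ok;
```

Under that assumption neither chunk is well-formed on its own, which matches the two malformed scalars that Dump shows.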

Encode​::CN​::HZ actually expects this and uses some arcane Perl code
(which looks straightforward, but you have to know internals to
understand it) to work around it.

Other pure-Perl encoding implementations included with Encode.pm don’t work:

open $fh, ">encoding(utf-7)", \$s;
print $fh "a"x1023 . chr 256;
__END__

That produces malformed UTF8 messages.

PerlIO::encoding should be caching the partial characters instead of
passing them to Perl code.
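To make the suggestion concrete, here is a rough sketch of what "caching the partial characters" could look like, written in Perl for readability. It is not the PerlIO::encoding implementation (which is XS code); `split_complete` is a hypothetical helper, and it assumes standard UTF-8 sequences of at most four bytes rather than Perl's extended form.

```perl
use strict;
use warnings;

# Hypothetical helper: split a buffer of UTF-8 bytes into the longest prefix
# that ends on a character boundary, plus the incomplete tail to hold back.
sub split_complete {
    my ($buf) = @_;
    my $len = length $buf;

    # Step back over at most three trailing continuation bytes (10xxxxxx).
    my ($i, $back) = ($len, 0);
    while ($i > 0 && $back < 3
           && (ord(substr($buf, $i - 1, 1)) & 0xC0) == 0x80) {
        $i--;
        $back++;
    }
    return ($buf, '') if $i == 0;    # empty or no lead byte: do not guess

    # Work out how many bytes the last lead byte announces.
    my $lead = ord(substr($buf, $i - 1, 1));
    my $need = $lead >= 0xF0 ? 4 : $lead >= 0xE0 ? 3 : $lead >= 0xC0 ? 2 : 1;
    my $have = $len - ($i - 1);

    return ($buf, '') if $have >= $need;    # last character is complete
    return (substr($buf, 0, $i - 1), substr($buf, $i - 1));
}

# Usage with the bytes from the test case, split at an assumed 1024 bytes.
my $bytes = ("a" x 1023) . "\xC4\x80";
my ($head, $tail) = split_complete(substr($bytes, 0, 1024));
printf "pass on %d bytes, hold back %d\n", length $head, length $tail;

# On the next flush the cached tail is prepended to the new data.
my ($head2, $tail2) = split_complete($tail . substr($bytes, 1024));
printf "pass on %d bytes, hold back %d\n", length $head2, length $tail2;
```

With a helper like this, the encoding's encode method would only ever see whole characters, at the cost of keeping a few bytes of state between flushes.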


--

Father Chrysostomos

p5pRT commented Oct 14, 2012

@cpansprout - Status changed from 'new' to 'open'

p5pRT commented Jun 22, 2013

From @Leont

On Sun, Oct 14, 2012 at 11:50 PM, Father Chrysostomos via RT
<perlbug-comment@perl.org> wrote:

> use Encode::Encoding;
> package footf8 {
>     @ISA = Encode::Encoding;
>     __PACKAGE__->Define('foo-tf8');
>     sub encode($$;$) {
>         my ($self, $buf, $chk) = @_;
>         use Devel::Peek;
>         Dump $buf;
>         undef $_[1] if $chk;
>         utf8::encode $buf;
>         $buf
>     }
> }
> open $fh, ">encoding(foo-tf8)", \$s;
> print $fh "a"x1023 . chr 256;
> __END__
>
> That script dumps two malformed scalars, because the output is split in
> the middle of chr 256.
>
> Encode::CN::HZ actually expects this and uses some arcane Perl code
> (which looks straightforward, but you have to know internals to
> understand it) to work around it.
>
> Other pure-Perl encoding implementations included with Encode.pm don’t work:
>
> open $fh, ">encoding(utf-7)", \$s;
> print $fh "a"x1023 . chr 256;
> __END__
>
> That produces malformed UTF8 messages.
>
> PerlIO::encoding should be caching the partial characters instead of
> passing them to Perl code.

Yeah, this is the general design of the system. PerlIO doesn't do
characters, it does bytes. While you're right it could emulate
character semantics in Write(), it wouldn't be able to do the same in
Read() in variable-length encodings anyway, so the point is a bit
moot.
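The read-side problem can be illustrated with a small sketch. It reads an encoded byte stream in fixed-size chunks, the way a byte-oriented layer would; the choice of UTF-16LE, the 4-byte chunk size and the sample text are all assumptions made for the example, not details from this thread.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "a", U+1F600 and "b": in UTF-16 the middle character needs a surrogate
# pair, so the three characters take 2 + 4 + 2 = 8 bytes.
my $text  = "a\x{1F600}b";
my $bytes = encode('UTF-16LE', $text);

# Read the encoded bytes back in fixed 4-byte chunks from an in-memory file.
open my $in, '<:raw', \$bytes or die "cannot open in-memory handle: $!";
my @chunks;
while (read($in, my $chunk, 4)) {
    push @chunks, $chunk;
}
close $in;

# The last 16-bit unit of the first chunk is a high surrogate: the chunk
# boundary fell inside a character.
my $last_unit = unpack('v', substr($chunks[0], 2, 2));
printf "chunks: %d, first chunk ends with code unit %04X\n",
    scalar @chunks, $last_unit;
```

Nothing in the byte count tells the reader where a character ends; that knowledge lives in the encoding itself, which is why a generic layer cannot avoid handing partial characters to the decoder.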

Leon
