Document deprecation of sysread on :utf8 handles #16544

Closed
p5pRT opened this issue May 2, 2018 · 20 comments


p5pRT commented May 2, 2018

Migrated from rt.perl.org#133170 (status was 'rejected')

Searchable as RT133170$


p5pRT commented May 2, 2018

From @jimav

This is a bug report for perl from jim.avera@gmail.com,
generated with the help of perlbug 1.40 running under perl 5.26.1.


`perldoc -f sysread` says that using :utf8 handles is perfectly okay:

  Note that if the filehandle has been marked as ":utf8", Unicode
  characters are read instead of bytes (the LENGTH, OFFSET, and the
  return value of "sysread" are in Unicode characters). The
  ":encoding(...)" layer implicitly introduces the ":utf8" layer.
  See "binmode", "open", and the open pragma.

However, doing so provokes this warning at run time:

  sysread() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30

Suggest changing the documentation to say that this feature is deprecated,
so people don't waste time writing code which will become wrong later.



Flags:
  category=core
  severity=low


Site configuration information for perl 5.26.1:

Configured by Ubuntu at Sat Mar 10 18:40:42 UTC 2018.

Summary of my perl5 (revision 5 version 26 subversion 1) configuration:

  Platform:
  osname=linux
  osvers=4.9.0
  archname=x86_64-linux-gnu-thread-multi
  uname='linux localhost 4.9.0 #1 smp debian 4.9.0 x86_64 gnulinux '
  config_args='-Dusethreads -Duselargefiles -Dcc=x86_64-linux-gnu-gcc -Dcpp=x86_64-linux-gnu-cpp -Dld=x86_64-linux-gnu-gcc -Dccflags=-DDEBIAN -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fdebug-prefix-map=/build/perl-5CtO_8/perl-5.26.1=. -fstack-protector-strong -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions -Wl,-z,relro -Dlddlflags=-shared -Wl,-Bsymbolic-functions -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.26 -Darchlib=/usr/lib/x86_64-linux-gnu/perl/5.26 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/x86_64-linux-gnu/perl5/5.26 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.26.1 -Dsitearch=/usr/local/lib/x86_64-linux-gnu/perl/5.26.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint
-Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Ui_xlocale -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -dEs -Duseshrplib -Dlibperl=libperl.so.5.26.1'
  hint=recommended
  useposix=true
  d_sigaction=define
  useithreads=define
  usemultiplicity=define
  use64bitint=define
  use64bitall=define
  uselongdouble=undef
  usemymalloc=n
  default_inc_excludes_dot=define
  bincompat5005=undef
  Compiler:
  cc='x86_64-linux-gnu-gcc'
  ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
  optimize='-O2 -g'
  cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include'
  ccversion=''
  gccversion='7.3.0'
  gccosandvers=''
  intsize=4
  longsize=8
  ptrsize=8
  doublesize=8
  byteorder=12345678
  doublekind=3
  d_longlong=define
  longlongsize=8
  d_longdbl=define
  longdblsize=16
  longdblkind=3
  ivtype='long'
  ivsize=8
  nvtype='double'
  nvsize=8
  Off_t='off_t'
  lseeksize=8
  alignbytes=8
  prototype=define
  Linker and Libraries:
  ld='x86_64-linux-gnu-gcc'
  ldflags =' -fstack-protector-strong -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
  libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
  perllibs=-ldl -lm -lpthread -lc -lcrypt
  libc=libc-2.27.so
  so=so
  useshrplib=true
  libperl=libperl.so.5.26
  gnulibc_version='2.27'
  Dynamic Linking:
  dlsrc=dl_dlopen.xs
  dlext=so
  d_dlsymun=undef
  ccdlflags='-Wl,-E'
  cccdlflags='-fPIC'
  lddlflags='-shared -L/usr/local/lib -fstack-protector-strong'

Locally applied patches:
  DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
  DEBPKG:debian/db_file_ver - https://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
  DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
  DEBPKG:debian/enc2xs_inc - https://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
  DEBPKG:debian/errno_ver - https://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
  DEBPKG:debian/libperl_embed_doc - https://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
  DEBPKG:fixes/respect_umask - Respect umask during installation
  DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
  DEBPKG:debian/extutils_set_libperl_path - EU:MM: set location of libperl.a under /usr/lib
  DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
  DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
  DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
  DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
  DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
  DEBPKG:debian/perlivp - https://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
  DEBPKG:debian/deprecate-with-apt - https://bugs.debian.org/747628 Point users to Debian packages of deprecated core modules
  DEBPKG:debian/squelch-locale-warnings - https://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
  DEBPKG:debian/patchlevel - https://bugs.debian.org/567489 List packaged patches for 5.26.1-6 in patchlevel.h
  DEBPKG:fixes/document_makemaker_ccflags - https://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
  DEBPKG:debian/find_html2text - https://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
  DEBPKG:debian/perl5db-x-terminal-emulator.patch - https://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl
  DEBPKG:debian/cpan-missing-site-dirs - https://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
  DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] https://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected
  DEBPKG:debian/makemaker-pasthru - https://bugs.debian.org/758471 Pass LD settings through to subdirectories
  DEBPKG:debian/makemaker-manext - https://bugs.debian.org/247370 Make EU::MakeMaker honour MANnEXT settings in generated manpage headers
  DEBPKG:debian/kfreebsd-softupdates - https://bugs.debian.org/796798 Work around Debian Bug#796798
  DEBPKG:fixes/autodie-scope - https://bugs.debian.org/798096 Fix a scoping issue with "no autodie" and the "system" sub
  DEBPKG:fixes/memoize-pod - [rt.cpan.org #89441] Fix POD errors in Memoize
  DEBPKG:debian/hurd-softupdates - https://bugs.debian.org/822735 Fix t/op/stat.t failures on hurd
  DEBPKG:fixes/math_complex_doc_great_circle - https://bugs.debian.org/697567 [rt.cpan.org #114104] Math::Trig: clarify definition of great_circle_midpoint
  DEBPKG:fixes/math_complex_doc_see_also - https://bugs.debian.org/697568 [rt.cpan.org #114105] Math::Trig: add missing SEE ALSO
  DEBPKG:fixes/math_complex_doc_angle_units - https://bugs.debian.org/731505 [rt.cpan.org #114106] Math::Trig: document angle units
  DEBPKG:fixes/cpan_web_link - https://bugs.debian.org/367291 CPAN: Add link to main CPAN web site
  DEBPKG:fixes/time_piece_doc - https://bugs.debian.org/817925 Time::Piece: Improve documentation for add_months and add_years
  DEBPKG:fixes/extutils_makemaker_reproducible - https://bugs.debian.org/835815 https://bugs.debian.org/834190 Make perllocal.pod files reproducible
  DEBPKG:fixes/file_path_hurd_errno - File-Path: Fix test failure in Hurd due to hard-coded ENOENT
  DEBPKG:debian/hppa_op_optimize_workaround - https://bugs.debian.org/838613 Temporarily lower the optimization of op.c on hppa due to gcc-6 problems
  DEBPKG:debian/installman-utf8 - https://bugs.debian.org/840211 Generate man pages with UTF-8 characters
  DEBPKG:fixes/file_path_chmod_race - https://bugs.debian.org/863870 [rt.cpan.org #121951] Prevent directory chmod race attack.
  DEBPKG:fixes/extutils_file_path_compat - Correct the order of tests of chmod(). (#294)
  DEBPKG:fixes/getopt-long-2 - [rt.cpan.org #120300] Withdraw part of commit 5d9947fb445327c7299d8beb009d609bc70066c0, which tries to implement more GNU getopt_long campatibility. GNU
  DEBPKG:fixes/getopt-long-3 - provide a default value for optional arguments
  DEBPKG:fixes/getopt-long-4 - https://bugs.debian.org/864544 [rt.cpan.org #122068] Fix issue #122068.
  DEBPKG:fixes/test-builder-reset - https://bugs.debian.org/865894 Reset inside subtest maintains parent
  DEBPKG:debian/hppa_opmini_optimize_workaround - https://bugs.debian.org/869122 Lower the optimization level of opmini.c on hppa
  DEBPKG:debian/sh4_op_optimize_workaround - https://bugs.debian.org/869373 Also lower the optimization level of op.c and opmini.c on sh4
  DEBPKG:fixes/json-pp-example - [rt.cpan.org #92793] https://bugs.debian.org/871837 fix RT-92793: bug in SYNOPSIS
  DEBPKG:debian/perldoc-pager - https://bugs.debian.org/870340 [rt.cpan.org #120229] Fix perldoc terminal escapes when sensible-pager is less
  DEBPKG:debian/prune_libs - https://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
  DEBPKG:debian/configure-regen - https://bugs.debian.org/762638 Regenerate Configure et al. after probe unit changes
  DEBPKG:fixes/rename-filexp.U-phase1 - regen-configure: rename filexp.U to filexp_path.U, phase 1
  DEBPKG:fixes/rename-filexp.U-phase2 - regen-configure: rename filexp.U to filexp_path.U, phase 2
  DEBPKG:fixes/packaging_test_skips - Skip various tests if PERL_BUILD_PACKAGING is set
  DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
  DEBPKG:fixes/encode-alias-regexp - https://bugs.debian.org/880085 fix dankogai/p5-encode#127
  DEBPKG:fixes/regex-memory-leak - [910a6a8] https://bugs.debian.org/891196 [perl #132892] perl #132892: avoid leak by mortalizing temporary copy of pattern
  DEBPKG:fixes/CVE-2018-6797 - [perl #132227] (perl #132227) restart a node if we change to uni rules within the node and encounter a sharp S
  DEBPKG:fixes/CVE-2018-6798/pt1 - [perl #132063] Heap buffer overflow
  DEBPKG:fixes/CVE-2018-6798/pt2 - [perl #132063] 5.26.1: fix TRIE_READ_CHAR and DECL_TRIE_TYPE to account for non-utf8 target
  DEBPKG:fixes/CVE-2018-6798/pt3 - [perl #132063] (perl #132063) we should no longer warn for this code
  DEBPKG:fixes/CVE-2018-6798/pt4 - [perl #132063] utf8.c: Don't dump malformation past first NUL
  DEBPKG:fixes/CVE-2018-6913 - [perl #131844] (perl #131844) fix various space calculation issues in pp_pack.c


@INC for perl 5.26.1:
  /home/jima/lib/perl
  /home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi
  /home/jima/perl5/lib/perl5/5.26.1/x86_64-linux-gnu-thread-multi
  /home/jima/perl5/lib/perl5/5.26.1
  /home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi
  /home/jima/perl5/lib/perl5
  /etc/perl
  /usr/local/lib/x86_64-linux-gnu/perl/5.26.1
  /usr/local/share/perl/5.26.1
  /usr/lib/x86_64-linux-gnu/perl5/5.26
  /usr/share/perl5
  /usr/lib/x86_64-linux-gnu/perl/5.26
  /usr/share/perl/5.26
  /home/jima/perl5/lib/perl5/5.26.0
  /home/jima/perl5/lib/perl5/5.26.0/x86_64-linux-gnu-thread-multi
  /usr/local/lib/site_perl
  /usr/lib/x86_64-linux-gnu/perl-base


Environment for perl 5.26.1:
  HOME=/home/jima
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LC_COLLATE=C
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/home/jima/.local/bin:/home/jima/perl5/bin:/bin:/home/jima/bin:/home/jima/jima_tools/x86_64/bin:/home/jima/jima_tools/bin:/usr/bin:/usr/sbin:/sbin:/usr/bin/X11:/usr/local/bin:/usr/local/sbin:/usr/games:/usr/local/games:/snap/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/db/bin:/usr/lib/jvm/java-8-oracle/jre/bin:.
  PERL5LIB=/home/jima/lib/perl:/home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi:/home/jima/perl5/lib/perl5
  PERL_BADLANG (unset)
  PERL_LOCAL_LIB_ROOT=/home/jima/perl5
  PERL_MB_OPT=--install_base /home/jima/perl5
  PERL_MM_OPT=INSTALL_BASE=/home/jima/perl5
  SHELL=/bin/bash


p5pRT commented May 2, 2018

From @Leont

On Wed, May 2, 2018 at 9:39 PM, Jim Avera (via RT)
<perlbug-followup@perl.org> wrote:

> `perldoc -f sysread` says that using :utf8 handles is perfectly okay:
>
>   Note that if the filehandle has been marked as ":utf8", Unicode
>   characters are read instead of bytes (the LENGTH, OFFSET, and the
>   return value of "sysread" are in Unicode characters). The
>   ":encoding(...)" layer implicitly introduces the ":utf8" layer.
>   See "binmode", "open", and the open pragma.
>
> However, doing so provokes this warning at run time:
>
>   sysread() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30
>
> Suggest changing the documentation to say that this feature is deprecated,
> so people don't waste time writing code which will become wrong later.

Indeed this should be modified.

Leon


p5pRT commented May 2, 2018

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented May 3, 2018

From @karenetheridge

IMO this should be considered a blocker for 5.28, as it is a documentation
issue for a change in this release.

On Wed, May 2, 2018 at 12:46 PM, Leon Timmermans <fawaka@gmail.com> wrote:

> On Wed, May 2, 2018 at 9:39 PM, Jim Avera (via RT)
> <perlbug-followup@perl.org> wrote:
>
>> `perldoc -f sysread` says that using :utf8 handles is perfectly okay:
>>
>>   Note that if the filehandle has been marked as ":utf8", Unicode
>>   characters are read instead of bytes (the LENGTH, OFFSET, and the
>>   return value of "sysread" are in Unicode characters). The
>>   ":encoding(...)" layer implicitly introduces the ":utf8" layer.
>>   See "binmode", "open", and the open pragma.
>>
>> However, doing so provokes this warning at run time:
>>
>>   sysread() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30
>>
>> Suggest changing the documentation to say that this feature is deprecated,
>> so people don't waste time writing code which will become wrong later.
>
> Indeed this should be modified.
>
> Leon


p5pRT commented May 3, 2018

From @tonycoz

On Wed, 02 May 2018 12:39:47 -0700, jim.avera@gmail.com wrote:

> `perldoc -f sysread` says that using :utf8 handles is perfectly okay:
>
>   Note that if the filehandle has been marked as ":utf8", Unicode
>   characters are read instead of bytes (the LENGTH, OFFSET, and the
>   return value of "sysread" are in Unicode characters). The
>   ":encoding(...)" layer implicitly introduces the ":utf8" layer.
>   See "binmode", "open", and the open pragma.
>
> However, doing so provokes this warning at run time:
>
>   sysread() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30
>
> Suggest changing the documentation to say that this feature is deprecated,
> so people don't waste time writing code which will become wrong later.

How about the attached?

Tony


p5pRT commented May 3, 2018

From @tonycoz

0001-perl-133170-document-deprecation-of-sysread-syswrite.patch
From d338352c918eae0919f56da288492ef9ac23f63a Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Thu, 3 May 2018 14:19:21 +1000
Subject: (perl #133170) document deprecation of sysread/syswrite/send/recv on
 :utf8

well, UTF8 flagged handles...
---
 pod/perlfunc.pod | 42 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index fa08d4c3e9..170ae4f4e0 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -6281,6 +6281,10 @@ string otherwise.  If there's an error, returns the undefined value.
 This call is actually implemented in terms of the L<recvfrom(2)> system call.
 See L<perlipc/"UDP: Message Passing"> for examples.
 
+Note that using C<recv> on a socket that has been marked as C<:utf8>
+is deprecated, and will result in an exception in future versions of
+perl.
+
 Note the I<characters>: depending on the status of the socket, either
 (8-bit) bytes or characters are received.  By default all sockets
 operate on bytes, but for example if the socket has been changed using
@@ -6288,7 +6292,9 @@ L<C<binmode>|/binmode FILEHANDLE, LAYER> to operate with the
 C<:encoding(UTF-8)> I/O layer (see the L<open> pragma), the I/O will
 operate on UTF8-encoded Unicode
 characters, not bytes.  Similarly for the C<:encoding> layer: in that
-case pretty much any characters can be read.
+case pretty much any characters can be read.  No validation is
+performed on the UTF-8, since any layers that perform such validation
+are bypassed by C<recv>.
 
 =item redo LABEL
 X<redo>
@@ -7080,6 +7086,10 @@ case it does a L<sendto(2)> syscall.  Returns the number of characters sent,
 or the undefined value on error.  The L<sendmsg(2)> syscall is currently
 unimplemented.  See L<perlipc/"UDP: Message Passing"> for examples.
 
+Note that using C<send> on a socket that has been marked as C<:utf8>
+is deprecated, and will result in an exception in future versions of
+perl.
+
 Note the I<characters>: depending on the status of the socket, either
 (8-bit) bytes or characters are sent.  By default all sockets operate
 on bytes, but for example if the socket has been changed using
@@ -8720,13 +8730,25 @@ L<C<eof>|/eof FILEHANDLE> doesn't work well on device files (like ttys)
 anyway.  Use L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET> and
 check for a return value for 0 to decide whether you're done.
 
-Note that if the filehandle has been marked as C<:utf8>, Unicode
-characters are read instead of bytes (the LENGTH, OFFSET, and the
-return value of L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET>
-are in Unicode characters).  The C<:encoding(...)> layer implicitly
-introduces the C<:utf8> layer.  See
-L<C<binmode>|/binmode FILEHANDLE, LAYER>,
-L<C<open>|/open FILEHANDLE,EXPR>, and the L<open> pragma.
+Note that using C<sysread> on a file that has been marked as C<:utf8>
+is deprecated, and will result in an exception in future versions of
+perl.
+
+If the filehandle has been marked as C<:utf8>, Unicode characters
+assumed to be UTF-8 encoded are read instead of bytes (the LENGTH,
+OFFSET, and the return value of L<C<sysread>|/sysread
+FILEHANDLE,SCALAR,LENGTH,OFFSET> are in Unicode characters).
+
+Note that UTF-8 encoded Unicode is read by C<sysread> even if the the
+C<:utf8> mark is introduced by a C<:encoding()> that isn't C<UTF-8>,
+nor is the UTF-8 validated.  Any other layers are also ignored, so if
+you've pushed layers to decompress your input and decode the result as
+C<UTF-16>, C<sysread> will treat your compressed UTF-16 data as
+C<UTF-8>.
+
+The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer.  See
+L<C<binmode>|/binmode FILEHANDLE, LAYER>, L<C<open>|/open
+FILEHANDLE,EXPR>, and the L<open> pragma.
 
 =item sysseek FILEHANDLE,POSITION,WHENCE
 X<sysseek> X<lseek>
@@ -8888,6 +8910,10 @@ B<WARNING>: If the filehandle is marked C<:utf8>, Unicode characters
 encoded in UTF-8 are written instead of bytes, and the LENGTH, OFFSET, and
 return value of L<C<syswrite>|/syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET>
 are in (UTF8-encoded Unicode) characters.
+
+C<syswrite> on a filehandle marked C<:utf8> is deprecated, and will
+raise an exception in a future version of perl.
+
 The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer.
 Alternately, if the handle is not marked with an encoding but you
 attempt to write characters with code points over 255, raises an exception.
-- 
2.11.0


p5pRT commented May 3, 2018

From @jimav

On 5/2/18 9:21 PM, Tony Cook via RT wrote
| How about the attached?

Hi Tony,

Is it specifically :utf8 which will not be allowed, i.e., other layers
might still be allowed on a sysread file handle in v5.30?  I didn't
understand the new text which discussed interactions between the :utf8
layer and other layers such as :utf16.

Does it all boil down to requiring that the file handle read raw binary
octets (e.g. after binmode($fh) is called)?   If so it might be better
to just say the file handle must be in :raw mode rather than mention any
_specific_ encoding such as utf8.

-Jim


p5pRT commented May 3, 2018

From @tonycoz

On Wed, 02 May 2018 23:40:58 -0700, jim.avera@gmail.com wrote:

> On 5/2/18 9:21 PM, Tony Cook via RT wrote
> | How about the attached?
>
> Hi Tony,
>
> Is it specifically :utf8 which will not be allowed, i.e., other layers
> might still be allowed on a sysread file handle in v5.30?  I didn't
> understand the new text which discussed interactions between the :utf8
> layer and other layers such as :utf16.
>
> Does it all boil down to requiring that the file handle read raw binary
> octets (e.g. after binmode($fh) is called)?   If so it might be better
> to just say the file handle must be in :raw mode rather than mention any
> _specific_ encoding such as utf8.

The problem isn't all layers.

The problem is specifically the way sysread etc handle layers that have the PERLIO_K_UTF8 flag set on them.

This includes the :utf8 layer (which is currently not a real layer) and :encoding() (as the sysread documentation mentions). A hypothetical :utf16 layer would also set it, assuming it's intended to decode UTF-16 characters into perl's internal extended UTF-8 so perl can deal with them as characters.

The underlying problem is that sysread() etc pay attention to only one part of the layer stack - whether that PERLIO_K_UTF8 flag is set, at which point they ignore the rest, slurp in the bytes and mark them as SVf_UTF8.

With non-PERLIO_K_UTF8 layers sysread etc completely ignore the layers - reading (or writing) bytes from/to the underlying stream.

Tony
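A quick way to see which handles Tony is describing: `PerlIO::get_layers` (built into perl) reports the UTF-8 flag as a pseudo-layer named `utf8`, so the hypothetical helper `handle_is_utf8` checks for the same condition that makes `sysread()` switch to characters. This is a sketch only; the empty in-memory handles are an assumption made to keep it self-contained. Note that `:encoding(UTF-16LE)` sets the flag as well, which is why the patch warns that `sysread()` would treat UTF-16 data as UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the handle carries the UTF-8 flag.
# PerlIO::get_layers() lists that flag as a pseudo-layer named "utf8"
# right after the layer it is set on.
sub handle_is_utf8 {
    my ($fh) = @_;
    return scalar grep { $_ eq 'utf8' } PerlIO::get_layers($fh);
}

my $data = '';    # an empty in-memory file is enough to inspect layers
open my $raw,   '<:raw',                \$data or die "open :raw: $!";
open my $utf8,  '<:utf8',               \$data or die "open :utf8: $!";
open my $utf16, '<:encoding(UTF-16LE)', \$data or die "open :encoding: $!";

for my $case ([raw => $raw], [utf8 => $utf8], ['encoding(UTF-16LE)' => $utf16]) {
    my ($name, $fh) = @$case;
    printf "%-20s %s\n", $name,
        handle_is_utf8($fh) ? 'UTF-8 flagged' : 'bytes';
}
```

Only the first handle is one that `sysread()` treats as plain bytes.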


p5pRT commented May 3, 2018

From @jimav

On 5/3/18 2:40 AM, Tony Cook via RT wrote:

> The underlying problem is that sysread() etc pay attention to only one part of the layer stack - whether that PERLIO_K_UTF8 flag is set, at which point they ignore the rest, slurp in the bytes and mark them as SVf_UTF8.
>
> With non-PERLIO_K_UTF8 layers sysread etc completely ignore the layers - reading (or writing) bytes from/to the underlying stream.

Hmm.  That's an unfortunate complexity involving perl's internal
character representation which users really shouldn't need to be aware
of.   I hope some solution can be found which doesn't _require_
documenting and user-understanding of this.

Is there any foreseeable path to making sysread() handle arbitrary
layers correctly, using buffering when data-transforming layers are
present but not otherwise?  What if sysread just called fh->read() in
those cases?

If buffering is used, then: If the underlying device is seekable,
left-over octets in the hidden buffer should be discarded and a seek
done so they will be re-read later; that would protect coherency if
other cooperating processes might randomly update the file.

If the underlying source is not seekable, then left-over octets would
have to stay in the hidden buffer, but that's okay because there is no
way for those bytes to mutate before they are called for by the
application.  Note that for a tty in canonical mode, the OS will only
return one line at a time at least on *nix.

Just some uninformed ideas...

-Jim


p5pRT commented May 3, 2018

From @jimav

On 5/3/18 4:40 PM, Jim Avera wrote:

> What if sysread just called fh->read() in those cases?

In essence, my proposal is to make sysread() a synonym for fh->read()
with the exception that if the underlying source is seekable, then any
left-over octets (not needed to satisfy LENGTH characters) would be
discarded after each call and a seek done to re-read them later; and,
that buffering will be entirely skipped if there is no data-transforming
layer on the file descriptor.

Happily, :encoding(utf8) is not data-transforming because that is perl's
internal representation so the octets can simply be put into the user's
buffer and the utf8 flag set.

Even transforming decoders might often avoid left-over octets (and thus
avoid the seek-back) by predicting the number of octets needed in common
cases. For example, a UTF-16 decoder could read LENGTH*2 octets and that
would suffice if the code points happened to be ASCII.  More
realistically, an ISO-8859-1 decoder could guess LENGTH*1 and often be
right.  In other words, seeking-back might not be a big performance hit
in practice.  And any really perf-sensitive app shouldn't be using
layers at all, but should sysread() a raw file handle and do its own
decoding.

-Jim
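Jim's closing suggestion, `sysread()` on a raw handle plus decoding in the application, is the approach that stays valid once the deprecation becomes fatal. A minimal sketch, assuming the data is supposed to be strict UTF-8: the hypothetical `sysread_utf8` helper reads fixed-size byte chunks and decodes them with `Encode::decode` and `Encode::FB_QUIET`, which returns the part that decoded cleanly and leaves any incomplete trailing sequence in the buffer for the next read. The temporary file exists only to make the example runnable.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempfile);

# Hypothetical helper: read everything from $fh with sysread() and
# decode it as strict UTF-8, without any :utf8 or :encoding layer.
sub sysread_utf8 {
    my ($fh, $chunk_size) = @_;
    binmode $fh;    # make sure the handle is not UTF-8 flagged
    my ($pending, $text) = ('', '');
    while (1) {
        # Append up to $chunk_size bytes after any leftover partial character.
        my $n = sysread($fh, $pending, $chunk_size, length $pending);
        die "sysread failed: $!" unless defined $n;
        last if $n == 0;    # end of file
        # FB_QUIET returns what decoded cleanly and leaves the rest
        # (e.g. half of a multi-byte character) in $pending.
        $text .= decode('UTF-8', $pending, Encode::FB_QUIET());
    }
    die "malformed or truncated UTF-8 at end of input\n" if length $pending;
    return $text;
}

my $expected = "h\x{e9}llo w\x{f6}rld \x{2713}";
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} encode('UTF-8', $expected);
close $out or die "close failed: $!";

open my $in, '<', $path or die "open $path: $!";
my $got = sysread_utf8($in, 3);    # 3-byte chunks split multi-byte characters
close $in;
print $got eq $expected ? "decoded OK\n" : "mismatch\n";
```

Invalid bytes also stop `FB_QUIET`. In this sketch they therefore remain in `$pending`, and the helper dies at end of file instead of returning partial text.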


p5pRT commented May 4, 2018

From @Leont

On Fri, May 4, 2018 at 1:40 AM, Jim Avera <jim.avera@gmail.com> wrote:

> Is there any foreseeable path to making sysread() handle arbitrary layers
> correctly, using buffering when data-transforming layers are present but not
> otherwise?

If you want that, why wouldn't you just use read?

Leon
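Here is Leon's alternative, sketched under the assumption that the input is strict UTF-8. `read()` goes through the full layer stack. On an `:encoding(UTF-8)` handle its LENGTH therefore counts decoded characters, and a multi-byte sequence is never split between calls. The in-memory handle stands in for a real file to keep the sketch self-contained.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text   = "\x{e9}t\x{e9} \x{2713}";    # 5 characters
my $octets = encode('UTF-8', $text);       # 9 bytes on "disk"

# An in-memory file stands in for a real one; the layer stack is what matters.
open my $fh, '<:encoding(UTF-8)', \$octets or die "open: $!";

my @chunks;
while (1) {
    my $n = read($fh, my $buf, 2);    # LENGTH is in characters, not bytes
    die "read failed: $!" unless defined $n;
    last if $n == 0;                  # end of file
    push @chunks, $buf;
}
close $fh;

printf "%d bytes -> %d chunks of at most 2 characters\n",
    length $octets, scalar @chunks;
```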


p5pRT commented May 4, 2018

From @jimav

On 5/3/18 5:05 PM, Leon Timmermans wrote:

> On Fri, May 4, 2018 at 1:40 AM, Jim Avera <jim.avera@gmail.com> wrote:
>
>> Is there any foreseeable path to making sysread() handle arbitrary layers
>> correctly, using buffering when data-transforming layers are present but not
>> otherwise?
>
> If you want that, why wouldn't you just use read?
>
> Leon

Yes, but I gather there is all this complexity (desired by someone) to
allow certain layers to work with sysread(). Personally I would be happy
if sysread simply disallowed any layers, i.e. required a raw file handle.

On the other hand, if the app wants Unicode characters, it is convenient
that perl's internal rep is utf8, so reading from a fh with
:encoding(utf8) should be possible with no actual extra overhead (just
setting the utf8 flag on the user's buffer). Disallowing that one case
seems strange from a user perspective.

-Ko,


p5pRT commented May 4, 2018

From @Grinnz

On Thu, May 3, 2018 at 8:14 PM, Jim Avera <jim.avera@gmail.com> wrote:

> On 5/3/18 5:05 PM, Leon Timmermans wrote:
>
>> On Fri, May 4, 2018 at 1:40 AM, Jim Avera <jim.avera@gmail.com> wrote:
>>
>>> Is there any foreseeable path to making sysread() handle arbitrary layers
>>> correctly, using buffering when data-transforming layers are present but
>>> not otherwise?
>>
>> If you want that, why wouldn't you just use read?
>>
>> Leon
>
> Yes, but I gather there is all this complexity (desired by someone) to
> allow certain layers to work with sysread(). Personally I would be happy if
> sysread simply disallowed any layers, i.e. required a raw file handle.
>
> On the other hand, if the app wants Unicode characters, it is convenient
> that perl's internal rep is utf8, so reading from a fh with :encoding(utf8)
> should be possible with no actual extra overhead (just setting the utf8
> flag on the user's buffer). Disallowing that one case seems strange from a
> user perspective.
>
> -Ko,

From a user perspective, the utf8 flag should be irrelevant, and the
non-strict :utf8 or :encoding(utf8) layers shouldn't be used.

-Dan


p5pRT commented May 6, 2018

From @tonycoz

On Thu, 03 May 2018 16:41:07 -0700, jim.avera@gmail.com wrote:

> On 5/3/18 2:40 AM, Tony Cook via RT wrote:
>
>> The underlying problem is that sysread() etc pay attention to only
>> one part of the layer stack - whether that PERLIO_K_UTF8 flag is set,
>> at which point they ignore the rest, slurp in the bytes and mark
>> them as SVf_UTF8.
>>
>> With non-PERLIO_K_UTF8 layers sysread etc completely ignore the
>> layers - reading (or writing) bytes from/to the underlying stream.
>
> Hmm.  That's an unfortunate complexity involving perl's internal
> character representation which users really shouldn't need to be aware
> of.   I hope some solution can be found which doesn't _require_
> documenting and user-understanding of this.
>
> Is there any foreseeable path to making sysread() handle arbitrary
> layers correctly, using buffering when data-transforming layers are
> present but not otherwise?  What if sysread just called fh->read() in
> those cases?

Well, that would completely change the behaviour of sysread() in the case of non-UTF-8 flagged file handles that have other layers on them.

One reason for making this a deprecation warning is so we're not silently changing this behaviour.

This deprecation was originally discussed in:

https://rt-archive.perl.org/perl5/Ticket/Display.html?id=125760

Tony
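Tony's concern applies even to handles without the UTF-8 flag. A minimal sketch using the standard `:crlf` layer and a temporary file (both chosen only for illustration): `sysread()` skips the layer and returns the raw CR LF bytes, while `read()` applies it. Quietly routing `sysread()` through the layers would therefore change what existing code like this receives.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "a\r\nb\r\n";    # 6 bytes on disk
close $out or die "close failed: $!";

# sysread() talks to the file descriptor directly, so :crlf is ignored.
open my $sys_fh, '<:crlf', $path or die "open $path: $!";
my $n_sys = sysread($sys_fh, my $raw, 100);
die "sysread failed: $!" unless defined $n_sys;
close $sys_fh;

# read() goes through the layer stack, so :crlf maps "\r\n" to "\n".
open my $read_fh, '<:crlf', $path or die "open $path: $!";
my $n_read = read($read_fh, my $cooked, 100);
die "read failed: $!" unless defined $n_read;
close $read_fh;

print "sysread: $n_sys bytes, read: $n_read characters\n";
```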


p5pRT commented May 7, 2018

From @jimav

On 5/6/18 4:35 PM, Tony Cook via RT wrote:

> Well, that would completely change the behaviour of sysread() in the
> case of non-UTF-8 flagged file handles that have other layers on them.
> ...
> This deprecation was originally discussed in:
>
> https://rt-archive.perl.org/perl5/Ticket/Display.html?id=125760

That seems to be some kind of secret or protected ticket!

  RT Error
  No permission to display that ticket
No details


p5pRT commented May 7, 2018

From @tonycoz

On Sun, 06 May 2018 21:14:24 -0700, jim.avera@gmail.com wrote:

> On 5/6/18 4:35 PM, Tony Cook via RT wrote:
>
>> Well, that would completely change the behaviour of sysread() in the
>> case of non-UTF-8 flagged file handles that have other layers on them.
>> ...
>> This deprecation was originally discussed in:
>>
>> https://rt-archive.perl.org/perl5/Ticket/Display.html?id=125760
>
> That seems to be some kind of secret or protected ticket!
>
>   RT Error
>   No permission to display that ticket
> No details

I can see it as an anonymous guest (I opened a new browser).

Searching for the ticket number sent me to:

https://rt.perl.org/Public/Bug/Display.html?id=125760

as did pasting the non-/Public/ address into the address bar.

If you still can't see it you might want to check with perlbug-admin (see the page footer) to see if something is messed up for your account.

Tony


p5pRT commented Oct 17, 2018

From @jkeenan

On Sun, 06 May 2018 23​:35​:33 GMT, tonyc wrote​:

On Thu, 03 May 2018 16​:41​:07 -0700, jim.avera@​gmail.com wrote​:

On 5/3/18 2​:40 AM, Tony Cook via RT wrote​:

The underlying problem is that sysread() etc pay attention to only
one part of the layer stack - whether that PERLIO_K_UTF8 flag is
set,
at which point it ignores the rest, slurps in the bytes and marks
them as SVf_UTF8.

With non-PERLIO_K_UTF8 layers sysread etc completely ignore the
layers - reading (or writing) bytes from/to the underlying stream.

Hmm.  That's an unfortunate complexity involving perl's internal
character representation which users really shouldn't need to be
aware
of.   I hope some solution can be found which doesn't _require_
documenting and user-understanding of this.

Is there any foreseeable path to making sysread() handle arbitrary
layers correctly, using buffering when data-transforming layers are
present but not otherwise?  What if sysread just called fh->read() in
those cases?

Well, that would completely change the behaviour of sysread() in the
case of non-UTF-8 flagged file handles that have other layers on them.

One reason for making this a deprecation warning is so we're not
silently changing this behaviour.

This deprecation was originally discussed in:

https://rt-archive.perl.org/perl5/Ticket/Display.html?id=125760

Tony

Tony: Should the patch you proposed in this RT be applied now?

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)


p5pRT commented Oct 22, 2018

From @tonycoz

On Wed, 17 Oct 2018 06:13:06 -0700, jkeenan wrote:

Tony: Should the patch you proposed in this RT be applied now?

No, this ticket is obsoleted by those operators now being fatal on :utf8 handles and the documentation updates that included.

Tony
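Since sysread() on a :utf8 handle is now a fatal error, one common alternative (not something proposed in this ticket) is to open the file with :raw so that sysread() returns plain bytes, then decode those bytes with the core Encode module. The sketch below assumes UTF-8 input; the subroutine name read_utf8_with_sysread and its chunk-size parameter are hypothetical, chosen for illustration. Encode::FB_QUIET makes decode() stop at the first undecodable point and leave the unprocessed tail (such as an incomplete multi-byte sequence split across two reads) in the buffer for the next iteration.

```perl
use strict;
use warnings;
use Encode qw(decode FB_QUIET);

# Hypothetical helper (not from this ticket): read a UTF-8 file with
# sysread() on a byte-oriented :raw handle and decode the bytes ourselves,
# instead of calling sysread() on a :utf8 or :encoding(...) handle.
sub read_utf8_with_sysread {
    my ($path, $chunk_size) = @_;
    $chunk_size //= 4096;

    open(my $fh, '<:raw', $path) or die "open $path: $!";
    my $pending = '';    # undecoded bytes, possibly a partial character
    my $text    = '';    # decoded characters accumulated so far

    while (1) {
        my $n = sysread($fh, my $bytes, $chunk_size);
        die "sysread $path: $!" unless defined $n;
        last if $n == 0;    # end of file

        $pending .= $bytes;
        # With FB_QUIET, decode() returns what it could decode and leaves
        # the unprocessed tail (e.g. an incomplete UTF-8 sequence) in $pending.
        $text .= decode('UTF-8', $pending, FB_QUIET);
    }
    close $fh;

    # Anything left over is a truncated or malformed sequence.
    die "incomplete or invalid UTF-8 at end of $path\n" if length $pending;
    return $text;
}
```

Because the handle carries no :utf8 layer, the LENGTH argument and return value of sysread() are always byte counts, and the character semantics live entirely in the explicit decode step.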


p5pRT commented Oct 23, 2018

From @jkeenan

On Mon, 22 Oct 2018 23:53:10 GMT, tonyc wrote:

No, this ticket is obsoleted by those operators now being fatal on :utf8 handles and the documentation updates that included.

Ok, closing.

--
James E Keenan (jkeenan@​cpan.org)


p5pRT commented Oct 23, 2018

@jkeenan - Status changed from 'open' to 'rejected'

p5pRT closed this as completed Oct 23, 2018
toddr added this to the 5.30.0 milestone Oct 26, 2019