Report information
Id: 133170
Status: rejected
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: jim.avera [at] gmail.com
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: (no value)

Attachments
0001-perl-133170-document-deprecation-of-sysread-syswrite.patch



Date: Wed, 2 May 2018 12:39:33 -0700 (PDT)
Subject: Document deprecation of sysread on :utf8 handles
From: jim.avera [...] gmail.com
To: perlbug [...] perl.org
This is a bug report for perl from jim.avera@gmail.com,
generated with the help of perlbug 1.40 running under perl 5.26.1.

-----------------------------------------------------------------
`perlfunc -f sysread` says using :utf8 handles are perfectly okay:

    Note that if the filehandle has been marked as ":utf8", Unicode
    characters are read instead of bytes (the LENGTH, OFFSET, and the
    return value of "sysread" are in Unicode characters). The
    ":encoding(...)" layer implicitly introduces the ":utf8" layer.
    See "binmode", "open", and the open pragma.

However doing so provokes this at run time:

    sysread() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30

Suggest changing the documentation to say that this feature is deprecated,
so people don't waste time writing code which will become wrong later.
-----------------------------------------------------------------
---
Flags: category=core severity=low
---
Site configuration information for perl 5.26.1:

Configured by Ubuntu at Sat Mar 10 18:40:42 UTC 2018.

Summary of my perl5 (revision 5 version 26 subversion 1) configuration:

Platform: osname=linux osvers=4.9.0 archname=x86_64-linux-gnu-thread-multi uname='linux localhost 4.9.0 #1 smp debian 4.9.0 x86_64 gnulinux ' config_args='-Dusethreads -Duselargefiles -Dcc=x86_64-linux-gnu-gcc -Dcpp=x86_64-linux-gnu-cpp -Dld=x86_64-linux-gnu-gcc -Dccflags=-DDEBIAN -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fdebug-prefix-map=/build/perl-5CtO_8/perl-5.26.1=. 
-fstack-protector-strong -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions -Wl,-z,relro -Dlddlflags=-shared -Wl,-Bsymbolic-functions -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.26 -Darchlib=/usr/lib/x86_64-linux-gnu/perl/5.26 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/x86_64-linux-gnu/perl5/5.26 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.26.1 -Dsitearch=/usr/local/lib/x86_64-linux-gnu/perl/5.26.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Ui_xlocale -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -dEs -Duseshrplib -Dlibperl=libperl.so.5.26.1' hint=recommended useposix=true d_sigaction=define useithreads=define usemultiplicity=define use64bitint=define use64bitall=define uselongdouble=undef usemymalloc=n default_inc_excludes_dot=define bincompat5005=undef Compiler: cc='x86_64-linux-gnu-gcc' ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' optimize='-O2 -g' cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include' ccversion='' gccversion='7.3.0' gccosandvers='' intsize=4 longsize=8 ptrsize=8 doublesize=8 byteorder=12345678 doublekind=3 d_longlong=define longlongsize=8 d_longdbl=define longdblsize=16 longdblkind=3 ivtype='long' ivsize=8 nvtype='double' nvsize=8 Off_t='off_t' lseeksize=8 alignbytes=8 prototype=define Linker and Libraries: ld='x86_64-linux-gnu-gcc' ldflags =' -fstack-protector-strong -L/usr/local/lib' libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib 
libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt perllibs=-ldl -lm -lpthread -lc -lcrypt libc=libc-2.27.so so=so useshrplib=true libperl=libperl.so.5.26 gnulibc_version='2.27' Dynamic Linking: dlsrc=dl_dlopen.xs dlext=so d_dlsymun=undef ccdlflags='-Wl,-E' cccdlflags='-fPIC' lddlflags='-shared -L/usr/local/lib -fstack-protector-strong' Locally applied patches: DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN. DEBPKG:debian/db_file_ver - https://bugs.debian.org/340047 Remove overly restrictive DB_File version check. DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information. DEBPKG:debian/enc2xs_inc - https://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories. DEBPKG:debian/errno_ver - https://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes. DEBPKG:debian/libperl_embed_doc - https://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking DEBPKG:fixes/respect_umask - Respect umask during installation DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories DEBPKG:debian/extutils_set_libperl_path - EU:MM: set location of libperl.a under /usr/lib DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets. DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor. DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy. DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable. 
DEBPKG:debian/perlivp - https://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local DEBPKG:debian/deprecate-with-apt - https://bugs.debian.org/747628 Point users to Debian packages of deprecated core modules DEBPKG:debian/squelch-locale-warnings - https://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts DEBPKG:debian/patchlevel - https://bugs.debian.org/567489 List packaged patches for 5.26.1-6 in patchlevel.h DEBPKG:fixes/document_makemaker_ccflags - https://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags} DEBPKG:debian/find_html2text - https://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text DEBPKG:debian/perl5db-x-terminal-emulator.patch - https://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl DEBPKG:debian/cpan-missing-site-dirs - https://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] https://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected DEBPKG:debian/makemaker-pasthru - https://bugs.debian.org/758471 Pass LD settings through to subdirectories DEBPKG:debian/makemaker-manext - https://bugs.debian.org/247370 Make EU::MakeMaker honour MANnEXT settings in generated manpage headers DEBPKG:debian/kfreebsd-softupdates - https://bugs.debian.org/796798 Work around Debian Bug#796798 DEBPKG:fixes/autodie-scope - https://bugs.debian.org/798096 Fix a scoping issue with "no autodie" and the "system" sub DEBPKG:fixes/memoize-pod - [rt.cpan.org #89441] Fix POD errors in Memoize DEBPKG:debian/hurd-softupdates - https://bugs.debian.org/822735 Fix t/op/stat.t failures on hurd DEBPKG:fixes/math_complex_doc_great_circle - https://bugs.debian.org/697567 [rt.cpan.org #114104] Math::Trig: clarify definition of great_circle_midpoint 
DEBPKG:fixes/math_complex_doc_see_also - https://bugs.debian.org/697568 [rt.cpan.org #114105] Math::Trig: add missing SEE ALSO DEBPKG:fixes/math_complex_doc_angle_units - https://bugs.debian.org/731505 [rt.cpan.org #114106] Math::Trig: document angle units DEBPKG:fixes/cpan_web_link - https://bugs.debian.org/367291 CPAN: Add link to main CPAN web site DEBPKG:fixes/time_piece_doc - https://bugs.debian.org/817925 Time::Piece: Improve documentation for add_months and add_years DEBPKG:fixes/extutils_makemaker_reproducible - https://bugs.debian.org/835815 https://bugs.debian.org/834190 Make perllocal.pod files reproducible DEBPKG:fixes/file_path_hurd_errno - File-Path: Fix test failure in Hurd due to hard-coded ENOENT DEBPKG:debian/hppa_op_optimize_workaround - https://bugs.debian.org/838613 Temporarily lower the optimization of op.c on hppa due to gcc-6 problems DEBPKG:debian/installman-utf8 - https://bugs.debian.org/840211 Generate man pages with UTF-8 characters DEBPKG:fixes/file_path_chmod_race - https://bugs.debian.org/863870 [rt.cpan.org #121951] Prevent directory chmod race attack. DEBPKG:fixes/extutils_file_path_compat - Correct the order of tests of chmod(). (#294) DEBPKG:fixes/getopt-long-2 - [rt.cpan.org #120300] Withdraw part of commit 5d9947fb445327c7299d8beb009d609bc70066c0, which tries to implement more GNU getopt_long campatibility. GNU DEBPKG:fixes/getopt-long-3 - provide a default value for optional arguments DEBPKG:fixes/getopt-long-4 - https://bugs.debian.org/864544 [rt.cpan.org #122068] Fix issue #122068. 
DEBPKG:fixes/test-builder-reset - https://bugs.debian.org/865894 Reset inside subtest maintains parent DEBPKG:debian/hppa_opmini_optimize_workaround - https://bugs.debian.org/869122 Lower the optimization level of opmini.c on hppa DEBPKG:debian/sh4_op_optimize_workaround - https://bugs.debian.org/869373 Also lower the optimization level of op.c and opmini.c on sh4 DEBPKG:fixes/json-pp-example - [rt.cpan.org #92793] https://bugs.debian.org/871837 fix RT-92793: bug in SYNOPSIS DEBPKG:debian/perldoc-pager - https://bugs.debian.org/870340 [rt.cpan.org #120229] Fix perldoc terminal escapes when sensible-pager is less DEBPKG:debian/prune_libs - https://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need. DEBPKG:debian/configure-regen - https://bugs.debian.org/762638 Regenerate Configure et al. after probe unit changes DEBPKG:fixes/rename-filexp.U-phase1 - regen-configure: rename filexp.U to filexp_path.U, phase 1 DEBPKG:fixes/rename-filexp.U-phase2 - regen-configure: rename filexp.U to filexp_path.U, phase 2 DEBPKG:fixes/packaging_test_skips - Skip various tests if PERL_BUILD_PACKAGING is set DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian DEBPKG:fixes/encode-alias-regexp - https://bugs.debian.org/880085 fix https://github.com/dankogai/p5-encode/issues/127 DEBPKG:fixes/regex-memory-leak - [910a6a8] https://bugs.debian.org/891196 [perl #132892] perl #132892: avoid leak by mortalizing temporary copy of pattern DEBPKG:fixes/CVE-2018-6797 - [perl #132227] (perl #132227) restart a node if we change to uni rules within the node and encounter a sharp S DEBPKG:fixes/CVE-2018-6798/pt1 - [perl #132063] Heap buffer overflow DEBPKG:fixes/CVE-2018-6798/pt2 - [perl #132063] 5.26.1: fix TRIE_READ_CHAR and DECL_TRIE_TYPE to account for non-utf8 target DEBPKG:fixes/CVE-2018-6798/pt3 - [perl #132063] (perl #132063) we should no longer warn for this code DEBPKG:fixes/CVE-2018-6798/pt4 - [perl #132063] utf8.c: Don't dump malformation past first NUL 
DEBPKG:fixes/CVE-2018-6913 - [perl #131844] (perl #131844) fix various space calculation issues in pp_pack.c --- @INC for perl 5.26.1: /home/jima/lib/perl /home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi /home/jima/perl5/lib/perl5/5.26.1/x86_64-linux-gnu-thread-multi /home/jima/perl5/lib/perl5/5.26.1 /home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi /home/jima/perl5/lib/perl5 /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.26.1 /usr/local/share/perl/5.26.1 /usr/lib/x86_64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.26 /usr/share/perl/5.26 /home/jima/perl5/lib/perl5/5.26.0 /home/jima/perl5/lib/perl5/5.26.0/x86_64-linux-gnu-thread-multi /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base --- Environment for perl 5.26.1: HOME=/home/jima LANG=en_US.UTF-8 LANGUAGE (unset) LC_COLLATE=C LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/jima/.local/bin:/home/jima/perl5/bin:/bin:/home/jima/bin:/home/jima/jima_tools/x86_64/bin:/home/jima/jima_tools/bin:/usr/bin:/usr/sbin:/sbin:/usr/bin/X11:/usr/local/bin:/usr/local/sbin:/usr/games:/usr/local/games:/snap/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/db/bin:/usr/lib/jvm/java-8-oracle/jre/bin:. PERL5LIB=/home/jima/lib/perl:/home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi:/home/jima/perl5/lib/perl5 PERL_BADLANG (unset) PERL_LOCAL_LIB_ROOT=/home/jima/perl5 PERL_MB_OPT=--install_base /home/jima/perl5 PERL_MM_OPT=INSTALL_BASE=/home/jima/perl5 SHELL=/bin/bash
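The diagnostic quoted in the report can be reproduced with a few lines. The following is only a sketch: the temporary file and the variable names are illustrative assumptions, not part of the report. Because the message says the call becomes fatal in Perl 5.30, the sketch captures either a warning or an exception.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative input file; any readable file would do.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "hello\n";
close($out) or die "close: $!";

# Mark the handle as :utf8, then call sysread on it.
open(my $in, '<:utf8', $path) or die "open: $!";

my $diagnostic;
{
    # Perls that only deprecate the call emit a warning.
    local $SIG{__WARN__} = sub { $diagnostic = shift };
    # Perls that forbid the call throw an exception instead.
    my $n = eval { sysread($in, my $buf, 5) };
    $diagnostic = $@ unless defined $n;
}
close($in);

print defined $diagnostic
    ? "diagnostic: $diagnostic"
    : "no diagnostic on this perl\n";
```

On a perl older than the deprecation, the last branch would be taken and nothing is reported.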
Date: Wed, 2 May 2018 21:46:01 +0200
From: Leon Timmermans <fawaka [...] gmail.com>
Subject: Re: [perl #133170] Document deprecation of sysread on :utf8 handles
CC: bugs-bitbucket [...] rt.perl.org
To: Perl5 Porters <perl5-porters [...] perl.org>
On Wed, May 2, 2018 at 9:39 PM, Jim Avera (via RT) <perlbug-followup@perl.org> wrote:
> `perlfunc -f sysread` says using :utf8 handles are perfectly okay:
>
>     Note that if the filehandle has been marked as ":utf8", Unicode
>     characters are read instead of bytes (the LENGTH, OFFSET, and the
>     return value of "sysread" are in Unicode characters). The
>     ":encoding(...)" layer implicitly introduces the ":utf8" layer.
>     See "binmode", "open", and the open pragma.
>
> However doing so provokes this at run time:
>
>     sysread() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30
>
> Suggest changing the documentation to say that this feature is deprecated,
> so people don't waste time writing code which will become wrong later.

Indeed this should be modified.

Leon
To: Leon Timmermans <fawaka [...] gmail.com>
Date: Wed, 2 May 2018 17:24:33 -0700
From: Karen Etheridge <perl [...] froods.org>
CC: Perl5 Porters <perl5-porters [...] perl.org>, bugs-bitbucket [...] rt.perl.org
Subject: Re: [perl #133170] Document deprecation of sysread on :utf8 handles
IMO this should be considered a blocker for 5.28, as it is a documentation issue for a change in this release.



On Wed, May 2, 2018 at 12:46 PM, Leon Timmermans <fawaka@gmail.com> wrote:
On Wed, May 2, 2018 at 9:39 PM, Jim Avera (via RT)
<perlbug-followup@perl.org> wrote:
> `perlfunc -f sysread` says using :utf8 handles are perfectly okay:
>
>     Note that if the filehandle has been marked as ":utf8", Unicode
>     characters are read instead of bytes (the LENGTH, OFFSET, and the
>     return value of "sysread" are in Unicode characters). The
>     ":encoding(...)" layer implicitly introduces the ":utf8" layer.
>     See "binmode", "open", and the open pragma.
>
> However doing so provokes this at run time:
>
>     sysread() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30
>
> Suggest changing the documentation to say that this feature is deprecated,
> so people don't waste time writing code which will become wrong later.

Indeed this should be modified.

Leon

RT-Send-CC: perl5-porters [...] perl.org
On Wed, 02 May 2018 12:39:47 -0700, jim.avera@gmail.com wrote:
> `perlfunc -f sysread` says using :utf8 handles are perfectly okay:
>
>     Note that if the filehandle has been marked as ":utf8", Unicode
>     characters are read instead of bytes (the LENGTH, OFFSET, and the
>     return value of "sysread" are in Unicode characters). The
>     ":encoding(...)" layer implicitly introduces the ":utf8" layer.
>     See "binmode", "open", and the open pragma.
>
> However doing so provokes this at run time:
>
>     sysread() is deprecated on :utf8 handles. This will be a fatal error
>     in Perl 5.30
>
> Suggest changing the documentation to say that this feature is
> deprecated,
> so people don't waste time writing code which will become wrong later.

How about the attached?

Tony
Subject: 0001-perl-133170-document-deprecation-of-sysread-syswrite.patch
From d338352c918eae0919f56da288492ef9ac23f63a Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Thu, 3 May 2018 14:19:21 +1000
Subject: (perl #133170) document deprecation of sysread/syswrite/send/recv on :utf8

well, UTF8 flagged handles...
---
 pod/perlfunc.pod | 42 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index fa08d4c3e9..170ae4f4e0 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -6281,6 +6281,10 @@ string otherwise. If there's an error, returns the undefined value.
 This call is actually implemented in terms of the L<recvfrom(2)> system call.
 See L<perlipc/"UDP: Message Passing"> for examples.
 
+Note that using C<recv> on a socket that has been marked as C<:utf8>
+is deprecated, and will result in an exception in future versions of
+perl.
+
 Note the I<characters>: depending on the status of the socket, either
 (8-bit) bytes or characters are received. By default all sockets
 operate on bytes, but for example if the socket has been changed using
@@ -6288,7 +6292,9 @@ L<C<binmode>|/binmode FILEHANDLE, LAYER> to operate with the
 C<:encoding(UTF-8)> I/O layer (see the L<open> pragma), the I/O will
 operate on UTF8-encoded Unicode characters, not bytes. Similarly for
 the C<:encoding> layer: in that
-case pretty much any characters can be read.
+case pretty much any characters can be read. No validation is
+performed on the UTF-8, since any layers that perform such validation
+are bypassed by C<recv>.
 
 =item redo LABEL
 X<redo>
@@ -7080,6 +7086,10 @@ case it does a L<sendto(2)> syscall. Returns the number of characters sent,
 or the undefined value on error. The L<sendmsg(2)> syscall is currently
 unimplemented. See L<perlipc/"UDP: Message Passing"> for examples.
 
+Note that using C<send> on a socket that has been marked as C<:utf8>
+is deprecated, and will result in an exception in future versions of
+perl.
+
 Note the I<characters>: depending on the status of the socket, either
 (8-bit) bytes or characters are sent. By default all sockets operate
 on bytes, but for example if the socket has been changed using
@@ -8720,13 +8730,25 @@ L<C<eof>|/eof FILEHANDLE> doesn't work well on device files (like ttys)
 anyway. Use L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET> and
 check for a return value for 0 to decide whether you're done.
 
-Note that if the filehandle has been marked as C<:utf8>, Unicode
-characters are read instead of bytes (the LENGTH, OFFSET, and the
-return value of L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET>
-are in Unicode characters). The C<:encoding(...)> layer implicitly
-introduces the C<:utf8> layer. See
-L<C<binmode>|/binmode FILEHANDLE, LAYER>,
-L<C<open>|/open FILEHANDLE,EXPR>, and the L<open> pragma.
+Note that using C<sysread> on a file that has been marked as C<:utf8>
+is deprecated, and will result in an exception in future versions of
+perl.
+
+If the filehandle has been marked as C<:utf8>, Unicode characters
+assumed to be UTF-8 encoded are read instead of bytes (the LENGTH,
+OFFSET, and the return value of L<C<sysread>|/sysread
+FILEHANDLE,SCALAR,LENGTH,OFFSET> are in Unicode characters).
+
+Note that UTF-8 encoded Unicode is read by C<sysread> even if the the
+C<:utf8> mark is introduced by a C<:encoding()> that isn't C<UTF-8>,
+nor is the UTF-8 validated. Any other layers are also ignored, so if
+you've pushed layers to decompress your input and decode the result as
+C<UTF-16>, C<sysread> will treat your compressed UTF-16 data as
+C<UTF-8>.
+
+The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer. See
+L<C<binmode>|/binmode FILEHANDLE, LAYER>, L<C<open>|/open
+FILEHANDLE,EXPR>, and the L<open> pragma.
 
 =item sysseek FILEHANDLE,POSITION,WHENCE
 X<sysseek> X<lseek>
@@ -8888,6 +8910,10 @@ B<WARNING>: If the filehandle is marked C<:utf8>, Unicode characters
 encoded in UTF-8 are written instead of bytes, and the LENGTH, OFFSET, and
 return value of L<C<syswrite>|/syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET>
 are in (UTF8-encoded Unicode) characters.
+
+C<syswrite> on a filehandle marked C<:utf8> is deprecated, and will
+raise an exception in a future version of perl.
+
 The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer.
 Alternately, if the handle is not marked with an encoding but you attempt
 to write characters with code points over 255, raises an exception.
-- 
2.11.0
Date: Wed, 2 May 2018 23:40:30 -0700
From: Jim Avera <jim.avera [...] gmail.com>
Subject: Re: [perl #133170] Document deprecation of sysread on :utf8 handles
To: perlbug-followup [...] perl.org
On 5/2/18 9:21 PM, Tony Cook via RT wrote
| How about the attached?

Hi Tony,

Is it specifically :utf8 which will not be allowed, i.e., other layers
might still be allowed on a sysread file handle in v5.30?  I didn't
understand the new text which discussed interactions between the :utf8
layer and other layers such as :utf16.

Does it all boil down to requiring that the file handle read raw binary
octets (e.g. after binmode($fh) is called)?  If so it might be better
to just say the file handle must be in :raw mode rather than mention any
_specific_ encoding such as utf8.

-Jim
RT-Send-CC: perl5-porters [...] perl.org
On Wed, 02 May 2018 23:40:58 -0700, jim.avera@gmail.com wrote:
> On 5/2/18 9:21 PM, Tony Cook via RT wrote
> | How about the attached?
>
> Hi Tony,
>
> Is it specifically :utf8 which will not be allowed, i.e., other layers
> might still be allowed on a sysread file handle in v5.30?  I didn't
> understand the new text which discussed interactions between the :utf8
> layer and other layers such as :utf16.
>
> Does it all boil down to requiring that the file handle read raw binary
> octets (e.g. after binmode($fh) is called)?  If so it might be better
> to just say the file handle must be in :raw mode rather than mention any
> _specific_ encoding such as utf8.

The problem isn't all layers.

The problem is specifically the way sysread etc handle layers that have the PERLIO_K_UTF8 flag set on them. This includes the :utf8 layer (which is currently not a real layer) and :encoding() (as the sysread documentation mentions) and a hypothetical :utf16 layer would also set it, assuming it's intended to decode utf-16 characters into perl's internal extended UTF-8 so perl can deal with it as characters.

The underlying problem is that sysread() etc pay attention to only one part of the layer stack - whether that PERLIO_K_UTF8 flag is set, at which point it ignores the rest, slurps in the bytes and marks them as SVf_UTF8.

With non-PERLIO_K_UTF8 layers sysread etc completely ignore the layers - reading (or writing) bytes from/to the underlying stream.

Tony
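The flag described above can be observed from Perl code. As a rough sketch (the helper name is_utf8_flagged and the temporary file are illustrative assumptions), PerlIO::get_layers reports a "utf8" pseudo-layer when the flag is set, and a handle without it can be read with sysread and decoded explicitly:

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Illustrative data: the UTF-8 encoding of "h\x{e9}llo" (6 octets, 5 characters).
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':raw');
print {$out} "h\xC3\xA9llo";
close($out) or die "close: $!";

# Hypothetical helper: true if the handle carries the "utf8" pseudo-layer,
# which is how PerlIO::get_layers reports the UTF-8 flag.
sub is_utf8_flagged {
    my ($fh) = @_;
    return scalar grep { $_ eq 'utf8' } PerlIO::get_layers($fh);
}

open(my $flagged, '<:encoding(UTF-8)', $path) or die "open: $!";
open(my $raw,     '<:raw',             $path) or die "open: $!";

my $flagged_is_utf8 = is_utf8_flagged($flagged) ? 1 : 0;
my $raw_is_utf8     = is_utf8_flagged($raw)     ? 1 : 0;

# sysread on the raw handle returns octets; decoding is done explicitly.
# LEAVE_SRC keeps $octets intact so both lengths can be reported.
my $got = sysread($raw, my $octets, 64);
die "sysread: $!" unless defined $got;
my $text = decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());

close($flagged);
close($raw);

printf "flagged=%d raw=%d octets=%d chars=%d\n",
    $flagged_is_utf8, $raw_is_utf8, length($octets), length($text);
```

Checking the layer list before calling sysread is one way for code to fail early with its own message instead of relying on the deprecation diagnostic.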
Date: Thu, 3 May 2018 16:40:41 -0700
To: perlbug-followup [...] perl.org
Subject: Re: [perl #133170] Document deprecation of sysread on :utf8 handles
From: Jim Avera <jim.avera [...] gmail.com>
On 5/3/18 2:40 AM, Tony Cook via RT wrote:
> The underlying problem is that sysread() etc pay attention to only one part of the layer stack - whether that PERLIO_K_UTF8 flag is set, at which point it ignores the rest, slurps in the bytes and marks them as SVf_UTF8.
>
> With non-PERLIO_K_UTF8 layers sysread etc completely ignore the layers - reading (or writing) bytes from/to the underlying stream.

Hmm.  That's an unfortunate complexity involving perl's internal character representation which users really shouldn't need to be aware of.  I hope some solution can be found which doesn't _require_ documenting and user-understanding of this.

Is there any foreseeable path to making sysread() handle arbitrary layers correctly, using buffering when data-transforming layers are present but not otherwise?  What if sysread just called fh->read() in those cases?

If buffering is used, then:

If the underlying device is seekable, left-over octets in the hidden buffer should be discarded and a seek done so they will be re-read later; that would protect coherency if other cooperating processes might randomly update the file.

If the underlying source is not seekable, then left-over octets would have to stay in the hidden buffer, but that's okay because there is no way for those bytes to mutate before they are called for by the application.  Note that for a tty in canonical mode, the OS will only return one line at a time at least on *nix.

Just some uninformed ideas...

-Jim
Date: Thu, 3 May 2018 16:57:55 -0700
Subject: Re: [perl #133170] Document deprecation of sysread on :utf8 handles
From: Jim Avera <jim.avera [...] gmail.com>
To: perlbug-followup [...] perl.org
On 5/3/18 4:40 PM, Jim Avera wrote:
> What if sysread just called fh->read() in those cases?

In essence, my proposal is to make sysread() a synonym for fh->read() with the exception that if the underlying source is seekable, then any left-over octets (not needed to satisfy LENGTH characters) would be discarded after each call and a seek done to re-read them later; and, that buffering will be entirely skipped if there is no data-transforming layer on the file descriptor.

Happily, :encoding(utf8) is not data-transforming because that is perl's internal representation so the octets can simply be put into the user's buffer and the utf8 flag set.

Even transforming decoders might often avoid left-over octets (and thus avoid the seek-back) by predicting the number of octets needed in common cases. For example, a UTF-16 decoder could read LENGTH*2 octets and that would suffice if the codepoints happened to be ASCII.  More realistically an ISO-8859-1 decoder could guess LENGTH*1 and often be right.  In other words, seeking-back might not be a big performance hit in practice.  And any really perf-sensitive app shouldn't be using layers at all, but should sysread() a raw file handle and do its own decoding.

-Jim
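The read-then-seek-back idea can be approximated in user code today. The sketch below is only an illustration of the proposal, not how perl implements anything: the helper name sysread_chars is hypothetical, it assumes a raw, seekable handle containing UTF-8, and it over-reads (4 octets per requested character) rather than predicting the size. Malformed input would stall it, since the undecoded tail is simply handed back.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Fcntl qw(SEEK_CUR);
use File::Temp qw(tempfile);

# Hypothetical helper: read up to $nchars UTF-8 characters from a raw,
# seekable handle, seeking back over any octets that were not consumed.
sub sysread_chars {
    my ($fh, $nchars) = @_;

    # UTF-8 needs at most 4 octets per character, so this is always enough.
    my $got = sysread($fh, my $octets, 4 * $nchars);
    die "sysread: $!" unless defined $got;

    # FB_QUIET decodes what it can and leaves any undecoded tail in $octets.
    my $chars   = decode('UTF-8', $octets, Encode::FB_QUIET());
    my $surplus = length($octets);

    # Characters beyond the requested count are handed back as well.
    if (length($chars) > $nchars) {
        my $extra = substr($chars, $nchars, length($chars) - $nchars, '');
        $surplus += length(encode('UTF-8', $extra));
    }

    if ($surplus) {
        defined sysseek($fh, -$surplus, SEEK_CUR) or die "sysseek: $!";
    }
    return $chars;
}

# Illustrative data: the UTF-8 encoding of "h\x{e9}llo".
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':raw');
print {$out} "h\xC3\xA9llo";
close($out) or die "close: $!";

open(my $in, '<:raw', $path) or die "open: $!";
my $first  = sysread_chars($in, 2);
my $second = sysread_chars($in, 3);
close($in);

printf "%d then %d characters\n", length($first), length($second);
```

The seek-back is what keeps the file position coherent for other readers of the same descriptor; on a pipe or socket this approach would not work, which matches the non-seekable case raised earlier in the thread.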
To: jim.avera [...] gmail.com
Subject: Re: [perl #133170] Document deprecation of sysread on :utf8 handles
CC: perlbug <perlbug-followup [...] perl.org>
From: Leon Timmermans <fawaka [...] gmail.com>
Date: Fri, 4 May 2018 02:05:02 +0200
On Fri, May 4, 2018 at 1:40 AM, Jim Avera <jim.avera@gmail.com> wrote:
> Is there any foreseeable path to making sysread() handle arbitrary layers
> correctly, using buffering when data-transforming layers are present but not
> otherwise?

If you want that, why wouldn't you just use read?

Leon
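For comparison, a minimal sketch of the read-based approach, assuming a file of UTF-8 text (the temporary file and its contents are illustrative). With an :encoding(UTF-8) layer, the LENGTH passed to read counts characters and the layer does the decoding:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative data: the UTF-8 encoding of "h\x{e9}llo" (6 octets, 5 characters).
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':raw');
print {$out} "h\xC3\xA9llo";
close($out) or die "close: $!";

# Buffered read through the decoding layer; LENGTH is in characters here.
open(my $in, '<:encoding(UTF-8)', $path) or die "open: $!";
my $n = read($in, my $buf, 3);
die "read: $!" unless defined $n;
close($in);

printf "read %d characters\n", $n;
```

The trade-off is that read goes through PerlIO buffering, so it should not be mixed with sysread on the same handle.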
Date: Thu, 3 May 2018 17:14:25 -0700
CC: perlbug <perlbug-followup [...] perl.org>
Subject: Re: [perl #133170] Document deprecation of sysread on :utf8 handles
From: Jim Avera <jim.avera [...] gmail.com>
To: Leon Timmermans <fawaka [...] gmail.com>
On 5/3/18 5:05 PM, Leon Timmermans wrote:
> On Fri, May 4, 2018 at 1:40 AM, Jim Avera <jim.avera@gmail.com> wrote:
>> Is there any foreseeable path to making sysread() handle arbitrary layers
>> correctly, using buffering when data-transforming layers are present but not
>> otherwise?
> If you want that, why wouldn't you just use read?
>
> Leon

Yes, but I gather there is all this complexity (desired by someone) to allow certain layers to work with sysread(). Personally I would be happy if sysread simply disallowed any layers, i.e. required a raw file handle.

On the other hand, if the app wants Unicode characters, it is convenient that perl's internal rep is utf8, so reading from a fh with :encoding(utf8) should be possible with no actual extra overhead (just setting the utf8 flag on the user's buffer). Disallowing that one case seems strange from a user perspective.

-Jim
To: jim.avera [...] gmail.com
CC: Leon Timmermans <fawaka [...] gmail.com>, perlbug <perlbug-followup [...] perl.org>
Subject: Re: [perl #133170] Document deprecation of sysread on :utf8 handles
From: Dan Book <grinnz [...] gmail.com>
Date: Thu, 3 May 2018 20:22:24 -0400
On Thu, May 3, 2018 at 8:14 PM, Jim Avera <jim.avera@gmail.com> wrote:
> On 5/3/18 5:05 PM, Leon Timmermans wrote:
>> On Fri, May 4, 2018 at 1:40 AM, Jim Avera <jim.avera@gmail.com> wrote:
>>> Is there any foreseeable path to making sysread() handle arbitrary layers
>>> correctly, using buffering when data-transforming layers are present but not
>>> otherwise?
>> If you want that, why wouldn't you just use read?
>>
>> Leon
>
> Yes, but I gather there is all this complexity (desired by someone) to allow certain layers to work with sysread(). Personally I would be happy if sysread simply disallowed any layers, i.e. required a raw file handle.
>
> On the other hand, if the app wants Unicode characters, it is convenient that perl's internal rep is utf8, so reading from a fh with :encoding(utf8) should be possible with no actual extra overhead (just setting the utf8 flag on the user's buffer). Disallowing that one case seems strange from a user perspective.
>
> -Jim

From a user perspective, the utf8 flag should be irrelevant, and the non-strict :utf8 or :encoding(utf8) layers shouldn't be used.

-Dan
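One reading of "non-strict" here is the distinction the Encode module draws between the strict UTF-8 encoding and perl's lax utf8. The sketch below illustrates that distinction with Encode directly rather than with layers; the choice of U+110000 as a test value is an assumption made for the example.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding);

# The two spellings resolve to different encoding objects.
my $strict_name = find_encoding('UTF-8')->name;
my $lax_name    = find_encoding('utf8')->name;

# A code point just above the Unicode maximum of U+10FFFF.
my $beyond = "\x{110000}";
my $check  = Encode::FB_CROAK() | Encode::LEAVE_SRC();

# eval returns undef when the encoder croaks.
my $lax_octets    = eval { encode('utf8',  $beyond, $check) };
my $strict_octets = eval { encode('UTF-8', $beyond, $check) };

printf "%s: %s\n", $lax_name,    defined $lax_octets    ? 'accepted' : 'rejected';
printf "%s: %s\n", $strict_name, defined $strict_octets ? 'accepted' : 'rejected';
```

In layer terms, the strict behaviour corresponds to spelling the layer :encoding(UTF-8) with the hyphen.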
RT-Send-CC: perl5-porters [...] perl.org
On Thu, 03 May 2018 16:41:07 -0700, jim.avera@gmail.com wrote:
> On 5/3/18 2:40 AM, Tony Cook via RT wrote:
> > The underlying problem is that sysread() etc pay attention to only
> > one part of the layer stack - whether that PERLIO_K_UTF8 flag is set,
> > at which point it ignores the rest, slurps in the bytes and marks
> > them as SVf_UTF8.
> >
> > With non-PERLIO_K_UTF8 layers sysread etc completely ignore the
> > layers - reading (or writing) bytes from/to the underlying stream.
>
> Hmm.  That's an unfortunate complexity involving perl's internal
> character representation which users really shouldn't need to be aware
> of.  I hope some solution can be found which doesn't _require_
> documenting and user-understanding of this.
>
> Is there any foreseeable path to making sysread() handle arbitrary
> layers correctly, using buffering when data-transforming layers are
> present but not otherwise?  What if sysread just called fh->read() in
> those cases?

Well, that would completely change the behaviour of sysread() in the case of non-UTF-8 flagged file handles that have other layers on them.

One reason for making this a deprecation warning is so we're not silently changing this behaviour.

This deprecation was originally discussed in:

https://rt.perl.org/Ticket/Display.html?id=125760

Tony
To: perlbug-followup [...] perl.org
Subject: Re: [perl #133170] Document deprecation of sysread on :utf8 handles
From: Jim Avera <jim.avera [...] gmail.com>
Date: Sun, 6 May 2018 21:14:00 -0700
On 5/6/18 4:35 PM, Tony Cook via RT wrote:
> Well, that would completely change the behaviour of sysread() in the case of non-UTF-8 flagged file handles that have other layers on them. ...
> This deprecation was originally discussed in:
>
> https://rt.perl.org/Ticket/Display.html?id=125760


That seems to be some kind of secret or protected ticket! 

  RT Error
  No permission to display that ticket
  No details

RT-Send-CC: perl5-porters [...] perl.org
On Sun, 06 May 2018 21:14:24 -0700, jim.avera@gmail.com wrote:
> On 5/6/18 4:35 PM, Tony Cook via RT wrote:
> > Well, that would completely change the behaviour of sysread() in the
> > case of non-UTF-8 flagged file handles that have other layers on them.
> > ...
> > This deprecation was originally discussed in:
> >
> > https://rt.perl.org/Ticket/Display.html?id=125760
>
> That seems to be some kind of secret or protected ticket!
>
>   RT Error
>   No permission to display that ticket
>   No details

I can see it as an anonymous guest (I opened a new browser).

Searching for the ticket number sent me to:

https://rt.perl.org/Public/Bug/Display.html?id=125760

as did pasting the non-/Public/ address into the address bar.

If you still can't see it you might want to check with perlbug-admin (see the page footer) to see if something is messed up for your account.

Tony
RT-Send-CC: perl5-porters [...] perl.org
On Sun, 06 May 2018 23:35:33 GMT, tonyc wrote:
> Well, that would completely change the behaviour of sysread() in the
> case of non-UTF-8 flagged file handles that have other layers on them.
>
> One reason for making this a deprecation warning is so we're not
> silently changing this behaviour.
>
> This deprecation was originally discussed in:
>
> https://rt.perl.org/Ticket/Display.html?id=125760
>
> Tony
Tony: Should the patch you proposed in this RT be applied now?

Thank you very much.

-- 
James E Keenan (jkeenan@cpan.org)
RT-Send-CC: perl5-porters [...] perl.org
On Wed, 17 Oct 2018 06:13:06 -0700, jkeenan wrote:
> Tony: Should the patch you proposed in this RT be applied now?
No, this ticket is obsoleted by those operators now being fatal on :utf8 handles and the documentation updates that included.

Tony
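For code that still needs unbuffered reads of UTF-8 data now that sysread() is fatal on :utf8 handles, one possible approach is to keep the handle in bytes mode and decode explicitly. The sketch below is not taken from the ticket or its patch; `slurp_utf8_via_sysread` is a hypothetical helper, and it relies on the partial-character idiom documented for Encode's `FB_QUIET` check mode, where undecoded trailing bytes are left in the source buffer.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Hypothetical helper (not part of perl or the ticket's patch):
# read a file with sysread() on a bytes-only handle, decoding as we go.
sub slurp_utf8_via_sysread {
    my ($path, $chunk_size) = @_;

    # :raw keeps the handle free of :utf8, so sysread() stays legal.
    open my $fh, '<:raw', $path or die "open $path: $!";

    my ($pending, $text) = ('', '');
    while (1) {
        # Append new bytes after any partial character left from last time.
        my $got = sysread($fh, $pending, $chunk_size, length $pending);
        die "sysread: $!" unless defined $got;
        last if $got == 0;

        # FB_QUIET returns what could be decoded and leaves the
        # unprocessed bytes (e.g. a split character) in $pending.
        $text .= decode('UTF-8', $pending, Encode::FB_QUIET);
    }
    close $fh;

    die "truncated or malformed UTF-8 in input" if length $pending;
    return $text;
}

# Usage: e-acute (2 bytes) and the euro sign (3 bytes) are deliberately
# split across chunk boundaries by the tiny chunk size.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "caf\xC3\xA9 \xE2\x82\xAC";
close $tmp or die "close: $!";

my $text = slurp_utf8_via_sysread($path, 2);
printf "%d characters decoded\n", length $text;
```

Here LENGTH and OFFSET are always counted in bytes, which avoids the character-versus-byte ambiguity the old :utf8 behaviour introduced. Note that this simple version keeps accumulating bytes if malformed data appears mid-stream and only reports it at end of input.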
RT-Send-CC: perl5-porters [...] perl.org
On Mon, 22 Oct 2018 23:53:10 GMT, tonyc wrote:
> No, this ticket is obsoleted by those operators now being fatal on
> :utf8 handles and the documentation updates that included.
Ok, closing.

-- 
James E Keenan (jkeenan@cpan.org)


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org