Document deprecation of sysread on :utf8 handles #16544
Comments
From @jimav
This is a bug report for perl from jim.avera@gmail.com.
`perldoc -f sysread` says that using :utf8 handles is perfectly okay:
Note that if the filehandle has been marked as ":utf8", Unicode
However, doing so provokes this at run time:
sysread() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30
Suggest changing the documentation to say that this feature is deprecated,
Flags:
Site configuration information for perl 5.26.1:
Configured by Ubuntu at Sat Mar 10 18:40:42 UTC 2018.
Summary of my perl5 (revision 5 version 26 subversion 1) configuration:
Locally applied patches:
@INC for perl 5.26.1:
Environment for perl 5.26.1:
From @Leont
On Wed, May 2, 2018 at 9:39 PM, Jim Avera (via RT)
Indeed this should be modified. Leon
The RT System itself - Status changed from 'new' to 'open'
From @karenetheridge
IMO this should be considered a blocker for 5.28, as it is a documentation
On Wed, May 2, 2018 at 12:46 PM, Leon Timmermans <fawaka@gmail.com> wrote:
From @tonycoz
On Wed, 02 May 2018 12:39:47 -0700, jim.avera@gmail.com wrote:
How about the attached? Tony
From @tonycoz
0001-perl-133170-document-deprecation-of-sysread-syswrite.patch
From d338352c918eae0919f56da288492ef9ac23f63a Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Thu, 3 May 2018 14:19:21 +1000
Subject: (perl #133170) document deprecation of sysread/syswrite/send/recv on
:utf8
well, UTF8 flagged handles...
---
pod/perlfunc.pod | 42 ++++++++++++++++++++++++++++++++++--------
1 file changed, 34 insertions(+), 8 deletions(-)
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index fa08d4c3e9..170ae4f4e0 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -6281,6 +6281,10 @@ string otherwise. If there's an error, returns the undefined value.
This call is actually implemented in terms of the L<recvfrom(2)> system call.
See L<perlipc/"UDP: Message Passing"> for examples.
+Note that using C<recv> on a socket that has been marked as C<:utf8>
+is deprecated, and will result in an exception in future versions of
+perl.
+
Note the I<characters>: depending on the status of the socket, either
(8-bit) bytes or characters are received. By default all sockets
operate on bytes, but for example if the socket has been changed using
@@ -6288,7 +6292,9 @@ L<C<binmode>|/binmode FILEHANDLE, LAYER> to operate with the
C<:encoding(UTF-8)> I/O layer (see the L<open> pragma), the I/O will
operate on UTF8-encoded Unicode
characters, not bytes. Similarly for the C<:encoding> layer: in that
-case pretty much any characters can be read.
+case pretty much any characters can be read. No validation is
+performed on the UTF-8, since any layers that perform such validation
+are bypassed by C<recv>.
=item redo LABEL
X<redo>
@@ -7080,6 +7086,10 @@ case it does a L<sendto(2)> syscall. Returns the number of characters sent,
or the undefined value on error. The L<sendmsg(2)> syscall is currently
unimplemented. See L<perlipc/"UDP: Message Passing"> for examples.
+Note that using C<send> on a socket that has been marked as C<:utf8>
+is deprecated, and will result in an exception in future versions of
+perl.
+
Note the I<characters>: depending on the status of the socket, either
(8-bit) bytes or characters are sent. By default all sockets operate
on bytes, but for example if the socket has been changed using
@@ -8720,13 +8730,25 @@ L<C<eof>|/eof FILEHANDLE> doesn't work well on device files (like ttys)
anyway. Use L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET> and
check for a return value for 0 to decide whether you're done.
-Note that if the filehandle has been marked as C<:utf8>, Unicode
-characters are read instead of bytes (the LENGTH, OFFSET, and the
-return value of L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET>
-are in Unicode characters). The C<:encoding(...)> layer implicitly
-introduces the C<:utf8> layer. See
-L<C<binmode>|/binmode FILEHANDLE, LAYER>,
-L<C<open>|/open FILEHANDLE,EXPR>, and the L<open> pragma.
+Note that using C<sysread> on a file that has been marked as C<:utf8>
+is deprecated, and will result in an exception in future versions of
+perl.
+
+If the filehandle has been marked as C<:utf8>, Unicode characters
+assumed to be UTF-8 encoded are read instead of bytes (the LENGTH,
+OFFSET, and the return value of L<C<sysread>|/sysread
+FILEHANDLE,SCALAR,LENGTH,OFFSET> are in Unicode characters).
+
+Note that UTF-8 encoded Unicode is read by C<sysread> even if the
+C<:utf8> mark is introduced by a C<:encoding()> that isn't C<UTF-8>,
+nor is the UTF-8 validated. Any other layers are also ignored, so if
+you've pushed layers to decompress your input and decode the result as
+C<UTF-16>, C<sysread> will treat your compressed UTF-16 data as
+C<UTF-8>.
+
+The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer. See
+L<C<binmode>|/binmode FILEHANDLE, LAYER>, L<C<open>|/open
+FILEHANDLE,EXPR>, and the L<open> pragma.
=item sysseek FILEHANDLE,POSITION,WHENCE
X<sysseek> X<lseek>
@@ -8888,6 +8910,10 @@ B<WARNING>: If the filehandle is marked C<:utf8>, Unicode characters
encoded in UTF-8 are written instead of bytes, and the LENGTH, OFFSET, and
return value of L<C<syswrite>|/syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET>
are in (UTF8-encoded Unicode) characters.
+
+C<syswrite> on a filehandle marked C<:utf8> is deprecated, and will
+raise an exception in a future version of perl.
+
The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer.
Alternately, if the handle is not marked with an encoding but you
attempt to write characters with code points over 255, raises an exception.
--
2.11.0
From @jimav
On 5/2/18 9:21 PM, Tony Cook via RT wrote
Hi Tony,
Is it specifically :utf8 which will not be allowed, i.e., other layers
Does it all boil down to requiring that the file handle read raw binary
-Jim
From @tonycoz
On Wed, 02 May 2018 23:40:58 -0700, jim.avera@gmail.com wrote:
The problem isn't all layers. The problem is specifically the way sysread etc. handle layers that have the PERLIO_K_UTF8 flag set on them. This includes the :utf8 layer (which is currently not a real layer) and :encoding() (as the sysread documentation mentions). A hypothetical :utf16 layer would also set it, assuming it's intended to decode UTF-16 characters into perl's internal extended UTF-8 so perl can deal with them as characters.
The underlying problem is that sysread() etc. pay attention to only one part of the layer stack: whether that PERLIO_K_UTF8 flag is set. If it is, they ignore the rest of the stack, slurp in the bytes and mark them as SVf_UTF8.
With non-PERLIO_K_UTF8 layers, sysread etc. completely ignore the layers, reading (or writing) bytes from/to the underlying stream.
Tony
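As an illustration of the distinction described above (not part of the original thread), here is a minimal sketch of two ways to get decoded text without calling sysread on a :utf8-flagged handle. It assumes a small UTF-8 file that fits in a single read; the sample data and variable names are hypothetical. With larger inputs, a sysread chunk can end in the middle of a multi-byte character, so the bytes would need to be accumulated before decoding.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Write a small UTF-8 encoded file to work with (hypothetical sample data).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp, ':raw');
print {$tmp} "caf\xC3\xA9\n";    # "cafe" with an acute accent, as UTF-8 bytes
close $tmp or die "close: $!";

# Option 1: sysread on a raw handle, then decode the bytes explicitly.
# No layer carries the UTF-8 flag here, so sysread simply returns bytes.
open(my $raw, '<:raw', $path) or die "open: $!";
my $nread = sysread($raw, my $bytes, 4096);
die "sysread: $!" unless defined $nread;
close $raw;
my $text_from_sysread = decode('UTF-8', $bytes, Encode::FB_CROAK);

# Option 2: let the layer stack do the decoding and use buffered read().
open(my $fh, '<:encoding(UTF-8)', $path) or die "open: $!";
my $nchars = read($fh, my $text_from_read, 4096);
die "read: $!" unless defined $nchars;
close $fh;

printf "sysread returned %d bytes, read returned %d characters\n",
    $nread, $nchars;
print "same text\n" if $text_from_sysread eq $text_from_read;
```

Option 1 keeps the unbuffered semantics of sysread and makes the decoding step (and its validation) explicit; option 2 goes through the full layer stack, which is what read() is for.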
From @jimav
On 5/3/18 2:40 AM, Tony Cook via RT wrote:
Hmm. That's an unfortunate complexity involving perl's internal
Is there any foreseeable path to making sysread() handle arbitrary
If buffering is used, then: If the underlying device is seekable,
If the underlying source is not seekable, then left-over octets would
Just some uninformed ideas...
-Jim
From @jimav
On 5/3/18 4:40 PM, Jim Avera wrote:
In essence, my proposal is to make sysread() a synonym for fh->read()
Happily, :encoding(utf8) is not data-transforming because that is perl's
Even transforming decoders might often avoid left-over octets (and thus
-Jim
From @Leont
On Fri, May 4, 2018 at 1:40 AM, Jim Avera <jim.avera@gmail.com> wrote:
If you want that, why wouldn't you just use read? Leon
From @jimav
On 5/3/18 5:05 PM, Leon Timmermans wrote:
Yes, but I gather there is all this complexity (desired by someone) to
On the other hand, if the app wants Unicode characters, it is convenient
-Ko,
From @Grinnz
On Thu, May 3, 2018 at 8:14 PM, Jim Avera <jim.avera@gmail.com> wrote:
From a user perspective, the utf8 flag should be irrelevant, and the
-Dan
From @tonycoz
On Thu, 03 May 2018 16:41:07 -0700, jim.avera@gmail.com wrote:
Well, that would completely change the behaviour of sysread() in the case of non-UTF-8 flagged file handles that have other layers on them.
One reason for making this a deprecation warning is so we're not silently changing this behaviour.
This deprecation was originally discussed in: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=125760
Tony
From @jimav
On 5/6/18 4:35 PM, Tony Cook via RT wrote:
That seems to be some kind of secret or protected ticket! RT Error
From @tonycoz
On Sun, 06 May 2018 21:14:24 -0700, jim.avera@gmail.com wrote:
I can see it as an anonymous guest (I opened a new browser).
Searching for the ticket number sent me to: https://rt.perl.org/Public/Bug/Display.html?id=125760 as did pasting the non-/Public/ address into the address bar.
If you still can't see it you might want to check with perlbug-admin (see the page footer) to see if something is messed up for your account.
Tony
From @jkeenan
On Sun, 06 May 2018 23:35:33 GMT, tonyc wrote:
Tony: Should the patch you proposed in this RT be applied now? Thank you very much.
From @tonycoz
On Wed, 17 Oct 2018 06:13:06 -0700, jkeenan wrote:
No, this ticket is obsoleted by those operators now being fatal on :utf8 handles and by the documentation updates included with that change.
Tony
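For readers who want to check how their own perl behaves, the following is a minimal sketch (not from the thread) that calls sysread on a :utf8 handle inside eval. It assumes, per the warning text quoted in the original report, that the call is fatal from Perl 5.30 onward; on older perls it is expected to succeed, possibly with a deprecation warning on STDERR. The temporary file and variable names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file to read from (hypothetical sample data).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "hello\n";
close $tmp or die "close: $!";

open(my $fh, '<:utf8', $path) or die "open: $!";

# On perls where the operation is fatal, eval catches the exception;
# on older perls sysread returns normally.
my $n   = eval { sysread($fh, my $buf, 16) };
my $err = $@;
close $fh;

if ($err) {
    print "sysread died on perl $]: $err";
}
else {
    printf "sysread returned %s on perl %s\n", $n // 'undef', $];
}
```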
From @jkeenan
On Mon, 22 Oct 2018 23:53:10 GMT, tonyc wrote:
Ok, closing.
@jkeenan - Status changed from 'open' to 'rejected'
Migrated from rt.perl.org#133170 (status was 'rejected')
Searchable as RT133170$