RFC remove strange behaviour of sysread()/syswrite() on UTF-8 streams #14839

Closed
p5pRT opened this issue Aug 6, 2015 · 29 comments

@p5pRT

p5pRT commented Aug 6, 2015

Migrated from rt.perl.org#125760 (status was 'resolved')

Searchable as RT125760$

@p5pRT
Author

p5pRT commented Aug 6, 2015

From @tonycoz

One of the few remaining warts[1] in Perl's Unicode support is how
sysread() and syswrite() behave on streams with a unicode layer.

First sysread():

For example:

  open my $fh, "<:utf8", "filewithutf8.txt" or die;
  my $buf;
  sysread $fh, $buf, 1000;

will read up to 1000 unvalidated UTF-8[2] *characters* from the stream.
That seems all fine and good, but the following​:

  open my $fh, "<:encoding(UCS-2BE)", "filewithucs2be.txt" or die;
  my $buf;
  sysread $fh, $buf, 1000;

does exactly the same thing - only the fact that the stream is unicode
flagged (i.e., has the PERLIO_F_UTF8 flag) is referenced, the actual
layers are ignored.
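
Since only the flag is consulted, the unambiguous way to get the intended result is to read octets and decode them explicitly. The following is a minimal sketch of that approach, not part of the proposal itself; the temporary file stands in for "filewithucs2be.txt" and the sample string is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempfile);

# Create a small UCS-2BE file to read back (illustrative data only).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} encode('UCS-2BE', "caf\x{e9} \x{101}");
close $tmp or die "close: $!";

# Open with no decoding layer, so sysread() counts and returns octets.
open my $fh, '<:raw', $path or die "open: $!";
my $octets = '';
while (1) {
    my $n = sysread $fh, $octets, 1000, length $octets;
    die "sysread: $!" unless defined $n;
    last if $n == 0;    # end of file
}
close $fh;

# Decode with the encoding the file really uses; FB_CROAK makes
# malformed input an error instead of a silently corrupt string.
my $text = decode('UCS-2BE', $octets, Encode::FB_CROAK);
```

This keeps sysread() at the byte level and leaves validation to Encode, so no badly encoded SVf_UTF8 scalar can result.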

This behaviour is mostly documented by:

Note that if the filehandle has been marked as C<:utf8> Unicode
characters are read instead of bytes (the LENGTH, OFFSET, and the
return value of sysread() are in Unicode characters).
The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer.
See L</binmode>, L</open>, and the C<open> pragma, L<open>.

which skips mentioning that the "Unicode characters" read are always
UTF-8 encoded.

This, beyond the broken :utf8 layer itself, is one of the few pure
perl vectors for badly encoded SVf_UTF8 strings in the perl
interpreter.

It can also be confusing: even an experienced CPAN author managed to get
it wrong[3].

My suggestion is that (eventually) sysread() on a file with the
PERLIO_F_UTF8 flag on should either do a simple octet read, as it does
without that flag, or fail.

For the transition sysread() would warn when passed a handle with
PERLIO_F_UTF8, presumably something like "sysread() on a unicode
handle is deprecated".

So what's the desired behaviour after the transition? The options are:

1) sysread() would act as if the flag was not there, completely
  ignoring the layers rather than ignoring the layers *except* for
  the flag.

  This has the advantage that sysread() behaves consistently after
  the change. It may however make code that depends on the old
  behaviour silently misbehave.

2) sysread() fails, probably with EINVAL.

  While sysread() becomes no longer useful on handles with the flag,
  mixing low- and high-level I/O is generally unsafe anyway, and
  PerlIO layers are pretty much a high-level construct, so there
  isn't much lost.

  It prevents most silent mis-behaviour while remaining true to
  sysread()'s contract to read bytes from a file.

3) sysread() croaks.

  Similar to 2), but with more emphasis.
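
For code that has to run on perls from before and after the transition, option 3 can be approximated in user code. This is a rough sketch under assumptions: `handle_is_utf8` and `byte_sysread` are hypothetical names, and the check relies on PerlIO::get_layers() reporting the flag as a trailing "utf8" pseudo-layer.

```perl
use strict;
use warnings;
use Carp qw(croak);
use File::Temp qw(tempfile);

# Hypothetical helper: true if the top layer of $fh carries the utf8 flag.
sub handle_is_utf8 {
    my ($fh) = @_;
    my @layers = PerlIO::get_layers($fh);
    return @layers && $layers[-1] eq 'utf8';
}

# Hypothetical wrapper: refuses flagged handles, otherwise reads octets
# into the caller's buffer (aliased through $_[1]).
sub byte_sysread {
    my ($fh, undef, $len, $offset) = @_;
    croak "sysread() isn't allowed on :utf8 handles" if handle_is_utf8($fh);
    return sysread $fh, $_[1], $len, $offset // 0;
}

# Usage: a three-octet scratch file, opened once raw and once with :utf8.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "abc";
close $tmp or die "close: $!";

open my $raw, '<:raw', $path or die "open: $!";
my $n = byte_sysread($raw, my $buf, 10);

open my $utf8, '<:utf8', $path or die "open: $!";
my $ok = eval { byte_sysread($utf8, my $unused, 10); 1 };
my $error = $@;
```

The wrapper fails loudly on every perl version, so callers never depend on the character-counting behaviour by accident.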

Then syswrite():

Unlike sysread(), syswrite() doesn't act as a vector for producing
corrupt internal perl data structures, but it does have the same issue
that it pays attention to only part of the layer state for the handle.

For example:

  open my $fh, ">:utf8", "filetobeutf8.txt" or die;
  my $data = "\x{101}";
  syswrite $fh, $data;

will write UTF-8 encoded data to the file, which is fine, but:

  open my $fh, ">:encoding(UCS-2BE)", "filetobeucs2.txt" or die;
  my $data = "\x{101}";
  syswrite $fh, $data;

does the same thing.

I believe if syswrite() is going to ignore any of the layer state of
the handle, it should ignore it all, so the examples above would throw
an exception, just as they do for handles without the flag when
syswrite() is called with wide characters.
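
As a hedged illustration of the behaviour on a handle without the flag, and of the explicit alternative, the sketch below writes to a scratch file created with File::Temp (an assumption standing in for "filetobeucs2.txt"). The wide-character write is wrapped in eval because it is expected to throw.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh, ':raw';    # make sure the handle carries no utf8 flag

my $data = "\x{101}";

# Without the flag, wide characters are rejected, not silently encoded.
my $wrote = eval { syswrite $fh, $data };
my $error = $@;         # expected to mention "Wide character"

# Encode explicitly with the intended encoding, then write octets.
my $octets = encode('UCS-2BE', $data);
my $n = syswrite $fh, $octets;
die "syswrite: $!" unless defined $n;
close $fh or die "close: $!";
```

Encoding before the call means the file contents no longer depend on which layers happen to be on the handle.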

Again, for the transition, syswrite() should produce a deprecation
warning.

Tony

[1] the other I can think of is that the :utf8 PerlIO layer doesn't
  validate, it's just a flag

[2] or utf8, perl's internal encoding

[3] https://rt.cpan.org/Public/Bug/Display.html?id=83126 and a few
  other tickets for the same distribution, and
  https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121870

@p5pRT
Author

p5pRT commented Aug 6, 2015

From @ribasushi

On 08/06/2015 09:00 AM, Tony Cook (via RT) wrote:

# New Ticket Created by Tony Cook
# Please include the string: [perl #125760]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=125760 >

One of the few remaining warts[1] in Perl's Unicode support is how
sysread() and syswrite() behave on streams with a unicode layer.

Having read the excellent analysis, my 2c is that both failure cases for
sysread and syswrite should ultimately croak.

I do not have an informed opinion on what the deprecation cycle would
look like, as it is likely very beneficial to exercise the croaking as
early as 5.23.x on a CPAN smoke, yet it is clearly too early for code in
the wild.

@p5pRT
Author

p5pRT commented Aug 6, 2015

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Aug 6, 2015

From @jhi

I think I am the original vector of these vectors... and I think I was wrong. Just make them croak.

@p5pRT
Author

p5pRT commented Aug 6, 2015

From @rjbs

* Peter Rabbitson <rabbit-p5p@​rabbit.us> [2015-08-06T03​:12​:26]

Having read the excellent analysis, my 2c is that both failure cases for
sysread and syswrite should ultimately croak.

Yes. Thanks, Tony, and I agree.

I do not have an informed opinion on what the deprecation cycle would look
like, as it is likely very beneficial to exercise the croaking as early as
5.23.x on a CPAN smoke, yet it is clearly too early for code in the wild.

We should definitely get the warnings in place soon.

I think it would be beneficial if we had a way to mark any deprecation warning
as fatal, process wide, for the purpose of smoking (and other places, like
integration testing), but I think it needs more thought than a "hey it would be
neat" from me.
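
For what it's worth, two existing mechanisms come close, though neither is the process-wide switch being asked for. `use warnings FATAL => 'deprecated'` is lexically scoped, and a `$SIG{__WARN__}` handler is process-wide but only sees the message text. The sketch below shows the latter; matching on the word "deprecated" is an assumption of the sketch, and the warning is simulated with a plain warn() so that it runs on any perl version.

```perl
use strict;
use warnings;

# Blunt, process-wide sketch: promote any warning whose text mentions
# "deprecated" to an exception. The text match is a heuristic, since
# the handler has no access to the warning category.
local $SIG{__WARN__} = sub {
    my ($msg) = @_;
    die $msg if $msg =~ /\bdeprecated\b/;
    warn $msg;    # everything else is passed through unchanged
};

# Simulated deprecation warning; under the handler it becomes fatal.
my $survived = eval {
    warn "sysread() is deprecated on :utf8 handles\n";
    1;
};
my $error = $@;
```

A smoke or integration-test run could install such a handler early, but it would also catch unrelated messages that happen to contain the word.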

But if we're going to make it croak in 5.28, time to make it warn now.

--
rjbs

@p5pRT
Author

p5pRT commented Aug 10, 2015

From @tonycoz

On Thu Aug 06 16​:12​:45 2015, perl.p5p@​rjbs.manxome.org wrote​:

But if we're going to make it croak in 5.28, time to make it warn now.

Patch attached.

As chansen mentioned in #p5p, send() and recv() have the same issue, so the patch
also deprecates them on :utf8 handles.

Tony

@p5pRT
Author

p5pRT commented Aug 10, 2015

From @tonycoz

0001-perl-125760-deprecate-sys-read-write-send-recv-on-ut.patch
From 549d60629b72c2b689e97815e582d6bc355d24db Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Mon, 10 Aug 2015 16:15:48 +1000
Subject: [PATCH] [perl #125760] deprecate sys(read|write)(), send(), recv()
 on :utf8

---
 pod/perldiag.pod      |   21 +++++++++++++++++++++
 pp_sys.c              |    8 ++++++++
 t/lib/warnings/pp_sys |   22 ++++++++++++++++++++++
 t/op/gmagic.t         |    1 +
 t/uni/overload.t      |    1 +
 5 files changed, 53 insertions(+)

diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 4f21dbe..f47fd3e 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -2619,6 +2619,27 @@ provides a list context to its subscript, which can do weird things
 if you're expecting only one subscript.  When called in list context,
 it also returns the key in addition to the value.
 
+=item %s() is deprecated on :utf8 handles
+
+(W deprecated) The sysread(), recv(), syswrite() and send() operators
+are deprecated on handles that have the C<:utf8> layer, either
+explicitly, or implicitly, eg., with the C<:encoding(UTF-16LE)> layer.
+
+Both sysread() and recv() currently use only the C<:utf8> flag for the
+stream, ignoring the actual layers.  Since sysread() and recv() do no
+UTF-8 validation they can end up creating invalidly encoded scalars.
+
+Similarly, syswrite() and send() use only the C<:utf8> flag, otherwise
+ignoring any layers.  If the flag is set, both write the value UTF-8
+encoded, even if the layer is some different encoding, such as the
+example above.
+
+Ideally, all of these operators would completely ignore the C<:utf8>
+state, working only with bytes, but this would result in silently
+breaking existing code.  To avoid this a future version of perl will
+throw an exception when any of sysread(), recv(), syswrite() or send()
+are called on handle with the C<:utf8> layer.
+
 =item Insecure dependency in %s
 
 (F) You tried to do something that the tainting mechanism didn't like.
diff --git a/pp_sys.c b/pp_sys.c
index ebd675b..dc1b3ce 100644
--- a/pp_sys.c
+++ b/pp_sys.c
@@ -1691,6 +1691,11 @@ PP(pp_sysread)
     fd = PerlIO_fileno(IoIFP(io));
 
     if ((fp_utf8 = PerlIO_isutf8(IoIFP(io))) && !IN_BYTES) {
+        if (PL_op->op_type == OP_SYSREAD || PL_op->op_type == OP_RECV) {
+            Perl_ck_warner(aTHX_ packWARN(WARN_DEPRECATED),
+                           "%s() is deprecated on :utf8 handles",
+                           OP_DESC(PL_op));
+        }
 	buffer = SvPVutf8_force(bufsv, blen);
 	/* UTF-8 may not have been set if they are all low bytes */
 	SvUTF8_on(bufsv);
@@ -1950,6 +1955,9 @@ PP(pp_syswrite)
     doing_utf8 = DO_UTF8(bufsv);
 
     if (PerlIO_isutf8(IoIFP(io))) {
+        Perl_ck_warner(aTHX_ packWARN(WARN_DEPRECATED),
+                       "%s() is deprecated on :utf8 handles",
+                       OP_DESC(PL_op));
 	if (!SvUTF8(bufsv)) {
 	    /* We don't modify the original scalar.  */
 	    tmpbuf = bytes_to_utf8((const U8*) buffer, &blen);
diff --git a/t/lib/warnings/pp_sys b/t/lib/warnings/pp_sys
index a1e07f8..ea18bac 100644
--- a/t/lib/warnings/pp_sys
+++ b/t/lib/warnings/pp_sys
@@ -939,3 +939,25 @@ sleep(-1);
 
 EXPECT
 sleep() with negative argument at - line 2.
+########
+# NAME sysread() deprecated on :utf8
+use warnings 'deprecated';
+open my $fh, "<", "../harness" or die "# $!";
+my $buf;
+sysread $fh, $buf, 10;
+binmode $fh, ':utf8';
+sysread $fh, $buf, 10;
+EXPECT
+sysread() is deprecated on :utf8 handles at - line 6.
+########
+# NAME syswrite() deprecated on :utf8
+my $file = "syswwarn.tmp";
+use warnings 'deprecated';
+open my $fh, ">", $file or die "# $!";
+syswrite $fh, 'ABC';
+binmode $fh, ':utf8';
+syswrite $fh, 'ABC';
+close $fh;
+unlink $file;
+EXPECT
+syswrite() is deprecated on :utf8 handles at - line 6.
diff --git a/t/op/gmagic.t b/t/op/gmagic.t
index bcf1322..94e164e 100644
--- a/t/op/gmagic.t
+++ b/t/op/gmagic.t
@@ -77,6 +77,7 @@ expected_tie_calls(tied $c, 1, 2, 'chomping a ref');
  # Do this again, with a utf8 handle
     $c = *foo;                                         # 1 write
     open $h, "<:utf8", $outfile;
+    no warnings 'deprecated';
     sysread $h, $c, 3, 7;                              # 1 read; 1 write
     is $c, "*main::bar", 'what sysread wrote';         # 1 read
     expected_tie_calls(tied $c, 2, 2, 'calling sysread with tied buf');
diff --git a/t/uni/overload.t b/t/uni/overload.t
index 66cd5b8..ff89b08 100644
--- a/t/uni/overload.t
+++ b/t/uni/overload.t
@@ -169,6 +169,7 @@ foreach my $operator ('print', 'syswrite', 'syswrite len', 'syswrite off',
 	my $trail = $operator =~ /\blen\b/ ? "!" : "";
 	my $u = UTF8Toggle->new("$pad$E_acute\n$trail");
 	my $l = UTF8Toggle->new("$pad$e_acute\n$trail", 1);
+        no warnings 'deprecated';
 	if ($operator eq 'print') {
 	    no warnings 'utf8';
 	    print $fh $u;
-- 
1.7.10.4

@p5pRT
Author

p5pRT commented Aug 17, 2015

From @tonycoz

On Sun Aug 09 23​:23​:05 2015, tonyc wrote​:

On Thu Aug 06 16​:12​:45 2015, perl.p5p@​rjbs.manxome.org wrote​:

But if we're going to make it croak in 5.28, time to make it warn
now.

Patch attached.

As chansen mentioned in #p5p, send() and recv() have the same issue,
so the patch
also deprecates them on :utf8 handles.

Applied as fb10a8a.

Leaving open, as I expect *something* will break.

Tony

@p5pRT
Author

p5pRT commented Aug 17, 2015

From @Leont

On Mon, Aug 17, 2015 at 8:40 AM, Tony Cook via RT <perlbug-followup@perl.org> wrote:

On Sun Aug 09 23​:23​:05 2015, tonyc wrote​:

On Thu Aug 06 16​:12​:45 2015, perl.p5p@​rjbs.manxome.org wrote​:

But if we're going to make it croak in 5.28, time to make it warn
now.

Patch attached.

As chansen mentioned in #p5p, send() and recv() have the same issue,
so the patch
also deprecates them on :utf8 handles.

Applied as fb10a8a.

Leaving open, as I expect *something* will break.

Tony

Among commonly used modules, I don't expect File​::Slurp to like this, then
again the beast is so broken by design that that may be a feature.

Leon

@p5pRT
Author

p5pRT commented Aug 19, 2015

From @iabyn

On Sun, Aug 16, 2015 at 11​:40​:06PM -0700, Tony Cook via RT wrote​:

On Sun Aug 09 23​:23​:05 2015, tonyc wrote​:

On Thu Aug 06 16​:12​:45 2015, perl.p5p@​rjbs.manxome.org wrote​:

But if we're going to make it croak in 5.28, time to make it warn
now.

Patch attached.

As chansen mentioned in #p5p, send() and recv() have the same issue,
so the patch
also deprecates them on :utf8 handles.

Applied as fb10a8a.

It's causing lib/warnings.t to fail in smokes that do PERL_UNICODE=""

e.g.

[davem@​robin t]$ PERL_UNICODE="" ./perl harness ../lib/warnings.t
../lib/warnings.t .. 617/876 PROG​:
# pp_sys.c [pp_sysread]
use warnings 'io' ;
if ($^O eq 'dos') {
  print <<EOM ;
SKIPPED
# skipped on dos
EOM
  exit ;
}
my $file = "./xcv" ;
open(F, ">$file") ;
my $a = sysread(F, $a,10) ;
no warnings 'io' ;
my $a = sysread(F, $a,10) ;
close F ;
use warnings 'io' ;
sysread(F, $a, 10);
read(F, $a, 10);
sysread(NONEXISTENT, $a, 10);
read(NONEXISTENT, $a, 10);
unlink $file ;
EXPECTED​:
Filehandle F opened only for output at - line 12.
sysread() on closed filehandle F at - line 17.
read() on closed filehandle F at - line 18.
sysread() on unopened filehandle NONEXISTENT at - line 19.
read() on unopened filehandle NONEXISTENT at - line 20.
GOT​:
sysread() is deprecated on :utf8 handles at - line 12.
Filehandle F opened only for output at - line 12.
sysread() is deprecated on :utf8 handles at - line 14.
sysread() on closed filehandle F at - line 17.
read() on closed filehandle F at - line 18.
sysread() on unopened filehandle NONEXISTENT at - line 19.
read() on unopened filehandle NONEXISTENT at - line 20.
# Failed test 673 - at lib/warnings/pp_sys line 625
PROG:
use warnings 'deprecated';
open my $fh, "<", "../harness" or die "# $!";
my $buf;
sysread $fh, $buf, 10;
binmode $fh, ':utf8';
sysread $fh, $buf, 10;
EXPECTED​:
sysread() is deprecated on :utf8 handles at - line 6.
GOT​:
sysread() is deprecated on :utf8 handles at - line 4.
sysread() is deprecated on :utf8 handles at - line 6.
# Failed test 689 - sysread() deprecated on :utf8 at lib/warnings/pp_sys line 943
PROG​:
my $file = "syswwarn.tmp";
use warnings 'deprecated';
open my $fh, ">", $file or die "# $!";
syswrite $fh, 'ABC';
binmode $fh, ':utf8';
syswrite $fh, 'ABC';
close $fh;
unlink $file;
EXPECTED​:
syswrite() is deprecated on :utf8 handles at - line 6.
GOT​:
syswrite() is deprecated on :utf8 handles at - line 4.
syswrite() is deprecated on :utf8 handles at - line 6.
# Failed test 690 - syswrite() deprecated on :utf8 at lib/warnings/pp_sys line 953
../lib/warnings.t .. Failed 3/876 subtests
  (less 2 skipped subtests​: 871 okay)

Test Summary Report
-------------------
../lib/warnings.t (Wstat: 0 Tests: 876 Failed: 3)
  Failed tests: 673, 689-690
Files=1, Tests=876, 11 wallclock secs ( 0.31 usr 0.02 sys + 8.64 cusr 2.42 csys = 11.39 CPU)
Result: FAIL
[davem@​robin t]$

--
Indomitable in retreat, invincible in advance, insufferable in victory
  -- Churchill on Montgomery

@p5pRT
Author

p5pRT commented Aug 23, 2015

From @tonycoz

On Wed Aug 19 01​:43​:50 2015, davem wrote​:

On Sun, Aug 16, 2015 at 11​:40​:06PM -0700, Tony Cook via RT wrote​:

On Sun Aug 09 23​:23​:05 2015, tonyc wrote​:

On Thu Aug 06 16​:12​:45 2015, perl.p5p@​rjbs.manxome.org wrote​:

But if we're going to make it croak in 5.28, time to make it warn
now.

Patch attached.

As chansen mentioned in #p5p, send() and recv() have the same
issue,
so the patch
also deprecates them on :utf8 handles.

Applied as fb10a8a.

It's causing lib/warnings.t to fail in smokes that do PERL_UNICODE=""

Thanks, fixed in 60e6724.
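
The failure mode above, where a handle opened with no explicit layer picks up the flag from the environment, can be observed from Perl itself. This is only an illustrative sketch and says nothing about what the fix commit changed. It assumes PerlIO::get_layers() reports the flag as a "utf8" entry, and the output of the first open depends on -C, PERL_UNICODE and the open pragma.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp or die "close: $!";

# No explicit layer: the defaults from -C / PERL_UNICODE / "use open" apply.
open my $default, '<', $path or die "open: $!";
my @default_layers = PerlIO::get_layers($default);

# Explicit :raw: the defaults are not applied, so no utf8 flag is expected.
open my $raw, '<:raw', $path or die "open: $!";
my @raw_layers = PerlIO::get_layers($raw);

printf "default: %s\nraw:     %s\n", "@default_layers", "@raw_layers";
```

Code that calls sysread() or syswrite() is therefore safer naming its layers explicitly than relying on the defaults.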

Tony

@p5pRT
Author

p5pRT commented Oct 16, 2017

From @tonycoz

On Sun, 16 Aug 2015 23​:40​:06 -0700, tonyc wrote​:

On Sun Aug 09 23​:23​:05 2015, tonyc wrote​:

On Thu Aug 06 16​:12​:45 2015, perl.p5p@​rjbs.manxome.org wrote​:

But if we're going to make it croak in 5.28, time to make it warn
now.

Patch attached.

As chansen mentioned in #p5p, send() and recv() have the same issue,
so the patch
also deprecates them on :utf8 handles.

Applied as fb10a8a.

Leaving open, as I expect *something* will break.

Added to the 5.30 blockers ticket, since these should start croaking then.

Tony

@p5pRT
Author

p5pRT commented Sep 25, 2018

From @tonycoz

On Sun, 15 Oct 2017 22​:23​:32 -0700, tonyc wrote​:

Added to the 5.30 blockers ticket, since these should start croaking then.

Patch to make them fatal attached, I'll apply this Soon(tm).

Tony

@p5pRT
Author

p5pRT commented Sep 25, 2018

From @tonycoz

0001-perl-133170-fatalize-sysread-syswrite-recv-send-on-u.patch
From 4a5fd718cd1eaee030ed20301f5112458b7995e2 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Tue, 25 Sep 2018 11:18:40 +1000
Subject: (perl #133170) fatalize sysread/syswrite/recv/send on :utf8 handles

This includes removing the :utf8 logic from pp_syswrite.  pp_sysread
retains it, since it's also used for read().

Tests that are specifically testing the behaviour against :utf8
handles have been removed (eg in lib/open.t), several other tests
that incidentally used those functions on :utf8 handles have been
adapted to use :raw instead (eg. op/readline.t)
handles.
---
 lib/open.t            | 122 +-------------------------------------------------
 pod/perldiag.pod      |  17 +++----
 pod/perlfunc.pod      |  33 ++++----------
 pp_sys.c              |  80 ++++++---------------------------
 t/io/utf8.t           |  13 +++---
 t/lib/croak/pp_sys    |  20 +++++++++
 t/lib/warnings/pp_sys |  24 ----------
 t/op/gmagic.t         |   9 ----
 t/op/readline.t       |  10 ++---
 t/uni/overload.t      |   4 +-
 t/uni/readline.t      |   3 +-
 11 files changed, 64 insertions(+), 271 deletions(-)

diff --git a/lib/open.t b/lib/open.t
index 5150c7f8a2..fa17f1a97c 100644
--- a/lib/open.t
+++ b/lib/open.t
@@ -8,7 +8,7 @@ BEGIN {
 	require './charset_tools.pl';
 }
 
-plan 23;
+plan 11;
 
 # open::import expects 'open' as its first argument, but it clashes with open()
 sub import {
@@ -62,126 +62,6 @@ is( ${^OPEN}, ":raw :crlf\0:raw :crlf",
 is( $^H{'open_IO'}, 'crlf', 'should record last layer set in %^H' );
 
 SKIP: {
-    skip("no perlio, no :utf8", 12) unless (find PerlIO::Layer 'perlio');
-
-    eval <<EOE;
-    use open ':utf8';
-    open(O, ">utf8");
-    print O chr(0x100);
-    close O;
-    open(I, "<utf8");
-    is(ord(<I>), 0x100, ":utf8 single wide character round-trip");
-    close I;
-EOE
-
-    open F, ">a";
-    @a = map { chr(1 << ($_ << 2)) } 0..5; # 0x1, 0x10, .., 0x100000
-    unshift @a, chr(0); # ... and a null byte in front just for fun
-    print F @a;
-    close F;
-
-    sub systell {
-        use Fcntl 'SEEK_CUR';
-        sysseek($_[0], 0, SEEK_CUR);
-    }
-
-    require bytes; # not use
-
-    my $ok;
-
-    open F, "<:utf8", "a";
-    $ok = $a = 0;
-    for (@a) {
-        unless (
-		($c = sysread(F, $b, 1)) == 1  &&
-		length($b)               == 1  &&
-		ord($b)                  == ord($_) &&
-		systell(F)               == ($a += bytes::length($b))
-		) {
-	    print '# ord($_)           == ', ord($_), "\n";
-	    print '# ord($b)           == ', ord($b), "\n";
-	    print '# length($b)        == ', length($b), "\n";
-	    print '# bytes::length($b) == ', bytes::length($b), "\n";
-	    print '# systell(F)        == ', systell(F), "\n";
-	    print '# $a                == ', $a, "\n";
-	    print '# $c                == ', $c, "\n";
-	    last;
-	}
-	$ok++;
-    }
-    close F;
-    ok($ok == @a,
-       "on :utf8 streams sysread() should work on characters, not bytes");
-
-    sub diagnostics {
-	print '# ord($_)           == ', ord($_), "\n";
-	print '# bytes::length($_) == ', bytes::length($_), "\n";
-	print '# systell(G)        == ', systell(G), "\n";
-	print '# $a                == ', $a, "\n";
-	print '# $c                == ', $c, "\n";
-    }
-
-
-    my %actions = (
-		   syswrite => sub { syswrite G, shift; },
-		   'syswrite len' => sub { syswrite G, shift, 1; },
-		   'syswrite len pad' => sub {
-		       my $temp = shift() . "\243";
-		       syswrite G, $temp, 1; },
-		   'syswrite off' => sub { 
-		       my $temp = "\351" . shift();
-		       syswrite G, $temp, 1, 1; },
-		   'syswrite off pad' => sub { 
-		       my $temp = "\351" . shift() . "\243";
-		       syswrite G, $temp, 1, 1; },
-		  );
-
-    foreach my $key (sort keys %actions) {
-	# syswrite() on should work on characters, not bytes
-	open G, ">:utf8", "b";
-
-	print "# $key\n";
-	$ok = $a = 0;
-	for (@a) {
-	    unless (
-		    ($c = $actions{$key}($_)) == 1 &&
-		    systell(G)                == ($a += bytes::length($_))
-		   ) {
-		diagnostics();
-		last;
-	    }
-	    $ok++;
-	}
-	close G;
-	ok($ok == @a,
-	   "on :utf8 streams syswrite() should work on characters, not bytes");
-
-	open G, "<:utf8", "b";
-	$ok = $a = 0;
-	for (@a) {
-	    unless (
-		    ($c = sysread(G, $b, 1)) == 1 &&
-		    length($b)               == 1 &&
-		    ord($b)                  == ord($_) &&
-		    systell(G)               == ($a += bytes::length($_))
-		   ) {
-		print '# ord($_)           == ', ord($_), "\n";
-		print '# ord($b)           == ', ord($b), "\n";
-		print '# length($b)        == ', length($b), "\n";
-		print '# bytes::length($b) == ', bytes::length($b), "\n";
-		print '# systell(G)        == ', systell(G), "\n";
-		print '# $a                == ', $a, "\n";
-		print '# $c                == ', $c, "\n";
-		last;
-	    }
-	    $ok++;
-	}
-	close G;
-	ok($ok == @a,
-	   "checking syswrite() output on :utf8 streams by reading it back in");
-    }
-}
-SKIP: {
     skip("no perlio", 1) unless (find PerlIO::Layer 'perlio');
     skip("no Encode", 1) unless $Config{extensions} =~ m{\bEncode\b};
     skip("EBCDIC platform doesnt have 'use encoding' used by open ':locale'", 1)
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 2c1fe74a87..657a427b1d 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3205,27 +3205,24 @@ neither as a system call nor an ioctl call (SIOCATMARK).
 Perl.  The current valid ones are given in
 L<perlrebackslash/\b{}, \b, \B{}, \B>.
 
-=item %s() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30
+=item %s() isn't allowed on :utf8 handles
 
-(D deprecated) The sysread(), recv(), syswrite() and send() operators are
-deprecated on handles that have the C<:utf8> layer, either explicitly, or
+(F) The sysread(), recv(), syswrite() and send() operators are
+not allowed on handles that have the C<:utf8> layer, either explicitly, or
 implicitly, eg., with the C<:encoding(UTF-16LE)> layer.
 
-Both sysread() and recv() currently use only the C<:utf8> flag for the stream,
-ignoring the actual layers.  Since sysread() and recv() do no UTF-8
+Previously sysread() and recv() currently use only the C<:utf8> flag for the stream,
+ignoring the actual layers.  Since sysread() and recv() did no UTF-8
 validation they can end up creating invalidly encoded scalars.
 
-Similarly, syswrite() and send() use only the C<:utf8> flag, otherwise ignoring
-any layers.  If the flag is set, both write the value UTF-8 encoded, even if
+Similarly, syswrite() and send() used only the C<:utf8> flag, otherwise ignoring
+any layers.  If the flag is set, both wrote the value UTF-8 encoded, even if
 the layer is some different encoding, such as the example above.
 
 Ideally, all of these operators would completely ignore the C<:utf8> state,
 working only with bytes, but this would result in silently breaking existing
 code.
 
-In Perl 5.30, it will no longer be possible to use sysread(), recv(),
-syswrite() or send() to read or send bytes from/to :utf8 handles.
-
 =item "%s" is more clearly written simply as "%s" in regex; marked by S<<-- HERE> in m/%s/
 
 (W regexp) (only under C<S<use re 'strict'>> or within C<(?[...])>)
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index a2fad3b8fc..316daff1cf 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -6284,14 +6284,9 @@ string otherwise.  If there's an error, returns the undefined value.
 This call is actually implemented in terms of the L<recvfrom(2)> system call.
 See L<perlipc/"UDP: Message Passing"> for examples.
 
-Note the I<characters>: depending on the status of the socket, either
-(8-bit) bytes or characters are received.  By default all sockets
-operate on bytes, but for example if the socket has been changed using
-L<C<binmode>|/binmode FILEHANDLE, LAYER> to operate with the
-C<:encoding(UTF-8)> I/O layer (see the L<open> pragma), the I/O will
-operate on UTF8-encoded Unicode
-characters, not bytes.  Similarly for the C<:encoding> layer: in that
-case pretty much any characters can be read.
+Note that if the socket has been marked as C<:utf8>, C<recv> will
+throw an exception.  The C<:encoding(...)> layer implicitly introduces
+the C<:utf8> layer.  See L<C<binmode>|/binmode FILEHANDLE, LAYER>.
 
 =item redo LABEL
 X<redo>
@@ -7083,14 +7078,9 @@ case it does a L<sendto(2)> syscall.  Returns the number of characters sent,
 or the undefined value on error.  The L<sendmsg(2)> syscall is currently
 unimplemented.  See L<perlipc/"UDP: Message Passing"> for examples.
 
-Note the I<characters>: depending on the status of the socket, either
-(8-bit) bytes or characters are sent.  By default all sockets operate
-on bytes, but for example if the socket has been changed using
-L<C<binmode>|/binmode FILEHANDLE, LAYER> to operate with the
-C<:encoding(UTF-8)> I/O layer (see L<C<open>|/open FILEHANDLE,EXPR>, or
-the L<open> pragma), the I/O will operate on UTF-8
-encoded Unicode characters, not bytes.  Similarly for the C<:encoding>
-layer: in that case pretty much any characters can be sent.
+Note that if the socket has been marked as C<:utf8>, C<send> will
+throw an exception.  The C<:encoding(...)> layer implicitly introduces
+the C<:utf8> layer.  See L<C<binmode>|/binmode FILEHANDLE, LAYER>.
 
 =item setpgrp PID,PGRP
 X<setpgrp> X<group>
@@ -8723,10 +8713,8 @@ L<C<eof>|/eof FILEHANDLE> doesn't work well on device files (like ttys)
 anyway.  Use L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET> and
 check for a return value of 0 to decide whether you're done.
 
-Note that if the filehandle has been marked as C<:utf8>, Unicode
-characters are read instead of bytes (the LENGTH, OFFSET, and the
-return value of L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET>
-are in Unicode characters).  The C<:encoding(...)> layer implicitly
+Note that if the filehandle has been marked as C<:utf8>, C<sysread> will
+throw an exception.  The C<:encoding(...)> layer implicitly
 introduces the C<:utf8> layer.  See
 L<C<binmode>|/binmode FILEHANDLE, LAYER>,
 L<C<open>|/open FILEHANDLE,EXPR>, and the L<open> pragma.
@@ -8887,10 +8875,7 @@ string other than the beginning.  A negative OFFSET specifies writing
 that many characters counting backwards from the end of the string.
 If SCALAR is of length zero, you can only use an OFFSET of 0.
 
-B<WARNING>: If the filehandle is marked C<:utf8>, Unicode characters
-encoded in UTF-8 are written instead of bytes, and the LENGTH, OFFSET, and
-return value of L<C<syswrite>|/syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET>
-are in (UTF8-encoded Unicode) characters.
+B<WARNING>: If the filehandle is marked C<:utf8>, C<syswrite> will raise an exception.
 The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer.
 Alternately, if the handle is not marked with an encoding but you
 attempt to write characters with code points over 255, raises an exception.
diff --git a/pp_sys.c b/pp_sys.c
index 4ae475d460..00faa7711f 100644
--- a/pp_sys.c
+++ b/pp_sys.c
@@ -1725,10 +1725,9 @@ PP(pp_sysread)
 
     if ((fp_utf8 = PerlIO_isutf8(IoIFP(io))) && !IN_BYTES) {
         if (PL_op->op_type == OP_SYSREAD || PL_op->op_type == OP_RECV) {
-            Perl_ck_warner_d(aTHX_ packWARN(WARN_DEPRECATED),
-                             "%s() is deprecated on :utf8 handles. "
-                             "This will be a fatal error in Perl 5.30",
-                             OP_DESC(PL_op));
+            Perl_croak(aTHX_
+                       "%s() isn't allowed on :utf8 handles",
+                       OP_DESC(PL_op));
         }
 	buffer = SvPVutf8_force(bufsv, blen);
 	/* UTF-8 may not have been set if they are all low bytes */
@@ -1939,7 +1938,6 @@ PP(pp_syswrite)
     const char *buffer;
     SSize_t retval;
     STRLEN blen;
-    STRLEN orig_blen_bytes;
     const int op_type = PL_op->op_type;
     bool doing_utf8;
     U8 *tmpbuf = NULL;
@@ -1985,20 +1983,12 @@ PP(pp_syswrite)
 
     /* Do this first to trigger any overloading.  */
     buffer = SvPV_const(bufsv, blen);
-    orig_blen_bytes = blen;
     doing_utf8 = DO_UTF8(bufsv);
 
     if (PerlIO_isutf8(IoIFP(io))) {
-        Perl_ck_warner_d(aTHX_ packWARN(WARN_DEPRECATED),
-                         "%s() is deprecated on :utf8 handles. "
-                         "This will be a fatal error in Perl 5.30",
-                         OP_DESC(PL_op));
-	if (!SvUTF8(bufsv)) {
-	    /* We don't modify the original scalar.  */
-	    tmpbuf = bytes_to_utf8((const U8*) buffer, &blen);
-	    buffer = (char *) tmpbuf;
-	    doing_utf8 = TRUE;
-	}
+        Perl_croak(aTHX_
+                   "%s() isn't allowed on :utf8 handles",
+                   OP_DESC(PL_op));
     }
     else if (doing_utf8) {
 	STRLEN tmplen = blen;
@@ -2031,25 +2021,10 @@ PP(pp_syswrite)
 #endif
     {
 	Size_t length = 0; /* This length is in characters.  */
-	STRLEN blen_chars;
 	IV offset;
 
-	if (doing_utf8) {
-	    if (tmpbuf) {
-		/* The SV is bytes, and we've had to upgrade it.  */
-		blen_chars = orig_blen_bytes;
-	    } else {
-		/* The SV really is UTF-8.  */
-		/* Don't call sv_len_utf8 on a magical or overloaded
-		   scalar, as we might get back a different result.  */
-		blen_chars = sv_or_pv_len_utf8(bufsv, buffer, blen);
-	    }
-	} else {
-	    blen_chars = blen;
-	}
-
 	if (MARK >= SP) {
-	    length = blen_chars;
+	    length = blen;
 	} else {
 #if Size_t_size > IVSIZE
 	    length = (Size_t)SvNVx(*++MARK);
@@ -2065,46 +2040,21 @@ PP(pp_syswrite)
 	if (MARK < SP) {
 	    offset = SvIVx(*++MARK);
 	    if (offset < 0) {
-		if (-offset > (IV)blen_chars) {
+		if (-offset > (IV)blen) {
 		    Safefree(tmpbuf);
 		    DIE(aTHX_ "Offset outside string");
 		}
-		offset += blen_chars;
-	    } else if (offset > (IV)blen_chars) {
+		offset += blen;
+	    } else if (offset > (IV)blen) {
 		Safefree(tmpbuf);
 		DIE(aTHX_ "Offset outside string");
 	    }
 	} else
 	    offset = 0;
-	if (length > blen_chars - offset)
-	    length = blen_chars - offset;
-	if (doing_utf8) {
-	    /* Here we convert length from characters to bytes.  */
-	    if (tmpbuf || SvGMAGICAL(bufsv) || SvAMAGIC(bufsv)) {
-		/* Either we had to convert the SV, or the SV is magical, or
-		   the SV has overloading, in which case we can't or mustn't
-		   or mustn't call it again.  */
-
-		buffer = (const char*)utf8_hop((const U8 *)buffer, offset);
-		length = utf8_hop((U8 *)buffer, length) - (U8 *)buffer;
-	    } else {
-		/* It's a real UTF-8 SV, and it's not going to change under
-		   us.  Take advantage of any cache.  */
-		I32 start = offset;
-		I32 len_I32 = length;
-
-		/* Convert the start and end character positions to bytes.
-		   Remember that the second argument to sv_pos_u2b is relative
-		   to the first.  */
-		sv_pos_u2b(bufsv, &start, &len_I32);
-
-		buffer += start;
-		length = len_I32;
-	    }
-	}
-	else {
-	    buffer = buffer+offset;
-	}
+	if (length > blen - offset)
+	    length = blen - offset;
+        buffer = buffer+offset;
+
 #ifdef PERL_SOCK_SYSWRITE_IS_SEND
 	if (IoTYPE(io) == IoTYPE_SOCKET) {
 	    retval = PerlSock_send(fd, buffer, length, 0);
@@ -2120,8 +2070,6 @@ PP(pp_syswrite)
     if (retval < 0)
 	goto say_undef;
     SP = ORIGMARK;
-    if (doing_utf8)
-        retval = utf8_length((U8*)buffer, (U8*)buffer + retval);
 
     Safefree(tmpbuf);
 #if Size_t_size > IVSIZE
diff --git a/t/io/utf8.t b/t/io/utf8.t
index 2b700595c8..fc927c671a 100644
--- a/t/io/utf8.t
+++ b/t/io/utf8.t
@@ -10,7 +10,7 @@ skip_all_without_perlio();
 no utf8; # needed for use utf8 not griping about the raw octets
 
 
-plan(tests => 63);
+plan(tests => 62);
 
 $| = 1;
 
@@ -312,16 +312,13 @@ is($failed, undef);
 {
     # [perl #23428] Somethings rotten in unicode semantics
     open F, ">$a_file";
-    binmode F, ":utf8";
-    no warnings qw(deprecated);
-    syswrite(F, $a = chr(0x100));
+    $a = "A";
+    utf8::upgrade($a);
+    syswrite(F, $a);
     close F;
-    is( ord($a), 0x100, '23428 syswrite should not downgrade scalar' );
-    like( $a, qr/^\w+/, '23428 syswrite should not downgrade scalar' );
+    ok(utf8::is_utf8($a), '23428 syswrite should not downgrade scalar' );
 }
 
-# sysread() and syswrite() tested in lib/open.t since Fcntl is used
-
 {
     # <FH> on a :utf8 stream should complain immediately with -w
     # if it finds bad UTF-8 (:encoding(utf8) works this way)
diff --git a/t/lib/croak/pp_sys b/t/lib/croak/pp_sys
index 8b7dc9d53d..be100da27a 100644
--- a/t/lib/croak/pp_sys
+++ b/t/lib/croak/pp_sys
@@ -73,3 +73,23 @@ open my $���������, "../harness";
 opendir $���������, ".";
 EXPECT
 Cannot open $��������� as a dirhandle: it is already open as a filehandle at - line 5.
+########
+# NAME sysread() disallowed on :utf8
+open my $fh, "<:raw", "../harness" or die "# $!";
+my $buf;
+sysread $fh, $buf, 10;
+binmode $fh, ':utf8';
+sysread $fh, $buf, 10;
+EXPECT
+sysread() isn't allowed on :utf8 handles at - line 5.
+########
+# NAME syswrite() disallowed on :utf8
+my $file = "syswwarn.tmp";
+open my $fh, ">:raw", $file or die "# $!";
+syswrite $fh, 'ABC';
+binmode $fh, ':utf8';
+syswrite $fh, 'ABC';
+close $fh;
+END { unlink $file; }
+EXPECT
+syswrite() isn't allowed on :utf8 handles at - line 5.
diff --git a/t/lib/warnings/pp_sys b/t/lib/warnings/pp_sys
index 90d3cc790d..5f6b83d2f6 100644
--- a/t/lib/warnings/pp_sys
+++ b/t/lib/warnings/pp_sys
@@ -890,30 +890,6 @@ sleep(-1);
 EXPECT
 sleep() with negative argument at - line 2.
 ########
-# NAME sysread() deprecated on :utf8
-open my $fh, "<:raw", "../harness" or die "# $!";
-my $buf;
-sysread $fh, $buf, 10;
-binmode $fh, ':utf8';
-sysread $fh, $buf, 10;
-no warnings 'deprecated';
-sysread $fh, $buf, 10;
-EXPECT
-sysread() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30 at - line 5.
-########
-# NAME syswrite() deprecated on :utf8
-my $file = "syswwarn.tmp";
-open my $fh, ">:raw", $file or die "# $!";
-syswrite $fh, 'ABC';
-binmode $fh, ':utf8';
-syswrite $fh, 'ABC';
-no warnings 'deprecated';
-syswrite $fh, 'ABC';
-close $fh;
-unlink $file;
-EXPECT
-syswrite() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30 at - line 5.
-########
 # NAME stat on name with \0
 use warnings;
 my @x = stat("./\0-");
diff --git a/t/op/gmagic.t b/t/op/gmagic.t
index 210e8e5cc9..0ed575525f 100644
--- a/t/op/gmagic.t
+++ b/t/op/gmagic.t
@@ -76,15 +76,6 @@ expected_tie_calls(tied $c, 1, 2, 'chomping a ref');
     expected_tie_calls(tied $c, 2, 2, 'calling sysread with tied buf');
     close $h or die "$0 cannot close $outfile: $!";
 
- # Do this again, with a utf8 handle
-    $c = *foo;                                         # 1 write
-    open $h, "<:utf8", $outfile;
-    no warnings 'deprecated';
-    sysread $h, $c, 3, 7;                              # 1 read; 1 write
-    is $c, "*main::bar", 'what sysread wrote';         # 1 read
-    expected_tie_calls(tied $c, 2, 2, 'calling sysread with tied buf');
-    close $h or die "$0 cannot close $outfile: $!";
-
     unlink_all $outfile;
 }
 
diff --git a/t/op/readline.t b/t/op/readline.t
index c2727fe829..ba4efa71a4 100644
--- a/t/op/readline.t
+++ b/t/op/readline.t
@@ -215,9 +215,8 @@ SKIP: {
     my $line = 'ascii';
     my ( $in, $out );
     pipe $in, $out;
-    binmode $out, ':utf8';
+    binmode $out;
     binmode $in,  ':utf8';
-    no warnings qw(deprecated);
     syswrite $out, "...\n";
     $line .= readline $in;
 
@@ -228,10 +227,11 @@ SKIP: {
     my $line = "\x{2080} utf8";;
     my ( $in, $out );
     pipe $in, $out;
-    binmode $out, ':utf8';
+    binmode $out;
     binmode $in,  ':utf8';
-    no warnings qw(deprecated);
-    syswrite $out, "\x{2080}...\n";
+    my $outdata = "\x{2080}...\n";
+    utf8::encode($outdata);
+    syswrite $out, $outdata;
     $line .= readline $in;
 
     is( $line, "\x{2080} utf8\x{2080}...\n", 'appending from utf to utf8' );
diff --git a/t/uni/overload.t b/t/uni/overload.t
index 8e722c850e..a50e3ab7b2 100644
--- a/t/uni/overload.t
+++ b/t/uni/overload.t
@@ -9,7 +9,7 @@ BEGIN {
     set_up_inc( '../lib' );
 }
 
-plan(tests => 217);
+plan(tests => 193);
 
 package UTF8Toggle;
 use strict;
@@ -158,7 +158,7 @@ my $tmpfile = tempfile();
 
 foreach my $operator ('print', 'syswrite', 'syswrite len', 'syswrite off',
 		      'syswrite len off') {
-    foreach my $layer ('', ':utf8') {
+    foreach my $layer ('', $operator =~ /syswrite/ ? () : (':utf8')) {
 	open my $fh, "+>$layer", $tmpfile or die $!;
 	my $pad = $operator =~ /\boff\b/ ? "\243" : "";
 	my $trail = $operator =~ /\blen\b/ ? "!" : "";
diff --git a/t/uni/readline.t b/t/uni/readline.t
index 893a290893..253efe3a42 100644
--- a/t/uni/readline.t
+++ b/t/uni/readline.t
@@ -29,8 +29,7 @@ like($@, qr/Modification of a read-only value attempted/, '[perl #19566]');
 use strict;
 my $err;
 {
-  no warnings qw(deprecated);
-  open ���, '.' and sysread ���, $_, 1;
+  open ���, '.' and binmode ��� and sysread ���, $_, 1;
   $err = $! + 0;
   close ���;
 }
-- 
2.11.0

@p5pRT
Author

p5pRT commented Sep 25, 2018

From @tonycoz

On Mon, 24 Sep 2018 18​:26​:12 -0700, tonyc wrote​:

On Sun, 15 Oct 2017 22​:23​:32 -0700, tonyc wrote​:

Added to the 5.30 blockers ticket, since these should start croaking then.

Patch to make them fatal attached, I'll apply this Soon(tm).

Ignore that, I misread a test report.

Tony

@p5pRT
Author

p5pRT commented Sep 25, 2018

From @Leont

On Tue, Sep 25, 2018 at 3​:26 AM Tony Cook via RT
<perlbug-followup@​perl.org> wrote​:

On Sun, 15 Oct 2017 22​:23​:32 -0700, tonyc wrote​:

Added to the 5.30 blockers ticket, since these should start croaking then.

Patch to make them fatal attached, I'll apply this Soon(tm).

I should point out that File::Slurp still hasn't been fixed to not use
this misfeature (despite a ticket being open about it for half a
decade). File::Slurp currently has 636 direct dependents (and an
unknown but likely high number of indirect dependents).

This change will break CPAN given the current state of File​::Slurp.
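
For context, a slurp routine can avoid the misfeature by keeping the handle byte-oriented and decoding afterwards. The following is only a minimal sketch, assuming UTF-8 input; `slurp_utf8` is a hypothetical helper name and not File::Slurp's actual API or code:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: read a whole file with sysread() on a :raw handle,
# then decode the bytes explicitly instead of relying on a :utf8 layer.
sub slurp_utf8 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $bytes = '';
    while (1) {
        # Append at the current end of the buffer; sysread() returns 0 at EOF
        # and undef on error.
        my $n = sysread $fh, $bytes, 65536, length $bytes;
        die "sysread failed on $path: $!" unless defined $n;
        last if $n == 0;
    }
    close $fh;
    # FB_CROAK makes malformed input fatal rather than silently accepted.
    return decode('UTF-8', $bytes, Encode::FB_CROAK);
}
```

Because the handle never carries the `:utf8` flag, this style of code should behave the same before and after the change.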

Leon

@p5pRT
Author

p5pRT commented Sep 26, 2018

From @tonycoz

On Mon, 24 Sep 2018 21​:50​:54 -0700, tonyc wrote​:

On Mon, 24 Sep 2018 18​:26​:12 -0700, tonyc wrote​:

On Sun, 15 Oct 2017 22​:23​:32 -0700, tonyc wrote​:

Added to the 5.30 blockers ticket, since these should start
croaking then.

Patch to make them fatal attached, I'll apply this Soon(tm).

Ignore that, I misread a test report.

The attached should be better.

sigtrap.pm in particular was a problem.

It has a signal handler (from the 5.000 commit) that appears to try to
behave safely when called as an unsafe signal handler, and uses syswrite()
to write to STDERR.

This is broken if STDERR happens to have any non-trivial layers on it.

I've modified the sigtrap code to initially try syswrite() in an eval block and
fall back to print, and then to check PerlIO::get_layers() for
any non-default layers to decide whether to continue to use syswrite() if
it happened to succeed the first time.

This means if the layers don't produce something vaguely like ASCII, the
initial output will be garbled, but that was true of the original code.
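
Roughly, the two pieces look like the sketch below. This is an illustration rather than the sigtrap.pm code itself: `emit` and `syswrite_is_safe` are hypothetical names, and the set of layers treated as "default" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether only plain layers are on the handle,
# in which case syswrite() (which bypasses layers) and print should agree.
sub syswrite_is_safe {
    my ($fh) = @_;
    # Assumed set of layers that do not change the bytes written.
    my %plain = map { $_ => 1 } qw(unix perlio stdio crlf);
    return !grep { !$plain{$_} } PerlIO::get_layers($fh, output => 1);
}

# Hypothetical helper: try syswrite() under eval and fall back to print
# if it throws, as it does on :utf8 handles once the operation is fatal.
sub emit {
    my ($fh, $text) = @_;
    return 'syswrite' if eval { syswrite($fh, $text, length $text); 1 };
    print {$fh} $text;
    return 'print';
}
```

The eval comes first because it needs no new allocations beyond what syswrite() itself does, while PerlIO::get_layers() builds a list and so is left for later.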

Tony

@p5pRT
Author

p5pRT commented Sep 26, 2018

From @tonycoz

0001-perl-133170-fatalize-sysread-syswrite-recv-send-on-u.patch
From b22b806fcc2fa1ce4ed76ca42070f35b1b94a049 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Tue, 25 Sep 2018 11:18:40 +1000
Subject: (perl #133170) fatalize sysread/syswrite/recv/send on :utf8 handles

This includes removing the :utf8 logic from pp_syswrite.  pp_sysread
retains it, since it's also used for read().

Tests that are specifically testing the behaviour against :utf8
handles have been removed (eg in lib/open.t), several other tests
that incidentally used those functions on :utf8 handles have been
adapted to use :raw handles instead (eg. op/readline.t).

Test lib/sigtrap.t fails if STDERR is :utf8, in code from the
original 5.000 commit, which is intended to run in a signal handler
---
 cpan/autodie/t/recv.t |   3 ++
 lib/open.t            | 122 +-------------------------------------------------
 pod/perldiag.pod      |  17 +++----
 pod/perlfunc.pod      |  33 ++++----------
 pp_sys.c              |  80 ++++++---------------------------
 t/io/utf8.t           |  14 +++---
 t/lib/croak/pp_sys    |  20 +++++++++
 t/lib/warnings/pp_sys |  24 ----------
 t/op/gmagic.t         |   9 ----
 t/op/readline.t       |  10 ++---
 t/op/sysio.t          |  28 +-----------
 t/uni/overload.t      |   6 +--
 t/uni/readline.t      |   3 +-
 13 files changed, 70 insertions(+), 299 deletions(-)

diff --git a/cpan/autodie/t/recv.t b/cpan/autodie/t/recv.t
index f67b2f8187..97c7a4360d 100644
--- a/cpan/autodie/t/recv.t
+++ b/cpan/autodie/t/recv.t
@@ -13,6 +13,8 @@ $SIG{PIPE} = 'IGNORE';
 
 my ($sock1, $sock2);
 socketpair($sock1, $sock2, AF_UNIX, SOCK_STREAM, PF_UNSPEC);
+binmode $sock1;
+binmode $sock2;
 
 my $buffer;
 send($sock1, "xyz", 0);
@@ -40,6 +42,7 @@ SKIP: {
 eval {
     my $string = "now is the time...";
     open(my $fh, '<', \$string) or die("Can't open \$string for read");
+    binmode $fh;
     # $fh isn't a socket, so this should fail.
     recv($fh,$buffer,1,0);
 };
diff --git a/lib/open.t b/lib/open.t
index 5150c7f8a2..fa17f1a97c 100644
--- a/lib/open.t
+++ b/lib/open.t
@@ -8,7 +8,7 @@ BEGIN {
 	require './charset_tools.pl';
 }
 
-plan 23;
+plan 11;
 
 # open::import expects 'open' as its first argument, but it clashes with open()
 sub import {
@@ -62,126 +62,6 @@ is( ${^OPEN}, ":raw :crlf\0:raw :crlf",
 is( $^H{'open_IO'}, 'crlf', 'should record last layer set in %^H' );
 
 SKIP: {
-    skip("no perlio, no :utf8", 12) unless (find PerlIO::Layer 'perlio');
-
-    eval <<EOE;
-    use open ':utf8';
-    open(O, ">utf8");
-    print O chr(0x100);
-    close O;
-    open(I, "<utf8");
-    is(ord(<I>), 0x100, ":utf8 single wide character round-trip");
-    close I;
-EOE
-
-    open F, ">a";
-    @a = map { chr(1 << ($_ << 2)) } 0..5; # 0x1, 0x10, .., 0x100000
-    unshift @a, chr(0); # ... and a null byte in front just for fun
-    print F @a;
-    close F;
-
-    sub systell {
-        use Fcntl 'SEEK_CUR';
-        sysseek($_[0], 0, SEEK_CUR);
-    }
-
-    require bytes; # not use
-
-    my $ok;
-
-    open F, "<:utf8", "a";
-    $ok = $a = 0;
-    for (@a) {
-        unless (
-		($c = sysread(F, $b, 1)) == 1  &&
-		length($b)               == 1  &&
-		ord($b)                  == ord($_) &&
-		systell(F)               == ($a += bytes::length($b))
-		) {
-	    print '# ord($_)           == ', ord($_), "\n";
-	    print '# ord($b)           == ', ord($b), "\n";
-	    print '# length($b)        == ', length($b), "\n";
-	    print '# bytes::length($b) == ', bytes::length($b), "\n";
-	    print '# systell(F)        == ', systell(F), "\n";
-	    print '# $a                == ', $a, "\n";
-	    print '# $c                == ', $c, "\n";
-	    last;
-	}
-	$ok++;
-    }
-    close F;
-    ok($ok == @a,
-       "on :utf8 streams sysread() should work on characters, not bytes");
-
-    sub diagnostics {
-	print '# ord($_)           == ', ord($_), "\n";
-	print '# bytes::length($_) == ', bytes::length($_), "\n";
-	print '# systell(G)        == ', systell(G), "\n";
-	print '# $a                == ', $a, "\n";
-	print '# $c                == ', $c, "\n";
-    }
-
-
-    my %actions = (
-		   syswrite => sub { syswrite G, shift; },
-		   'syswrite len' => sub { syswrite G, shift, 1; },
-		   'syswrite len pad' => sub {
-		       my $temp = shift() . "\243";
-		       syswrite G, $temp, 1; },
-		   'syswrite off' => sub { 
-		       my $temp = "\351" . shift();
-		       syswrite G, $temp, 1, 1; },
-		   'syswrite off pad' => sub { 
-		       my $temp = "\351" . shift() . "\243";
-		       syswrite G, $temp, 1, 1; },
-		  );
-
-    foreach my $key (sort keys %actions) {
-	# syswrite() on should work on characters, not bytes
-	open G, ">:utf8", "b";
-
-	print "# $key\n";
-	$ok = $a = 0;
-	for (@a) {
-	    unless (
-		    ($c = $actions{$key}($_)) == 1 &&
-		    systell(G)                == ($a += bytes::length($_))
-		   ) {
-		diagnostics();
-		last;
-	    }
-	    $ok++;
-	}
-	close G;
-	ok($ok == @a,
-	   "on :utf8 streams syswrite() should work on characters, not bytes");
-
-	open G, "<:utf8", "b";
-	$ok = $a = 0;
-	for (@a) {
-	    unless (
-		    ($c = sysread(G, $b, 1)) == 1 &&
-		    length($b)               == 1 &&
-		    ord($b)                  == ord($_) &&
-		    systell(G)               == ($a += bytes::length($_))
-		   ) {
-		print '# ord($_)           == ', ord($_), "\n";
-		print '# ord($b)           == ', ord($b), "\n";
-		print '# length($b)        == ', length($b), "\n";
-		print '# bytes::length($b) == ', bytes::length($b), "\n";
-		print '# systell(G)        == ', systell(G), "\n";
-		print '# $a                == ', $a, "\n";
-		print '# $c                == ', $c, "\n";
-		last;
-	    }
-	    $ok++;
-	}
-	close G;
-	ok($ok == @a,
-	   "checking syswrite() output on :utf8 streams by reading it back in");
-    }
-}
-SKIP: {
     skip("no perlio", 1) unless (find PerlIO::Layer 'perlio');
     skip("no Encode", 1) unless $Config{extensions} =~ m{\bEncode\b};
     skip("EBCDIC platform doesnt have 'use encoding' used by open ':locale'", 1)
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 2c1fe74a87..657a427b1d 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3205,27 +3205,24 @@ neither as a system call nor an ioctl call (SIOCATMARK).
 Perl.  The current valid ones are given in
 L<perlrebackslash/\b{}, \b, \B{}, \B>.
 
-=item %s() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30
+=item %s() isn't allowed on :utf8 handles
 
-(D deprecated) The sysread(), recv(), syswrite() and send() operators are
-deprecated on handles that have the C<:utf8> layer, either explicitly, or
+(F) The sysread(), recv(), syswrite() and send() operators are
+not allowed on handles that have the C<:utf8> layer, either explicitly, or
 implicitly, eg., with the C<:encoding(UTF-16LE)> layer.
 
-Both sysread() and recv() currently use only the C<:utf8> flag for the stream,
-ignoring the actual layers.  Since sysread() and recv() do no UTF-8
+Previously sysread() and recv() currently use only the C<:utf8> flag for the stream,
+ignoring the actual layers.  Since sysread() and recv() did no UTF-8
 validation they can end up creating invalidly encoded scalars.
 
-Similarly, syswrite() and send() use only the C<:utf8> flag, otherwise ignoring
-any layers.  If the flag is set, both write the value UTF-8 encoded, even if
+Similarly, syswrite() and send() used only the C<:utf8> flag, otherwise ignoring
+any layers.  If the flag is set, both wrote the value UTF-8 encoded, even if
 the layer is some different encoding, such as the example above.
 
 Ideally, all of these operators would completely ignore the C<:utf8> state,
 working only with bytes, but this would result in silently breaking existing
 code.
 
-In Perl 5.30, it will no longer be possible to use sysread(), recv(),
-syswrite() or send() to read or send bytes from/to :utf8 handles.
-
 =item "%s" is more clearly written simply as "%s" in regex; marked by S<<-- HERE> in m/%s/
 
 (W regexp) (only under C<S<use re 'strict'>> or within C<(?[...])>)
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index a2fad3b8fc..316daff1cf 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -6284,14 +6284,9 @@ string otherwise.  If there's an error, returns the undefined value.
 This call is actually implemented in terms of the L<recvfrom(2)> system call.
 See L<perlipc/"UDP: Message Passing"> for examples.
 
-Note the I<characters>: depending on the status of the socket, either
-(8-bit) bytes or characters are received.  By default all sockets
-operate on bytes, but for example if the socket has been changed using
-L<C<binmode>|/binmode FILEHANDLE, LAYER> to operate with the
-C<:encoding(UTF-8)> I/O layer (see the L<open> pragma), the I/O will
-operate on UTF8-encoded Unicode
-characters, not bytes.  Similarly for the C<:encoding> layer: in that
-case pretty much any characters can be read.
+Note that if the socket has been marked as C<:utf8>, C<recv> will
+throw an exception.  The C<:encoding(...)> layer implicitly introduces
+the C<:utf8> layer.  See L<C<binmode>|/binmode FILEHANDLE, LAYER>.
 
 =item redo LABEL
 X<redo>
@@ -7083,14 +7078,9 @@ case it does a L<sendto(2)> syscall.  Returns the number of characters sent,
 or the undefined value on error.  The L<sendmsg(2)> syscall is currently
 unimplemented.  See L<perlipc/"UDP: Message Passing"> for examples.
 
-Note the I<characters>: depending on the status of the socket, either
-(8-bit) bytes or characters are sent.  By default all sockets operate
-on bytes, but for example if the socket has been changed using
-L<C<binmode>|/binmode FILEHANDLE, LAYER> to operate with the
-C<:encoding(UTF-8)> I/O layer (see L<C<open>|/open FILEHANDLE,EXPR>, or
-the L<open> pragma), the I/O will operate on UTF-8
-encoded Unicode characters, not bytes.  Similarly for the C<:encoding>
-layer: in that case pretty much any characters can be sent.
+Note that if the socket has been marked as C<:utf8>, C<send> will
+throw an exception.  The C<:encoding(...)> layer implicitly introduces
+the C<:utf8> layer.  See L<C<binmode>|/binmode FILEHANDLE, LAYER>.
 
 =item setpgrp PID,PGRP
 X<setpgrp> X<group>
@@ -8723,10 +8713,8 @@ L<C<eof>|/eof FILEHANDLE> doesn't work well on device files (like ttys)
 anyway.  Use L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET> and
 check for a return value of 0 to decide whether you're done.
 
-Note that if the filehandle has been marked as C<:utf8>, Unicode
-characters are read instead of bytes (the LENGTH, OFFSET, and the
-return value of L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET>
-are in Unicode characters).  The C<:encoding(...)> layer implicitly
+Note that if the filehandle has been marked as C<:utf8>, C<sysread> will
+throw an exception.  The C<:encoding(...)> layer implicitly
 introduces the C<:utf8> layer.  See
 L<C<binmode>|/binmode FILEHANDLE, LAYER>,
 L<C<open>|/open FILEHANDLE,EXPR>, and the L<open> pragma.
@@ -8887,10 +8875,7 @@ string other than the beginning.  A negative OFFSET specifies writing
 that many characters counting backwards from the end of the string.
 If SCALAR is of length zero, you can only use an OFFSET of 0.
 
-B<WARNING>: If the filehandle is marked C<:utf8>, Unicode characters
-encoded in UTF-8 are written instead of bytes, and the LENGTH, OFFSET, and
-return value of L<C<syswrite>|/syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET>
-are in (UTF8-encoded Unicode) characters.
+B<WARNING>: If the filehandle is marked C<:utf8>, C<syswrite> will raise an exception.
 The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer.
 Alternately, if the handle is not marked with an encoding but you
 attempt to write characters with code points over 255, raises an exception.
diff --git a/pp_sys.c b/pp_sys.c
index 4ae475d460..00faa7711f 100644
--- a/pp_sys.c
+++ b/pp_sys.c
@@ -1725,10 +1725,9 @@ PP(pp_sysread)
 
     if ((fp_utf8 = PerlIO_isutf8(IoIFP(io))) && !IN_BYTES) {
         if (PL_op->op_type == OP_SYSREAD || PL_op->op_type == OP_RECV) {
-            Perl_ck_warner_d(aTHX_ packWARN(WARN_DEPRECATED),
-                             "%s() is deprecated on :utf8 handles. "
-                             "This will be a fatal error in Perl 5.30",
-                             OP_DESC(PL_op));
+            Perl_croak(aTHX_
+                       "%s() isn't allowed on :utf8 handles",
+                       OP_DESC(PL_op));
         }
 	buffer = SvPVutf8_force(bufsv, blen);
 	/* UTF-8 may not have been set if they are all low bytes */
@@ -1939,7 +1938,6 @@ PP(pp_syswrite)
     const char *buffer;
     SSize_t retval;
     STRLEN blen;
-    STRLEN orig_blen_bytes;
     const int op_type = PL_op->op_type;
     bool doing_utf8;
     U8 *tmpbuf = NULL;
@@ -1985,20 +1983,12 @@ PP(pp_syswrite)
 
     /* Do this first to trigger any overloading.  */
     buffer = SvPV_const(bufsv, blen);
-    orig_blen_bytes = blen;
     doing_utf8 = DO_UTF8(bufsv);
 
     if (PerlIO_isutf8(IoIFP(io))) {
-        Perl_ck_warner_d(aTHX_ packWARN(WARN_DEPRECATED),
-                         "%s() is deprecated on :utf8 handles. "
-                         "This will be a fatal error in Perl 5.30",
-                         OP_DESC(PL_op));
-	if (!SvUTF8(bufsv)) {
-	    /* We don't modify the original scalar.  */
-	    tmpbuf = bytes_to_utf8((const U8*) buffer, &blen);
-	    buffer = (char *) tmpbuf;
-	    doing_utf8 = TRUE;
-	}
+        Perl_croak(aTHX_
+                   "%s() isn't allowed on :utf8 handles",
+                   OP_DESC(PL_op));
     }
     else if (doing_utf8) {
 	STRLEN tmplen = blen;
@@ -2031,25 +2021,10 @@ PP(pp_syswrite)
 #endif
     {
 	Size_t length = 0; /* This length is in characters.  */
-	STRLEN blen_chars;
 	IV offset;
 
-	if (doing_utf8) {
-	    if (tmpbuf) {
-		/* The SV is bytes, and we've had to upgrade it.  */
-		blen_chars = orig_blen_bytes;
-	    } else {
-		/* The SV really is UTF-8.  */
-		/* Don't call sv_len_utf8 on a magical or overloaded
-		   scalar, as we might get back a different result.  */
-		blen_chars = sv_or_pv_len_utf8(bufsv, buffer, blen);
-	    }
-	} else {
-	    blen_chars = blen;
-	}
-
 	if (MARK >= SP) {
-	    length = blen_chars;
+	    length = blen;
 	} else {
 #if Size_t_size > IVSIZE
 	    length = (Size_t)SvNVx(*++MARK);
@@ -2065,46 +2040,21 @@ PP(pp_syswrite)
 	if (MARK < SP) {
 	    offset = SvIVx(*++MARK);
 	    if (offset < 0) {
-		if (-offset > (IV)blen_chars) {
+		if (-offset > (IV)blen) {
 		    Safefree(tmpbuf);
 		    DIE(aTHX_ "Offset outside string");
 		}
-		offset += blen_chars;
-	    } else if (offset > (IV)blen_chars) {
+		offset += blen;
+	    } else if (offset > (IV)blen) {
 		Safefree(tmpbuf);
 		DIE(aTHX_ "Offset outside string");
 	    }
 	} else
 	    offset = 0;
-	if (length > blen_chars - offset)
-	    length = blen_chars - offset;
-	if (doing_utf8) {
-	    /* Here we convert length from characters to bytes.  */
-	    if (tmpbuf || SvGMAGICAL(bufsv) || SvAMAGIC(bufsv)) {
-		/* Either we had to convert the SV, or the SV is magical, or
-		   the SV has overloading, in which case we can't or mustn't
-		   or mustn't call it again.  */
-
-		buffer = (const char*)utf8_hop((const U8 *)buffer, offset);
-		length = utf8_hop((U8 *)buffer, length) - (U8 *)buffer;
-	    } else {
-		/* It's a real UTF-8 SV, and it's not going to change under
-		   us.  Take advantage of any cache.  */
-		I32 start = offset;
-		I32 len_I32 = length;
-
-		/* Convert the start and end character positions to bytes.
-		   Remember that the second argument to sv_pos_u2b is relative
-		   to the first.  */
-		sv_pos_u2b(bufsv, &start, &len_I32);
-
-		buffer += start;
-		length = len_I32;
-	    }
-	}
-	else {
-	    buffer = buffer+offset;
-	}
+	if (length > blen - offset)
+	    length = blen - offset;
+        buffer = buffer+offset;
+
 #ifdef PERL_SOCK_SYSWRITE_IS_SEND
 	if (IoTYPE(io) == IoTYPE_SOCKET) {
 	    retval = PerlSock_send(fd, buffer, length, 0);
@@ -2120,8 +2070,6 @@ PP(pp_syswrite)
     if (retval < 0)
 	goto say_undef;
     SP = ORIGMARK;
-    if (doing_utf8)
-        retval = utf8_length((U8*)buffer, (U8*)buffer + retval);
 
     Safefree(tmpbuf);
 #if Size_t_size > IVSIZE
diff --git a/t/io/utf8.t b/t/io/utf8.t
index 2b700595c8..0bc8a5c2bf 100644
--- a/t/io/utf8.t
+++ b/t/io/utf8.t
@@ -10,7 +10,7 @@ skip_all_without_perlio();
 no utf8; # needed for use utf8 not griping about the raw octets
 
 
-plan(tests => 63);
+plan(tests => 62);
 
 $| = 1;
 
@@ -312,16 +312,14 @@ is($failed, undef);
 {
     # [perl #23428] Somethings rotten in unicode semantics
     open F, ">$a_file";
-    binmode F, ":utf8";
-    no warnings qw(deprecated);
-    syswrite(F, $a = chr(0x100));
+    binmode F;
+    $a = "A";
+    utf8::upgrade($a);
+    syswrite(F, $a);
     close F;
-    is( ord($a), 0x100, '23428 syswrite should not downgrade scalar' );
-    like( $a, qr/^\w+/, '23428 syswrite should not downgrade scalar' );
+    ok(utf8::is_utf8($a), '23428 syswrite should not downgrade scalar' );
 }
 
-# sysread() and syswrite() tested in lib/open.t since Fcntl is used
-
 {
     # <FH> on a :utf8 stream should complain immediately with -w
     # if it finds bad UTF-8 (:encoding(utf8) works this way)
diff --git a/t/lib/croak/pp_sys b/t/lib/croak/pp_sys
index 8b7dc9d53d..be100da27a 100644
--- a/t/lib/croak/pp_sys
+++ b/t/lib/croak/pp_sys
@@ -73,3 +73,23 @@ open my $���������, "../harness";
 opendir $���������, ".";
 EXPECT
 Cannot open $��������� as a dirhandle: it is already open as a filehandle at - line 5.
+########
+# NAME sysread() disallowed on :utf8
+open my $fh, "<:raw", "../harness" or die "# $!";
+my $buf;
+sysread $fh, $buf, 10;
+binmode $fh, ':utf8';
+sysread $fh, $buf, 10;
+EXPECT
+sysread() isn't allowed on :utf8 handles at - line 5.
+########
+# NAME syswrite() disallowed on :utf8
+my $file = "syswwarn.tmp";
+open my $fh, ">:raw", $file or die "# $!";
+syswrite $fh, 'ABC';
+binmode $fh, ':utf8';
+syswrite $fh, 'ABC';
+close $fh;
+END { unlink $file; }
+EXPECT
+syswrite() isn't allowed on :utf8 handles at - line 5.
diff --git a/t/lib/warnings/pp_sys b/t/lib/warnings/pp_sys
index 90d3cc790d..5f6b83d2f6 100644
--- a/t/lib/warnings/pp_sys
+++ b/t/lib/warnings/pp_sys
@@ -890,30 +890,6 @@ sleep(-1);
 EXPECT
 sleep() with negative argument at - line 2.
 ########
-# NAME sysread() deprecated on :utf8
-open my $fh, "<:raw", "../harness" or die "# $!";
-my $buf;
-sysread $fh, $buf, 10;
-binmode $fh, ':utf8';
-sysread $fh, $buf, 10;
-no warnings 'deprecated';
-sysread $fh, $buf, 10;
-EXPECT
-sysread() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30 at - line 5.
-########
-# NAME syswrite() deprecated on :utf8
-my $file = "syswwarn.tmp";
-open my $fh, ">:raw", $file or die "# $!";
-syswrite $fh, 'ABC';
-binmode $fh, ':utf8';
-syswrite $fh, 'ABC';
-no warnings 'deprecated';
-syswrite $fh, 'ABC';
-close $fh;
-unlink $file;
-EXPECT
-syswrite() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30 at - line 5.
-########
 # NAME stat on name with \0
 use warnings;
 my @x = stat("./\0-");
diff --git a/t/op/gmagic.t b/t/op/gmagic.t
index 210e8e5cc9..0ed575525f 100644
--- a/t/op/gmagic.t
+++ b/t/op/gmagic.t
@@ -76,15 +76,6 @@ expected_tie_calls(tied $c, 1, 2, 'chomping a ref');
     expected_tie_calls(tied $c, 2, 2, 'calling sysread with tied buf');
     close $h or die "$0 cannot close $outfile: $!";
 
- # Do this again, with a utf8 handle
-    $c = *foo;                                         # 1 write
-    open $h, "<:utf8", $outfile;
-    no warnings 'deprecated';
-    sysread $h, $c, 3, 7;                              # 1 read; 1 write
-    is $c, "*main::bar", 'what sysread wrote';         # 1 read
-    expected_tie_calls(tied $c, 2, 2, 'calling sysread with tied buf');
-    close $h or die "$0 cannot close $outfile: $!";
-
     unlink_all $outfile;
 }
 
diff --git a/t/op/readline.t b/t/op/readline.t
index c2727fe829..ba4efa71a4 100644
--- a/t/op/readline.t
+++ b/t/op/readline.t
@@ -215,9 +215,8 @@ SKIP: {
     my $line = 'ascii';
     my ( $in, $out );
     pipe $in, $out;
-    binmode $out, ':utf8';
+    binmode $out;
     binmode $in,  ':utf8';
-    no warnings qw(deprecated);
     syswrite $out, "...\n";
     $line .= readline $in;
 
@@ -228,10 +227,11 @@ SKIP: {
     my $line = "\x{2080} utf8";;
     my ( $in, $out );
     pipe $in, $out;
-    binmode $out, ':utf8';
+    binmode $out;
     binmode $in,  ':utf8';
-    no warnings qw(deprecated);
-    syswrite $out, "\x{2080}...\n";
+    my $outdata = "\x{2080}...\n";
+    utf8::encode($outdata);
+    syswrite $out, $outdata;
     $line .= readline $in;
 
     is( $line, "\x{2080} utf8\x{2080}...\n", 'appending from utf to utf8' );
diff --git a/t/op/sysio.t b/t/op/sysio.t
index ebcf821d37..c6d9bd8917 100644
--- a/t/op/sysio.t
+++ b/t/op/sysio.t
@@ -6,7 +6,7 @@ BEGIN {
   set_up_inc('../lib');
 }
 
-plan tests => 48;
+plan tests => 45;
 
 open(I, 'op/sysio.t') || die "sysio.t: cannot find myself: $!";
 binmode I;
@@ -221,32 +221,6 @@ close(I);
 
 unlink_all $outfile;
 
-# Check that utf8 IO doesn't upgrade the scalar
-{
-    no warnings 'deprecated';
-    open(I, ">$outfile") || die "sysio.t: cannot write $outfile: $!";
-    # Will skip harmlessly on stdioperl
-    eval {binmode STDOUT, ":utf8"};
-    die $@ if $@ and $@ !~ /^IO layers \(like ':utf8'\) unavailable/;
-
-    # y diaresis is \w when UTF8
-    $a = chr 255;
-
-    unlike($a, qr/\w/);
-
-    syswrite I, $a;
-
-    # Should not be upgraded as a side effect of syswrite.
-    unlike($a, qr/\w/);
-
-    # This should work
-    eval {syswrite I, 2;};
-    is($@, '');
-
-    close(I);
-}
-unlink_all $outfile;
-
 chdir('..');
 
 1;
diff --git a/t/uni/overload.t b/t/uni/overload.t
index 8e722c850e..161484500e 100644
--- a/t/uni/overload.t
+++ b/t/uni/overload.t
@@ -9,7 +9,7 @@ BEGIN {
     set_up_inc( '../lib' );
 }
 
-plan(tests => 217);
+plan(tests => 193);
 
 package UTF8Toggle;
 use strict;
@@ -158,8 +158,8 @@ my $tmpfile = tempfile();
 
 foreach my $operator ('print', 'syswrite', 'syswrite len', 'syswrite off',
 		      'syswrite len off') {
-    foreach my $layer ('', ':utf8') {
-	open my $fh, "+>$layer", $tmpfile or die $!;
+    foreach my $layer ('', $operator =~ /syswrite/ ? () : (':utf8')) {
+	open my $fh, "+>:raw$layer", $tmpfile or die $!;
 	my $pad = $operator =~ /\boff\b/ ? "\243" : "";
 	my $trail = $operator =~ /\blen\b/ ? "!" : "";
 	my $u = UTF8Toggle->new("$pad$E_acute\n$trail");
diff --git a/t/uni/readline.t b/t/uni/readline.t
index 893a290893..253efe3a42 100644
--- a/t/uni/readline.t
+++ b/t/uni/readline.t
@@ -29,8 +29,7 @@ like($@, qr/Modification of a read-only value attempted/, '[perl #19566]');
 use strict;
 my $err;
 {
-  no warnings qw(deprecated);
-  open ���, '.' and sysread ���, $_, 1;
+  open ���, '.' and binmode ��� and sysread ���, $_, 1;
   $err = $! + 0;
   close ���;
 }
-- 
2.11.0

@p5pRT
Author

p5pRT commented Sep 26, 2018

From @tonycoz

0002-perl-133170-adapt-sigtrap-for-layers-on-STDERR.patch
From 63745c81b519a507576574c9897eccc5b1ab9291 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Wed, 26 Sep 2018 11:12:34 +1000
Subject: (perl #133170) adapt sigtrap for layers on STDERR.

sigtrap defines a signal handler apparently intended to be called
under unsafe signals, since a) the code was written before safe
signals were implemented and b) it uses syswrite() for output and
avoids creating new SVs where it can.

Unfortunately syswrite() doesn't handle PerlIO layers, *and* with
syswrite() being disallowed for :utf8 handles, it throws an exception.

This causes the sigtrap tests to fail if PERL_UNICODE is set and the
current locale is a UTF-8 locale.

I want to avoid allocating new SVs until the point where the code
originally did so, so the code now attempts a syswrite() under
eval, falling back to print, and then at the point where the original
code started allocating SVs uses PerlIO::get_layers() to check if
any layers might make a difference to the output.
---
 lib/sigtrap.pm | 56 +++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 47 insertions(+), 9 deletions(-)

diff --git a/lib/sigtrap.pm b/lib/sigtrap.pm
index 7d801461d4..11d670942b 100644
--- a/lib/sigtrap.pm
+++ b/lib/sigtrap.pm
@@ -8,7 +8,7 @@ sigtrap - Perl pragma to enable simple signal handling
 
 use Carp;
 
-$VERSION = 1.08;
+$VERSION = 1.09;
 $Verbose ||= 0;
 
 sub import {
@@ -81,16 +81,49 @@ sub handler_die {
 
 sub handler_traceback {
     package DB;		# To get subroutine args.
+    my $use_print;
     $SIG{'ABRT'} = DEFAULT;
     kill 'ABRT', $$ if $panic++;
-    syswrite(STDERR, 'Caught a SIG', 12);
-    syswrite(STDERR, $_[0], length($_[0]));
-    syswrite(STDERR, ' at ', 4);
+
+    # This function might be called as an unsafe signal handler, so it
+    # tries to delay any memory allocations as long as possible.
+    #
+    # Unfortunately with PerlIO layers, using syswrite() here has always
+    # been broken.
+    #
+    # Calling PerlIO::get_layers() here is tempting, but that does
+    # allocations, which we're trying to avoid for this early code.
+    if (eval { syswrite(STDERR, 'Caught a SIG', 12); 1 }) {
+        syswrite(STDERR, $_[0], length($_[0]));
+        syswrite(STDERR, ' at ', 4);
+    }
+    else {
+        print STDERR 'Caught a SIG', $_[0], ' at ';
+        ++$use_print;
+    }
+
     ($pack,$file,$line) = caller;
-    syswrite(STDERR, $file, length($file));
-    syswrite(STDERR, ' line ', 6);
-    syswrite(STDERR, $line, length($line));
-    syswrite(STDERR, "\n", 1);
+    unless ($use_print) {
+        syswrite(STDERR, $file, length($file));
+        syswrite(STDERR, ' line ', 6);
+        syswrite(STDERR, $line, length($line));
+        syswrite(STDERR, "\n", 1);
+    }
+    else {
+        print STDERR $file, ' line ', $line, "\n";
+    }
+
+    # we've got our basic output done, from now on we can be freer with allocations
+    # find out whether we have any layers we need to worry about
+    unless ($use_print) {
+        my @layers = PerlIO::get_layers(*STDERR);
+        for my $name (@layers) {
+            unless ($name =~ /^(unix|perlio)$/) {
+                ++$use_print;
+                last;
+            }
+        }
+    }
 
     # Now go for broke.
     for ($i = 1; ($p,$f,$l,$s,$h,$w,$e,$r) = caller($i); $i++) {
@@ -116,7 +149,12 @@ sub handler_traceback {
 	}
 	$f = "file '$f'" unless $f eq '-e';
 	$mess = "$w$s$a called from $f line $l\n";
-	syswrite(STDERR, $mess, length($mess));
+        if ($use_print) {
+            print STDERR $mess;
+        }
+        else {
+            syswrite(STDERR, $mess, length($mess));
+        }
     }
     kill 'ABRT', $$;
 }
-- 
2.11.0

p5pRT commented Oct 10, 2018

From @tonycoz

On Tue, 25 Sep 2018 10​:27​:33 -0700, LeonT wrote​:

On Tue, Sep 25, 2018 at 3​:26 AM Tony Cook via RT
<perlbug-followup@​perl.org> wrote​:

On Sun, 15 Oct 2017 22​:23​:32 -0700, tonyc wrote​:

Added to the 5.30 blockers ticket, since these should start
croaking then.

Patch to make them fatal attached, I'll apply this Soon(tm).

I should point out that File::Slurp still hasn't been fixed to not use
this misfeature (despite a ticket being open about it for half a
decade). File::Slurp currently has 636 direct dependents (and an
unknown but likely high number of indirect dependents).

This change will break CPAN given the current state of File::Slurp.

Code using File::Slurp with :utf8 is already broken; it just won't be
silently broken now.

Tony

p5pRT commented Oct 10, 2018

From @tonycoz

On Tue, 25 Sep 2018 18​:38​:58 -0700, tonyc wrote​:

On Mon, 24 Sep 2018 21​:50​:54 -0700, tonyc wrote​:

On Mon, 24 Sep 2018 18​:26​:12 -0700, tonyc wrote​:

On Sun, 15 Oct 2017 22​:23​:32 -0700, tonyc wrote​:

Added to the 5.30 blockers ticket, since these should start
croaking then.

Patch to make them fatal attached, I'll apply this Soon(tm).

Ignore that, I misread a test report.

The attached should be better.

sigtrap.pm in particular was a problem.

It has a signal handler (from the 5.000 commit) that appears to try to
behave safely when called as an unsafe signal handler, and uses
syswrite() to write to STDERR.

This is broken if STDERR happens to have any non-trivial layers on it.

I've modified the sigtrap code to try syswrite() in an eval block
initially and fall back to print, and then to check PerlIO::get_layers()
for any non-default layers to decide whether to keep using syswrite()
if it happened to succeed the first time.

This means that if the layers don't produce something vaguely like ASCII,
the initial output will be garbled, but that was true of the original
code.

Applied as 1ed4b77 and 5c0551a.
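
The layer check described above can be sketched in isolation. This is a minimal illustration, not the code from lib/sigtrap.pm: the helper names `has_only_plain_layers` and `write_message` are hypothetical, and it assumes a perl built with the default PerlIO configuration, where a handle without extra layers reports only `unix` and `perlio`.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the output side of the handle has only
# the default byte-level layers, so syswrite() is safe to use on it.
sub has_only_plain_layers {
    my ($fh) = @_;
    for my $name (PerlIO::get_layers($fh, output => 1)) {
        return 0 unless $name =~ /^(?:unix|perlio)$/;
    }
    return 1;
}

# Hypothetical helper: use syswrite() on plain handles and print on
# anything else, so layers such as :utf8 or :encoding(...) are respected.
# Partial writes from syswrite() are not retried in this sketch.
sub write_message {
    my ($fh, $msg) = @_;
    if (has_only_plain_layers($fh)) {
        return defined syswrite($fh, $msg, length $msg);
    }
    return print {$fh} $msg;
}
```

Unlike the sigtrap handler, this sketch calls PerlIO::get_layers() up front, so it allocates before writing anything; the patch deliberately delays that call until after the first line of output.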

Tony

p5pRT commented Oct 10, 2018

From @Leont

On Wed, Oct 10, 2018 at 3​:14 AM Tony Cook via RT
<perlbug-followup@​perl.org> wrote​:

On Tue, 25 Sep 2018 10​:27​:33 -0700, LeonT wrote​:

On Tue, Sep 25, 2018 at 3​:26 AM Tony Cook via RT
<perlbug-followup@​perl.org> wrote​:

On Sun, 15 Oct 2017 22​:23​:32 -0700, tonyc wrote​:

Added to the 5.30 blockers ticket, since these should start
croaking then.

Patch to make them fatal attached, I'll apply this Soon(tm).

I should point out that File::Slurp still hasn't been fixed to not use
this misfeature (despite a ticket being open about it for half a
decade). File::Slurp currently has 636 direct dependents (and an
unknown but likely high number of indirect dependents).

This change will break CPAN given the current state of File::Slurp.

Code using File::Slurp with :utf8 is already broken; it just won't be
silently broken now.

Tony

True, but it's now also uninstallable for all File​::Slurp users that
don't use :utf8 (which is probably the majority).

Leon

p5pRT commented Nov 11, 2018

From tschoening@am-soft.de

Am Sun, 09 Aug 2015 23​:23​:05 -0700, tonyc schrieb​:

 if ((fp_utf8 = PerlIO_isutf8(IoIFP(io))) && !IN_BYTES) {

+ if (PL_op->op_type == OP_SYSREAD || PL_op->op_type == OP_RECV) {
+ Perl_ck_warner(aTHX_ packWARN(WARN_DEPRECATED),
+ "%s() is deprecated on :utf8 handles",
+ OP_DESC(PL_op));
+ }

Could someone please explain to me whether the following change is valid:

binmode($DB::OUT, ':utf-8')
to
binmode($DB::OUT, ':encoding(UTF-8)')

From my understanding it is not because of statements like the following​:

+(W deprecated) The sysread(), recv(), syswrite() and send() operators
+are deprecated on handles that have the C<​:utf8> layer, either
+explicitly, or implicitly, eg., with the C<​:encoding(UTF-16LE)> layer.

https://rt-archive.perl.org/perl5/Ticket/Attachment/1360169/728149/0001-perl-125760-deprecate-sys-read-write-send-recv-on-ut.patch

+Note that if the filehandle has been marked as C<​:utf8>, C<sysread> will
+throw an exception. The C<​:encoding(...)> layer implicitly
introduces the C<:utf8> layer.

https://rt-archive.perl.org/perl5/Ticket/Attachment/1583747/827814/0001-perl-133170-fatalize-sysread-syswrite-recv-send-on-u.patch

But the change seems to fix the warnings. Shouldn't it still result in the ":utf8" flag being applied, and therefore still be wrong? The other patches attached to this bug, e.g. the ones for the tests, use ":raw" only, which matches my understanding of this bug.

Thanks for any clarification!

p5pRT commented Nov 12, 2018

From @tonycoz

On Sun, Nov 11, 2018 at 05​:24​:51AM -0800, Thorsten Schöning via RT wrote​:

Am Sun, 09 Aug 2015 23​:23​:05 -0700, tonyc schrieb​:

 if ((fp_utf8 = PerlIO_isutf8(IoIFP(io))) && !IN_BYTES) {

+ if (PL_op->op_type == OP_SYSREAD || PL_op->op_type == OP_RECV) {
+ Perl_ck_warner(aTHX_ packWARN(WARN_DEPRECATED),
+ "%s() is deprecated on :utf8 handles",
+ OP_DESC(PL_op));
+ }

Could someone please explain to me whether the following change is valid:

binmode($DB::OUT, ':utf-8')
to
binmode($DB::OUT, ':encoding(UTF-8)')

No, that won't prevent the warning.

For example, with a default 5.28.0 build​:

$ ./perl -Ilib -e 'open my $f, ">", "foo"; binmode $f; binmode $f, ":encoding(UTF8)"; syswrite($f, "foo")'
syswrite() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30 at -e line 1.

In blead this throws an error instead.
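
For code that has to keep using syswrite(), one way to avoid both the warning and the later fatal error is to leave the handle without any :utf8 or :encoding(...) layer and encode the data explicitly. A minimal sketch, assuming the output should be UTF-8; `syswrite_utf8` is a hypothetical helper name, not something from this ticket or from perl itself:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string to UTF-8 bytes and push
# them through syswrite() on a handle that carries no :utf8 layer.
# Returns the number of bytes written, or undef on a write error.
sub syswrite_utf8 {
    my ($fh, $text) = @_;
    my $bytes  = encode('UTF-8', $text);
    my $offset = 0;
    while ($offset < length $bytes) {
        # syswrite() may write fewer bytes than requested, so loop.
        my $n = syswrite($fh, $bytes, length($bytes) - $offset, $offset);
        return unless defined $n;
        $offset += $n;
    }
    return $offset;
}
```

The handle passed in should be opened with `:raw` or have had a plain `binmode($fh)` applied, so that the byte counts used by syswrite() match what actually reaches the file descriptor.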

From my understanding it is not because of statements like the following​:

+(W deprecated) The sysread(), recv(), syswrite() and send() operators
+are deprecated on handles that have the C<​:utf8> layer, either
+explicitly, or implicitly, eg., with the C<​:encoding(UTF-16LE)> layer.

https://rt-archive.perl.org/perl5/Ticket/Attachment/1360169/728149/0001-perl-125760-deprecate-sys-read-write-send-recv-on-ut.patch

+Note that if the filehandle has been marked as C<​:utf8>, C<sysread> will
+throw an exception. The C<​:encoding(...)> layer implicitly
introduces the C<:utf8> layer.

https://rt-archive.perl.org/perl5/Ticket/Attachment/1583747/827814/0001-perl-133170-fatalize-sysread-syswrite-recv-send-on-u.patch

But the change seems to fix the warnings. Shouldn't it still result in the ":utf8" flag being applied, and therefore still be wrong? The other patches attached to this bug, e.g. the ones for the tests, use ":raw" only, which matches my understanding of this bug.

I'm not sure what you're saying here. If you could provide simple but
complete code that doesn't behave how you expect, along with the result
you expect and the result you got, we can decide whether we have a code
bug or perhaps a documentation bug.

Tony

p5pRT commented Mar 29, 2019

From @khwilliamson

Is this ticket closable?
--
Karl Williamson

p5pRT commented Apr 23, 2019

From @iabyn

I've moved this ticket from 5.30.0 blocker to 5.32.0 blocker

p5pRT commented May 31, 2019

From @jkeenan

On Tue, 23 Apr 2019 14​:57​:24 GMT, davem wrote​:

I've moved this ticket from 5.30.0 blocker to 5.32.0 blocker

Dave,

What issues still remain such that this ticket remains open and is a 5.32 blocker?

Thank you very much.
--
James E Keenan (jkeenan@​cpan.org)

p5pRT commented May 31, 2019

From @tonycoz

On Fri, 31 May 2019 05​:18​:40 -0700, jkeenan wrote​:

On Tue, 23 Apr 2019 14​:57​:24 GMT, davem wrote​:

I've moved this ticket from 5.30.0 blocker to 5.32.0 blocker

Dave,

What issues still remain such that this ticket remains open and is a
5.32 blocker?

None. I only left it open to track related issues, which were all resolved to my knowledge.

Closing.

Tony

p5pRT commented May 31, 2019

@tonycoz - Status changed from 'open' to 'resolved'

p5pRT closed this as completed May 31, 2019