Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback #16059

Closed
p5pRT opened this issue Jul 1, 2017 · 43 comments
Closed

Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback #16059

p5pRT opened this issue Jul 1, 2017 · 43 comments

Comments

@p5pRT
Copy link

p5pRT commented Jul 1, 2017

Migrated from rt.perl.org#131683 (status was 'resolved')

Searchable as RT131683$

@p5pRT
Copy link
Author

p5pRT commented Jul 1, 2017

From @pali

Hi!

I would continue in discussion started in last year in p5p mailing list​:
https://www.nntp.perl.org/group/perl.perl5.porters/2016/08/msg239061.html

Currently there is big mess with reporting warnings from
Encode​::encode() and Encode​::decode() functions implemented by Encode
module which is part of Perl core modules.

Moreover mess is bigger, because Encode is used by PerlIO​::encoding
module which is internal Perl module.

In perl you can enable some utf8 warnings by pragma warning. Next for
Encode​::encode() and Encode​::decode() functions you can enable some
warnings by Encode​::ENCODE_WARN_ON_ERR check flag passed as optional
argument to those functions. And third you can control some warnings via
$PerlIO​::encoding​::fallback variable which is relevant for :encoding
layer (which internally use Encode module and their encode/decode
functions).

This lead to couple of bug reports like Encode​::encode() or decode()
does not respect pragma warnings, or it does not respect check flag
Encode​::ENCODE_WARN_ON_ERR passed as argument. Some people hacked for
some modules (e.g. UTF-16) to use only pragma warnings, which basically
fully broke Encode​::ENCODE_WARN_ON_ERR check flag. More complicated
behaviour happened when using :encoding layer...

https://rt.cpan.org/Public/Bug/Display.html?id=120505
https://rt.cpan.org/Public/Bug/Display.html?id=88592
dankogai/p5-encode#26 (comment)
https://rt.perl.org/Public/Bug/Display.html?id=128788
https://rt.cpan.org/Public/Bug/Display.html?id=116629
dankogai/p5-encode#59
dankogai/p5-encode@a6c2ba3
dankogai/p5-encode@07c8adb

As stated in previous discussion I'm proposing new behaviour​:

* Introduce new Encode check flag Encode​::ONLY_PRAGMA_WARNINGS which
would tell Encode that it should report only those warnings which are
currently enabled by pragma warnings. When Encode​::ONLY_PRAGMA_WARNINGS
is not set then Encode would report all warnings. The whole flag
Encode​::ONLY_PRAGMA_WARNINGS would have no effect when flag
Encode​::ENCODE_WARN_ON_ERR is not set.

* Add Encode​::ONLY_PRAGMA_WARNINGS by default to :encoding layer
variable $PerlIO​::encoding​::fallback, so by default every action on
filehandle used by :encoding layer would report warnings according to
pragma warnings.

As this change affects both Perl & externally maintained Encode module
on cpan, I'm opening ticket for Perl.

In attachments are patches implementing above proposed behaviour. Encode
patch is based on the last Encode version 2.91 (not in bleed yet).

@p5pRT
Copy link
Author

p5pRT commented Jul 1, 2017

From @pali

0001-Encode-Add-new-check-flag-Encode-ONLY_PRAGMA_WARNING.patch
From 62698e1c8c4f70082b0311fdb3cc0c881f6650c7 Mon Sep 17 00:00:00 2001
From: Pali <pali@cpan.org>
Date: Sat, 1 Jul 2017 12:32:34 +0200
Subject: [PATCH 1/2] Encode: Add new check flag Encode::ONLY_PRAGMA_WARNINGS

When this new flag is set then only warnings configured and enabled by
pragma warnings are reported. It has no effect without setting check flag
Encode::ENCODE_WARN_ON_ERR.
---
 cpan/Encode/Encode.pm          |  131 +++++++++++-----------------------------
 cpan/Encode/Encode.xs          |   13 ++--
 cpan/Encode/Encode/encode.h    |    3 +
 cpan/Encode/Unicode/Unicode.xs |  113 ++++++++++++++++++++++------------
 cpan/Encode/t/decode.t         |    9 ++-
 cpan/Encode/t/enc_eucjp.t      |    4 ++
 cpan/Encode/t/utf32warnings.t  |  114 ++++++++++++++++++++++++++++++++++
 cpan/Encode/t/utf8messages.t   |   33 ----------
 cpan/Encode/t/utf8warnings.t   |  109 ++++++++-------------------------
 9 files changed, 273 insertions(+), 256 deletions(-)
 create mode 100644 cpan/Encode/t/utf32warnings.t
 delete mode 100644 cpan/Encode/t/utf8messages.t

diff --git a/cpan/Encode/Encode.pm b/cpan/Encode/Encode.pm
index 255f594..a99f70b 100644
--- a/cpan/Encode/Encode.pm
+++ b/cpan/Encode/Encode.pm
@@ -14,6 +14,7 @@ BEGIN {
 
 use Exporter 5.57 'import';
 
+use Carp ();
 our @CARP_NOT = qw(Encode::Encoder);
 
 # Public, encouraged API is exported by default
@@ -162,109 +163,54 @@ sub clone_encoding($) {
 }
 
 sub encode($$;$) {
-    my ( $name, $string, $check ) = @_;
-    return undef unless defined $string;
-    $string .= '';    # stringify;
-    $check ||= 0;
-    unless ( defined $name ) {
-        require Carp;
-        Carp::croak("Encoding name should not be undef");
-    }
+    my $name = $_[0];
+    my $check = $_[2];
+    Carp::croak("Encoding name should not be undef") unless defined $name;
     my $enc = find_encoding($name);
-    unless ( defined $enc ) {
-        require Carp;
-        Carp::croak("Unknown encoding '$name'");
-    }
-    # For Unicode, warnings need to be caught and re-issued at this level
-    # so that callers can disable utf8 warnings lexically.
-    my $octets;
-    if ( ref($enc) eq 'Encode::Unicode' ) {
-        my $warn = '';
-        {
-            local $SIG{__WARN__} = sub { $warn = shift };
-            $octets = $enc->encode( $string, $check );
-        }
-        warnings::warnif('utf8', $warn) if length $warn;
-    }
-    else {
-        $octets = $enc->encode( $string, $check );
+    Carp::croak("Unknown encoding '$name'") unless defined $enc;
+    my $encode = $enc->can('encode');
+    Carp::croak("No function 'encode' for encoding '$name'") unless defined $encode;
+    $check ||= 0;
+    splice(@_, 0, 1, $enc);
+    if (ref $check or !$check or ($check & LEAVE_SRC)) {
+        my $string = $_[1];
+        splice(@_, 1, 1, $string);
     }
-    $_[1] = $string if $check and !ref $check and !( $check & LEAVE_SRC );
-    return $octets;
+    splice(@_, 2, 1, $check);
+    goto &$encode;
 }
 *str2bytes = \&encode;
 
 sub decode($$;$) {
-    my ( $name, $octets, $check ) = @_;
-    return undef unless defined $octets;
-    $octets .= '';
-    $check ||= 0;
+    my $name = $_[0];
+    my $check = $_[2];
+    Carp::croak("Encoding name should not be undef") unless defined $name;
     my $enc = find_encoding($name);
-    unless ( defined $enc ) {
-        require Carp;
-        Carp::croak("Unknown encoding '$name'");
-    }
-    # For Unicode, warnings need to be caught and re-issued at this level
-    # so that callers can disable utf8 warnings lexically.
-    my $string;
-    if ( ref($enc) eq 'Encode::Unicode' ) {
-        my $warn = '';
-        {
-            local $SIG{__WARN__} = sub { $warn = shift };
-            $string = $enc->decode( $octets, $check );
-        }
-        warnings::warnif('utf8', $warn) if length $warn;
-    }
-    else {
-        $string = $enc->decode( $octets, $check );
+    Carp::croak("Unknown encoding '$name'") unless defined $enc;
+    my $decode = $enc->can('decode');
+    Carp::croak("No function 'decode' for encoding '$name'") unless defined $decode;
+    $check ||= 0;
+    splice(@_, 0, 1, $enc);
+    if (ref $check or !$check or ($check & LEAVE_SRC)) {
+        my $octets = $_[1];
+        splice(@_, 1, 1, $octets);
     }
-    $_[1] = $octets if $check and !ref $check and !( $check & LEAVE_SRC );
-    return $string;
+    splice(@_, 2, 1, $check);
+    goto &$decode;
 }
 *bytes2str = \&decode;
 
 sub from_to($$$;$) {
     my ( $string, $from, $to, $check ) = @_;
-    return undef unless defined $string;
-    $check ||= 0;
+    Carp::croak("Encoding name should not be undef") unless defined $from and defined $to;
     my $f = find_encoding($from);
-    unless ( defined $f ) {
-        require Carp;
-        Carp::croak("Unknown encoding '$from'");
-    }
+    Carp::croak("Unknown encoding '$from'") unless defined $f;
     my $t = find_encoding($to);
-    unless ( defined $t ) {
-        require Carp;
-        Carp::croak("Unknown encoding '$to'");
-    }
-
-    # For Unicode, warnings need to be caught and re-issued at this level
-    # so that callers can disable utf8 warnings lexically.
-    my $uni;
-    if ( ref($f) eq 'Encode::Unicode' ) {
-        my $warn = '';
-        {
-            local $SIG{__WARN__} = sub { $warn = shift };
-            $uni = $f->decode($string);
-        }
-        warnings::warnif('utf8', $warn) if length $warn;
-    }
-    else {
-        $uni = $f->decode($string);
-    }
-
-    if ( ref($t) eq 'Encode::Unicode' ) {
-        my $warn = '';
-        {
-            local $SIG{__WARN__} = sub { $warn = shift };
-            $_[0] = $string = $t->encode( $uni, $check );
-        }
-        warnings::warnif('utf8', $warn) if length $warn;
-    }
-    else {
-        $_[0] = $string = $t->encode( $uni, $check );
-    }
-
+    Carp::croak("Unknown encoding '$to'") unless defined $t;
+    return undef unless defined $string;
+    $check ||= 0;
+    my $uni = $f->decode($string);
+    $_[0] = $string = $t->encode( $uni, $check );
     return undef if ( $check && length($uni) );
     return defined( $_[0] ) ? length($string) : undef;
 }
@@ -279,14 +225,9 @@ sub encode_utf8($) {
 my $utf8enc;
 
 sub decode_utf8($;$) {
-    my ( $octets, $check ) = @_;
-    return undef unless defined $octets;
-    $octets .= '';
-    $check   ||= 0;
     $utf8enc ||= find_encoding('utf8');
-    my $string = $utf8enc->decode( $octets, $check );
-    $_[0] = $octets if $check and !ref $check and !( $check & LEAVE_SRC );
-    return $string;
+    unshift(@_, $utf8enc);
+    goto &{$utf8enc->can('decode')};
 }
 
 onBOOT;
diff --git a/cpan/Encode/Encode.xs b/cpan/Encode/Encode.xs
index 6c077be..c1222a1 100644
--- a/cpan/Encode/Encode.xs
+++ b/cpan/Encode/Encode.xs
@@ -250,7 +250,7 @@ encode_method(pTHX_ const encode_t * enc, const encpage_t * dir, SV * src, U8 *
                    (UV)ch, enc->name[0]);
             return &PL_sv_undef; /* never reaches but be safe */
         }
-        if (check & ENCODE_WARN_ON_ERR){
+        if (encode_ckWARN(check, WARN_UTF8)) {
             Perl_warner(aTHX_ packWARN(WARN_UTF8),
                 ERR_ENCODE_NOMAP, (UV)ch, enc->name[0]);
         }
@@ -289,7 +289,7 @@ encode_method(pTHX_ const encode_t * enc, const encpage_t * dir, SV * src, U8 *
                               enc->name[0], (UV)s[slen]);
             return &PL_sv_undef; /* never reaches but be safe */
         }
-        if (check & ENCODE_WARN_ON_ERR){
+        if (encode_ckWARN(check, WARN_UTF8)) {
             Perl_warner(
             aTHX_ packWARN(WARN_UTF8),
             ERR_DECODE_NOMAP,
@@ -446,7 +446,11 @@ process_utf8(pTHX_ SV* dst, U8* s, U8* e, SV *check_sv,
     char esc[UTF8_MAXLEN * 6 + 1];
     STRLEN i;
 
-    if (SvROK(check_sv)) {
+    if (!SvOK(check_sv)) {
+	fallback_cb = &PL_sv_undef;
+	check = 0;
+    }
+    else if (SvROK(check_sv)) {
 	/* croak("UTF-8 decoder doesn't support callback CHECK"); */
 	fallback_cb = check_sv;
 	check = ENCODE_PERLQQ|ENCODE_LEAVE_SRC; /* same as perlqq */
@@ -520,7 +524,7 @@ process_utf8(pTHX_ SV* dst, U8* s, U8* e, SV *check_sv,
             else
                 Perl_croak(aTHX_ ERR_DECODE_STR_NOMAP, (strict ? "UTF-8" : "utf8"), esc);
         }
-        if (check & ENCODE_WARN_ON_ERR){
+        if (encode_ckWARN(check, WARN_UTF8)) {
             if (encode)
                 Perl_warner(aTHX_ packWARN(WARN_UTF8),
                             ERR_ENCODE_NOMAP, uv, (strict ? "UTF-8" : "utf8"));
@@ -1064,6 +1068,7 @@ BOOT:
     newCONSTSUB(stash, "WARN_ON_ERR", newSViv(ENCODE_WARN_ON_ERR));
     newCONSTSUB(stash, "RETURN_ON_ERR", newSViv(ENCODE_RETURN_ON_ERR));
     newCONSTSUB(stash, "LEAVE_SRC", newSViv(ENCODE_LEAVE_SRC));
+    newCONSTSUB(stash, "ONLY_PRAGMA_WARNINGS", newSViv(ENCODE_ONLY_PRAGMA_WARNINGS));
     newCONSTSUB(stash, "PERLQQ", newSViv(ENCODE_PERLQQ));
     newCONSTSUB(stash, "HTMLCREF", newSViv(ENCODE_HTMLCREF));
     newCONSTSUB(stash, "XMLCREF", newSViv(ENCODE_XMLCREF));
diff --git a/cpan/Encode/Encode/encode.h b/cpan/Encode/Encode/encode.h
index 5fbcf76..9d56483 100644
--- a/cpan/Encode/Encode/encode.h
+++ b/cpan/Encode/Encode/encode.h
@@ -94,6 +94,7 @@ extern void Encode_DefineEncoding(encode_t *enc);
 #define  ENCODE_WARN_ON_ERR    0x0002 /* warn on error; may proceed */
 #define  ENCODE_RETURN_ON_ERR  0x0004 /* immediately returns on NOREP */
 #define  ENCODE_LEAVE_SRC      0x0008 /* $src updated unless set */
+#define  ENCODE_ONLY_PRAGMA_WARNINGS 0x0010 /* when enabled report only warnings configured by pragma warnings, otherwise report all warnings; no effect without ENCODE_WARN_ON_ERR */
 #define  ENCODE_PERLQQ         0x0100 /* perlqq fallback string */
 #define  ENCODE_HTMLCREF       0x0200 /* HTML character ref. fb mode */
 #define  ENCODE_XMLCREF        0x0400 /* XML  character ref. fb mode */
@@ -107,4 +108,6 @@ extern void Encode_DefineEncoding(encode_t *enc);
 #define  ENCODE_FB_HTMLCREF    (ENCODE_HTMLCREF|ENCODE_LEAVE_SRC)
 #define  ENCODE_FB_XMLCREF     (ENCODE_XMLCREF|ENCODE_LEAVE_SRC)
 
+#define encode_ckWARN(c, w) ((c & ENCODE_WARN_ON_ERR) && (!(c & ENCODE_ONLY_PRAGMA_WARNINGS) || ckWARN(w)))
+
 #endif /* ENCODE_H */
diff --git a/cpan/Encode/Unicode/Unicode.xs b/cpan/Encode/Unicode/Unicode.xs
index b3b1d2f..500d065 100644
--- a/cpan/Encode/Unicode/Unicode.xs
+++ b/cpan/Encode/Unicode/Unicode.xs
@@ -20,11 +20,15 @@
 /* For pre-5.14 source compatibility */
 #ifndef UNICODE_WARN_ILLEGAL_INTERCHANGE
 #   define UNICODE_WARN_ILLEGAL_INTERCHANGE 0
-#   define UTF8_DISALLOW_SURROGATE 0
+#   define UNICODE_WARN_FE_FF 0
+#   define UNICODE_WARN_NONCHAR 0
+#   define UNICODE_WARN_SUPER 0
+#endif
+#ifndef UTF8_DISALLOW_ILLEGAL_INTERCHANGE
+#   define UTF8_DISALLOW_ILLEGAL_INTERCHANGE 0
 #   define UTF8_WARN_SURROGATE 0
-#   define UTF8_DISALLOW_FE_FF 0
-#   define UTF8_WARN_FE_FF 0
 #   define UTF8_WARN_NONCHAR 0
+#   define UTF8_WARN_SUPER 0
 #endif
 
 #define PERLIO_BUFSIZ 1024 /* XXX value comes from PerlIOEncode_get_base */
@@ -233,53 +237,69 @@ CODE:
 		ucs2 = SvTRUE(sv);
 	    }
 	    if (ucs2 || size == 4) {
-		if (check) {
+		if (check & ENCODE_DIE_ON_ERR) {
 		    croak("%" SVf ":no surrogates allowed %" UVxf,
 			  *hv_fetch((HV *)SvRV(obj),"Name",4,0),
 			  ord);
 		}
+		if (encode_ckWARN(check, WARN_SURROGATE)) {
+		    warner(packWARN(WARN_SURROGATE),
+			  "%" SVf ":no surrogates allowed %" UVxf,
+			  *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+			  ord);
+		}
 		ord = FBCHAR;
 	    }
 	    else {
 		UV lo;
 		if (!isHiSurrogate(ord)) {
-		    if (check) {
+		    if (check & ENCODE_DIE_ON_ERR) {
 			croak("%" SVf ":Malformed HI surrogate %" UVxf,
 			      *hv_fetch((HV *)SvRV(obj),"Name",4,0),
 			      ord);
 		    }
-		    else {
-			ord = FBCHAR;
+		    if (encode_ckWARN(check, WARN_SURROGATE)) {
+			warner(packWARN(WARN_SURROGATE),
+			      "%" SVf ":Malformed HI surrogate %" UVxf,
+			      *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+			      ord);
 		    }
+		    ord = FBCHAR;
 		}
 		else if (s+size > e) {
-		    if (check) {
-		        if (check & ENCODE_STOP_AT_PARTIAL) {
-		             s -= size;
-		             break;
-		        }
-		        else {
-		             croak("%" SVf ":Malformed HI surrogate %" UVxf,
-				   *hv_fetch((HV *)SvRV(obj),"Name",4,0),
-				   ord);
-		        }
+		    if (check & ENCODE_STOP_AT_PARTIAL) {
+		        s -= size;
+		        break;
 		    }
-		    else {
-		        ord = FBCHAR;
+		    if (check & ENCODE_DIE_ON_ERR) {
+			croak("%" SVf ":Malformed HI surrogate %" UVxf,
+			      *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+			      ord);
+		    }
+		    if (encode_ckWARN(check, WARN_SURROGATE)) {
+			warner(packWARN(WARN_SURROGATE),
+			      "%" SVf ":Malformed HI surrogate %" UVxf,
+			      *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+			      ord);
 		    }
+		    ord = FBCHAR;
 		}
 		else {
 		    lo = enc_unpack(aTHX_ &s,e,size,endian);
 		    if (!isLoSurrogate(lo)) {
-			if (check) {
+			if (check & ENCODE_DIE_ON_ERR) {
 			    croak("%" SVf ":Malformed LO surrogate %" UVxf,
 				  *hv_fetch((HV *)SvRV(obj),"Name",4,0),
 				  ord);
 			}
-			else {
-			    s -= size;
-			    ord = FBCHAR;
+			if (encode_ckWARN(check, WARN_SURROGATE)) {
+			    warner(packWARN(WARN_SURROGATE),
+				  "%" SVf ":Malformed LO surrogate %" UVxf,
+				  *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+				  ord);
 			}
+			s -= size;
+			ord = FBCHAR;
 		    }
 		    else {
 			ord = 0x10000 + ((ord - 0xD800) << 10) + (lo - 0xDC00);
@@ -289,13 +309,18 @@ CODE:
 	}
 
 	if ((ord & 0xFFFE) == 0xFFFE || (ord >= 0xFDD0 && ord <= 0xFDEF)) {
-	    if (check) {
+	    if (check & ENCODE_DIE_ON_ERR) {
 		croak("%" SVf ":Unicode character %" UVxf " is illegal",
 		      *hv_fetch((HV *)SvRV(obj),"Name",4,0),
 		      ord);
-	    } else {
-		ord = FBCHAR;
 	    }
+	    if (encode_ckWARN(check, WARN_NONCHAR)) {
+	        warner(packWARN(WARN_NONCHAR),
+		      "%" SVf ":Unicode character %" UVxf " is illegal",
+		      *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+		      ord);
+	    }
+	    ord = FBCHAR;
 	}
 
 	if (resultbuflen < SvCUR(result) + UTF8_MAXLEN + 1) {
@@ -314,15 +339,22 @@ CODE:
 	    resultbuf = (U8 *) SvGROW(result, newlen);
 	    resultbuflen = SvLEN(result);
 	}
+	U32 flags = 0;
+	if (encode_ckWARN(check, WARN_NON_UNICODE)) flags |= UNICODE_WARN_SUPER;
+	if (encode_ckWARN(check, WARN_SURROGATE)) flags |= UNICODE_WARN_SURROGATE;
+	if (encode_ckWARN(check, WARN_NONCHAR)) flags |= UNICODE_WARN_NONCHAR;
+	d = uvuni_to_utf8_flags(resultbuf+SvCUR(result), ord, flags);
+
+	if (!d) {
+	    d = uvuni_to_utf8_flags(resultbuf+SvCUR(result), FBCHAR, 0);
+	}
 
-	d = uvuni_to_utf8_flags(resultbuf+SvCUR(result), ord,
-                                            UNICODE_WARN_ILLEGAL_INTERCHANGE);
 	SvCUR_set(result, d - (U8 *)SvPVX(result));
     }
 
     if (s < e) {
 	/* unlikely to happen because it's fixed-length -- dankogai */
-	if (check & ENCODE_WARN_ON_ERR) {
+	if (encode_ckWARN(check, WARN_UTF8)) {
 	    Perl_warner(aTHX_ packWARN(WARN_UTF8),"%" SVf ":Partial character",
 			*hv_fetch((HV *)SvRV(obj),"Name",4,0));
 	}
@@ -407,23 +439,29 @@ CODE:
     }
     while (s < e && s+UTF8SKIP(s) <= e) {
 	STRLEN len;
-	UV ord = utf8n_to_uvuni(s, e-s, &len, (UTF8_DISALLOW_SURROGATE
-                                               |UTF8_WARN_SURROGATE
-                                               |UTF8_DISALLOW_FE_FF
-                                               |UTF8_WARN_FE_FF
-                                               |UTF8_WARN_NONCHAR));
-	s += len;
-	if (size != 4 && invalid_ucs2(ord)) {
+	U32 flags = UTF8_DISALLOW_ILLEGAL_INTERCHANGE;
+	if (encode_ckWARN(check, WARN_NON_UNICODE)) flags |= UTF8_WARN_SUPER;
+	if (encode_ckWARN(check, WARN_SURROGATE)) flags |= UTF8_WARN_SURROGATE;
+	if (encode_ckWARN(check, WARN_NONCHAR)) flags |= UTF8_WARN_NONCHAR;
+	UV ord = utf8n_to_uvuni(s, e-s, &len, flags);
+	if ((size != 4 && invalid_ucs2(ord)) || (ord == 0 && *s != 0)) {
 	    if (!issurrogate(ord)) {
 		if (ucs2 == -1) {
 		    SV *sv = attr("ucs2", 4);
 		    ucs2 = SvTRUE(sv);
 		}
 		if (ucs2 || ord > 0x10FFFF) {
-		    if (check) {
+		    if (check & ENCODE_DIE_ON_ERR) {
 			croak("%" SVf ":code point \"\\x{%" UVxf "}\" too high",
 				  *hv_fetch((HV *)SvRV(obj),"Name",4,0),ord);
 		    }
+		    if (encode_ckWARN(check, WARN_NON_UNICODE)) {
+			warner(packWARN(WARN_NON_UNICODE),
+				  "%" SVf ":code point \"\\x{%" UVxf "}\" too high",
+				  *hv_fetch((HV *)SvRV(obj),"Name",4,0),ord);
+		    }
+		    enc_pack(aTHX_ result,size,endian,FBCHAR);
+		} else if (ord == 0) {
 		    enc_pack(aTHX_ result,size,endian,FBCHAR);
 		} else {
 		    UV hi = ((ord - 0x10000) >> 10)   + 0xD800;
@@ -440,6 +478,7 @@ CODE:
 	else {
 	    enc_pack(aTHX_ result,size,endian,ord);
 	}
+	s += len;
     }
     if (s < e) {
 	/* UTF-8 partial char happens often on PerlIO.
diff --git a/cpan/Encode/t/decode.t b/cpan/Encode/t/decode.t
index 3995412..ce4caab 100644
--- a/cpan/Encode/t/decode.t
+++ b/cpan/Encode/t/decode.t
@@ -49,9 +49,12 @@ $orig = "\xC3\x80";
 $orig =~ /(..)/;
 is(Encode::decode_utf8($1), "\N{U+C0}", 'passing magic regex to Encode::decode_utf8');
 
-$orig = "\xC3\x80";
-*a = $orig;
-is(Encode::decode_utf8(*a), "*main::\N{U+C0}", 'passing typeglob to Encode::decode_utf8');
+SKIP: {
+    skip "Bug in Perl: https://rt.perl.org/Public/Bug/Display.html?id=131263", 1;
+    $orig = "\xC3\x80";
+    *a = $orig;
+    is(Encode::decode_utf8(*a), "*main::\N{U+C0}", 'passing typeglob to Encode::decode_utf8');
+}
 
 $orig = "\N{U+C0}";
 $orig =~ /(.)/;
diff --git a/cpan/Encode/t/enc_eucjp.t b/cpan/Encode/t/enc_eucjp.t
index fc0af3c..8f933b0 100644
--- a/cpan/Encode/t/enc_eucjp.t
+++ b/cpan/Encode/t/enc_eucjp.t
@@ -25,6 +25,10 @@ BEGIN {
     }
 }
 
+use Encode qw();
+$PerlIO::encoding::fallback |= Encode::ONLY_PRAGMA_WARNINGS;
+use warnings "utf8";
+
 no warnings "deprecated";
 use encoding 'euc-jp';
 
diff --git a/cpan/Encode/t/utf32warnings.t b/cpan/Encode/t/utf32warnings.t
new file mode 100644
index 0000000..cf3ab67
--- /dev/null
+++ b/cpan/Encode/t/utf32warnings.t
@@ -0,0 +1,114 @@
+use strict;
+use warnings;
+BEGIN {
+    if ($] < 5.014){
+        print "1..0 # Skip: Perl 5.14.0 or later required\n";
+        exit 0;
+    }
+}
+
+my $script = quotemeta $0;
+
+use Encode;
+use Test::More tests => 15;
+
+my $valid   = "\x61\x00\x00\x00";
+my $invalid = "\x78\x56\x34\x12";
+
+our $warn;
+$SIG{__WARN__} = sub { $warn = $_[0] };
+
+my $enc = find_encoding("UTF32-LE");
+
+{
+    local $warn;
+    my $ret = $enc->decode( $valid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    ok(!defined $warn, "Calling decode on UTF32-LE encode object with valid string produces no warnings");
+    is($ret, "a", "Calling decode on UTF32-LE encode object with valid string returns correct output");
+}
+
+
+{
+    local $warn;
+    my $ret = $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    like($warn, qr/is not Unicode.* at $script line /, "Calling decode on UTF32-LE encode object with valid string warns");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    my $ret = $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    ok(!defined $warn, "Warning from decode method of UTF32-LE encode object can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    no warnings;
+    my $ret = $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    ok(!defined $warn, "Warning from decode method of UTF32-LE encode object can be silenced via no warnings");
+}
+
+{
+    local $warn;
+    my $ret = Encode::decode( $enc, $valid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    ok(!defined $warn, "Calling Encode::decode for UTF32-LE with valid string produces no warnings");
+    is($ret, "a", "Calling Encode::decode for UTF32-LE with valid string returns correct output");
+}
+
+
+{
+    local $warn;
+    my $ret = Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    like($warn, qr/is not Unicode.* at $script line /, "Calling Encode::decode for UTF32-LE with valid string warns");
+}
+
+
+{
+    local $warn;
+    no warnings 'utf8';
+    my $ret = Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    ok(!defined $warn, "Warning from Encode::decode for UTF32-LE can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    no warnings;
+    my $ret = Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    ok(!defined $warn, "Warning from Encode::decode for UTF32-LE can be silenced via no warnings");
+}
+
+
+use PerlIO::encoding;
+$PerlIO::encoding::fallback |= Encode::ONLY_PRAGMA_WARNINGS;
+
+{
+    local $warn;
+    open my $fd, '<:encoding(UTF32-LE)', \$valid or die;
+    my $str = <$fd>;
+    ok(!defined $warn, "Calling PerlIO :encoding on valid string produces no warnings");
+    is($str, "a", "PerlIO decodes string correctly");
+}
+
+
+{
+    local $warn;
+    open my $fd, '<:encoding(UTF32-LE)', \$invalid or die;
+    my $str = <$fd>;
+    like($warn, qr/is not Unicode.* at $script line /, "Calling PerlIO :encoding on invalid string warns");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    open my $fd, '<:encoding(UTF32-LE)', \$invalid or die;
+    my $str = <$fd>;
+    ok(!defined $warn, "Warning from PerlIO :encoding can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    no warnings;
+    open my $fd, '<:encoding(UTF32-LE)', \$invalid or die;
+    my $str = <$fd>;
+    ok(!defined $warn, "Warning from PerlIO :encoding can be silenced via no warnings");
+}
diff --git a/cpan/Encode/t/utf8messages.t b/cpan/Encode/t/utf8messages.t
deleted file mode 100644
index 8b6b379..0000000
--- a/cpan/Encode/t/utf8messages.t
+++ /dev/null
@@ -1,33 +0,0 @@
-use strict;
-use warnings;
-BEGIN { 'warnings'->unimport('utf8') if $] < 5.014 }; # turn off 'UTF-16 surrogate 0xd800' warnings
-
-use Test::More;
-use Encode qw(encode decode FB_CROAK LEAVE_SRC);
-
-plan tests => 12;
-
-my @invalid;
-
-ok ! defined eval { encode('UTF-8', "\x{D800}", FB_CROAK | LEAVE_SRC) }, 'Surrogate codepoint \x{D800} is not encoded to strict UTF-8';
-like $@, qr/^"\\x\{d800\}" does not map to UTF-8 /, 'Error message contains strict UTF-8 name';
-@invalid = ();
-encode('UTF-8', "\x{D800}", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xD800 ], 'Fallback coderef contains invalid codepoint 0xD800';
-
-ok ! defined eval { decode('UTF-8', "\xed\xa0\x80", FB_CROAK | LEAVE_SRC) }, 'Surrogate UTF-8 byte sequence \xED\xA0\x80 is decoded with strict UTF-8 decoder';
-like $@, qr/^UTF-8 "\\xED\\xA0\\x80" does not map to Unicode /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
-@invalid = ();
-decode('UTF-8', "\xed\xa0\x80", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0, 0x80 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0, 0x80';
-
-ok ! defined eval { decode('UTF-8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with strict UTF-8 decoder';
-like $@, qr/^UTF-8 "\\xED\\xA0" does not map to Unicode /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
-@invalid = ();
-decode('UTF-8', "\xed\xa0", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
-
-ok ! defined eval { decode('utf8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with non-strict utf8 decoder';
-like $@, qr/^utf8 "\\xED\\xA0" does not map to Unicode /, 'Error message contains non-strict utf8 name and original (not decoded) invalid sequence';
-decode('utf8', "\xed\xa0", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
diff --git a/cpan/Encode/t/utf8warnings.t b/cpan/Encode/t/utf8warnings.t
index 0d1ac6d..e4e4304 100644
--- a/cpan/Encode/t/utf8warnings.t
+++ b/cpan/Encode/t/utf8warnings.t
@@ -1,94 +1,35 @@
 use strict;
 use warnings;
-BEGIN {
-    if ($] < 5.014){
-        print "1..0 # Skip: Perl 5.14.0 or later required\n";
-        exit 0;
-    }
-}
+BEGIN { 'warnings'->unimport('utf8') if $] < 5.014 }; # turn off 'UTF-16 surrogate 0xd800' warnings
 
-use Encode;
-use Test::More tests => 10;
+use Test::More;
+use Encode qw(encode decode FB_CROAK LEAVE_SRC);
 
-my $valid   = "\x61\x00\x00\x00";
-my $invalid = "\x78\x56\x34\x12";
+my $script = quotemeta $0;
 
-my @warnings;
-$SIG{__WARN__} = sub {push @warnings, "@_"};
+plan tests => 12;
 
-my $enc = find_encoding("UTF32-LE");
+my @invalid;
 
-{
-    @warnings = ();
-    my $ret = Encode::Unicode::decode( $enc, $valid );
-    is("@warnings", "", "Calling decode in Encode::Unicode on valid string produces no warnings");
-}
+ok ! defined eval { encode('UTF-8', "\x{D800}", FB_CROAK | LEAVE_SRC) }, 'Surrogate codepoint \x{D800} is not encoded to strict UTF-8';
+like $@, qr/^"\\x\{d800\}" does not map to UTF-8 at $script line /, 'Error message contains strict UTF-8 name';
+@invalid = ();
+encode('UTF-8', "\x{D800}", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xD800 ], 'Fallback coderef contains invalid codepoint 0xD800';
 
+ok ! defined eval { decode('UTF-8', "\xed\xa0\x80", FB_CROAK | LEAVE_SRC) }, 'Surrogate UTF-8 byte sequence \xED\xA0\x80 is decoded with strict UTF-8 decoder';
+like $@, qr/^UTF-8 "\\xED\\xA0\\x80" does not map to Unicode at $script line /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
+@invalid = ();
+decode('UTF-8', "\xed\xa0\x80", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0, 0x80 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0, 0x80';
 
+ok ! defined eval { decode('UTF-8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with strict UTF-8 decoder';
+like $@, qr/^UTF-8 "\\xED\\xA0" does not map to Unicode at $script line /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
+@invalid = ();
+decode('UTF-8', "\xed\xa0", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
 
-{
-    @warnings = ();
-    my $ret = Encode::Unicode::decode( $enc, $invalid );
-    like("@warnings", qr/is not Unicode/, "Calling decode in Encode::Unicode on invalid string warns");
-}
-
-{
-    no warnings 'utf8';
-    @warnings = ();
-    my $ret = Encode::Unicode::decode( $enc, $invalid );
-    is("@warnings", "", "Warning from decode in Encode::Unicode can be silenced via no warnings 'utf8'");
-}
-
-{
-    no warnings;
-    @warnings = ();
-    my $ret = Encode::Unicode::decode( $enc, $invalid );
-    is("@warnings", "", "Warning from decode in Encode::Unicode can be silenced via no warnings");
-}
-
-
-
-{
-    @warnings = ();
-    my $ret = Encode::decode( $enc, $invalid );
-    like("@warnings", qr/is not Unicode/, "Calling decode in Encode on invalid string warns");
-}
-
-{
-    no warnings 'utf8';
-    @warnings = ();
-    my $ret = Encode::decode( $enc, $invalid );
-    is("@warnings", "", "Warning from decode in Encode can be silenced via no warnings 'utf8'");
-};
-
-{
-    no warnings;
-    @warnings = ();
-    my $ret = Encode::decode( $enc, $invalid );
-    is("@warnings", "", "Warning from decode in Encode can be silenced via no warnings");
-};
-
-
-
-{
-    @warnings = ();
-    my $inplace = $invalid;
-    Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
-    like("@warnings", qr/is not Unicode/, "Calling from_to in Encode on invalid string warns");
-}
-
-{
-    no warnings 'utf8';
-    @warnings = ();
-    my $inplace = $invalid;
-    Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
-    is("@warnings", "", "Warning from from_to in Encode can be silenced via no warnings 'utf8'");
-};
-
-{
-    no warnings;
-    @warnings = ();
-    my $inplace = $invalid;
-    Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
-    is("@warnings", "", "Warning from from_to in Encode can be silenced via no warnings");
-};
+ok ! defined eval { decode('utf8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with non-strict utf8 decoder';
+like $@, qr/^utf8 "\\xED\\xA0" does not map to Unicode at $script line /, 'Error message contains non-strict utf8 name and original (not decoded) invalid sequence';
+decode('utf8', "\xed\xa0", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
-- 
1.7.9.5

@p5pRT
Copy link
Author

p5pRT commented Jul 1, 2017

From @pali

0002-PerlIO-encoding-Use-Encode-ONLY_PRAGMA_WARNINGS-in-f.patch
From e4db61d9ea0a5f5a58f07f4b5214b879f20e4cc2 Mon Sep 17 00:00:00 2001
From: Pali <pali@cpan.org>
Date: Sat, 1 Jul 2017 12:34:25 +0200
Subject: [PATCH 2/2] PerlIO::encoding: Use Encode::ONLY_PRAGMA_WARNINGS in
 fallback by default

This would enable to respect utf8 warnings enabled/disabled by pramga
warnings when processing filehandle with :encoding layer.
---
 ext/PerlIO-encoding/encoding.pm |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ext/PerlIO-encoding/encoding.pm b/ext/PerlIO-encoding/encoding.pm
index 08d2df4..9e59a3c 100644
--- a/ext/PerlIO-encoding/encoding.pm
+++ b/ext/PerlIO-encoding/encoding.pm
@@ -14,7 +14,7 @@ require XSLoader;
 XSLoader::load();
 
 our $fallback =
-    Encode::PERLQQ()|Encode::WARN_ON_ERR()|Encode::STOP_AT_PARTIAL();
+    Encode::PERLQQ()|Encode::WARN_ON_ERR()|Encode::ONLY_PRAGMA_WARNINGS()|Encode::STOP_AT_PARTIAL();
 
 1;
 __END__
-- 
1.7.9.5

@p5pRT
Copy link
Author

p5pRT commented Jul 1, 2017

From @cpansprout

On Sat, 01 Jul 2017 04​:46​:15 -0700, pali@​cpan.org wrote​:

As stated in previous discussion I'm proposing new behaviour​:

* Introduce new Encode check flag Encode​::ONLY_PRAGMA_WARNINGS which
would tell Encode that it should report only those warnings which are
currently enabled by pragma warnings. When
Encode​::ONLY_PRAGMA_WARNINGS
is not set then Encode would report all warnings. The whole flag
Encode​::ONLY_PRAGMA_WARNINGS would have no effect when flag
Encode​::ENCODE_WARN_ON_ERR is not set.

* Add Encode​::ONLY_PRAGMA_WARNINGS by default to :encoding layer
variable $PerlIO​::encoding​::fallback, so by default every action on
filehandle used by :encoding layer would report warnings according to
pragma warnings.

As this change affects both Perl & externally maintained Encode module
on cpan, I'm opening ticket for Perl.

In attachments are patches implementing above proposed behaviour.
Encode
patch is based on the last Encode version 2.91 (not in bleed yet).

It think this is a great improvement. Thank you for all your work.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Jul 1, 2017

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jul 14, 2017

From @pali

On Saturday 01 July 2017 06​:08​:08 Father Chrysostomos via RT wrote​:

On Sat, 01 Jul 2017 04​:46​:15 -0700, pali@​cpan.org wrote​:

As stated in previous discussion I'm proposing new behaviour​:

* Introduce new Encode check flag Encode​::ONLY_PRAGMA_WARNINGS which
would tell Encode that it should report only those warnings which are
currently enabled by pragma warnings. When
Encode​::ONLY_PRAGMA_WARNINGS
is not set then Encode would report all warnings. The whole flag
Encode​::ONLY_PRAGMA_WARNINGS would have no effect when flag
Encode​::ENCODE_WARN_ON_ERR is not set.

* Add Encode​::ONLY_PRAGMA_WARNINGS by default to :encoding layer
variable $PerlIO​::encoding​::fallback, so by default every action on
filehandle used by :encoding layer would report warnings according to
pragma warnings.

As this change affects both Perl & externally maintained Encode module
on cpan, I'm opening ticket for Perl.

In attachments are patches implementing above proposed behaviour.
Encode
patch is based on the last Encode version 2.91 (not in bleed yet).

It think this is a great improvement. Thank you for all your work.

Hi! Any comments for this new proposed behavior or implementation in
patches? I would like to hear some feedback...

@p5pRT
Copy link
Author

p5pRT commented Aug 9, 2017

From @pali

On Saturday 01 July 2017 06​:08​:08 Father Chrysostomos via RT wrote​:

On Sat, 01 Jul 2017 04​:46​:15 -0700, pali@​cpan.org wrote​:

As stated in previous discussion I'm proposing new behaviour​:

* Introduce new Encode check flag Encode​::ONLY_PRAGMA_WARNINGS which
would tell Encode that it should report only those warnings which are
currently enabled by pragma warnings. When
Encode​::ONLY_PRAGMA_WARNINGS
is not set then Encode would report all warnings. The whole flag
Encode​::ONLY_PRAGMA_WARNINGS would have no effect when flag
Encode​::ENCODE_WARN_ON_ERR is not set.

* Add Encode​::ONLY_PRAGMA_WARNINGS by default to :encoding layer
variable $PerlIO​::encoding​::fallback, so by default every action on
filehandle used by :encoding layer would report warnings according to
pragma warnings.

As this change affects both Perl & externally maintained Encode module
on cpan, I'm opening ticket for Perl.

In attachments are patches implementing above proposed behaviour.
Encode
patch is based on the last Encode version 2.91 (not in bleed yet).

It think this is a great improvement. Thank you for all your work.

@​Father Chrysostomos​: Now patches and there fore more then month.
Any comments for either new behavior or its implementation?

I would like to hear your opinion... or if it is OK without any problem
could it be merged?

@p5pRT
Copy link
Author

p5pRT commented Aug 10, 2017

From @Leont

On Sat, Jul 1, 2017 at 1​:46 PM, via RT <perlbug-followup@​perl.org> wrote​:

Hi!

I would continue in discussion started in last year in p5p mailing list​:
https://www.nntp.perl.org/group/perl.perl5.porters/2016/08/msg239061.html

Currently there is big mess with reporting warnings from
Encode​::encode() and Encode​::decode() functions implemented by Encode
module which is part of Perl core modules.

Moreover mess is bigger, because Encode is used by PerlIO​::encoding
module which is internal Perl module.

In perl you can enable some utf8 warnings by pragma warning. Next for
Encode​::encode() and Encode​::decode() functions you can enable some
warnings by Encode​::ENCODE_WARN_ON_ERR check flag passed as optional
argument to those functions. And third you can control some warnings via
$PerlIO​::encoding​::fallback variable which is relevant for :encoding
layer (which internally use Encode module and their encode/decode
functions).

This lead to couple of bug reports like Encode​::encode() or decode()
does not respect pragma warnings, or it does not respect check flag
Encode​::ENCODE_WARN_ON_ERR passed as argument. Some people hacked for
some modules (e.g. UTF-16) to use only pragma warnings, which basically
fully broke Encode​::ENCODE_WARN_ON_ERR check flag. More complicated
behaviour happened when using :encoding layer...

https://rt.cpan.org/Public/Bug/Display.html?id=120505
https://rt.cpan.org/Public/Bug/Display.html?id=88592
dankogai/p5-encode#26 (comment)
https://rt.perl.org/Public/Bug/Display.html?id=128788
https://rt.cpan.org/Public/Bug/Display.html?id=116629
dankogai/p5-encode#59
https://github.com/dankogai/p5-encode/commit/
a6c2ba385875c2c03bd42350e23aef0188fb23b0
https://github.com/dankogai/p5-encode/commit/
07c8adb58e55c7cf66b3d6673bf50010fe1a69ea

As stated in previous discussion I'm proposing new behaviour​:

* Introduce new Encode check flag Encode​::ONLY_PRAGMA_WARNINGS which
would tell Encode that it should report only those warnings which are
currently enabled by pragma warnings. When Encode​::ONLY_PRAGMA_WARNINGS
is not set then Encode would report all warnings. The whole flag
Encode​::ONLY_PRAGMA_WARNINGS would have no effect when flag
Encode​::ENCODE_WARN_ON_ERR is not set.

* Add Encode​::ONLY_PRAGMA_WARNINGS by default to :encoding layer
variable $PerlIO​::encoding​::fallback, so by default every action on
filehandle used by :encoding layer would report warnings according to
pragma warnings.

As this change affects both Perl & externally maintained Encode module
on cpan, I'm opening ticket for Perl.

In attachments are patches implementing above proposed behaviour. Encode
patch is based on the last Encode version 2.91 (not in bleed yet).

IMHO $PerlIO​::encoding​::fallback is a liability, and instead of digging a
deeper hole we should probably fix it first. In particular​:

  local $PerlIO​::encoding​::fallback = FB_CROAK;
  open my $fh, '<​:encoding(utf-8)', $filename

May or may not DWIM depending on whether PerlIO​::encoding was already
loaded or not. Relying even further on it having a non-zero value doesn't
feel wise to me.

Leon

@p5pRT
Copy link
Author

p5pRT commented Aug 14, 2017

From @pali

On Thursday 10 August 2017 07​:58​:22 Leon Timmermans via RT wrote​:

On Sat, Jul 1, 2017 at 1​:46 PM, via RT <perlbug-followup@​perl.org> wrote​:

Hi!

I would continue in discussion started in last year in p5p mailing list​:
https://www.nntp.perl.org/group/perl.perl5.porters/2016/08/msg239061.html

Currently there is big mess with reporting warnings from
Encode​::encode() and Encode​::decode() functions implemented by Encode
module which is part of Perl core modules.

Moreover mess is bigger, because Encode is used by PerlIO​::encoding
module which is internal Perl module.

In perl you can enable some utf8 warnings by pragma warning. Next for
Encode​::encode() and Encode​::decode() functions you can enable some
warnings by Encode​::ENCODE_WARN_ON_ERR check flag passed as optional
argument to those functions. And third you can control some warnings via
$PerlIO​::encoding​::fallback variable which is relevant for :encoding
layer (which internally use Encode module and their encode/decode
functions).

This lead to couple of bug reports like Encode​::encode() or decode()
does not respect pragma warnings, or it does not respect check flag
Encode​::ENCODE_WARN_ON_ERR passed as argument. Some people hacked for
some modules (e.g. UTF-16) to use only pragma warnings, which basically
fully broke Encode​::ENCODE_WARN_ON_ERR check flag. More complicated
behaviour happened when using :encoding layer...

https://rt.cpan.org/Public/Bug/Display.html?id=120505
https://rt.cpan.org/Public/Bug/Display.html?id=88592
dankogai/p5-encode#26 (comment)
https://rt.perl.org/Public/Bug/Display.html?id=128788
https://rt.cpan.org/Public/Bug/Display.html?id=116629
dankogai/p5-encode#59
https://github.com/dankogai/p5-encode/commit/
a6c2ba385875c2c03bd42350e23aef0188fb23b0
https://github.com/dankogai/p5-encode/commit/
07c8adb58e55c7cf66b3d6673bf50010fe1a69ea

As stated in previous discussion I'm proposing new behaviour​:

* Introduce new Encode check flag Encode​::ONLY_PRAGMA_WARNINGS which
would tell Encode that it should report only those warnings which are
currently enabled by pragma warnings. When Encode​::ONLY_PRAGMA_WARNINGS
is not set then Encode would report all warnings. The whole flag
Encode​::ONLY_PRAGMA_WARNINGS would have no effect when flag
Encode​::ENCODE_WARN_ON_ERR is not set.

* Add Encode​::ONLY_PRAGMA_WARNINGS by default to :encoding layer
variable $PerlIO​::encoding​::fallback, so by default every action on
filehandle used by :encoding layer would report warnings according to
pragma warnings.

As this change affects both Perl & externally maintained Encode module
on cpan, I'm opening ticket for Perl.

In attachments are patches implementing above proposed behaviour. Encode
patch is based on the last Encode version 2.91 (not in bleed yet).

IMHO $PerlIO​::encoding​::fallback is a liability, and instead of digging a
deeper hole we should probably fix it first. In particular​:

local $PerlIO​::encoding​::fallback = FB_CROAK;
open my $fh, '<​:encoding(utf-8)', $filename

IMO this is different problem, not related to my proposed patches, which
are about how to handle warnings.

May or may not DWIM depending on whether PerlIO​::encoding was already
loaded or not. Relying even further on it having a non-zero value doesn't
feel wise to me.

I agree that above :encoding layer without STOP AT PARTIAL flag in
fallback should be fixed, but I think all those can be done
independently of this effort about warnings done in this ticket.

@p5pRT
Copy link
Author

p5pRT commented Aug 21, 2017

From @pali

On Monday 14 August 2017 10​:19​:58 pali@​cpan.org wrote​:

On Thursday 10 August 2017 07​:58​:22 Leon Timmermans via RT wrote​:

On Sat, Jul 1, 2017 at 1​:46 PM, via RT <perlbug-followup@​perl.org> wrote​:

Hi!

I would continue in discussion started in last year in p5p mailing list​:
https://www.nntp.perl.org/group/perl.perl5.porters/2016/08/msg239061.html

Currently there is big mess with reporting warnings from
Encode​::encode() and Encode​::decode() functions implemented by Encode
module which is part of Perl core modules.

Moreover mess is bigger, because Encode is used by PerlIO​::encoding
module which is internal Perl module.

In perl you can enable some utf8 warnings by pragma warning. Next for
Encode​::encode() and Encode​::decode() functions you can enable some
warnings by Encode​::ENCODE_WARN_ON_ERR check flag passed as optional
argument to those functions. And third you can control some warnings via
$PerlIO​::encoding​::fallback variable which is relevant for :encoding
layer (which internally use Encode module and their encode/decode
functions).

This lead to couple of bug reports like Encode​::encode() or decode()
does not respect pragma warnings, or it does not respect check flag
Encode​::ENCODE_WARN_ON_ERR passed as argument. Some people hacked for
some modules (e.g. UTF-16) to use only pragma warnings, which basically
fully broke Encode​::ENCODE_WARN_ON_ERR check flag. More complicated
behaviour happened when using :encoding layer...

https://rt.cpan.org/Public/Bug/Display.html?id=120505
https://rt.cpan.org/Public/Bug/Display.html?id=88592
dankogai/p5-encode#26 (comment)
https://rt.perl.org/Public/Bug/Display.html?id=128788
https://rt.cpan.org/Public/Bug/Display.html?id=116629
dankogai/p5-encode#59
https://github.com/dankogai/p5-encode/commit/
a6c2ba385875c2c03bd42350e23aef0188fb23b0
https://github.com/dankogai/p5-encode/commit/
07c8adb58e55c7cf66b3d6673bf50010fe1a69ea

As stated in previous discussion I'm proposing new behaviour​:

* Introduce new Encode check flag Encode​::ONLY_PRAGMA_WARNINGS which
would tell Encode that it should report only those warnings which are
currently enabled by pragma warnings. When Encode​::ONLY_PRAGMA_WARNINGS
is not set then Encode would report all warnings. The whole flag
Encode​::ONLY_PRAGMA_WARNINGS would have no effect when flag
Encode​::ENCODE_WARN_ON_ERR is not set.

* Add Encode​::ONLY_PRAGMA_WARNINGS by default to :encoding layer
variable $PerlIO​::encoding​::fallback, so by default every action on
filehandle used by :encoding layer would report warnings according to
pragma warnings.

As this change affects both Perl & externally maintained Encode module
on cpan, I'm opening ticket for Perl.

In attachments are patches implementing above proposed behaviour. Encode
patch is based on the last Encode version 2.91 (not in bleed yet).

IMHO $PerlIO​::encoding​::fallback is a liability, and instead of digging a
deeper hole we should probably fix it first. In particular​:

local $PerlIO​::encoding​::fallback = FB_CROAK;
open my $fh, '<​:encoding(utf-8)', $filename

IMO this is different problem, not related to my proposed patches, which
are about how to handle warnings.

May or may not DWIM depending on whether PerlIO​::encoding was already
loaded or not. Relying even further on it having a non-zero value doesn't
feel wise to me.

I agree that above :encoding layer without STOP AT PARTIAL flag in
fallback should be fixed, but I think all those can be done
independently of this effort about warnings done in this ticket.

And the goal of those changes is how Encode handle warnings. PerlIO is
related just because $PerlIO​::encoding​::fallback is affected by Encode
changes.

I would like to move forward and would like to hear if those Encode
changes together with extending Encode flags and default value for
$PerlIO​::encoding​::fallback are OK, or if changes needs to be reworked
... or if whole idea for fixing those problems is wrong.

@p5pRT
Copy link
Author

p5pRT commented Aug 28, 2017

From @tonycoz

On Mon, 21 Aug 2017 06​:25​:01 -0700, pali@​cpan.org wrote​:

And the goal of those changes is how Encode handle warnings. PerlIO is
related just because $PerlIO​::encoding​::fallback is affected by Encode
changes.

I would like to move forward and would like to hear if those Encode
changes together with extending Encode flags and default value for
$PerlIO​::encoding​::fallback are OK, or if changes needs to be reworked
... or if whole idea for fixing those problems is wrong.

Most of this could be fixed by PerlIO​::encoding being a bit smarter with the check value - only setting WARN_ON_ERR when ckWARN(WARN_UTF8) is true.

The only issue would be the utf8 subcategory warnings, like for surrogates, which your Encode patch goes to a lot of effort to pass through.

But a lot of that effort is wasted, for example​:

@​@​ -407,23 +439,29 @​@​ CODE​:
  }
  while (s < e && s+UTF8SKIP(s) <= e) {
  STRLEN len;
- UV ord = utf8n_to_uvuni(s, e-s, &len, (UTF8_DISALLOW_SURROGATE
- |UTF8_WARN_SURROGATE
- |UTF8_DISALLOW_FE_FF
- |UTF8_WARN_FE_FF
- |UTF8_WARN_NONCHAR));
- s += len;
- if (size != 4 && invalid_ucs2(ord)) {
+ U32 flags = UTF8_DISALLOW_ILLEGAL_INTERCHANGE;
+ if (encode_ckWARN(check, WARN_NON_UNICODE)) flags |= UTF8_WARN_SUPER;
+ if (encode_ckWARN(check, WARN_SURROGATE)) flags |= UTF8_WARN_SURROGATE;
+ if (encode_ckWARN(check, WARN_NONCHAR)) flags |= UTF8_WARN_NONCHAR;
+ UV ord = utf8n_to_uvuni(s, e-s, &len, flags);
+ if ((size != 4 && invalid_ucs2(ord)) || (ord == 0 && *s != 0)) {

utf8n_to_uvuni() will only warns if those warnings are lexically enabled, so here you're adding extra checks for each category that aren't needed.

The same is true for the calls to uvuni_to_utf8_flags().

In another case you're adding a completely new warning​:

+ if (encode_ckWARN(check, WARN_NONCHAR)) {
+ warner(packWARN(WARN_NONCHAR),
+ "%" SVf "​:Unicode character %" UVxf " is illegal",
+ *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+ ord);
+ }
+ ord = FBCHAR;
  }

which could probably just be made lexically scoped whether the new flag is set or not, since some of the others will be made so due to the changes to decode() and encode().

Of course, that change might be considered a backward incompatibility, since some warnings that were previously produced (because Encode does C<use warnings;> might no longer be (since the new scope might not.)

Tony

@p5pRT
Copy link
Author

p5pRT commented Aug 28, 2017

From @pali

On Sunday 27 August 2017 18​:38​:15 Tony Cook via RT wrote​:

On Mon, 21 Aug 2017 06​:25​:01 -0700, pali@​cpan.org wrote​:

And the goal of those changes is how Encode handle warnings. PerlIO is
related just because $PerlIO​::encoding​::fallback is affected by Encode
changes.

I would like to move forward and would like to hear if those Encode
changes together with extending Encode flags and default value for
$PerlIO​::encoding​::fallback are OK, or if changes needs to be reworked
... or if whole idea for fixing those problems is wrong.

Most of this could be fixed by PerlIO​::encoding being a bit smarter with the check value - only setting WARN_ON_ERR when ckWARN(WARN_UTF8) is true.

Seems yes.

The only issue would be the utf8 subcategory warnings, like for surrogates, which your Encode patch goes to a lot of effort to pass through.

Right, with my approach for Encode​::ONLY_PRAGMA_WARNINGS, warnings would
be enabled/disabled according to warnings pragma. Like in other parts of
perl.

But a lot of that effort is wasted, for example​:

@​@​ -407,23 +439,29 @​@​ CODE​:
}
while (s < e && s+UTF8SKIP(s) <= e) {
STRLEN len;
- UV ord = utf8n_to_uvuni(s, e-s, &len, (UTF8_DISALLOW_SURROGATE
- |UTF8_WARN_SURROGATE
- |UTF8_DISALLOW_FE_FF
- |UTF8_WARN_FE_FF
- |UTF8_WARN_NONCHAR));
- s += len;
- if (size != 4 && invalid_ucs2(ord)) {
+ U32 flags = UTF8_DISALLOW_ILLEGAL_INTERCHANGE;
+ if (encode_ckWARN(check, WARN_NON_UNICODE)) flags |= UTF8_WARN_SUPER;
+ if (encode_ckWARN(check, WARN_SURROGATE)) flags |= UTF8_WARN_SURROGATE;
+ if (encode_ckWARN(check, WARN_NONCHAR)) flags |= UTF8_WARN_NONCHAR;
+ UV ord = utf8n_to_uvuni(s, e-s, &len, flags);
+ if ((size != 4 && invalid_ucs2(ord)) || (ord == 0 && *s != 0)) {

utf8n_to_uvuni() will only warns if those warnings are lexically enabled, so here you're adding extra checks for each category that aren't needed.

The same is true for the calls to uvuni_to_utf8_flags().

In another case you're adding a completely new warning​:

+ if (encode_ckWARN(check, WARN_NONCHAR)) {
+ warner(packWARN(WARN_NONCHAR),
+ "%" SVf "​:Unicode character %" UVxf " is illegal",
+ *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+ ord);
+ }
+ ord = FBCHAR;
}

which could probably just be made lexically scoped whether the new flag is set or not, since some of the others will be made so due to the changes to decode() and encode().

All above warning should be sent when user calls Encode with FB_WARN
bit, independently of lexical warnings. This is how all other Encode
module works, also in this way is Encode API designed and documented.

Of course, that change might be considered a backward incompatibility, since some warnings that were previously produced (because Encode does C<use warnings;> might no longer be (since the new scope might not.)

Fixing this bug could be mean as backward incompatible. But current
behaviour does not make sense as it is not possible to correctly enable
(or disable) warnings for Encode module.

Basically above code which you quote is implementation of UTF-16
encoding. Code for UTF-8 or Latin-X is in different module. So there is
a big inconsistency between UTF-8 and UTF-16 and so it is hard to use.

Common operation is​: I got binary data from 3rd module (or network) and
it was told me that those data are UTF-16 encoded. I do not know if they
are really UTF-16 encoded, so I would run Encode module to silently
(without warnings) decodes them. If I need to be sure, I can run Encode
with FB_CROAK and handle exceptions...

@p5pRT
Copy link
Author

p5pRT commented Aug 30, 2017

From @tonycoz

From a perl build point of view there's a couple of issues​:

a) the MANIFEST is missing the new file and still contains the deleted file.

b) there's two compilation errors in the build​:

Unicode.xs​: In function ‘XS_Encode__Unicode_decode’​:
Unicode.xs​:342​:2​: error​: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
  U32 flags = 0;
  ^
Unicode.xs​: In function ‘XS_Encode__Unicode_encode’​:
Unicode.xs​:446​:2​: error​: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
  UV ord = utf8n_to_uvuni(s, e-s, &len, flags);
  ^

On Mon, 28 Aug 2017 00​:50​:49 -0700, pali@​cpan.org wrote​:

All above warning should be sent when user calls Encode with FB_WARN
bit, independently of lexical warnings. This is how all other Encode
module works, also in this way is Encode API designed and documented.

The problem is your patch makes these warnings lexically scoped with *only* FB_WARN​:

$ ./perl -Ilib -MEncode -le 'no warnings qw(surrogate); my $x = encode("utf-16", (my $y = chr(0xdc10)), &Encode​::FB_WARN); print "no croak"'
no croak

$ ./perl -Ilib -MEncode -le 'my $x = encode("utf-16", (my $y = chr(0xdc10)), &Encode​::FB_WARN); print "no croak"'
UTF-16 surrogate U+DC10 in goto at -e line 1.
no croak

The utf8n_to_uvuni() function *already* checks that these warnings are lexically enabled. Also, it only checks them when it actually finds the problem being reported.

Your code however calculates flags whether or not there's an error, and currently does it on every character encoded. When ONLY_PRAGMA_WARNINGS is set this results in three calls to Perl_ckwarn() inside the encoding loop. (This code should be using ckWARN_d() since these are default on warnings, but it should just set them all and let the API do the checking it does anyway.)

An issue with changes to Encode​::encode() as you might see above is the opaqueness of the warning​:

  UTF-16 surrogate U+DC10 in goto at -e line 1.

My code doesn't contain a goto. I think the only real fix to this would be to rewrite Encode​::encode/decode in C.

Tony

@p5pRT
Copy link
Author

p5pRT commented Aug 30, 2017

From @pali

On Tuesday 29 August 2017 18​:46​:43 Tony Cook via RT wrote​:

From a perl build point of view there's a couple of issues​:

a) the MANIFEST is missing the new file and still contains the deleted file.

That is something which would be fixed when I start preparing pull
request to upstream Encode.

b) there's two compilation errors in the build​:

Unicode.xs​: In function ‘XS_Encode__Unicode_decode’​:
Unicode.xs​:342​:2​: error​: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
U32 flags = 0;
^
Unicode.xs​: In function ‘XS_Encode__Unicode_encode’​:
Unicode.xs​:446​:2​: error​: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
UV ord = utf8n_to_uvuni(s, e-s, &len, flags);
^

Ok, that can be fixed.

On Mon, 28 Aug 2017 00​:50​:49 -0700, pali@​cpan.org wrote​:

All above warning should be sent when user calls Encode with FB_WARN
bit, independently of lexical warnings. This is how all other Encode
module works, also in this way is Encode API designed and documented.

The problem is your patch makes these warnings lexically scoped with *only* FB_WARN​:

$ ./perl -Ilib -MEncode -le 'no warnings qw(surrogate); my $x = encode("utf-16", (my $y = chr(0xdc10)), &Encode​::FB_WARN); print "no croak"'
no croak

$ ./perl -Ilib -MEncode -le 'my $x = encode("utf-16", (my $y = chr(0xdc10)), &Encode​::FB_WARN); print "no croak"'
UTF-16 surrogate U+DC10 in goto at -e line 1.
no croak

The utf8n_to_uvuni() function *already* checks that these warnings are lexically enabled. Also, it only checks them when it actually finds the problem being reported.

Ah :-( So we need utf8n_to_uvuni() function which report warnings even
when they are lexically disabled.

Your code however calculates flags whether or not there's an error, and currently does it on every character encoded. When ONLY_PRAGMA_WARNINGS is set this results in three calls to Perl_ckwarn() inside the encoding loop. (This code should be using ckWARN_d() since these are default on warnings, but it should just set them all and let the API do the checking it does anyway.)

An issue with changes to Encode​::encode() as you might see above is the opaqueness of the warning​:

UTF-16 surrogate U+DC10 in goto at -e line 1.

Yes, this is something which needs to be fixed.

My code doesn't contain a goto. I think the only real fix to this would be to rewrite Encode​::encode/decode in C.

Do you have your implementation? At least I do not see any other patch
in this ticket.

I used goto in Encode​::encode/decode dispatcher functions, so warnings
from modules would be correctly propagated to caller of the
Encode​::encode/decode function.

It is possible to reimplement Encode​::encode/decode function in C/XS in
same way how it is implemented in my patch, but without that warning
from goto?

@p5pRT
Copy link
Author

p5pRT commented Sep 11, 2017

From @tonycoz

On Wed, 30 Aug 2017 02​:22​:26 -0700, pali@​cpan.org wrote​:

On Tuesday 29 August 2017 18​:46​:43 Tony Cook via RT wrote​:

b) there's two compilation errors in the build​:

Unicode.xs​: In function ‘XS_Encode__Unicode_decode’​:
Unicode.xs​:342​:2​: error​: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]
U32 flags = 0;
^
Unicode.xs​: In function ‘XS_Encode__Unicode_encode’​:
Unicode.xs​:446​:2​: error​: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]
UV ord = utf8n_to_uvuni(s, e-s, &len, flags);
^

Ok, that can be fixed.

On Mon, 28 Aug 2017 00​:50​:49 -0700, pali@​cpan.org wrote​:

All above warning should be sent when user calls Encode with
FB_WARN
bit, independently of lexical warnings. This is how all other
Encode
module works, also in this way is Encode API designed and
documented.

The problem is your patch makes these warnings lexically scoped with
*only* FB_WARN​:

$ ./perl -Ilib -MEncode -le 'no warnings qw(surrogate); my $x =
encode("utf-16", (my $y = chr(0xdc10)), &Encode​::FB_WARN); print "no
croak"'
no croak

$ ./perl -Ilib -MEncode -le 'my $x = encode("utf-16", (my $y =
chr(0xdc10)), &Encode​::FB_WARN); print "no croak"'
UTF-16 surrogate U+DC10 in goto at -e line 1.
no croak

The utf8n_to_uvuni() function *already* checks that these warnings
are lexically enabled. Also, it only checks them when it actually
finds the problem being reported.

Ah :-( So we need utf8n_to_uvuni() function which report warnings even
when they are lexically disabled.

Your code however calculates flags whether or not there's an error,
and currently does it on every character encoded. When
ONLY_PRAGMA_WARNINGS is set this results in three calls to
Perl_ckwarn() inside the encoding loop. (This code should be using
ckWARN_d() since these are default on warnings, but it should just
set them all and let the API do the checking it does anyway.)

An issue with changes to Encode​::encode() as you might see above is
the opaqueness of the warning​:

UTF-16 surrogate U+DC10 in goto at -e line 1.

Yes, this is something which needs to be fixed.

My code doesn't contain a goto. I think the only real fix to this
would be to rewrite Encode​::encode/decode in C.

Do you have your implementation? At least I do not see any other patch
in this ticket.

The "My code" I was referring to is the one-liner above. I meant it would confuse the user whose code doesn't include a goto (just a call to encode/decode()).

I used goto in Encode​::encode/decode dispatcher functions, so warnings
from modules would be correctly propagated to caller of the
Encode​::encode/decode function.

I guessed that, but the existing encode/decode functions already has code that handles that​:

  ...
  my $octets;
  if ( ref($enc) eq 'Encode​::Unicode' ) {
  my $warn = '';
  {
  local $SIG{__WARN__} = sub { $warn = shift };
  $octets = $enc->encode( $string, $check );
  }
  warnings​::warnif('utf8', $warn) if length $warn;
  }
  ...

It is possible to reimplement Encode​::encode/decode function in C/XS
in
same way how it is implemented in my patch, but without that warning
from goto?

The most complex part would probably be the call to find_encoding(), the rest I think is relatively simple.

Tony

@p5pRT
Copy link
Author

p5pRT commented Sep 11, 2017

From @pali

On Sunday 10 September 2017 17​:01​:40 Tony Cook via RT wrote​:

The "My code" I was referring to is the one-liner above. I meant it would confuse the user whose code doesn't include a goto (just a call to encode/decode()).

So, you are referring to that unexpected warning with "goto" in its
message, right? Therefore I asked if it is possible to reimplement that
function in XS without possibility that such warning would be thrown.

I used goto in Encode​::encode/decode dispatcher functions, so warnings
from modules would be correctly propagated to caller of the
Encode​::encode/decode function.

I guessed that, but the existing encode/decode functions already has code that handles that​:

\.\.\.
my $octets;
if \( ref\($enc\) eq 'Encode&#8203;::Unicode' \) \{
    my $warn = '';
    \{
        local $SIG\{\_\_WARN\_\_\} = sub \{ $warn = shift \};
        $octets = $enc\->encode\( $string\, $check \);
    \}
    warnings&#8203;::warnif\('utf8'\, $warn\) if length $warn;
\}
\.\.\.

Yes, but that code is wrong for more reasons.

E.g. https://rt.cpan.org/Public/Bug/Display.html?id=120505
Also because it catch only 'utf8' warnings and also because it check
only UTF-16/UTF-32 encodings, not UTF-8 or some Latin-X.

My approach try to use goto to run encode function in current context,
so warning/error messages would be properly propagated to caller.

And introduce Encode​::ENCODE_WARN_ON_ERR, so above bug 120505 can be
fixed too.

It is possible to reimplement Encode​::encode/decode function in C/XS
in
same way how it is implemented in my patch, but without that warning
from goto?

The most complex part would probably be the call to find_encoding(), the rest I think is relatively simple.

Can you show me simple example how to write that "goto" part?

@p5pRT
Copy link
Author

p5pRT commented Sep 12, 2017

From @tonycoz

On Mon, 11 Sep 2017 00​:24​:32 -0700, pali@​cpan.org wrote​:

On Sunday 10 September 2017 17​:01​:40 Tony Cook via RT wrote​:

The "My code" I was referring to is the one-liner above. I meant it
would confuse the user whose code doesn't include a goto (just a call
to encode/decode()).

So, you are referring to that unexpected warning with "goto" in its
message, right? Therefore I asked if it is possible to reimplement
that
function in XS without possibility that such warning would be thrown.

The warning would only be thrown if the code was called where that warning is enabled.

I used goto in Encode​::encode/decode dispatcher functions, so
warnings
from modules would be correctly propagated to caller of the
Encode​::encode/decode function.

I guessed that, but the existing encode/decode functions already has
code that handles that​:

...
my $octets;
if ( ref($enc) eq 'Encode​::Unicode' ) {
my $warn = '';
{
local $SIG{__WARN__} = sub { $warn = shift };
$octets = $enc->encode( $string, $check );
}
warnings​::warnif('utf8', $warn) if length $warn;
}
...

Yes, but that code is wrong for more reasons.

E.g. https://rt.cpan.org/Public/Bug/Display.html?id=120505
Also because it catch only 'utf8' warnings and also because it check
only UTF-16/UTF-32 encodings, not UTF-8 or some Latin-X.

My approach try to use goto to run encode function in current context,
so warning/error messages would be properly propagated to caller.

Yes, that would no longer be necessary if the functions were written as XS.

And introduce Encode​::ENCODE_WARN_ON_ERR, so above bug 120505 can be
fixed too.

Did you mean ONLY_PRAGMA_WARNINGS here?

One problem with implementing this as XS is the same problem as the goto solution - the call to utf8n_to_uvuni() would only produce warnings if the current context has utf8 warnings enabled, regardless of ONLY_PRAGMA_WARNINGS.

It is possible to reimplement Encode​::encode/decode function in
C/XS
in
same way how it is implemented in my patch, but without that
warning
from goto?

The most complex part would probably be the call to find_encoding(),
the rest I think is relatively simple.

Can you show me simple example how to write that "goto" part?

There's no goto involved, the XS simply doesn't have a new context to control the warnings, so the caller's warnings flags are what matter.

Tony

@p5pRT
Copy link
Author

p5pRT commented Sep 12, 2017

From @khwilliamson

On 09/11/2017 06​:13 PM, Tony Cook via RT wrote​:

One problem with implementing this as XS is the same problem as the goto solution - the call to utf8n_to_uvuni() would only produce warnings if the current context has utf8 warnings enabled, regardless of ONLY_PRAGMA_WARNINGS.

Note that there is a new API in 5.26 which allows you to get a bit field
returned with a bit set for every error encountered in the input character.

As it says in the docs​:

To do your own error handling, call this function with the
C<UTF8_CHECK_ONLY>
flag to suppress any warnings, and then examine the C<*errors> return.

  UV utf8n_to_uvchr_error(const U8 *s, STRLEN curlen,
  STRLEN *retlen,
  const U32 flags,
  U32 * errors)

If we get Devel​::PPPort maintained, I had hoped to put it there.

@p5pRT
Copy link
Author

p5pRT commented Sep 12, 2017

From @pali

On Monday 11 September 2017 17​:13​:17 Tony Cook via RT wrote​:

On Mon, 11 Sep 2017 00​:24​:32 -0700, pali@​cpan.org wrote​:

On Sunday 10 September 2017 17​:01​:40 Tony Cook via RT wrote​:

The "My code" I was referring to is the one-liner above. I meant it
would confuse the user whose code doesn't include a goto (just a call
to encode/decode()).

So, you are referring to that unexpected warning with "goto" in its
message, right? Therefore I asked if it is possible to reimplement
that
function in XS without possibility that such warning would be thrown.

The warning would only be thrown if the code was called where that warning is enabled.

I used goto in Encode​::encode/decode dispatcher functions, so
warnings
from modules would be correctly propagated to caller of the
Encode​::encode/decode function.

I guessed that, but the existing encode/decode functions already has
code that handles that​:

...
my $octets;
if ( ref($enc) eq 'Encode​::Unicode' ) {
my $warn = '';
{
local $SIG{__WARN__} = sub { $warn = shift };
$octets = $enc->encode( $string, $check );
}
warnings​::warnif('utf8', $warn) if length $warn;
}
...

Yes, but that code is wrong for more reasons.

E.g. https://rt.cpan.org/Public/Bug/Display.html?id=120505
Also because it catch only 'utf8' warnings and also because it check
only UTF-16/UTF-32 encodings, not UTF-8 or some Latin-X.

My approach try to use goto to run encode function in current context,
so warning/error messages would be properly propagated to caller.

Yes, that would no longer be necessary if the functions were written as XS.

Exactly.

And introduce Encode​::ENCODE_WARN_ON_ERR, so above bug 120505 can be
fixed too.

Did you mean ONLY_PRAGMA_WARNINGS here?

Right, I did copy+paste error.

One problem with implementing this as XS is the same problem as the goto solution - the call to utf8n_to_uvuni() would only produce warnings if the current context has utf8 warnings enabled, regardless of ONLY_PRAGMA_WARNINGS.

In XS, context warnings could be temporary changed.

It is possible to reimplement Encode​::encode/decode function in
C/XS
in
same way how it is implemented in my patch, but without that
warning
from goto?

The most complex part would probably be the call to find_encoding(),
the rest I think is relatively simple.

Can you show me simple example how to write that "goto" part?

There's no goto involved, the XS simply doesn't have a new context to control the warnings, so the caller's warnings flags are what matter.

Ok, then it should be just call_sv() instead of perl's goto, right?

@p5pRT
Copy link
Author

p5pRT commented Sep 12, 2017

From @pali

On Monday 11 September 2017 18​:24​:43 karl williamson via RT wrote​:

On 09/11/2017 06​:13 PM, Tony Cook via RT wrote​:

One problem with implementing this as XS is the same problem as the goto solution - the call to utf8n_to_uvuni() would only produce warnings if the current context has utf8 warnings enabled, regardless of ONLY_PRAGMA_WARNINGS.

Note that there is a new API in 5.26 which allows you to get a bit field
returned with a bit set for every error encountered in the input character.

As it says in the docs​:

To do your own error handling, call this function with the
C<UTF8_CHECK_ONLY>
flag to suppress any warnings, and then examine the C<*errors> return.

UV    utf8n\_to\_uvchr\_error\(const U8 \*s\, STRLEN curlen\,
                         STRLEN \*retlen\,
                         const U32 flags\,
                         U32 \* errors\)

That would really help! Thanks for pointer.

If we get Devel​::PPPort maintained, I had hoped to put it there.

But Devel​::PPPort is unmaintained... What about taking it into p5p? Or
forking it?

@p5pRT
Copy link
Author

p5pRT commented Sep 12, 2017

From @pali

In attachment you can find patch with rewritten encode/decode/from_to
functions. Seems it was not a big problem.

With this patch there is no goto in warning messages...

$ perl -Iblib/lib -Iblib/arch -MEncode -le 'my $x = encode("utf-16", (my $y = chr(0xdc10)),
&Encode​::FB_WARN); print "no croak"'

UTF-16 surrogate U+DC10 in subroutine entry at -e line 1.
no croak

@p5pRT
Copy link
Author

p5pRT commented Sep 12, 2017

From @pali

0003-Rewrite-encode-decode-encode_utf8-decode_utf8-and-fr.patch
From 8e44ad64885c76f49e155ddb8cf6c5fcb5a2a011 Mon Sep 17 00:00:00 2001
From: Pali <pali@cpan.org>
Date: Wed, 13 Sep 2017 00:30:29 +0200
Subject: [PATCH] Rewrite encode, decode, encode_utf8, decode_utf8 and from_to
 functions to XS

---
 Encode.pm |   68 ---------------------
 Encode.xs |  196 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 196 insertions(+), 68 deletions(-)

diff --git a/Encode.pm b/Encode.pm
index ce30bd5..6ed4a77 100644
--- a/Encode.pm
+++ b/Encode.pm
@@ -171,74 +171,6 @@ sub clone_encoding($) {
     return Storable::dclone($obj);
 }
 
-sub encode($$;$) {
-    my $name = $_[0];
-    my $check = $_[2];
-    Carp::croak("Encoding name should not be undef") unless defined $name;
-    my $enc = find_encoding($name);
-    Carp::croak("Unknown encoding '$name'") unless defined $enc;
-    my $encode = $enc->can('encode');
-    Carp::croak("No function 'encode' for encoding '$name'") unless defined $encode;
-    $check ||= 0;
-    splice(@_, 0, 1, $enc);
-    if (ref $check or !$check or ($check & LEAVE_SRC)) {
-        my $string = $_[1];
-        splice(@_, 1, 1, $string);
-    }
-    splice(@_, 2, 1, $check);
-    goto &$encode;
-}
-*str2bytes = \&encode;
-
-sub decode($$;$) {
-    my $name = $_[0];
-    my $check = $_[2];
-    Carp::croak("Encoding name should not be undef") unless defined $name;
-    my $enc = find_encoding($name);
-    Carp::croak("Unknown encoding '$name'") unless defined $enc;
-    my $decode = $enc->can('decode');
-    Carp::croak("No function 'decode' for encoding '$name'") unless defined $decode;
-    $check ||= 0;
-    splice(@_, 0, 1, $enc);
-    if (ref $check or !$check or ($check & LEAVE_SRC)) {
-        my $octets = $_[1];
-        splice(@_, 1, 1, $octets);
-    }
-    splice(@_, 2, 1, $check);
-    goto &$decode;
-}
-*bytes2str = \&decode;
-
-sub from_to($$$;$) {
-    my ( $string, $from, $to, $check ) = @_;
-    Carp::croak("Encoding name should not be undef") unless defined $from and defined $to;
-    my $f = find_encoding($from);
-    Carp::croak("Unknown encoding '$from'") unless defined $f;
-    my $t = find_encoding($to);
-    Carp::croak("Unknown encoding '$to'") unless defined $t;
-    return undef unless defined $string;
-    $check ||= 0;
-    my $uni = $f->decode($string);
-    $_[0] = $string = $t->encode( $uni, $check );
-    return undef if ( $check && length($uni) );
-    return defined( $_[0] ) ? length($string) : undef;
-}
-
-sub encode_utf8($) {
-    my ($str) = @_;
-    return undef unless defined $str;
-    utf8::encode($str);
-    return $str;
-}
-
-my $utf8enc;
-
-sub decode_utf8($;$) {
-    $utf8enc ||= find_encoding('utf8');
-    unshift(@_, $utf8enc);
-    goto &{$utf8enc->can('decode')};
-}
-
 onBOOT;
 
 if ($ON_EBCDIC) {
diff --git a/Encode.xs b/Encode.xs
index c1222a1..2cd6f4b 100644
--- a/Encode.xs
+++ b/Encode.xs
@@ -35,6 +35,14 @@ UNIMPLEMENTED(_encoded_bytes_to_utf8, I32)
 #define SvIV_nomg SvIV
 #endif
 
+#ifndef SvTRUE_nomg
+#define SvTRUE_nomg SvTRUE
+#endif
+
+#ifndef SVfARG
+#define SVfARG(p) ((void*)(p))
+#endif
+
 static void
 Encode_XSEncoding(pTHX_ encode_t * enc)
 {
@@ -589,6 +597,83 @@ process_utf8(pTHX_ SV* dst, U8* s, U8* e, SV *check_sv,
     return s;
 }
 
+static SV *
+find_encoding(pTHX_ SV *enc)
+{
+    dSP;
+    I32 count;
+    SV *m_enc;
+    SV *obj = &PL_sv_undef;
+#ifndef SV_NOSTEAL
+    U32 tmp;
+#endif
+
+    ENTER;
+    SAVETMPS;
+    PUSHMARK(sp);
+
+    m_enc = sv_newmortal();
+#ifndef SV_NOSTEAL
+    tmp = SvFLAGS(enc) & SVs_TEMP;
+    SvTEMP_off(enc);
+    sv_setsv_flags(m_enc, enc, 0);
+    SvFLAGS(enc) |= tmp;
+#else
+    sv_setsv_flags(m_enc, enc, SV_NOSTEAL);
+#endif
+    XPUSHs(m_enc);
+
+    PUTBACK;
+
+    count = call_pv("Encode::find_encoding", G_SCALAR);
+
+    SPAGAIN;
+
+    if (count > 0) {
+        obj = POPs;
+        SvREFCNT_inc(obj);
+    }
+
+    PUTBACK;
+    FREETMPS;
+    LEAVE;
+    return sv_2mortal(obj);
+}
+
+static SV *
+call_encoding(pTHX_ const char *method, SV *obj, SV *src, SV *check)
+{
+    dSP;
+    I32 count;
+    SV *dst = &PL_sv_undef;
+
+    PUSHMARK(sp);
+
+    if (check)
+        check = sv_2mortal(newSVsv(check));
+
+    if (!check || SvROK(check) || !SvTRUE_nomg(check) || (SvIV_nomg(check) & ENCODE_LEAVE_SRC))
+        src = sv_2mortal(newSVsv(src));
+
+    XPUSHs(obj);
+    XPUSHs(src);
+    XPUSHs(check ? check : &PL_sv_no);
+
+    PUTBACK;
+
+    count = call_method(method, G_SCALAR);
+
+    SPAGAIN;
+
+    if (count > 0) {
+        dst = POPs;
+        SvREFCNT_inc(dst);
+    }
+
+    PUTBACK;
+    return dst;
+}
+
 
 MODULE = Encode		PACKAGE = Encode::utf8	PREFIX = Method_
 
@@ -1054,6 +1139,117 @@ CODE:
 OUTPUT:
     RETVAL
 
+SV *
+decode(encoding, octets, check = NULL)
+SV *	encoding
+SV *	octets
+SV *	check
+ALIAS:
+    bytes2str = 0
+PREINIT:
+    SV *obj;
+INIT:
+    SvGETMAGIC(encoding);
+CODE:
+    if (!SvOK(encoding))
+        croak("Encoding name should not be undef");
+    obj = find_encoding(aTHX_ encoding);
+    if (!SvOK(obj))
+        croak("Unknown encoding '%" SVf "'", SVfARG(encoding));
+    RETVAL = call_encoding(aTHX_ "decode", obj, octets, check);
+OUTPUT:
+    RETVAL
+
+SV *
+encode(encoding, string, check = NULL)
+SV *	encoding
+SV *	string
+SV *	check
+ALIAS:
+    str2bytes = 0
+PREINIT:
+    SV *obj;
+INIT:
+    SvGETMAGIC(encoding);
+CODE:
+    if (!SvOK(encoding))
+        croak("Encoding name should not be undef");
+    obj = find_encoding(aTHX_ encoding);
+    if (!SvOK(obj))
+        croak("Unknown encoding '%" SVf "'", SVfARG(encoding));
+    RETVAL = call_encoding(aTHX_ "encode", obj, string, check);
+OUTPUT:
+    RETVAL
+
+SV *
+decode_utf8(octets, check = NULL)
+SV *	octets
+SV *	check
+PREINIT:
+    HV *hv;
+    SV **sv;
+CODE:
+    hv = get_hv("Encode::Encoding", 0);
+    if (!hv)
+        croak("utf8 encoding was not found");
+    sv = hv_fetch(hv, "utf8", 4, 0);
+    if (!sv || !*sv || !SvOK(*sv))
+        croak("utf8 encoding was not found");
+    RETVAL = call_encoding(aTHX_ "decode", *sv, octets, check);
+OUTPUT:
+    RETVAL
+
+SV *
+encode_utf8(string)
+SV *	string
+CODE:
+    RETVAL = newSVsv(string);
+    if (SvOK(RETVAL))
+        sv_utf8_encode(RETVAL);
+OUTPUT:
+    RETVAL
+
+SV *
+from_to(octets, from, to, check = NULL)
+SV *	octets
+SV *	from
+SV *	to
+SV *	check
+PREINIT:
+    SV *from_obj;
+    SV *to_obj;
+    SV *string;
+    SV *new_octets;
+    U8 *ptr;
+    STRLEN len;
+INIT:
+    SvGETMAGIC(from);
+    SvGETMAGIC(to);
+CODE:
+    if (!SvOK(from) || !SvOK(to))
+        croak("Encoding name should not be undef");
+    from_obj = find_encoding(aTHX_ from);
+    if (!SvOK(from_obj))
+        croak("Unknown encoding '%" SVf "'", SVfARG(from));
+    to_obj = find_encoding(aTHX_ to);
+    if (!SvOK(to_obj))
+        croak("Unknown encoding '%" SVf "'", SVfARG(to));
+    string = sv_2mortal(call_encoding(aTHX_ "decode", from_obj, octets, NULL));
+    new_octets = sv_2mortal(call_encoding(aTHX_ "encode", to_obj, string, check));
+    SvGETMAGIC(new_octets);
+    if (SvOK(new_octets) && (!check || SvROK(check) || !SvTRUE_nomg(check) || sv_len(string) == 0)) {
+        ptr = (U8 *)SvPV_nomg(new_octets, len);
+        if (SvUTF8(new_octets))
+            len = utf8_length(ptr, ptr+len);
+        RETVAL = newSVuv(len);
+    } else {
+        RETVAL = &PL_sv_undef;
+    }
+    sv_setsv_nomg(octets, new_octets);
+    SvSETMAGIC(octets);
+OUTPUT:
+    RETVAL
+
 void
 onBOOT()
 CODE:
-- 
1.7.9.5

@p5pRT
Copy link
Author

p5pRT commented Sep 13, 2017

From @tonycoz

On Tue, 12 Sep 2017 16​:06​:31 -0700, pali@​cpan.org wrote​:

In attachment you can find patch with rewritten encode/decode/from_to
functions. Seems it was not a big problem.

With this patch there is no goto in warning messages...

$ perl -Iblib/lib -Iblib/arch -MEncode -le 'my $x = encode("utf-16",
(my $y = chr(0xdc10)),
&Encode​::FB_WARN); print "no croak"'

UTF-16 surrogate U+DC10 in subroutine entry at -e line 1.
no croak

That looks sane to me.

Tony

@p5pRT
Copy link
Author

p5pRT commented Feb 7, 2018

From @pali

In attachment is a v2 version of Encode patch which handle warnings of
Encode​::Unicode module. For now it needs to be build against
smoke-me/khw-encode branch which contains new functions
uvchr_to_utf8_flags_msgs() and utf8n_to_uvchr_msgs().

Hopefully now it should be complete, just there is missing support for
perl versions which do not provide above two functions.

@p5pRT
Copy link
Author

p5pRT commented Feb 7, 2018

From @pali

0001-Automatically-compute-length-in-attr-macro.patch
From 7b0bc1224d51a23a9475447e15d8085c7d0a5005 Mon Sep 17 00:00:00 2001
From: Pali <pali@cpan.org>
Date: Wed, 7 Feb 2018 22:42:54 +0100
Subject: [PATCH] Automatically compute length in attr() macro

---
 Unicode/Unicode.xs | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/Unicode/Unicode.xs b/Unicode/Unicode.xs
index b3b1d2f..d3d8223 100644
--- a/Unicode/Unicode.xs
+++ b/Unicode/Unicode.xs
@@ -123,8 +123,8 @@ MODULE = Encode::Unicode PACKAGE = Encode::Unicode
 
 PROTOTYPES: DISABLE
 
-#define attr(k, l)  (hv_exists((HV *)SvRV(obj),k,l) ? \
-    *hv_fetch((HV *)SvRV(obj),k,l,0) : &PL_sv_undef)
+#define attr(k)  (hv_exists((HV *)SvRV(obj),"" k "",sizeof(k)-1) ? \
+    *hv_fetch((HV *)SvRV(obj),"" k "",sizeof(k)-1,0) : &PL_sv_undef)
 
 void
 decode(obj, str, check = 0)
@@ -133,9 +133,9 @@ SV *	str
 IV	check
 CODE:
 {
-    SV *sve      = attr("endian", 6);
+    SV *sve      = attr("endian");
     U8 endian    = *((U8 *)SvPV_nolen(sve));
-    SV *svs      = attr("size", 4);
+    SV *svs      = attr("size");
     int size     = SvIV(svs);
     int ucs2     = -1; /* only needed in the event of surrogate pairs */
     SV *result   = newSVpvn("",0);
@@ -209,7 +209,7 @@ CODE:
 	}
 #if 1
 	/* Update endian for next sequence */
-	sv = attr("renewed", 7);
+	sv = attr("renewed");
 	if (SvTRUE(sv)) {
 	    (void)hv_store((HV *)SvRV(obj),"endian",6,newSVpv((char *)&endian,1),0);
 	}
@@ -229,7 +229,7 @@ CODE:
 	U8 *d;
 	if (issurrogate(ord)) {
 	    if (ucs2 == -1) {
-		SV *sv = attr("ucs2", 4);
+		SV *sv = attr("ucs2");
 		ucs2 = SvTRUE(sv);
 	    }
 	    if (ucs2 || size == 4) {
@@ -351,9 +351,9 @@ SV *	utf8
 IV	check
 CODE:
 {
-    SV *sve = attr("endian", 6);
+    SV *sve = attr("endian");
     U8 endian = *((U8 *)SvPV_nolen(sve));
-    SV *svs = attr("size", 4);
+    SV *svs = attr("size");
     const int size = SvIV(svs);
     int ucs2 = -1; /* only needed if there is invalid_ucs2 input */
     const STRLEN usize = (size > 0 ? size : 1);
@@ -399,7 +399,7 @@ CODE:
 	enc_pack(aTHX_ result,size,endian,BOM_BE);
 #if 1
 	/* Update endian for next sequence */
-	sv = attr("renewed", 7);
+	sv = attr("renewed");
 	if (SvTRUE(sv)) {
 	    (void)hv_store((HV *)SvRV(obj),"endian",6,newSVpv((char *)&endian,1),0);
 	}
@@ -416,7 +416,7 @@ CODE:
 	if (size != 4 && invalid_ucs2(ord)) {
 	    if (!issurrogate(ord)) {
 		if (ucs2 == -1) {
-		    SV *sv = attr("ucs2", 4);
+		    SV *sv = attr("ucs2");
 		    ucs2 = SvTRUE(sv);
 		}
 		if (ucs2 || ord > 0x10FFFF) {
-- 
2.11.0

@p5pRT
Copy link
Author

p5pRT commented Feb 7, 2018

From @pali

v2-0001-Encode-Add-new-check-flag-Encode-ONLY_PRAGMA_WARNING.patch
From c99c35366e0f95dd985b749d8ba1b0ea763b3f65 Mon Sep 17 00:00:00 2001
From: Pali <pali@cpan.org>
Date: Sat, 1 Jul 2017 12:32:34 +0200
Subject: [PATCH v2] Encode: Add new check flag Encode::ONLY_PRAGMA_WARNINGS

When this new flag is set then only warnings configured and enabled by
pragma warnings are reported. It has no effect without setting check flag
Encode::ENCODE_WARN_ON_ERR.
---
 Encode.pm          | 131 +++++++------------------
 Encode.xs          |  13 ++-
 Encode/encode.h    |   4 +
 Unicode/Unicode.xs | 148 +++++++++++++++++-----------
 t/decode.t         |   9 +-
 t/enc_eucjp.t      |   4 +
 t/utf32warnings.t  | 283 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 t/utf8messages.t   |  33 -------
 t/utf8warnings.t   | 109 +++++----------------
 9 files changed, 460 insertions(+), 274 deletions(-)
 create mode 100644 t/utf32warnings.t
 delete mode 100644 t/utf8messages.t

diff --git a/Encode.pm b/Encode.pm
index 249ac6b..7ee988f 100644
--- a/Encode.pm
+++ b/Encode.pm
@@ -14,6 +14,7 @@ BEGIN {
 
 use Exporter 5.57 'import';
 
+use Carp ();
 our @CARP_NOT = qw(Encode::Encoder);
 
 # Public, encouraged API is exported by default
@@ -171,109 +172,54 @@ sub clone_encoding($) {
 }
 
 sub encode($$;$) {
-    my ( $name, $string, $check ) = @_;
-    return undef unless defined $string;
-    $string .= '';    # stringify;
-    $check ||= 0;
-    unless ( defined $name ) {
-        require Carp;
-        Carp::croak("Encoding name should not be undef");
-    }
+    my $name = $_[0];
+    my $check = $_[2];
+    Carp::croak("Encoding name should not be undef") unless defined $name;
     my $enc = find_encoding($name);
-    unless ( defined $enc ) {
-        require Carp;
-        Carp::croak("Unknown encoding '$name'");
-    }
-    # For Unicode, warnings need to be caught and re-issued at this level
-    # so that callers can disable utf8 warnings lexically.
-    my $octets;
-    if ( ref($enc) eq 'Encode::Unicode' ) {
-        my $warn = '';
-        {
-            local $SIG{__WARN__} = sub { $warn = shift };
-            $octets = $enc->encode( $string, $check );
-        }
-        warnings::warnif('utf8', $warn) if length $warn;
-    }
-    else {
-        $octets = $enc->encode( $string, $check );
+    Carp::croak("Unknown encoding '$name'") unless defined $enc;
+    my $encode = $enc->can('encode');
+    Carp::croak("No function 'encode' for encoding '$name'") unless defined $encode;
+    $check ||= 0;
+    splice(@_, 0, 1, $enc);
+    if (ref $check or !$check or ($check & LEAVE_SRC)) {
+        my $string = $_[1];
+        splice(@_, 1, 1, $string);
     }
-    $_[1] = $string if $check and !ref $check and !( $check & LEAVE_SRC );
-    return $octets;
+    splice(@_, 2, 1, $check);
+    goto &$encode;
 }
 *str2bytes = \&encode;
 
 sub decode($$;$) {
-    my ( $name, $octets, $check ) = @_;
-    return undef unless defined $octets;
-    $octets .= '';
-    $check ||= 0;
+    my $name = $_[0];
+    my $check = $_[2];
+    Carp::croak("Encoding name should not be undef") unless defined $name;
     my $enc = find_encoding($name);
-    unless ( defined $enc ) {
-        require Carp;
-        Carp::croak("Unknown encoding '$name'");
-    }
-    # For Unicode, warnings need to be caught and re-issued at this level
-    # so that callers can disable utf8 warnings lexically.
-    my $string;
-    if ( ref($enc) eq 'Encode::Unicode' ) {
-        my $warn = '';
-        {
-            local $SIG{__WARN__} = sub { $warn = shift };
-            $string = $enc->decode( $octets, $check );
-        }
-        warnings::warnif('utf8', $warn) if length $warn;
-    }
-    else {
-        $string = $enc->decode( $octets, $check );
+    Carp::croak("Unknown encoding '$name'") unless defined $enc;
+    my $decode = $enc->can('decode');
+    Carp::croak("No function 'decode' for encoding '$name'") unless defined $decode;
+    $check ||= 0;
+    splice(@_, 0, 1, $enc);
+    if (ref $check or !$check or ($check & LEAVE_SRC)) {
+        my $octets = $_[1];
+        splice(@_, 1, 1, $octets);
     }
-    $_[1] = $octets if $check and !ref $check and !( $check & LEAVE_SRC );
-    return $string;
+    splice(@_, 2, 1, $check);
+    goto &$decode;
 }
 *bytes2str = \&decode;
 
 sub from_to($$$;$) {
     my ( $string, $from, $to, $check ) = @_;
-    return undef unless defined $string;
-    $check ||= 0;
+    Carp::croak("Encoding name should not be undef") unless defined $from and defined $to;
     my $f = find_encoding($from);
-    unless ( defined $f ) {
-        require Carp;
-        Carp::croak("Unknown encoding '$from'");
-    }
+    Carp::croak("Unknown encoding '$from'") unless defined $f;
     my $t = find_encoding($to);
-    unless ( defined $t ) {
-        require Carp;
-        Carp::croak("Unknown encoding '$to'");
-    }
-
-    # For Unicode, warnings need to be caught and re-issued at this level
-    # so that callers can disable utf8 warnings lexically.
-    my $uni;
-    if ( ref($f) eq 'Encode::Unicode' ) {
-        my $warn = '';
-        {
-            local $SIG{__WARN__} = sub { $warn = shift };
-            $uni = $f->decode($string);
-        }
-        warnings::warnif('utf8', $warn) if length $warn;
-    }
-    else {
-        $uni = $f->decode($string);
-    }
-
-    if ( ref($t) eq 'Encode::Unicode' ) {
-        my $warn = '';
-        {
-            local $SIG{__WARN__} = sub { $warn = shift };
-            $_[0] = $string = $t->encode( $uni, $check );
-        }
-        warnings::warnif('utf8', $warn) if length $warn;
-    }
-    else {
-        $_[0] = $string = $t->encode( $uni, $check );
-    }
-
+    Carp::croak("Unknown encoding '$to'") unless defined $t;
+    return undef unless defined $string;
+    $check ||= 0;
+    my $uni = $f->decode($string);
+    $_[0] = $string = $t->encode( $uni, $check );
     return undef if ( $check && length($uni) );
     return defined( $_[0] ) ? length($string) : undef;
 }
@@ -288,14 +234,9 @@ sub encode_utf8($) {
 my $utf8enc;
 
 sub decode_utf8($;$) {
-    my ( $octets, $check ) = @_;
-    return undef unless defined $octets;
-    $octets .= '';
-    $check   ||= 0;
     $utf8enc ||= find_encoding('utf8');
-    my $string = $utf8enc->decode( $octets, $check );
-    $_[0] = $octets if $check and !ref $check and !( $check & LEAVE_SRC );
-    return $string;
+    unshift(@_, $utf8enc);
+    goto &{$utf8enc->can('decode')};
 }
 
 onBOOT;
diff --git a/Encode.xs b/Encode.xs
index 6c077be..c1222a1 100644
--- a/Encode.xs
+++ b/Encode.xs
@@ -250,7 +250,7 @@ encode_method(pTHX_ const encode_t * enc, const encpage_t * dir, SV * src, U8 *
                    (UV)ch, enc->name[0]);
             return &PL_sv_undef; /* never reaches but be safe */
         }
-        if (check & ENCODE_WARN_ON_ERR){
+        if (encode_ckWARN(check, WARN_UTF8)) {
             Perl_warner(aTHX_ packWARN(WARN_UTF8),
                 ERR_ENCODE_NOMAP, (UV)ch, enc->name[0]);
         }
@@ -289,7 +289,7 @@ encode_method(pTHX_ const encode_t * enc, const encpage_t * dir, SV * src, U8 *
                               enc->name[0], (UV)s[slen]);
             return &PL_sv_undef; /* never reaches but be safe */
         }
-        if (check & ENCODE_WARN_ON_ERR){
+        if (encode_ckWARN(check, WARN_UTF8)) {
             Perl_warner(
             aTHX_ packWARN(WARN_UTF8),
             ERR_DECODE_NOMAP,
@@ -446,7 +446,11 @@ process_utf8(pTHX_ SV* dst, U8* s, U8* e, SV *check_sv,
     char esc[UTF8_MAXLEN * 6 + 1];
     STRLEN i;
 
-    if (SvROK(check_sv)) {
+    if (!SvOK(check_sv)) {
+	fallback_cb = &PL_sv_undef;
+	check = 0;
+    }
+    else if (SvROK(check_sv)) {
 	/* croak("UTF-8 decoder doesn't support callback CHECK"); */
 	fallback_cb = check_sv;
 	check = ENCODE_PERLQQ|ENCODE_LEAVE_SRC; /* same as perlqq */
@@ -520,7 +524,7 @@ process_utf8(pTHX_ SV* dst, U8* s, U8* e, SV *check_sv,
             else
                 Perl_croak(aTHX_ ERR_DECODE_STR_NOMAP, (strict ? "UTF-8" : "utf8"), esc);
         }
-        if (check & ENCODE_WARN_ON_ERR){
+        if (encode_ckWARN(check, WARN_UTF8)) {
             if (encode)
                 Perl_warner(aTHX_ packWARN(WARN_UTF8),
                             ERR_ENCODE_NOMAP, uv, (strict ? "UTF-8" : "utf8"));
@@ -1064,6 +1068,7 @@ BOOT:
     newCONSTSUB(stash, "WARN_ON_ERR", newSViv(ENCODE_WARN_ON_ERR));
     newCONSTSUB(stash, "RETURN_ON_ERR", newSViv(ENCODE_RETURN_ON_ERR));
     newCONSTSUB(stash, "LEAVE_SRC", newSViv(ENCODE_LEAVE_SRC));
+    newCONSTSUB(stash, "ONLY_PRAGMA_WARNINGS", newSViv(ENCODE_ONLY_PRAGMA_WARNINGS));
     newCONSTSUB(stash, "PERLQQ", newSViv(ENCODE_PERLQQ));
     newCONSTSUB(stash, "HTMLCREF", newSViv(ENCODE_HTMLCREF));
     newCONSTSUB(stash, "XMLCREF", newSViv(ENCODE_XMLCREF));
diff --git a/Encode/encode.h b/Encode/encode.h
index 5fbcf76..1c7c066 100644
--- a/Encode/encode.h
+++ b/Encode/encode.h
@@ -94,6 +94,7 @@ extern void Encode_DefineEncoding(encode_t *enc);
 #define  ENCODE_WARN_ON_ERR    0x0002 /* warn on error; may proceed */
 #define  ENCODE_RETURN_ON_ERR  0x0004 /* immediately returns on NOREP */
 #define  ENCODE_LEAVE_SRC      0x0008 /* $src updated unless set */
+#define  ENCODE_ONLY_PRAGMA_WARNINGS 0x0010 /* when enabled report only warnings configured by pragma warnings, otherwise report all warnings; no effect without ENCODE_WARN_ON_ERR */
 #define  ENCODE_PERLQQ         0x0100 /* perlqq fallback string */
 #define  ENCODE_HTMLCREF       0x0200 /* HTML character ref. fb mode */
 #define  ENCODE_XMLCREF        0x0400 /* XML  character ref. fb mode */
@@ -107,4 +108,7 @@ extern void Encode_DefineEncoding(encode_t *enc);
 #define  ENCODE_FB_HTMLCREF    (ENCODE_HTMLCREF|ENCODE_LEAVE_SRC)
 #define  ENCODE_FB_XMLCREF     (ENCODE_XMLCREF|ENCODE_LEAVE_SRC)
 
+#define encode_ckWARN(c, w) ((c & ENCODE_WARN_ON_ERR) && (!(c & ENCODE_ONLY_PRAGMA_WARNINGS) || ckWARN(w)))
+#define encode_ckWARN_packed(c, w) ((c & ENCODE_WARN_ON_ERR) && (!(c & ENCODE_ONLY_PRAGMA_WARNINGS) || Perl_ckwarn(aTHX_ w)))
+
 #endif /* ENCODE_H */
diff --git a/Unicode/Unicode.xs b/Unicode/Unicode.xs
index d3d8223..0b3a413 100644
--- a/Unicode/Unicode.xs
+++ b/Unicode/Unicode.xs
@@ -17,14 +17,8 @@
 #define isLoSurrogate(x)	(0xDC00 <= (x)  && (x) <= 0xDFFF )
 #define invalid_ucs2(x)         ( issurrogate(x) || 0xFFFF < (x) )
 
-/* For pre-5.14 source compatibility */
-#ifndef UNICODE_WARN_ILLEGAL_INTERCHANGE
-#   define UNICODE_WARN_ILLEGAL_INTERCHANGE 0
-#   define UTF8_DISALLOW_SURROGATE 0
-#   define UTF8_WARN_SURROGATE 0
-#   define UTF8_DISALLOW_FE_FF 0
-#   define UTF8_WARN_FE_FF 0
-#   define UTF8_WARN_NONCHAR 0
+#ifndef SVfARG
+#define SVfARG(p) ((void*)(p))
 #endif
 
 #define PERLIO_BUFSIZ 1024 /* XXX value comes from PerlIOEncode_get_base */
@@ -133,6 +127,7 @@ SV *	str
 IV	check
 CODE:
 {
+    SV *name     = attr("Name");
     SV *sve      = attr("endian");
     U8 endian    = *((U8 *)SvPV_nolen(sve));
     SV *svs      = attr("size");
@@ -233,53 +228,61 @@ CODE:
 		ucs2 = SvTRUE(sv);
 	    }
 	    if (ucs2 || size == 4) {
-		if (check) {
+		if (check & ENCODE_DIE_ON_ERR) {
 		    croak("%" SVf ":no surrogates allowed %" UVxf,
-			  *hv_fetch((HV *)SvRV(obj),"Name",4,0),
-			  ord);
+			  SVfARG(name), ord);
+		}
+		if (encode_ckWARN(check, WARN_SURROGATE)) {
+		    warner(packWARN(WARN_SURROGATE),
+			  "%" SVf ":no surrogates allowed %" UVxf,
+			  SVfARG(name), ord);
 		}
 		ord = FBCHAR;
 	    }
 	    else {
 		UV lo;
 		if (!isHiSurrogate(ord)) {
-		    if (check) {
+		    if (check & ENCODE_DIE_ON_ERR) {
 			croak("%" SVf ":Malformed HI surrogate %" UVxf,
-			      *hv_fetch((HV *)SvRV(obj),"Name",4,0),
-			      ord);
+			      SVfARG(name), ord);
 		    }
-		    else {
-			ord = FBCHAR;
+		    if (encode_ckWARN(check, WARN_SURROGATE)) {
+			warner(packWARN(WARN_SURROGATE),
+			      "%" SVf ":Malformed HI surrogate %" UVxf,
+			      SVfARG(name), ord);
 		    }
+		    ord = FBCHAR;
 		}
 		else if (s+size > e) {
-		    if (check) {
-		        if (check & ENCODE_STOP_AT_PARTIAL) {
-		             s -= size;
-		             break;
-		        }
-		        else {
-		             croak("%" SVf ":Malformed HI surrogate %" UVxf,
-				   *hv_fetch((HV *)SvRV(obj),"Name",4,0),
-				   ord);
-		        }
+		    if (check & ENCODE_STOP_AT_PARTIAL) {
+		        s -= size;
+		        break;
 		    }
-		    else {
-		        ord = FBCHAR;
+		    if (check & ENCODE_DIE_ON_ERR) {
+			croak("%" SVf ":Malformed HI surrogate %" UVxf,
+			      SVfARG(name), ord);
+		    }
+		    if (encode_ckWARN(check, WARN_SURROGATE)) {
+			warner(packWARN(WARN_SURROGATE),
+			      "%" SVf ":Malformed HI surrogate %" UVxf,
+			      SVfARG(name), ord);
 		    }
+		    ord = FBCHAR;
 		}
 		else {
 		    lo = enc_unpack(aTHX_ &s,e,size,endian);
 		    if (!isLoSurrogate(lo)) {
-			if (check) {
+			if (check & ENCODE_DIE_ON_ERR) {
 			    croak("%" SVf ":Malformed LO surrogate %" UVxf,
-				  *hv_fetch((HV *)SvRV(obj),"Name",4,0),
-				  ord);
+				  SVfARG(name), ord);
 			}
-			else {
-			    s -= size;
-			    ord = FBCHAR;
+			if (encode_ckWARN(check, WARN_SURROGATE)) {
+			    warner(packWARN(WARN_SURROGATE),
+				  "%" SVf ":Malformed LO surrogate %" UVxf,
+				  SVfARG(name), ord);
 			}
+			s -= size;
+			ord = FBCHAR;
 		    }
 		    else {
 			ord = 0x10000 + ((ord - 0xD800) << 10) + (lo - 0xDC00);
@@ -289,13 +292,16 @@ CODE:
 	}
 
 	if ((ord & 0xFFFE) == 0xFFFE || (ord >= 0xFDD0 && ord <= 0xFDEF)) {
-	    if (check) {
+	    if (check & ENCODE_DIE_ON_ERR) {
 		croak("%" SVf ":Unicode character %" UVxf " is illegal",
-		      *hv_fetch((HV *)SvRV(obj),"Name",4,0),
-		      ord);
-	    } else {
-		ord = FBCHAR;
+		      SVfARG(name), ord);
 	    }
+	    if (encode_ckWARN(check, WARN_NONCHAR)) {
+	        warner(packWARN(WARN_NONCHAR),
+		      "%" SVf ":Unicode character %" UVxf " is illegal",
+		      SVfARG(name), ord);
+	    }
+	    ord = FBCHAR;
 	}
 
 	if (resultbuflen < SvCUR(result) + UTF8_MAXLEN + 1) {
@@ -315,16 +321,28 @@ CODE:
 	    resultbuflen = SvLEN(result);
 	}
 
-	d = uvuni_to_utf8_flags(resultbuf+SvCUR(result), ord,
-                                            UNICODE_WARN_ILLEGAL_INTERCHANGE);
+        HV *hv = NULL;
+        d = uvchr_to_utf8_flags_msgs(resultbuf+SvCUR(result), UNI_TO_NATIVE(ord), UNICODE_DISALLOW_ILLEGAL_INTERCHANGE | UNICODE_WARN_ILLEGAL_INTERCHANGE, &hv);
+        if (hv) {
+            sv_2mortal((SV *)hv);
+            SV *message = *hv_fetch(hv, "text", 4, 0);
+            U32 categories = SvUVx(*hv_fetch(hv, "warn_categories", 15, 0));
+            if (check & ENCODE_DIE_ON_ERR)
+                croak("%" SVf, SVfARG(message));
+            if (encode_ckWARN_packed(check, categories))
+                warner(categories, "%" SVf, SVfARG(message));
+            d = uvuni_to_utf8_flags(resultbuf+SvCUR(result), FBCHAR, 0);
+        }
+
 	SvCUR_set(result, d - (U8 *)SvPVX(result));
     }
 
     if (s < e) {
 	/* unlikely to happen because it's fixed-length -- dankogai */
-	if (check & ENCODE_WARN_ON_ERR) {
-	    Perl_warner(aTHX_ packWARN(WARN_UTF8),"%" SVf ":Partial character",
-			*hv_fetch((HV *)SvRV(obj),"Name",4,0));
+        if (check & ENCODE_DIE_ON_ERR)
+            croak("%" SVf ":Partial character", SVfARG(name));
+        if (encode_ckWARN(check, WARN_UTF8)) {
+            warner(packWARN(WARN_UTF8),"%" SVf ":Partial character", SVfARG(name));
 	}
     }
     if (check && !(check & ENCODE_LEAVE_SRC)) {
@@ -351,6 +369,7 @@ SV *	utf8
 IV	check
 CODE:
 {
+    SV *name = attr("Name");
     SV *sve = attr("endian");
     U8 endian = *((U8 *)SvPV_nolen(sve));
     SV *svs = attr("size");
@@ -406,25 +425,43 @@ CODE:
 #endif
     }
     while (s < e && s+UTF8SKIP(s) <= e) {
-	STRLEN len;
-	UV ord = utf8n_to_uvuni(s, e-s, &len, (UTF8_DISALLOW_SURROGATE
-                                               |UTF8_WARN_SURROGATE
-                                               |UTF8_DISALLOW_FE_FF
-                                               |UTF8_WARN_FE_FF
-                                               |UTF8_WARN_NONCHAR));
-	s += len;
-	if (size != 4 && invalid_ucs2(ord)) {
+        STRLEN len;
+        AV *msgs = NULL;
+        UV ord = NATIVE_TO_UNI(utf8n_to_uvchr_msgs(s, e-s, &len, UTF8_DISALLOW_ILLEGAL_INTERCHANGE | UTF8_WARN_ILLEGAL_INTERCHANGE, NULL, &msgs));
+        if (msgs) {
+            SSize_t i;
+            SSize_t len = av_len(msgs)+1;
+            sv_2mortal((SV *)msgs);
+            for (i = 0; i < len; ++i) {
+                SV *sv = *av_fetch(msgs, i, 0);
+                HV *hv = (HV *)SvRV(sv);
+                SV *message = *hv_fetch(hv, "text", 4, 0);
+                U32 categories = SvUVx(*hv_fetch(hv, "warn_categories", 15, 0));
+                if (check & ENCODE_DIE_ON_ERR)
+                    croak("%" SVf, SVfARG(message));
+                if (encode_ckWARN_packed(check, categories))
+                    warner(categories, "%" SVf, SVfARG(message));
+            }
+        }
+	if ((size != 4 && invalid_ucs2(ord)) || (ord == 0 && *s != 0)) {
 	    if (!issurrogate(ord)) {
 		if (ucs2 == -1) {
 		    SV *sv = attr("ucs2");
 		    ucs2 = SvTRUE(sv);
 		}
 		if (ucs2 || ord > 0x10FFFF) {
-		    if (check) {
+		    if (check & ENCODE_DIE_ON_ERR) {
 			croak("%" SVf ":code point \"\\x{%" UVxf "}\" too high",
-				  *hv_fetch((HV *)SvRV(obj),"Name",4,0),ord);
+				  SVfARG(name),ord);
+		    }
+		    if (encode_ckWARN(check, WARN_NON_UNICODE)) {
+			warner(packWARN(WARN_NON_UNICODE),
+				  "%" SVf ":code point \"\\x{%" UVxf "}\" too high",
+				  SVfARG(name),ord);
 		    }
 		    enc_pack(aTHX_ result,size,endian,FBCHAR);
+		} else if (ord == 0) {
+		    enc_pack(aTHX_ result,size,endian,FBCHAR);
 		} else {
 		    UV hi = ((ord - 0x10000) >> 10)   + 0xD800;
 		    UV lo = ((ord - 0x10000) & 0x3FF) + 0xDC00;
@@ -440,6 +477,7 @@ CODE:
 	else {
 	    enc_pack(aTHX_ result,size,endian,ord);
 	}
+	s += len;
     }
     if (s < e) {
 	/* UTF-8 partial char happens often on PerlIO.
@@ -449,7 +487,7 @@ CODE:
 	if (check & (ENCODE_DIE_ON_ERR|ENCODE_LEAVE_SRC)) {
 	    Perl_croak(aTHX_ "%" SVf ":partial character is not allowed "
 		       "when CHECK = 0x%" UVuf,
-		       *hv_fetch((HV *)SvRV(obj),"Name",4,0), check);
+		       SVfARG(name), check);
 	}
     }
     if (check && !(check & ENCODE_LEAVE_SRC)) {
diff --git a/t/decode.t b/t/decode.t
index 93c992c..0c3b669 100644
--- a/t/decode.t
+++ b/t/decode.t
@@ -51,9 +51,12 @@ $orig = "\xC3\x80";
 $orig =~ /(..)/;
 is(Encode::decode_utf8($1), "\N{U+C0}", 'passing magic regex to Encode::decode_utf8');
 
-$orig = "\xC3\x80";
-*a = $orig;
-is(Encode::decode_utf8(*a), "*main::\N{U+C0}", 'passing typeglob to Encode::decode_utf8');
+SKIP: {
+    skip "Perl Version ($]) is older than v5.27.1", 1 if $] < 5.027001;
+    $orig = "\xC3\x80";
+    *a = $orig;
+    is(Encode::decode_utf8(*a), "*main::\N{U+C0}", 'passing typeglob to Encode::decode_utf8');
+}
 
 $orig = "\N{U+C0}";
 $orig =~ /(.)/;
diff --git a/t/enc_eucjp.t b/t/enc_eucjp.t
index fc0af3c..8f933b0 100644
--- a/t/enc_eucjp.t
+++ b/t/enc_eucjp.t
@@ -25,6 +25,10 @@ BEGIN {
     }
 }
 
+use Encode qw();
+$PerlIO::encoding::fallback |= Encode::ONLY_PRAGMA_WARNINGS;
+use warnings "utf8";
+
 no warnings "deprecated";
 use encoding 'euc-jp';
 
diff --git a/t/utf32warnings.t b/t/utf32warnings.t
new file mode 100644
index 0000000..d84939f
--- /dev/null
+++ b/t/utf32warnings.t
@@ -0,0 +1,283 @@
+use strict;
+use warnings;
+BEGIN {
+    if ($] < 5.014){
+        print "1..0 # Skip: Perl 5.14.0 or later required\n";
+        exit 0;
+    }
+}
+
+my $script = quotemeta $0;
+
+use Encode;
+use Test::More tests => 38;
+
+my $valid   = "\x61\x00\x00\x00";
+my $invalid = "\x78\x56\x34\x12";
+
+our $warn;
+$SIG{__WARN__} = sub { $warn = $_[0] };
+
+my $enc = find_encoding("UTF32-LE");
+
+{
+    local $warn;
+    my $ret = $enc->encode( "a", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Calling encode on UTF32-LE encode object with valid string produces no warnings");
+    is($ret, $valid, "Calling encode on UTF32-LE encode object with valid string returns correct output");
+}
+
+
+{
+    local $warn;
+    $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    like($warn, qr/UTF-16 surrogate.* at $script line /, "Calling encode on UTF32-LE encode object with invalid string warns");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from encode method of UTF32-LE encode object can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    no warnings;
+    $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from encode method of UTF32-LE encode object can be silenced via no warnings");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from encode method of UTF32-LE encode object cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+    local $warn;
+    no warnings;
+    $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from encode method of UTF32-LE encode object cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+{
+    local $warn;
+    my $ret = Encode::encode( $enc, "a", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Calling Encode::encode for UTF32-LE with valid string produces no warnings");
+    is($ret, $valid, "Calling Encode::encode for UTF32-LE with valid string returns correct output");
+}
+
+
+{
+    local $warn;
+    Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    like($warn, qr/UTF-16 surrogate.* at $script line /, "Calling Encode::encode for UTF32-LE with invalid string warns");
+}
+
+
+{
+    local $warn;
+    no warnings 'utf8';
+    Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from Encode::encode for UTF32-LE can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    no warnings;
+    Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from Encode::encode for UTF32-LE can be silenced via no warnings");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from Encode::encode for UTF32-LE cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+    local $warn;
+    no warnings;
+    Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from Encode::encode for UTF32-LE cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+{
+    local $warn;
+    my $ret = $enc->decode( $valid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Calling decode on UTF32-LE encode object with valid string produces no warnings");
+    is($ret, "a", "Calling decode on UTF32-LE encode object with valid string returns correct output");
+}
+
+
+{
+    local $warn;
+    $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    like($warn, qr/may not be portable.* at $script line /, "Calling decode on UTF32-LE encode object with invalid string warns");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from decode method of UTF32-LE encode object can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    no warnings;
+    $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from decode method of UTF32-LE encode object can be silenced via no warnings");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/may not be portable.* at $script line /, "Warning from decode method of UTF32-LE encode object cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+    local $warn;
+    no warnings;
+    $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/may not be portable.* at $script line /, "Warning from decode method of UTF32-LE encode object cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+{
+    local $warn;
+    my $ret = Encode::decode( $enc, $valid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Calling Encode::decode for UTF32-LE with valid string produces no warnings");
+    is($ret, "a", "Calling Encode::decode for UTF32-LE with valid string returns correct output");
+}
+
+
+{
+    local $warn;
+    Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    like($warn, qr/may not be portable.* at $script line /, "Calling Encode::decode for UTF32-LE with invalid string warns");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from Encode::decode for UTF32-LE can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    no warnings;
+    Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from Encode::decode for UTF32-LE can be silenced via no warnings");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/may not be portable.* at $script line /, "Warning from Encode::decode for UTF32-LE cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+    local $warn;
+    no warnings;
+    Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/may not be portable.* at $script line /, "Warning from Encode::decode for UTF32-LE cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+use PerlIO::encoding;
+$PerlIO::encoding::fallback |= Encode::ONLY_PRAGMA_WARNINGS;
+
+{
+    local $warn;
+    my $tmp = $valid;
+    $tmp .= ''; # de-COW
+    open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+    my $str = <$fh>;
+    close $fh;
+    is($warn, undef, "Calling PerlIO :encoding on valid string produces no warnings");
+    is($str, "a", "PerlIO decodes string correctly");
+}
+
+
+{
+    local $warn;
+    my $tmp = $invalid;
+    use Devel::Peek;
+    $tmp .= ''; # de-COW
+    open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+    my $str = <$fh>;
+    close $fh;
+    like($warn, qr/may not be portable.* at $script line /, "Calling PerlIO :encoding on invalid string warns");
+}
+
+{
+    local $warn;
+    my $tmp = $invalid;
+    $tmp .= ''; # de-COW
+    no warnings 'utf8';
+    open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+    my $str = <$fh>;
+    close $fh;
+    is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    my $tmp = $invalid;
+    $tmp .= ''; # de-COW
+    no warnings;
+    open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+    my $str = <$fh>;
+    close $fh;
+    is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings");
+}
+
+
+{
+    local $warn;
+    my $str;
+    open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+    print $fh "a";
+    close $fh;
+    is($warn, undef, "Calling PerlIO :encoding on valid string produces no warnings");
+    is($str, $valid, "PerlIO encodes string correctly");
+}
+
+
+{
+    local $warn;
+    my $str;
+    open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+    print $fh "\x{D800}";
+    close $fh;
+    like($warn, qr/UTF-16 surrogate.* at $script line /, "Calling PerlIO :encoding on invalid string warns");
+}
+
+{
+    local $warn;
+    my $str;
+    no warnings 'utf8';
+    open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+    print $fh "\x{D800}";
+    close $fh;
+    is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    my $str;
+    no warnings;
+    open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+    print $fh "\x{D800}";
+    close $fh;
+    is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings");
+}
diff --git a/t/utf8messages.t b/t/utf8messages.t
deleted file mode 100644
index 8b6b379..0000000
--- a/t/utf8messages.t
+++ /dev/null
@@ -1,33 +0,0 @@
-use strict;
-use warnings;
-BEGIN { 'warnings'->unimport('utf8') if $] < 5.014 }; # turn off 'UTF-16 surrogate 0xd800' warnings
-
-use Test::More;
-use Encode qw(encode decode FB_CROAK LEAVE_SRC);
-
-plan tests => 12;
-
-my @invalid;
-
-ok ! defined eval { encode('UTF-8', "\x{D800}", FB_CROAK | LEAVE_SRC) }, 'Surrogate codepoint \x{D800} is not encoded to strict UTF-8';
-like $@, qr/^"\\x\{d800\}" does not map to UTF-8 /, 'Error message contains strict UTF-8 name';
-@invalid = ();
-encode('UTF-8', "\x{D800}", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xD800 ], 'Fallback coderef contains invalid codepoint 0xD800';
-
-ok ! defined eval { decode('UTF-8', "\xed\xa0\x80", FB_CROAK | LEAVE_SRC) }, 'Surrogate UTF-8 byte sequence \xED\xA0\x80 is decoded with strict UTF-8 decoder';
-like $@, qr/^UTF-8 "\\xED\\xA0\\x80" does not map to Unicode /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
-@invalid = ();
-decode('UTF-8', "\xed\xa0\x80", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0, 0x80 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0, 0x80';
-
-ok ! defined eval { decode('UTF-8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with strict UTF-8 decoder';
-like $@, qr/^UTF-8 "\\xED\\xA0" does not map to Unicode /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
-@invalid = ();
-decode('UTF-8', "\xed\xa0", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
-
-ok ! defined eval { decode('utf8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with non-strict utf8 decoder';
-like $@, qr/^utf8 "\\xED\\xA0" does not map to Unicode /, 'Error message contains non-strict utf8 name and original (not decoded) invalid sequence';
-decode('utf8', "\xed\xa0", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
diff --git a/t/utf8warnings.t b/t/utf8warnings.t
index 0d1ac6d..e4e4304 100644
--- a/t/utf8warnings.t
+++ b/t/utf8warnings.t
@@ -1,94 +1,35 @@
 use strict;
 use warnings;
-BEGIN {
-    if ($] < 5.014){
-        print "1..0 # Skip: Perl 5.14.0 or later required\n";
-        exit 0;
-    }
-}
+BEGIN { 'warnings'->unimport('utf8') if $] < 5.014 }; # turn off 'UTF-16 surrogate 0xd800' warnings
 
-use Encode;
-use Test::More tests => 10;
+use Test::More;
+use Encode qw(encode decode FB_CROAK LEAVE_SRC);
 
-my $valid   = "\x61\x00\x00\x00";
-my $invalid = "\x78\x56\x34\x12";
+my $script = quotemeta $0;
 
-my @warnings;
-$SIG{__WARN__} = sub {push @warnings, "@_"};
+plan tests => 12;
 
-my $enc = find_encoding("UTF32-LE");
+my @invalid;
 
-{
-    @warnings = ();
-    my $ret = Encode::Unicode::decode( $enc, $valid );
-    is("@warnings", "", "Calling decode in Encode::Unicode on valid string produces no warnings");
-}
+ok ! defined eval { encode('UTF-8', "\x{D800}", FB_CROAK | LEAVE_SRC) }, 'Surrogate codepoint \x{D800} is not encoded to strict UTF-8';
+like $@, qr/^"\\x\{d800\}" does not map to UTF-8 at $script line /, 'Error message contains strict UTF-8 name';
+@invalid = ();
+encode('UTF-8', "\x{D800}", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xD800 ], 'Fallback coderef contains invalid codepoint 0xD800';
 
+ok ! defined eval { decode('UTF-8', "\xed\xa0\x80", FB_CROAK | LEAVE_SRC) }, 'Surrogate UTF-8 byte sequence \xED\xA0\x80 is decoded with strict UTF-8 decoder';
+like $@, qr/^UTF-8 "\\xED\\xA0\\x80" does not map to Unicode at $script line /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
+@invalid = ();
+decode('UTF-8', "\xed\xa0\x80", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0, 0x80 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0, 0x80';
 
+ok ! defined eval { decode('UTF-8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with strict UTF-8 decoder';
+like $@, qr/^UTF-8 "\\xED\\xA0" does not map to Unicode at $script line /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
+@invalid = ();
+decode('UTF-8', "\xed\xa0", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
 
-{
-    @warnings = ();
-    my $ret = Encode::Unicode::decode( $enc, $invalid );
-    like("@warnings", qr/is not Unicode/, "Calling decode in Encode::Unicode on invalid string warns");
-}
-
-{
-    no warnings 'utf8';
-    @warnings = ();
-    my $ret = Encode::Unicode::decode( $enc, $invalid );
-    is("@warnings", "", "Warning from decode in Encode::Unicode can be silenced via no warnings 'utf8'");
-}
-
-{
-    no warnings;
-    @warnings = ();
-    my $ret = Encode::Unicode::decode( $enc, $invalid );
-    is("@warnings", "", "Warning from decode in Encode::Unicode can be silenced via no warnings");
-}
-
-
-
-{
-    @warnings = ();
-    my $ret = Encode::decode( $enc, $invalid );
-    like("@warnings", qr/is not Unicode/, "Calling decode in Encode on invalid string warns");
-}
-
-{
-    no warnings 'utf8';
-    @warnings = ();
-    my $ret = Encode::decode( $enc, $invalid );
-    is("@warnings", "", "Warning from decode in Encode can be silenced via no warnings 'utf8'");
-};
-
-{
-    no warnings;
-    @warnings = ();
-    my $ret = Encode::decode( $enc, $invalid );
-    is("@warnings", "", "Warning from decode in Encode can be silenced via no warnings");
-};
-
-
-
-{
-    @warnings = ();
-    my $inplace = $invalid;
-    Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
-    like("@warnings", qr/is not Unicode/, "Calling from_to in Encode on invalid string warns");
-}
-
-{
-    no warnings 'utf8';
-    @warnings = ();
-    my $inplace = $invalid;
-    Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
-    is("@warnings", "", "Warning from from_to in Encode can be silenced via no warnings 'utf8'");
-};
-
-{
-    no warnings;
-    @warnings = ();
-    my $inplace = $invalid;
-    Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
-    is("@warnings", "", "Warning from from_to in Encode can be silenced via no warnings");
-};
+ok ! defined eval { decode('utf8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with non-strict utf8 decoder';
+like $@, qr/^utf8 "\\xED\\xA0" does not map to Unicode at $script line /, 'Error message contains non-strict utf8 name and original (not decoded) invalid sequence';
+decode('utf8', "\xed\xa0", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
-- 
2.11.0

@p5pRT
Copy link
Author

p5pRT commented May 8, 2018

From @pali

In attachment is a v3 version of Encode patch.

@p5pRT
Copy link
Author

p5pRT commented May 8, 2018

From @pali

v3-0001-Encode-Add-new-check-flag-Encode-ONLY_PRAGMA_WARNING.patch
From aa43c063128b889db7379435209219a850071d0f Mon Sep 17 00:00:00 2001
From: Pali <pali@cpan.org>
Date: Sat, 1 Jul 2017 12:32:34 +0200
Subject: [PATCH v3] Encode: Add new check flag Encode::ONLY_PRAGMA_WARNINGS

When this new flag is set then only warnings configured and enabled by
pragma warnings are reported. It has no effect without setting check flag
Encode::ENCODE_WARN_ON_ERR.
---
 Encode.pm          | 131 +++++++------------------
 Encode.xs          |  13 ++-
 Encode/encode.h    |   4 +
 Unicode/Unicode.xs | 148 +++++++++++++++++-----------
 t/decode.t         |   9 +-
 t/enc_eucjp.t      |   4 +
 t/utf32warnings.t  | 283 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 t/utf8messages.t   |  33 -------
 t/utf8warnings.t   | 109 +++++----------------
 9 files changed, 460 insertions(+), 274 deletions(-)
 create mode 100644 t/utf32warnings.t
 delete mode 100644 t/utf8messages.t

diff --git a/Encode.pm b/Encode.pm
index 9234ae0..ec0992d 100644
--- a/Encode.pm
+++ b/Encode.pm
@@ -14,6 +14,7 @@ BEGIN {
 
 use Exporter 5.57 'import';
 
+use Carp ();
 our @CARP_NOT = qw(Encode::Encoder);
 
 # Public, encouraged API is exported by default
@@ -171,109 +172,54 @@ sub clone_encoding($) {
 }
 
 sub encode($$;$) {
-    my ( $name, $string, $check ) = @_;
-    return undef unless defined $string;
-    $string .= '';    # stringify;
-    $check ||= 0;
-    unless ( defined $name ) {
-        require Carp;
-        Carp::croak("Encoding name should not be undef");
-    }
+    my $name = $_[0];
+    my $check = $_[2];
+    Carp::croak("Encoding name should not be undef") unless defined $name;
     my $enc = find_encoding($name);
-    unless ( defined $enc ) {
-        require Carp;
-        Carp::croak("Unknown encoding '$name'");
-    }
-    # For Unicode, warnings need to be caught and re-issued at this level
-    # so that callers can disable utf8 warnings lexically.
-    my $octets;
-    if ( ref($enc) eq 'Encode::Unicode' ) {
-        my $warn = '';
-        {
-            local $SIG{__WARN__} = sub { $warn = shift };
-            $octets = $enc->encode( $string, $check );
-        }
-        warnings::warnif('utf8', $warn) if length $warn;
-    }
-    else {
-        $octets = $enc->encode( $string, $check );
+    Carp::croak("Unknown encoding '$name'") unless defined $enc;
+    my $encode = $enc->can('encode');
+    Carp::croak("No function 'encode' for encoding '$name'") unless defined $encode;
+    $check ||= 0;
+    splice(@_, 0, 1, $enc);
+    if (ref $check or !$check or ($check & LEAVE_SRC)) {
+        my $string = $_[1];
+        splice(@_, 1, 1, $string);
     }
-    $_[1] = $string if $check and !ref $check and !( $check & LEAVE_SRC );
-    return $octets;
+    splice(@_, 2, 1, $check);
+    goto &$encode;
 }
 *str2bytes = \&encode;
 
 sub decode($$;$) {
-    my ( $name, $octets, $check ) = @_;
-    return undef unless defined $octets;
-    $octets .= '';
-    $check ||= 0;
+    my $name = $_[0];
+    my $check = $_[2];
+    Carp::croak("Encoding name should not be undef") unless defined $name;
     my $enc = find_encoding($name);
-    unless ( defined $enc ) {
-        require Carp;
-        Carp::croak("Unknown encoding '$name'");
-    }
-    # For Unicode, warnings need to be caught and re-issued at this level
-    # so that callers can disable utf8 warnings lexically.
-    my $string;
-    if ( ref($enc) eq 'Encode::Unicode' ) {
-        my $warn = '';
-        {
-            local $SIG{__WARN__} = sub { $warn = shift };
-            $string = $enc->decode( $octets, $check );
-        }
-        warnings::warnif('utf8', $warn) if length $warn;
-    }
-    else {
-        $string = $enc->decode( $octets, $check );
+    Carp::croak("Unknown encoding '$name'") unless defined $enc;
+    my $decode = $enc->can('decode');
+    Carp::croak("No function 'decode' for encoding '$name'") unless defined $decode;
+    $check ||= 0;
+    splice(@_, 0, 1, $enc);
+    if (ref $check or !$check or ($check & LEAVE_SRC)) {
+        my $octets = $_[1];
+        splice(@_, 1, 1, $octets);
     }
-    $_[1] = $octets if $check and !ref $check and !( $check & LEAVE_SRC );
-    return $string;
+    splice(@_, 2, 1, $check);
+    goto &$decode;
 }
 *bytes2str = \&decode;
 
 sub from_to($$$;$) {
     my ( $string, $from, $to, $check ) = @_;
-    return undef unless defined $string;
-    $check ||= 0;
+    Carp::croak("Encoding name should not be undef") unless defined $from and defined $to;
     my $f = find_encoding($from);
-    unless ( defined $f ) {
-        require Carp;
-        Carp::croak("Unknown encoding '$from'");
-    }
+    Carp::croak("Unknown encoding '$from'") unless defined $f;
     my $t = find_encoding($to);
-    unless ( defined $t ) {
-        require Carp;
-        Carp::croak("Unknown encoding '$to'");
-    }
-
-    # For Unicode, warnings need to be caught and re-issued at this level
-    # so that callers can disable utf8 warnings lexically.
-    my $uni;
-    if ( ref($f) eq 'Encode::Unicode' ) {
-        my $warn = '';
-        {
-            local $SIG{__WARN__} = sub { $warn = shift };
-            $uni = $f->decode($string);
-        }
-        warnings::warnif('utf8', $warn) if length $warn;
-    }
-    else {
-        $uni = $f->decode($string);
-    }
-
-    if ( ref($t) eq 'Encode::Unicode' ) {
-        my $warn = '';
-        {
-            local $SIG{__WARN__} = sub { $warn = shift };
-            $_[0] = $string = $t->encode( $uni, $check );
-        }
-        warnings::warnif('utf8', $warn) if length $warn;
-    }
-    else {
-        $_[0] = $string = $t->encode( $uni, $check );
-    }
-
+    Carp::croak("Unknown encoding '$to'") unless defined $t;
+    return undef unless defined $string;
+    $check ||= 0;
+    my $uni = $f->decode($string);
+    $_[0] = $string = $t->encode( $uni, $check );
     return undef if ( $check && length($uni) );
     return defined( $_[0] ) ? length($string) : undef;
 }
@@ -288,14 +234,9 @@ sub encode_utf8($) {
 my $utf8enc;
 
 sub decode_utf8($;$) {
-    my ( $octets, $check ) = @_;
-    return undef unless defined $octets;
-    $octets .= '';
-    $check   ||= 0;
     $utf8enc ||= find_encoding('utf8');
-    my $string = $utf8enc->decode( $octets, $check );
-    $_[0] = $octets if $check and !ref $check and !( $check & LEAVE_SRC );
-    return $string;
+    unshift(@_, $utf8enc);
+    goto &{$utf8enc->can('decode')};
 }
 
 onBOOT;
diff --git a/Encode.xs b/Encode.xs
index 9796181..09dca71 100644
--- a/Encode.xs
+++ b/Encode.xs
@@ -237,7 +237,7 @@ encode_method(pTHX_ const encode_t * enc, const encpage_t * dir, SV * src, U8 *
                    (UV)ch, enc->name[0]);
             return &PL_sv_undef; /* never reaches but be safe */
         }
-        if (check & ENCODE_WARN_ON_ERR){
+        if (encode_ckWARN(check, WARN_UTF8)) {
             Perl_warner(aTHX_ packWARN(WARN_UTF8),
                 ERR_ENCODE_NOMAP, (UV)ch, enc->name[0]);
         }
@@ -276,7 +276,7 @@ encode_method(pTHX_ const encode_t * enc, const encpage_t * dir, SV * src, U8 *
                               enc->name[0], (UV)s[slen]);
             return &PL_sv_undef; /* never reaches but be safe */
         }
-        if (check & ENCODE_WARN_ON_ERR){
+        if (encode_ckWARN(check, WARN_UTF8)) {
             Perl_warner(
             aTHX_ packWARN(WARN_UTF8),
             ERR_DECODE_NOMAP,
@@ -460,7 +460,11 @@ process_utf8(pTHX_ SV* dst, U8* s, U8* e, SV *check_sv,
                     ? UTF8_DISALLOW_ILLEGAL_INTERCHANGE
                     : UTF8_ALLOW_NON_STRICT;
 
-    if (SvROK(check_sv)) {
+    if (!SvOK(check_sv)) {
+	fallback_cb = &PL_sv_undef;
+	check = 0;
+    }
+    else if (SvROK(check_sv)) {
 	/* croak("UTF-8 decoder doesn't support callback CHECK"); */
 	fallback_cb = check_sv;
 	check = ENCODE_PERLQQ|ENCODE_LEAVE_SRC; /* same as perlqq */
@@ -581,7 +585,7 @@ process_utf8(pTHX_ SV* dst, U8* s, U8* e, SV *check_sv,
             else
                 Perl_croak(aTHX_ ERR_DECODE_STR_NOMAP, (strict ? "UTF-8" : "utf8"), esc);
         }
-        if (check & ENCODE_WARN_ON_ERR){
+        if (encode_ckWARN(check, WARN_UTF8)) {
             if (encode)
                 Perl_warner(aTHX_ packWARN(WARN_UTF8),
                             ERR_ENCODE_NOMAP, uv, (strict ? "UTF-8" : "utf8"));
@@ -1029,6 +1033,7 @@ BOOT:
     newCONSTSUB(stash, "WARN_ON_ERR", newSViv(ENCODE_WARN_ON_ERR));
     newCONSTSUB(stash, "RETURN_ON_ERR", newSViv(ENCODE_RETURN_ON_ERR));
     newCONSTSUB(stash, "LEAVE_SRC", newSViv(ENCODE_LEAVE_SRC));
+    newCONSTSUB(stash, "ONLY_PRAGMA_WARNINGS", newSViv(ENCODE_ONLY_PRAGMA_WARNINGS));
     newCONSTSUB(stash, "PERLQQ", newSViv(ENCODE_PERLQQ));
     newCONSTSUB(stash, "HTMLCREF", newSViv(ENCODE_HTMLCREF));
     newCONSTSUB(stash, "XMLCREF", newSViv(ENCODE_XMLCREF));
diff --git a/Encode/encode.h b/Encode/encode.h
index df5554f..5189930 100644
--- a/Encode/encode.h
+++ b/Encode/encode.h
@@ -99,6 +99,7 @@ extern void Encode_DefineEncoding(encode_t *enc);
 #define  ENCODE_WARN_ON_ERR    0x0002 /* warn on error; may proceed */
 #define  ENCODE_RETURN_ON_ERR  0x0004 /* immediately returns on NOREP */
 #define  ENCODE_LEAVE_SRC      0x0008 /* $src updated unless set */
+#define  ENCODE_ONLY_PRAGMA_WARNINGS 0x0010 /* when enabled report only warnings configured by pragma warnings, otherwise report all warnings; no effect without ENCODE_WARN_ON_ERR */
 #define  ENCODE_PERLQQ         0x0100 /* perlqq fallback string */
 #define  ENCODE_HTMLCREF       0x0200 /* HTML character ref. fb mode */
 #define  ENCODE_XMLCREF        0x0400 /* XML  character ref. fb mode */
@@ -112,4 +113,7 @@ extern void Encode_DefineEncoding(encode_t *enc);
 #define  ENCODE_FB_HTMLCREF    (ENCODE_HTMLCREF|ENCODE_LEAVE_SRC)
 #define  ENCODE_FB_XMLCREF     (ENCODE_XMLCREF|ENCODE_LEAVE_SRC)
 
+#define encode_ckWARN(c, w) ((c & ENCODE_WARN_ON_ERR) && (!(c & ENCODE_ONLY_PRAGMA_WARNINGS) || ckWARN(w)))
+#define encode_ckWARN_packed(c, w) ((c & ENCODE_WARN_ON_ERR) && (!(c & ENCODE_ONLY_PRAGMA_WARNINGS) || Perl_ckwarn(aTHX_ w)))
+
 #endif /* ENCODE_H */
diff --git a/Unicode/Unicode.xs b/Unicode/Unicode.xs
index ad88106..c9e4391 100644
--- a/Unicode/Unicode.xs
+++ b/Unicode/Unicode.xs
@@ -17,14 +17,8 @@
 #define isLoSurrogate(x)	(0xDC00 <= (x)  && (x) <= 0xDFFF )
 #define invalid_ucs2(x)         ( issurrogate(x) || 0xFFFF < (x) )
 
-/* For pre-5.14 source compatibility */
-#ifndef UNICODE_WARN_ILLEGAL_INTERCHANGE
-#   define UNICODE_WARN_ILLEGAL_INTERCHANGE 0
-#   define UTF8_DISALLOW_SURROGATE 0
-#   define UTF8_WARN_SURROGATE 0
-#   define UTF8_DISALLOW_FE_FF 0
-#   define UTF8_WARN_FE_FF 0
-#   define UTF8_WARN_NONCHAR 0
+#ifndef SVfARG
+#define SVfARG(p) ((void*)(p))
 #endif
 
 #define PERLIO_BUFSIZ 1024 /* XXX value comes from PerlIOEncode_get_base */
@@ -133,6 +127,7 @@ SV *	str
 IV	check
 CODE:
 {
+    SV *name     = attr("Name");
     SV *sve      = attr("endian");
     U8 endian    = *((U8 *)SvPV_nolen(sve));
     SV *svs      = attr("size");
@@ -233,53 +228,61 @@ CODE:
 		ucs2 = SvTRUE(sv);
 	    }
 	    if (ucs2 || size == 4) {
-		if (check) {
+		if (check & ENCODE_DIE_ON_ERR) {
 		    croak("%" SVf ":no surrogates allowed %" UVxf,
-			  *hv_fetch((HV *)SvRV(obj),"Name",4,0),
-			  ord);
+			  SVfARG(name), ord);
+		}
+		if (encode_ckWARN(check, WARN_SURROGATE)) {
+		    warner(packWARN(WARN_SURROGATE),
+			  "%" SVf ":no surrogates allowed %" UVxf,
+			  SVfARG(name), ord);
 		}
 		ord = FBCHAR;
 	    }
 	    else {
 		UV lo;
 		if (!isHiSurrogate(ord)) {
-		    if (check) {
+		    if (check & ENCODE_DIE_ON_ERR) {
 			croak("%" SVf ":Malformed HI surrogate %" UVxf,
-			      *hv_fetch((HV *)SvRV(obj),"Name",4,0),
-			      ord);
+			      SVfARG(name), ord);
 		    }
-		    else {
-			ord = FBCHAR;
+		    if (encode_ckWARN(check, WARN_SURROGATE)) {
+			warner(packWARN(WARN_SURROGATE),
+			      "%" SVf ":Malformed HI surrogate %" UVxf,
+			      SVfARG(name), ord);
 		    }
+		    ord = FBCHAR;
 		}
 		else if (s+size > e) {
-		    if (check) {
-		        if (check & ENCODE_STOP_AT_PARTIAL) {
-		             s -= size;
-		             break;
-		        }
-		        else {
-		             croak("%" SVf ":Malformed HI surrogate %" UVxf,
-				   *hv_fetch((HV *)SvRV(obj),"Name",4,0),
-				   ord);
-		        }
+		    if (check & ENCODE_STOP_AT_PARTIAL) {
+		        s -= size;
+		        break;
 		    }
-		    else {
-		        ord = FBCHAR;
+		    if (check & ENCODE_DIE_ON_ERR) {
+			croak("%" SVf ":Malformed HI surrogate %" UVxf,
+			      SVfARG(name), ord);
+		    }
+		    if (encode_ckWARN(check, WARN_SURROGATE)) {
+			warner(packWARN(WARN_SURROGATE),
+			      "%" SVf ":Malformed HI surrogate %" UVxf,
+			      SVfARG(name), ord);
 		    }
+		    ord = FBCHAR;
 		}
 		else {
 		    lo = enc_unpack(aTHX_ &s,e,size,endian);
 		    if (!isLoSurrogate(lo)) {
-			if (check) {
+			if (check & ENCODE_DIE_ON_ERR) {
 			    croak("%" SVf ":Malformed LO surrogate %" UVxf,
-				  *hv_fetch((HV *)SvRV(obj),"Name",4,0),
-				  ord);
+				  SVfARG(name), ord);
 			}
-			else {
-			    s -= size;
-			    ord = FBCHAR;
+			if (encode_ckWARN(check, WARN_SURROGATE)) {
+			    warner(packWARN(WARN_SURROGATE),
+				  "%" SVf ":Malformed LO surrogate %" UVxf,
+				  SVfARG(name), ord);
 			}
+			s -= size;
+			ord = FBCHAR;
 		    }
 		    else {
 			ord = 0x10000 + ((ord - 0xD800) << 10) + (lo - 0xDC00);
@@ -289,13 +292,16 @@ CODE:
 	}
 
 	if ((ord & 0xFFFE) == 0xFFFE || (ord >= 0xFDD0 && ord <= 0xFDEF)) {
-	    if (check) {
+	    if (check & ENCODE_DIE_ON_ERR) {
 		croak("%" SVf ":Unicode character %" UVxf " is illegal",
-		      *hv_fetch((HV *)SvRV(obj),"Name",4,0),
-		      ord);
-	    } else {
-		ord = FBCHAR;
+		      SVfARG(name), ord);
 	    }
+	    if (encode_ckWARN(check, WARN_NONCHAR)) {
+	        warner(packWARN(WARN_NONCHAR),
+		      "%" SVf ":Unicode character %" UVxf " is illegal",
+		      SVfARG(name), ord);
+	    }
+	    ord = FBCHAR;
 	}
 
 	if (resultbuflen < SvCUR(result) + UTF8_MAXLEN + 1) {
@@ -315,16 +321,28 @@ CODE:
 	    resultbuflen = SvLEN(result);
 	}
 
-	d = uvchr_to_utf8_flags(resultbuf+SvCUR(result), ord,
-                                            UNICODE_WARN_ILLEGAL_INTERCHANGE);
+        HV *hv = NULL;
+        d = uvchr_to_utf8_flags_msgs(resultbuf+SvCUR(result), ord, UNICODE_DISALLOW_ILLEGAL_INTERCHANGE | UNICODE_WARN_ILLEGAL_INTERCHANGE, &hv);
+        if (hv) {
+            sv_2mortal((SV *)hv);
+            SV *message = *hv_fetch(hv, "text", 4, 0);
+            U32 categories = SvUVx(*hv_fetch(hv, "warn_categories", 15, 0));
+            if (check & ENCODE_DIE_ON_ERR)
+                croak("%" SVf, SVfARG(message));
+            if (encode_ckWARN_packed(check, categories))
+                warner(categories, "%" SVf, SVfARG(message));
+            d = uvchr_to_utf8_flags(resultbuf+SvCUR(result), FBCHAR, 0);
+        }
+
 	SvCUR_set(result, d - (U8 *)SvPVX(result));
     }
 
     if (s < e) {
 	/* unlikely to happen because it's fixed-length -- dankogai */
-	if (check & ENCODE_WARN_ON_ERR) {
-	    Perl_warner(aTHX_ packWARN(WARN_UTF8),"%" SVf ":Partial character",
-			*hv_fetch((HV *)SvRV(obj),"Name",4,0));
+        if (check & ENCODE_DIE_ON_ERR)
+            croak("%" SVf ":Partial character", SVfARG(name));
+        if (encode_ckWARN(check, WARN_UTF8)) {
+            warner(packWARN(WARN_UTF8),"%" SVf ":Partial character", SVfARG(name));
 	}
     }
     if (check && !(check & ENCODE_LEAVE_SRC)) {
@@ -351,6 +369,7 @@ SV *	utf8
 IV	check
 CODE:
 {
+    SV *name = attr("Name");
     SV *sve = attr("endian");
     U8 endian = *((U8 *)SvPV_nolen(sve));
     SV *svs = attr("size");
@@ -406,25 +425,43 @@ CODE:
 #endif
     }
     while (s < e && s+UTF8SKIP(s) <= e) {
-	STRLEN len;
-	UV ord = utf8n_to_uvchr(s, e-s, &len, (UTF8_DISALLOW_SURROGATE
-                                               |UTF8_WARN_SURROGATE
-                                               |UTF8_DISALLOW_FE_FF
-                                               |UTF8_WARN_FE_FF
-                                               |UTF8_WARN_NONCHAR));
-	s += len;
-	if (size != 4 && invalid_ucs2(ord)) {
+        STRLEN len;
+        AV *msgs = NULL;
+        UV ord = utf8n_to_uvchr_msgs(s, e-s, &len, UTF8_DISALLOW_ILLEGAL_INTERCHANGE | UTF8_WARN_ILLEGAL_INTERCHANGE, NULL, &msgs);
+        if (msgs) {
+            SSize_t i;
+            SSize_t len = av_len(msgs)+1;
+            sv_2mortal((SV *)msgs);
+            for (i = 0; i < len; ++i) {
+                SV *sv = *av_fetch(msgs, i, 0);
+                HV *hv = (HV *)SvRV(sv);
+                SV *message = *hv_fetch(hv, "text", 4, 0);
+                U32 categories = SvUVx(*hv_fetch(hv, "warn_categories", 15, 0));
+                if (check & ENCODE_DIE_ON_ERR)
+                    croak("%" SVf, SVfARG(message));
+                if (encode_ckWARN_packed(check, categories))
+                    warner(categories, "%" SVf, SVfARG(message));
+            }
+        }
+	if ((size != 4 && invalid_ucs2(ord)) || (ord == 0 && *s != 0)) {
 	    if (!issurrogate(ord)) {
 		if (ucs2 == -1) {
 		    SV *sv = attr("ucs2");
 		    ucs2 = SvTRUE(sv);
 		}
 		if (ucs2 || ord > 0x10FFFF) {
-		    if (check) {
+		    if (check & ENCODE_DIE_ON_ERR) {
 			croak("%" SVf ":code point \"\\x{%" UVxf "}\" too high",
-				  *hv_fetch((HV *)SvRV(obj),"Name",4,0),ord);
+				  SVfARG(name),ord);
+		    }
+		    if (encode_ckWARN(check, WARN_NON_UNICODE)) {
+			warner(packWARN(WARN_NON_UNICODE),
+				  "%" SVf ":code point \"\\x{%" UVxf "}\" too high",
+				  SVfARG(name),ord);
 		    }
 		    enc_pack(aTHX_ result,size,endian,FBCHAR);
+		} else if (ord == 0) {
+		    enc_pack(aTHX_ result,size,endian,FBCHAR);
 		} else {
 		    UV hi = ((ord - 0x10000) >> 10)   + 0xD800;
 		    UV lo = ((ord - 0x10000) & 0x3FF) + 0xDC00;
@@ -440,6 +477,7 @@ CODE:
 	else {
 	    enc_pack(aTHX_ result,size,endian,ord);
 	}
+	s += len;
     }
     if (s < e) {
 	/* UTF-8 partial char happens often on PerlIO.
@@ -449,7 +487,7 @@ CODE:
 	if (check & (ENCODE_DIE_ON_ERR|ENCODE_LEAVE_SRC)) {
 	    Perl_croak(aTHX_ "%" SVf ":partial character is not allowed "
 		       "when CHECK = 0x%" UVuf,
-		       *hv_fetch((HV *)SvRV(obj),"Name",4,0), check);
+		       SVfARG(name), check);
 	}
     }
     if (check && !(check & ENCODE_LEAVE_SRC)) {
diff --git a/t/decode.t b/t/decode.t
index 93c992c..0c3b669 100644
--- a/t/decode.t
+++ b/t/decode.t
@@ -51,9 +51,12 @@ $orig = "\xC3\x80";
 $orig =~ /(..)/;
 is(Encode::decode_utf8($1), "\N{U+C0}", 'passing magic regex to Encode::decode_utf8');
 
-$orig = "\xC3\x80";
-*a = $orig;
-is(Encode::decode_utf8(*a), "*main::\N{U+C0}", 'passing typeglob to Encode::decode_utf8');
+SKIP: {
+    skip "Perl Version ($]) is older than v5.27.1", 1 if $] < 5.027001;
+    $orig = "\xC3\x80";
+    *a = $orig;
+    is(Encode::decode_utf8(*a), "*main::\N{U+C0}", 'passing typeglob to Encode::decode_utf8');
+}
 
 $orig = "\N{U+C0}";
 $orig =~ /(.)/;
diff --git a/t/enc_eucjp.t b/t/enc_eucjp.t
index fc0af3c..8f933b0 100644
--- a/t/enc_eucjp.t
+++ b/t/enc_eucjp.t
@@ -25,6 +25,10 @@ BEGIN {
     }
 }
 
+use Encode qw();
+$PerlIO::encoding::fallback |= Encode::ONLY_PRAGMA_WARNINGS;
+use warnings "utf8";
+
 no warnings "deprecated";
 use encoding 'euc-jp';
 
diff --git a/t/utf32warnings.t b/t/utf32warnings.t
new file mode 100644
index 0000000..d84939f
--- /dev/null
+++ b/t/utf32warnings.t
@@ -0,0 +1,283 @@
+use strict;
+use warnings;
+BEGIN {
+    if ($] < 5.014){
+        print "1..0 # Skip: Perl 5.14.0 or later required\n";
+        exit 0;
+    }
+}
+
+my $script = quotemeta $0;
+
+use Encode;
+use Test::More tests => 38;
+
+my $valid   = "\x61\x00\x00\x00";
+my $invalid = "\x78\x56\x34\x12";
+
+our $warn;
+$SIG{__WARN__} = sub { $warn = $_[0] };
+
+my $enc = find_encoding("UTF32-LE");
+
+{
+    local $warn;
+    my $ret = $enc->encode( "a", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Calling encode on UTF32-LE encode object with valid string produces no warnings");
+    is($ret, $valid, "Calling encode on UTF32-LE encode object with valid string returns correct output");
+}
+
+
+{
+    local $warn;
+    $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    like($warn, qr/UTF-16 surrogate.* at $script line /, "Calling encode on UTF32-LE encode object with invalid string warns");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from encode method of UTF32-LE encode object can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    no warnings;
+    $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from encode method of UTF32-LE encode object can be silenced via no warnings");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from encode method of UTF32-LE encode object cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+    local $warn;
+    no warnings;
+    $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from encode method of UTF32-LE encode object cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+{
+    local $warn;
+    my $ret = Encode::encode( $enc, "a", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Calling Encode::encode for UTF32-LE with valid string produces no warnings");
+    is($ret, $valid, "Calling Encode::encode for UTF32-LE with valid string returns correct output");
+}
+
+
+{
+    local $warn;
+    Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    like($warn, qr/UTF-16 surrogate.* at $script line /, "Calling Encode::encode for UTF32-LE with invalid string warns");
+}
+
+
+{
+    local $warn;
+    no warnings 'utf8';
+    Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from Encode::encode for UTF32-LE can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    no warnings;
+    Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from Encode::encode for UTF32-LE can be silenced via no warnings");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from Encode::encode for UTF32-LE cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+    local $warn;
+    no warnings;
+    Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from Encode::encode for UTF32-LE cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+{
+    local $warn;
+    my $ret = $enc->decode( $valid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Calling decode on UTF32-LE encode object with valid string produces no warnings");
+    is($ret, "a", "Calling decode on UTF32-LE encode object with valid string returns correct output");
+}
+
+
+{
+    local $warn;
+    $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    like($warn, qr/may not be portable.* at $script line /, "Calling decode on UTF32-LE encode object with invalid string warns");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from decode method of UTF32-LE encode object can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    no warnings;
+    $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from decode method of UTF32-LE encode object can be silenced via no warnings");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/may not be portable.* at $script line /, "Warning from decode method of UTF32-LE encode object cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+    local $warn;
+    no warnings;
+    $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/may not be portable.* at $script line /, "Warning from decode method of UTF32-LE encode object cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+{
+    local $warn;
+    my $ret = Encode::decode( $enc, $valid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Calling Encode::decode for UTF32-LE with valid string produces no warnings");
+    is($ret, "a", "Calling Encode::decode for UTF32-LE with valid string returns correct output");
+}
+
+
+{
+    local $warn;
+    Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    like($warn, qr/may not be portable.* at $script line /, "Calling Encode::decode for UTF32-LE with invalid string warns");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from Encode::decode for UTF32-LE can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    no warnings;
+    Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+    is($warn, undef, "Warning from Encode::decode for UTF32-LE can be silenced via no warnings");
+}
+
+{
+    local $warn;
+    no warnings 'utf8';
+    Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/may not be portable.* at $script line /, "Warning from Encode::decode for UTF32-LE cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+    local $warn;
+    no warnings;
+    Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+    like($warn, qr/may not be portable.* at $script line /, "Warning from Encode::decode for UTF32-LE cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+use PerlIO::encoding;
+$PerlIO::encoding::fallback |= Encode::ONLY_PRAGMA_WARNINGS;
+
+{
+    local $warn;
+    my $tmp = $valid;
+    $tmp .= ''; # de-COW
+    open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+    my $str = <$fh>;
+    close $fh;
+    is($warn, undef, "Calling PerlIO :encoding on valid string produces no warnings");
+    is($str, "a", "PerlIO decodes string correctly");
+}
+
+
+{
+    local $warn;
+    my $tmp = $invalid;
+    use Devel::Peek;
+    $tmp .= ''; # de-COW
+    open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+    my $str = <$fh>;
+    close $fh;
+    like($warn, qr/may not be portable.* at $script line /, "Calling PerlIO :encoding on invalid string warns");
+}
+
+{
+    local $warn;
+    my $tmp = $invalid;
+    $tmp .= ''; # de-COW
+    no warnings 'utf8';
+    open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+    my $str = <$fh>;
+    close $fh;
+    is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    my $tmp = $invalid;
+    $tmp .= ''; # de-COW
+    no warnings;
+    open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+    my $str = <$fh>;
+    close $fh;
+    is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings");
+}
+
+
+{
+    local $warn;
+    my $str;
+    open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+    print $fh "a";
+    close $fh;
+    is($warn, undef, "Calling PerlIO :encoding on valid string produces no warnings");
+    is($str, $valid, "PerlIO encodes string correctly");
+}
+
+
+{
+    local $warn;
+    my $str;
+    open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+    print $fh "\x{D800}";
+    close $fh;
+    like($warn, qr/UTF-16 surrogate.* at $script line /, "Calling PerlIO :encoding on invalid string warns");
+}
+
+{
+    local $warn;
+    my $str;
+    no warnings 'utf8';
+    open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+    print $fh "\x{D800}";
+    close $fh;
+    is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings 'utf8'");
+}
+
+{
+    local $warn;
+    my $str;
+    no warnings;
+    open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+    print $fh "\x{D800}";
+    close $fh;
+    is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings");
+}
diff --git a/t/utf8messages.t b/t/utf8messages.t
deleted file mode 100644
index 8b6b379..0000000
--- a/t/utf8messages.t
+++ /dev/null
@@ -1,33 +0,0 @@
-use strict;
-use warnings;
-BEGIN { 'warnings'->unimport('utf8') if $] < 5.014 }; # turn off 'UTF-16 surrogate 0xd800' warnings
-
-use Test::More;
-use Encode qw(encode decode FB_CROAK LEAVE_SRC);
-
-plan tests => 12;
-
-my @invalid;
-
-ok ! defined eval { encode('UTF-8', "\x{D800}", FB_CROAK | LEAVE_SRC) }, 'Surrogate codepoint \x{D800} is not encoded to strict UTF-8';
-like $@, qr/^"\\x\{d800\}" does not map to UTF-8 /, 'Error message contains strict UTF-8 name';
-@invalid = ();
-encode('UTF-8', "\x{D800}", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xD800 ], 'Fallback coderef contains invalid codepoint 0xD800';
-
-ok ! defined eval { decode('UTF-8', "\xed\xa0\x80", FB_CROAK | LEAVE_SRC) }, 'Surrogate UTF-8 byte sequence \xED\xA0\x80 is decoded with strict UTF-8 decoder';
-like $@, qr/^UTF-8 "\\xED\\xA0\\x80" does not map to Unicode /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
-@invalid = ();
-decode('UTF-8', "\xed\xa0\x80", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0, 0x80 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0, 0x80';
-
-ok ! defined eval { decode('UTF-8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with strict UTF-8 decoder';
-like $@, qr/^UTF-8 "\\xED\\xA0" does not map to Unicode /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
-@invalid = ();
-decode('UTF-8', "\xed\xa0", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
-
-ok ! defined eval { decode('utf8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with non-strict utf8 decoder';
-like $@, qr/^utf8 "\\xED\\xA0" does not map to Unicode /, 'Error message contains non-strict utf8 name and original (not decoded) invalid sequence';
-decode('utf8', "\xed\xa0", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
diff --git a/t/utf8warnings.t b/t/utf8warnings.t
index 0d1ac6d..e4e4304 100644
--- a/t/utf8warnings.t
+++ b/t/utf8warnings.t
@@ -1,94 +1,35 @@
 use strict;
 use warnings;
-BEGIN {
-    if ($] < 5.014){
-        print "1..0 # Skip: Perl 5.14.0 or later required\n";
-        exit 0;
-    }
-}
+BEGIN { 'warnings'->unimport('utf8') if $] < 5.014 }; # turn off 'UTF-16 surrogate 0xd800' warnings
 
-use Encode;
-use Test::More tests => 10;
+use Test::More;
+use Encode qw(encode decode FB_CROAK LEAVE_SRC);
 
-my $valid   = "\x61\x00\x00\x00";
-my $invalid = "\x78\x56\x34\x12";
+my $script = quotemeta $0;
 
-my @warnings;
-$SIG{__WARN__} = sub {push @warnings, "@_"};
+plan tests => 12;
 
-my $enc = find_encoding("UTF32-LE");
+my @invalid;
 
-{
-    @warnings = ();
-    my $ret = Encode::Unicode::decode( $enc, $valid );
-    is("@warnings", "", "Calling decode in Encode::Unicode on valid string produces no warnings");
-}
+ok ! defined eval { encode('UTF-8', "\x{D800}", FB_CROAK | LEAVE_SRC) }, 'Surrogate codepoint \x{D800} is not encoded to strict UTF-8';
+like $@, qr/^"\\x\{d800\}" does not map to UTF-8 at $script line /, 'Error message contains strict UTF-8 name';
+@invalid = ();
+encode('UTF-8', "\x{D800}", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xD800 ], 'Fallback coderef contains invalid codepoint 0xD800';
 
+ok ! defined eval { decode('UTF-8', "\xed\xa0\x80", FB_CROAK | LEAVE_SRC) }, 'Surrogate UTF-8 byte sequence \xED\xA0\x80 is decoded with strict UTF-8 decoder';
+like $@, qr/^UTF-8 "\\xED\\xA0\\x80" does not map to Unicode at $script line /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
+@invalid = ();
+decode('UTF-8', "\xed\xa0\x80", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0, 0x80 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0, 0x80';
 
+ok ! defined eval { decode('UTF-8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with strict UTF-8 decoder';
+like $@, qr/^UTF-8 "\\xED\\xA0" does not map to Unicode at $script line /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
+@invalid = ();
+decode('UTF-8', "\xed\xa0", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
 
-{
-    @warnings = ();
-    my $ret = Encode::Unicode::decode( $enc, $invalid );
-    like("@warnings", qr/is not Unicode/, "Calling decode in Encode::Unicode on invalid string warns");
-}
-
-{
-    no warnings 'utf8';
-    @warnings = ();
-    my $ret = Encode::Unicode::decode( $enc, $invalid );
-    is("@warnings", "", "Warning from decode in Encode::Unicode can be silenced via no warnings 'utf8'");
-}
-
-{
-    no warnings;
-    @warnings = ();
-    my $ret = Encode::Unicode::decode( $enc, $invalid );
-    is("@warnings", "", "Warning from decode in Encode::Unicode can be silenced via no warnings");
-}
-
-
-
-{
-    @warnings = ();
-    my $ret = Encode::decode( $enc, $invalid );
-    like("@warnings", qr/is not Unicode/, "Calling decode in Encode on invalid string warns");
-}
-
-{
-    no warnings 'utf8';
-    @warnings = ();
-    my $ret = Encode::decode( $enc, $invalid );
-    is("@warnings", "", "Warning from decode in Encode can be silenced via no warnings 'utf8'");
-};
-
-{
-    no warnings;
-    @warnings = ();
-    my $ret = Encode::decode( $enc, $invalid );
-    is("@warnings", "", "Warning from decode in Encode can be silenced via no warnings");
-};
-
-
-
-{
-    @warnings = ();
-    my $inplace = $invalid;
-    Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
-    like("@warnings", qr/is not Unicode/, "Calling from_to in Encode on invalid string warns");
-}
-
-{
-    no warnings 'utf8';
-    @warnings = ();
-    my $inplace = $invalid;
-    Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
-    is("@warnings", "", "Warning from from_to in Encode can be silenced via no warnings 'utf8'");
-};
-
-{
-    no warnings;
-    @warnings = ();
-    my $inplace = $invalid;
-    Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
-    is("@warnings", "", "Warning from from_to in Encode can be silenced via no warnings");
-};
+ok ! defined eval { decode('utf8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with non-strict utf8 decoder';
+like $@, qr/^utf8 "\\xED\\xA0" does not map to Unicode at $script line /, 'Error message contains non-strict utf8 name and original (not decoded) invalid sequence';
+decode('utf8', "\xed\xa0", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
-- 
2.11.0

@p5pRT
Copy link
Author

p5pRT commented Jun 30, 2018

From @pali

Encode part was sent and now merged into upstream Encode module​:
dankogai/p5-encode#134

PerlIO​::encoding is remaining part for core.

@p5pRT
Copy link
Author

p5pRT commented Jan 21, 2019

From @pali

On Saturday 30 June 2018 11​:23​:40 pali@​cpan.org wrote​:

Encode part was sent and now merged into upstream Encode module​:
dankogai/p5-encode#134

Changes in these pull requests were now released on CPAN in Encode
version 2.99.

PerlIO​::encoding is remaining part for core.

Can you update Encode module in Perl core and apply remaining
PerlIO​::encoding patch?

@p5pRT
Copy link
Author

p5pRT commented Jan 21, 2019

From @jkeenan

On Mon, 21 Jan 2019 08​:48​:26 GMT, pali@​cpan.org wrote​:

On Saturday 30 June 2018 11​:23​:40 pali@​cpan.org wrote​:

Encode part was sent and now merged into upstream Encode module​:
dankogai/p5-encode#134

Changes in these pull requests were now released on CPAN in Encode
version 2.99.

PerlIO​::encoding is remaining part for core.

Can you update Encode module in Perl core

Done.

...and apply remaining
PerlIO​::encoding patch?

Can you specify which of the patches attached to this RT is the one still under consideration?

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Jan 21, 2019

From @pali

On Monday 21 January 2019 09​:48​:27 James E Keenan via RT wrote​:

On Mon, 21 Jan 2019 08​:48​:26 GMT, pali@​cpan.org wrote​:

Can you update Encode module in Perl core

Done.

Thanks!

...and apply remaining
PerlIO​::encoding patch?

Can you specify which of the patches attached to this RT is the one still under consideration?

0002-PerlIO-encoding-Use-Encode-ONLY_PRAGMA_WARNINGS-in-f.patch

All other patches are part of Encode, and therefore applied.

@p5pRT
Copy link
Author

p5pRT commented Jan 21, 2019

From @Leont

On Mon, Jan 21, 2019 at 6​:51 PM <pali@​cpan.org> wrote​:

...and apply remaining
PerlIO​::encoding patch?

Can you specify which of the patches attached to this RT is the one still under consideration?

0002-PerlIO-encoding-Use-Encode-ONLY_PRAGMA_WARNINGS-in-f.patch

All other patches are part of Encode, and therefore applied.

Can you add a perldelta entry?

Leon

@p5pRT
Copy link
Author

p5pRT commented Jan 22, 2019

From @pali

On Monday 21 January 2019 15​:03​:02 Leon Timmermans via RT wrote​:

On Mon, Jan 21, 2019 at 6​:51 PM <pali@​cpan.org> wrote​:

...and apply remaining
PerlIO​::encoding patch?

Can you specify which of the patches attached to this RT is the one still under consideration?

0002-PerlIO-encoding-Use-Encode-ONLY_PRAGMA_WARNINGS-in-f.patch

All other patches are part of Encode, and therefore applied.

Can you add a perldelta entry?

I do not know where is correct place to put entries, nor what is correct
formatting... But entry could contain something like this​:

===
Modules PerlIO​::encoding and Encode were fixed to respect state of utf8
pragma warnings when processing filehandle with :encoding layer.

@p5pRT
Copy link
Author

p5pRT commented Feb 7, 2019

From @pali

On Tuesday 22 January 2019 09​:38​:38 pali@​cpan.org wrote​:

On Monday 21 January 2019 15​:03​:02 Leon Timmermans via RT wrote​:

On Mon, Jan 21, 2019 at 6​:51 PM <pali@​cpan.org> wrote​:

...and apply remaining
PerlIO​::encoding patch?

Can you specify which of the patches attached to this RT is the one still under consideration?

0002-PerlIO-encoding-Use-Encode-ONLY_PRAGMA_WARNINGS-in-f.patch

All other patches are part of Encode, and therefore applied.

Can you add a perldelta entry?

I do not know where is correct place to put entries, nor what is correct
formatting... But entry could contain something like this​:

===
Modules PerlIO​::encoding and Encode were fixed to respect state of utf8
pragma warnings when processing filehandle with :encoding layer.

Leon, it is enough?

@p5pRT
Copy link
Author

p5pRT commented Feb 12, 2019

From @pali

On Thursday 07 February 2019 16​:05​:40 pali@​cpan.org wrote​:

On Tuesday 22 January 2019 09​:38​:38 pali@​cpan.org wrote​:

On Monday 21 January 2019 15​:03​:02 Leon Timmermans via RT wrote​:

On Mon, Jan 21, 2019 at 6​:51 PM <pali@​cpan.org> wrote​:

...and apply remaining
PerlIO​::encoding patch?

Can you specify which of the patches attached to this RT is the one still under consideration?

0002-PerlIO-encoding-Use-Encode-ONLY_PRAGMA_WARNINGS-in-f.patch

All other patches are part of Encode, and therefore applied.

Can you add a perldelta entry?

I do not know where is correct place to put entries, nor what is correct
formatting... But entry could contain something like this​:

===
Modules PerlIO​::encoding and Encode were fixed to respect state of utf8
pragma warnings when processing filehandle with :encoding layer.

Leon, it is enough?

PING

@p5pRT
Copy link
Author

p5pRT commented Feb 13, 2019

From @tonycoz

On Tue, 12 Feb 2019 07​:46​:22 -0800, pali@​cpan.org wrote​:

On Thursday 07 February 2019 16​:05​:40 pali@​cpan.org wrote​:

On Tuesday 22 January 2019 09​:38​:38 pali@​cpan.org wrote​:

On Monday 21 January 2019 15​:03​:02 Leon Timmermans via RT wrote​:

On Mon, Jan 21, 2019 at 6​:51 PM <pali@​cpan.org> wrote​:

...and apply remaining
PerlIO​::encoding patch?

Can you specify which of the patches attached to this RT is
the one still under consideration?

0002-PerlIO-encoding-Use-Encode-ONLY_PRAGMA_WARNINGS-in-f.patch

All other patches are part of Encode, and therefore applied.

Can you add a perldelta entry?

I do not know where is correct place to put entries, nor what is
correct
formatting... But entry could contain something like this​:

===
Modules PerlIO​::encoding and Encode were fixed to respect state of
utf8
pragma warnings when processing filehandle with :encoding layer.

Leon, it is enough?

PING

Do you mean the patch​:

- Encode​::PERLQQ()|Encode​::WARN_ON_ERR()|Encode​::STOP_AT_PARTIAL();
+ Encode​::PERLQQ()|Encode​::WARN_ON_ERR()|Encode​::ONLY_PRAGMA_WARNINGS()|Encode​::STOP_AT_PARTIAL();

which fails a test​:

../ext/PerlIO-encoding/t/fallback.t (Wstat​: 256 Tests​: 10 Failed​: 1)
  Failed test​: 2
  Non-zero exit status​: 1

Tony

@p5pRT
Copy link
Author

p5pRT commented Feb 13, 2019

From @pali

On Tuesday 12 February 2019 20​:54​:11 Tony Cook via RT wrote​:

On Tue, 12 Feb 2019 07​:46​:22 -0800, pali@​cpan.org wrote​:

On Thursday 07 February 2019 16​:05​:40 pali@​cpan.org wrote​:

On Tuesday 22 January 2019 09​:38​:38 pali@​cpan.org wrote​:

On Monday 21 January 2019 15​:03​:02 Leon Timmermans via RT wrote​:

On Mon, Jan 21, 2019 at 6​:51 PM <pali@​cpan.org> wrote​:

...and apply remaining
PerlIO​::encoding patch?

Can you specify which of the patches attached to this RT is
the one still under consideration?

0002-PerlIO-encoding-Use-Encode-ONLY_PRAGMA_WARNINGS-in-f.patch

All other patches are part of Encode, and therefore applied.

Can you add a perldelta entry?

I do not know where is correct place to put entries, nor what is
correct
formatting... But entry could contain something like this​:

===
Modules PerlIO​::encoding and Encode were fixed to respect state of
utf8
pragma warnings when processing filehandle with :encoding layer.

Leon, it is enough?

PING

Do you mean the patch​:

- Encode​::PERLQQ()|Encode​::WARN_ON_ERR()|Encode​::STOP_AT_PARTIAL();
+ Encode​::PERLQQ()|Encode​::WARN_ON_ERR()|Encode​::ONLY_PRAGMA_WARNINGS()|Encode​::STOP_AT_PARTIAL();

Yes.

which fails a test​:

../ext/PerlIO-encoding/t/fallback.t (Wstat​: 256 Tests​: 10 Failed​: 1)
Failed test​: 2
Non-zero exit status​: 1

It started failing? Ah :-( IIRC it worked when I created it year ago.

I will look at it...

@p5pRT
Copy link
Author

p5pRT commented Feb 14, 2019

From @pali

On Wednesday 13 February 2019 09​:36​:31 pali@​cpan.org wrote​:

On Tuesday 12 February 2019 20​:54​:11 Tony Cook via RT wrote​:

On Tue, 12 Feb 2019 07​:46​:22 -0800, pali@​cpan.org wrote​:

On Thursday 07 February 2019 16​:05​:40 pali@​cpan.org wrote​:

On Tuesday 22 January 2019 09​:38​:38 pali@​cpan.org wrote​:

On Monday 21 January 2019 15​:03​:02 Leon Timmermans via RT wrote​:

On Mon, Jan 21, 2019 at 6​:51 PM <pali@​cpan.org> wrote​:

...and apply remaining
PerlIO​::encoding patch?

Can you specify which of the patches attached to this RT is
the one still under consideration?

0002-PerlIO-encoding-Use-Encode-ONLY_PRAGMA_WARNINGS-in-f.patch

All other patches are part of Encode, and therefore applied.

Can you add a perldelta entry?

I do not know where is correct place to put entries, nor what is
correct
formatting... But entry could contain something like this​:

===
Modules PerlIO​::encoding and Encode were fixed to respect state of
utf8
pragma warnings when processing filehandle with :encoding layer.

Leon, it is enough?

PING

Do you mean the patch​:

- Encode​::PERLQQ()|Encode​::WARN_ON_ERR()|Encode​::STOP_AT_PARTIAL();
+ Encode​::PERLQQ()|Encode​::WARN_ON_ERR()|Encode​::ONLY_PRAGMA_WARNINGS()|Encode​::STOP_AT_PARTIAL();

Yes.

which fails a test​:

../ext/PerlIO-encoding/t/fallback.t (Wstat​: 256 Tests​: 10 Failed​: 1)
Failed test​: 2
Non-zero exit status​: 1

It started failing? Ah :-( IIRC it worked when I created it year ago.

I will look at it...

Hi! Following patch should fix this problem. There is missing
"use warnings" and also "my $line" is declared more times.

Inline Patch
diff --git a/ext/PerlIO-encoding/t/fallback.t b/ext/PerlIO-encoding/t/fallback.t
index 3abdfd3f37..84ba097e71 100644
--- a/ext/PerlIO-encoding/t/fallback.t
+++ b/ext/PerlIO-encoding/t/fallback.t
@@ -16,11 +16,13 @@ BEGIN {
     import Encode qw(:fallback_all);
 }
 
+use warnings;
 use Test::More tests => 10;
 
 # $PerlIO::encoding = 0; # WARN_ON_ERR|PERLQQ;
 
 my $file = "fallback$$.txt";
+my $line;
 
 {
     my $message = '';
@@ -34,7 +36,7 @@ my $file = "fallback$$.txt";
 }
 
 open($fh,'<',$file) || die "File cannot be re-opened";
-my $line = <$fh>;
+$line = <$fh>;
 is($line,"\\x{20ac}0.02\n","perlqq escapes");
 close($fh);
 
@@ -46,7 +48,7 @@ print $fh $str,"0.02\n";
 close($fh);
 
 open($fh,'<',$file) || die "File cannot be re-opened";
-my $line = <$fh>;
+$line = <$fh>;
 is($line,"&#8364;0.02\n","HTML escapes");
 close($fh);
 
@@ -59,7 +61,7 @@ close($fh);
 }
 
 ok(open($fh,"<encoding(US-ASCII)",$file),"Opened as ASCII");
-my $line = <$fh>;
+$line = <$fh>;
 printf "# %x\n",ord($line);
 is($line,"\\xA30.02\n","Escaped non-mapped char");
 close($fh);

@p5pRT
Copy link
Author

p5pRT commented Feb 25, 2019

From @tonycoz

On Thu, 14 Feb 2019 01​:08​:33 -0800, pali@​cpan.org wrote​:

On Wednesday 13 February 2019 09​:36​:31 pali@​cpan.org wrote​:

On Tuesday 12 February 2019 20​:54​:11 Tony Cook via RT wrote​:

On Tue, 12 Feb 2019 07​:46​:22 -0800, pali@​cpan.org wrote​:

On Thursday 07 February 2019 16​:05​:40 pali@​cpan.org wrote​:

On Tuesday 22 January 2019 09​:38​:38 pali@​cpan.org wrote​:

On Monday 21 January 2019 15​:03​:02 Leon Timmermans via RT
wrote​:

On Mon, Jan 21, 2019 at 6​:51 PM <pali@​cpan.org> wrote​:

...and apply remaining
PerlIO​::encoding patch?

Can you specify which of the patches attached to this
RT is
the one still under consideration?

0002-PerlIO-encoding-Use-Encode-ONLY_PRAGMA_WARNINGS-in-
f.patch

All other patches are part of Encode, and therefore
applied.

Can you add a perldelta entry?

I do not know where is correct place to put entries, nor what
is
correct
formatting... But entry could contain something like this​:

===
Modules PerlIO​::encoding and Encode were fixed to respect
state of
utf8
pragma warnings when processing filehandle with :encoding
layer.

Leon, it is enough?

PING

Do you mean the patch​:

-
Encode​::PERLQQ()|Encode​::WARN_ON_ERR()|Encode​::STOP_AT_PARTIAL();
+
Encode​::PERLQQ()|Encode​::WARN_ON_ERR()|Encode​::ONLY_PRAGMA_WARNINGS()|Encode​::STOP_AT_PARTIAL();

Yes.

which fails a test​:

../ext/PerlIO-encoding/t/fallback.t
(Wstat​: 256 Tests​: 10 Failed​: 1)
Failed test​: 2
Non-zero exit status​: 1

It started failing? Ah :-( IIRC it worked when I created it year ago.

I will look at it...

Hi! Following patch should fix this problem. There is missing
"use warnings" and also "my $line" is declared more times.

I just added use warnings to the block that does the warning test.

Thanks, applied your change as 7d0a46b.

Tony

@p5pRT
Copy link
Author

p5pRT commented Feb 25, 2019

@tonycoz - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented May 22, 2019

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.30.0, this and 160 other issues have been
resolved.

Perl 5.30.0 may be downloaded via​:
https://metacpan.org/release/XSAWYERX/perl-5.30.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT
Copy link
Author

p5pRT commented May 22, 2019

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant