Skip Menu |
Report information
Id: 131683
Status: open
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: pali [at] cpan.org
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: (no value)

Attachments
0001-Encode-Add-new-check-flag-Encode-ONLY_PRAGMA_WARNING.patch
0002-PerlIO-encoding-Use-Encode-ONLY_PRAGMA_WARNINGS-in-f.patch
0003-Rewrite-encode-decode-encode_utf8-decode_utf8-and-fr.patch



Date: Sat, 1 Jul 2017 13:08:13 +0200
From: pali [...] cpan.org
Subject: Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback
To: perlbug [...] perl.org
Download (untitled) / with headers
text/plain 2.6k
Hi! I would continue in discussion started in last year in p5p mailing list: https://www.nntp.perl.org/group/perl.perl5.porters/2016/08/msg239061.html Currently there is big mess with reporting warnings from Encode::encode() and Encode::decode() functions implemented by Encode module which is part of Perl core modules. Moreover mess is bigger, because Encode is used by PerlIO::encoding module which is internal Perl module. In perl you can enable some utf8 warnings by pragma warning. Next for Encode::encode() and Encode::decode() functions you can enable some warnings by Encode::ENCODE_WARN_ON_ERR check flag passed as optional argument to those functions. And third you can control some warnings via $PerlIO::encoding::fallback variable which is relevant for :encoding layer (which internally use Encode module and their encode/decode functions). This lead to couple of bug reports like Encode::encode() or decode() does not respect pragma warnings, or it does not respect check flag Encode::ENCODE_WARN_ON_ERR passed as argument. Some people hacked for some modules (e.g. UTF-16) to use only pragma warnings, which basically fully broke Encode::ENCODE_WARN_ON_ERR check flag. More complicated behaviour happened when using :encoding layer... https://rt.cpan.org/Public/Bug/Display.html?id=120505 https://rt.cpan.org/Public/Bug/Display.html?id=88592 https://github.com/dankogai/p5-encode/pull/26#issuecomment-235641347 https://rt.perl.org/Public/Bug/Display.html?id=128788 https://rt.cpan.org/Public/Bug/Display.html?id=116629 https://github.com/dankogai/p5-encode/issues/59 https://github.com/dankogai/p5-encode/commit/a6c2ba385875c2c03bd42350e23aef0188fb23b0 https://github.com/dankogai/p5-encode/commit/07c8adb58e55c7cf66b3d6673bf50010fe1a69ea As stated in previous discussion I'm proposing new behaviour: * Introduce new Encode check flag Encode::ONLY_PRAGMA_WARNINGS which would tell Encode that it should report only those warnings which are currently enabled by pragma warnings. When Encode::ONLY_PRAGMA_WARNINGS is not set then Encode would report all warnings. The whole flag Encode::ONLY_PRAGMA_WARNINGS would have no effect when flag Encode::ENCODE_WARN_ON_ERR is not set. * Add Encode::ONLY_PRAGMA_WARNINGS by default to :encoding layer variable $PerlIO::encoding::fallback, so by default every action on filehandle used by :encoding layer would report warnings according to pragma warnings. As this change affects both Perl & externally maintained Encode module on cpan, I'm opening ticket for Perl. In attachments are patches implementing above proposed behaviour. Encode patch is based on the last Encode version 2.91 (not in bleed yet).

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

RT-Send-CC: perl5-porters [...] perl.org
On Sat, 01 Jul 2017 04:46:15 -0700, pali@cpan.org wrote: Show quoted text
> As stated in previous discussion I'm proposing new behaviour: > > * Introduce new Encode check flag Encode::ONLY_PRAGMA_WARNINGS which > would tell Encode that it should report only those warnings which are > currently enabled by pragma warnings. When > Encode::ONLY_PRAGMA_WARNINGS > is not set then Encode would report all warnings. The whole flag > Encode::ONLY_PRAGMA_WARNINGS would have no effect when flag > Encode::ENCODE_WARN_ON_ERR is not set. > > * Add Encode::ONLY_PRAGMA_WARNINGS by default to :encoding layer > variable $PerlIO::encoding::fallback, so by default every action on > filehandle used by :encoding layer would report warnings according to > pragma warnings. > > As this change affects both Perl & externally maintained Encode module > on cpan, I'm opening ticket for Perl. > > In attachments are patches implementing above proposed behaviour. > Encode > patch is based on the last Encode version 2.91 (not in bleed yet).
It think this is a great improvement. Thank you for all your work. -- Father Chrysostomos
Subject: Re: [perl #131683] Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback
To: perlbug-followup [...] perl.org
Date: Fri, 14 Jul 2017 09:47:08 +0200
From: pali [...] cpan.org
Download (untitled) / with headers
text/plain 1.2k
On Saturday 01 July 2017 06:08:08 Father Chrysostomos via RT wrote: Show quoted text
> On Sat, 01 Jul 2017 04:46:15 -0700, pali@cpan.org wrote:
> > As stated in previous discussion I'm proposing new behaviour: > > > > * Introduce new Encode check flag Encode::ONLY_PRAGMA_WARNINGS which > > would tell Encode that it should report only those warnings which are > > currently enabled by pragma warnings. When > > Encode::ONLY_PRAGMA_WARNINGS > > is not set then Encode would report all warnings. The whole flag > > Encode::ONLY_PRAGMA_WARNINGS would have no effect when flag > > Encode::ENCODE_WARN_ON_ERR is not set. > > > > * Add Encode::ONLY_PRAGMA_WARNINGS by default to :encoding layer > > variable $PerlIO::encoding::fallback, so by default every action on > > filehandle used by :encoding layer would report warnings according to > > pragma warnings. > > > > As this change affects both Perl & externally maintained Encode module > > on cpan, I'm opening ticket for Perl. > > > > In attachments are patches implementing above proposed behaviour. > > Encode > > patch is based on the last Encode version 2.91 (not in bleed yet).
> > It think this is a great improvement. Thank you for all your work.
Hi! Any comments for this new proposed behavior or implementation in patches? I would like to hear some feedback...
Date: Wed, 9 Aug 2017 14:02:58 +0200
To: perlbug-followup [...] perl.org
Subject: Re: [perl #131683] Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback
From: pali [...] cpan.org
Download (untitled) / with headers
text/plain 1.3k
On Saturday 01 July 2017 06:08:08 Father Chrysostomos via RT wrote: Show quoted text
> On Sat, 01 Jul 2017 04:46:15 -0700, pali@cpan.org wrote:
> > As stated in previous discussion I'm proposing new behaviour: > > > > * Introduce new Encode check flag Encode::ONLY_PRAGMA_WARNINGS which > > would tell Encode that it should report only those warnings which are > > currently enabled by pragma warnings. When > > Encode::ONLY_PRAGMA_WARNINGS > > is not set then Encode would report all warnings. The whole flag > > Encode::ONLY_PRAGMA_WARNINGS would have no effect when flag > > Encode::ENCODE_WARN_ON_ERR is not set. > > > > * Add Encode::ONLY_PRAGMA_WARNINGS by default to :encoding layer > > variable $PerlIO::encoding::fallback, so by default every action on > > filehandle used by :encoding layer would report warnings according to > > pragma warnings. > > > > As this change affects both Perl & externally maintained Encode module > > on cpan, I'm opening ticket for Perl. > > > > In attachments are patches implementing above proposed behaviour. > > Encode > > patch is based on the last Encode version 2.91 (not in bleed yet).
> > It think this is a great improvement. Thank you for all your work.
@Father Chrysostomos: Now patches and there fore more then month. Any comments for either new behavior or its implementation? I would like to hear your opinion... or if it is OK without any problem could it be merged?
CC: "bugs-bitbucket [...] rt.perl.org" <bugs-bitbucket [...] rt.perl.org>
Date: Thu, 10 Aug 2017 16:57:55 +0200
To: Perl5 Porters <perl5-porters [...] perl.org>
From: Leon Timmermans <fawaka [...] gmail.com>
Subject: Re: [perl #131683] Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback
Download (untitled) / with headers
text/plain 3.1k
On Sat, Jul 1, 2017 at 1:46 PM, via RT <perlbug-followup@perl.org> wrote:
Show quoted text
Hi!

I would continue in discussion started in last year in p5p mailing list:
https://www.nntp.perl.org/group/perl.perl5.porters/2016/08/msg239061.html

Currently there is big mess with reporting warnings from
Encode::encode() and Encode::decode() functions implemented by Encode
module which is part of Perl core modules.

Moreover mess is bigger, because Encode is used by PerlIO::encoding
module which is internal Perl module.

In perl you can enable some utf8 warnings by pragma warning. Next for
Encode::encode() and Encode::decode() functions you can enable some
warnings by Encode::ENCODE_WARN_ON_ERR check flag passed as optional
argument to those functions. And third you can control some warnings via
$PerlIO::encoding::fallback variable which is relevant for :encoding
layer (which internally use Encode module and their encode/decode
functions).

This lead to couple of bug reports like Encode::encode() or decode()
does not respect pragma warnings, or it does not respect check flag
Encode::ENCODE_WARN_ON_ERR passed as argument. Some people hacked for
some modules (e.g. UTF-16) to use only pragma warnings, which basically
fully broke Encode::ENCODE_WARN_ON_ERR check flag. More complicated
behaviour happened when using :encoding layer...

https://rt.cpan.org/Public/Bug/Display.html?id=120505
https://rt.cpan.org/Public/Bug/Display.html?id=88592
https://github.com/dankogai/p5-encode/pull/26#issuecomment-235641347
https://rt.perl.org/Public/Bug/Display.html?id=128788
https://rt.cpan.org/Public/Bug/Display.html?id=116629
https://github.com/dankogai/p5-encode/issues/59
https://github.com/dankogai/p5-encode/commit/a6c2ba385875c2c03bd42350e23aef0188fb23b0
https://github.com/dankogai/p5-encode/commit/07c8adb58e55c7cf66b3d6673bf50010fe1a69ea

As stated in previous discussion I'm proposing new behaviour:

* Introduce new Encode check flag Encode::ONLY_PRAGMA_WARNINGS which
would tell Encode that it should report only those warnings which are
currently enabled by pragma warnings. When Encode::ONLY_PRAGMA_WARNINGS
is not set then Encode would report all warnings. The whole flag
Encode::ONLY_PRAGMA_WARNINGS would have no effect when flag
Encode::ENCODE_WARN_ON_ERR is not set.

* Add Encode::ONLY_PRAGMA_WARNINGS by default to :encoding layer
variable $PerlIO::encoding::fallback, so by default every action on
filehandle used by :encoding layer would report warnings according to
pragma warnings.

As this change affects both Perl & externally maintained Encode module
on cpan, I'm opening ticket for Perl.

In attachments are patches implementing above proposed behaviour. Encode
patch is based on the last Encode version 2.91 (not in bleed yet).

IMHO $PerlIO::encoding::fallback is a liability, and instead of digging a deeper hole we should probably fix it first. In particular:

  local $PerlIO::encoding::fallback = FB_CROAK;
  open my $fh, '<:encoding(utf-8)', $filename

May or may not DWIM depending on whether PerlIO::encoding was already loaded or not. Relying even further on it having a non-zero value doesn't feel wise to me.

Leon
From: pali [...] cpan.org
Subject: Re: [perl #131683] Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback
To: perlbug-followup [...] perl.org
Date: Mon, 14 Aug 2017 10:19:58 +0200
Download (untitled) / with headers
text/plain 3.6k
On Thursday 10 August 2017 07:58:22 Leon Timmermans via RT wrote: Show quoted text
> On Sat, Jul 1, 2017 at 1:46 PM, via RT <perlbug-followup@perl.org> wrote: >
> > Hi! > > > > I would continue in discussion started in last year in p5p mailing list: > > https://www.nntp.perl.org/group/perl.perl5.porters/2016/08/msg239061.html > > > > Currently there is big mess with reporting warnings from > > Encode::encode() and Encode::decode() functions implemented by Encode > > module which is part of Perl core modules. > > > > Moreover mess is bigger, because Encode is used by PerlIO::encoding > > module which is internal Perl module. > > > > In perl you can enable some utf8 warnings by pragma warning. Next for > > Encode::encode() and Encode::decode() functions you can enable some > > warnings by Encode::ENCODE_WARN_ON_ERR check flag passed as optional > > argument to those functions. And third you can control some warnings via > > $PerlIO::encoding::fallback variable which is relevant for :encoding > > layer (which internally use Encode module and their encode/decode > > functions). > > > > This lead to couple of bug reports like Encode::encode() or decode() > > does not respect pragma warnings, or it does not respect check flag > > Encode::ENCODE_WARN_ON_ERR passed as argument. Some people hacked for > > some modules (e.g. UTF-16) to use only pragma warnings, which basically > > fully broke Encode::ENCODE_WARN_ON_ERR check flag. More complicated > > behaviour happened when using :encoding layer... > > > > https://rt.cpan.org/Public/Bug/Display.html?id=120505 > > https://rt.cpan.org/Public/Bug/Display.html?id=88592 > > https://github.com/dankogai/p5-encode/pull/26#issuecomment-235641347 > > https://rt.perl.org/Public/Bug/Display.html?id=128788 > > https://rt.cpan.org/Public/Bug/Display.html?id=116629 > > https://github.com/dankogai/p5-encode/issues/59 > > https://github.com/dankogai/p5-encode/commit/ > > a6c2ba385875c2c03bd42350e23aef0188fb23b0 > > https://github.com/dankogai/p5-encode/commit/ > > 07c8adb58e55c7cf66b3d6673bf50010fe1a69ea > > > > As stated in previous discussion I'm proposing new behaviour: > > > > * Introduce new Encode check flag Encode::ONLY_PRAGMA_WARNINGS which > > would tell Encode that it should report only those warnings which are > > currently enabled by pragma warnings. When Encode::ONLY_PRAGMA_WARNINGS > > is not set then Encode would report all warnings. The whole flag > > Encode::ONLY_PRAGMA_WARNINGS would have no effect when flag > > Encode::ENCODE_WARN_ON_ERR is not set. > > > > * Add Encode::ONLY_PRAGMA_WARNINGS by default to :encoding layer > > variable $PerlIO::encoding::fallback, so by default every action on > > filehandle used by :encoding layer would report warnings according to > > pragma warnings. > > > > As this change affects both Perl & externally maintained Encode module > > on cpan, I'm opening ticket for Perl. > > > > In attachments are patches implementing above proposed behaviour. Encode > > patch is based on the last Encode version 2.91 (not in bleed yet). > >
> > IMHO $PerlIO::encoding::fallback is a liability, and instead of digging a > deeper hole we should probably fix it first. In particular: > > local $PerlIO::encoding::fallback = FB_CROAK; > open my $fh, '<:encoding(utf-8)', $filename
IMO this is different problem, not related to my proposed patches, which are about how to handle warnings. Show quoted text
> May or may not DWIM depending on whether PerlIO::encoding was already > loaded or not. Relying even further on it having a non-zero value doesn't > feel wise to me.
I agree that above :encoding layer without STOP AT PARTIAL flag in fallback should be fixed, but I think all those can be done independently of this effort about warnings done in this ticket.
Subject: Re: [perl #131683] Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback
From: pali [...] cpan.org
Date: Mon, 21 Aug 2017 15:24:41 +0200
To: perlbug-followup [...] perl.org
Download (untitled) / with headers
text/plain 4.2k
On Monday 14 August 2017 10:19:58 pali@cpan.org wrote: Show quoted text
> On Thursday 10 August 2017 07:58:22 Leon Timmermans via RT wrote:
> > On Sat, Jul 1, 2017 at 1:46 PM, via RT <perlbug-followup@perl.org> wrote: > >
> > > Hi! > > > > > > I would continue in discussion started in last year in p5p mailing list: > > > https://www.nntp.perl.org/group/perl.perl5.porters/2016/08/msg239061.html > > > > > > Currently there is big mess with reporting warnings from > > > Encode::encode() and Encode::decode() functions implemented by Encode > > > module which is part of Perl core modules. > > > > > > Moreover mess is bigger, because Encode is used by PerlIO::encoding > > > module which is internal Perl module. > > > > > > In perl you can enable some utf8 warnings by pragma warning. Next for > > > Encode::encode() and Encode::decode() functions you can enable some > > > warnings by Encode::ENCODE_WARN_ON_ERR check flag passed as optional > > > argument to those functions. And third you can control some warnings via > > > $PerlIO::encoding::fallback variable which is relevant for :encoding > > > layer (which internally use Encode module and their encode/decode > > > functions). > > > > > > This lead to couple of bug reports like Encode::encode() or decode() > > > does not respect pragma warnings, or it does not respect check flag > > > Encode::ENCODE_WARN_ON_ERR passed as argument. Some people hacked for > > > some modules (e.g. UTF-16) to use only pragma warnings, which basically > > > fully broke Encode::ENCODE_WARN_ON_ERR check flag. More complicated > > > behaviour happened when using :encoding layer... > > > > > > https://rt.cpan.org/Public/Bug/Display.html?id=120505 > > > https://rt.cpan.org/Public/Bug/Display.html?id=88592 > > > https://github.com/dankogai/p5-encode/pull/26#issuecomment-235641347 > > > https://rt.perl.org/Public/Bug/Display.html?id=128788 > > > https://rt.cpan.org/Public/Bug/Display.html?id=116629 > > > https://github.com/dankogai/p5-encode/issues/59 > > > https://github.com/dankogai/p5-encode/commit/ > > > a6c2ba385875c2c03bd42350e23aef0188fb23b0 > > > https://github.com/dankogai/p5-encode/commit/ > > > 07c8adb58e55c7cf66b3d6673bf50010fe1a69ea > > > > > > As stated in previous discussion I'm proposing new behaviour: > > > > > > * Introduce new Encode check flag Encode::ONLY_PRAGMA_WARNINGS which > > > would tell Encode that it should report only those warnings which are > > > currently enabled by pragma warnings. When Encode::ONLY_PRAGMA_WARNINGS > > > is not set then Encode would report all warnings. The whole flag > > > Encode::ONLY_PRAGMA_WARNINGS would have no effect when flag > > > Encode::ENCODE_WARN_ON_ERR is not set. > > > > > > * Add Encode::ONLY_PRAGMA_WARNINGS by default to :encoding layer > > > variable $PerlIO::encoding::fallback, so by default every action on > > > filehandle used by :encoding layer would report warnings according to > > > pragma warnings. > > > > > > As this change affects both Perl & externally maintained Encode module > > > on cpan, I'm opening ticket for Perl. > > > > > > In attachments are patches implementing above proposed behaviour. Encode > > > patch is based on the last Encode version 2.91 (not in bleed yet). > > >
> > > > IMHO $PerlIO::encoding::fallback is a liability, and instead of digging a > > deeper hole we should probably fix it first. In particular: > > > > local $PerlIO::encoding::fallback = FB_CROAK; > > open my $fh, '<:encoding(utf-8)', $filename
> > IMO this is different problem, not related to my proposed patches, which > are about how to handle warnings. >
> > May or may not DWIM depending on whether PerlIO::encoding was already > > loaded or not. Relying even further on it having a non-zero value doesn't > > feel wise to me.
> > I agree that above :encoding layer without STOP AT PARTIAL flag in > fallback should be fixed, but I think all those can be done > independently of this effort about warnings done in this ticket.
And the goal of those changes is how Encode handle warnings. PerlIO is related just because $PerlIO::encoding::fallback is affected by Encode changes. I would like to move forward and would like to hear if those Encode changes together with extending Encode flags and default value for $PerlIO::encoding::fallback are OK, or if changes needs to be reworked ... or if whole idea for fixing those problems is wrong.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 2.5k
On Mon, 21 Aug 2017 06:25:01 -0700, pali@cpan.org wrote: Show quoted text
> And the goal of those changes is how Encode handle warnings. PerlIO is > related just because $PerlIO::encoding::fallback is affected by Encode > changes. > > I would like to move forward and would like to hear if those Encode > changes together with extending Encode flags and default value for > $PerlIO::encoding::fallback are OK, or if changes needs to be reworked > ... or if whole idea for fixing those problems is wrong.
Most of this could be fixed by PerlIO::encoding being a bit smarter with the check value - only setting WARN_ON_ERR when ckWARN(WARN_UTF8) is true. The only issue would be the utf8 subcategory warnings, like for surrogates, which your Encode patch goes to a lot of effort to pass through. But a lot of that effort is wasted, for example: @@ -407,23 +439,29 @@ CODE: } while (s < e && s+UTF8SKIP(s) <= e) { STRLEN len; - UV ord = utf8n_to_uvuni(s, e-s, &len, (UTF8_DISALLOW_SURROGATE - |UTF8_WARN_SURROGATE - |UTF8_DISALLOW_FE_FF - |UTF8_WARN_FE_FF - |UTF8_WARN_NONCHAR)); - s += len; - if (size != 4 && invalid_ucs2(ord)) { + U32 flags = UTF8_DISALLOW_ILLEGAL_INTERCHANGE; + if (encode_ckWARN(check, WARN_NON_UNICODE)) flags |= UTF8_WARN_SUPER; + if (encode_ckWARN(check, WARN_SURROGATE)) flags |= UTF8_WARN_SURROGATE; + if (encode_ckWARN(check, WARN_NONCHAR)) flags |= UTF8_WARN_NONCHAR; + UV ord = utf8n_to_uvuni(s, e-s, &len, flags); + if ((size != 4 && invalid_ucs2(ord)) || (ord == 0 && *s != 0)) { utf8n_to_uvuni() will only warns if those warnings are lexically enabled, so here you're adding extra checks for each category that aren't needed. The same is true for the calls to uvuni_to_utf8_flags(). In another case you're adding a completely new warning: + if (encode_ckWARN(check, WARN_NONCHAR)) { + warner(packWARN(WARN_NONCHAR), + "%" SVf ":Unicode character %" UVxf " is illegal", + *hv_fetch((HV *)SvRV(obj),"Name",4,0), + ord); + } + ord = FBCHAR; } which could probably just be made lexically scoped whether the new flag is set or not, since some of the others will be made so due to the changes to decode() and encode(). Of course, that change might be considered a backward incompatibility, since some warnings that were previously produced (because Encode does C<use warnings;> might no longer be (since the new scope might not.) Tony
To: perlbug-followup [...] perl.org
Date: Mon, 28 Aug 2017 09:50:31 +0200
Subject: Re: [perl #131683] Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback
From: pali [...] cpan.org
Download (untitled) / with headers
text/plain 3.7k
On Sunday 27 August 2017 18:38:15 Tony Cook via RT wrote: Show quoted text
> On Mon, 21 Aug 2017 06:25:01 -0700, pali@cpan.org wrote:
> > And the goal of those changes is how Encode handle warnings. PerlIO is > > related just because $PerlIO::encoding::fallback is affected by Encode > > changes. > > > > I would like to move forward and would like to hear if those Encode > > changes together with extending Encode flags and default value for > > $PerlIO::encoding::fallback are OK, or if changes needs to be reworked > > ... or if whole idea for fixing those problems is wrong.
> > Most of this could be fixed by PerlIO::encoding being a bit smarter with the check value - only setting WARN_ON_ERR when ckWARN(WARN_UTF8) is true.
Seems yes. Show quoted text
> The only issue would be the utf8 subcategory warnings, like for surrogates, which your Encode patch goes to a lot of effort to pass through.
Right, with my approach for Encode::ONLY_PRAGMA_WARNINGS, warnings would be enabled/disabled according to warnings pragma. Like in other parts of perl. Show quoted text
> But a lot of that effort is wasted, for example: > > @@ -407,23 +439,29 @@ CODE: > } > while (s < e && s+UTF8SKIP(s) <= e) { > STRLEN len; > - UV ord = utf8n_to_uvuni(s, e-s, &len, (UTF8_DISALLOW_SURROGATE > - |UTF8_WARN_SURROGATE > - |UTF8_DISALLOW_FE_FF > - |UTF8_WARN_FE_FF > - |UTF8_WARN_NONCHAR)); > - s += len; > - if (size != 4 && invalid_ucs2(ord)) { > + U32 flags = UTF8_DISALLOW_ILLEGAL_INTERCHANGE; > + if (encode_ckWARN(check, WARN_NON_UNICODE)) flags |= UTF8_WARN_SUPER; > + if (encode_ckWARN(check, WARN_SURROGATE)) flags |= UTF8_WARN_SURROGATE; > + if (encode_ckWARN(check, WARN_NONCHAR)) flags |= UTF8_WARN_NONCHAR; > + UV ord = utf8n_to_uvuni(s, e-s, &len, flags); > + if ((size != 4 && invalid_ucs2(ord)) || (ord == 0 && *s != 0)) { > > utf8n_to_uvuni() will only warns if those warnings are lexically enabled, so here you're adding extra checks for each category that aren't needed. > > The same is true for the calls to uvuni_to_utf8_flags(). > > In another case you're adding a completely new warning: > > + if (encode_ckWARN(check, WARN_NONCHAR)) { > + warner(packWARN(WARN_NONCHAR), > + "%" SVf ":Unicode character %" UVxf " is illegal", > + *hv_fetch((HV *)SvRV(obj),"Name",4,0), > + ord); > + } > + ord = FBCHAR; > } > > which could probably just be made lexically scoped whether the new flag is set or not, since some of the others will be made so due to the changes to decode() and encode().
All above warning should be sent when user calls Encode with FB_WARN bit, independently of lexical warnings. This is how all other Encode module works, also in this way is Encode API designed and documented. Show quoted text
> Of course, that change might be considered a backward incompatibility, since some warnings that were previously produced (because Encode does C<use warnings;> might no longer be (since the new scope might not.)
Fixing this bug could be mean as backward incompatible. But current behaviour does not make sense as it is not possible to correctly enable (or disable) warnings for Encode module. Basically above code which you quote is implementation of UTF-16 encoding. Code for UTF-8 or Latin-X is in different module. So there is a big inconsistency between UTF-8 and UTF-16 and so it is hard to use. Common operation is: I got binary data from 3rd module (or network) and it was told me that those data are UTF-16 encoded. I do not know if they are really UTF-16 encoded, so I would run Encode module to silently (without warnings) decodes them. If I need to be sure, I can run Encode with FB_CROAK and handle exceptions...
RT-Send-CC: perl5-porters [...] perl.org
From a perl build point of view there's a couple of issues: a) the MANIFEST is missing the new file and still contains the deleted file. b) there's two compilation errors in the build: Unicode.xs: In function ‘XS_Encode__Unicode_decode’: Unicode.xs:342:2: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] U32 flags = 0; ^ Unicode.xs: In function ‘XS_Encode__Unicode_encode’: Unicode.xs:446:2: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] UV ord = utf8n_to_uvuni(s, e-s, &len, flags); ^ On Mon, 28 Aug 2017 00:50:49 -0700, pali@cpan.org wrote: Show quoted text
> All above warning should be sent when user calls Encode with FB_WARN > bit, independently of lexical warnings. This is how all other Encode > module works, also in this way is Encode API designed and documented.
The problem is your patch makes these warnings lexically scoped with *only* FB_WARN: $ ./perl -Ilib -MEncode -le 'no warnings qw(surrogate); my $x = encode("utf-16", (my $y = chr(0xdc10)), &Encode::FB_WARN); print "no croak"' no croak $ ./perl -Ilib -MEncode -le 'my $x = encode("utf-16", (my $y = chr(0xdc10)), &Encode::FB_WARN); print "no croak"' UTF-16 surrogate U+DC10 in goto at -e line 1. no croak The utf8n_to_uvuni() function *already* checks that these warnings are lexically enabled. Also, it only checks them when it actually finds the problem being reported. Your code however calculates flags whether or not there's an error, and currently does it on every character encoded. When ONLY_PRAGMA_WARNINGS is set this results in three calls to Perl_ckwarn() inside the encoding loop. (This code should be using ckWARN_d() since these are default on warnings, but it should just set them all and let the API do the checking it does anyway.) An issue with changes to Encode::encode() as you might see above is the opaqueness of the warning: UTF-16 surrogate U+DC10 in goto at -e line 1. My code doesn't contain a goto. I think the only real fix to this would be to rewrite Encode::encode/decode in C. Tony
Subject: Re: [perl #131683] Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback
From: pali [...] cpan.org
Date: Wed, 30 Aug 2017 11:20:09 +0200
To: perlbug-followup [...] perl.org
Download (untitled) / with headers
text/plain 2.8k
On Tuesday 29 August 2017 18:46:43 Tony Cook via RT wrote: Show quoted text
> From a perl build point of view there's a couple of issues: > > a) the MANIFEST is missing the new file and still contains the deleted file.
That is something which would be fixed when I start preparing pull request to upstream Encode. Show quoted text
> b) there's two compilation errors in the build: > > Unicode.xs: In function ‘XS_Encode__Unicode_decode’: > Unicode.xs:342:2: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] > U32 flags = 0; > ^ > Unicode.xs: In function ‘XS_Encode__Unicode_encode’: > Unicode.xs:446:2: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] > UV ord = utf8n_to_uvuni(s, e-s, &len, flags); > ^
Ok, that can be fixed. Show quoted text
> On Mon, 28 Aug 2017 00:50:49 -0700, pali@cpan.org wrote:
> > All above warning should be sent when user calls Encode with FB_WARN > > bit, independently of lexical warnings. This is how all other Encode > > module works, also in this way is Encode API designed and documented.
> > The problem is your patch makes these warnings lexically scoped with *only* FB_WARN: > > $ ./perl -Ilib -MEncode -le 'no warnings qw(surrogate); my $x = encode("utf-16", (my $y = chr(0xdc10)), &Encode::FB_WARN); print "no croak"' > no croak > > $ ./perl -Ilib -MEncode -le 'my $x = encode("utf-16", (my $y = chr(0xdc10)), &Encode::FB_WARN); print "no croak"' > UTF-16 surrogate U+DC10 in goto at -e line 1. > no croak > > The utf8n_to_uvuni() function *already* checks that these warnings are lexically enabled. Also, it only checks them when it actually finds the problem being reported.
Ah :-( So we need utf8n_to_uvuni() function which report warnings even when they are lexically disabled. Show quoted text
> Your code however calculates flags whether or not there's an error, and currently does it on every character encoded. When ONLY_PRAGMA_WARNINGS is set this results in three calls to Perl_ckwarn() inside the encoding loop. (This code should be using ckWARN_d() since these are default on warnings, but it should just set them all and let the API do the checking it does anyway.) > > An issue with changes to Encode::encode() as you might see above is the opaqueness of the warning: > > UTF-16 surrogate U+DC10 in goto at -e line 1.
Yes, this is something which needs to be fixed. Show quoted text
> My code doesn't contain a goto. I think the only real fix to this would be to rewrite Encode::encode/decode in C.
Do you have your implementation? At least I do not see any other patch in this ticket. I used goto in Encode::encode/decode dispatcher functions, so warnings from modules would be correctly propagated to caller of the Encode::encode/decode function. It is possible to reimplement Encode::encode/decode function in C/XS in same way how it is implemented in my patch, but without that warning from goto?
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 3.4k
On Wed, 30 Aug 2017 02:22:26 -0700, pali@cpan.org wrote: Show quoted text
> On Tuesday 29 August 2017 18:46:43 Tony Cook via RT wrote:
> > b) there's two compilation errors in the build: > > > > Unicode.xs: In function ‘XS_Encode__Unicode_decode’: > > Unicode.xs:342:2: error: ISO C90 forbids mixed declarations and code > > [-Werror=declaration-after-statement] > > U32 flags = 0; > > ^ > > Unicode.xs: In function ‘XS_Encode__Unicode_encode’: > > Unicode.xs:446:2: error: ISO C90 forbids mixed declarations and code > > [-Werror=declaration-after-statement] > > UV ord = utf8n_to_uvuni(s, e-s, &len, flags); > > ^
> > Ok, that can be fixed. >
> > On Mon, 28 Aug 2017 00:50:49 -0700, pali@cpan.org wrote:
> > > All above warning should be sent when user calls Encode with > > > FB_WARN > > > bit, independently of lexical warnings. This is how all other > > > Encode > > > module works, also in this way is Encode API designed and > > > documented.
> > > > The problem is your patch makes these warnings lexically scoped with > > *only* FB_WARN: > > > > $ ./perl -Ilib -MEncode -le 'no warnings qw(surrogate); my $x = > > encode("utf-16", (my $y = chr(0xdc10)), &Encode::FB_WARN); print "no > > croak"' > > no croak > > > > $ ./perl -Ilib -MEncode -le 'my $x = encode("utf-16", (my $y = > > chr(0xdc10)), &Encode::FB_WARN); print "no croak"' > > UTF-16 surrogate U+DC10 in goto at -e line 1. > > no croak > > > > The utf8n_to_uvuni() function *already* checks that these warnings > > are lexically enabled. Also, it only checks them when it actually > > finds the problem being reported.
> > Ah :-( So we need utf8n_to_uvuni() function which report warnings even > when they are lexically disabled.
Show quoted text
> > Your code however calculates flags whether or not there's an error, > > and currently does it on every character encoded. When > > ONLY_PRAGMA_WARNINGS is set this results in three calls to > > Perl_ckwarn() inside the encoding loop. (This code should be using > > ckWARN_d() since these are default on warnings, but it should just > > set them all and let the API do the checking it does anyway.) > > > > An issue with changes to Encode::encode() as you might see above is > > the opaqueness of the warning: > > > > UTF-16 surrogate U+DC10 in goto at -e line 1.
> > Yes, this is something which needs to be fixed. >
> > My code doesn't contain a goto. I think the only real fix to this > > would be to rewrite Encode::encode/decode in C.
> > Do you have your implementation? At least I do not see any other patch > in this ticket.
The "My code" I was referring to is the one-liner above. I meant it would confuse the user whose code doesn't include a goto (just a call to encode/decode()). Show quoted text
> > I used goto in Encode::encode/decode dispatcher functions, so warnings > from modules would be correctly propagated to caller of the > Encode::encode/decode function.
I guessed that, but the existing encode/decode functions already has code that handles that: ... my $octets; if ( ref($enc) eq 'Encode::Unicode' ) { my $warn = ''; { local $SIG{__WARN__} = sub { $warn = shift }; $octets = $enc->encode( $string, $check ); } warnings::warnif('utf8', $warn) if length $warn; } ... Show quoted text
> It is possible to reimplement Encode::encode/decode function in C/XS > in > same way how it is implemented in my patch, but without that warning > from goto?
The most complex part would probably be the call to find_encoding(), the rest I think is relatively simple. Tony
To: perlbug-followup [...] perl.org
Date: Mon, 11 Sep 2017 09:24:12 +0200
Subject: Re: [perl #131683] Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback
From: pali [...] cpan.org
Download (untitled) / with headers
text/plain 1.7k
On Sunday 10 September 2017 17:01:40 Tony Cook via RT wrote: Show quoted text
> The "My code" I was referring to is the one-liner above. I meant it would confuse the user whose code doesn't include a goto (just a call to encode/decode()).
So, you are referring to that unexpected warning with "goto" in its message, right? Therefore I asked if it is possible to reimplement that function in XS without possibility that such warning would be thrown. Show quoted text
> > I used goto in Encode::encode/decode dispatcher functions, so warnings > > from modules would be correctly propagated to caller of the > > Encode::encode/decode function.
> > I guessed that, but the existing encode/decode functions already has code that handles that: > > ... > my $octets; > if ( ref($enc) eq 'Encode::Unicode' ) { > my $warn = ''; > { > local $SIG{__WARN__} = sub { $warn = shift }; > $octets = $enc->encode( $string, $check ); > } > warnings::warnif('utf8', $warn) if length $warn; > } > ...
Yes, but that code is wrong for more reasons. E.g. https://rt.cpan.org/Public/Bug/Display.html?id=120505 Also because it catch only 'utf8' warnings and also because it check only UTF-16/UTF-32 encodings, not UTF-8 or some Latin-X. My approach try to use goto to run encode function in current context, so warning/error messages would be properly propagated to caller. And introduce Encode::ENCODE_WARN_ON_ERR, so above bug 120505 can be fixed too. Show quoted text
> > It is possible to reimplement Encode::encode/decode function in C/XS > > in > > same way how it is implemented in my patch, but without that warning > > from goto?
> > The most complex part would probably be the call to find_encoding(), the rest I think is relatively simple.
Can you show me simple example how to write that "goto" part?
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 2.4k
On Mon, 11 Sep 2017 00:24:32 -0700, pali@cpan.org wrote: Show quoted text
> On Sunday 10 September 2017 17:01:40 Tony Cook via RT wrote:
> > The "My code" I was referring to is the one-liner above. I meant it > > would confuse the user whose code doesn't include a goto (just a call > > to encode/decode()).
> > So, you are referring to that unexpected warning with "goto" in its > message, right? Therefore I asked if it is possible to reimplement > that > function in XS without possibility that such warning would be thrown.
The warning would only be thrown if the code was called where that warning is enabled. Show quoted text
> > > I used goto in Encode::encode/decode dispatcher functions, so > > > warnings > > > from modules would be correctly propagated to caller of the > > > Encode::encode/decode function.
> > > > I guessed that, but the existing encode/decode functions already has > > code that handles that: > > > > ... > > my $octets; > > if ( ref($enc) eq 'Encode::Unicode' ) { > > my $warn = ''; > > { > > local $SIG{__WARN__} = sub { $warn = shift }; > > $octets = $enc->encode( $string, $check ); > > } > > warnings::warnif('utf8', $warn) if length $warn; > > } > > ...
> > Yes, but that code is wrong for more reasons. > > E.g. https://rt.cpan.org/Public/Bug/Display.html?id=120505 > Also because it catch only 'utf8' warnings and also because it check > only UTF-16/UTF-32 encodings, not UTF-8 or some Latin-X. > > My approach try to use goto to run encode function in current context, > so warning/error messages would be properly propagated to caller.
Yes, that would no longer be necessary if the functions were written as XS. Show quoted text
> And introduce Encode::ENCODE_WARN_ON_ERR, so above bug 120505 can be > fixed too.
Did you mean ONLY_PRAGMA_WARNINGS here? One problem with implementing this as XS is the same problem as the goto solution - the call to utf8n_to_uvuni() would only produce warnings if the current context has utf8 warnings enabled, regardless of ONLY_PRAGMA_WARNINGS. Show quoted text
> > > It is possible to reimplement Encode::encode/decode function in > > > C/XS > > > in > > > same way how it is implemented in my patch, but without that > > > warning > > > from goto?
> > > > The most complex part would probably be the call to find_encoding(), > > the rest I think is relatively simple.
> > Can you show me simple example how to write that "goto" part?
There's no goto involved, the XS simply doesn't have a new context to control the warnings, so the caller's warnings flags are what matter. Tony
Date: Mon, 11 Sep 2017 19:24:25 -0600
To: perlbug-followup [...] perl.org
CC: perl5-porters [...] perl.org
From: Karl Williamson <public [...] khwilliamson.com>
Subject: Re: [perl #131683] Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback
Download (untitled) / with headers
text/plain 841b
On 09/11/2017 06:13 PM, Tony Cook via RT wrote: Show quoted text
> One problem with implementing this as XS is the same problem as the goto solution - the call to utf8n_to_uvuni() would only produce warnings if the current context has utf8 warnings enabled, regardless of ONLY_PRAGMA_WARNINGS.
Note that there is a new API in 5.26 which allows you to get a bit field returned with a bit set for every error encountered in the input character. As it says in the docs: To do your own error handling, call this function with the C<UTF8_CHECK_ONLY> flag to suppress any warnings, and then examine the C<*errors> return. UV utf8n_to_uvchr_error(const U8 *s, STRLEN curlen, STRLEN *retlen, const U32 flags, U32 * errors) If we get Devel::PPPort maintained, I had hoped to put it there.
To: perlbug-followup [...] perl.org
From: pali [...] cpan.org
Subject: Re: [perl #131683] Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback
Date: Tue, 12 Sep 2017 09:31:45 +0200
Download (untitled) / with headers
text/plain 2.8k
On Monday 11 September 2017 17:13:17 Tony Cook via RT wrote: Show quoted text
> On Mon, 11 Sep 2017 00:24:32 -0700, pali@cpan.org wrote:
> > On Sunday 10 September 2017 17:01:40 Tony Cook via RT wrote:
> > > The "My code" I was referring to is the one-liner above. I meant it > > > would confuse the user whose code doesn't include a goto (just a call > > > to encode/decode()).
> > > > So, you are referring to that unexpected warning with "goto" in its > > message, right? Therefore I asked if it is possible to reimplement > > that > > function in XS without possibility that such warning would be thrown.
> > The warning would only be thrown if the code was called where that warning is enabled. >
> > > > I used goto in Encode::encode/decode dispatcher functions, so > > > > warnings > > > > from modules would be correctly propagated to caller of the > > > > Encode::encode/decode function.
> > > > > > I guessed that, but the existing encode/decode functions already has > > > code that handles that: > > > > > > ... > > > my $octets; > > > if ( ref($enc) eq 'Encode::Unicode' ) { > > > my $warn = ''; > > > { > > > local $SIG{__WARN__} = sub { $warn = shift }; > > > $octets = $enc->encode( $string, $check ); > > > } > > > warnings::warnif('utf8', $warn) if length $warn; > > > } > > > ...
> > > > Yes, but that code is wrong for more reasons. > > > > E.g. https://rt.cpan.org/Public/Bug/Display.html?id=120505 > > Also because it catch only 'utf8' warnings and also because it check > > only UTF-16/UTF-32 encodings, not UTF-8 or some Latin-X. > > > > My approach try to use goto to run encode function in current context, > > so warning/error messages would be properly propagated to caller.
> > Yes, that would no longer be necessary if the functions were written as XS.
Exactly. Show quoted text
> > And introduce Encode::ENCODE_WARN_ON_ERR, so above bug 120505 can be > > fixed too.
> > Did you mean ONLY_PRAGMA_WARNINGS here?
Right, I did copy+paste error. Show quoted text
> One problem with implementing this as XS is the same problem as the goto solution - the call to utf8n_to_uvuni() would only produce warnings if the current context has utf8 warnings enabled, regardless of ONLY_PRAGMA_WARNINGS.
In XS, context warnings could be temporary changed. Show quoted text
> > > > It is possible to reimplement Encode::encode/decode function in > > > > C/XS > > > > in > > > > same way how it is implemented in my patch, but without that > > > > warning > > > > from goto?
> > > > > > The most complex part would probably be the call to find_encoding(), > > > the rest I think is relatively simple.
> > > > Can you show me simple example how to write that "goto" part?
> > There's no goto involved, the XS simply doesn't have a new context to control the warnings, so the caller's warnings flags are what matter.
Ok, then it should be just call_sv() instead of perl's goto, right?
To: perlbug-followup [...] perl.org
Date: Tue, 12 Sep 2017 09:34:14 +0200
From: pali [...] cpan.org
Subject: Re: [perl #131683] Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback
On Monday 11 September 2017 18:24:43 karl williamson via RT wrote: Show quoted text
> On 09/11/2017 06:13 PM, Tony Cook via RT wrote:
> > One problem with implementing this as XS is the same problem as the goto solution - the call to utf8n_to_uvuni() would only produce warnings if the current context has utf8 warnings enabled, regardless of ONLY_PRAGMA_WARNINGS.
> > Note that there is a new API in 5.26 which allows you to get a bit field > returned with a bit set for every error encountered in the input character. > > As it says in the docs: > > To do your own error handling, call this function with the > C<UTF8_CHECK_ONLY> > flag to suppress any warnings, and then examine the C<*errors> return. > > UV utf8n_to_uvchr_error(const U8 *s, STRLEN curlen, > STRLEN *retlen, > const U32 flags, > U32 * errors)
That would really help! Thanks for pointer. Show quoted text
> If we get Devel::PPPort maintained, I had hoped to put it there.
But Devel::PPPort is unmaintained... What about taking it into p5p? Or forking it?
Date: Wed, 13 Sep 2017 00:48:48 +0200
Subject: Re: [perl #131683] Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback
From: pali [...] cpan.org
To: perlbug-followup [...] perl.org
Download (untitled) / with headers
text/plain 369b
In attachment you can find patch with rewritten encode/decode/from_to functions. Seems it was not a big problem. With this patch there is no goto in warning messages... $ perl -Iblib/lib -Iblib/arch -MEncode -le 'my $x = encode("utf-16", (my $y = chr(0xdc10)), &Encode::FB_WARN); print "no croak"' UTF-16 surrogate U+DC10 in subroutine entry at -e line 1. no croak

Message body is not shown because sender requested not to inline it.

RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 477b
On Tue, 12 Sep 2017 16:06:31 -0700, pali@cpan.org wrote: Show quoted text
> In attachment you can find patch with rewritten encode/decode/from_to > functions. Seems it was not a big problem. > > With this patch there is no goto in warning messages... > > $ perl -Iblib/lib -Iblib/arch -MEncode -le 'my $x = encode("utf-16", > (my $y = chr(0xdc10)), > &Encode::FB_WARN); print "no croak"' > > UTF-16 surrogate U+DC10 in subroutine entry at -e line 1. > no croak
That looks sane to me. Tony


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org