Report information
Id: 124113
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: chmrr <alex [at] chmrr.net>
Cc:
AdminCc:

Operating System: Linux
PatchStatus: Applied
Severity: medium
Type: core
Perl Version: 5.20.2
Fixed In: (no value)

Attachments
0001-perl-124113-Make-check-for-multi-dimensional-arrays-.patch
0001-toke.c-UTF-8-aware-warning-cleanups.patch
0002-Allow-unquoted-UTF-8-HERE-document-terminators.patch
0003-Fix-.without-parentheses-is-ambuguous-warning-for-UT.patch
0004-Adjust-callsites-that-use-UTF8SKIP-without-checking-.patch
test.pl



Date: Wed, 18 Mar 2015 20:12:18 -0400 (EDT)
From: alex [...] chmrr.net
Subject: Compile-time warning with UTF8 variable in array index
To: perlbug [...] perl.org
This is a bug report for perl from alex@chmrr.net,
generated with the help of perlbug 1.40 running under perl 5.20.2.

-----------------------------------------------------------------
[Please describe your issue here]

Perl 5.18.0 and above cause compile-time warnings with:

    $array[ $𝛃 + 0 ];

..but not:

    $array[ 0 + $𝛃 ];

I've attached a test.pl file containing the above, in case the U+1D6C3
gets corrupted in transit. The warnings are:

    Passing malformed UTF-8 to "XPosixWord" is deprecated at test.pl line 13.
    Malformed UTF-8 character (unexpected continuation byte 0x9d, with no preceding start byte) at test.pl line 13.

Bisect points to:

    commit 281235491e0eef7be051126b2c99109c4f5332be
    Author: Karl Williamson <public@khwilliamson.com>
    Date:   Sun Dec 23 10:03:16 2012 -0700

        Deprecate calling isFOO_utf8() with malformed

        handy.h has character classification macros to determine if a UTF-8
        encoded character is of a given type FOO, such as isALPHA_utf8(), etc.
        Code that calls these should have first made sure that the parameter
        is legal UTF-8. Prior to this patch, false was silently returned for
        all illegal UTF-8. Now, in most instances, a deprecation warning is
        raised. This is to catch bugs, and prepare for eventual elimination of
        this check, which fails to catch read-off-end-of-buffer malformations
        anyway. (One idea would be to leave the check in for DEBUGGING builds.)

        The cases where no deprecation warning is raised as a result of this
        commit is for the classes where the character does not have to be
        converted to a code point for its inclusion to be determined. For
        example, if malformed UTF-8 is checked to see if it is ASCII, we only
        need to check that it is one of the 128 ASCII characters. If it isn't,
        we don't bother to see if it is malformed or not. There are other
        cases, as well, such as with isSPACE(), where we check if the UTF-8 is
        one of a very finite set, without checking for malformedness.

        This commit causes a number of apparent bugs to be shown by the Perl
        test suite. These do not cause actual failures.

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
Site configuration information for perl 5.20.2:

Configured by chmrr at Wed Mar 18 20:04:27 EDT 2015.

Summary of my perl5 (revision 5 version 20 subversion 2) configuration:

  Platform:
    osname=linux, osvers=3.13.0-44-generic, archname=x86_64-linux
    uname='linux mycon.chmrr.net 3.13.0-44-generic #73-ubuntu smp tue dec 16 00:22:43 utc 2014 x86_64 x86_64 x86_64 gnulinux '
    config_args='-de -Dprefix=/opt/perlbrew/perls/perl-5.20.2 -Aeval:scriptdir=/opt/perlbrew/perls/perl-5.20.2/bin'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.8.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.19'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

---
@INC for perl 5.20.2:
    /opt/perlbrew/perls/perl-5.20.2/lib/site_perl/5.20.2/x86_64-linux
    /opt/perlbrew/perls/perl-5.20.2/lib/site_perl/5.20.2
    /opt/perlbrew/perls/perl-5.20.2/lib/5.20.2/x86_64-linux
    /opt/perlbrew/perls/perl-5.20.2/lib/5.20.2
    .

---
Environment for perl 5.20.2:
    HOME=/home/chmrr
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/opt/perlbrew/bin:/opt/perlbrew/perls/perl-5.20.2/bin:/home/chmrr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
    PERLBREW_BASHRC_VERSION=0.67
    PERLBREW_HOME=/home/chmrr/.perlbrew
    PERLBREW_MANPATH=/opt/perlbrew/perls/perl-5.20.2/man
    PERLBREW_PATH=/opt/perlbrew/bin:/opt/perlbrew/perls/perl-5.20.2/bin
    PERLBREW_PERL=perl-5.20.2
    PERLBREW_ROOT=/opt/perlbrew
    PERLBREW_VERSION=0.66
    PERL_BADLANG (unset)
    SHELL=/bin/bash

Subject: Re: [perl #124113] Compile-time warning with UTF8 variable in array index
To: perl5-porters [...] perl.org
Date: Mon, 23 Mar 2015 00:12:16 -0400
From: Alex Vandiver <alex [...] chmrr.net>
On Wed, 18 Mar 2015 17:12:38 -0700 Alex Vandiver (via RT) wrote:
> Perl 5.18.0 and above cause compile-time warnings with:
>
>     $array[ $𝛃 + 0 ];
>
> ..but not:
>
>     $array[ 0 + $𝛃 ];

Patches for this, as well as 2.5 related problems, attached. - Alex


RT-Send-CC: perl5-porters [...] perl.org
On Sun Mar 22 21:12:43 2015, alex@chmrr.net wrote:
> On Wed, 18 Mar 2015 17:12:38 -0700 Alex Vandiver (via RT) wrote:
> > Perl 5.18.0 and above cause compile-time warnings with:
> >
> >     $array[ $𝛃 + 0 ];
> >
> > ..but not:
> >
> >     $array[ 0 + $𝛃 ];
>
> Patches for this, as well as 2.5 related problems, attached.
>  - Alex

Thank you. I have applied the first three patches:

    $ git log --oneline -3
    8ce2ba8 Fix "...without parentheses is ambuguous" warning for UTF-8 function nam
    6e59c86 Allow unquoted UTF-8 HERE-document terminators
    b3089e9 [perl #124113] Make check for multi-dimensional arrays be UTF8-aware

I don’t like the fact that the fourth one lacks tests, even though I believe it corrects the behaviour. Also, what it addresses is not a regression of any sort, so it needs to wait until after 5.22 since we are in code freeze.

--
Father Chrysostomos

From: Dave Mitchell <davem [...] iabyn.com>
CC: perl5-porters [...] perl.org
Date: Mon, 30 Mar 2015 12:00:26 +0100
To: Father Chrysostomos via RT <perlbug-followup [...] perl.org>
Subject: Re: [perl #124113] Compile-time warning with UTF8 variable in array index
On Fri, Mar 27, 2015 at 01:17:58PM -0700, Father Chrysostomos via RT wrote:
> Thank you. I have applied the first three patches:
>
> $ git log --oneline -3
> 8ce2ba8 Fix "...without parentheses is ambuguous" warning for UTF-8 function nam

This one is intermittently failing smokes. The test is:

    # toke.c
    # Fix 'Use of "..." without parentheses is ambiguous' warning for
    # Unicode function names
    use utf8;
    use warnings;
    sub 𝛃(;$) { return 0; }
    my $v = 𝛃 - 5;

Run by hand, this (correctly) gives me:

    Warning: Use of "𝛃" without parentheses is ambiguous at /tmp/p line 7.

'od' shows that the bytes that make up the beta in the src are:

    f0 9d 9b 83

(i.e. codepoint \x{1d6c3}) and that the bytes output for the beta in the warning message when run by hand are:

    f0 9d 9b 83

According to George Greer's smoke log,

    http://m-l.org/~perl/smoke/perl/linux/blead_g++/log38f18a308b948c6eaf187519a16d060e1ec7cc20.log.gz

the output is:

    EXPECTED:
    Warning: Use of "AAAA" without parentheses is ambiguous at - line 7.
    GOT:
    Warning: Use of "BBBB" without parentheses is ambiguous at - line 7.

where the bytes that make up AAAA and BBBB are:

    c3 b0 c2 9d c2 9b c2 83
    c3 83 c2 b0 c3 82 c2 9d c3 82 c2 9b c3 82 c2 83

AAAA is the original bytes double-encoded, while BBBB is triple-encoded.

I guess that one extra level of encoding is caused by the smoker code when generating smoke logs, but I don't see why the 'got' message should have an extra level of encoding on top of that, and why it's intermittent (sometimes a mismatch between TEST and harness, and for some configurations not at all), and why it doesn't fail for me.

The smokes seem to only fail for the permutations with LC_ALL=en_US.utf8.

--
Nothing ventured, nothing lost.

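The byte sequences quoted above can be reproduced by treating each byte of the original UTF-8 as a Latin-1 code point and encoding it as UTF-8 again. The following is a minimal sketch of that re-encoding, not code from perl or the smoker; `latin1_to_utf8` and `check_reencoding` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: treat each input byte as a Latin-1 code point
 * (U+0000..U+00FF) and write its UTF-8 encoding to `out`.
 * `out` must have room for 2 * len bytes. Returns the bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                           /* ASCII: unchanged */
        } else {
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));   /* lead byte */
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F)); /* continuation */
        }
    }
    return n;
}

/* Applies the helper once and twice to the UTF-8 bytes of U+1D6C3 and
 * compares against the AAAA and BBBB byte strings from the message. */
void check_reencoding(void)
{
    const unsigned char beta[] = { 0xF0, 0x9D, 0x9B, 0x83 };
    const unsigned char aaaa[] = { 0xC3, 0xB0, 0xC2, 0x9D,
                                   0xC2, 0x9B, 0xC2, 0x83 };
    const unsigned char bbbb[] = { 0xC3, 0x83, 0xC2, 0xB0, 0xC3, 0x82,
                                   0xC2, 0x9D, 0xC3, 0x82, 0xC2, 0x9B,
                                   0xC3, 0x82, 0xC2, 0x83 };
    unsigned char twice[8], thrice[16];

    size_t n2 = latin1_to_utf8(beta, sizeof beta, twice);
    size_t n3 = latin1_to_utf8(twice, n2, thrice);

    assert(n2 == sizeof aaaa && memcmp(twice, aaaa, n2) == 0);
    assert(n3 == sizeof bbbb && memcmp(thrice, bbbb, n3) == 0);
}
```

Calling `check_reencoding()` from a `main` confirms that one extra pass yields AAAA and a second extra pass yields BBBB.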
To: perl5-porters [...] perl.org
Subject: Re: [perl #124113] Compile-time warning with UTF8 variable in array index
From: Alex Vandiver <alex [...] chmrr.net>
Date: Mon, 30 Mar 2015 14:05:00 -0400
On Mon, 30 Mar 2015 12:00:26 +0100 Dave Mitchell <davem@iabyn.com> wrote:
> On Fri, Mar 27, 2015 at 01:17:58PM -0700, Father Chrysostomos via RT wrote:
> > Thank you. I have applied the first three patches:
> >
> > $ git log --oneline -3
> > 8ce2ba8 Fix "...without parentheses is ambuguous" warning for UTF-8 function nam
>
> This one is intermittently failing smokes.
> [snip]

Thanks for the note -- I'll take a closer look tonight.
 - Alex

To: Dave Mitchell <davem [...] iabyn.com>
Subject: Re: [perl #124113] Compile-time warning with UTF8 variable in array index
CC: Father Chrysostomos via RT <perlbug-followup [...] perl.org>, perl5-porters [...] perl.org
From: Nicholas Clark <nick [...] ccl4.org>
Date: Mon, 30 Mar 2015 19:38:24 +0100
Download (untitled) / with headers
text/plain 1.5k
On Mon, Mar 30, 2015 at 12:00:26PM +0100, Dave Mitchell wrote: Show quoted text
> I guess that one extra level of encoding is caused by the smoker code when > generating smoke logs, but I don't see why the 'got' message should have > an extra level of encoding on top of that, and why it's intermittent > (sometimes a mismatch between TEST and harness, and for some > configurations not at all), and why it doesn't fail for me. > > The smokes seem to only fail for the permutations with LC_ALL=en_US.utf8.
I can consistently see the failures under t/harness on dromedary with LC_ALL=en_US.utf8 I ran a bisect as: LC_ALL=en_US.UTF-8 PERL_UNICODE="" perl Porting/bisect.pl --start v5.21.10 --target lib/warnings.t and it reports that the errors start at this commit: commit 8ce2ba821761a7ada1e1def512c0374977759cf7 Author: Alex Vandiver <alex@chmrr.net> Date: Sun Mar 22 23:08:24 2015 -0400 Fix "...without parentheses is ambuguous" warning for UTF-8 function names While isWORDCHAR_lazy_if is UTF-8 aware, checking advanced byte-by-byte. This lead to errors of the form: Passing malformed UTF-8 to "XPosixWord" is deprecated Malformed UTF-8 character (unexpected continuation byte 0x9d, with no preceding start byte) Warning: Use of "οΏ½" without parentheses is ambiguous Use UTF8SKIP to advance character-by-character, not byte-by-byte. (and by implication they are not a side effect of a later commit) I'm not in a position to investigate further as to why, let alone provide a fix. Nicholas Clark
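The difference between advancing byte-by-byte and character-by-character, as described in the commit message above, can be sketched as follows. This is an illustrative stand-in, not perl's code: `utf8_skip` is a simplified, hypothetical replacement for the UTF8SKIP macro that assumes well-formed input of at most four bytes per character, and `first_stray_byte` is an invented helper.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for a UTF-8 skip table: the number of
 * bytes in the character that starts with `lead`. ASCII and continuation
 * bytes both yield 1. */
size_t utf8_skip(unsigned char lead)
{
    if (lead < 0xC0) return 1;
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    return 4;
}

/* True for bytes of the form 10xxxxxx, which can never start a character. */
int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Walk `buf` and return the first continuation byte that the walk would
 * hand to a character classifier as if it started a character, or -1 if
 * the walk never lands on one. `by_char` selects the stepping strategy. */
int first_stray_byte(const unsigned char *buf, size_t len, int by_char)
{
    size_t i = 0;
    while (i < len) {
        if (is_continuation(buf[i]))
            return buf[i];
        i += by_char ? utf8_skip(buf[i]) : 1;
    }
    return -1;
}

/* "𝛃 - 5" as UTF-8: stepping by bytes lands on 0x9d, the byte named in
 * the warning; stepping by characters never lands inside the character. */
void check_walks(void)
{
    const unsigned char src[] = { 0xF0, 0x9D, 0x9B, 0x83, ' ', '-', ' ', '5' };
    assert(first_stray_byte(src, sizeof src, 0) == 0x9D);
    assert(first_stray_byte(src, sizeof src, 1) == -1);
}
```

In this sketch the byte-wise walk reaches 0x9d on its second step, which matches the "unexpected continuation byte 0x9d" reported in the ticket.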
Date: Tue, 31 Mar 2015 04:15:48 -0400
CC: Dave Mitchell <davem [...] iabyn.com>, perl5-porters [...] perl.org
From: Alex Vandiver <alex [...] chmrr.net>
Subject: Re: [perl #124113] Compile-time warning with UTF8 variable in array index
To: Nicholas Clark <nick [...] ccl4.org>
On Mon, 30 Mar 2015 19:38:24 +0100 Nicholas Clark <nick@ccl4.org> wrote:
> I can consistently see the failures under t/harness on dromedary with
> LC_ALL=en_US.utf8
>
> I ran a bisect as:
>
> LC_ALL=en_US.UTF-8 PERL_UNICODE="" perl Porting/bisect.pl --start v5.21.10 --target lib/warnings.t

The test failure requires PERL_UNICODE="", and uncovers a warning which was missing a UTF8fARG(). A little more poking around uncovered a couple more as well; patch attached. The fact that the wide character is reported "in print" and not "in warn" is likely its own bug.
 - Alex


CC: Nicholas Clark <nick [...] ccl4.org>, perl5-porters [...] perl.org
From: Dave Mitchell <davem [...] iabyn.com>
Date: Tue, 31 Mar 2015 09:50:35 +0100
To: Alex Vandiver <alex [...] chmrr.net>
Subject: Re: [perl #124113] Compile-time warning with UTF8 variable in array index
On Tue, Mar 31, 2015 at 04:15:48AM -0400, Alex Vandiver wrote:
> On Mon, 30 Mar 2015 19:38:24 +0100 Nicholas Clark <nick@ccl4.org> wrote:
> > I can consistently see the failures under t/harness on dromedary with
> > LC_ALL=en_US.utf8
> >
> > I ran a bisect as:
> >
> > LC_ALL=en_US.UTF-8 PERL_UNICODE="" perl Porting/bisect.pl --start v5.21.10 --target lib/warnings.t
>
> The test failure requires PERL_UNICODE="", and uncovers a warning which
> was missing a UTF8fARG(). A little more poking around uncovered a
> couple more as well; patch attached. The fact that the wide character
> is reported "in print" and not "in warn" is likely its own bug.

Ah, I always forget the PERL_UNICODE="" bit.

Thanks, applied as v5.21.10-49-gb59c097.

--
But Pity stayed his hand. "It's a pity I've run out of bullets",
he thought.
    -- "Bored of the Rings"

RT-Send-CC: perl5-porters [...] perl.org
On Tue Mar 31 01:51:07 2015, davem wrote:
> On Tue, Mar 31, 2015 at 04:15:48AM -0400, Alex Vandiver wrote:
> > On Mon, 30 Mar 2015 19:38:24 +0100 Nicholas Clark <nick@ccl4.org> wrote:
> > > I can consistently see the failures under t/harness on dromedary with
> > > LC_ALL=en_US.utf8
> > >
> > > I ran a bisect as:
> > >
> > > LC_ALL=en_US.UTF-8 PERL_UNICODE="" perl Porting/bisect.pl --start v5.21.10 --target lib/warnings.t
> >
> > The test failure requires PERL_UNICODE="", and uncovers a warning which
> > was missing a UTF8fARG(). A little more poking around uncovered a
> > couple more as well; patch attached. The fact that the wide character
> > is reported "in print" and not "in warn" is likely its own bug.
>
> Ah, I always forget the PERL_UNICODE="" bit.
>
> Thanks, applied as v5.21.10-49-gb59c097.

Can this ticket be closed? It's listed in perl5220delta.

From: Dave Mitchell <davem [...] iabyn.com>
To: "l.mai [...] web.de via RT" <perlbug-followup [...] perl.org>
Date: Mon, 29 Feb 2016 10:03:31 +0000
CC: perl5-porters [...] perl.org
Subject: Re: [perl #124113] Compile-time warning with UTF8 variable in array index
On Fri, Feb 26, 2016 at 11:03:17AM -0800, l.mai@web.de via RT wrote:
> On Tue Mar 31 01:51:07 2015, davem wrote:
> > On Tue, Mar 31, 2015 at 04:15:48AM -0400, Alex Vandiver wrote:
> > > On Mon, 30 Mar 2015 19:38:24 +0100 Nicholas Clark <nick@ccl4.org> wrote:
> > > > I can consistently see the failures under t/harness on dromedary with
> > > > LC_ALL=en_US.utf8
> > > >
> > > > I ran a bisect as:
> > > >
> > > > LC_ALL=en_US.UTF-8 PERL_UNICODE="" perl Porting/bisect.pl --start v5.21.10 --target lib/warnings.t
> > >
> > > The test failure requires PERL_UNICODE="", and uncovers a warning which
> > > was missing a UTF8fARG(). A little more poking around uncovered a
> > > couple more as well; patch attached. The fact that the wide character
> > > is reported "in print" and not "in warn" is likely its own bug.
> >
> > Ah, I always forget the PERL_UNICODE="" bit.
> >
> > Thanks, applied as v5.21.10-49-gb59c097.
>
> Can this ticket be closed? It's listed in perl5220delta.

There was one un-applied patch still in the ticket:

    0004-Adjust-callsites-that-use-UTF8SKIP-without-checking-.patch

Which I've just applied, as v5.23.8-35-g9538abe, so the ticket can be closed.

Originally FC was reluctant to apply it since there weren't any tests and it was near the 5.22.0 release. However, looking at it more closely, I don't think it fixes a bug or changes behaviour. In more detail, it has three changes like:

     while (t < PL_bufend && isWORDCHAR_lazy_if(t,UTF))
    -    t += UTF8SKIP(t);
    +    t += UTF ? UTF8SKIP(t) : 1;

but if UTF is false, then isWORDCHAR_lazy_if() will be false for any byte >= 0x80, so UTF8SKIP wouldn't be called anyway. For bytes < 0x80, UTF8SKIP returns 1. So there's no change in behaviour.

However, in terms of consistency with the rest of toke.c and for avoiding future bugs, it's worth applying the change anyway.

--
Red sky at night - gerroff my land!
Red sky at morning - gerroff my land!
    -- old farmers' sayings #14
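The equivalence argument above can be checked mechanically over all byte values. The sketch below is only an illustration under stated assumptions: `utf8_skip` is a simplified, hypothetical stand-in for UTF8SKIP (not perl's table), `step_unguarded` and `step_guarded` model the old and new right-hand sides of the diff, and the restriction to bytes below 0x80 encodes the claim in the message that isWORDCHAR_lazy_if() rejects everything else when UTF is false.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for UTF8SKIP: byte length of the
 * character introduced by `lead` (1 for ASCII and continuation bytes). */
size_t utf8_skip(unsigned char lead)
{
    if (lead < 0xC0) return 1;
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    return 4;
}

/* Old form of the step:  t += UTF8SKIP(t); */
size_t step_unguarded(unsigned char b)
{
    return utf8_skip(b);
}

/* New form of the step:  t += UTF ? UTF8SKIP(t) : 1; */
size_t step_guarded(unsigned char b, int utf)
{
    return utf ? utf8_skip(b) : 1;
}

/* Count the byte values in [0, limit) for which the two forms disagree. */
unsigned count_mismatches(unsigned limit, int utf)
{
    unsigned n = 0;
    for (unsigned b = 0; b < limit; b++)
        if (step_unguarded((unsigned char)b) != step_guarded((unsigned char)b, utf))
            n++;
    return n;
}

void check_equivalence(void)
{
    /* With UTF true the two forms are identical for every byte. */
    assert(count_mismatches(0x100, 1) == 0);
    /* With UTF false, they agree on every byte the loop can reach (< 0x80). */
    assert(count_mismatches(0x80, 0) == 0);
    /* They differ only on lead bytes 0xC0..0xFF, which the word-character
     * test is said to reject in non-UTF mode, so the loop never steps there. */
    assert(count_mismatches(0x100, 0) == 64);
}
```

The last assertion shows why the guard is still a reasonable defensive change: the two forms only coincide because of what the loop condition filters out.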

