Report information
Id: 132272
Status: open
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: mauke- <l.mai [at] web.de>
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: core
Perl Version: 5.26.0
Fixed In: (no value)



To: perlbug [...] perl.org
Subject: qw() differs from split " " with OGHAM SPACE MARK
From: l.mai [...] web.de
Date: Wed, 11 Oct 2017 22:54:43 +0200
This is a bug report for perl from l.mai@web.de,
generated with the help of perlbug 1.40 running under perl 5.26.0.

-----------------------------------------------------------------
[Please describe your issue here]

$ perl -wE 'use utf8; say for qw(foo bar baz)'
foo bar baz
$ perl -wE 'use utf8; say for split " ", q(foo bar baz)'
foo
bar
baz

I think qw() should behave like split " " and consider U+1680 OGHAM SPACE MARK
to be whitespace.

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=low
---
Site configuration information for perl 5.26.0:

Configured by mauke at Fri Sep 22 13:28:36 CEST 2017.

Summary of my perl5 (revision 5 version 26 subversion 0) configuration:

  Platform:
    osname=linux
    osvers=4.9.41-1-lts
    archname=i686-linux
    uname='linux simplicio 4.9.41-1-lts #1 smp mon aug 7 17:57:02 cest 2017 i686 gnulinux '
    config_args=''
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=undef
    use64bitall=undef
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-O2 -march=native'
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='7.2.0'
    gccosandvers=''
    intsize=4
    longsize=4
    ptrsize=4
    doublesize=8
    byteorder=1234
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=12
    longdblkind=3
    ivtype='long'
    ivsize=4
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=4
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags ='-fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/i686-pc-linux-gnu/7.2.0/include-fixed /usr/lib /lib
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.26.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.26'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -march=native -L/usr/local/lib -fstack-protector-strong'

---
@INC for perl 5.26.0:
    /home/mauke/usr/lib/perl5/site_perl/5.26.0/i686-linux
    /home/mauke/usr/lib/perl5/site_perl/5.26.0
    /home/mauke/usr/lib/perl5/5.26.0/i686-linux
    /home/mauke/usr/lib/perl5/5.26.0

---
Environment for perl 5.26.0:
    HOME=/home/mauke
    LANG=en_US.UTF-8
    LANGUAGE=en_US
    LC_COLLATE=C
    LC_MONETARY=de_DE.UTF-8
    LC_TIME=de_DE.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/mauke/perl5/perlbrew/bin:/home/mauke/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
    PERLBREW_BASHRC_VERSION=0.73
    PERLBREW_HOME=/home/mauke/.perlbrew
    PERLBREW_ROOT=/home/mauke/perl5/perlbrew
    PERL_BADLANG (unset)
    PERL_UNICODE=SAL
    SHELL=/bin/bash
RT-Send-CC: perl5-porters [...] perl.org
This is likely due to a change in the properties of this character in a recent Unicode version. I can look into it, but it's low priority because Ogham is used mainly by scholars.
-- Karl Williamson
To: perl5-porters [...] perl.org
Date: Thu, 2 Nov 2017 19:38:46 +0000
Subject: Re: [perl #132272] qw() differs from split " " with OGHAM SPACE MARK
From: Zefram <zefram [...] fysh.org>
l.mai@web.de wrote:
>I think qw() should behave like split " " and consider U+1680 OGHAM SPACE MARK
>to be whitespace.

This is not specific to Ogham.  split and qw also differ in the treatment
of U+a0 "no-break space", U+200a "hair space", and so on.

It's also not specific to qw.  U+a0, U+1680, et al, are also not treated
as whitespace by the main tokeniser:

$ perl5.27.5 -lwe $'use utf8; print\xc2\xa0123'
Unrecognized character \x{a0}; marked by <-- HERE after tf8; print<-- HERE near column 16 at -e line 1.
$ perl5.27.5 -lwe $'use utf8; print\xe1\x9a\x80123'
Unrecognized character \x{1680}; marked by <-- HERE after tf8; print<-- HERE near column 16 at -e line 1.

Since qw is consistent with primary tokenisation, I think this is
not a behavioural bug, but a documentation bug.  Fixed in commit
5a9c3bf448373ee0812df85d750f6234ee11c9c4.

-zefram
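
The difference described above can be checked with a short script. This is only a sketch, not part of the ticket: the helper name count_words and the sample words are made up for illustration, and the qw() literal is compiled through string eval because qw() does not interpolate escapes. It assumes perl 5.16 or later for the unicode_eval feature.

```perl
use strict;
use warnings;
use feature 'unicode_eval';    # string eval parses its argument as characters

# Hypothetical helper (not from the ticket): join three words with the given
# code point, then count the fields produced by split " " and by qw().
sub count_words {
    my ($codepoint) = @_;
    my $text = join chr($codepoint), 'foo', 'bar', 'baz';

    # Force the internal UTF-8 representation so that split " " applies
    # Unicode whitespace rules to U+00A0 regardless of perl version.
    utf8::upgrade($text);

    my @by_split = split ' ', $text;

    # qw() does not interpolate, so the literal is built as source text
    # and compiled with string eval.
    my @by_qw = eval "qw($text)";
    die $@ if $@;

    return (scalar @by_split, scalar @by_qw);
}

for my $codepoint (0x20, 0xA0, 0x1680, 0x200A) {
    my ($split_count, $qw_count) = count_words($codepoint);
    printf "U+%04X  split=%d  qw=%d\n", $codepoint, $split_count, $qw_count;
}
```

If perl behaves as described in this ticket, split should report three fields for every separator, while qw should report three only for U+0020 and one for the others.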
RT-Send-CC: perl5-porters [...] perl.org
On Thu, 02 Nov 2017 12:39:06 -0700, zefram@fysh.org wrote:
> l.mai@web.de wrote:
> > I think qw() should behave like split " " and consider U+1680 OGHAM
> > SPACE MARK to be whitespace.
> [...]
> Since qw is consistent with primary tokenisation, I think this is
> not a behavioural bug, but a documentation bug. Fixed in commit
> 5a9c3bf448373ee0812df85d750f6234ee11c9c4.

I will be resolving this issue within 7 days unless there is an objection.

