qw() differs from split " " with OGHAM SPACE MARK #16182

Closed
p5pRT opened this issue Oct 11, 2017 · 6 comments


p5pRT commented Oct 11, 2017

Migrated from rt.perl.org#132272 (status was 'resolved')

Searchable as RT132272$


p5pRT commented Oct 11, 2017

From @mauke

Created by @mauke

$ perl -wE 'use utf8; say for qw(foo bar baz)'
foo bar baz

$ perl -wE 'use utf8; say for split " ", q(foo bar baz)'
foo
bar
baz

I think qw() should behave like split " " and consider U+1680 OGHAM SPACE MARK
to be whitespace.
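
The separators in the two one-liners above are U+1680 characters, which many fonts render like an ordinary space. The following is a minimal sketch of the same comparison as a script, with the split input written using an escape so the separator is visible. The variable names are illustrative, and the counts in the comments are what the report above describes for perl 5.26.0, not a guarantee for every version.

```perl
use strict;
use warnings;
use utf8;    # the qw() list below contains literal U+1680 characters

# qw() with literal U+1680 OGHAM SPACE MARK between the words
# (these are not ASCII spaces).
my @from_qw = qw(foo bar baz);

# The same text as a character string, separator written as an escape.
my $text = "foo\x{1680}bar\x{1680}baz";
my @from_split = split " ", $text;

# Per the report: qw() yields one element, split " " yields three.
printf "qw:    %d element(s)\n", scalar @from_qw;
printf "split: %d element(s)\n", scalar @from_split;
```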

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.26.0:

Configured by mauke at Fri Sep 22 13:28:36 CEST 2017.

Summary of my perl5 (revision 5 version 26 subversion 0) configuration:
   
  Platform:
    osname=linux
    osvers=4.9.41-1-lts
    archname=i686-linux
    uname='linux simplicio 4.9.41-1-lts #1 smp mon aug 7 17:57:02 cest 2017 i686 gnulinux '
    config_args=''
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=undef
    use64bitall=undef
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-O2 -march=native'
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='7.2.0'
    gccosandvers=''
    intsize=4
    longsize=4
    ptrsize=4
    doublesize=8
    byteorder=1234
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=12
    longdblkind=3
    ivtype='long'
    ivsize=4
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=4
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags ='-fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/i686-pc-linux-gnu/7.2.0/include-fixed /usr/lib /lib
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.26.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.26'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -march=native -L/usr/local/lib -fstack-protector-strong'



@INC for perl 5.26.0:
    /home/mauke/usr/lib/perl5/site_perl/5.26.0/i686-linux
    /home/mauke/usr/lib/perl5/site_perl/5.26.0
    /home/mauke/usr/lib/perl5/5.26.0/i686-linux
    /home/mauke/usr/lib/perl5/5.26.0


Environment for perl 5.26.0:
    HOME=/home/mauke
    LANG=en_US.UTF-8
    LANGUAGE=en_US
    LC_COLLATE=C
    LC_MONETARY=de_DE.UTF-8
    LC_TIME=de_DE.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/mauke/perl5/perlbrew/bin:/home/mauke/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
    PERLBREW_BASHRC_VERSION=0.73
    PERLBREW_HOME=/home/mauke/.perlbrew
    PERLBREW_ROOT=/home/mauke/perl5/perlbrew
    PERL_BADLANG (unset)
    PERL_UNICODE=SAL
    SHELL=/bin/bash


p5pRT commented Oct 21, 2017

From @khwilliamson

This is likely due to a change in the properties of this character in a recent Unicode version. I can look into it, but it's low priority because Ogham is used mainly by scholars.
--
Karl Williamson


p5pRT commented Oct 21, 2017

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented Nov 2, 2017

From zefram@fysh.org

l.mai@web.de wrote:

> I think qw() should behave like split " " and consider U+1680 OGHAM SPACE MARK
> to be whitespace.

This is not specific to Ogham. split and qw also differ in the treatment
of U+a0 "no-break space", U+200a "hair space", and so on. It's also not
specific to qw. U+a0, U+1680, et al, are also not treated as whitespace
by the main tokeniser:

$ perl5.27.5 -lwe $'use utf8; print\xc2\xa0123'
Unrecognized character \x{a0}; marked by <-- HERE after tf8; print<-- HERE near column 16 at -e line 1.
$ perl5.27.5 -lwe $'use utf8; print\xe1\x9a\x80123'
Unrecognized character \x{1680}; marked by <-- HERE after tf8; print<-- HERE near column 16 at -e line 1.

Since qw is consistent with primary tokenisation, I think this is
not a behavioural bug, but a documentation bug. Fixed in commit
5a9c3bf.

-zefram
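
To illustrate the point that the difference is not specific to Ogham, here is a small sketch that runs split " " over strings joined with each of the code points named in the comment above. The labels and variable names are illustrative. The call to utf8::upgrade is an assumption made so that the U+00A0 case is handled with character semantics rather than as a byte string. When words separated by such characters need to be split, doing it at run time with split on a character string is one option, since qw() follows the tokeniser's narrower notion of whitespace.

```perl
use strict;
use warnings;

# Code points mentioned in the comment above; each has the Unicode
# White_Space property.
my @separators = (
    [ 'U+00A0 NO-BREAK SPACE'   => "\x{a0}"   ],
    [ 'U+1680 OGHAM SPACE MARK' => "\x{1680}" ],
    [ 'U+200A HAIR SPACE'       => "\x{200a}" ],
);

my %fields;    # label => number of fields produced by split " "
for my $entry (@separators) {
    my ($label, $char) = @$entry;
    my $text = join $char, 'foo', 'bar', 'baz';
    utf8::upgrade($text);    # force character semantics, even for U+00A0
    my @parts = split " ", $text;
    $fields{$label} = @parts;    # array in scalar context: the field count
    printf "%-24s %d field(s)\n", $label, scalar @parts;
}
```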


p5pRT commented Nov 15, 2017

From @xsawyerx

On Thu, 02 Nov 2017 12:39:06 -0700, zefram@fysh.org wrote:

> l.mai@web.de wrote:
>
> > I think qw() should behave like split " " and consider U+1680 OGHAM
> > SPACE MARK to be whitespace.
>
> [...]
> Since qw is consistent with primary tokenisation, I think this is
> not a behavioural bug, but a documentation bug. Fixed in commit
> 5a9c3bf.

I will be resolving this issue within 7 days unless there is an objection.


p5pRT commented Dec 4, 2017

@xsawyerx - Status changed from 'open' to 'resolved'
