splitting accented words #9952

Closed
p5pRT opened this issue Nov 7, 2009 · 5 comments

Comments

p5pRT commented Nov 7, 2009

Migrated from rt.perl.org#70317 (status was 'resolved')

Searchable as RT70317$

p5pRT commented Nov 7, 2009

From mochan@em.fis.unam.mx

Created by mochan@em.fis.unam.mx

The following code makes a list of words, one to a line, and seems to
work correctly even when the words contain accented characters (my locale
is en_US.UTF-8):

$ echo aei aéi | perl -CS -ne 'print join "\n", split /\W/'
aei
aéi

However, the following almost identical code fails and incorrectly
splits accented words at the accented characters:

$ echo aei aéi | perl -CS -ne 'print join "\n", split /[\W]/'
aei
a
i

Both fail if I remove the -CS switch and both pass if I replace \W by
\P{IsWord}. I guess that the difference between \W and [\W] is a bug.

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.10.0:

Configured by mochan at Thu Aug 20 22:55:52 CDT 2009.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.26-2-amd64, archname=x86_64-linux
    uname='linux em 2.6.26-2-amd64 #1 smp sun jul 26 20:35:48 utc 2009 x86_64 gnulinux '
    config_args='-de'
    hint=previous, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -I/usr/local/include -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    ccversion='', gccversion='4.3.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
    libs=-lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.7.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.7'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib'

Locally applied patches:
    


@INC for perl 5.10.0:
    /usr/local/lib/perl5/5.10.0/x86_64-linux
    /usr/local/lib/perl5/5.10.0
    /usr/local/lib/perl5/site_perl/5.10.0/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.10.0
    .


Environment for perl 5.10.0:
    HOME=/home/mochan
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/mochan/bin:/home/mochan/bin:/usr/local/bin:/usr/bin:/bin:/usr/games
    PERL_BADLANG (unset)
    SHELL=/bin/bash

p5pRT commented Nov 11, 2009

From jarich@perltraining.com.au

Dear Luis,

Thank you for your report.

> Both fail if I remove the -CS switch and both pass if I replace \W by
> \P{IsWord}. I guess that the difference between \W and [\W] is a bug.

That indeed looks like a bug.

Thanks again,

  Jacinta

p5pRT commented Nov 11, 2009

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Mar 25, 2011

From @khwilliamson

This has been fixed in the 5.13 series, and should be in 5.14, available
soon.
--Karl Williamson

p5pRT commented Mar 25, 2011

@khwilliamson - Status changed from 'open' to 'resolved'
