"use locale;" breaks \w on matching c-cedilla, o-diaeresis and u-diaeresis under tr_TR.utf8 and de_DE.utf8 locales #9410
From pva@gentoo.org
Created by pva@gentoo.org
On Linux (tried Gentoo and Debian), matching with \w and [:alnum:] does not work.
$ cat test-file
$ perl -e 'use locale; open(IN, "< test-file"); while(<IN>) { print if /\w/; }'
You see, only strings with English letters are matched and none with Russian.
$ locale -a
I've also tried with a cp1251 locale. Converted test-file with iconv into cp1251.
perl -e 'use locale;
This did not match anything non-ASCII either.
This bug is filed against perl-5.10.0.
The strange thing is that on FreeBSD this works as it should.
Perl Info
From @druud62
Peter wrote:
s/uft/utf/
-- "Gewoon is een tijger."
The RT System itself - Status changed from 'new' to 'open'
From p5p@spam.wizbit.be
On Fri Jul 11 00:07:32 2008, pva wrote:
Can you send the test-file as an attachment?
Kind regards, Bram
From @jmdh
Created by @jmdh
This is a bug report for perl from dom@earth.li,
From <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=529305>:
----------------
#!/usr/bin/perl
use locale;
print "$_ is " . ( /\w/ ? "" : "not " ) . "a word character\n"
The output is
Locale is tr_TR.utf8
Looking (with my uneducated eyes) in /usr/share/i18n/locales/tr_TR it seems
This is reproducible with 8b3945e
Perl Info
From @jmdh
On Sun Apr 28 10:22:47 2013, dom wrote:
This might be the same as #56820, but I'm not sure.
@jmdh - Status changed from 'new' to 'open'
From @khwilliamson
On 04/28/2013 11:22 AM, Dominic Hargreaves (via RT) wrote:
I tracked this down, and it appears to me to be a bug in the C library
I'm doing some surmisal here. What I think is going on is that under a
To get whether a character above ASCII is an alnum, one must use
Perl assumes that isalnum() will work properly on any character whose
It would probably be a lot of work for Perl to change to also use the C
That would fix this bug as a side effect, and is quite easy to implement.
The objections to last year's proposal all seem to me to stem from
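One way to see why byte-oriented classification and wide-character classification can disagree above ASCII: in a UTF-8 locale, a single byte in the range 0x80..0xFF is not a complete character by itself. The following is only a minimal sketch of that idea, using the standard `btowc()` from `<wchar.h>`. The helper name `byte_is_complete_char` is hypothetical, and nothing here is claimed to be what Perl or the C library does internally.

```c
#include <wchar.h>

/* Hypothetical helper (not from the thread): returns 1 if the single
 * byte b is a complete character on its own in the current LC_CTYPE
 * locale, 0 otherwise. btowc() returns WEOF when the byte is not a
 * valid one-byte character in the initial shift state. */
int byte_is_complete_char(unsigned char b)
{
    return btowc(b) != WEOF;
}
```

In the default "C" locale, ASCII bytes such as 'a' are complete characters. After a successful `setlocale(LC_CTYPE, ...)` to a UTF-8 locale, a byte such as 0xE7 is expected to yield 0, because in UTF-8 it is only the first byte of a multi-byte sequence, whereas the code point U+00E7 is c-cedilla.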
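From @khwilliamson
#include <stdio.h>
#include <ctype.h>
#include <locale.h>
#include <wctype.h>

/* Report code points 0..255 where ispunct() and iswpunct() disagree. */
int
main(int argc, char** argv)
{
    int i;

    if (setlocale(LC_ALL, "de_DE.utf8")) {
        printf("Locale is %s\n", setlocale(LC_ALL, NULL));
        for (i = 0; i < 256; i++) {
            /* Both functions return an unspecified nonzero value for
             * "true", so normalize to 0/1 before comparing. */
            if (!!iswpunct((wint_t) i) != !!ispunct(i)) {
                printf("\\x%02X wpunct and punct differ\n", i);
            }
        }
    }
    return 0;
}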
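The program above compares the punct class. Since the reported symptom is about `\w`, the same comparison can be sketched for the alnum class. This is an illustrative variant, not something posted in the thread: the function name `count_alnum_mismatches` and its `lo`/`hi` range parameters are assumptions made for this example.

```c
#include <ctype.h>
#include <locale.h>
#include <wctype.h>

/* Hypothetical helper: switch LC_CTYPE to locale_name and count the
 * values i in [lo, hi) for which isalnum(i) and iswalnum(i) disagree.
 * The range is clamped to 0..256 because isalnum() is only defined
 * for unsigned char values (and EOF).
 * Returns -1 if the locale cannot be set.
 * Note: the process locale is left changed on return. */
int count_alnum_mismatches(const char *locale_name, int lo, int hi)
{
    int i;
    int mismatches = 0;

    if (lo < 0)
        lo = 0;
    if (hi > 256)
        hi = 256;
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    for (i = lo; i < hi; i++) {
        /* Normalize both results to 0/1 before comparing. */
        if (!!isalnum(i) != !!iswalnum((wint_t) i))
            mismatches++;
    }
    return mismatches;
}
```

For example, `count_alnum_mismatches("de_DE.utf8", 0, 256)` would count the disagreements in the locale from the report, provided that locale is installed. In the "C" locale the ASCII range should show none.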
From @khwilliamson
Also, starting in 5.16, there is a work-around available for this
use locale ':not_characters';
and use any of several I/O methods mentioned in that doc which convert
From @khwilliamson
On Sat May 04 19:51:08 2013, khw wrote:
--
From @khwilliamson
Fixed by commit
@khwilliamson - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#56820 (status was 'resolved')
Searchable as RT56820$