'use locale' effectless? #1217

Closed
p5pRT opened this issue Feb 23, 2000 · 8 comments

p5pRT commented Feb 23, 2000

Migrated from rt.perl.org#2200 (status was 'resolved')

Searchable as RT2200$

p5pRT commented Feb 23, 2000

From gomar@md.media-web.de

Created by gomar@mindless.com

The following snippet of code (ISO8859-1 charset):

  use locale;
  print int('Ü' =~ /ü/i);

together with appropriate environment settings for LC_ALL (=de_DE) and
LANG (=de) prints 1 (the expected result) with Perl 5.00503 and 0 with Perl
5.5.650. I've tried other locale settings too, such as de_DE.ISO8859-1,
without any improvement. Since I'm not aware of any error
on my part or any misconfiguration of my system, I suppose this is a
bug. I haven't had any problems with locale settings before, either
with Perl 5.00503 and lower or with
other programs. (Note: Both Perl 5.00503 and 5.5.650 are linked against
the same versions of libc and libnsl, in case that matters.)
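
Since literal 8-bit characters are easily altered by mail clients and terminals, the same test could also be written with hexadecimal escapes. This is only a sketch, not part of the original report: the variable names are hypothetical, and the result depends on which locale the environment (LC_ALL, LANG) actually selects, so no output is claimed here.

```perl
use strict;
use warnings;
use locale;   # case folding follows the LC_CTYPE locale taken from the environment

# 0xDC is capital U with umlaut and 0xFC is small u with umlaut in ISO8859-1.
# Escapes keep the test independent of the encoding of this source file.
my $upper = "\xDC";

# 1 if the locale's case mapping pairs the two characters, otherwise 0.
my $result = ($upper =~ /\xFC/i) ? 1 : 0;

print "$result\n";
```

Saved under a hypothetical name such as locale_test.pl, it could be run as `LC_ALL=de_DE perl locale_test.pl`, matching the environment settings described above.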

Perl Info


Site configuration information for perl v5.5.650:

Configured by root at Tue Feb 15 19:19:50 CET 2000.

Summary of my perl5 (revision 5.0 version 5 subversion 650) configuration:
  Platform:
    osname=linux, osvers=2.3.40, archname=i586-linux
    uname='linux c241-1 2.3.40 #1 fre jan 21 09:14:00 cet 2000 i586 unknown '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef
    usesocks=undef useperlio=undef d_sfio=undef
    use64bits=define uselargefiles=define usemultiplicity=undef
  Compiler:
    cc='cc', optimize='-O2', gccversion=2.95.2 19991024 (release)
    cppflags='-Dbool=char -DHAS_BOOL -fno-strict-aliasing -I/usr/local/include'
    ccflags ='-Dbool=char -DHAS_BOOL -fno-strict-aliasing -I/usr/local/include -DUSE_LONG_LONG'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt
    libc=/lib/libc-2.1.2.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.5.650:
    /home/gomar/perl
    /usr/lib/perl5/5.5.650/i586-linux
    /usr/lib/perl5/5.5.650
    /usr/lib/perl5/site_perl/5.5.650/i586-linux
    /usr/lib/perl5/site_perl/5.5.650
    /usr/lib/perl5/site_perl/5.005/i586-linux
    /usr/lib/perl5/site_perl/5.005
    /usr/lib/perl5/site_perl
    .


Environment for perl v5.5.650:
    HOME=/home/gomar
    LANG=de
    LANGUAGE (unset)
    LC_ALL=de_DE
    LD_LIBRARY_PATH=/lib:/usr/lib:/usr/local/lib:/usr/X11R6/lib:/usr/openwin/lib:/usr/local/kde/lib:/usr/lib/qt/lib:/usr/local/lib/gtk/themes/engines
    LOGDIR (unset)
    PATH=/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/local/kde/bin:/usr/lib/java/bin/Linux/green_threads:/usr/X11R6/pbmplus:/usr/games:/usr/local/games:/usr/openwin/bin:/usr/games:/home/gomar/bin:.:/usr/bin/TeX
    PERL5LIB=/home/gomar/perl
    PERL_BADLANG (unset)
    SHELL=/bin/bash


p5pRT commented Feb 23, 2000

From @jhi

gomar@md.media-web.de writes:

The following snippet of code (ISO8859-1 charset):

use locale;
print int('Ü' =~ /ü/i);

together with appropriate environment settings for LC_ALL (=de_DE) and
LANG (=de) prints 1 (expected result) with Perl 5.00503 and 0 with Perl
5.5.650. I've tried other locale settings too, such as de_DE.ISO8859-1

I can confirm that the bug exists in 5.5.660 on Digital UNIX with various
European locales, and that it did not exist in 5.005_03.

--
$jhi++; # http://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

p5pRT commented Feb 6, 2006

From @smpeters

With various Perl 5.8 releases on several operating systems, I have not
been able to reproduce the problem.

steve@kirk:~/smoke/smoke_cfg$ LC_ALL=en_ZW.utf8 perl -Mlocale -wle'print
int("?" =~ /\?/i)'
1
steve@kirk:~/smoke/smoke_cfg$ LC_ALL=en_GB.utf8 perl -Mlocale -wle'print
int("?" =~ /\?/i)'
1
steve@kirk:~/smoke/smoke_cfg$ LC_ALL=en_DK.utf8 perl -Mlocale -wle'print
int("?" =~ /\?/i)'
1

I'm assuming that this has been fixed (and should be in Changes)
somewhere between 5.5.650 and 5.8.

p5pRT commented Feb 6, 2006

@smpeters - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Feb 6, 2006

p5pRT commented Feb 6, 2006

From @demerphq

I wonder.... Your sample code appears different from the OP's.

You have

  int("?"=~/\?/i)

where the OP had

  print int('Ü' =~ /ü/i);

The latter could be said verbosely as

"the integer value of the return of case insensitively matching
capital U umlaut with lowercase u umlaut"

As far as I recall perl doesn't do locale-based matching on non-Unicode
strings (but I may recall incorrectly). I think that if the OP
converts the expression to a Unicode string then the match should be
ok.
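
A minimal sketch of the Unicode-string suggestion, not taken from the thread. It assumes that no `use locale` and no `unicode_strings` feature are in effect, in which case Perl folds only ASCII letters under /i for strings held in its native 8-bit form. The variable names are hypothetical.

```perl
use strict;
use warnings;

# U+00DC (capital U with umlaut) written as an escape, so the encoding of
# this source file cannot change what is being tested.
my $native = "\xDC";

# Held as a native 8-bit string: without locale or Unicode rules, /i folds
# only ASCII letters, so this is not expected to match.
my $native_match = ($native =~ /\xFC/i) ? 1 : 0;

# Same character, upgraded to Perl's internal UTF-8 representation, which
# makes the regex engine apply Unicode case-folding rules.
my $unicode = $native;
utf8::upgrade($unicode);
my $unicode_match = ($unicode =~ /\xFC/i) ? 1 : 0;

print "native 8-bit string: $native_match\n";
print "upgraded string:     $unicode_match\n";
```

This only illustrates the difference between the two string representations. It says nothing about whether `use locale` should have made the original snippet match.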

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Feb 6, 2006

From @smpeters

Hmmm...the prints are there, just on the previous lines. I think you missed
something with my annoying line wrapping. Sorry about that.

Steve Peters
steve@​fisharerojo.org

p5pRT commented Feb 6, 2006

From @demerphq

No, the issue isn't the missing prints. It's that you are matching ?
against ? whereas the OP is matching CAPITAL-U-WITH-UMLAUT against
LOWER-U-WITH-UMLAUT (not sure if those are the 'real' names for these
letters). U with umlaut looks like a U with two dots above it.
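
For reference, the official Unicode names of the two letters can be looked up with the core charnames module. This is a small sketch that assumes the characters in question are code points 0xDC and 0xFC, their positions in ISO8859-1.

```perl
use strict;
use warnings;
use charnames ();   # core module; viacode() maps a code point to its Unicode name

# The two characters from the original report, given as code points so that
# no mail client or terminal can alter them in transit.
my @code_points = (0xDC, 0xFC);

for my $cp (@code_points) {
    printf "U+%04X  %s\n", $cp, charnames::viacode($cp);
}
```

This prints LATIN CAPITAL LETTER U WITH DIAERESIS for U+00DC and LATIN SMALL LETTER U WITH DIAERESIS for U+00FC.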

It could be my reader of course, but the OP's code and your code do
not render the same, so I suspect it's your reader making them look the
same and not mine.

IOW, do the two lines following look the same or different?

int('Ü' =~ /\ü/i)
int("?"=~/\?/i)

If they look different then ignore me, if not then I suspect your
email client is lying to you.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Feb 6, 2006

From @demerphq

On 2/6/06, demerphq <demerphq@gmail.com> wrote:

IOW, do the two lines following look the same or different?

int('Ü' =~ /\ü/i)
int("?"=~/\?/i)

If they look different then ignore me, if not then I suspect your
email client is lying to you.

Ignore the quotes on the left hand side when you are seeing if they
are different.

Or just look at this instead:

int('Ü' =~ /\ü/i)
int('?'=~/\?/i)

(Sorry about the quotes confusing things :-)

yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
