Uppercase & Lowercase is not working on Turkish Characters #8078

Closed
p5pRT opened this issue Aug 19, 2005 · 9 comments

Comments

p5pRT commented Aug 19, 2005

Migrated from rt.perl.org#36953 (status was 'rejected')

Searchable as RT36953

p5pRT commented Aug 19, 2005

From ismail@uludag.org.tr

Created by ismail@uludag.org.tr

The following Perl program:

===========================================
use POSIX;

setlocale(LC_CTYPE, "tr_TR.UTF-8");
printf("%s %s\n", uc("abğıi"),lc("ABĞIİ"));

outputs:

ABğıI abĞiİ

but it should output:

ABĞIİ abğiı

This is because in the Turkish locale uppercase('i') = İ, whereas lowercase('I') = ı.

LC_ALL is set to tr_TR.UTF-8.
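For reference, here is a minimal sketch of the same test with the Turkish letters written as \x{...} escapes, so the result does not depend on how the source file is encoded. It assumes the garbled third letter in the report is ğ/Ğ (U+011F/U+011E) and a current Perl run without use locale; the values it produces are Perl's default, language-independent Unicode mappings, not the Turkish ones.

```perl
use strict;
use warnings;

# Escapes instead of literal characters, so source encoding does not matter.
# Assumption: the third letter in the report is g-breve (U+011F / U+011E).
my $lower = "ab\x{11F}\x{131}i";    # a b ğ ı i
my $upper = "AB\x{11E}I\x{130}";    # A B Ğ I İ

# Default Unicode mappings, no Turkish tailoring:
my $uc = uc $lower;    # ı -> I and i -> I, so no dotted capital appears
my $lc = lc $upper;    # I -> i, and İ -> "i" followed by U+0307 COMBINING DOT ABOVE

binmode STDOUT, ':encoding(UTF-8)';
printf "%s %s\n", $uc, $lc;
```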

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.5:

Configured by Gentoo at Fri May 13 15:15:00 EEST 2005.

Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
  Platform:
    osname=linux, osvers=2.6.10-uludag, archname=i686-linux
    uname='linux paketler.uludag.org.tr 2.6.10-uludag #1 wed jan 26 16:42:26 eet 2005 i686 intel(r) pentium(r) 4 cpu 2.60ghz genuineintel gnulinux '
    config_args='-des -Darchname=i686-linux -Dcccdlflags=-fPIC -Dccdlflags=-rdynamic -Dcc=gcc -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr -Dlocincpth=  -Doptimize=-mcpu=i686 -O2 -pipe -fomit-frame-pointer -Duselargefiles -Dd_semctl_semun -Dscriptdir=/usr/bin -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dinstallman1dir=/usr/share/man/man1 -Dinstallman3dir=/var/tmp/portage/perl-5.8.5-r5/image//usr/share/man/man3 -Dman1ext=1 -Dman3ext=3pm -Dinc_version_list=5.8.0 5.8.0/i686-linux 5.8.2 5.8.2/i686-linux 5.8.4 5.8.4/i686-linux  -Dcf_by=Gentoo -Ud_csh -Di_ndbm -Di_gdbm -Di_db'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-mcpu=i686 -O2 -pipe -fomit-frame-pointer',
    cppflags='-DPERL5 -fno-strict-aliasing -pipe'
    ccversion='', gccversion='3.3.5-20050130 (Pardus Linux 3.3.5.20050130-r1, ssp-3.3.5.20050130-1, pie-8.7.7.1)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lpthread -lnsl -lndbm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.4.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.4'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.5:
    /etc/perl
    /usr/lib/perl5/site_perl/5.8.5/i686-linux
    /usr/lib/perl5/site_perl/5.8.5
    /usr/lib/perl5/site_perl/5.8.4
    /usr/lib/perl5/site_perl/5.8.4/i686-linux
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.5/i686-linux
    /usr/lib/perl5/vendor_perl/5.8.5
    /usr/lib/perl5/vendor_perl
    /usr/lib/perl5/5.8.5/i686-linux
    /usr/lib/perl5/5.8.5
    /usr/local/lib/site_perl
    /usr/lib/perl5/site_perl/5.8.4
    /usr/lib/perl5/site_perl/5.8.4/i686-linux
    .


Environment for perl v5.8.5:
    HOME=/home/cartman
    LANG=tr_TR.UTF-8
    LANGUAGE (unset)
    LC_ALL=tr_TR.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/lib/ccache/bin:/home/cartman/SVN/kdenonbeta/unsermake:/usr/kde/3.4/bin:/usr/lib/ccache/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/3.3.5-20050130:/opt/blackdown-jdk-1.4.2.01/bin:/opt/blackdown-jdk-1.4.2.01/jre/bin:/usr/qt/3/bin:/usr/kde/3.4/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

p5pRT commented Aug 19, 2005

From @iabyn

On Fri, Aug 19, 2005 at 04:49:28AM -0700, ismail @ uludag. org. tr wrote:

===========================================
use POSIX;

setlocale(LC_CTYPE, "tr_TR.UTF-8");
printf("%s %s\n", uc("abğıi"),lc("ABĞIİ"));

outputs:

ABğıI abĞiİ

but it should output:

ABĞIİ abğiı

If you include literal utf8 characters in your program's source file, then
you need to tell perl that it's utf8. Also, if you're printing utf8
characters to STDOUT, you need to tell perl that STDOUT should be utf8:

  use POSIX;

  setlocale(LC_CTYPE, "tr_TR.UTF-8");
  use utf8;
  binmode STDOUT, ':utf8';
  printf("%s %s\n", uc("abğıi"),lc("ABĞIİ"));

$ perl585 /tmp/x1
ABĞII abğii̇
$
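A small sketch of why decoding matters, using the core Encode module and only the dotless ı: as raw UTF-8 bytes the string is two 8-bit characters that uc() leaves alone; once decoded it is a single character that uc() maps to I.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# U+0131 LATIN SMALL LETTER DOTLESS I is two bytes in UTF-8.
my $bytes = "\xC4\xB1";

# Undecoded: two unrelated 8-bit characters (Ä and ±), neither of which
# has an uppercase form different from itself, so uc() changes nothing.
my $uc_bytes = uc $bytes;

# Decoded: one character, which uc() maps to "I".
my $chars    = decode('UTF-8', $bytes);
my $uc_chars = uc $chars;

printf "bytes: length %d, uc changed: %s\n",
    length $bytes, ($uc_bytes eq $bytes ? 'no' : 'yes');
printf "chars: length %d, uc result: %s\n",
    length $chars, $uc_chars;

# Encode back to bytes for output; roughly what the :utf8 layer set by
# binmode does for every print.
print encode('UTF-8', "$chars -> $uc_chars\n");
```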


p5pRT commented Aug 19, 2005

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Aug 20, 2005

From guest@guest.guest.xxxxxxxx

[davem@iabyn.com - Fri Aug 19 13:19:45 2005]:

Well, the output is still wrong. It should be "ABĞIİ abğiı", so there is
still a problem.

Regards,
ismail

p5pRT commented Aug 20, 2005

From ismail@uludag.org.tr

On Fri, Aug 19, 2005 at 01:19:45PM -0700, Dave Mitchell via RT wrote:

Output is still wrong. It should be "ABĞIİ abğiı".

Regards,
ismail

p5pRT commented Sep 30, 2005

From guest@guest.guest.xxxxxxxx

Is there any update on this? Maybe someone can tell me where to look to
fix this problem.

p5pRT commented Sep 30, 2005

From shouldbedomo@mac.com

On 2005-09-30, at 14:57, Guest via RT wrote:

Is there any update on this? Maybe someone can tell me where to look to
fix this problem.

Following the thread of mails for this bug report shows that you can get

ABĞII abğii̇

which is correct, except that you want to lower-case the final LATIN
CAPITAL LETTER I WITH DOT ABOVE in your input string to LATIN SMALL
LETTER DOTLESS I. Although I make no claim whatever to being an
expert, this mapping is problematic: see
<http://www.unicode.org/Public/UNIDATA/CaseFolding.txt>, where this issue
has a special case all of its own. In particular, the document says

# The mappings with status T [special case for uppercase I and dotted uppercase I]
# can be used or omitted depending on the desired case-folding
# behavior. (The default option is to exclude them.)

Perl -- directed by the locale supplied by your system -- seems to be
excluding this case, but instead implementing the full case-folding
specified in the document by delivering LATIN SMALL LETTER I followed
by COMBINING DOT ABOVE.
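The two-character result is easy to confirm. A sketch, assuming Perl 5.16 or later for fc(); code_points is a hypothetical helper defined here only to make the output readable. It shows that lc and fc both turn U+0130 into "i" plus U+0307, while a plain I still folds to i, i.e. the status-T Turkic entries are not applied.

```perl
use strict;
use warnings;
use feature 'fc';    # fc() is available from Perl 5.16

# Hypothetical helper: render a string as space-separated code points.
sub code_points {
    my ($s) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $s;
}

my $dotted_I = "\x{130}";    # LATIN CAPITAL LETTER I WITH DOT ABOVE

my $lower  = lc $dotted_I;   # full lowercase mapping (SpecialCasing)
my $folded = fc $dotted_I;   # full case folding, status-T entries excluded
my $fold_I = fc 'I';         # plain I: folds to i, not to dotless i

printf "lc(U+0130): %s\n", code_points($lower);
printf "fc(U+0130): %s\n", code_points($folded);
printf "fc(I):      %s\n", code_points($fold_I);
```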

I can think of two ways that you can get the case-folding that you want:

1. Find or create a locale definition that does case conversion the
way you want it. I fear that your Linux system probably does not have
one (but see what locale -a | grep tr throws up). I have not been
able to find such a definition on the Internet, but then I was using
English keywords for the search -- using Turkish might be more
rewarding.

2. Have perl do full case folding, then fix up the special cases with
regex substitutions.
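A sketch of the second approach, with two hypothetical helpers, tr_uc and tr_lc (they are not part of Perl, POSIX or any module). It deviates from the suggestion as worded in one respect: the substitutions run before the built-in uc/lc rather than after, because once lc has turned both I and i into i, the original letter can no longer be recovered. It follows the rule stated in the report: i uppercases to İ, I lowercases to ı, and İ lowercases to i.

```perl
use strict;
use warnings;

# Hypothetical helper: Turkish/Azeri-style uppercasing on top of uc().
sub tr_uc {
    my ($s) = @_;
    $s =~ s/i/\x{130}/g;    # i -> İ; dotless ı is left for uc() to map to I
    return uc $s;
}

# Hypothetical helper: Turkish/Azeri-style lowercasing on top of lc().
sub tr_lc {
    my ($s) = @_;
    $s =~ s/I/\x{131}/g;    # I -> ı
    $s =~ s/\x{130}/i/g;    # İ -> i, avoiding the default "i" + U+0307
    return lc $s;
}

binmode STDOUT, ':encoding(UTF-8)';
printf "%s %s\n", tr_uc("ab\x{11F}\x{131}i"), tr_lc("AB\x{11E}I\x{130}");
```

Applied to the report's test strings (assuming the third letter is ğ/Ğ), tr_uc gives ABĞIİ and tr_lc gives abğıi: the plain I becomes ı and the dotted İ becomes i. The sketch covers only precomposed letters; it does not handle the decomposed sequence I + U+0307 COMBINING DOT ABOVE, which SpecialCasing.txt also tailors for Turkish, and it leaves ucfirst/lcfirst untouched.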

Better ideas, anybody?
--
Dominic Dunlop

p5pRT commented May 23, 2008

p5p@spam.wizbit.be - Status changed from 'open' to 'rejected'

p5pRT closed this as completed May 23, 2008

p5pRT commented Jan 16, 2012

From @khwilliamson

There is now a module in CPAN, Unicode::Casing, which includes as part
of its tests, a full implementation of Turkish upper/lower casing.
Unicode screwed up in its rules for this, and as a result Turkish (and
Azeri) require locale-dependent rules, which Perl's core is not about to
furnish. But the CPAN module allows one to adopt the correct rules.

--
Karl Williamson
