broken Locale::Language in a UTF environment #5943

p5pRT · 2002-09-19T14:42:57Z

Migrated from rt.perl.org#17439 (status was 'resolved')

Searchable as RT17439$

p5pRT · 2002-09-19T14:42:58Z

From marty@kasei.com

Created by marty@kasei.com

This is a bug report for perl from marty@kasei.com,
generated with the help of perlbug 1.34 running under perl v5.8.0.

-----------------------------------------------------------------
When running in a UTF-8 environment, Locale::Language doesn't load:

LANG=en_GB.utf8 perl -we 'use Locale::Language'

Malformed UTF-8 character (unexpected end of string) at /usr/share/perl/5.8.0/Locale/Language.pm line 115, <DATA> line 109.
Malformed UTF-8 character (unexpected end of string) at /usr/share/perl/5.8.0/Locale/Language.pm line 117, <DATA> line 109.
Malformed UTF-8 character (unexpected non-continuation byte 0x6c, immediately after start byte 0xe5) in lc at /usr/share/perl/5.8.0/Locale/Language.pm line 117, <DATA> line 109.
Malformed UTF-8 character (unexpected end of string) at /usr/share/perl/5.8.0/Locale/Language.pm line 115, <DATA> line 178.
Malformed UTF-8 character (unexpected end of string) at /usr/share/perl/5.8.0/Locale/Language.pm line 117, <DATA> line 178.
Malformed UTF-8 character (unexpected non-continuation byte 0x6b, immediately after start byte 0xfc) in lc at /usr/share/perl/5.8.0/Locale/Language.pm line 117, <DATA> line 178.

The fix:

Inline Patch

--- lib/Locale/Language.pm.orig 2002-09-19 15:17:16.000000000 +0200
+++ lib/Locale/Language.pm      2002-09-19 15:17:41.000000000 +0200
@@ -231,7 +231,7 @@
 my:Burmese
 
 na:Nauru
-nb:Norwegian Bokmål
+nb:Norwegian Bokmal
 nd:Ndebele, North
 ne:Nepali
 ng:Ndonga
@@ -300,7 +300,7 @@
 uz:Uzbek
 
 vi:Vietnamese
-vo:Volapük
+vo:Volapuk
 
 wo:Wolof

Perl Info


Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.0:

Configured by Debian Project at Sat Sep 14 18:17:32 UTC 2002.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.4.19, archname=i386-linux-thread-multi
    uname='linux cyberhq 2.4.19 #1 smp sun aug 4 11:30:45 pdt 2002 i686 unknown unknown gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8.0 -Darchlib=/usr/lib/perl/5.8.0 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.0 -Dsitearch=/usr/local/lib/perl/5.8.0 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.0 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O3',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='2.95.4 20011002 (Debian prerelease)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.2.5.so, so=so, useshrplib=true, libperl=libperl.so.5.8.0
    gnulibc_version='2.2.5'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.0:
    /home/marty/Perl
    /etc/perl
    /usr/local/lib/perl/5.8.0
    /usr/local/share/perl/5.8.0
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8.0
    /usr/share/perl/5.8.0
    /usr/local/lib/site_perl
    .


Environment for perl v5.8.0:
    HOME=/home/marty
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/marty/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/sbin:/usr/sbin:/usr/local/sbin
    PERLLIB=/home/marty/Perl
    PERL_BADLANG (unset)
    SHELL=/bin/bash

p5pRT · 2002-09-20T08:52:36Z

From marty+p5p@kasei.com

I should have added a better explanation to this bug report and the
proposed fix, so here goes:

The DATA in Locale::Language contains 2 Latin-1 characters.
When Perl 5.8 is running in a UTF8 locale it expects the DATA to be UTF8
so it dies when it finds a malformed character.

Adding 'use bytes' to Locale::Language would stop the death, but the
included non-ASCII characters don't work properly on non-Latin1 systems.
So I think it is better to replace the 2 problem characters with ASCII.
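
To illustrate the failure described above, here is a minimal sketch using the core Encode module. It is not code from Locale::Language, and the variable names are made up for the example. It checks whether the Latin-1 byte 0xE5 followed by "l" (0x6C), the byte pair named in the warnings in the original report, passes a strict UTF-8 decode, and compares byte lengths in the two encodings.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Norwegian Bokmål" as Latin-1 bytes: å is the single byte 0xE5.
# (Illustrative value; the real table lives in the module's DATA section.)
my $latin1_bytes = "Norwegian Bokm\xE5l";

# Strict UTF-8 decoding: 0xE5 announces a three-byte sequence, but the
# next byte, "l" (0x6C), is not a continuation byte, so decode() croaks.
# LEAVE_SRC keeps decode() from modifying $latin1_bytes in place.
my $decoded = eval {
    decode('UTF-8', $latin1_bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
};
my $utf8_ok = defined $decoded ? 1 : 0;

# Decoding as ISO-8859-1 cannot fail: every byte maps to one character.
my $name = decode('ISO-8859-1', $latin1_bytes);

# Re-encoding the same name as UTF-8 spends two bytes on å (0xC3 0xA5).
my $utf8_bytes = encode('UTF-8', $name);

printf "valid UTF-8: %s\n", $utf8_ok ? 'yes' : 'no';
printf "Latin-1 length: %d bytes, UTF-8 length: %d bytes\n",
    length($latin1_bytes), length($utf8_bytes);
```

The same reasoning applies to the second character: ü is 0xFC in Latin-1, and the byte after it in "Volapük" is "k" (0x6B).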

Here's my suggested patch. I've tried to ensure I've included the
actual Latin1 characters in this email, but as I don't use a Latin1
system they will probably be converted when I send this: sorry.

Inline Patch

--- lib/Locale/Language.pm.orig 2002-09-19 15:17:16.000000000 +0200
+++ lib/Locale/Language.pm      2002-09-19 15:17:41.000000000 +0200
@@ -231,7 +231,7 @@
 my:Burmese
 
 na:Nauru
-nb:Norwegian Bokm?l
+nb:Norwegian Bokmal
 nd:Ndebele, North
 ne:Nepali
 ng:Ndonga
@@ -300,7 +300,7 @@
 uz:Uzbek
 
 vi:Vietnamese
-vo:Volap?k
+vo:Volapuk
 
 wo:Wolof

--- ./lib/Locale/Codes/t/languages.t.orig 2002-09-19 15:17:16.000000000 +0200
+++ ./lib/Locale/Codes/t/languages.t 2002-09-19 15:17:16.000000000 +0200
@@ -47,7 +47,7 @@
  'code2language("nd") eq "Ndebele, North"',
  'code2language("ng") eq "Ndonga"',
  'code2language("nn") eq "Norwegian Nynorsk"',
- 'code2language("nb") eq "Norwegian Bokm?l"',
+ 'code2language("nb") eq "Norwegian Bokmal"',
  'code2language("ny") eq "Chichewa; Nyanja"',
  'code2language("oc") eq "Occitan (post 1500)"',
  'code2language("os") eq "Ossetian; Ossetic"',

--

Marty

p5pRT · 2002-09-20T15:22:20Z

From nick.ing-simmons@elixent.com

Marty Pauley <marty+p5p@kasei.com> writes:

> I should have added a better explanation to this bug report and the
> proposed fix, so here goes:
>
> The DATA in Locale::Language contains 2 Latin-1 characters.
> When Perl 5.8 is running in a UTF8 locale it expects the DATA to be UTF8
> so it dies when it finds a malformed character.
>
> Adding 'use bytes' to Locale::Language would stop the death, but the
> included non-ASCII characters don't work properly on non-Latin1 systems.
> So I think it is better to replace the 2 problem characters with ASCII.

Why not use a \x{00xx} escape? That would be more robust for patching as well,
as mailers (including mine) are variously mangling these diffs.

> Here's my suggested patch. I've tried to ensure I've included the
> actual Latin1 characters in this email, but as I don't use a Latin1
> system they will probably be converted when I send this: sorry.
>
> --- lib/Locale/Language.pm.orig 2002-09-19 15:17:16.000000000 +0200
> +++ lib/Locale/Language.pm      2002-09-19 15:17:41.000000000 +0200
> @@ -231,7 +231,7 @@
>  my:Burmese
>
>  na:Nauru
> -nb:Norwegian Bokm?l
> +nb:Norwegian Bokmal
>  nd:Ndebele, North
>  ne:Nepali
>  ng:Ndonga
> @@ -300,7 +300,7 @@
>  uz:Uzbek
>
>  vi:Vietnamese
> -vo:Volap?k
> +vo:Volapuk
>
>  wo:Wolof
>
> --- ./lib/Locale/Codes/t/languages.t.orig 2002-09-19 15:17:16.000000000 +0200
> +++ ./lib/Locale/Codes/t/languages.t 2002-09-19 15:17:16.000000000 +0200
> @@ -47,7 +47,7 @@
>   'code2language("nd") eq "Ndebele, North"',
>   'code2language("ng") eq "Ndonga"',
>   'code2language("nn") eq "Norwegian Nynorsk"',
> - 'code2language("nb") eq "Norwegian Bokm?l"',
> + 'code2language("nb") eq "Norwegian Bokmal"',
>   'code2language("ny") eq "Chichewa; Nyanja"',
>   'code2language("oc") eq "Occitan (post 1500)"',
>   'code2language("os") eq "Ossetian; Ossetic"',

--
Nick Ing-Simmons
http://www.ni-s.u-net.com/

p5pRT · 2002-09-23T12:00:29Z

From marty+p5p@kasei.com

On Fri Sep 20 16:22:02 2002, Nick Ing-Simmons wrote:

> Marty Pauley <marty+p5p@kasei.com> writes:
>
>> Adding 'use bytes' to Locale::Language would stop the death, but the
>> included non-ASCII characters don't work properly on non-Latin1 systems.
>> So I think it is better to replace the 2 problem characters with ASCII.
>
> Why not use a \x{00xx} escape? That would be more robust for patching as well,
> as mailers (including mine) are variously mangling these diffs.

The Language.pm file contains the Latin1 characters in the DATA section
so I can't use the escape sequences there. But the other reason was
more important to me: the Latin1 characters cause bad things to happen
when used in a non-Latin1 environment; in EUC-JP for example, they
either don't display at all, or they merge with the next character and
display some obscure kanji.
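
As a rough sketch of why escapes do not work in the DATA section as-is: lines read from DATA are plain text, not Perl string literals, so a `\x{E5}` there stays a literal backslash sequence unless the reading code expands it. The example below is hypothetical and is not what was applied to Locale::Language (the applied fix replaced the two characters with ASCII). The `@lines` array stands in for the DATA lines, and `%name_for` is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical table lines in the module's "code:name" format, with the
# escapes written as literal text (single quotes leave \x uninterpreted,
# which mimics what a read from DATA would return).
my @lines = (
    'nb:Norwegian Bokm\x{E5}l',
    'vo:Volap\x{FC}k',
);

my %name_for;
for my $line (@lines) {
    my ($code, $name) = split /:/, $line, 2;

    # Expand each literal \x{...} escape into the character with that
    # code point; without this step the name keeps the backslash text.
    $name =~ s/\\x\{([0-9A-Fa-f]+)\}/chr(hex($1))/ge;

    $name_for{$code} = $name;
}

for my $code (sort keys %name_for) {
    printf "%s => %d characters\n", $code, length($name_for{$code});
}
```

Even with such an expansion step, the resulting names would still contain non-ASCII characters, so the display problems in non-Latin1 environments described above would remain.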

--
Marty

p5pRT · 2002-09-26T10:50:19Z

From @hvds

Marty Pauley <marty+p5p@kasei.com> wrote:
:Here's my suggested patch. I've tried to ensure I've included the
:actual Latin1 characters in this email, but as I don't use a Latin1
:system they will probably be converted when I send this: sorry.

Thanks, applied as change #17927.

Sending the patch as an attachment, either instead of or as well as
the inline version, is usually the best way to ensure the integrity
of the patch when you are unsure what your mailer will do to it.

Hugo

p5pRT · 2003-05-09T20:32:30Z

@cwest - Status changed from 'new' to 'resolved'

p5pRT closed this as completed May 9, 2003

p5pRT added Severity Medium distro-Linux type-core labels Oct 18, 2019
