Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PATCH] I18N::Langinfo doesn't set UTF8 flag #15131

Closed
p5pRT opened this issue Jan 16, 2016 · 7 comments
Closed

[PATCH] I18N::Langinfo doesn't set UTF8 flag #15131

p5pRT opened this issue Jan 16, 2016 · 7 comments
Labels

Comments

@p5pRT
Copy link

p5pRT commented Jan 16, 2016

Migrated from rt.perl.org#127288 (status was 'resolved')

Searchable as RT127288$

@p5pRT
Copy link
Author

p5pRT commented Jan 16, 2016

From @ntyni

This is a bug report for perl from Niko Tyni <ntyni@​debian.org>,
generated with the help of perlbug 1.40 running under perl 5.23.7.

I18N​::Langinfo​::langinf() can return UTF-8 strings but doesn't set
the UTF8 flag on them.

LC_ALL=fr_FR.UTF-8 perl -MDevel​::Peek -MI18N​::Langinfo=langinfo,MON_12 -e 'Dump langinfo(MON_12())'
SV = PV(0x1173b20) at 0x1172f30
  REFCNT = 1
  FLAGS = (TEMP,POK,pPOK)
  PV = 0x118e960 "d\303\251cembre"\0
  CUR = 9
  LEN = 11

The attached somewhat clumsy set of two patches fixes this, but perhaps
it's time to make something like __is_cur_LC_category_utf8() more visible?

(This was prompted by Time-Format test suite starting to fail on Perl
5.22 in non-English locales, presumably since commit 9717af6
which fixed POSIX​::strftime() in a similar way. The Time-Format test
suite compares POSIX​::strftime() and I18N​::Langinfo results. See
https://bugs.debian.org/811104 )


Flags​:
  category=library
  severity=low
  Type=Patch
  PatchStatus=HasPatch
  module=I18N​::Langinfo


Site configuration information for perl 5.23.7​:

Configured by niko at Sat Jan 16 13​:32​:56 EET 2016.

Summary of my perl5 (revision 5 version 23 subversion 7) configuration​:
  Local Commit​: 67271c83612f3ab129c8326d07ca55104a2f23f8
  Ancestor​: dff8a39
  Platform​:
  osname=linux, osvers=4.3.0-1-amd64, archname=x86_64-linux
  uname='linux estella 4.3.0-1-amd64 #1 smp debian 4.3.3-2 (2015-12-17) x86_64 gnulinux '
  config_args='-Dusedevel -des'
  hint=previous, useposix=true, d_sigaction=define
  useithreads=undef, usemultiplicity=undef
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2',
  optimize='-O2',
  cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
  ccversion='', gccversion='5.3.1 20151219', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags =' -fstack-protector-strong -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/5/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/5/include-fixed /usr/include/x86_64-linux-gnu /usr/lib
  libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
  perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
  libc=libc-2.21.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.21'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'

Locally applied patches​:
  b8ea0feef314dd5432298db82d9f2b8afed1442f
  67271c83612f3ab129c8326d07ca55104a2f23f8


@​INC for perl 5.23.7​:
  lib
  /usr/local/lib/perl5/site_perl/5.23.7/x86_64-linux
  /usr/local/lib/perl5/site_perl/5.23.7
  /usr/local/lib/perl5/5.23.7/x86_64-linux
  /usr/local/lib/perl5/5.23.7
  .


Environment for perl 5.23.7​:
  HOME=/home/niko
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LC_CTYPE=fi_FI.UTF-8
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/home/niko/bin​:/home/niko/bin​:/home/niko/bin​:/usr/local/bin​:/usr/bin​:/bin​:/usr/local/games​:/usr/games​:/sbin​:/usr/sbin​:/sbin​:/usr/sbin
  PERL_BADLANG (unset)
  SHELL=/bin/zsh

@p5pRT
Copy link
Author

p5pRT commented Jan 16, 2016

From @ntyni

0001-Make-__is_cur_LC_category_utf8-visible-to-I18N-Langi.patch
From b8ea0feef314dd5432298db82d9f2b8afed1442f Mon Sep 17 00:00:00 2001
From: Niko Tyni <ntyni@debian.org>
Date: Sat, 16 Jan 2016 10:41:29 +0200
Subject: [PATCH 1/2] Make __is_cur_LC_category_utf8() visible to
 I18N::Langinfo

PERL_EXT_LANGINFO is to be used by ext/I18N-Langinfo/Langinfo.xs
in a future commit.
---
 embed.fnc | 2 +-
 embed.h   | 4 ++--
 proto.h   | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/embed.fnc b/embed.fnc
index effe0ad..f840ecd 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -2495,7 +2495,7 @@ s	|char*	|stdize_locale	|NN char* locs
 #endif
 
 #if defined(USE_LOCALE) \
-    && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX))
+    && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX) || defined(PERL_EXT_LANGINFO))
 ApM	|bool	|_is_cur_LC_category_utf8|int category
 #	ifdef DEBUGGING
 AMnPpR	|char *	|_setlocale_debug_string|const int category		    \
diff --git a/embed.h b/embed.h
index 73c02d2..7ce8691 100644
--- a/embed.h
+++ b/embed.h
@@ -784,7 +784,7 @@
 #if defined(DEBUGGING)
 #define pad_setsv(a,b)		Perl_pad_setsv(aTHX_ a,b)
 #define pad_sv(a)		Perl_pad_sv(aTHX_ a)
-#  if defined(USE_LOCALE)     && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX))
+#  if defined(USE_LOCALE)     && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX) || defined(PERL_EXT_LANGINFO))
 #define _setlocale_debug_string	Perl__setlocale_debug_string
 #  endif
 #endif
@@ -860,7 +860,7 @@
 #define sv_dup(a,b)		Perl_sv_dup(aTHX_ a,b)
 #define sv_dup_inc(a,b)		Perl_sv_dup_inc(aTHX_ a,b)
 #endif
-#if defined(USE_LOCALE)     && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX))
+#if defined(USE_LOCALE)     && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX) || defined(PERL_EXT_LANGINFO))
 #define _is_cur_LC_category_utf8(a)	Perl__is_cur_LC_category_utf8(aTHX_ a)
 #endif
 #if defined(USE_LOCALE_COLLATE)
diff --git a/proto.h b/proto.h
index 1bbdace..8087c2b 100644
--- a/proto.h
+++ b/proto.h
@@ -3829,7 +3829,7 @@ STATIC int	S_tokereport(pTHX_ I32 rv, const YYSTYPE* lvalp);
 #define PERL_ARGS_ASSERT_TOKEREPORT	\
 	assert(lvalp)
 #  endif
-#  if defined(USE_LOCALE)     && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX))
+#  if defined(USE_LOCALE)     && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX) || defined(PERL_EXT_LANGINFO))
 PERL_CALLCONV char *	Perl__setlocale_debug_string(const int category, const char* const locale, const char* const retval)
 			__attribute__warn_unused_result__
 			__attribute__pure__;
@@ -5505,7 +5505,7 @@ PERL_CALLCONV SV*	Perl_sv_dup_inc(pTHX_ const SV *const sstr, CLONE_PARAMS *cons
 	assert(param)
 
 #endif
-#if defined(USE_LOCALE)     && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX))
+#if defined(USE_LOCALE)     && (defined(PERL_IN_LOCALE_C) || defined (PERL_EXT_POSIX) || defined(PERL_EXT_LANGINFO))
 PERL_CALLCONV bool	Perl__is_cur_LC_category_utf8(pTHX_ int category);
 #endif
 #if defined(USE_LOCALE) && defined(PERL_IN_LOCALE_C)
-- 
2.6.4

@p5pRT
Copy link
Author

p5pRT commented Jan 16, 2016

From @ntyni

0002-langinfo-Set-UTF-8-flag-appropriately-on-return.patch
From 67271c83612f3ab129c8326d07ca55104a2f23f8 Mon Sep 17 00:00:00 2001
From: Niko Tyni <ntyni@debian.org>
Date: Fri, 15 Jan 2016 21:20:24 +0200
Subject: [PATCH 2/2] langinfo: Set UTF-8 flag appropriately on return

nl_langinfo(3) can return non-ASCII strings in some locales.  Set the
UTF-8 flag if this is the case and LC_TIME is set to a UTF8 locale.

Largely inspired by commit 9717af6d049902fc887c.

Bug-Debian: https://bugs.debian.org/811109
---
 MANIFEST                      |  1 +
 ext/I18N-Langinfo/Langinfo.pm |  2 +-
 ext/I18N-Langinfo/Langinfo.xs | 17 ++++++++++++++++-
 ext/I18N-Langinfo/t/utf8.t    | 30 ++++++++++++++++++++++++++++++
 4 files changed, 48 insertions(+), 2 deletions(-)
 create mode 100644 ext/I18N-Langinfo/t/utf8.t

diff --git a/MANIFEST b/MANIFEST
index e75199d..2da10cc 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -3727,6 +3727,7 @@ ext/I18N-Langinfo/Langinfo.pm	I18N::Langinfo
 ext/I18N-Langinfo/Langinfo.xs	I18N::Langinfo
 ext/I18N-Langinfo/Makefile.PL	I18N::Langinfo
 ext/I18N-Langinfo/t/Langinfo.t	See whether I18N::Langinfo works
+ext/I18N-Langinfo/t/utf8.t	Test I18N::Langinfo utf8 handling
 ext/IPC-Open3/lib/IPC/Open2.pm	Open a two-ended pipe
 ext/IPC-Open3/lib/IPC/Open3.pm	Open a three-ended pipe
 ext/IPC-Open3/t/fd.t		See if IPC::Open3 works w/ file descriptors
diff --git a/ext/I18N-Langinfo/Langinfo.pm b/ext/I18N-Langinfo/Langinfo.pm
index 033d8de..e922035 100644
--- a/ext/I18N-Langinfo/Langinfo.pm
+++ b/ext/I18N-Langinfo/Langinfo.pm
@@ -72,7 +72,7 @@ our @EXPORT_OK = qw(
 	YESSTR
 );
 
-our $VERSION = '0.13';
+our $VERSION = '0.14';
 
 XSLoader::load();
 
diff --git a/ext/I18N-Langinfo/Langinfo.xs b/ext/I18N-Langinfo/Langinfo.xs
index 582b7fa..3910d6b 100644
--- a/ext/I18N-Langinfo/Langinfo.xs
+++ b/ext/I18N-Langinfo/Langinfo.xs
@@ -1,5 +1,8 @@
 #define PERL_NO_GET_CONTEXT
 
+/* for _is_cur_LC_category_utf8() */
+#define PERL_EXT_LANGINFO
+
 #include "EXTERN.h"
 #include "perl.h"
 #include "XSUB.h"
@@ -23,11 +26,23 @@ langinfo(code)
   PROTOTYPE: _
   CODE:
 #ifdef HAS_NL_LANGINFO
+        char *buf;
+        STRLEN len;
+        SV *sv;
 	if (code < 0) {
 	    SETERRNO(EINVAL, LIB_INVARG);
 	    RETVAL = &PL_sv_undef;
 	} else {
-            RETVAL = newSVpv(nl_langinfo(code), 0);
+            buf = nl_langinfo(code);
+            len = strlen(buf);
+            sv = newSVpv(buf, 0);
+            if (! is_ascii_string((U8*) buf, len)
+                && is_utf8_string((U8*) buf, len)
+                && _is_cur_LC_category_utf8(LC_TIME))
+            {
+                SvUTF8_on(sv);
+            }
+            RETVAL = sv;
         }
 #else
 	croak("nl_langinfo() not implemented on this architecture");
diff --git a/ext/I18N-Langinfo/t/utf8.t b/ext/I18N-Langinfo/t/utf8.t
new file mode 100644
index 0000000..2773a2e
--- /dev/null
+++ b/ext/I18N-Langinfo/t/utf8.t
@@ -0,0 +1,30 @@
+#!perl -T
+use strict;
+use Config;
+use Test::More;
+
+plan skip_all => "I18N::Langinfo or POSIX unavailable"
+    if $Config{'extensions'} !~ m!\bI18N/Langinfo\b!
+    or $Config{'extensions'} !~ m!\bPOSIX\b!;
+
+plan skip_all => "setlocale unavailable"
+    if $Config{'d_setlocale'} ne 'define';
+
+require POSIX;
+
+plan skip_all => "fr_FR.UTF-8 locale unavailable"
+    if !POSIX::setlocale(POSIX::LC_TIME(), 'fr_FR.UTF-8');
+
+plan tests => 5;
+
+use_ok('I18N::Langinfo', 'langinfo', 'MON_12');
+
+SKIP: {
+    my $string = eval { langinfo(MON_12()) };
+    is( $@, '', "calling langinfo() with MON_12" );
+    skip "returned string was empty, skipping next two tests", 2 unless $string;
+    ok( defined $string, "checking if the returned string is defined" );
+    cmp_ok( length($string), '>=', 1, "checking if the returned string has a positive length" );
+    ok($string =~ / ^ \p{ASCII}+ $ /x || utf8::is_utf8($string), "'$string' is pure ASCII or an UTF8 string");
+}
+
-- 
2.6.4

@p5pRT
Copy link
Author

p5pRT commented Jan 27, 2016

From @khwilliamson

On 01/16/2016 05​:15 AM, Niko Tyni (via RT) wrote​:

# New Ticket Created by Niko Tyni
# Please include the string​: [perl #127288]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=127288 >

This is a bug report for perl from Niko Tyni <ntyni@​debian.org>,
generated with the help of perlbug 1.40 running under perl 5.23.7.

I18N​::Langinfo​::langinf() can return UTF-8 strings but doesn't set
the UTF8 flag on them.

LC_ALL=fr_FR.UTF-8 perl -MDevel​::Peek -MI18N​::Langinfo=langinfo,MON_12 -e 'Dump langinfo(MON_12())'
SV = PV(0x1173b20) at 0x1172f30
REFCNT = 1
FLAGS = (TEMP,POK,pPOK)
PV = 0x118e960 "d\303\251cembre"\0
CUR = 9
LEN = 11

The attached somewhat clumsy set of two patches fixes this, but perhaps
it's time to make something like __is_cur_LC_category_utf8() more visible?

(This was prompted by Time-Format test suite starting to fail on Perl
5.22 in non-English locales, presumably since commit 9717af6
which fixed POSIX​::strftime() in a similar way. The Time-Format test
suite compares POSIX​::strftime() and I18N​::Langinfo results. See
https://bugs.debian.org/811104 )

I'm not sure about the best way to go about this. I'm still reluctant
to make is_cur_LC_category_utf8() publicly accessible. But maybe it's
been out there long enough to do so. And I can't think of any way to do
this in XS code without exposing something that's not documented.

(In pure perl, within the scope of 'use locale', you can do fc(\xdf) and
see if the result is 'ss' or not, which it only will be if perl thinks
the LC_CTYPE locale is UTF-8.)

So if someone has an opinion about this, chime in.

Here's a couple problems with your patch.

It is possible for the various locale categories to be in different
locales. For example, LC_TIME could be in a UTF-8 locale, while
LC_NUMERIC is not. Thus a fully accurate program would need to know the
category of the field being requested, and call
is_cur_LC_category_utf8() with that category. Most fields that
nl_langinfo(3) returns on my Linux box are LC_TIME, but not all. So your
patch could be wrong if a different field, like the radix character, is
being requested. Since the fields are not standardized, I don't know
how to make it completely accurate, but one could know the few
exceptions and assume everything else is LC_TIME. There is a UTF-8
locale on the dromedary machine that has a radix character that is only
expressible in UTF-8. And there are plenty of currency signs in
LC_MONETARY that are UTF-8 as well.

For the .t file, looking in Configure for whether setlocale and POSIX,
etc, are available or not is not the full story. I have tried to
standardize in the 5.23 series all finding of available locales to use
functions from t/loc_tools.pl. One of them is locales_enabled() which
knows about more things affecting if locale handling is available than
any of the tests that I converted to use it. Since your test is in /ext,
it can freely use core tools, and it's simpler to use this anyway.

---
Flags​:
category=library
severity=low
Type=Patch
PatchStatus=HasPatch
module=I18N​::Langinfo
---
Site configuration information for perl 5.23.7​:

Configured by niko at Sat Jan 16 13​:32​:56 EET 2016.

Summary of my perl5 (revision 5 version 23 subversion 7) configuration​:
Local Commit​: 67271c83612f3ab129c8326d07ca55104a2f23f8
Ancestor​: dff8a39
Platform​:
osname=linux, osvers=4.3.0-1-amd64, archname=x86_64-linux
uname='linux estella 4.3.0-1-amd64 #1 smp debian 4.3.3-2 (2015-12-17) x86_64 gnulinux '
config_args='-Dusedevel -des'
hint=previous, useposix=true, d_sigaction=define
useithreads=undef, usemultiplicity=undef
use64bitint=define, use64bitall=define, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler​:
cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2',
optimize='-O2',
cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
ccversion='', gccversion='5.3.1 20151219', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries​:
ld='cc', ldflags =' -fstack-protector-strong -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/5/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/5/include-fixed /usr/include/x86_64-linux-gnu /usr/lib
libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
libc=libc-2.21.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.21'
Dynamic Linking​:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'

Locally applied patches​:
b8ea0feef314dd5432298db82d9f2b8afed1442f
67271c83612f3ab129c8326d07ca55104a2f23f8

---
@​INC for perl 5.23.7​:
lib
/usr/local/lib/perl5/site_perl/5.23.7/x86_64-linux
/usr/local/lib/perl5/site_perl/5.23.7
/usr/local/lib/perl5/5.23.7/x86_64-linux
/usr/local/lib/perl5/5.23.7
.

---
Environment for perl 5.23.7​:
HOME=/home/niko
LANG=en_US.UTF-8
LANGUAGE (unset)
LC_CTYPE=fi_FI.UTF-8
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/niko/bin​:/home/niko/bin​:/home/niko/bin​:/usr/local/bin​:/usr/bin​:/bin​:/usr/local/games​:/usr/games​:/sbin​:/usr/sbin​:/sbin​:/usr/sbin
PERL_BADLANG (unset)
SHELL=/bin/zsh

@p5pRT
Copy link
Author

p5pRT commented Jan 27, 2016

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Mar 25, 2019

From @khwilliamson

Although these patches were not used directly, this bug got fixed as part of then general overhaul of locale handling in 5.28. And there are tests for this in ext/I18N-Langinfo/t/Langinfo.t, where it goes through the UTF-8 locales, and looks for a non-ASCII time-related unit. It checks that the returned string for that is flagged as UTF-8, and stops looking for more


Karl Williamson

@p5pRT p5pRT closed this as completed Mar 25, 2019
@p5pRT
Copy link
Author

p5pRT commented Mar 25, 2019

@khwilliamson - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant