Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cygwin: setlocale(LC_ALL, "ja_JP.utf8") panic: strxfrm() gets absurd #13768

Closed
p5pRT opened this issue Apr 25, 2014 · 17 comments
Closed

cygwin: setlocale(LC_ALL, "ja_JP.utf8") panic: strxfrm() gets absurd #13768

p5pRT opened this issue Apr 25, 2014 · 17 comments

Comments

@p5pRT
Copy link

p5pRT commented Apr 25, 2014

Migrated from rt.perl.org#121734 (status was 'resolved')

Searchable as RT121734$

@p5pRT
Copy link
Author

p5pRT commented Apr 25, 2014

From @rurban

Subject​: cygwin​: setlocale(LC_ALL, "ja_JP.utf8") panic​: strxfrm() gets absurd
Message-Id​: <5.14.2_2132_1398453260@​win>
Reply-To​: rurban@​cpanel.net
To​: perlbug@​perl.org
From​: rurban@​cpanel.net

This is a bug report for perl from rurban@​x-ray.at,
generated with the help of perlbug 1.39 running under perl 5.14.2.


cygwin 5.18.2

$ ./perl -MPOSIX -e'setlocale(LC_ALL, "ja_JP.utf8")'
panic​: strxfrm() gets absurd - a => 100, ab => 100 at -e line 1.

all other locales before ja_JP.utf8 in lib/locale.t do work fine.
I haven't checked if that's the problem fixed in blead with setlocale by
kwh, or a cygwin bug.



Flags​:
  category=library
  severity=high
  module=POSIX


This perlbug was built using Perl 5.14.2 - Thu Jul 12 13​:58​:56 CDT 2012
It is being executed now by Perl 5.14.2 - Mon Sep 17 12​:58​:11 CDT 2012.

Site configuration information for perl 5.14.2​:

Configured by rurban at Mon Sep 17 12​:58​:11 CDT 2012.

Summary of my perl5 (revision 5 version 14 subversion 2) configuration​:
 
  Platform​:
  osname=cygwin, osvers=1.7.17s(0.26253), archname=cygwin-thread-multi-64int
  uname='cygwin_nt-5.1 win 1.7.17s(0.26253) 20120801 11​:02​:01 i686 cygwin '
  config_args='-de -Dlibperl=cygperl5_14.dll -Dcc=gcc-4 -Dld=g++-4 -Darchname=i686-cygwin-threads-64int -Dmksymlinks -Dusethreads -Accflags=-g'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
  use64bitint=define, use64bitall=undef, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='gcc-4', ccflags ='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__ -g -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
  optimize='-O3',
  cppflags='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__ -g -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.5.3', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='g++-4', ldflags =' -Wl,--enable-auto-import -Wl,--export-all-symbols -Wl,--enable-auto-image-base -fstack-protector -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib /lib
  libs=-lgdbm -ldb -ldl -lcrypt -lgdbm_compat
  perllibs=-ldl -lcrypt
  libc=/usr/lib/libc.a, so=dll, useshrplib=true, libperl=cygperl5_14.dll
  gnulibc_version=''
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
  cccdlflags=' ', lddlflags=' --shared -Wl,--enable-auto-import -Wl,--export-all-symbols -Wl,--enable-auto-image-base -L/usr/local/lib -fstack-protector'

Locally applied patches​:
  Bug#55162 File​::Spec​::case_tolerant performance
  CYG07 $vendorarch/auto/.rebase
  CYG15 static Win32CORE
  CYG17 cyg-1.7 paths-utf8
  0c612ce82 Fix building static extensions on cygwin, -UUSEIMPORTLIB
  1bac5ec Fix 64-bit threading sv.c​: S_anonymise_cv_maybe
  Cygwin​::sync_winenv added


@​INC for perl 5.14.2​:
  /usr/lib/perl5/site_perl/5.14/i686-cygwin-threads-64int
  /usr/lib/perl5/site_perl/5.14
  /usr/lib/perl5/vendor_perl/5.14/i686-cygwin-threads-64int
  /usr/lib/perl5/vendor_perl/5.14
  /usr/lib/perl5/5.14/i686-cygwin-threads-64int
  /usr/lib/perl5/5.14
  /usr/lib/perl5/site_perl/5.10
  /usr/lib/perl5/vendor_perl/5.10
  /usr/lib/perl5/site_perl/5.8
  .


Environment for perl 5.14.2​:
  HOME=/home/rurban
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/usr/bin​:/usr/local/bin​:/home/rurban/bin​:/usr/sbin/​:/usr/local/bin​:/usr/bin​:/cygdrive/c/windows/system32​:/cygdrive/c/windows​:/cygdrive/c/windows/System32/Wbem​:/bin​:/home/rurban/bin​:/usr/lib/lapack
  PERL_BADLANG (unset)
  SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Apr 25, 2014

From @khwilliamson

On 04/25/2014 01​:33 PM, rurban@​cpanel.net (via RT) wrote​:

./perl -MPOSIX -e'setlocale(LC_ALL, "ja_JP.utf8")'

It works on my Linux; I notice that there is no -I parameter in your
example. Are you sure you are getting the correct library path?

@p5pRT
Copy link
Author

p5pRT commented Apr 25, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Apr 25, 2014

From @rurban

On 04/25/2014 03​:02 PM, karl williamson via RT wrote​:

On 04/25/2014 01​:33 PM, rurban@​cpanel.net (via RT) wrote​:

./perl -MPOSIX -e'setlocale(LC_ALL, "ja_JP.utf8")'

It works on my Linux; I notice that there is no -I parameter in your
example. Are you sure you are getting the correct library path?

yes, sorry.

This fails only on cygwin, only tested on 32bit.
It fails with most eastern locales, but some eastern locales do pass.

I needed to this patch to pass the locale tests​:

Inline Patch
--- perl-5.18.2/lib/locale.t~	2014-01-06 16:46:45.000000000 -0600
+++ perl-5.18.2/lib/locale.t	2014-04-25 14:28:31.455279000 -0500
@@ -528,7 +528,9 @@
  sub trylocale {
      my $locale = shift;
      return if grep { $locale eq $_ } @Locale;
-    return unless setlocale(LC_ALL, $locale);
+    # warn "# setlocale(LC_ALL, $locale)\n";
+    return if ($^O eq 'cygwin' and $locale =~ 
/^(ja_JP\.utf8|japanese\.sjis|ko_KR|korean|zh_)/) #RT 121734 \+ or \!setlocale\(LC\_ALL\, $locale\);   my $badutf8;   \{   local $SIG\{\_\_WARN\_\_\} = sub \{

--
Reini

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2014

From @khwilliamson

On 04/25/2014 02​:10 PM, Reini Urban wrote​:

On 04/25/2014 03​:02 PM, karl williamson via RT wrote​:

On 04/25/2014 01​:33 PM, rurban@​cpanel.net (via RT) wrote​:

./perl -MPOSIX -e'setlocale(LC_ALL, "ja_JP.utf8")'

It works on my Linux; I notice that there is no -I parameter in your
example. Are you sure you are getting the correct library path?

yes, sorry.

This fails only on cygwin, only tested on 32bit.
It fails with most eastern locales, but some eastern locales do pass.

I had a look at the code. It's from these two lines in locale.c​:
  /* 50​: surely no system expands a char more. */
#define XFRMBUFSIZE (2 * 50)

That '50' looks like a guess. You could try bumping it up until
everything passes, and we could then use that instead of having to skip
locales in the tests.

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2014

From @jhi

On Wednesday-201406-04, 20​:30, Karl Williamson wrote​:

On 04/25/2014 02​:10 PM, Reini Urban wrote​:

On 04/25/2014 03​:02 PM, karl williamson via RT wrote​:

On 04/25/2014 01​:33 PM, rurban@​cpanel.net (via RT) wrote​:

./perl -MPOSIX -e'setlocale(LC_ALL, "ja_JP.utf8")'

It works on my Linux; I notice that there is no -I parameter in your
example. Are you sure you are getting the correct library path?

yes, sorry.

This fails only on cygwin, only tested on 32bit.
It fails with most eastern locales, but some eastern locales do pass.

I had a look at the code. It's from these two lines in locale.c​:
/* 50​: surely no system expands a char more. */
#define XFRMBUFSIZE (2 * 50)

That '50' looks like a guess. You could try bumping it up until
everything passes, and we could then use that instead of having to skip
locales in the tests.

Yes, a guess, by yours truly, from the time when mammoths and
sabertooths roamed the lands.

I've got a better idea than futzing with the futz factor. Just a minute.

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2014

From @jhi

On Wednesday-201406-04, 21​:22, Jarkko Hietaniemi wrote​:

I've got a better idea than futzing with the futz factor. Just a minute.

So try the attached patch. Instead of baking in a futz factor (which
some system apparently just out-futzed) this tries divining the
appropriate numbers using the NULL-buffer-zero-count feature of
strxfrm(), where no actual output is done, only the required length is
returned.

NOTE 1​: if the expansion factor of 50 really is not enough, this means
that for strxfrm() of input of N bytes, Perl will reserve a buffer of
more than 50 N...

NOTE 2​: now that I think of it, if the strxfrm() count-only feature
really works (see above), then the PL_collxfrm_base and PL_collxfrm_mult
can be deprecated as such.

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2014

From @jhi

Inline Patch
diff --git a/locale.c b/locale.c
index 8d49fac..a2e8374 100644
--- a/locale.c
+++ b/locale.c
@@ -313,18 +313,16 @@ Perl_new_collate(pTHX_ const char *newcoll)
 				 || strEQ(newcoll, "POSIX"));
 
 	{
-	  /*  2: at most so many chars ('a', 'b'). */
-	  /* 50: surely no system expands a char more. */
-#define XFRMBUFSIZE  (2 * 50)
-	  char xbuf[XFRMBUFSIZE];
-	  const Size_t fa = strxfrm(xbuf, "a",  XFRMBUFSIZE);
-	  const Size_t fb = strxfrm(xbuf, "ab", XFRMBUFSIZE);
-	  const SSize_t mult = fb - fa;
-	  if (mult < 1 && !(fa == 0 && fb == 0))
-	      Perl_croak(aTHX_ "panic: strxfrm() gets absurd - a => %"UVuf", ab => %"UVuf,
-			 (UV) fa, (UV) fb);
-	  PL_collxfrm_base = (fa > (Size_t)mult) ? (fa - mult) : 0;
-	  PL_collxfrm_mult = mult;
+            const char a[] = "A";
+            const char b[] = "ABCDEFGHIJ abcdefghij 0123456789";
+            const Size_t fa = strxfrm(NULL, a, 0);
+            const Size_t fb = strxfrm(NULL, b, 0);
+            if (fa == 0 || fb == 0 || fa >= fb) {
+                Perl_croak(aTHX_ "panic: strxfrm() gets absurd: "
+                           "%"UVuf" cf %"UVuf, (UV)fa, (UV)fb);
+            }
+            PL_collxfrm_mult = (double)fb / sizeof(b) + 1;
+            PL_collxfrm_base = fa > PL_collxfrm_mult ? fa : PL_collxfrm_mult;
 	}
     }
 

@p5pRT
Copy link
Author

p5pRT commented Jun 5, 2014

From @tonycoz

On Fri Apr 25 12​:33​:27 2014, rurban@​cpanel.net wrote​:

$ ./perl -MPOSIX -e'setlocale(LC_ALL, "ja_JP.utf8")'
panic​: strxfrm() gets absurd - a => 100, ab => 100 at -e line 1.

all other locales before ja_JP.utf8 in lib/locale.t do work fine.
I haven't checked if that's the problem fixed in blead with setlocale
by
kwh, or a cygwin bug.

I don't see this panic with current cygwin​:

tony@​saturn ~/dev/perl/git/perl
$ ./perl -Ilib -MPOSIX -e 'setlocale(LC_ALL, "ja_JP.utf8")'

tony@​saturn ~/dev/perl/git/perl
$ uname -a
CYGWIN_NT-6.1-WOW64 saturn 1.7.30(0.272/5/3) 2014-05-23 10​:36 i686 Cygwin

Tested with blead and 5.18.2 (with that minimal fix to cygwin/cygwin.c).

I do however see a bug in cygwin's strxfrm() implementation, here's the code from src/winsup/cygwin/nlsfuncs.cc​:

extern "C" size_t
strxfrm (char *__restrict s1, const char *__restrict s2, size_t sn)
{
  size_t ret;
  size_t n2;
  wchar_t *ws2;
  tmp_pathbuf tp;

  if (!collate_lcid)
  return strlcpy (s1, s2, sn);
  /* The ANSI version of LCMapString uses the default charset of the lcid,
  so we must use the Unicode version. */
  n2 = lc_mbstowcs (collate_mbtowc, collate_charset, NULL, s2, 0) + 1;
  ws2 = (n2 > NT_MAX_PATH ? (wchar_t *) malloc (n2 * sizeof (wchar_t))
  : tp.w_get ());
  lc_mbstowcs (collate_mbtowc, collate_charset, ws2, s2, n2);
  /* The sort key is a NUL-terminated byte string. */
  ret = LCMapStringW (collate_lcid, LCMAP_SORTKEY, ws2, -1, (PWCHAR) s1, sn);
  if (n2 > NT_MAX_PATH)
  free (ws2);
  if (ret == 0)
  {
  if (GetLastError () != ERROR_INSUFFICIENT_BUFFER)
  set_errno (EINVAL);
  return sn;
  }
  /* LCMapStringW returns byte count including the terminating NUL character.
  strxfrm is supposed to return length excluding the NUL. */
  return ret - 1;
}

when the key string received doesn't fit in the buffer, LCMapStringW() returns 0 and sets GetLastError() to ERROR_INSUFFICIENT_BUFFER, at which point strxfrm() returns the size of the output buffer - *not* the size of the sort key as required by ANSI C.

We also have a couple of bugs in POSIX​::strxfrm(), but that's not used by the code that produces the panic above.

Tony

@p5pRT
Copy link
Author

p5pRT commented Apr 12, 2016

From @tonycoz

On Wed Jun 04 23​:58​:26 2014, tonyc wrote​:

I do however see a bug in cygwin's strxfrm() implementation, here's
the code from src/winsup/cygwin/nlsfuncs.cc​:

...

when the key string received doesn't fit in the buffer, LCMapStringW()
returns 0 and sets GetLastError() to ERROR_INSUFFICIENT_BUFFER, at
which point strxfrm() returns the size of the output buffer - *not*
the size of the sort key as required by ANSI C.

Reported to cygwin at https://cygwin.com/ml/cygwin/2016-04/msg00232.html

Tony

@p5pRT
Copy link
Author

p5pRT commented Apr 13, 2016

From @tonycoz

On Mon Apr 11 22​:10​:38 2016, tonyc wrote​:

On Wed Jun 04 23​:58​:26 2014, tonyc wrote​:

I do however see a bug in cygwin's strxfrm() implementation, here's
the code from src/winsup/cygwin/nlsfuncs.cc​:

...

when the key string received doesn't fit in the buffer, LCMapStringW()
returns 0 and sets GetLastError() to ERROR_INSUFFICIENT_BUFFER, at
which point strxfrm() returns the size of the output buffer - *not*
the size of the sort key as required by ANSI C.

Reported to cygwin at https://cygwin.com/ml/cygwin/2016-04/msg00232.html

Fixed in nightlies​:

https://cygwin.com/ml/cygwin/2016-04/msg00247.html

https://cygwin.com/ml/cygwin/2016-04/msg00279.html

Again, this doesn't seem to be the cause of the problem Reini originally reported (which I haven't reprotduced.)

Tony

@p5pRT
Copy link
Author

p5pRT commented Apr 13, 2016

From @tonycoz

On Wed Jun 04 23​:58​:26 2014, tonyc wrote​:

We also have a couple of bugs in POSIX​::strxfrm(), but that's not used
by the code that produces the panic above.

Which I fixed in 6ec5f82, if anyone is wondering what I was talking about.

Tony

@p5pRT
Copy link
Author

p5pRT commented Apr 14, 2016

From @khwilliamson

On Wed Jun 04 23​:58​:26 2014, tonyc wrote​:

On Fri Apr 25 12​:33​:27 2014, rurban@​cpanel.net wrote​:

$ ./perl -MPOSIX -e'setlocale(LC_ALL, "ja_JP.utf8")'
panic​: strxfrm() gets absurd - a => 100, ab => 100 at -e line 1.

all other locales before ja_JP.utf8 in lib/locale.t do work fine.
I haven't checked if that's the problem fixed in blead with setlocale
by
kwh, or a cygwin bug.

I don't see this panic with current cygwin​:

tony@​saturn ~/dev/perl/git/perl
$ ./perl -Ilib -MPOSIX -e 'setlocale(LC_ALL, "ja_JP.utf8")'

tony@​saturn ~/dev/perl/git/perl
$ uname -a
CYGWIN_NT-6.1-WOW64 saturn 1.7.30(0.272/5/3) 2014-05-23 10​:36 i686
Cygwin

Tested with blead and 5.18.2 (with that minimal fix to
cygwin/cygwin.c).

I do however see a bug in cygwin's strxfrm() implementation, here's
the code from src/winsup/cygwin/nlsfuncs.cc​:

extern "C" size_t
strxfrm (char *__restrict s1, const char *__restrict s2, size_t sn)
{
size_t ret;
size_t n2;
wchar_t *ws2;
tmp_pathbuf tp;

if (!collate_lcid)
return strlcpy (s1, s2, sn);
/* The ANSI version of LCMapString uses the default charset of the
lcid,
so we must use the Unicode version. */
n2 = lc_mbstowcs (collate_mbtowc, collate_charset, NULL, s2, 0) + 1;
ws2 = (n2 > NT_MAX_PATH ? (wchar_t *) malloc (n2 * sizeof (wchar_t))
: tp.w_get ());
lc_mbstowcs (collate_mbtowc, collate_charset, ws2, s2, n2);
/* The sort key is a NUL-terminated byte string. */
ret = LCMapStringW (collate_lcid, LCMAP_SORTKEY, ws2, -1, (PWCHAR) s1,
sn);
if (n2 > NT_MAX_PATH)
free (ws2);
if (ret == 0)
{
if (GetLastError () != ERROR_INSUFFICIENT_BUFFER)
set_errno (EINVAL);
return sn;
}
/* LCMapStringW returns byte count including the terminating NUL
character.
strxfrm is supposed to return length excluding the NUL. */
return ret - 1;
}

when the key string received doesn't fit in the buffer, LCMapStringW()
returns 0 and sets GetLastError() to ERROR_INSUFFICIENT_BUFFER, at
which point strxfrm() returns the size of the output buffer - *not*
the size of the sort key as required by ANSI C.

We also have a couple of bugs in POSIX​::strxfrm(), but that's not used
by the code that produces the panic above.

Tony

I'm pretty sure I understand how the panic happens with this cygwin bug, and have a work-around smoking for 5.25. It seems to me though, that if a platform's strxfrm() is more badly broken that we shouldn't panic, but just not do locale-based collation for that locale, reverting to code point order, as we do for non-locale collation. Avoiding a panic is generally considered a good thing. My question is, should we raise a warning instead?
--
Karl Williamson

@p5pRT
Copy link
Author

p5pRT commented Jun 1, 2016

From @khwilliamson

Fixed by 79f120c
--
Karl Williamson

@p5pRT
Copy link
Author

p5pRT commented Jun 1, 2016

@khwilliamson - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented May 30, 2017

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.26.0, this and 210 other issues have been
resolved.

Perl 5.26.0 may be downloaded via​:
https://metacpan.org/release/XSAWYERX/perl-5.26.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT
Copy link
Author

p5pRT commented May 30, 2017

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant