New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
POSIX::mblen() broken on threaded perls with glibc #16887
Comments
From @ntyniThis is a bug report for perl from ntyni@debian.org, As reported in https://bugs.launchpad.net/bugs/1818953 POSIX::mblen() % perl -MPOSIX=mblen -e 'mblen("a", 1)' This broke in v5.27.8-134-g6c9ff7e96e which made the function The problem is initialization of the shift state with mbrlen(NULL, 0, &ps)); The glibc documentation for mbrlen(3) at does not mention initialization by passing in a null pointer for the If the next multibyte character corresponds to the NUL wide character, The attached proposed patch uses memset(3) instead for mbstate_t initialization, https://www.gnu.org/software/libc/manual/html_node/Keeping-the-state.html with the hope that this is more portable. The patch also adds a few basic test cases. These are in a new file because Flags: Site configuration information for perl 5.29.9: Configured by niko at Wed Mar 13 21:49:26 EET 2019. Summary of my perl5 (revision 5 version 29 subversion 9) configuration: Locally applied patches: @INC for perl 5.29.9: Environment for perl 5.29.9: |
From @ntyni0001-Fix-POSIX-mblen-mbstate_t-initialization-on-threaded.patchFrom aaf1159fe2b891f63f819b2d496b0f938456a36d Mon Sep 17 00:00:00 2001
From: Niko Tyni <ntyni@debian.org>
Date: Sun, 10 Mar 2019 19:40:42 +0200
Subject: [PATCH] Fix POSIX::mblen mbstate_t initialization on threaded perls
with glibc
As reported in https://bugs.launchpad.net/bugs/1818953 POSIX::mblen()
is broken on threaded perls with glibc.
% perl -MPOSIX=mblen -e 'mblen("a", 1)'
perl: mbrtowc.c:105: __mbrtowc: Assertion `__mbsinit (data.__statep)' failed.
zsh: abort (core dumped) perl -MPOSIX=mblen -e 'mblen("a", 1)'
This broke in v5.27.8-134-g6c9ff7e96e which made the function
use mbrlen(3) under the hood on threaded perls.
The problem is initialization of the shift state with
mbrlen(NULL, 0, &ps));
The glibc documentation for mbrlen(3) at
https://www.gnu.org/software/libc/manual/html_node/Converting-a-Character.html#Converting-a-Character
does not mention initialization by passing in a null pointer for the
string, only a pointer to a NUL wide character.
If the next multibyte character corresponds to the NUL wide character,
the return value is 0. If the next n bytes form a valid multibyte
character, the number of bytes belonging to this multibyte character
byte sequence is returned.
Use memset(3) instead for mbstate_t initialization, as suggested in
https://www.gnu.org/software/libc/manual/html_node/Keeping-the-state.html
with the hope that this is more portable.
While at it, add a few basic test cases. These are in a new file because
they need fresh_perl_is() from test.pl while the existing ones use
Test::More (and conversion of at least posix.t looks way too involved.)
Bug-Ubuntu: https://bugs.launchpad.net/bugs/1818953
---
MANIFEST | 1 +
ext/POSIX/POSIX.xs | 2 +-
ext/POSIX/lib/POSIX.pm | 2 +-
ext/POSIX/t/mb.t | 45 ++++++++++++++++++++++++++++++++++++++++++
4 files changed, 48 insertions(+), 2 deletions(-)
create mode 100644 ext/POSIX/t/mb.t
diff --git a/MANIFEST b/MANIFEST
index 4cf40a8eec..57465859b9 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -4182,6 +4182,7 @@ ext/POSIX/POSIX.xs POSIX extension external subroutines
ext/POSIX/t/export.t Test @EXPORT and @EXPORT_OK
ext/POSIX/t/iscrash See if POSIX isxxx() crashes with threads on Win32
ext/POSIX/t/math.t Basic math tests for POSIX
+ext/POSIX/t/mb.t Multibyte function tests for POSIX
ext/POSIX/t/posix.t See if POSIX works
ext/POSIX/t/sigaction.t See if POSIX::sigaction works
ext/POSIX/t/sigset.t See if POSIX::SigSet works
diff --git a/ext/POSIX/POSIX.xs b/ext/POSIX/POSIX.xs
index 1ebc358af4..27051c1fdb 100644
--- a/ext/POSIX/POSIX.xs
+++ b/ext/POSIX/POSIX.xs
@@ -3329,7 +3329,7 @@ mblen(s, n)
#endif
CODE:
#if defined(USE_ITHREADS) && defined(HAS_MBRLEN)
- PERL_UNUSED_RESULT(mbrlen(NULL, 0, &ps)); /* Initialize state */
+ memset(&ps, 0, sizeof(ps)); /* Initialize state */
RETVAL = mbrlen(s, n, &ps); /* Prefer reentrant version */
#else
RETVAL = mblen(s, n);
diff --git a/ext/POSIX/lib/POSIX.pm b/ext/POSIX/lib/POSIX.pm
index 25392be4b1..4de039410f 100644
--- a/ext/POSIX/lib/POSIX.pm
+++ b/ext/POSIX/lib/POSIX.pm
@@ -4,7 +4,7 @@ use warnings;
our ($AUTOLOAD, %SIGRT);
-our $VERSION = '1.87';
+our $VERSION = '1.88';
require XSLoader;
diff --git a/ext/POSIX/t/mb.t b/ext/POSIX/t/mb.t
new file mode 100644
index 0000000000..4c60c22bae
--- /dev/null
+++ b/ext/POSIX/t/mb.t
@@ -0,0 +1,45 @@
+#!./perl
+
+# These tests are in a separate file, because they use fresh_perl_is()
+# from test.pl.
+
+# The mb* functions use the "underlying locale" that is not affected by
+# the Perl one. So we run the tests in a separate "fresh_perl" process
+# with the correct LC_CTYPE set in the environment.
+
+BEGIN {
+ require Config; import Config;
+ if ($^O ne 'VMS' and $Config{'extensions'} !~ /\bPOSIX\b/) {
+ print "1..0\n";
+ exit 0;
+ }
+ unshift @INC, "../../t";
+ require 'loc_tools.pl';
+ require 'test.pl';
+}
+
+plan tests => 3;
+
+use POSIX qw();
+
+SKIP: {
+ skip("mblen() not present", 3) unless $Config{d_mblen};
+
+ is(&POSIX::mblen("a", &POSIX::MB_CUR_MAX), 1, 'mblen() basically works');
+
+ skip("LC_CTYPE locale support not available", 2)
+ unless locales_enabled('LC_CTYPE');
+
+ my $utf8_locale = find_utf8_ctype_locale();
+ skip("no utf8 locale available", 2) unless $utf8_locale;
+
+ local $ENV{LC_CTYPE} = $utf8_locale;
+
+ fresh_perl_is(
+ 'use POSIX; print &POSIX::mblen("\x{c3}\x{28}", &POSIX::MB_CUR_MAX)',
+ -1, {}, 'mblen() recognizes invalid multibyte characters');
+
+ fresh_perl_is(
+ 'use POSIX; print &POSIX::mblen("\N{GREEK SMALL LETTER SIGMA}", &POSIX::MB_CUR_MAX)',
+ 2, {}, 'mblen() works on UTF-8 characters');
+}
--
2.20.1
|
From @jkeenanOn Wed, 13 Mar 2019 20:49:30 GMT, ntyni@debian.org wrote:
Made available for smoke-testing in this branch: smoke-me/jkeenan/ntyni/133928-posix Thank you very much. -- |
The RT System itself - Status changed from 'new' to 'open' |
From @xenuOn Wed, 13 Mar 2019 13:49:31 -0700
BTW, it seems that we are *not* resetting the shift state when mblen() ISO-2022-* encodings have special escape sequences which change the The state changes caused by those escape sequences are global and It's mostly a theoretical problem and it's practically untestable, |
From @jkeenanOn Wed, 13 Mar 2019 21:27:49 GMT, jkeenan wrote:
[snip]
Niko, we're getting lots of smoke-test failures in this branch. See http://perl.develop-help.com/?b=smoke-me%2Fjkeenan%2Fntyni%2F133928-posix Specifically, failing on Linux when using g++. Failing on NetBSD when using either gcc or g++. Can you investigate? Thank you very much. -- |
From @ntyniOn Thu, Mar 14, 2019 at 05:07:27AM -0700, James E Keenan via RT wrote:
Seems to be failing on pretty much all BSD variants. However, I just tried and it works fine for me on FreeBSD 12. And also The best I can think of is that all the failing ones have set LC_ALL=C Could we please see if this fares any better? Thanks, |
From @ntyni0001-Fix-POSIX-mblen-mbstate_t-initialization-on-threaded.patchFrom 0c09c63e43008a8ec56d8b9fc428d694fb1e2ac4 Mon Sep 17 00:00:00 2001
From: Niko Tyni <ntyni@debian.org>
Date: Sun, 10 Mar 2019 19:40:42 +0200
Subject: [PATCH] Fix POSIX::mblen mbstate_t initialization on threaded perls
with glibc
As reported in https://bugs.launchpad.net/bugs/1818953 POSIX::mblen()
is broken on threaded perls with glibc.
% perl -MPOSIX=mblen -e 'mblen("a", 1)'
perl: mbrtowc.c:105: __mbrtowc: Assertion `__mbsinit (data.__statep)' failed.
zsh: abort (core dumped) perl -MPOSIX=mblen -e 'mblen("a", 1)'
This broke in v5.27.8-134-g6c9ff7e96e which made the function
use mbrlen(3) under the hood on threaded perls.
The problem is initialization of the shift state with
mbrlen(NULL, 0, &ps));
The glibc documentation for mbrlen(3) at
https://www.gnu.org/software/libc/manual/html_node/Converting-a-Character.html#Converting-a-Character
does not mention initialization by passing in a null pointer for the
string, only a pointer to a NUL wide character.
If the next multibyte character corresponds to the NUL wide character,
the return value is 0. If the next n bytes form a valid multibyte
character, the number of bytes belonging to this multibyte character
byte sequence is returned.
Use memset(3) instead for mbstate_t initialization, as suggested in
https://www.gnu.org/software/libc/manual/html_node/Keeping-the-state.html
with the hope that this is more portable.
While at it, add a few basic test cases. These are in a new file because
they need fresh_perl_is() from test.pl while the existing ones use
Test::More (and conversion of at least posix.t looks way too involved.)
Bug-Ubuntu: https://bugs.launchpad.net/bugs/1818953
---
MANIFEST | 1 +
ext/POSIX/POSIX.xs | 2 +-
ext/POSIX/lib/POSIX.pm | 2 +-
ext/POSIX/t/mb.t | 47 ++++++++++++++++++++++++++++++++++++++++++
4 files changed, 50 insertions(+), 2 deletions(-)
create mode 100644 ext/POSIX/t/mb.t
diff --git a/MANIFEST b/MANIFEST
index 4cf40a8eec..57465859b9 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -4182,6 +4182,7 @@ ext/POSIX/POSIX.xs POSIX extension external subroutines
ext/POSIX/t/export.t Test @EXPORT and @EXPORT_OK
ext/POSIX/t/iscrash See if POSIX isxxx() crashes with threads on Win32
ext/POSIX/t/math.t Basic math tests for POSIX
+ext/POSIX/t/mb.t Multibyte function tests for POSIX
ext/POSIX/t/posix.t See if POSIX works
ext/POSIX/t/sigaction.t See if POSIX::sigaction works
ext/POSIX/t/sigset.t See if POSIX::SigSet works
diff --git a/ext/POSIX/POSIX.xs b/ext/POSIX/POSIX.xs
index 1ebc358af4..27051c1fdb 100644
--- a/ext/POSIX/POSIX.xs
+++ b/ext/POSIX/POSIX.xs
@@ -3329,7 +3329,7 @@ mblen(s, n)
#endif
CODE:
#if defined(USE_ITHREADS) && defined(HAS_MBRLEN)
- PERL_UNUSED_RESULT(mbrlen(NULL, 0, &ps)); /* Initialize state */
+ memset(&ps, 0, sizeof(ps)); /* Initialize state */
RETVAL = mbrlen(s, n, &ps); /* Prefer reentrant version */
#else
RETVAL = mblen(s, n);
diff --git a/ext/POSIX/lib/POSIX.pm b/ext/POSIX/lib/POSIX.pm
index 25392be4b1..4de039410f 100644
--- a/ext/POSIX/lib/POSIX.pm
+++ b/ext/POSIX/lib/POSIX.pm
@@ -4,7 +4,7 @@ use warnings;
our ($AUTOLOAD, %SIGRT);
-our $VERSION = '1.87';
+our $VERSION = '1.88';
require XSLoader;
diff --git a/ext/POSIX/t/mb.t b/ext/POSIX/t/mb.t
new file mode 100644
index 0000000000..961edf6cf2
--- /dev/null
+++ b/ext/POSIX/t/mb.t
@@ -0,0 +1,47 @@
+#!./perl
+
+# These tests are in a separate file, because they use fresh_perl_is()
+# from test.pl.
+
+# The mb* functions use the "underlying locale" that is not affected by
+# the Perl one. So we run the tests in a separate "fresh_perl" process
+# with the correct LC_CTYPE set in the environment.
+
+BEGIN {
+ require Config; import Config;
+ if ($^O ne 'VMS' and $Config{'extensions'} !~ /\bPOSIX\b/) {
+ print "1..0\n";
+ exit 0;
+ }
+ unshift @INC, "../../t";
+ require 'loc_tools.pl';
+ require 'test.pl';
+}
+
+plan tests => 3;
+
+use POSIX qw();
+
+SKIP: {
+ skip("mblen() not present", 3) unless $Config{d_mblen};
+
+ is(&POSIX::mblen("a", &POSIX::MB_CUR_MAX), 1, 'mblen() basically works');
+
+ skip("LC_CTYPE locale support not available", 2)
+ unless locales_enabled('LC_CTYPE');
+
+ my $utf8_locale = find_utf8_ctype_locale();
+ skip("no utf8 locale available", 2) unless $utf8_locale;
+
+ local $ENV{LC_CTYPE} = $utf8_locale;
+ local $ENV{LC_ALL};
+ delete $ENV{LC_ALL};
+
+ fresh_perl_is(
+ 'use POSIX; print &POSIX::mblen("\x{c3}\x{28}", &POSIX::MB_CUR_MAX)',
+ -1, {}, 'mblen() recognizes invalid multibyte characters');
+
+ fresh_perl_is(
+ 'use POSIX; print &POSIX::mblen("\N{GREEK SMALL LETTER SIGMA}", &POSIX::MB_CUR_MAX)',
+ 2, {}, 'mblen() works on UTF-8 characters');
+}
--
2.20.1
|
From @jkeenanOn Sat, 16 Mar 2019 10:19:33 GMT, ntyni@debian.org wrote:
New branch for smoke-testing: smoke-me/jkeenan/ntyni/133928-posix-2nd -- |
From @ntyniOn Sat, Mar 16, 2019 at 06:08:05AM -0700, James E Keenan via RT wrote:
Thanks. This looks much better. The only relevant failure left I looked a bit into this and I'm sort of baffled. I'd appreciate openbsd% locale Without -Dusethreads, where my patch doesn't make a difference, the % LC_CTYPE=fi_FI.UTF-8 ./perl -Ilib -MPOSIX=mblen -le 'print mblen("\303\244", 5)' However, with -Dusethreads, it seems that LC_CTYPE is not effective % LC_CTYPE=fi_FI.UTF-8 ./perl -Ilib -MPOSIX=mblen -le 'print mblen("\303\244", 5)' With -Dusethreads, with my patch: % LC_CTYPE=fi_FI.UTF-8 ./perl -Ilib -MPOSIX=mblen -le 'print mblen("\303\244", 5)' Why isn't LC_CTYPE working here? |
From @khwilliamsonOn 3/17/19 1:55 PM, Niko Tyni wrote:
Use the -DLv flag on a DEBUGGING build to get locale handling trace. |
From @ntyniOn Sun, Mar 17, 2019 at 08:09:55PM -0700, karl williamson via RT wrote:
Thanks. After glaring at this for a bit, I suspect the difference is From https://man.openbsd.org/uselocale.3 : On OpenBSD, mask is only meaningful if it includes LC_CTYPE_MASK, On OpenBSD, newlocale() ignores oldloc, and passing (locale_t)0 I'm attaching a log with PERL_DEBUG_LOCALE_INIT=1; unfortunately most The immediate reason for treating the locale as non-UTF8 is that #include <stdlib.h> int main(void) { Hope this makes sense. Any further insight would be welcome :) |
From @ntyniopenbsd% env -i LC_CTYPE=fi_FI.UTF-8 PERL_DEBUG_LOCALE_INIT=1 ./perl -DLv -Ilib -MPOSIX=mblen -le 'print mblen("\303\244", 5)' EXECUTING... 1 |
From @khwilliamsonOn 3/24/19 12:54 PM, Niko Tyni wrote:
I made the ones I thought important displayable at init time. I've wondered about that third argument being ignored on openbsd. My
But why is it 1? That sounds like a platform bug. But I think other cd t to see what is happening I think (but am not quite sure) that the following standalone
|
From @khwilliamsonNiko, I now have access to an openbsd box, so am working on this. I'll let On 3/24/19 3:58 PM, Karl Williamson wrote:
|
From @khwilliamsonOn 3/24/19 3:22 PM, Karl Williamson wrote:
Actually, upon further reading of the man pages there, I don't think
|
From @khwilliamsonOn 3/24/19 4:34 PM, Karl Williamson wrote:
I pushed your patch, as this looks like its a bug in the OS or perhaps
|
From @khwilliamsonOn 3/24/19 9:54 PM, Karl Williamson wrote:
The attached C program reproduces the problem, and I believe shows that
|
From @khwilliamson#include <stdlib.h>
#include <stdio.h>
#include <locale.h>
int
main (int argc, char ** argv)
{
/* Setting LC_CTYPE to C.UTF-8 properly makes a UTF-8 locale */
locale_t ctype = newlocale(LC_CTYPE_MASK, "C.UTF-8", (locale_t) 0);
uselocale(ctype);
fprintf(stderr, "MB_CUR_MAX after ctype=%zu\n", MB_CUR_MAX);
/* But then setting LC_NUMERIC destroys the LC_CTYPE setting. Setting a
* particular category should not affect the current settings of any other
* category. On other systems, the third parameter to newlocale() would
* have been 'ctype' to indicate that, but on openbsd that argument is
* ignored. This seems like a design flaw to me */
locale_t numeric = newlocale(LC_NUMERIC_MASK, "C.UTF-8", (locale_t) 0);
uselocale(numeric);
fprintf(stderr, "MB_CUR_MAX after numeric=%zu\n", MB_CUR_MAX);
} |
From @khwilliamsonOn 3/25/19 1:19 PM, Karl Williamson wrote:
I need to apologize to Niko. He had figured things out, and indeed his In thinking about it since, I'm unsure fhat openbsd needs to look at the
|
From @ntyniOn Mon, Mar 25, 2019 at 02:14:35PM -0700, karl williamson via RT wrote:
Oh, no need to apologize! I could certainly have explained things clearer, Too bad for the duplicate work, but I'm glad we arrived at the same
FWIW I also think this would work. I guess the mb.t tests added in this ticket could use a skip/todo for |
From @ntyni0001-Skip-most-ext-POSIX-t-mb.t-tests-on-OpenBSD-with-thr.patchFrom 70d50b021b6437654582fa2c77686ccd1426aa6f Mon Sep 17 00:00:00 2001
From: Niko Tyni <ntyni@debian.org>
Date: Tue, 26 Mar 2019 21:13:20 +0200
Subject: [PATCH] Skip most ext/POSIX/t/mb.t tests on OpenBSD with threads
As discussed in [perl #133928] the locale initialization
code does not currently work on OpenBSD. Skip the failing
tests for now.
---
ext/POSIX/t/mb.t | 3 +++
1 file changed, 3 insertions(+)
diff --git a/ext/POSIX/t/mb.t b/ext/POSIX/t/mb.t
index 961edf6cf2..321a6abbe2 100644
--- a/ext/POSIX/t/mb.t
+++ b/ext/POSIX/t/mb.t
@@ -30,6 +30,9 @@ SKIP: {
skip("LC_CTYPE locale support not available", 2)
unless locales_enabled('LC_CTYPE');
+ skip("locale initialization problems on OpenBSD / threaded [perl #133928]", 2)
+ if $^O eq 'openbsd' and $Config{useithreads};
+
my $utf8_locale = find_utf8_ctype_locale();
skip("no utf8 locale available", 2) unless $utf8_locale;
--
2.20.1
|
From @khwilliamsonOn 3/26/19 1:26 PM, Niko Tyni wrote:
I'm thinking that this bug in OpenBSD necessitates us not using the In either event, these tests will start working without having to change |
From @jkeenanOn Tue, 26 Mar 2019 19:56:17 GMT, public@khwilliamson.com wrote:
Among our smoke-testing rigs ext/POSIX/t/mb.t is still experiencing failures on 2 OSes. See: http://perl5.test-smoke.org/submatrix?test=../ext/POSIX/t/mb.t&pversion=5.29.10 Examples: OpenBSD-6.4: http://perl5.test-smoke.org/report/83977 |
From @khwilliamsonOn 4/2/19 3:38 PM, James E Keenan via RT wrote:
OpenBSD is changing their code because of this. I'm leaning to a hints I have access to a HP_UX 11.23, and its what I consider their bug. |
From @ntyniOn Tue, Apr 02, 2019 at 07:39:35PM -0700, karl williamson via RT wrote:
Hi, please note that C.UTF-8 is the only UTF-8 locale enabled by default |
From @khwilliamsonOn 4/3/19 12:35 PM, Niko Tyni wrote:
Of course |
From @khwilliamsonOn 4/2/19 8:39 PM, Karl Williamson wrote:
Now done. OpenBSD should now be passing.
I'm waiting for Tux to get 11.31 compiling again to see if HP eventually |
From @iabynOn Sat, Apr 06, 2019 at 11:59:59AM -0600, Karl Williamson wrote:
There have since been several commits to ext/POSIX/t/mb.t. Is there any -- |
I don't see anything left to do in this ticket. Is it closable? In the current branch smoke-me/khw-locale, I have added code to allow one to initialize the shift state for mblen by passing an undef as the first parameter. Please try it out. |
Yes, I think this is closable. Thanks. I tried smoke-me/khw-locale as far as running the test suite and checking that mblen() didn't start crashing again on threaded perl. All looks fine to me. I'm afraid I can't comment on the actual shift state initization part. |
As a final note, HP did fix this by 11.31 |
Migrated from rt.perl.org#133928 (status was 'open')
Searchable as RT133928$
The text was updated successfully, but these errors were encountered: