New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PATCH] remove regexec.c:non_utf8_target_but_utf8_required indirection var #16448
Comments
From @bulk88Created by @bulk88See attached patch. Perl Info
|
From @bulk880001-remove-regexec.c-non_utf8_target_but_utf8_required-i.patchFrom ad23122a7ca6fc961e0539715ace78fc75cc615a Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Thu, 1 Mar 2018 10:33:53 -0500
Subject: [PATCH] remove regexec.c:non_utf8_target_but_utf8_required
indirection var
commit 7e0d5ad7c9 "regexec.c: PATCH: [perl #114808]" made a global ptr
var that contains a ptr to a C string instead of writing the C string
literal multiple times and letting the linker dedup them. There is no
swaping the message/char * at runtime either because of consting. The var
became static in commit 991fc03aa2
"Make private variable static in regexec.c.". Since in re.pm's XS lib
var non_utf8_target_but_utf8_required exists but isnt referenced and isn't
optimized away due to a MSVC linker bug, just remove the intermediate var
and put the string litteral in the macro where it may or may not be removed
by the CPP. Remove the %s fmt string too since there can't be any token
injection into the string literal from afar (this was some CC unsafe fmt
string warning thing probably in commit 6b54ddc5f0 ).
VC 2003 32b CC re.dll
b4
.text section 0x2E9C0 bytes long
.rdata section 0x10661 bytes long
after
.text section 0x2E9B0 bytes long
.rdata section 0x10651 bytes long
the .text section decreased because "push offset "%s"" instruction was
removed from my_re_intuit_start and my_regexec. The MS CC did optimize
the Perl_re_printf call to do "push offset "Can't match..."" instead of
push [non_utf8_target_but_utf8_required] reading the char * from RO mem
instead of the char * being encoded in cpu instruction like a string
literal is, but didnt optimize away the non_utf8_target_but_utf8_required
static global var.
---
regexec.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/regexec.c b/regexec.c
index d9edb33..eb6ea59 100644
--- a/regexec.c
+++ b/regexec.c
@@ -91,21 +91,17 @@
static const char utf8_locale_required[] =
"Use of (?[ ]) for non-UTF-8 locale is wrong. Assuming a UTF-8 locale";
-#ifdef DEBUGGING
-/* At least one required character in the target string is expressible only in
- * UTF-8. */
-static const char* const non_utf8_target_but_utf8_required
- = "Can't match, because target string needs to be in UTF-8\n";
-#endif
-
/* Returns a boolean as to whether the input unsigned number is a power of 2
* (2**0, 2**1, etc). In other words if it has just a single bit set.
* If not, subtracting 1 would leave the uppermost bit set, so the & would
* yield non-zero */
#define isPOWER_OF_2(n) ((n & (n-1)) == 0)
+/* At least one required character in the target string is expressible only in
+ * UTF-8. */
#define NON_UTF8_TARGET_BUT_UTF8_REQUIRED(target) STMT_START { \
- DEBUG_EXECUTE_r(Perl_re_printf( aTHX_ "%s", non_utf8_target_but_utf8_required));\
+ DEBUG_EXECUTE_r(Perl_re_printf( aTHX_ \
+ "Can't match, because target string needs to be in UTF-8\n")); \
goto target; \
} STMT_END
--
1.7.9.msysgit.0
|
From @khwilliamsonOn 03/01/2018 08:39 AM, bulk88 (via RT) wrote:
I'm trying to understand the point of your patch. It looks like it
|
The RT System itself - Status changed from 'new' to 'open' |
From @bulk88On Fri, 02 Mar 2018 06:08:06 -0800, public@khwilliamson.com wrote:
That static char * var with 1 reference wasn't optimized out by VC. I poked around a HPUX SL format perl binary a bit, IDK if it was compiled with acc or gcc, but it looked to me like identical string literals from ONE .o are de-duped but are not deduped across .o boundaries, and libperl is made of multiple .o'es. This patch doesn't solve deduping the string literals on CCs that dont want to, but it did removed the global pointer sized var. Maybe wrong linker/CC flags with perl on HPUX. I know originally per C specification all C double quote literals were stored in RW memory, but I think Perl forces it RO on any OS that has default RW string literals with Configure. -- |
From zefram@fysh.orgbulk88 via RT wrote:
That's too small a concern to be worth optimising for, especially for
Duplication of string literals involves quite a bit more binary size Probably a better approach would be to make the named object an array -zefram |
From @khwilliamsonI took Zefram's approach in e7a474c |
@khwilliamson - Status changed from 'open' to 'pending release' |
From @khwilliamsonThank you for filing this report. You have helped make Perl better. With the release today of Perl 5.30.0, this and 160 other issues have been Perl 5.30.0 may be downloaded via: If you find that the problem persists, feel free to reopen this ticket. |
@khwilliamson - Status changed from 'pending release' to 'resolved' |
Migrated from rt.perl.org#132928 (status was 'resolved')
Searchable as RT132928$
The text was updated successfully, but these errors were encountered: