New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PATCH] 10s of KB to 100s of KB binary bloat with Visual C Perl related to Perl_SCX/charclass_invlists.h #16447
Comments
From @bulk88Created by @bulk88See attached patches. I noticed after compiling a new blead Perl with Visual C, then opening The first commit at fixing these RE related "const static" unreferenced I offlist communicated with KHW but he didn't have any immediate My opinion is this MSVC binary bloat problem is a blocker for 5.28 perl. ------------------------------------ $ ls -l **/*.dll | awk '{print $5 " " $9}' Perl Info
|
From @bulk880001-dont-define-and-declare-SCX_AUX_TABLES-in-every-perl.patchFrom 43de611d94094cd0cf2b8466e5b12f513f44c3c9 Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Thu, 1 Mar 2018 01:06:40 -0500
Subject: [PATCH 1/2] dont define and declare SCX_AUX_TABLES in every perl
binary
MSVC has a problem (or bug) removing unreferenced const static data from
binaries. Some more background is in commit f1bfbdda19
"remove unreferenced copies of char * swash_property_names[]" . Anyways
commit fea79782ed
"charclasslists.h: script enums visible to CORE,EXT" caused a multi 100KB
increase in the size of perl527.dll when compiled with MSVC. Using a
disassembler/symbol table tools shows 43 unique copies of the following
data vars inside perl527.dll
script_zeros
Perl_SCX_invmap
SCX_AUX_TABLE_1 to SCX_AUX_TABLE_45
SCX_AUX_TABLE_lengths
SCX_AUX_TABLE_ptrs
There is no disk size increase for Win32 perl compiled with GCC with
commit "charclasslists.h: script enums visible to CORE,EXT"
so the Win32 GCC linker correctly tossed the unreferenced static vars.
Before and after this commit, disk sizes of perl527.dll
VC 2008 32b
b4 perl527.dll 1605KB
b4 miniperl.exe 1376KB
after perl527.dll 1260KB
after miniperl.exe 1037KB
gcc version 4.8.3 32b
b4 perl527.dll 1949KB
b4 miniperl.exe 1763KB
after perl527.dll 1949KB
after miniperl.exe 1763KB
more file size diffs of XS DLLs before and after this commit are in
the ticket associated with this commit
Since I can't fix the problem with MS/MSVC-specific extensions, nor fix
already published MSVC versions from the past (file bug ticket for MSVC
with MS, wait till next release), this problem needs to be tackled from
P5P side. Also there might be other buggy non-GCC linkers that have
similar lack of tossing of unrefed statics. Also building perl is
theoretically faster since the CC doesn't need parse/assemble the static
SCX table dozens of times and place it in dozens of .o'es only for the
linker to toss 99% of the instances of SCX tables during the final link.
There are 2 main concepts how to fix it, either SCX inv table
becomes extern const global visible to all .o'es and initialized once,
or the limit the static's scope back to 1 or 2 .o'es. Limiting the statics
scope is a simpler fix.
The enum decl must be split from the tables because isSCRIPT_RUN has a
currently unused by any caller 4th argument "SCX_enum * ret_script".
Currently all callers pass NULL for that arg, so theoretically the argument
could be removed/reverted to the 5.26 API. But SCX_enum type must be
defined for all .o files (or the PERL_CORE/PERL_EXT subset) because of
proto.h declaring isSCRIPT_RUN. isSCRIPT_RUN has 3 callers. re.pm's
regexec.c, core's regexec.c and core's mg.c. SCX tables only need to be
visible to regexec.c, but the enum must be visible in 3 places because
of isSCRIPT_RUN's prototype. Spliting the enum from the data is the only
choice if arg 4 of isSCRIPT_RUN is to stay. Changing isSCRIPT_RUN's 4th arg
to a void * is too hackish. Predeclaring without defining enums isn't
portable. Playing with embed.fnc/proto.h to limit isSCRIPT_RUN's scope to
core (libperl) and re.so/re_exec.c seems too complicated. It still wouldn't
fix the problem that in libperl mg.o would have a 2nd unused copy of the
data tables (down from 43 copies) with MSVC. This patch doesn't fix that
a MSVC all static build would have 2 copies of the table, one of core's no
dbg regexec.c and re's dbg re_exec. MSVC re.dll with this patch has unused
SCX tables inside it. The next commit will remove the copy from re.dll.
mk_invlist.pl's %where_to_define_enums says "enums" but it actually applied
to enum and the table. So split it to really be only for enums.
---
charclass_invlists.h | 20 ++++++++++++++++----
regen/mk_invlists.pl | 2 ++
2 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/charclass_invlists.h b/charclass_invlists.h
index 60c64ef..7a37208 100644
--- a/charclass_invlists.h
+++ b/charclass_invlists.h
@@ -20593,6 +20593,10 @@ typedef enum {
SCX_use_AUX_TABLE_45 = -45
} SCX_enum;
+#endif /* defined(PERL_CORE) || defined(PERL_EXT) */
+
+#if defined(PERL_IN_REGEXEC_C)
+
#define HAS_SCX_AUX_TABLES
static const SCX_enum SCX_AUX_TABLE_1[] = {
@@ -22754,7 +22758,7 @@ static const SCX_enum _Perl_SCX_invmap[] = { /* for ASCII/Latin1 */
SCX_Unknown
};
-#endif /* defined(PERL_CORE) || defined(PERL_EXT) */
+#endif /* defined(PERL_IN_REGEXEC_C) */
#if defined(PERL_IN_PERL_C)
@@ -56984,6 +56988,10 @@ typedef enum {
SCX_use_AUX_TABLE_45 = -45
} SCX_enum;
+#endif /* defined(PERL_CORE) || defined(PERL_EXT) */
+
+#if defined(PERL_IN_REGEXEC_C)
+
#define HAS_SCX_AUX_TABLES
static const SCX_enum SCX_AUX_TABLE_1[] = {
@@ -59171,7 +59179,7 @@ static const SCX_enum _Perl_SCX_invmap[] = { /* for EBCDIC 1047 */
SCX_Unknown
};
-#endif /* defined(PERL_CORE) || defined(PERL_EXT) */
+#endif /* defined(PERL_IN_REGEXEC_C) */
#if defined(PERL_IN_PERL_C)
@@ -93585,6 +93593,10 @@ typedef enum {
SCX_use_AUX_TABLE_45 = -45
} SCX_enum;
+#endif /* defined(PERL_CORE) || defined(PERL_EXT) */
+
+#if defined(PERL_IN_REGEXEC_C)
+
#define HAS_SCX_AUX_TABLES
static const SCX_enum SCX_AUX_TABLE_1[] = {
@@ -95768,7 +95780,7 @@ static const SCX_enum _Perl_SCX_invmap[] = { /* for EBCDIC 037 */
SCX_Unknown
};
-#endif /* defined(PERL_CORE) || defined(PERL_EXT) */
+#endif /* defined(PERL_IN_REGEXEC_C) */
#if defined(PERL_IN_PERL_C)
@@ -109537,5 +109549,5 @@ static const U8 WB_table[24][24] = {
* 5671c3de473b25e7ea47097e4906260624dfabe3e9b1739f490aecbc3d858459 lib/unicore/mktables
* 21653d2744fdd071f9ef138c805393901bb9547cf3e777ebf50215a191f986ea lib/unicore/version
* 913d2f93f3cb6cdf1664db888bf840bc4eb074eef824e082fceda24a9445e60c regen/charset_translations.pl
- * 4898ec84e2b81e8bf948dcdb1c015c14f258cc652337122719885a276ea46d7b regen/mk_invlists.pl
+ * 9cc3f0c19746b1ceaef1155d27d0ddc15e53b6ecd487acc9a609852b2f0e7575 regen/mk_invlists.pl
* ex: set ro: */
diff --git a/regen/mk_invlists.pl b/regen/mk_invlists.pl
index 79ceba9..947605c 100644
--- a/regen/mk_invlists.pl
+++ b/regen/mk_invlists.pl
@@ -429,6 +429,8 @@ sub output_invmap ($$$$$$$) {
print $out_fh "\n";
$declaration_type = "${name_prefix}enum";
print $out_fh "} $declaration_type;\n";
+ #scope tables definitions more narrowly than enums
+ switch_pound_if($name, 'PERL_IN_REGEXEC_C');
# Finished with the enum defintion.
$output_format = "${name_prefix}%s";
--
1.7.9.msysgit.0
|
From @bulk880002-remove-unused-SCX-inv-data-from-re-XS-library.patchFrom 7c86e2e6cac17e0811eb82bf76539a5b6506db57 Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Thu, 1 Mar 2018 01:14:53 -0500
Subject: [PATCH 2/2] remove unused SCX inv data from re XS library
See previous commit
"dont define and declare SCX_AUX_TABLES in every perl binary". Since
commit c9412ef201 "Hide Perl_isSCRIPT_RUN from the re extension."
isSCRIPT_RUN is not compiled inside re XS shared lib, instead the copy in
libperl is called from re.so/dll. Since the SCX const static tables arent
optimized away, and isSCRIPT_RUN is the only func that references them
remove them from being compiled at all inside re.pm.
b4 VC 2003 32 bit disk file size re.dll is 280KB
after this patch re.dll is 272KB
---
charclass_invlists.h | 14 +++++++-------
regen/mk_invlists.pl | 8 +++++++-
regexec.c | 2 ++
3 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/charclass_invlists.h b/charclass_invlists.h
index 7a37208..d4f7631 100644
--- a/charclass_invlists.h
+++ b/charclass_invlists.h
@@ -20595,7 +20595,7 @@ typedef enum {
#endif /* defined(PERL_CORE) || defined(PERL_EXT) */
-#if defined(PERL_IN_REGEXEC_C)
+#if defined(PERL_IN_REGEXEC_C_NOT_RE)
#define HAS_SCX_AUX_TABLES
@@ -22758,7 +22758,7 @@ static const SCX_enum _Perl_SCX_invmap[] = { /* for ASCII/Latin1 */
SCX_Unknown
};
-#endif /* defined(PERL_IN_REGEXEC_C) */
+#endif /* defined(PERL_IN_REGEXEC_C_NOT_RE) */
#if defined(PERL_IN_PERL_C)
@@ -56990,7 +56990,7 @@ typedef enum {
#endif /* defined(PERL_CORE) || defined(PERL_EXT) */
-#if defined(PERL_IN_REGEXEC_C)
+#if defined(PERL_IN_REGEXEC_C_NOT_RE)
#define HAS_SCX_AUX_TABLES
@@ -59179,7 +59179,7 @@ static const SCX_enum _Perl_SCX_invmap[] = { /* for EBCDIC 1047 */
SCX_Unknown
};
-#endif /* defined(PERL_IN_REGEXEC_C) */
+#endif /* defined(PERL_IN_REGEXEC_C_NOT_RE) */
#if defined(PERL_IN_PERL_C)
@@ -93595,7 +93595,7 @@ typedef enum {
#endif /* defined(PERL_CORE) || defined(PERL_EXT) */
-#if defined(PERL_IN_REGEXEC_C)
+#if defined(PERL_IN_REGEXEC_C_NOT_RE)
#define HAS_SCX_AUX_TABLES
@@ -95780,7 +95780,7 @@ static const SCX_enum _Perl_SCX_invmap[] = { /* for EBCDIC 037 */
SCX_Unknown
};
-#endif /* defined(PERL_IN_REGEXEC_C) */
+#endif /* defined(PERL_IN_REGEXEC_C_NOT_RE) */
#if defined(PERL_IN_PERL_C)
@@ -109549,5 +109549,5 @@ static const U8 WB_table[24][24] = {
* 5671c3de473b25e7ea47097e4906260624dfabe3e9b1739f490aecbc3d858459 lib/unicore/mktables
* 21653d2744fdd071f9ef138c805393901bb9547cf3e777ebf50215a191f986ea lib/unicore/version
* 913d2f93f3cb6cdf1664db888bf840bc4eb074eef824e082fceda24a9445e60c regen/charset_translations.pl
- * 9cc3f0c19746b1ceaef1155d27d0ddc15e53b6ecd487acc9a609852b2f0e7575 regen/mk_invlists.pl
+ * a049a81399b6a078a993baa2196a48620d90c537976b4c770c4dc56ccc6e6d23 regen/mk_invlists.pl
* ex: set ro: */
diff --git a/regen/mk_invlists.pl b/regen/mk_invlists.pl
index 947605c..a885568 100644
--- a/regen/mk_invlists.pl
+++ b/regen/mk_invlists.pl
@@ -61,6 +61,9 @@ my %exceptions_to_where_to_define =
);
my %where_to_define_enums = ( _Perl_SCX => [ qw(PERL_CORE PERL_EXT) ],
);
+my %where_to_define_maps = ( _Perl_SCX => 'PERL_IN_REGEXEC_C_NOT_RE',
+ );
+
my %gcb_enums;
my @gcb_short_enums;
@@ -430,7 +433,10 @@ sub output_invmap ($$$$$$$) {
$declaration_type = "${name_prefix}enum";
print $out_fh "} $declaration_type;\n";
#scope tables definitions more narrowly than enums
- switch_pound_if($name, 'PERL_IN_REGEXEC_C');
+ $where = (exists $where_to_define_maps{$name})
+ ? $where_to_define_maps{$name}
+ : 'PERL_IN_REGEXEC_C';
+ switch_pound_if($name, $where);
# Finished with the enum defintion.
$output_format = "${name_prefix}%s";
diff --git a/regexec.c b/regexec.c
index 750ddb5..d9edb33 100644
--- a/regexec.c
+++ b/regexec.c
@@ -35,6 +35,8 @@
#ifdef PERL_EXT_RE_BUILD
#include "re_top.h"
+#else
+# define PERL_IN_REGEXEC_C_NOT_RE
#endif
/*
--
1.7.9.msysgit.0
|
From @khwilliamsonDoes my patch 49cd072 which you suggested fix this for 5.28? (I haven't done any investigating) |
The RT System itself - Status changed from 'new' to 'open' |
From @bulk88On Fri, 02 Mar 2018 12:05:52 -0800, khw wrote:
No. Perl_SCX is still defined in every .o file and there are still 43 copies of it inside perl527.dll. mk_invlists.pl/charclass_invlists.h need to be changed to limit Perl_SCX to regexec.c only the way it was before. The SCX_enum arg being part of extern export API because of isSCRIPT_RUN is why this patch split the #define for the enum from the map/list table but now that that arg is gone from extern API, mk_invlists.pl can be changed back to the old way it was. -- |
From @tonycozOn Fri, 02 Mar 2018 15:45:25 -0800, bulk88 wrote:
Was this fixed by c0221e1 ? Tony |
From @bulk88On Tue, 06 Mar 2018 19:30:53 -0800, tonyc wrote:
Yes. -- |
@iabyn - Status changed from 'open' to 'resolved' |
Migrated from rt.perl.org#132926 (status was 'resolved')
Searchable as RT132926$
The text was updated successfully, but these errors were encountered: