Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PATCH] remove regexec.c:non_utf8_target_but_utf8_required indirection var #16448

Closed
p5pRT opened this issue Mar 1, 2018 · 10 comments
Closed

Comments

@p5pRT
Copy link

p5pRT commented Mar 1, 2018

Migrated from rt.perl.org#132928 (status was 'resolved')

Searchable as RT132928$

@p5pRT
Copy link
Author

p5pRT commented Mar 1, 2018

From @bulk88

Created by @bulk88

See attached patch.

Perl Info

Flags:
       category=core
       severity=low

Site configuration information for perl 5.27.9:

Configured by Administrator at Tue Jan 30 20:34:30 2018.

Summary of my perl5 (revision 5 version 27 subversion 9) configuration:

     Platform:
       osname=MSWin32
       osvers=5.2.3790
       archname=MSWin32-x86-multi-thread
       uname=''
       config_args='undef'
       hint=recommended
       useposix=true
       d_sigaction=undef
       useithreads=define
       usemultiplicity=define
       use64bitint=undef
       use64bitall=undef
       uselongdouble=undef
       usemymalloc=n
       default_inc_excludes_dot=define
       bincompat5005=undef
     Compiler:
       cc='cl'
       ccflags ='-nologo -GF -W3 -O1 -MD -Zi -DNDEBUG -GL -DWIN32
-D_CONSOLE -DNO_STRICT -D_CRT_SECURE_NO_DEPRECATE
-D_CRT_NONSTDC_NO_DEPRECATE  -DPERL_TEXTMODE_SCRIPTS
-DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DWIN32_NO_REGISTRY'
       optimize='-O1 -MD -Zi -DNDEBUG -GL'
       cppflags='-DWIN32'
       ccversion='15.00.30729.01'
       gccversion=''
       gccosandvers=''
       intsize=4
       longsize=4
       ptrsize=4
       doublesize=8
       byteorder=1234
       doublekind=3
       d_longlong=undef
       longlongsize=8
       d_longdbl=define
       longdblsize=8
       longdblkind=0
       ivtype='long'
       ivsize=4
       nvtype='double'
       nvsize=8
       Off_t='__int64'
       lseeksize=8
       alignbytes=8
       prototype=define
     Linker and Libraries:
       ld='link'
       ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -ltcg
-libpath:"c:\perl\lib\CORE"        -machine:x86'
       libpth="C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\lib"
       libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib
netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib
odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
       perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib
netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib
odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
       libc=msvcrt.lib
       so=dll
       useshrplib=true
       libperl=perl527.lib
       gnulibc_version=''
     Dynamic Linking:
       dlsrc=dl_win32.xs
       dlext=dll
       d_dlsymun=undef
       ccdlflags=' '
       cccdlflags=' '
       lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -ltcg
       -libpath:"c:\perl\lib\CORE"        -machine:x86'



@INC for perl 5.27.9:
       lib
       C:/p527/srcnew/lib


Environment for perl 5.27.9:
       CYGWIN=tty
       HOME (unset)
       LANG (unset)
       LANGUAGE (unset)
       LD_LIBRARY_PATH=/usr/lib/x86:/usr/X11R6/lib
       LOGDIR (unset)
       PATH=C:\WINDOWS\system32;C:\Program Files (x86)\Microsoft Visual
Studio 9.0\VC\BIN;C:\Program Files\Microsoft
SDKs\Windows\v6.0A\bin;C:\Perl\bin;C:\WINDOWS;C:\Program Files
(x86)\Microsoft Visual Studio 9.0\Common7\IDE;C:\Program Files
(x86)\Git\bin;C:\sp3220\c\bin;
       PERL_BADLANG (unset)
       SHELL (unset)






@p5pRT
Copy link
Author

p5pRT commented Mar 1, 2018

From @bulk88

0001-remove-regexec.c-non_utf8_target_but_utf8_required-i.patch
From ad23122a7ca6fc961e0539715ace78fc75cc615a Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Thu, 1 Mar 2018 10:33:53 -0500
Subject: [PATCH] remove regexec.c:non_utf8_target_but_utf8_required
 indirection var

commit 7e0d5ad7c9 "regexec.c: PATCH: [perl #114808]" made a global ptr
var that contains a ptr to a C string instead of writing the C string
literal multiple times and letting the linker dedup them. There is no
swaping the message/char * at runtime either because of consting. The var
became static in commit 991fc03aa2
"Make private variable static in regexec.c.". Since in re.pm's XS lib
var non_utf8_target_but_utf8_required exists but isnt referenced and isn't
optimized away due to a MSVC linker bug, just remove the intermediate var
and put the string litteral in the macro where it may or may not be removed
by the CPP. Remove the %s fmt string too since there can't be any token
injection into the string literal from afar (this was some CC unsafe fmt
string warning thing probably in commit 6b54ddc5f0 ).

VC 2003 32b CC re.dll
b4
.text section 0x2E9C0 bytes long
.rdata section 0x10661 bytes long
after
.text section 0x2E9B0 bytes long
.rdata section 0x10651 bytes long

the .text section decreased because "push offset "%s"" instruction was
removed from my_re_intuit_start and my_regexec. The MS CC did optimize
the Perl_re_printf call to do "push offset "Can't match..."" instead of
push [non_utf8_target_but_utf8_required] reading the char * from RO mem
instead of the char * being encoded in cpu instruction like a string
literal is, but didnt optimize away the non_utf8_target_but_utf8_required
static global var.
---
 regexec.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/regexec.c b/regexec.c
index d9edb33..eb6ea59 100644
--- a/regexec.c
+++ b/regexec.c
@@ -91,21 +91,17 @@
 static const char utf8_locale_required[] =
       "Use of (?[ ]) for non-UTF-8 locale is wrong.  Assuming a UTF-8 locale";
 
-#ifdef DEBUGGING
-/* At least one required character in the target string is expressible only in
- * UTF-8. */
-static const char* const non_utf8_target_but_utf8_required
-                = "Can't match, because target string needs to be in UTF-8\n";
-#endif
-
 /* Returns a boolean as to whether the input unsigned number is a power of 2
  * (2**0, 2**1, etc).  In other words if it has just a single bit set.
  * If not, subtracting 1 would leave the uppermost bit set, so the & would
  * yield non-zero */
 #define isPOWER_OF_2(n) ((n & (n-1)) == 0)
 
+/* At least one required character in the target string is expressible only in
+ * UTF-8. */
 #define NON_UTF8_TARGET_BUT_UTF8_REQUIRED(target) STMT_START {           \
-    DEBUG_EXECUTE_r(Perl_re_printf( aTHX_  "%s", non_utf8_target_but_utf8_required));\
+    DEBUG_EXECUTE_r(Perl_re_printf( aTHX_                                \
+        "Can't match, because target string needs to be in UTF-8\n"));   \
     goto target;                                                         \
 } STMT_END
 
-- 
1.7.9.msysgit.0

@p5pRT
Copy link
Author

p5pRT commented Mar 2, 2018

From @khwilliamson

On 03/01/2018 08​:39 AM, bulk88 (via RT) wrote​:

# New Ticket Created by bulk88
# Please include the string​: [perl #132928]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=132928 >

This is a bug report for perl from bulk88@​hotmail.com,
generated with the help of perlbug 1.41 running under perl 5.27.9.

-----------------------------------------------------------------
[Please describe your issue here]

See attached patch.

I'm trying to understand the point of your patch. It looks like it
lowers slightly the size under Windows, but there are compilers, such
IIRC as HP's that will generate duplicate copies of this string, so
their size will increase, I believe.

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags​:
category=core
severity=low
---
Site configuration information for perl 5.27.9​:

Configured by Administrator at Tue Jan 30 20​:34​:30 2018.

Summary of my perl5 (revision 5 version 27 subversion 9) configuration​:

  Platform&#8203;:
    osname=MSWin32
    osvers=5\.2\.3790
    archname=MSWin32\-x86\-multi\-thread
    uname=''
    config\_args='undef'
    hint=recommended
    useposix=true
    d\_sigaction=undef
    useithreads=define
    usemultiplicity=define
    use64bitint=undef
    use64bitall=undef
    uselongdouble=undef
    usemymalloc=n
    default\_inc\_excludes\_dot=define
    bincompat5005=undef
  Compiler&#8203;:
    cc='cl'
    ccflags ='\-nologo \-GF \-W3 \-O1 \-MD \-Zi \-DNDEBUG \-GL \-DWIN32

-D_CONSOLE -DNO_STRICT -D_CRT_SECURE_NO_DEPRECATE
-D_CRT_NONSTDC_NO_DEPRECATE -DPERL_TEXTMODE_SCRIPTS
-DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DWIN32_NO_REGISTRY'
optimize='-O1 -MD -Zi -DNDEBUG -GL'
cppflags='-DWIN32'
ccversion='15.00.30729.01'
gccversion=''
gccosandvers=''
intsize=4
longsize=4
ptrsize=4
doublesize=8
byteorder=1234
doublekind=3
d_longlong=undef
longlongsize=8
d_longdbl=define
longdblsize=8
longdblkind=0
ivtype='long'
ivsize=4
nvtype='double'
nvsize=8
Off_t='__int64'
lseeksize=8
alignbytes=8
prototype=define
Linker and Libraries​:
ld='link'
ldflags ='-nologo -nodefaultlib -debug -opt​:ref,icf -ltcg
-libpath​:"c​:\perl\lib\CORE" -machine​:x86'
libpth="C​:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\lib"
libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib
netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib
odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib
netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib
odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
libc=msvcrt.lib
so=dll
useshrplib=true
libperl=perl527.lib
gnulibc_version=''
Dynamic Linking​:
dlsrc=dl_win32.xs
dlext=dll
d_dlsymun=undef
ccdlflags=' '
cccdlflags=' '
lddlflags='-dll -nologo -nodefaultlib -debug -opt​:ref,icf -ltcg
-libpath​:"c​:\perl\lib\CORE" -machine​:x86'

---
@​INC for perl 5.27.9​:
lib
C​:/p527/srcnew/lib

---
Environment for perl 5.27.9​:
CYGWIN=tty
HOME (unset)
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH=/usr/lib/x86​:/usr/X11R6/lib
LOGDIR (unset)
PATH=C​:\WINDOWS\system32;C​:\Program Files (x86)\Microsoft Visual
Studio 9.0\VC\BIN;C​:\Program Files\Microsoft
SDKs\Windows\v6.0A\bin;C​:\Perl\bin;C​:\WINDOWS;C​:\Program Files
(x86)\Microsoft Visual Studio 9.0\Common7\IDE;C​:\Program Files
(x86)\Git\bin;C​:\sp3220\c\bin;
PERL_BADLANG (unset)
SHELL (unset)

@p5pRT
Copy link
Author

p5pRT commented Mar 2, 2018

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Mar 7, 2018

From @bulk88

On Fri, 02 Mar 2018 06​:08​:06 -0800, public@​khwilliamson.com wrote​:

On 03/01/2018 08​:39 AM, bulk88 (via RT) wrote​:

# New Ticket Created by bulk88
# Please include the string​: [perl #132928]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=132928 >

This is a bug report for perl from bulk88@​hotmail.com,
generated with the help of perlbug 1.41 running under perl 5.27.9.

-----------------------------------------------------------------
[Please describe your issue here]

See attached patch.

I'm trying to understand the point of your patch. It looks like it
lowers slightly the size under Windows, but there are compilers, such
IIRC as HP's that will generate duplicate copies of this string, so
their size will increase, I believe.

That static char * var with 1 reference wasn't optimized out by VC. I poked around a HPUX SL format perl binary a bit, IDK if it was compiled with acc or gcc, but it looked to me like identical string literals from ONE .o are de-duped but are not deduped across .o boundaries, and libperl is made of multiple .o'es. This patch doesn't solve deduping the string literals on CCs that dont want to, but it did removed the global pointer sized var. Maybe wrong linker/CC flags with perl on HPUX. I know originally per C specification all C double quote literals were stored in RW memory, but I think Perl forces it RO on any OS that has default RW string literals with Configure.

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Mar 9, 2018

From zefram@fysh.org

bulk88 via RT wrote​:

That static char * var with 1 reference wasn't optimized out by VC.

That's too small a concern to be worth optimising for, especially for
something only present in debugging builds.

This patch doesn't solve deduping the string literals on CCs that dont
want to, but it did removed the global pointer sized var.

Duplication of string literals involves quite a bit more binary size
than the occasional pointer variable, and *is* something we optimise
for. Some compilers don't merge identical string literals at all;
some merge within a compilation unit but not between compilation units;
some may merge between compilation units as well. There is some value in
lifting this string literal out as a separate named object, even within
a single compilation unit, because it's referenced in multiple places.
Though for debugging-only code the value is limited.

Probably a better approach would be to make the named object an array
of char, instead of a pointer to char, as utf8_locale_required[] just
above it already is. This means that only a single instance of the
string would exist (as at present), but in addition a straightforward
compilation wouldn't allocate storage for a pointer to the string.
References to the array would compile to point directly to the string.

-zefram

@p5pRT
Copy link
Author

p5pRT commented Mar 23, 2019

From @khwilliamson

I took Zefram's approach in e7a474c
--
Karl Williamson

@p5pRT
Copy link
Author

p5pRT commented Mar 23, 2019

@khwilliamson - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented May 22, 2019

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.30.0, this and 160 other issues have been
resolved.

Perl 5.30.0 may be downloaded via​:
https://metacpan.org/release/XSAWYERX/perl-5.30.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT
Copy link
Author

p5pRT commented May 22, 2019

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant