-
Notifications
You must be signed in to change notification settings - Fork 571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PATCH] Revert "utf8.h, utfebcdic.h: Add some assertions" #14903
Comments
From @bulk88Created by @bulk88See attached patch. before dmake: Error code 130, while making 'mini\regexec.obj' Perl Info
|
From @bulk880001-Revert-utf8.h-utfebcdic.h-Add-some-assertions.patchFrom af3851c8d46d25f5ab5ab6b9a595bdbdb890a92a Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Sat, 12 Sep 2015 20:15:23 -0400
Subject: [PATCH] Revert "utf8.h, utfebcdic.h: Add some assertions"
This reverts commit e9b19ab7df3480e8f710ca6faad519f6fccdb081.
This fixes a Win32 Visual C 2003 DEBUGGING build failure in compiling
regexec.obj.
cl -c -nologo -GF -W3 -I.\include -I. -I.. -DWIN32 -D_CONSOLE -DNO_STRICT -DPERL
DLL -DPERL_CORE -O1 -MD -Zi -DDEBUGGING -DPERL_EXTERNAL_GLOB -
DPERL_IS_MINIPERL -Fomini\regexec.obj -Fdmini\regexec.pdb ..\regexec.c
regexec.c
..\regexec.c(2006) : fatal error C1001: INTERNAL COMPILER ERROR
(compiler file 'msc1.cpp', line 2708)
Please choose the Technical Support command on the Visual C++
Help menu, or open the Technical Support help file for more information
No response from original author of the code at
http://www.nntp.perl.org/group/perl.perl5.porters/2015/09/msg230842.html
---
utf8.h | 8 ++++----
utfebcdic.h | 10 ++++------
2 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/utf8.h b/utf8.h
index 17f0e82..668626f 100644
--- a/utf8.h
+++ b/utf8.h
@@ -127,8 +127,8 @@ END_EXTERN_C
/* Native character to/from iso-8859-1. Are the identity functions on ASCII
* platforms */
-#define NATIVE_TO_LATIN1(ch) (__ASSERT_(FITS_IN_8_BITS(ch)) (ch))
-#define LATIN1_TO_NATIVE(ch) (__ASSERT_(FITS_IN_8_BITS(ch)) (ch))
+#define NATIVE_TO_LATIN1(ch) (ch)
+#define LATIN1_TO_NATIVE(ch) (ch)
/* I8 is an intermediate version of UTF-8 used only in UTF-EBCDIC. We thus
* consider it to be identical to UTF-8 on ASCII platforms. Strictly speaking
@@ -136,8 +136,8 @@ END_EXTERN_C
* because they are 8-bit encodings that serve the same purpose in Perl, and
* rarely do we need to distinguish them. The term "NATIVE_UTF8" applies to
* whichever one is applicable on the current platform */
-#define NATIVE_UTF8_TO_I8(ch) (__ASSERT_(FITS_IN_8_BITS(ch)) (ch))
-#define I8_TO_NATIVE_UTF8(ch) (__ASSERT_(FITS_IN_8_BITS(ch)) (ch))
+#define NATIVE_UTF8_TO_I8(ch) (ch)
+#define I8_TO_NATIVE_UTF8(ch) (ch)
/* Transforms in wide UV chars */
#define UNI_TO_NATIVE(ch) (ch)
diff --git a/utfebcdic.h b/utfebcdic.h
index c852946..003fb79 100644
--- a/utfebcdic.h
+++ b/utfebcdic.h
@@ -133,13 +133,11 @@ END_EXTERN_C
/* EBCDIC-happy ways of converting native code to UTF-8 */
-/* Use these when ch is known to be < 256 */
-#define NATIVE_TO_LATIN1(ch) (__ASSERT_(FITS_IN_8_BITS(ch)) PL_e2a[(U8)(ch)])
-#define LATIN1_TO_NATIVE(ch) (__ASSERT_(FITS_IN_8_BITS(ch)) PL_a2e[(U8)(ch)])
+#define NATIVE_TO_LATIN1(ch) PL_e2a[(U8)(ch)]
+#define LATIN1_TO_NATIVE(ch) PL_a2e[(U8)(ch)]
-/* Use these on bytes */
-#define NATIVE_UTF8_TO_I8(b) (__ASSERT_(FITS_IN_8_BITS(b)) PL_e2utf[(U8)(b)])
-#define I8_TO_NATIVE_UTF8(b) (__ASSERT_(FITS_IN_8_BITS(b)) PL_utf2e[(U8)(b)])
+#define NATIVE_UTF8_TO_I8(ch) PL_e2utf[(U8)(ch)]
+#define I8_TO_NATIVE_UTF8(ch) PL_utf2e[(U8)(ch)]
/* Transforms in wide UV chars */
#define NATIVE_TO_UNI(ch) (FITS_IN_8_BITS(ch) ? NATIVE_TO_LATIN1(ch) : (ch))
--
1.7.9.msysgit.0
|
From @khwilliamsonOn 09/12/2015 06:25 PM, bulk88 (via RT) wrote:
Of course, we will try to support your compiler, but support for current Your final statement quoted above seems to indicate that because I (the "I day or 2 ago on some a now deleted smoke-me branch of khws I got There is nothing actionable in that. smoke-me branches come and go. This If we found out that the commit was stressing a bunch of different The bottom line is to submit a patch that gets your compiler to working,
|
The RT System itself - Status changed from 'new' to 'open' |
From @bulk88On Thu Sep 17 20:36:24 2015, public@khwilliamson.com wrote:
P5P supports VC 6 to 2013. Are you announcing that Perl 5.24 has discontinued supporting Visual C?
If you didn't change anything to address my complaint, and you pushed the code to blead, why would it suddenly stop failing? The only fix that was ever determined in that thread was to change the CC's buffer size for HP CC ( http://www.nntp.perl.org/group/perl.perl5.porters/2015/09/msg230941.html ) and nothing was committed to blead since Tux said he needs to test it further. You could have asked/posted "can you still reproduce this problem?" or "file a bug ticket so this isn't forgotten". Instead you never responded at all to me.
You broke 2 different CCs and didnt know about it until the smoke tests/user reports came in. Perhaps the design of the code/that patch should be rethought instead of figuring out how to fix breakage after the fact and assuming the patch is a golden cow. Using rare uncommon options to fix something says the code is difficult to maintain, and the code should change to meet the CC/interpreter/parser whatever and not "break" on every "upgrade". For that patch an "upgrade" is some new CC or port in the future.
Why dont you submit a patch that gets everyone elses compiler working? You are the author of the LOC and the author of the API (FBC_BOUND) in question. If you didn't have a commit bit, your patch would be reverted within a few days if you failed to respond/provide a fix, am I not right? I did research fixing it myself, but I stopped since you created an macro API (macro FBC_BOUND) that uses implicitly assumes there are certain C autos in scope, which is not allowed by perl's design style policy (there is grandfathered use AFAIK). At that point I stopped thinking of writing a patch since I cant change FBC_BOUND and started bisecting. FBC_BOUND creation commit Blead at http://perl5.git.perl.org/perl.git/b58e9a87f1dde6553a833962d54161262ba90be5 still has the build failure so it has not been fixed accidentally/unintentionally and this revert patch is still valid. C:\perl521\src\win32>dmake mini/regexec.obj CCTYPE=MSVC70 CFG=Debug dmake: Error code 130, while making 'mini\regexec.obj' C:\perl521\src\win32> The technical problem is "FBC_BOUND(isWORDCHAR_LC, isWORDCHAR_LC_uvchr, isWORDCHAR_LC_utf8);" is expanding beyond a fixed length buffer in the VC/HP C CPP code. HP C has a buffer change cmd line option, VC does not. For VC the fixed length buffer is probably some power of 2 between 4KB and 64KB including those 2. As davem suggested in http://www.nntp.perl.org/group/perl.perl5.porters/2015/09/msg230968.html you need to turn stuff into functions. You are the creator of every token in the string "FBC_BOUND(isWORDCHAR_LC, isWORDCHAR_LC_uvchr, isWORDCHAR_LC_utf8);" http://perl5.git.perl.org/perl.git/commitdiff/63ac0dadb1aafcf0c171d3c1422c1923b611b2fc Nobody else is going to, or will be competent at refactoring that API except you. Some people have said on P5P they use DEBUGGING perl for their business code. I dont care about optimizing DEBUGGING perl and never tried, but maybe the speed of DEBUGGING perl should be paid attention to since it affects 1. people paranoid enough to use OpenBSD and/or 100% asserts in all binaries on their system FBC_BOUND expands to pagefuls and pagefuls of code. A CC has to optimize all that code when creating regexec.o slowing down regexec.o compile time (its #3 in perl core on GCC compile time). Also there is a high risk that the same assert check runs multiple times after -O2 optimization due to macros in multieval macros being exponentially expanded. I suspect FBC_BOUND looks more like it should be a function call that calls func pointers from objects since it implements an algorithm that parts are plugged into. Useless details ahead about Visual C since it is closed source. A callstack of the assert in Visual C doesn't show enough to be very useful. c1.dll!Trap_main_compile() + 0x14e39 A SEGV exception in the CC. c1.dll!tl_getid() - 0xaac7 First-chance exception at 0x1061af14 (c1.dll) in cl.exe: 0xC0000005: Access violation writing location 0x008c4000. The first couple SEGVs, note all of them are at exactly a page boundary, those pages are allocated by a SEGV handler inside Visual C and execution resumes normally. 0x8c0000 and 0xdd0000 are 5124 KB blocks with only a couple 4KB chunks allocated at the beginning of the 5124KB reserved region/mem obj region. 0x8c0000 and 0xdd0000 memory regions are marked private/directly mmaped/directly VirtualAlloced by the OS, they are not part of malloc. The last SEGV is not at an page boundary and probably is causing the assert fail. Mem obj 0x8b0000 is a region from 0x8b0000 to 0x8c0000, 64KB big. 0x8b0000 to 0x8b5000 (20KB) is allocated and part of malloc. 0x8b5000 to 0x8c0000 is reserved for malloc to expand into, and it is owned by the malloc code. Faulting address pointer 0x008bb664 is a overflow by 0x008bb664 - 0x8b5000 = 0x6664 bytes or ~26 KB. The GCC expansion of "FBC_BOUND(isWORDCHAR_LC, isWORDCHAR_LC_uvchr, isWORDCHAR_LC_utf8);" winds up being 49476 bytes long for me. The VC expansion will be different but IDK if larger or smaller. Telling malloc to dump itself. 0:001> !heap -h 0x8b0000 number in parens is malloc allocation size in hex. If I go plainly by size of the last entry, 0x800 or 2KB, that is the buffer size of the Visual C C preprocessor. The real answer might be more complicated with 0x800 being a truncated 0x10800 or 0x1800. Notice the 0x1800 allocation also. It was filled with alot 0x01, 0x04, and mostly 0x00 bytes, nothing else. The last in malloc heap big 0x800 allocation is a bunch of FILE * from libc. 0x77c5____ are data globals inside msvcrt.dll, including FILE * fd 0,1,2. They might be uninitialized memory from the past. 008b47c0 01 01 11 00 f0 01 08 00 80 fc c5 77 a0 fc c5 77 c0 fc ...........w...w.. -- |
From @tonycozOn Thu Sep 24 04:11:58 2015, bulk88 wrote:
...
I (and I expect khw) don't have the compiler to test with, and it's not being used for smoking. Only you can test such changes.
While I think it's bad in general, both the parser and the regexp engine make use of it, and for good reason I think. Could you please test the branch tonyc/vc2003-macro-issues to see if it crashes VC2003? Tony |
From @tonycoz0001-WIP-test-fix-for-126045-needs-work-needs-testing.patchFrom e708ca371e3916d4d43873b94590507b76536a3b Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Tue, 3 Nov 2015 15:27:31 +1100
Subject: WIP test fix for 126045, needs work, needs testing
---
regexec.c | 48 +++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 43 insertions(+), 5 deletions(-)
diff --git a/regexec.c b/regexec.c
index 85c31a6..c76190f 100644
--- a/regexec.c
+++ b/regexec.c
@@ -263,6 +263,44 @@ static void S_setup_eval_state(pTHX_ regmatch_info *const reginfo);
static void S_cleanup_regmatch_info_aux(pTHX_ void *arg);
static regmatch_state * S_push_slab(pTHX);
+/* this needs to check the correct version number for MSVC 2003 */
+#ifdef _MSC_VER
+
+PERL_STATIC_INLINE bool
+S_isWORDCHAR_uni(pTHX_ UV c) {
+ /* macro already does cBOOL() */
+ return isWORDCHAR_uni(c);
+}
+
+PERL_STATIC_INLINE bool
+S_isWORDCHAR_LC_uvchr(pTHX_ UV c) {
+ return isWORDCHAR_LC_uvchr(c);
+}
+
+PERL_STATIC_INLINE bool
+S_isWORDCHAR_utf8(pTHX_ const U8 *p) {
+ return isWORDCHAR_utf8(p);
+}
+
+PERL_STATIC_INLINE bool
+S_isWORDCHAR_LC_utf8(pTHX_ const U8 *p) {
+ return isWORDCHAR_LC_utf8(p);
+}
+
+#define isWORDCHAR_uni_x(c) (S_isWORDCHAR_uni(aTHX_ (c)))
+#define isWORDCHAR_LC_uvchr_x(c) (S_isWORDCHAR_LC_uvchr(aTHX_ (c)))
+#define isWORDCHAR_utf8_x(p) (S_isWORDCHAR_utf8(aTHX_ (p)))
+#define isWORDCHAR_LC_utf8_x(p) (S_isWORDCHAR_LC_utf8(aTHX_ (p)))
+
+#else
+
+#define isWORDCHAR_uni_x isWORDCHAR_uni
+#define isWORDCHAR_LC_uvchr_x isWORDCHAR_LC_uvchr
+#define isWORDCHAR_utf8_x isWORDCHAR_utf8
+#define isWORDCHAR_LC_utf8_x isWORDCHAR_LC_utf8
+
+#endif
+
#define REGCP_PAREN_ELEMS 3
#define REGCP_OTHER_ELEMS 3
#define REGCP_FRAME_ELEMS 1
@@ -2003,7 +2041,7 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
goto do_boundu;
}
- FBC_BOUND(isWORDCHAR_LC, isWORDCHAR_LC_uvchr, isWORDCHAR_LC_utf8);
+ FBC_BOUND(isWORDCHAR_LC, isWORDCHAR_LC_uvchr_x, isWORDCHAR_LC_utf8_x);
break;
case NBOUNDL:
@@ -2016,14 +2054,14 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
goto do_nboundu;
}
- FBC_NBOUND(isWORDCHAR_LC, isWORDCHAR_LC_uvchr, isWORDCHAR_LC_utf8);
+ FBC_NBOUND(isWORDCHAR_LC, isWORDCHAR_LC_uvchr_x, isWORDCHAR_LC_utf8_x);
break;
case BOUND: /* regcomp.c makes sure that this only has the traditional \b
meaning */
assert(FLAGS(c) == TRADITIONAL_BOUND);
- FBC_BOUND(isWORDCHAR, isWORDCHAR_uni, isWORDCHAR_utf8);
+ FBC_BOUND(isWORDCHAR, isWORDCHAR_uni_x, isWORDCHAR_utf8_x);
break;
case BOUNDA: /* regcomp.c makes sure that this only has the traditional \b
@@ -2037,7 +2075,7 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
meaning */
assert(FLAGS(c) == TRADITIONAL_BOUND);
- FBC_NBOUND(isWORDCHAR, isWORDCHAR_uni, isWORDCHAR_utf8);
+ FBC_NBOUND(isWORDCHAR, isWORDCHAR_uni_x, isWORDCHAR_utf8_x);
break;
case NBOUNDA: /* regcomp.c makes sure that this only has the traditional \b
@@ -2049,7 +2087,7 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
case NBOUNDU:
if ((bound_type) FLAGS(c) == TRADITIONAL_BOUND) {
- FBC_NBOUND(isWORDCHAR_L1, isWORDCHAR_uni, isWORDCHAR_utf8);
+ FBC_NBOUND(isWORDCHAR_L1, isWORDCHAR_uni_x, isWORDCHAR_utf8_x);
break;
}
--
2.1.4
|
From @bulk88On Mon Nov 02 21:02:44 2015, tonyc wrote:
Your branch changes the fatal error from VC 2003 but doesnt fix it. before C:\perl521\src\win32>dmake test dmake: Error code 130, while making 'mini\regexec.obj' C:\perl521\src\win32> after your patch "* WIP test fix for 126045, needs work, needs testing" C:\perl521\src\win32>dmake test C:\perl521\src\win32> The description for C2026 says the buffer is 2KB. https://msdn.microsoft.com/en-us/library/dddywwsc%28v=vs.71%29.aspx -- |
From @bulk88On Sat Nov 28 00:40:08 2015, bulk88 wrote:
Bump -- |
From @tonycozOn Sat Nov 28 00:40:08 2015, bulk88 wrote:
How about the attached? This follows the approach khw suggested in IRC, assuming I understand what he meant. The value I'm checking _MSC_VER against came from http://varlab.blogspot.com.au/2012/11/if-you-develop-applications-for-windows.html so it might need adjustment if that page is incorrect. Tony |
From @tonycoz0001-perl-126045-part-revert-e9b19ab7-for-vc2003-only.patchFrom 31c4a31a656328fc3509a6a2abfddf13b345e305 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Mon, 4 Jan 2016 15:50:59 +1100
Subject: [PATCH] [perl #126045] part revert e9b19ab7 for vc2003 only
This avoids an internal compiler error on VC 2003.
---
utf8.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/utf8.h b/utf8.h
index 1ed8fd8..5af7dc3 100644
--- a/utf8.h
+++ b/utf8.h
@@ -136,8 +136,13 @@ END_EXTERN_C
/* Native character to/from iso-8859-1. Are the identity functions on ASCII
* platforms */
+#if _MSC_VER == 1310
+#define NATIVE_TO_LATIN1(ch) (ch)
+#define LATIN1_TO_NATIVE(ch) (ch)
+#else
#define NATIVE_TO_LATIN1(ch) (__ASSERT_(FITS_IN_8_BITS(ch)) ((U8) (ch)))
#define LATIN1_TO_NATIVE(ch) (__ASSERT_(FITS_IN_8_BITS(ch)) ((U8) (ch)))
+#endif
/* I8 is an intermediate version of UTF-8 used only in UTF-EBCDIC. We thus
* consider it to be identical to UTF-8 on ASCII platforms. Strictly speaking
@@ -145,8 +150,13 @@ END_EXTERN_C
* because they are 8-bit encodings that serve the same purpose in Perl, and
* rarely do we need to distinguish them. The term "NATIVE_UTF8" applies to
* whichever one is applicable on the current platform */
+#if _MSC_VER == 1310
+#define NATIVE_UTF8_TO_I8(ch) (ch)
+#define I8_TO_NATIVE_UTF8(ch) (ch)
+#else
#define NATIVE_UTF8_TO_I8(ch) (__ASSERT_(FITS_IN_8_BITS(ch)) ((U8) (ch)))
#define I8_TO_NATIVE_UTF8(ch) (__ASSERT_(FITS_IN_8_BITS(ch)) ((U8) (ch)))
+#endif
/* Transforms in wide UV chars */
#define UNI_TO_NATIVE(ch) ((UV) (ch))
--
1.9.5.msysgit.0
|
From @bulk88On Sun Jan 03 20:54:25 2016, tonyc wrote:
the latest patch "part revert e9b19ab for vc2003 only" fixes the problem. -- |
From @bulk88On Fri Jan 15 09:28:58 2016, bulk88 wrote:
Egh, I tested VC6, the CC crashed like VC 2003 does, you need to add VC 6 and 2002 (dont have it) to the no assert logic . I tested VC 2005 and it built fine (no crash, I guess VC 2005 is the first VC to increase its preprocessor buffer size). -- |
From @tonycozOn Fri Jan 15 09:38:03 2016, bulk88 wrote:
Here's an updated patch. Tony |
From @tonycoz0001-perl-126045-part-revert-e9b19ab7-for-vc2003-and-earl.patchFrom 228f19ed730b470b3ca2afbbcbed00da8f646466 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Thu, 28 Jan 2016 15:08:01 +1100
Subject: [perl #126045] part revert e9b19ab7 for vc2003 and earlier
This avoids an internal compiler error on VC 2003 and earlier
---
utf8.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/utf8.h b/utf8.h
index 1ed8fd8..e523cc3 100644
--- a/utf8.h
+++ b/utf8.h
@@ -134,10 +134,20 @@ EXTCONST unsigned char PL_utf8skip[];
END_EXTERN_C
+#if _MSC_VER < 1400
+/* older MSVC versions have a smallish macro buffer */
+#define PERL_SMALL_MACRO_BUFFER
+#endif
+
/* Native character to/from iso-8859-1. Are the identity functions on ASCII
* platforms */
+#ifdef PERL_SMALL_MACRO_BUFFER
+#define NATIVE_TO_LATIN1(ch) (ch)
+#define LATIN1_TO_NATIVE(ch) (ch)
+#else
#define NATIVE_TO_LATIN1(ch) (__ASSERT_(FITS_IN_8_BITS(ch)) ((U8) (ch)))
#define LATIN1_TO_NATIVE(ch) (__ASSERT_(FITS_IN_8_BITS(ch)) ((U8) (ch)))
+#endif
/* I8 is an intermediate version of UTF-8 used only in UTF-EBCDIC. We thus
* consider it to be identical to UTF-8 on ASCII platforms. Strictly speaking
@@ -145,8 +155,13 @@ END_EXTERN_C
* because they are 8-bit encodings that serve the same purpose in Perl, and
* rarely do we need to distinguish them. The term "NATIVE_UTF8" applies to
* whichever one is applicable on the current platform */
+#ifdef PERL_SMALL_MACRO_BUFFER
+#define NATIVE_UTF8_TO_I8(ch) (ch)
+#define I8_TO_NATIVE_UTF8(ch) (ch)
+#else
#define NATIVE_UTF8_TO_I8(ch) (__ASSERT_(FITS_IN_8_BITS(ch)) ((U8) (ch)))
#define I8_TO_NATIVE_UTF8(ch) (__ASSERT_(FITS_IN_8_BITS(ch)) ((U8) (ch)))
+#endif
/* Transforms in wide UV chars */
#define UNI_TO_NATIVE(ch) ((UV) (ch))
--
1.9.5.msysgit.0
|
From @bulk88On Wed Jan 27 20:09:11 2016, tonyc wrote:
new patch works on VC 6 and VC 2003 with DEBUGGING. no CC SEGV or errors. -- |
From @tonycozOn Thu Jan 28 03:01:28 2016, bulk88 wrote:
Thanks for testing, applied as 6f6d1ba. Tony |
@tonycoz - Status changed from 'open' to 'pending release' |
From @khwilliamsonThank you for submitting this report. You have helped make Perl better. Perl 5.24.0 may be downloaded via https://metacpan.org/release/RJBS/perl-5.24.0 |
@khwilliamson - Status changed from 'pending release' to 'resolved' |
Migrated from rt.perl.org#126045 (status was 'resolved')
Searchable as RT126045$
The text was updated successfully, but these errors were encountered: