[PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add U32PBYTE macros #14125
Comments
From @bulk88

See attached patch.
From @bulk88
Attachment: 0001-fix-SV-body-docs-in-sv.c-PURIFY-cleanup-body-API-add.patch

From 7e4334762f61048c329fe2d50bf0c62d85648e8e Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Tue, 30 Sep 2014 18:30:26 -0400
Subject: [PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add
U32PBYTE macros
Change the setting of an SV's new type in sv_flags to a direct 1-byte
assignment, to avoid a data dependency on a load, at least at the software
level. Instead of load, AND, OR, store machine code ops, just do a store. All
x86 and ARM CPUs, and the other CPUs Perl runs on, have 1-byte memory
reads/writes except early Alpha CPUs. Let the compiler and CPU designers
figure out how to read/write 1 byte in a 32/64/128 bit CPU. Make an assert
that catches breakage from sv_flags changes, and bizarre platforms.
Docs are very out of date; bring them into the present. The docs describe
APIs which are nearly or entirely gone from the core, so remove the
references to those APIs.
Change new_body_inline to be debugger-steppable in DEBUGGING builds (similar
in purpose to S_del_sv). Keep the macro version for performance in
non-DEBUGGING builds. In sv_upgrade, remove all uses of S_new_body so that
sv_upgrade makes fewer function calls, or is function-call free in some
permutations. This required changing the prototype of the macro new_XNV,
which is not public API. Factor the ghost-fields subtraction out into
S_new_body instead of the subtraction happening in callers of S_new_body.
S_new_body is the "slow" version of new_body_inline.
Make -DPURIFY compile and work on Win32 (and presumably other platforms),
although I don't have access to PURIFY tools. Make the new_body_inline token
undefined under PURIFY. Add asserts to make sure we never try to allocate an
arena body with PURIFY, and therefore that the body_details table is
correctly set up for PURIFY builds.
---
perl.h | 10 +++++
sv.c | 129 +++++++++++++++++++++++++++++++++++++++++++++++-----------------
2 files changed, 105 insertions(+), 34 deletions(-)
diff --git a/perl.h b/perl.h
index 436c7d1..4c3b542 100644
--- a/perl.h
+++ b/perl.h
@@ -3806,6 +3806,12 @@ my_swap16(const U16 x) {
# define vtohs(x) ((x)&0xFFFF)
# define htovl(x) vtohl(x)
# define htovs(x) vtohs(x)
+/* U32PBYTE*(x) is a macro that splits a U32 * into mutable U8 *s which go from
LSB (byte 0) to MSB (byte 3); these macros account for platform endianness */
+# define U32PBYTE0(x) ((U8*)((Size_t)(x)+0))
+# define U32PBYTE1(x) ((U8*)((Size_t)(x)+1))
+# define U32PBYTE2(x) ((U8*)((Size_t)(x)+2))
+# define U32PBYTE3(x) ((U8*)((Size_t)(x)+3))
#elif BYTEORDER == 0x4321 || BYTEORDER == 0x87654321
# define vtohl(x) ((((x)&0xFF)<<24) \
+(((x)>>24)&0xFF) \
@@ -3814,6 +3820,10 @@ my_swap16(const U16 x) {
# define vtohs(x) ((((x)&0xFF)<<8) + (((x)>>8)&0xFF))
# define htovl(x) vtohl(x)
# define htovs(x) vtohs(x)
+# define U32PBYTE0(x) ((U8*)((Size_t)(x)+3))
+# define U32PBYTE1(x) ((U8*)((Size_t)(x)+2))
+# define U32PBYTE2(x) ((U8*)((Size_t)(x)+1))
+# define U32PBYTE3(x) ((U8*)((Size_t)(x)+0))
#else
# error "Unsupported byteorder"
/* If you have need for current perl on PDP-11 or similar, and can help test
diff --git a/sv.c b/sv.c
index 49bf8a0..f0e9817 100644
--- a/sv.c
+++ b/sv.c
@@ -800,24 +800,37 @@ Perl_sv_free_arenas(pTHX)
=cut
+WARNING: Every aspect of the internals of SV bodies has repeatedly changed
+over the course of Perl's history. This section describes the current
+implementation, not past implementations. Historically this section often
+became out of sync with the code that deals with SV bodies. The source code is
+authoritative, not this document.
+
Allocation of SV-bodies is similar to SV-heads, differing as follows;
the allocation mechanism is used for many body types, so is somewhat
more complicated, it uses arena-sets, and has no need for still-live
SV detection.
At the outermost level, (new|del)_X*V macros return bodies of the
-appropriate type. These macros call either (new|del)_body_type or
-(new|del)_body_allocated macro pairs, depending on specifics of the
-type. Most body types use the former pair, the latter pair is used to
-allocate body types with "ghost fields".
+appropriate type. The C<new_*> macros call either C<S_new_body> or use
+C<new_body_inline> directly based on performance considerations.
+In both cases the body pointer is backed up, implicitly by C<S_new_body> and
+explicitly after C<new_body_inline>, by the combined size of the type's
+"ghost fields", if it has any. To delete a body, if it was allocated in the
+first place, C<del_body> is called. PURIFY builds take a different path,
+described below.
"ghost fields" are fields that are unused in certain types, and
consequently don't need to actually exist. They are declared because
-they're part of a "base type", which allows use of functions as
-methods. The simplest examples are AVs and HVs, 2 aggregate types
-which don't use the fields which support SCALAR semantics.
-
-For these types, the arenas are carved up into appropriately sized
+they're part of a "base type", which allows the same macros, and therefore
+internally the same C/assembly struct offset, to be used to read a struct
+member regardless of how simple or fancy the particular body type is.
+As an example, let's look at IVs, RVs (now a type of IV) and PVs. These
+3 types aren't blessed or magical, so they do not use, and therefore do not
+need, the fields which support (and implement) blessed (an HV * that
+represents the class) and magic (the MG *) semantics.
+
+For types with ghost fields, the arenas are carved up into appropriately sized
chunks, we thus avoid wasted memory for those unaccessed members.
When bodies are allocated, we adjust the pointer back in memory by the
size of the part not allocated, so it's as if we allocated the full
@@ -825,6 +838,8 @@ structure. (But things will all go boom if you write to the part that
is "not there", because you'll be overwriting the last members of the
preceding structure in memory.)
+A long time ago in Perl's history, every body struct type had the NV as
+the first member; now the HV * that indicates a blessed object is first.
We calculate the correction using the STRUCT_OFFSET macro on the first
member present. If the allocated structure is smaller (no initial NV
actually allocated) then the net effect is to subtract the size of the NV
@@ -836,16 +851,17 @@ optimised accesses based on the alignment constraint of the actual pointer
to the full structure, for example, using a single 64 bit load instruction
because it "knows" that two adjacent 32 bit members will be 8-byte aligned.)
-This is the same trick as was used for NV and IV bodies. Ironically it
-doesn't need to be used for NV bodies any more, because NV is now at
-the start of the structure. IV bodies, and also in some builds NV bodies,
-don't need it either, because they are no longer allocated.
+This is the same trick as was used for NV and IV bodies. Over time, more types
+got ghost fields through subsequent refactorings. Also eventually, IV
+(and RVs which are implemented as IVs) lost their arena-ed and allocated bodies
+completely. A 4th member was added to the SV head. This 4th member has multiple
+uses now based on type, but for IVs, RVs, and also in some builds NV bodies,
+the body pointer is a corrected value to the 4th member of the head. No arena,
+no malloc.
-In turn, the new_body_* allocators call S_new_body(), which invokes
-new_body_inline macro, which takes a lock, and takes a body off the
-linked list at PL_body_roots[sv_type], calling Perl_more_bodies() if
-necessary to refresh an empty list. Then the lock is released, and
-the body is returned.
+S_new_body(), which invokes new_body_inline macro. The new_body_inline macro
+takes a body off the linked list at PL_body_roots[sv_type], calling Perl_more_bodies() if
+necessary to refresh an empty list, and then the body is returned.
Perl_more_bodies allocates a new arena, and carves it up into an array of N
bodies, which it strings into a linked list. It looks up arena-size
@@ -853,7 +869,8 @@ and body-size from the body_details table described below, thus
supporting the multiple body-types.
If PURIFY is defined, or PERL_ARENA_SIZE=0, arenas are not used, and
-the (new|del)_X*V macros are mapped directly to malloc/free.
+the (new|del)_X*V macros are mapped directly to malloc/free. Macro
+new_body_inline is never called on a PURIFY build.
For each sv-type, struct body_details bodies_by_type[] carries
parameters which control these aspects of SV handling:
@@ -1023,10 +1040,6 @@ static const struct body_details bodies_by_type[] = {
FIT_ARENA(24, sizeof(XPVIO)) },
};
-#define new_body_allocated(sv_type) \
- (void *)((char *)S_new_body(aTHX_ sv_type) \
- - bodies_by_type[sv_type].offset)
-
/* return a thing to the free list */
#define del_body(thing, root) \
@@ -1038,7 +1051,7 @@ static const struct body_details bodies_by_type[] = {
#ifdef PURIFY
#if !(NVSIZE <= IVSIZE)
-# define new_XNV() safemalloc(sizeof(XPVNV))
+# define new_XNV(x) ((x) = safemalloc(sizeof(XPVNV)))
#endif
#define new_XPVNV() safemalloc(sizeof(XPVNV))
#define new_XPVMG() safemalloc(sizeof(XPVMG))
@@ -1046,12 +1059,20 @@ static const struct body_details bodies_by_type[] = {
#define del_XPVGV(p) safefree(p)
#else /* !PURIFY */
-
+/* this is the only one of the 3 new_* macros used in sv_upgrade, which is hot,
+   so use the inline macro version */
#if !(NVSIZE <= IVSIZE)
-# define new_XNV() new_body_allocated(SVt_NV)
+# ifdef DEBUGGING
+# define new_XNV(x) x = S_new_body(aTHX_ SVt_NV)
+# else
+# define new_XNV(x) \
+ STMT_START { new_body_inline(x, SVt_NV); \
+ x = (void *)((char *)x - bodies_by_type[SVt_NV].offset); \
+ } STMT_END
+# endif
#endif
-#define new_XPVNV() new_body_allocated(SVt_PVNV)
-#define new_XPVMG() new_body_allocated(SVt_PVMG)
+#define new_XPVNV() S_new_body(aTHX_ SVt_PVNV)
+#define new_XPVMG() S_new_body(aTHX_ SVt_PVMG)
#define del_XPVGV(p) del_body(p + bodies_by_type[SVt_PVGV].offset, \
&PL_body_roots[SVt_PVGV])
@@ -1162,6 +1183,10 @@ Perl_more_bodies (pTHX_ const svtype sv_type, const size_t body_size,
The inline version is used for speed in hot routines, and the
function using it serves the rest (unless PURIFY).
*/
+
+#ifndef PURIFY
+
+#ifndef DEBUGGING
#define new_body_inline(xpv, sv_type) \
STMT_START { \
void ** const r3wt = &PL_body_roots[sv_type]; \
@@ -1171,14 +1196,27 @@ Perl_more_bodies (pTHX_ const svtype sv_type, const size_t body_size,
bodies_by_type[sv_type].arena_size)); \
*(r3wt) = *(void**)(xpv); \
} STMT_END
-
-#ifndef PURIFY
+#else
+#define new_body_inline(xpv, sv_type) S_new_body_inline(aTHX_ &(xpv), sv_type)
+void
+S_new_body_inline(pTHX_ void ** xpvp, const svtype sv_type) {
+ void * xpv;
+ void ** const r3wt = &PL_body_roots[sv_type];
+ xpv = (PTR_TBL_ENT_t*) (*((void **)(r3wt))
+ ? *((void **)(r3wt)) : Perl_more_bodies(aTHX_ sv_type,
+ bodies_by_type[sv_type].body_size,
+ bodies_by_type[sv_type].arena_size));
+ *(r3wt) = *(void**)(xpv);
+ *xpvp = xpv;
+ }
+#endif
STATIC void *
S_new_body(pTHX_ const svtype sv_type)
{
void *xpv;
new_body_inline(xpv, sv_type);
+ xpv = (void *)((char *)xpv - bodies_by_type[sv_type].offset);
return xpv;
}
@@ -1225,6 +1263,22 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type)
no longer need to unshare so as to free up the IVX slot for its proper
purpose. So it's safe to move the early return earlier. */
+#ifdef DEBUGGING /* sanity check for U32PBYTE0 below */
+{ U32 assert_low = SVTYPEMASK; U32 assert_hi = 0xFF000000;
+ PERL_UNUSED_ARG(assert_low); PERL_UNUSED_ARG(assert_hi);
+ assert(SVTYPEMASK == 0xff
+ && *U32PBYTE0(&assert_low) == SVTYPEMASK
+ && (*U32PBYTE0(&assert_low)) + (*U32PBYTE1(&assert_low) << 8)
+ + (*U32PBYTE2(&assert_low) << 16) + (*U32PBYTE3(&assert_low) << 24)
+ == SVTYPEMASK
+ && *U32PBYTE3(&assert_hi) == 0xFF
+ && (*U32PBYTE0(&assert_hi)) + (*U32PBYTE1(&assert_hi) << 8)
+ + (*U32PBYTE2(&assert_hi) << 16)+ (*U32PBYTE3(&assert_hi) << 24)
+ == 0xFF000000
+ );
+}
+#endif
+
if (new_type > SVt_PVMG && SvIsCOW(sv)) {
sv_force_normal_flags(sv, 0);
}
@@ -1321,8 +1375,7 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type)
new_type_details = bodies_by_type + new_type;
- SvFLAGS(sv) &= ~SVTYPEMASK;
- SvFLAGS(sv) |= new_type;
+ *U32PBYTE0(&SvFLAGS(sv)) = new_type;
/* This can't happen, as SVt_NULL is <= all values of new_type, so one of
the return statements above will have triggered. */
@@ -1338,7 +1391,7 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type)
#if NVSIZE <= IVSIZE
SvANY(sv) = (XPVNV*)((char*)&(sv->sv_u.svu_nv) - STRUCT_OFFSET(XPVNV, xnv_u.xnv_nv));
#else
- SvANY(sv) = new_XNV();
+ new_XNV(SvANY(sv));
#endif
SvNV_set(sv, 0);
return;
@@ -1418,10 +1471,14 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type)
/* We always allocated the full length item with PURIFY. To do this
we fake things so that arena is false for all 16 types.. */
if(new_type_details->arena) {
+#ifndef PURIFY
/* This points to the start of the allocated area. */
new_body_inline(new_body, new_type);
Zero(new_body, new_type_details->body_size, char);
new_body = ((char *)new_body) - new_type_details->offset;
+#else
+ assert(0);
+#endif
} else {
new_body = new_NOARENAZ(new_type_details);
}
@@ -13249,7 +13306,7 @@ S_sv_dup_common(pTHX_ const SV *const sstr, CLONE_PARAMS *const param)
#if NVSIZE <= IVSIZE
SvANY(dstr) = (XPVNV*)((char*)&(dstr->sv_u.svu_nv) - STRUCT_OFFSET(XPVNV, xnv_u.xnv_nv));
#else
- SvANY(dstr) = new_XNV();
+ new_XNV(SvANY(dstr));
#endif
SvNV_set(dstr, SvNVX(sstr));
break;
@@ -13281,9 +13338,13 @@ S_sv_dup_common(pTHX_ const SV *const sstr, CLONE_PARAMS *const param)
case SVt_PV:
assert(sv_type_details->body_size);
if (sv_type_details->arena) {
+#ifndef PURIFY
new_body_inline(new_body, sv_type);
new_body
= (void*)((char*)new_body - sv_type_details->offset);
+#else
+ assert(0);
+#endif
} else {
new_body = new_NOARENA(sv_type_details);
}
--
1.7.9.msysgit.0
From @doughera88

On Tue, Sep 30, 2014 at 03:34:00PM -0700, bulk88 wrote:

I understand the idea, and I agree with the general idea of being

> - SvFLAGS(sv) &= ~SVTYPEMASK;

I have to admit that I actually find the old way easier to read. Further, both on gcc-4.7.2 on Linux/amd64, and Solaris CC on SPARC,
The RT System itself - Status changed from 'new' to 'open'
From @jhi

I have to agree with Andy. The new code is harder to understand. And I think the new code goes exactly against the stated goal... "Let the compiler and CPU designers figure out how to read/write 1 byte in a 32/64/128 bit cpu."

As far as I can tell the patch is microoptimizing for the speed of one

On Thu, Oct 2, 2014 at 5:57 PM, Andy Dougherty <doughera@lafayette.edu> wrote:
From @bulk88

On Thu Oct 02 08:58:11 2014, doughera wrote:

How many times a year does someone read it? I could add a comment.

Can you attach before and after .o's or ELFs/exec binaries?
From @bulk88

On Thu Oct 02 15:28:09 2014, bulk88 wrote:

Forgot to mention, machine code size wasn't exactly the point of this commit; that is why I didn't collect it. I inlined something that was previously a function call on some platforms (where SVt_NV has an arena body). Making SV upgrading to SVt_NV not have additional function calls on the most common path of upgrading to SVt_NV was the main point of the commit. "*U32PBYTE0(&SvFLAGS(sv)) = new_type;" was atonement for the inlining, plus it was done before the inlining/API/docs cleanup was finished.

The U32PBYTE0 version compiled to:

    *U32PBYTE0(&SvFLAGS(sv)) = new_type;
    mov eax, [ebp+new_type] ; read C stack var new_type

Previously on VC 2003 the code was:

    SvFLAGS(sv) &= ~SVTYPEMASK;
    mov eax, [ebp+new_type]

Mingw GCC Strawberry 5.18 -O2, old way:

    mov esi, [esp+7Ch+new_type]

Should I produce ARM or PARISC binaries to prove my point?
From @bulk88

On Thu Oct 02 16:28:54 2014, bulk88 wrote:

PARISC old way:

    ldw 8(%r4), %r31 ; read 32 bits at addr reg r4+8 into reg r31
From @doughera88

On Thu, Oct 02, 2014 at 03:28:09PM -0700, bulk88 via RT wrote:

I would still prefer the older way. Among other things, the U32PBYTE0

I don't have them handy, and regenerating them would take a bit of effort.

    - SvFLAGS(sv) &= ~SVTYPEMASK;
    - SvFLAGS(sv) |= new_type;

led to over 1300 non-cosmetic changed lines in the assembly output file. Absent any data indicating it's a meaningful improvement in size or
From @khwilliamson

On 10/03/2014 12:12 PM, Andy Dougherty wrote:

I agree with Andy here.

As Jarkko said somewhat recently, older and wiser than before, he should
From @bulk88

On Fri Oct 03 11:13:05 2014, doughera wrote:

I will revise it so the internals of how to assign a type to an SV head are hidden, and move the assert to somewhere else.
From @bulk88

On Sat Oct 04 00:58:36 2014, bulk88 wrote:

New patch attached. Details of setting the type of an SV were placed behind an API/macro. On ARM, setting the SV type was 4 instructions previously; now it is 1 instruction after the patch.

    new_type_details = bodies_by_type + new_type;
    SvTYPE_set_mem(sv, new_type);
    /* This can't happen, as SVt_NULL is <= all values of new_type, so one of */

A star next to an instruction address means SV-type-setting-related instructions.

old:

    280CAFF0 MOV LR, #0 ; U32 null = 0x00000000;

new:

    ;start of branch

In any case, how is STRB implemented on ARM? Obviously I don't have access to a commercial ARM core's source code, so a FOSS ARM CPU must do.

    assign store_op = mem_op && !instruction[20];

bulk88 says this statement says: if it's a byte op, take the lowest byte and copy it to all 4 bytes of the word. So "STRB" is implemented in hardware, not with extra opcodes.
From @bulk88
Attachment: 0001-fix-SV-body-docs-in-sv.c-PURIFY-cleanup-body-API-add.patch

From c0f659f700ee7ebebc7fca6196cc09be44d8cab3 Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Fri, 10 Oct 2014 19:14:20 -0400
Subject: [PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add
U32PBYTE macros
Change the setting of an SV's new type in sv_flags to a direct 1-byte
assignment, to avoid a data dependency on a load, at least at the software
level. Instead of load, AND, OR, store machine code ops, just do a store. All
x86 and ARM CPUs, and the other CPUs Perl runs on, have 1-byte memory
reads/writes except early Alpha CPUs. Let the compiler and CPU designers
figure out how to read/write 1 byte in a 32/64/128 bit CPU. Make an assert in
Perl_init_constants (to avoid slowing down DEBUGGING builds too much) that
catches breakage from sv_flags changes, and bizarre platforms.
Docs are very out of date; bring them into the present. The docs describe
APIs which are nearly or entirely gone from the core, so remove the
references to those APIs.
Change new_body_inline to be debugger-steppable in DEBUGGING builds (similar
in purpose to S_del_sv). Keep the macro version for performance in
non-DEBUGGING builds. In sv_upgrade, remove all uses of S_new_body so that
sv_upgrade makes fewer function calls, or is function-call free in some
permutations. This required changing the prototype of the macro new_XNV,
which is not public API. Factor the ghost-fields subtraction out into
S_new_body instead of the subtraction happening in callers of S_new_body.
S_new_body is the "slow" version of new_body_inline.
Make -DPURIFY compile and work on Win32 (and presumably other platforms),
although I don't have access to PURIFY tools. Make the new_body_inline token
undefined under PURIFY. Add asserts to make sure we never try to allocate an
arena body with PURIFY, and therefore that the body_details table is
correctly set up for PURIFY builds.
---
perl.h | 10 +++++
sv.c | 134 +++++++++++++++++++++++++++++++++++++++++++++++----------------
sv.h | 8 ++++
3 files changed, 118 insertions(+), 34 deletions(-)
diff --git a/perl.h b/perl.h
index 0bd8a53..95498fc 100644
--- a/perl.h
+++ b/perl.h
@@ -3808,6 +3808,12 @@ my_swap16(const U16 x) {
# define vtohs(x) ((x)&0xFFFF)
# define htovl(x) vtohl(x)
# define htovs(x) vtohs(x)
+/* U32PBYTE*(x) is a macro that splits a U32 * into mutable U8 *s which go from
LSB (byte 0) to MSB (byte 3); these macros account for platform endianness */
+# define U32PBYTE0(x) ((U8*)((Size_t)(x)+0))
+# define U32PBYTE1(x) ((U8*)((Size_t)(x)+1))
+# define U32PBYTE2(x) ((U8*)((Size_t)(x)+2))
+# define U32PBYTE3(x) ((U8*)((Size_t)(x)+3))
#elif BYTEORDER == 0x4321 || BYTEORDER == 0x87654321
# define vtohl(x) ((((x)&0xFF)<<24) \
+(((x)>>24)&0xFF) \
@@ -3816,6 +3822,10 @@ my_swap16(const U16 x) {
# define vtohs(x) ((((x)&0xFF)<<8) + (((x)>>8)&0xFF))
# define htovl(x) vtohl(x)
# define htovs(x) vtohs(x)
+# define U32PBYTE0(x) ((U8*)((Size_t)(x)+3))
+# define U32PBYTE1(x) ((U8*)((Size_t)(x)+2))
+# define U32PBYTE2(x) ((U8*)((Size_t)(x)+1))
+# define U32PBYTE3(x) ((U8*)((Size_t)(x)+0))
#else
# error "Unsupported byteorder"
/* If you have need for current perl on PDP-11 or similar, and can help test
diff --git a/sv.c b/sv.c
index 665a0f6..b182c8b 100644
--- a/sv.c
+++ b/sv.c
@@ -800,24 +800,37 @@ Perl_sv_free_arenas(pTHX)
=cut
+WARNING: Every aspect of the internals of SV bodies has repeatedly changed
+over the course of Perl's history. This section describes the current
+implementation, not past implementations. Historically this section often
+became out of sync with the code that deals with SV bodies. The source code is
+authoritative, not this document.
+
Allocation of SV-bodies is similar to SV-heads, differing as follows;
the allocation mechanism is used for many body types, so is somewhat
more complicated, it uses arena-sets, and has no need for still-live
SV detection.
At the outermost level, (new|del)_X*V macros return bodies of the
-appropriate type. These macros call either (new|del)_body_type or
-(new|del)_body_allocated macro pairs, depending on specifics of the
-type. Most body types use the former pair, the latter pair is used to
-allocate body types with "ghost fields".
+appropriate type. The C<new_*> macros call either C<S_new_body> or use
+C<new_body_inline> directly based on performance considerations.
+In both cases the body pointer is backed up, implicitly by C<S_new_body> and
+explicitly after C<new_body_inline>, by the combined size of the type's
+"ghost fields", if it has any. To delete a body, if it was allocated in the
+first place, C<del_body> is called. PURIFY builds take a different path,
+described below.
"ghost fields" are fields that are unused in certain types, and
consequently don't need to actually exist. They are declared because
-they're part of a "base type", which allows use of functions as
-methods. The simplest examples are AVs and HVs, 2 aggregate types
-which don't use the fields which support SCALAR semantics.
-
-For these types, the arenas are carved up into appropriately sized
+they're part of a "base type", which allows the same macros, and therefore
+internally the same C/assembly struct offset, to be used to read a struct
+member regardless of how simple or fancy the particular body type is.
+As an example, let's look at IVs, RVs (now a type of IV) and PVs. These
+3 types aren't blessed or magical, so they do not use, and therefore do not
+need, the fields which support (and implement) blessed (an HV * that
+represents the class) and magic (the MG *) semantics.
+
+For types with ghost fields, the arenas are carved up into appropriately sized
chunks, we thus avoid wasted memory for those unaccessed members.
When bodies are allocated, we adjust the pointer back in memory by the
size of the part not allocated, so it's as if we allocated the full
@@ -825,6 +838,8 @@ structure. (But things will all go boom if you write to the part that
is "not there", because you'll be overwriting the last members of the
preceding structure in memory.)
+A long time ago in Perl's history, every body struct type had the NV as
+the first member; now the HV * that indicates a blessed object is first.
We calculate the correction using the STRUCT_OFFSET macro on the first
member present. If the allocated structure is smaller (no initial NV
actually allocated) then the net effect is to subtract the size of the NV
@@ -836,16 +851,17 @@ optimised accesses based on the alignment constraint of the actual pointer
to the full structure, for example, using a single 64 bit load instruction
because it "knows" that two adjacent 32 bit members will be 8-byte aligned.)
-This is the same trick as was used for NV and IV bodies. Ironically it
-doesn't need to be used for NV bodies any more, because NV is now at
-the start of the structure. IV bodies, and also in some builds NV bodies,
-don't need it either, because they are no longer allocated.
+This is the same trick as was used for NV and IV bodies. Over time, more types
+got ghost fields through subsequent refactorings. Also eventually, IV
+(and RVs which are implemented as IVs) lost their arena-ed and allocated bodies
+completely. A 4th member was added to the SV head. This 4th member has multiple
+uses now based on type, but for IVs, RVs, and also in some builds NV bodies,
+the body pointer is a corrected value to the 4th member of the head. No arena,
+no malloc.
-In turn, the new_body_* allocators call S_new_body(), which invokes
-new_body_inline macro, which takes a lock, and takes a body off the
-linked list at PL_body_roots[sv_type], calling Perl_more_bodies() if
-necessary to refresh an empty list. Then the lock is released, and
-the body is returned.
+S_new_body(), which invokes new_body_inline macro. The new_body_inline macro
+takes a body off the linked list at PL_body_roots[sv_type], calling Perl_more_bodies() if
+necessary to refresh an empty list, and then the body is returned.
Perl_more_bodies allocates a new arena, and carves it up into an array of N
bodies, which it strings into a linked list. It looks up arena-size
@@ -853,7 +869,8 @@ and body-size from the body_details table described below, thus
supporting the multiple body-types.
If PURIFY is defined, or PERL_ARENA_SIZE=0, arenas are not used, and
-the (new|del)_X*V macros are mapped directly to malloc/free.
+the (new|del)_X*V macros are mapped directly to malloc/free. Macro
+new_body_inline is never called on a PURIFY build.
For each sv-type, struct body_details bodies_by_type[] carries
parameters which control these aspects of SV handling:
@@ -1023,10 +1040,6 @@ static const struct body_details bodies_by_type[] = {
FIT_ARENA(24, sizeof(XPVIO)) },
};
-#define new_body_allocated(sv_type) \
- (void *)((char *)S_new_body(aTHX_ sv_type) \
- - bodies_by_type[sv_type].offset)
-
/* return a thing to the free list */
#define del_body(thing, root) \
@@ -1038,7 +1051,7 @@ static const struct body_details bodies_by_type[] = {
#ifdef PURIFY
#if !(NVSIZE <= IVSIZE)
-# define new_XNV() safemalloc(sizeof(XPVNV))
+# define new_XNV(x) ((x) = safemalloc(sizeof(XPVNV)))
#endif
#define new_XPVNV() safemalloc(sizeof(XPVNV))
#define new_XPVMG() safemalloc(sizeof(XPVMG))
@@ -1046,12 +1059,20 @@ static const struct body_details bodies_by_type[] = {
#define del_XPVGV(p) safefree(p)
#else /* !PURIFY */
-
+/* this is the only one of the 3 new_* macros used in sv_upgrade, which is hot,
+   so use the inline macro version */
#if !(NVSIZE <= IVSIZE)
-# define new_XNV() new_body_allocated(SVt_NV)
+# ifdef DEBUGGING
+# define new_XNV(x) x = S_new_body(aTHX_ SVt_NV)
+# else
+# define new_XNV(x) \
+ STMT_START { new_body_inline(x, SVt_NV); \
+ x = (void *)((char *)x - bodies_by_type[SVt_NV].offset); \
+ } STMT_END
+# endif
#endif
-#define new_XPVNV() new_body_allocated(SVt_PVNV)
-#define new_XPVMG() new_body_allocated(SVt_PVMG)
+#define new_XPVNV() S_new_body(aTHX_ SVt_PVNV)
+#define new_XPVMG() S_new_body(aTHX_ SVt_PVMG)
#define del_XPVGV(p) del_body(p + bodies_by_type[SVt_PVGV].offset, \
&PL_body_roots[SVt_PVGV])
@@ -1162,6 +1183,10 @@ Perl_more_bodies (pTHX_ const svtype sv_type, const size_t body_size,
The inline version is used for speed in hot routines, and the
function using it serves the rest (unless PURIFY).
*/
+
+#ifndef PURIFY
+
+#ifndef DEBUGGING
#define new_body_inline(xpv, sv_type) \
STMT_START { \
void ** const r3wt = &PL_body_roots[sv_type]; \
@@ -1171,14 +1196,27 @@ Perl_more_bodies (pTHX_ const svtype sv_type, const size_t body_size,
bodies_by_type[sv_type].arena_size)); \
*(r3wt) = *(void**)(xpv); \
} STMT_END
-
-#ifndef PURIFY
+#else
+#define new_body_inline(xpv, sv_type) S_new_body_inline(aTHX_ &(xpv), sv_type)
+void
+S_new_body_inline(pTHX_ void ** xpvp, const svtype sv_type) {
+ void * xpv;
+ void ** const r3wt = &PL_body_roots[sv_type];
+ xpv = (PTR_TBL_ENT_t*) (*((void **)(r3wt))
+ ? *((void **)(r3wt)) : Perl_more_bodies(aTHX_ sv_type,
+ bodies_by_type[sv_type].body_size,
+ bodies_by_type[sv_type].arena_size));
+ *(r3wt) = *(void**)(xpv);
+ *xpvp = xpv;
+ }
+#endif
STATIC void *
S_new_body(pTHX_ const svtype sv_type)
{
void *xpv;
new_body_inline(xpv, sv_type);
+ xpv = (void *)((char *)xpv - bodies_by_type[sv_type].offset);
return xpv;
}
@@ -1321,8 +1359,7 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type)
new_type_details = bodies_by_type + new_type;
- SvFLAGS(sv) &= ~SVTYPEMASK;
- SvFLAGS(sv) |= new_type;
+ SvTYPE_set_mem(sv, new_type);
/* This can't happen, as SVt_NULL is <= all values of new_type, so one of
the return statements above will have triggered. */
@@ -1338,7 +1375,7 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type)
#if NVSIZE <= IVSIZE
SvANY(sv) = (XPVNV*)((char*)&(sv->sv_u.svu_nv) - STRUCT_OFFSET(XPVNV, xnv_u.xnv_nv));
#else
- SvANY(sv) = new_XNV();
+ new_XNV(SvANY(sv));
#endif
SvNV_set(sv, 0);
return;
@@ -1418,10 +1455,14 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type)
/* We always allocated the full length item with PURIFY. To do this
we fake things so that arena is false for all 16 types.. */
if(new_type_details->arena) {
+#ifndef PURIFY
/* This points to the start of the allocated area. */
new_body_inline(new_body, new_type);
Zero(new_body, new_type_details->body_size, char);
new_body = ((char *)new_body) - new_type_details->offset;
+#else
+ assert(0);
+#endif
} else {
new_body = new_NOARENAZ(new_type_details);
}
@@ -13257,7 +13298,7 @@ S_sv_dup_common(pTHX_ const SV *const sstr, CLONE_PARAMS *const param)
#if NVSIZE <= IVSIZE
SvANY(dstr) = (XPVNV*)((char*)&(dstr->sv_u.svu_nv) - STRUCT_OFFSET(XPVNV, xnv_u.xnv_nv));
#else
- SvANY(dstr) = new_XNV();
+ new_XNV(SvANY(dstr));
#endif
SvNV_set(dstr, SvNVX(sstr));
break;
@@ -13289,9 +13330,13 @@ S_sv_dup_common(pTHX_ const SV *const sstr, CLONE_PARAMS *const param)
case SVt_PV:
assert(sv_type_details->body_size);
if (sv_type_details->arena) {
+#ifndef PURIFY
new_body_inline(new_body, sv_type);
new_body
= (void*)((char*)new_body - sv_type_details->offset);
+#else
+ assert(0);
+#endif
} else {
new_body = new_NOARENA(sv_type_details);
}
@@ -14964,6 +15009,27 @@ Perl_clone_params_new(PerlInterpreter *const from, PerlInterpreter *const to)
void
Perl_init_constants(pTHX)
{
+#ifdef DEBUGGING /* sanity check for U32PBYTE0 and SvTYPE_set_mem */
+{ U32 assert_low = SVTYPEMASK; U32 assert_hi = 0xFF000000;
+ struct STRUCT_SV sv_head;
+ SV * sv = &sv_head;
+ PERL_UNUSED_ARG(assert_low); PERL_UNUSED_ARG(assert_hi);
+ assert(SVTYPEMASK == 0xff
+ && *U32PBYTE0(&assert_low) == SVTYPEMASK
+ && (*U32PBYTE0(&assert_low)) + (*U32PBYTE1(&assert_low) << 8)
+ + (*U32PBYTE2(&assert_low) << 16) + (*U32PBYTE3(&assert_low) << 24)
+ == SVTYPEMASK
+ && *U32PBYTE3(&assert_hi) == 0xFF
+ && (*U32PBYTE0(&assert_hi)) + (*U32PBYTE1(&assert_hi) << 8)
+ + (*U32PBYTE2(&assert_hi) << 16)+ (*U32PBYTE3(&assert_hi) << 24)
+ == 0xFF000000
+ );
+ sv->sv_flags = SVt_IV;
+ SvTYPE_set_mem(sv, SVt_NV);
+ assert(SvTYPE(sv) == SVt_NV);
+}
+#endif
+
SvREFCNT(&PL_sv_undef) = SvREFCNT_IMMORTAL;
SvFLAGS(&PL_sv_undef) = SVf_READONLY|SVf_PROTECT|SVt_NULL;
SvANY(&PL_sv_undef) = NULL;
diff --git a/sv.h b/sv.h
index 1590c73..d46afd6 100644
--- a/sv.h
+++ b/sv.h
@@ -401,6 +401,14 @@ perform the upgrade if necessary. See C<svtype>.
#define SVTYPEMASK 0xff
#define SvTYPE(sv) ((svtype)((sv)->sv_flags & SVTYPEMASK))
+/* for C compiler optimization reasons, it is suggested to only use this macro
+   if no other statements nearby in the code use or reference the SV's flags.
+   This macro directly sets the memory in the SV head, and avoids keeping the
+   entire U32 sv_flags in a temporary variable or register. If you are
+   modifying or testing the flags part of the sv_flags member nearby, it is
+   suggested to not use this macro, and instead use traditional U32
+   arithmetic, for optimization reasons */
+#define SvTYPE_set_mem(sv, type) (*U32PBYTE0(&SvFLAGS((sv))) = (type))
/* Sadly there are some parts of the core that have pointers to already-freed
SV heads, and rely on being able to tell that they are now free. So mark
--
1.7.9.msysgit.0
From @tonycoz

On Fri Oct 10 19:37:11 2014, bulk88 wrote:

Whatever the macro is called, it's basically a hack to work around perl

I'm still not fond of it.

Looking at the documentation part of the patch, I see:

    +no malloc.
    -In turn, the new_body_* allocators call S_new_body(), which invokes

which looks like something is missing from before S_new_body().

It looks like you've got at least three different changes in this patch:

- the documentation update
- byte access to flags
- changes to the way the body is allocated

It would be preferable to see them as separate patches.

Tony
This patch seems to have been rejected. @bulk88 please submit a pull request if you would like further discussion.
Migrated from rt.perl.org#122872 (status was 'open')
Searchable as RT122872$