Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add U32PBYTE macros #14125

Closed
p5pRT opened this issue Sep 30, 2014 · 15 comments
Closed

Comments

@p5pRT
Copy link

p5pRT commented Sep 30, 2014

Migrated from rt.perl.org#122872 (status was 'open')

Searchable as RT122872$

@p5pRT
Copy link
Author

p5pRT commented Sep 30, 2014

From @bulk88

Created by @bulk88

See attached patch.

Perl Info

Flags:
           category=core
           severity=low

Site configuration information for perl 5.21.4:

Configured by Owner at Thu Sep 18 12:08:58 2014.

Summary of my perl5 (revision 5 version 21 subversion 4) configuration:
         Derived from: 7d2b2edb94ab56333b9049a3e26d15ea18445512
         Ancestor: 19be3be6968e2337bcdfe480693fff795ecd1304
         Platform:
           osname=MSWin32, osvers=5.1, archname=MSWin32-x86-multi-thread
           uname=''
           config_args='undef'
           hint=recommended, useposix=true, d_sigaction=undef
           useithreads=define, usemultiplicity=define
           use64bitint=undef, use64bitall=undef, uselongdouble=undef
           usemymalloc=n, bincompat5005=undef
         Compiler:
           cc='cl', ccflags ='-nologo -GF -W3 -O1 -MD -Zi -DNDEBUG -DWIN32
-D_CONSOLE -DNO_STRICT  -DPERL_TEXTMODE_SCRIPTS
-DPERL_HASH_FUNC_ONE_AT_A_TIME -DPERL_IMPLICIT_CONTEXT
-DPERL_IMPLICIT_SYS -DUSE_PERLIO -D_USE_32BIT_TIME_T',
           optimize='-O1 -MD -Zi -DNDEBUG',
           cppflags='-DWIN32'
           ccversion='12.00.8168', gccversion='', gccosandvers=''
           intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
           d_longlong=undef, longlongsize=8, d_longdbl=define, 
longdblsize=8,
longdblkind=0
           ivtype='long', ivsize=4, nvtype='double', nvsize=8,
Off_t='__int64',
lseeksize=8
           alignbytes=8, prototype=define
         Linker and Libraries:
           ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf
-libpath:"c:\perl521\lib\CORE"  -machine:x86'
           libpth=C:\PROGRA~1\MIAF9D~1\VC98\lib
           libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib
netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib
odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
           perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib
winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib
oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib
version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
           libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl521.lib
           gnulibc_version=''
         Dynamic Linking:
           dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
           cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug
-opt:ref,icf  -libpath:"c:\perl521\lib\CORE"  -machine:x86'

Locally applied patches:
           uncommitted-changes
           a0fe7a7e75de29e59f1da0d6822dc06e5be658fe
           a261faffee83d0145642ab5d1d046c9f813bc497
           6506ab86ad1602a9ca720fcd30446dce1461d23d
           7d2b2edb94ab56333b9049a3e26d15ea18445512


@INC for perl 5.21.4:
           lib
           C:/perl521/srcnew/lib
           .


Environment for perl 5.21.4:
           HOME (unset)
           LANG (unset)
           LANGUAGE (unset)
           LD_LIBRARY_PATH (unset)
           LOGDIR (unset)
           PATH=
           PERL_BADLANG (unset)
           PERL_JSON_BACKEND=Cpanel::JSON::XS
           PERL_YAML_BACKEND=YAML
           SHELL (unset)










@p5pRT
Copy link
Author

p5pRT commented Sep 30, 2014

From @bulk88

0001-fix-SV-body-docs-in-sv.c-PURIFY-cleanup-body-API-add.patch
From 7e4334762f61048c329fe2d50bf0c62d85648e8e Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Tue, 30 Sep 2014 18:30:26 -0400
Subject: [PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add
 U32PBYTE macros

Change setting of SV's new type in sv_flags directly using 1 byte
assignment to avoid a data dependency on a load, atleast on software level.
Instead of load, AND, OR, store machine code ops. Just do store. All x86
and ARM CPUs, and other Perl CPUs have 1 byte memory reads/writes execept
early Alphas CPUs. Let the compiler and CPU designers figure out how to
read/write 1 byte in a 32/64/128 bit cpu. Make an assert that catches
breakage from sv_flags changes, and bizzare platforms.

Docs are very out of date, bring them them into the present. Docs describe
APIs which are nearly or are gone from the core, so remove the references
to those APIs.

Change new_body_inline to be dbg stepable in DEBUGGUING builds (similar to
purpose of S_del_sv). Keep macro version for performance in
non-DEBUGGING. In sv_upgrade remove all uses of S_new_body so
sv_upgrade has less function calls, or is function call free in some
permutations. This required changing proto of macro new_XNV which is
not public API. Factor out the ghost fields subtraction into S_new_body
instead of subtraction happening in callers of S_new_body. S_new_body is
the "slow" version of new_body_inline.

Make -DPURIFY compile and work on Win32 (and presumably other platforms),
although I dont have access to PURIFY tools. Make new_body_inline token
undefined on PURIFY. Put asserts to make sure we never try to allocate an
arena body with PURIFY and therefore whether the body_details table is
correctly setup for PURIFY builds.
---
 perl.h |   10 +++++
 sv.c   |  129 +++++++++++++++++++++++++++++++++++++++++++++++-----------------
 2 files changed, 105 insertions(+), 34 deletions(-)

diff --git a/perl.h b/perl.h
index 436c7d1..4c3b542 100644
--- a/perl.h
+++ b/perl.h
@@ -3806,6 +3806,12 @@ my_swap16(const U16 x) {
 #  define vtohs(x)      ((x)&0xFFFF)
 #  define htovl(x)      vtohl(x)
 #  define htovs(x)      vtohs(x)
+/* U32PBYTE*(x) is a macro that splits a U32 * into mutable U8 *s which go from
+   LSB (byte 0) to MSB (byte 3), these macros account for platform endianess */
+#  define U32PBYTE0(x) ((U8*)((Size_t)(x)+0))
+#  define U32PBYTE1(x) ((U8*)((Size_t)(x)+1))
+#  define U32PBYTE2(x) ((U8*)((Size_t)(x)+2))
+#  define U32PBYTE3(x) ((U8*)((Size_t)(x)+3))
 #elif BYTEORDER == 0x4321 || BYTEORDER == 0x87654321
 #  define vtohl(x)	((((x)&0xFF)<<24)	\
 			+(((x)>>24)&0xFF)	\
@@ -3814,6 +3820,10 @@ my_swap16(const U16 x) {
 #  define vtohs(x)	((((x)&0xFF)<<8) + (((x)>>8)&0xFF))
 #  define htovl(x)	vtohl(x)
 #  define htovs(x)	vtohs(x)
+#  define U32PBYTE0(x) ((U8*)((Size_t)(x)+3))
+#  define U32PBYTE1(x) ((U8*)((Size_t)(x)+2))
+#  define U32PBYTE2(x) ((U8*)((Size_t)(x)+1))
+#  define U32PBYTE3(x) ((U8*)((Size_t)(x)+0))
 #else
 #  error "Unsupported byteorder"
 /* If you have need for current perl on PDP-11 or similar, and can help test
diff --git a/sv.c b/sv.c
index 49bf8a0..f0e9817 100644
--- a/sv.c
+++ b/sv.c
@@ -800,24 +800,37 @@ Perl_sv_free_arenas(pTHX)
 
 =cut
 
+WARNING: Every aspect of the internals of SV bodies has repeatedly changed
+over the course of Perl's history. This section describes the current
+implementation, not past implementations. Historically this section often
+became out of sync with the code that deals with SV bodys. The source code is
+authoritative, not this document.
+
 Allocation of SV-bodies is similar to SV-heads, differing as follows;
 the allocation mechanism is used for many body types, so is somewhat
 more complicated, it uses arena-sets, and has no need for still-live
 SV detection.
 
 At the outermost level, (new|del)_X*V macros return bodies of the
-appropriate type.  These macros call either (new|del)_body_type or
-(new|del)_body_allocated macro pairs, depending on specifics of the
-type.  Most body types use the former pair, the latter pair is used to
-allocate body types with "ghost fields".
+appropriate type.  The C<new_*> macros call either C<S_new_body> or use
+C<new_body_inline> directly based on performance considerations.
+In both cases, C<S_new_body> implictly and for C<new_body_inline> explictly
+the body pointer is backed up based on whether the type has "ghost fields"
+and size of all of them. To delete a body, if it is allocated in the first
+place, C<del_body> is called. PURIFY builds do something else that this
+paragraph didn't cover.
 
 "ghost fields" are fields that are unused in certain types, and
 consequently don't need to actually exist.  They are declared because
-they're part of a "base type", which allows use of functions as
-methods.  The simplest examples are AVs and HVs, 2 aggregate types
-which don't use the fields which support SCALAR semantics.
-
-For these types, the arenas are carved up into appropriately sized
+they're part of a "base type", which allows use the same macros, and therefore
+internally the same C/assembly code struct offset to be used to read the stuct
+member regardless of low or high (or simple or fancy) the particular body type
+is.  For an example, lets look at IVs, RVs (a type of IV now) and PVs. These
+3 types aren't blessed, or magical, so they do not use, and therefore do not
+need the fields which support (and implement) blessed (an HV * that represent
+the class) and magic (the MG *) semantics.
+
+For types with ghost fields, the arenas are carved up into appropriately sized
 chunks, we thus avoid wasted memory for those unaccessed members.
 When bodies are allocated, we adjust the pointer back in memory by the
 size of the part not allocated, so it's as if we allocated the full
@@ -825,6 +838,8 @@ structure.  (But things will all go boom if you write to the part that
 is "not there", because you'll be overwriting the last members of the
 preceding structure in memory.)
 
+A long time ago in Perl's history, every body struct type had the NV as
+the first member, now the HV * that indicated a blessed object is first.
 We calculate the correction using the STRUCT_OFFSET macro on the first
 member present.  If the allocated structure is smaller (no initial NV
 actually allocated) then the net effect is to subtract the size of the NV
@@ -836,16 +851,17 @@ optimised accesses based on the alignment constraint of the actual pointer
 to the full structure, for example, using a single 64 bit load instruction
 because it "knows" that two adjacent 32 bit members will be 8-byte aligned.)
 
-This is the same trick as was used for NV and IV bodies.  Ironically it
-doesn't need to be used for NV bodies any more, because NV is now at
-the start of the structure.  IV bodies, and also in some builds NV bodies,
-don't need it either, because they are no longer allocated.
+This is the same trick as was used for NV and IV bodies. Over time, more types
+got ghost fields through subsequent refactorings. Also eventually, IV
+(and RVs which are implemented as IVs) lost their arena-ed and allocated bodies
+completly. A 4th member was added to the SV head. This 4th member has multiple
+uses now based on type, but for IVs, RVs, and also in some builds NV bodies,
+the body pointer is a corrected value to the 4th member of the head. No arena,
+no malloc.
 
-In turn, the new_body_* allocators call S_new_body(), which invokes
-new_body_inline macro, which takes a lock, and takes a body off the
-linked list at PL_body_roots[sv_type], calling Perl_more_bodies() if
-necessary to refresh an empty list.  Then the lock is released, and
-the body is returned.
+S_new_body(), which invokes new_body_inline macro. The new_body_inline macro
+takes a body off the linked list at PL_body_roots[sv_type], calling Perl_more_bodies() if
+necessary to refresh an empty list, and then the body is returned.
 
 Perl_more_bodies allocates a new arena, and carves it up into an array of N
 bodies, which it strings into a linked list.  It looks up arena-size
@@ -853,7 +869,8 @@ and body-size from the body_details table described below, thus
 supporting the multiple body-types.
 
 If PURIFY is defined, or PERL_ARENA_SIZE=0, arenas are not used, and
-the (new|del)_X*V macros are mapped directly to malloc/free.
+the (new|del)_X*V macros are mapped directly to malloc/free.  Macro
+new_body_inline is never called on a PURIFY build.
 
 For each sv-type, struct body_details bodies_by_type[] carries
 parameters which control these aspects of SV handling:
@@ -1023,10 +1040,6 @@ static const struct body_details bodies_by_type[] = {
       FIT_ARENA(24, sizeof(XPVIO)) },
 };
 
-#define new_body_allocated(sv_type)		\
-    (void *)((char *)S_new_body(aTHX_ sv_type)	\
-	     - bodies_by_type[sv_type].offset)
-
 /* return a thing to the free list */
 
 #define del_body(thing, root)				\
@@ -1038,7 +1051,7 @@ static const struct body_details bodies_by_type[] = {
 
 #ifdef PURIFY
 #if !(NVSIZE <= IVSIZE)
-#  define new_XNV()	safemalloc(sizeof(XPVNV))
+#  define new_XNV(x)	((x) = safemalloc(sizeof(XPVNV)))
 #endif
 #define new_XPVNV()	safemalloc(sizeof(XPVNV))
 #define new_XPVMG()	safemalloc(sizeof(XPVMG))
@@ -1046,12 +1059,20 @@ static const struct body_details bodies_by_type[] = {
 #define del_XPVGV(p)	safefree(p)
 
 #else /* !PURIFY */
-
+/* this is the only of the 3 new_* macros to be used in sv_upgrade which is hot
+   so use inline macro version */
 #if !(NVSIZE <= IVSIZE)
-#  define new_XNV()	new_body_allocated(SVt_NV)
+#  ifdef DEBUGGING
+#    define new_XNV(x) x = 	S_new_body(aTHX_ SVt_NV)
+#  else
+#    define new_XNV(x)							\
+    STMT_START { new_body_inline(x, SVt_NV);				\
+	x = (void *)((char *)x - bodies_by_type[SVt_NV].offset);	\
+    } STMT_END
+#  endif
 #endif
-#define new_XPVNV()	new_body_allocated(SVt_PVNV)
-#define new_XPVMG()	new_body_allocated(SVt_PVMG)
+#define new_XPVNV()	S_new_body(aTHX_ SVt_PVNV)
+#define new_XPVMG()	S_new_body(aTHX_ SVt_PVMG)
 
 #define del_XPVGV(p)	del_body(p + bodies_by_type[SVt_PVGV].offset,	\
 				 &PL_body_roots[SVt_PVGV])
@@ -1162,6 +1183,10 @@ Perl_more_bodies (pTHX_ const svtype sv_type, const size_t body_size,
    The inline version is used for speed in hot routines, and the
    function using it serves the rest (unless PURIFY).
 */
+
+#ifndef PURIFY
+
+#ifndef DEBUGGING
 #define new_body_inline(xpv, sv_type) \
     STMT_START { \
 	void ** const r3wt = &PL_body_roots[sv_type]; \
@@ -1171,14 +1196,27 @@ Perl_more_bodies (pTHX_ const svtype sv_type, const size_t body_size,
 					     bodies_by_type[sv_type].arena_size)); \
 	*(r3wt) = *(void**)(xpv); \
     } STMT_END
-
-#ifndef PURIFY
+#else
+#define new_body_inline(xpv, sv_type) S_new_body_inline(aTHX_ &(xpv), sv_type)
+void
+S_new_body_inline(pTHX_ void ** xpvp, const svtype sv_type) {
+	void * xpv;
+	void ** const r3wt = &PL_body_roots[sv_type];
+	xpv = (PTR_TBL_ENT_t*) (*((void **)(r3wt))
+	  ? *((void **)(r3wt)) : Perl_more_bodies(aTHX_ sv_type,
+					     bodies_by_type[sv_type].body_size,
+					     bodies_by_type[sv_type].arena_size));
+	*(r3wt) = *(void**)(xpv);
+        *xpvp = xpv;
+    }
+#endif
 
 STATIC void *
 S_new_body(pTHX_ const svtype sv_type)
 {
     void *xpv;
     new_body_inline(xpv, sv_type);
+    xpv = (void *)((char *)xpv - bodies_by_type[sv_type].offset);
     return xpv;
 }
 
@@ -1225,6 +1263,22 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type)
        no longer need to unshare so as to free up the IVX slot for its proper
        purpose. So it's safe to move the early return earlier.  */
 
+#ifdef DEBUGGING /* sanity check for U32PBYTE0 below */
+{   U32 assert_low = SVTYPEMASK;	U32 assert_hi = 0xFF000000;
+    PERL_UNUSED_ARG(assert_low);	PERL_UNUSED_ARG(assert_hi);
+    assert(SVTYPEMASK == 0xff
+           && *U32PBYTE0(&assert_low) == SVTYPEMASK
+           && (*U32PBYTE0(&assert_low))         + (*U32PBYTE1(&assert_low) << 8)
+              + (*U32PBYTE2(&assert_low) << 16) + (*U32PBYTE3(&assert_low) << 24)
+                == SVTYPEMASK
+            && *U32PBYTE3(&assert_hi) == 0xFF
+            && (*U32PBYTE0(&assert_hi))         + (*U32PBYTE1(&assert_hi) << 8)
+                + (*U32PBYTE2(&assert_hi) << 16)+ (*U32PBYTE3(&assert_hi) << 24)
+                == 0xFF000000
+            );
+}
+#endif
+
     if (new_type > SVt_PVMG && SvIsCOW(sv)) {
 	sv_force_normal_flags(sv, 0);
     }
@@ -1321,8 +1375,7 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type)
 
     new_type_details = bodies_by_type + new_type;
 
-    SvFLAGS(sv) &= ~SVTYPEMASK;
-    SvFLAGS(sv) |= new_type;
+    *U32PBYTE0(&SvFLAGS(sv)) = new_type;
 
     /* This can't happen, as SVt_NULL is <= all values of new_type, so one of
        the return statements above will have triggered.  */
@@ -1338,7 +1391,7 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type)
 #if NVSIZE <= IVSIZE
 	SvANY(sv) = (XPVNV*)((char*)&(sv->sv_u.svu_nv) - STRUCT_OFFSET(XPVNV, xnv_u.xnv_nv));
 #else
-	SvANY(sv) = new_XNV();
+	new_XNV(SvANY(sv));
 #endif
 	SvNV_set(sv, 0);
 	return;
@@ -1418,10 +1471,14 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type)
 	/* We always allocated the full length item with PURIFY. To do this
 	   we fake things so that arena is false for all 16 types..  */
 	if(new_type_details->arena) {
+#ifndef PURIFY
 	    /* This points to the start of the allocated area.  */
 	    new_body_inline(new_body, new_type);
 	    Zero(new_body, new_type_details->body_size, char);
 	    new_body = ((char *)new_body) - new_type_details->offset;
+#else
+	    assert(0);
+#endif
 	} else {
 	    new_body = new_NOARENAZ(new_type_details);
 	}
@@ -13249,7 +13306,7 @@ S_sv_dup_common(pTHX_ const SV *const sstr, CLONE_PARAMS *const param)
 #if NVSIZE <= IVSIZE
 	SvANY(dstr) = (XPVNV*)((char*)&(dstr->sv_u.svu_nv) - STRUCT_OFFSET(XPVNV, xnv_u.xnv_nv));
 #else
-	SvANY(dstr)	= new_XNV();
+	new_XNV(SvANY(dstr));
 #endif
 	SvNV_set(dstr, SvNVX(sstr));
 	break;
@@ -13281,9 +13338,13 @@ S_sv_dup_common(pTHX_ const SV *const sstr, CLONE_PARAMS *const param)
 	    case SVt_PV:
 		assert(sv_type_details->body_size);
 		if (sv_type_details->arena) {
+#ifndef PURIFY
 		    new_body_inline(new_body, sv_type);
 		    new_body
 			= (void*)((char*)new_body - sv_type_details->offset);
+#else
+		    assert(0);
+#endif
 		} else {
 		    new_body = new_NOARENA(sv_type_details);
 		}
-- 
1.7.9.msysgit.0

@p5pRT
Copy link
Author

p5pRT commented Oct 2, 2014

From @doughera88

On Tue, Sep 30, 2014 at 03​:34​:00PM -0700, bulk88 wrote​:

# New Ticket Created by bulk88
# Please include the string​: [perl #122872]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=122872 >

This is a bug report for perl from bulk88@​hotmail.com,
generated with the help of perlbug 1.40 running under perl 5.21.4.

From 7e4334762f61048c329fe2d50bf0c62d85648e8e Mon Sep 17 00​:00​:00 2001
From​: Daniel Dragan <bulk88@​hotmail.com>
Date​: Tue, 30 Sep 2014 18​:30​:26 -0400
Subject​: [PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add
U32PBYTE macros

Change setting of SV's new type in sv_flags directly using 1 byte
assignment to avoid a data dependency on a load, atleast on software level.
Instead of load, AND, OR, store machine code ops. Just do store. All x86
and ARM CPUs, and other Perl CPUs have 1 byte memory reads/writes execept
early Alphas CPUs. Let the compiler and CPU designers figure out how to
read/write 1 byte in a 32/64/128 bit cpu. Make an assert that catches
breakage from sv_flags changes, and bizzare platforms.

I understand the idea, and I agree with the general idea of being
explicit in intent and relying on the compiler to translate it to hardware
instructions. However, this patch includes the following tweak to sv.c​:

  - SvFLAGS(sv) &= ~SVTYPEMASK;
  - SvFLAGS(sv) |= new_type;
  + *U32PBYTE0(&SvFLAGS(sv)) = new_type;

I have to admit that I actually find the old way easier to read.

Further, both on gcc-4.7.2 on Linux/amd64, and Solaris CC on SPARC,
the old way actually gave a slightly smaller object file. I would not
be at all surprised if the reverse were true on other systems. I did
not test speed at all.

--
  Andy Dougherty doughera@​lafayette.edu

@p5pRT
Copy link
Author

p5pRT commented Oct 2, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Oct 2, 2014

From @jhi

I have to agree with Andy. The new code is harder to understand.

And I think the new code goes exactly against the stated goal...

"Let the compiler and CPU designers figure out how to read/write 1 byte in
a 32/64/128 bit cpu"

As far as I can tell the patch is microoptimizing for the speed of one
particular CPU (and one particular compiler). And while doing that,
pessimizing for readability, and all other CPU/compiler combos.

On Thu, Oct 2, 2014 at 5​:57 PM, Andy Dougherty <doughera@​lafayette.edu>
wrote​:

On Tue, Sep 30, 2014 at 03​:34​:00PM -0700, bulk88 wrote​:

# New Ticket Created by bulk88
# Please include the string​: [perl #122872]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=122872 >

This is a bug report for perl from bulk88@​hotmail.com,
generated with the help of perlbug 1.40 running under perl 5.21.4.

From 7e4334762f61048c329fe2d50bf0c62d85648e8e Mon Sep 17 00​:00​:00 2001
From​: Daniel Dragan <bulk88@​hotmail.com>
Date​: Tue, 30 Sep 2014 18​:30​:26 -0400
Subject​: [PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add
U32PBYTE macros

Change setting of SV's new type in sv_flags directly using 1 byte
assignment to avoid a data dependency on a load, atleast on software
level.
Instead of load, AND, OR, store machine code ops. Just do store. All x86
and ARM CPUs, and other Perl CPUs have 1 byte memory reads/writes execept
early Alphas CPUs. Let the compiler and CPU designers figure out how to
read/write 1 byte in a 32/64/128 bit cpu. Make an assert that catches
breakage from sv_flags changes, and bizzare platforms.

I understand the idea, and I agree with the general idea of being
explicit in intent and relying on the compiler to translate it to hardware
instructions. However, this patch includes the following tweak to sv.c​:

\-    SvFLAGS\(sv\) &= ~SVTYPEMASK;
\-    SvFLAGS\(sv\) |= new\_type;
\+    \*U32PBYTE0\(&SvFLAGS\(sv\)\) = new\_type;

I have to admit that I actually find the old way easier to read.

Further, both on gcc-4.7.2 on Linux/amd64, and Solaris CC on SPARC,
the old way actually gave a slightly smaller object file. I would not
be at all surprised if the reverse were true on other systems. I did
not test speed at all.

--
Andy Dougherty doughera@​lafayette.edu

--
There is this special biologist word we use for 'stable'. It is 'dead'. --
Jack Cohen

@p5pRT
Copy link
Author

p5pRT commented Oct 2, 2014

From @bulk88

On Thu Oct 02 08​:58​:11 2014, doughera wrote​:

I understand the idea, and I agree with the general idea of being
explicit in intent and relying on the compiler to translate it to
hardware
instructions. However, this patch includes the following tweak to
sv.c​:

- SvFLAGS(sv) &= ~SVTYPEMASK;
- SvFLAGS(sv) |= new_type;
+ *U32PBYTE0(&SvFLAGS(sv)) = new_type;

I have to admit that I actually find the old way easier to read.

How many times a year does someone read it? I could add a comment.

Further, both on gcc-4.7.2 on Linux/amd64, and Solaris CC on SPARC,
the old way actually gave a slightly smaller object file. I would not
be at all surprised if the reverse were true on other systems.

Can you attach before and after .o'es or ELFs/exec binaries?

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Oct 2, 2014

From @bulk88

On Thu Oct 02 15​:28​:09 2014, bulk88 wrote​:

On Thu Oct 02 08​:58​:11 2014, doughera wrote​:

I understand the idea, and I agree with the general idea of being
explicit in intent and relying on the compiler to translate it to
hardware
instructions. However, this patch includes the following tweak to
sv.c​:

- SvFLAGS(sv) &= ~SVTYPEMASK;
- SvFLAGS(sv) |= new_type;
+ *U32PBYTE0(&SvFLAGS(sv)) = new_type;

I have to admit that I actually find the old way easier to read.

How many times a year does someone read it? I could add a comment.

Further, both on gcc-4.7.2 on Linux/amd64, and Solaris CC on SPARC,
the old way actually gave a slightly smaller object file. I would not
be at all surprised if the reverse were true on other systems.

Can you attach before and after .o'es or ELFs/exec binaries?

Forgot to mention, machine code size wasn't exactly the point of this commit, that is why I didn't collect it. I inlined something that was previously a function call on some platforms (where SVt_NV has an arena body). Making SV upgrading to SVt_NV not have additional function calls on the most common path of upgrading to SVt_NV was the main point of the commit. " *U32PBYTE0(&SvFLAGS(sv)) = new_type;" was atonement for the inlining, plus it was done before the inlining/API/DOCs cleaning up was finished. The U32PBYTE0 compiled to


  *U32PBYTE0(&SvFLAGS(sv)) = new_type;


mov eax, [ebp+new_type]; means read C stack var new_type
;removed irrelevant instructions for the upcoming switch()
mov [esi+8], al ;register al is lowest byte of reg eax, esi is SV*


previously on VC 2003 the code was


  SvFLAGS(sv) &= ~SVTYPEMASK;
  SvFLAGS(sv) |= new_type;


mov eax, [ebp+new_type]
;removed
mov byte ptr [esi+8], 0 ;esi is SV*
;removed
or [esi+8], eax


Mingw GCC Strawberry 5.18 -O2, old way


mov esi, [esp+7Ch+new_type]
;removed, lots of branching in here
mov edx, [ebx+8]
;removed, lots of branching in here
;removed
mov eax, edx ;move flags to eax
xor al, al ;zero the register
or eax, esi ;note GCC is a RISC compiler at its heart, and wont use CISC style instructions on x86
;removed
mov [ebx+8], eax ;write to SV head


Should I produce ARM or PARISC binaries to prove my point?

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Oct 2, 2014

From @bulk88

On Thu Oct 02 16​:28​:54 2014, bulk88 wrote​:

Should I produce ARM or PARISC binaries to prove my point?

PARISC old way

ldw 8(%r4), %r31 ; read 32 bits at addr reg r4+8 into reg r32
depwi 0, 31, 8, %r31 ;"deposit word immediate", write 0 into bits 0 to 8 of reg r32?? confusing description in pdf
or %r6, %r31, %rp ;reg r6 has new_type in it, r31 is masked sv_flags, write result into reg rp
stw %rp, 8(%r4);write reg rp to sv head at addr reg r4+8

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Oct 3, 2014

From @doughera88

On Thu, Oct 02, 2014 at 03​:28​:09PM -0700, bulk88 via RT wrote​:

On Thu Oct 02 08​:58​:11 2014, doughera wrote​:

I understand the idea, and I agree with the general idea of being
explicit in intent and relying on the compiler to translate it to
hardware
instructions. However, this patch includes the following tweak to
sv.c​:

- SvFLAGS(sv) &= ~SVTYPEMASK;
- SvFLAGS(sv) |= new_type;
+ *U32PBYTE0(&SvFLAGS(sv)) = new_type;

I have to admit that I actually find the old way easier to read.

How many times a year does someone read it? I could add a comment.

I would still prefer the older way. Among other things, the U32PBYTE0
approach assumes that new_type will always be stored in byte 0
of a U32-sized structure, and that SVTYPEMASK will always be 0xff.
(Those are probably reasonable assumptions, but it did slow me down to
go and check them.)

Further, both on gcc-4.7.2 on Linux/amd64, and Solaris CC on SPARC,
the old way actually gave a slightly smaller object file. I would not
be at all surprised if the reverse were true on other systems.

Can you attach before and after .o'es or ELFs/exec binaries?

I don't have them handy, and regenerating them would take a bit of effort
(the system where I tested is currently offline). Here's the short
summary​: the simple one-line diff in sv.c​:

  - SvFLAGS(sv) &= ~SVTYPEMASK; SvFLAGS(sv) |= new_type;
  + *U32PBYTE0(&SvFLAGS(sv)) = new_type;

led to over 1300 non-cosmetic changed lines in the assembly output file.
(I removed trivial things like different label numbers.) Yes, it is
true that the SPARC assembler has an 'stb' directive to store a single
byte, but changing just that one line of C code caused the compiler to
change lots of other things as well, such as which registers were used in
what order and for what purpose. The net change in size was tiny, and,
as I observed, I wouldn't be at all surprised if the change was in the
opposite direction on other systems.

Absent any data indicating it's a meaningful improvement in size or
speed, I was merely registering my preference to keep it the way it was.

--
  Andy Dougherty doughera@​lafayette.edu

@p5pRT
Copy link
Author

p5pRT commented Oct 4, 2014

From @khwilliamson

On 10/03/2014 12​:12 PM, Andy Dougherty wrote​:

How many times a year does someone read it? I could add a comment.
I would still prefer the older way. Among other things, the U32PBYTE0
approach assumes that new_type will always be stored in byte 0
of a U32-sized structure, and that SVTYPEMASK will always be 0xff.
(Those are probably reasonable assumptions, but it did slow me down to
go and check them.)

I agree with Andy here.

As Jarkko said somewhat recently, older and wiser than before, he should
all along have been coding with maintainability as the first priority.

@p5pRT
Copy link
Author

p5pRT commented Oct 4, 2014

From @bulk88

On Fri Oct 03 11​:13​:05 2014, doughera wrote​:

I would still prefer the older way. Among other things, the U32PBYTE0
approach assumes that new_type will always be stored in byte 0
of a U32-sized structure, and that SVTYPEMASK will always be 0xff.
(Those are probably reasonable assumptions, but it did slow me down to
go and check them.)

I will revise it so the internals of how to assign a type to an SV head is hidden and move the assert to somewhere else.

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Oct 11, 2014

From @bulk88

On Sat Oct 04 00​:58​:36 2014, bulk88 wrote​:

I will revise it so the internals of how to assign a type to an SV
head is hidden and move the assert to somewhere else.

New patch attached. Details of setting the type of an SV were placed behind an api/macro.

On ARM, setting SV type was 4 instructions previously, now it is 1 instruction after the patch.
The ARM code from EVC 4 -01 this area of code, I am not sure if
"new_type_details = bodies_by_type + new_type;" is in the branch or optimizer moved it somewhere else


  new_type_details = bodies_by_type + new_type;

  SvTYPE_set_mem(sv, new_type);

  /* This can't happen, as SVt_NULL is <= all values of new_type, so one of
  the return statements above will have triggered. */
  assert (new_type != SVt_NULL);
  switch (new_type) {
  case SVt_IV​:


*/star next to instruction address means sv type setting related instructions

old


280CAFF0 MOV LR, #0 ; U32 null = 0x00000000;
.........
;start of branch
280CB0E4* STRB LR, [R11,#8] ; *(U8*)(R11+sv_flags) = (U8)null;
280CB0E8 MOV R3, R7,LSL#3
280CB0EC* LDR R0, [R11,#8] ; U32 old_sv_flags = *(U32*)(R11+sv_flags);
280CB0F0 SUB R2, R7, #1
280CB0F4 CMP R2, #0xE
280CB0F8* ORR R1, R0, R7 ; new_sv_flags = old_sv_flags | new_type;
280CB0FC ADD R6, R3, R8
280CB100* STR R1, [R11,#8] ; *(U32*)(R11+sv_flags) = new_sv_flags;
280CB104 BHI loc_280CB428 ; jump if higher than unsigned
280CB108 MOV R0, R2,LSL#1
280CB10C ADD R0, R0, PC
280CB110 LDRH R0, [R0,#4] ; R0 = *(U16*)(R0+4);
280CB114 ADD PC, PC, R0 ;instruction_pointer += R0; //jump table
280CB118 ; an inline in machine code U16 array lives here for the jump table
.............
280CB428 loc_280CB428​:
280CB428 LDR R0, ="panic​: sv_upgrade to unknown type %lu"
280CB42C MOV R1, R7 ; arg_2 = new_type;
280CB430 BL Perl_croak


new


;start of branch
280CB0EC MOV R2, R7,LSL#3
280CB0F0* STRB R7, [R11,#8] ; store byte in R7(new_type) to *(U8*)(R11+sv_flags)
280CB0F4 SUB R1, R7, #1
280CB0F8 AND R0, R7, #0xFF ; PURPOSE UNKNOWN, not in old asm, R0's val never used again compiler bug?
280CB0FC ADD R6, R2, R8
280CB100 CMP R1, #0xE
280CB104 BHI loc_280CB44C ;jump if higher than unsigned
280CB108 MOV R0, R1,LSL#1 ; R0 is wiped
280CB10C ADD R0, R0, PC
280CB110 LDRH R0, [R0,#4] ; R0 = *(U16*)(R0+4);
280CB114 ADD PC, PC, R0 ; instruction_pointer += R0; //jump table
280CB118 ; an inline in machine code U16 array lives here for the jump table
.............
280CB44C loc_280CB44C​:
280CB44C LDR R0, ="panic​: sv_upgrade to unknown type %lu" ; R0 is wiped
280CB450 MOV R1, R7 ; arg_2 = new_type;
280CB454 BL Perl_croak


In any case, how is STRB implemented on ARM? Obviously I dont have access to a commercial ARM core's source code, so a FOSS ARM CPU must do.


assign store_op = mem_op && !instruction[20];
................................................
  // Load & Store instructions
  if ( mem_op )
  begin
  saved_current_instruction_wen = 1'd1; // Save the memory access instruction to refer back to later
  pc_wen_nxt = 1'd0; // hold current PC value
  data_access_exec_nxt = 1'd1; // indicate that its a data read or write,
  // rather than an instruction fetch
  alu_out_sel_nxt = 4'd1; // Add
 
  if ( !instruction[23] ) // U​: Subtract offset
  begin
  alu_cin_sel_nxt = 2'd1; // cin = 1
  alu_not_sel_nxt = 1'd1; // invert B
  end
//bulk88 says bit 20 is difference between load and store, that was set earlier
  if ( store_op )
  begin
  write_data_wen_nxt = 1'd1;
//bulk88 says bit 22 in ARM op is byte or 32 bit flag, if 1, then this is byte ARM op
//bulk88 says TRANS the ARM instruction prefix that says this is a load/store class op
//bulk88 says STR, STRB, LDR, LDRB all have TRANS prefix
  if ( type == TRANS && instruction[22] )
  byte_enable_sel_nxt = 2'd1; // Save byte
  end
 
  // need to update the register holding the address ?
  // This is Rn bits [19​:16]
//bulk88 says pre-indexed/post-indexed is a feature where in a load/store addr reg operand is changed during the op
//bulk88 says this makes incrementing a array pointer during a loop be part of the array deref op
//bulk88 says can be "reg_dst = *(reg_src = reg_src+const_offset)"
//bulk88 says can be "reg_dst = *((reg_invisible = reg_src), (reg_src = reg_src+const_offset), (reg_invisible))"
//bulk88 says or normal idea of a move "reg_dst = *(reg_src+const_offset)"
  if ( mem_op_pre_indexed || mem_op_post_indexed )
  begin
  // Check is the load destination is the PC
  if ( o_rn_sel_nxt == 4'd15 )
  pc_sel_nxt = 2'd1;
  else
  reg_bank_wsel_nxt = o_rn_sel_nxt;
  end
 
  // if post-indexed, then use Rn rather than ALU output, as address
  if ( mem_op_post_indexed )
  address_sel_nxt = 4'd4; // Rn
  else
  address_sel_nxt = 4'd1; // alu out
 
  if ( instruction[25] && type == TRANS )
  barrel_shift_data_sel_nxt = 2'd2; // Shift value from Rm register
 
  if ( type == TRANS && instruction[25] && shift_imm != 5'd0 )
  begin
  barrel_shift_function_nxt = instruction[6​:5];
  barrel_shift_amount_sel_nxt = 2'd2; // imm_shift_amount
  end
  end
.....................................
wire [31​:0] write_data_word;
......................................
input [3​:0] i_byte_enable,
......................................
wire [CACHE_LINE_WIDTH-1​:0] write_hit_wdata;
......................................
wire [31​:0] rd;
......................................
// bulk88 says, this takes lowest 2 bits for U8s, or 2nd to lowest bit for U16s, of
// the output address, and picks a concat opcode to use later
// Note, this statement supports U16 internal to CPU opcode, but this CPU doesn't
// support LDRH or STRH machine code instructions since
// this is a ARM V2 CPU, U16 support was added in ARM V4. Orig comments ahead.
// ========================================================
// Byte Enable Select
// ========================================================
assign byte_enable_nxt = i_byte_enable_sel == 2'd0 ? 4'b1111 : // word write
  i_byte_enable_sel == 2'd2 ? // halfword write
  ( o_address_nxt[1] == 1'd0 ? 4'b0011 :
  4'b1100 ) :
 
  o_address_nxt[1​:0] == 2'd0 ? 4'b0001 : // byte write
  o_address_nxt[1​:0] == 2'd1 ? 4'b0010 :
  o_address_nxt[1​:0] == 2'd2 ? 4'b0100 :
  4'b1000 ;

//bulk88 says this statment says, if its a byte op, take lowest byte, copy it to all 4 bytes of the word
//bulk88 says this later allows the concating to work, in that no matter which byte is selected of the 4
//bulk88 says bytes to be spliced into the output U32 word, they will all be the same, note this statment
//bulk88 says is not short/U16 compatible
// ========================================================
// Write Data Select
// ========================================================
assign write_data_nxt = i_byte_enable_sel == 2'd0 ? rd :
  {4{rd[ 7​:0]}} ;
......................................
//bulk88 says {} is concat operator, this expression picks then concats/fuses
//the U8s, U16s or U32 that make up the output U32, using the opcode from above
//NOTICE it does not use "&" then "|" to get the job done the way Perl was doing it. Orig comments ahead.
// Use Byte Enables
assign write_data_word = i_byte_enable == 4'b0001 ? { o_read_data[31​: 8], i_write_data[ 7​: 0] } :
  i_byte_enable == 4'b0010 ? { o_read_data[31​:16], i_write_data[15​: 8], o_read_data[ 7​:0]} :
  i_byte_enable == 4'b0100 ? { o_read_data[31​:24], i_write_data[23​:16], o_read_data[15​:0]} :
  i_byte_enable == 4'b1000 ? { i_write_data[31​:24], o_read_data[23​:0]} :
  i_byte_enable == 4'b0011 ? { o_read_data[31​:16], i_write_data[15​: 0] } :
  i_byte_enable == 4'b1100 ? { i_write_data[31​:16], o_read_data[15​:0]} :
  i_write_data ;
.....................................
assign write_hit_wdata = i_address[3​:2] == 2'd0 ? {hit_rdata[127​:32], write_data_word } :
  i_address[3​:2] == 2'd1 ? {hit_rdata[127​:64], write_data_word, hit_rdata[31​:0] } :
  i_address[3​:2] == 2'd2 ? {hit_rdata[127​:96], write_data_word, hit_rdata[63​:0] } :
  { write_data_word, hit_rdata[95​:0] } ;
.....................................
  // Data comes in off the WB bus in wrap4 with the missed data word first
assign data_wdata = write_hit && c_state == CS_IDLE ? write_hit_wdata : read_miss_wdata;


So "STRB" is implemented in hardware, not extra opcodes.
algorithm is if byte write, take src register, take low byte, copy to bytes 1-3 of src register, take address,
use low 2 bits to figure out alignment, then splice as 2 32 bit values, then splice 32 bit value into 16 byte cache line

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Oct 11, 2014

From @bulk88

0001-fix-SV-body-docs-in-sv.c-PURIFY-cleanup-body-API-add.patch
From c0f659f700ee7ebebc7fca6196cc09be44d8cab3 Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Fri, 10 Oct 2014 19:14:20 -0400
Subject: [PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add
 U32PBYTE macros

Change setting of SV's new type in sv_flags directly using 1 byte
assignment to avoid a data dependency on a load, atleast on software level.
Instead of load, AND, OR, store machine code ops. Just do store. All x86
and ARM CPUs, and other Perl CPUs have 1 byte memory reads/writes except
early Alphas CPUs. Let the compiler and CPU designers figure out how to
read/write 1 byte in a 32/64/128 bit cpu. Make an assert in
Perl_init_constants (to not slow down DEBUGGING builds too much) that
catches breakage from sv_flags changes, and bizzare platforms.

Docs are very out of date, bring them them into the present. Docs describe
APIs which are nearly or are gone from the core, so remove the references
to those APIs.

Change new_body_inline to be dbg stepable in DEBUGGUING builds (similar to
purpose of S_del_sv). Keep macro version for performance in
non-DEBUGGING. In sv_upgrade remove all uses of S_new_body so
sv_upgrade has less function calls, or is function call free in some
permutations. This required changing proto of macro new_XNV which is
not public API. Factor out the ghost fields subtraction into S_new_body
instead of subtraction happening in callers of S_new_body. S_new_body is
the "slow" version of new_body_inline.

Make -DPURIFY compile and work on Win32 (and presumably other platforms),
although I dont have access to PURIFY tools. Make new_body_inline token
undefined on PURIFY. Put asserts to make sure we never try to allocate an
arena body with PURIFY and therefore whether the body_details table is
correctly setup for PURIFY builds.
---
 perl.h |   10 +++++
 sv.c   |  134 +++++++++++++++++++++++++++++++++++++++++++++++----------------
 sv.h   |    8 ++++
 3 files changed, 118 insertions(+), 34 deletions(-)

diff --git a/perl.h b/perl.h
index 0bd8a53..95498fc 100644
--- a/perl.h
+++ b/perl.h
@@ -3808,6 +3808,12 @@ my_swap16(const U16 x) {
 #  define vtohs(x)      ((x)&0xFFFF)
 #  define htovl(x)      vtohl(x)
 #  define htovs(x)      vtohs(x)
+/* U32PBYTE*(x) is a macro that splits a U32 * into mutable U8 *s which go from
+   LSB (byte 0) to MSB (byte 3), these macros account for platform endianess */
+#  define U32PBYTE0(x) ((U8*)((Size_t)(x)+0))
+#  define U32PBYTE1(x) ((U8*)((Size_t)(x)+1))
+#  define U32PBYTE2(x) ((U8*)((Size_t)(x)+2))
+#  define U32PBYTE3(x) ((U8*)((Size_t)(x)+3))
 #elif BYTEORDER == 0x4321 || BYTEORDER == 0x87654321
 #  define vtohl(x)	((((x)&0xFF)<<24)	\
 			+(((x)>>24)&0xFF)	\
@@ -3816,6 +3822,10 @@ my_swap16(const U16 x) {
 #  define vtohs(x)	((((x)&0xFF)<<8) + (((x)>>8)&0xFF))
 #  define htovl(x)	vtohl(x)
 #  define htovs(x)	vtohs(x)
+#  define U32PBYTE0(x) ((U8*)((Size_t)(x)+3))
+#  define U32PBYTE1(x) ((U8*)((Size_t)(x)+2))
+#  define U32PBYTE2(x) ((U8*)((Size_t)(x)+1))
+#  define U32PBYTE3(x) ((U8*)((Size_t)(x)+0))
 #else
 #  error "Unsupported byteorder"
 /* If you have need for current perl on PDP-11 or similar, and can help test
diff --git a/sv.c b/sv.c
index 665a0f6..b182c8b 100644
--- a/sv.c
+++ b/sv.c
@@ -800,24 +800,37 @@ Perl_sv_free_arenas(pTHX)
 
 =cut
 
+WARNING: Every aspect of the internals of SV bodies has repeatedly changed
+over the course of Perl's history. This section describes the current
+implementation, not past implementations. Historically this section often
+became out of sync with the code that deals with SV bodys. The source code is
+authoritative, not this document.
+
 Allocation of SV-bodies is similar to SV-heads, differing as follows;
 the allocation mechanism is used for many body types, so is somewhat
 more complicated, it uses arena-sets, and has no need for still-live
 SV detection.
 
 At the outermost level, (new|del)_X*V macros return bodies of the
-appropriate type.  These macros call either (new|del)_body_type or
-(new|del)_body_allocated macro pairs, depending on specifics of the
-type.  Most body types use the former pair, the latter pair is used to
-allocate body types with "ghost fields".
+appropriate type.  The C<new_*> macros call either C<S_new_body> or use
+C<new_body_inline> directly based on performance considerations.
+In both cases, C<S_new_body> implictly and for C<new_body_inline> explictly
+the body pointer is backed up based on whether the type has "ghost fields"
+and size of all of them. To delete a body, if it is allocated in the first
+place, C<del_body> is called. PURIFY builds do something else that this
+paragraph didn't cover.
 
 "ghost fields" are fields that are unused in certain types, and
 consequently don't need to actually exist.  They are declared because
-they're part of a "base type", which allows use of functions as
-methods.  The simplest examples are AVs and HVs, 2 aggregate types
-which don't use the fields which support SCALAR semantics.
-
-For these types, the arenas are carved up into appropriately sized
+they're part of a "base type", which allows use the same macros, and therefore
+internally the same C/assembly code struct offset to be used to read the stuct
+member regardless of low or high (or simple or fancy) the particular body type
+is.  For an example, lets look at IVs, RVs (a type of IV now) and PVs. These
+3 types aren't blessed, or magical, so they do not use, and therefore do not
+need the fields which support (and implement) blessed (an HV * that represent
+the class) and magic (the MG *) semantics.
+
+For types with ghost fields, the arenas are carved up into appropriately sized
 chunks, we thus avoid wasted memory for those unaccessed members.
 When bodies are allocated, we adjust the pointer back in memory by the
 size of the part not allocated, so it's as if we allocated the full
@@ -825,6 +838,8 @@ structure.  (But things will all go boom if you write to the part that
 is "not there", because you'll be overwriting the last members of the
 preceding structure in memory.)
 
+A long time ago in Perl's history, every body struct type had the NV as
+the first member, now the HV * that indicated a blessed object is first.
 We calculate the correction using the STRUCT_OFFSET macro on the first
 member present.  If the allocated structure is smaller (no initial NV
 actually allocated) then the net effect is to subtract the size of the NV
@@ -836,16 +851,17 @@ optimised accesses based on the alignment constraint of the actual pointer
 to the full structure, for example, using a single 64 bit load instruction
 because it "knows" that two adjacent 32 bit members will be 8-byte aligned.)
 
-This is the same trick as was used for NV and IV bodies.  Ironically it
-doesn't need to be used for NV bodies any more, because NV is now at
-the start of the structure.  IV bodies, and also in some builds NV bodies,
-don't need it either, because they are no longer allocated.
+This is the same trick as was used for NV and IV bodies. Over time, more types
+got ghost fields through subsequent refactorings. Also eventually, IV
+(and RVs which are implemented as IVs) lost their arena-ed and allocated bodies
+completly. A 4th member was added to the SV head. This 4th member has multiple
+uses now based on type, but for IVs, RVs, and also in some builds NV bodies,
+the body pointer is a corrected value to the 4th member of the head. No arena,
+no malloc.
 
-In turn, the new_body_* allocators call S_new_body(), which invokes
-new_body_inline macro, which takes a lock, and takes a body off the
-linked list at PL_body_roots[sv_type], calling Perl_more_bodies() if
-necessary to refresh an empty list.  Then the lock is released, and
-the body is returned.
+S_new_body(), which invokes new_body_inline macro. The new_body_inline macro
+takes a body off the linked list at PL_body_roots[sv_type], calling Perl_more_bodies() if
+necessary to refresh an empty list, and then the body is returned.
 
 Perl_more_bodies allocates a new arena, and carves it up into an array of N
 bodies, which it strings into a linked list.  It looks up arena-size
@@ -853,7 +869,8 @@ and body-size from the body_details table described below, thus
 supporting the multiple body-types.
 
 If PURIFY is defined, or PERL_ARENA_SIZE=0, arenas are not used, and
-the (new|del)_X*V macros are mapped directly to malloc/free.
+the (new|del)_X*V macros are mapped directly to malloc/free.  Macro
+new_body_inline is never called on a PURIFY build.
 
 For each sv-type, struct body_details bodies_by_type[] carries
 parameters which control these aspects of SV handling:
@@ -1023,10 +1040,6 @@ static const struct body_details bodies_by_type[] = {
       FIT_ARENA(24, sizeof(XPVIO)) },
 };
 
-#define new_body_allocated(sv_type)		\
-    (void *)((char *)S_new_body(aTHX_ sv_type)	\
-	     - bodies_by_type[sv_type].offset)
-
 /* return a thing to the free list */
 
 #define del_body(thing, root)				\
@@ -1038,7 +1051,7 @@ static const struct body_details bodies_by_type[] = {
 
 #ifdef PURIFY
 #if !(NVSIZE <= IVSIZE)
-#  define new_XNV()	safemalloc(sizeof(XPVNV))
+#  define new_XNV(x)	((x) = safemalloc(sizeof(XPVNV)))
 #endif
 #define new_XPVNV()	safemalloc(sizeof(XPVNV))
 #define new_XPVMG()	safemalloc(sizeof(XPVMG))
@@ -1046,12 +1059,20 @@ static const struct body_details bodies_by_type[] = {
 #define del_XPVGV(p)	safefree(p)
 
 #else /* !PURIFY */
-
+/* this is the only of the 3 new_* macros to be used in sv_upgrade which is hot
+   so use inline macro version */
 #if !(NVSIZE <= IVSIZE)
-#  define new_XNV()	new_body_allocated(SVt_NV)
+#  ifdef DEBUGGING
+#    define new_XNV(x) x = 	S_new_body(aTHX_ SVt_NV)
+#  else
+#    define new_XNV(x)							\
+    STMT_START { new_body_inline(x, SVt_NV);				\
+	x = (void *)((char *)x - bodies_by_type[SVt_NV].offset);	\
+    } STMT_END
+#  endif
 #endif
-#define new_XPVNV()	new_body_allocated(SVt_PVNV)
-#define new_XPVMG()	new_body_allocated(SVt_PVMG)
+#define new_XPVNV()	S_new_body(aTHX_ SVt_PVNV)
+#define new_XPVMG()	S_new_body(aTHX_ SVt_PVMG)
 
 #define del_XPVGV(p)	del_body(p + bodies_by_type[SVt_PVGV].offset,	\
 				 &PL_body_roots[SVt_PVGV])
@@ -1162,6 +1183,10 @@ Perl_more_bodies (pTHX_ const svtype sv_type, const size_t body_size,
    The inline version is used for speed in hot routines, and the
    function using it serves the rest (unless PURIFY).
 */
+
+#ifndef PURIFY
+
+#ifndef DEBUGGING
 #define new_body_inline(xpv, sv_type) \
     STMT_START { \
 	void ** const r3wt = &PL_body_roots[sv_type]; \
@@ -1171,14 +1196,27 @@ Perl_more_bodies (pTHX_ const svtype sv_type, const size_t body_size,
 					     bodies_by_type[sv_type].arena_size)); \
 	*(r3wt) = *(void**)(xpv); \
     } STMT_END
-
-#ifndef PURIFY
+#else
+#define new_body_inline(xpv, sv_type) S_new_body_inline(aTHX_ &(xpv), sv_type)
+void
+S_new_body_inline(pTHX_ void ** xpvp, const svtype sv_type) {
+	void * xpv;
+	void ** const r3wt = &PL_body_roots[sv_type];
+	xpv = (PTR_TBL_ENT_t*) (*((void **)(r3wt))
+	  ? *((void **)(r3wt)) : Perl_more_bodies(aTHX_ sv_type,
+					     bodies_by_type[sv_type].body_size,
+					     bodies_by_type[sv_type].arena_size));
+	*(r3wt) = *(void**)(xpv);
+        *xpvp = xpv;
+    }
+#endif
 
 STATIC void *
 S_new_body(pTHX_ const svtype sv_type)
 {
     void *xpv;
     new_body_inline(xpv, sv_type);
+    xpv = (void *)((char *)xpv - bodies_by_type[sv_type].offset);
     return xpv;
 }
 
@@ -1321,8 +1359,7 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type)
 
     new_type_details = bodies_by_type + new_type;
 
-    SvFLAGS(sv) &= ~SVTYPEMASK;
-    SvFLAGS(sv) |= new_type;
+    SvTYPE_set_mem(sv, new_type);
 
     /* This can't happen, as SVt_NULL is <= all values of new_type, so one of
        the return statements above will have triggered.  */
@@ -1338,7 +1375,7 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type)
 #if NVSIZE <= IVSIZE
 	SvANY(sv) = (XPVNV*)((char*)&(sv->sv_u.svu_nv) - STRUCT_OFFSET(XPVNV, xnv_u.xnv_nv));
 #else
-	SvANY(sv) = new_XNV();
+	new_XNV(SvANY(sv));
 #endif
 	SvNV_set(sv, 0);
 	return;
@@ -1418,10 +1455,14 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type)
 	/* We always allocated the full length item with PURIFY. To do this
 	   we fake things so that arena is false for all 16 types..  */
 	if(new_type_details->arena) {
+#ifndef PURIFY
 	    /* This points to the start of the allocated area.  */
 	    new_body_inline(new_body, new_type);
 	    Zero(new_body, new_type_details->body_size, char);
 	    new_body = ((char *)new_body) - new_type_details->offset;
+#else
+	    assert(0);
+#endif
 	} else {
 	    new_body = new_NOARENAZ(new_type_details);
 	}
@@ -13257,7 +13298,7 @@ S_sv_dup_common(pTHX_ const SV *const sstr, CLONE_PARAMS *const param)
 #if NVSIZE <= IVSIZE
 	SvANY(dstr) = (XPVNV*)((char*)&(dstr->sv_u.svu_nv) - STRUCT_OFFSET(XPVNV, xnv_u.xnv_nv));
 #else
-	SvANY(dstr)	= new_XNV();
+	new_XNV(SvANY(dstr));
 #endif
 	SvNV_set(dstr, SvNVX(sstr));
 	break;
@@ -13289,9 +13330,13 @@ S_sv_dup_common(pTHX_ const SV *const sstr, CLONE_PARAMS *const param)
 	    case SVt_PV:
 		assert(sv_type_details->body_size);
 		if (sv_type_details->arena) {
+#ifndef PURIFY
 		    new_body_inline(new_body, sv_type);
 		    new_body
 			= (void*)((char*)new_body - sv_type_details->offset);
+#else
+		    assert(0);
+#endif
 		} else {
 		    new_body = new_NOARENA(sv_type_details);
 		}
@@ -14964,6 +15009,27 @@ Perl_clone_params_new(PerlInterpreter *const from, PerlInterpreter *const to)
 void
 Perl_init_constants(pTHX)
 {
+#ifdef DEBUGGING /* sanity check for U32PBYTE0 and SvTYPE_set_mem */
+{   U32 assert_low = SVTYPEMASK;	U32 assert_hi = 0xFF000000;
+    struct STRUCT_SV sv_head;
+    SV * sv = &sv_head;
+    PERL_UNUSED_ARG(assert_low);	PERL_UNUSED_ARG(assert_hi);
+    assert(SVTYPEMASK == 0xff
+           && *U32PBYTE0(&assert_low) == SVTYPEMASK
+           && (*U32PBYTE0(&assert_low))         + (*U32PBYTE1(&assert_low) << 8)
+              + (*U32PBYTE2(&assert_low) << 16) + (*U32PBYTE3(&assert_low) << 24)
+                == SVTYPEMASK
+            && *U32PBYTE3(&assert_hi) == 0xFF
+            && (*U32PBYTE0(&assert_hi))         + (*U32PBYTE1(&assert_hi) << 8)
+                + (*U32PBYTE2(&assert_hi) << 16)+ (*U32PBYTE3(&assert_hi) << 24)
+                == 0xFF000000
+            );
+    sv->sv_flags = SVt_IV;
+    SvTYPE_set_mem(sv, SVt_NV);
+    assert(SvTYPE(sv) ==  SVt_NV);
+}
+#endif
+
     SvREFCNT(&PL_sv_undef)	= SvREFCNT_IMMORTAL;
     SvFLAGS(&PL_sv_undef)	= SVf_READONLY|SVf_PROTECT|SVt_NULL;
     SvANY(&PL_sv_undef)		= NULL;
diff --git a/sv.h b/sv.h
index 1590c73..d46afd6 100644
--- a/sv.h
+++ b/sv.h
@@ -401,6 +401,14 @@ perform the upgrade if necessary.  See C<svtype>.
 
 #define SVTYPEMASK	0xff
 #define SvTYPE(sv)	((svtype)((sv)->sv_flags & SVTYPEMASK))
+/* it is suggested for C compiler optimization reasons to only use this macro
+   if there are no other SV flags using/referencing statements are near
+   by in the code. This macro directly sets the memory in the SV head, and
+   avoids keeping the entire U32 sv flags in a temporary variable or register.
+   If you are modifying or testing the flags part of sv_flags member near by,
+   it is suggested to not use this macro and instead use traditional U32
+   arithmitic for optimization reasons */
+#define SvTYPE_set_mem(sv, type) (*U32PBYTE0(&SvFLAGS((sv))) = (type))
 
 /* Sadly there are some parts of the core that have pointers to already-freed
    SV heads, and rely on being able to tell that they are now free. So mark
-- 
1.7.9.msysgit.0

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2015

From @tonycoz

On Fri Oct 10 19​:37​:11 2014, bulk88 wrote​:

On Sat Oct 04 00​:58​:36 2014, bulk88 wrote​:

I will revise it so the internals of how to assign a type to an SV
head is hidden and move the assert to somewhere else.

New patch attached. Details of setting the type of an SV were placed
behind an api/macro.

Whatever the macro is called, it's basically a hack to work around perl
historically using a U32 with masks instead of a struct with bit fields.

I'm still not fond of it.

Looking at the documentation part of the patch, I see​:

+no malloc.

-In turn, the new_body_* allocators call S_new_body(), which invokes
-new_body_inline macro, which takes a lock, and takes a body off the
-linked list at PL_body_roots[sv_type], calling Perl_more_bodies() if
-necessary to refresh an empty list. Then the lock is released, and
-the body is returned.
+S_new_body(), which invokes new_body_inline macro. The new_body_inline macro
+takes a body off the linked list at PL_body_roots[sv_type], calling Perl_more_bodies() if
+necessary to refresh an empty list, and then the body is returned.

which looks like something is missing from before S_new_body().

It looks like you've got at least three different changes in this patch​:

- the documentation update

- byte access to flags

- changes to the way the body is allocated.

It would be preferable to see them as separate patches.

Tony

@toddr
Copy link
Member

toddr commented Feb 13, 2020

This patch seems to have been rejected. @bulk88 please submit a pull request if you would like further discussion.

@toddr toddr closed this as completed Feb 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants