Skip Menu |
Report information
Id: 122872
Status: open
Priority: 0/
Queue: perl5

Owner: tonyc <tony [at] develop-help.com>
Requestors: bulk88 <bulk88 [at] hotmail.com>
Cc:
AdminCc:

Operating System: mswin32
PatchStatus: (no value)
Severity: low
Type:
Perl Version: 5.21.4
Fixed In: (no value)



To: perlbug [...] perl.org
Date: Tue, 30 Sep 2014 18:32:28 -0400
From: bulk88 <bulk88 [...] hotmail.com>
Subject: [PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add U32PBYTE macros
Download (untitled) / with headers
text/plain 3.4k
This is a bug report for perl from bulk88@hotmail.com, generated with the help of perlbug 1.40 running under perl 5.21.4. ----------------------------------------------------------------- [Please describe your issue here] See attached patch. [Please do not change anything below this line] ----------------------------------------------------------------- --- Flags: category=core severity=low --- Site configuration information for perl 5.21.4: Configured by Owner at Thu Sep 18 12:08:58 2014. Summary of my perl5 (revision 5 version 21 subversion 4) configuration: Derived from: 7d2b2edb94ab56333b9049a3e26d15ea18445512 Ancestor: 19be3be6968e2337bcdfe480693fff795ecd1304 Platform: osname=MSWin32, osvers=5.1, archname=MSWin32-x86-multi-thread uname='' config_args='undef' hint=recommended, useposix=true, d_sigaction=undef useithreads=define, usemultiplicity=define use64bitint=undef, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cl', ccflags ='-nologo -GF -W3 -O1 -MD -Zi -DNDEBUG -DWIN32 -D_CONSOLE -DNO_STRICT -DPERL_TEXTMODE_SCRIPTS -DPERL_HASH_FUNC_ONE_AT_A_TIME -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -D_USE_32BIT_TIME_T', optimize='-O1 -MD -Zi -DNDEBUG', cppflags='-DWIN32' ccversion='12.00.8168', gccversion='', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8, longdblkind=0 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -libpath:"c:\perl521\lib\CORE" -machine:x86' libpth=C:\PROGRA~1\MIAF9D~1\VC98\lib libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl521.lib gnulibc_version='' Dynamic Linking: dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' ' cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -libpath:"c:\perl521\lib\CORE" -machine:x86' Locally applied patches: uncommitted-changes a0fe7a7e75de29e59f1da0d6822dc06e5be658fe a261faffee83d0145642ab5d1d046c9f813bc497 6506ab86ad1602a9ca720fcd30446dce1461d23d 7d2b2edb94ab56333b9049a3e26d15ea18445512 --- @INC for perl 5.21.4: lib C:/perl521/srcnew/lib . --- Environment for perl 5.21.4: HOME (unset) LANG (unset) LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH= PERL_BADLANG (unset) PERL_JSON_BACKEND=Cpanel::JSON::XS PERL_YAML_BACKEND=YAML SHELL (unset)
From 7e4334762f61048c329fe2d50bf0c62d85648e8e Mon Sep 17 00:00:00 2001 From: Daniel Dragan <bulk88@hotmail.com> Date: Tue, 30 Sep 2014 18:30:26 -0400 Subject: [PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add U32PBYTE macros Change setting of SV's new type in sv_flags directly using 1 byte assignment to avoid a data dependency on a load, atleast on software level. Instead of load, AND, OR, store machine code ops. Just do store. All x86 and ARM CPUs, and other Perl CPUs have 1 byte memory reads/writes execept early Alphas CPUs. Let the compiler and CPU designers figure out how to read/write 1 byte in a 32/64/128 bit cpu. Make an assert that catches breakage from sv_flags changes, and bizzare platforms. Docs are very out of date, bring them them into the present. Docs describe APIs which are nearly or are gone from the core, so remove the references to those APIs. Change new_body_inline to be dbg stepable in DEBUGGUING builds (similar to purpose of S_del_sv). Keep macro version for performance in non-DEBUGGING. In sv_upgrade remove all uses of S_new_body so sv_upgrade has less function calls, or is function call free in some permutations. This required changing proto of macro new_XNV which is not public API. Factor out the ghost fields subtraction into S_new_body instead of subtraction happening in callers of S_new_body. S_new_body is the "slow" version of new_body_inline. Make -DPURIFY compile and work on Win32 (and presumably other platforms), although I dont have access to PURIFY tools. Make new_body_inline token undefined on PURIFY. Put asserts to make sure we never try to allocate an arena body with PURIFY and therefore whether the body_details table is correctly setup for PURIFY builds. --- perl.h | 10 +++++ sv.c | 129 +++++++++++++++++++++++++++++++++++++++++++++++----------------- 2 files changed, 105 insertions(+), 34 deletions(-) diff --git a/perl.h b/perl.h index 436c7d1..4c3b542 100644 --- a/perl.h +++ b/perl.h @@ -3806,6 +3806,12 @@ my_swap16(const U16 x) { # define vtohs(x) ((x)&0xFFFF) # define htovl(x) vtohl(x) # define htovs(x) vtohs(x) +/* U32PBYTE*(x) is a macro that splits a U32 * into mutable U8 *s which go from + LSB (byte 0) to MSB (byte 3), these macros account for platform endianess */ +# define U32PBYTE0(x) ((U8*)((Size_t)(x)+0)) +# define U32PBYTE1(x) ((U8*)((Size_t)(x)+1)) +# define U32PBYTE2(x) ((U8*)((Size_t)(x)+2)) +# define U32PBYTE3(x) ((U8*)((Size_t)(x)+3)) #elif BYTEORDER == 0x4321 || BYTEORDER == 0x87654321 # define vtohl(x) ((((x)&0xFF)<<24) \ +(((x)>>24)&0xFF) \ @@ -3814,6 +3820,10 @@ my_swap16(const U16 x) { # define vtohs(x) ((((x)&0xFF)<<8) + (((x)>>8)&0xFF)) # define htovl(x) vtohl(x) # define htovs(x) vtohs(x) +# define U32PBYTE0(x) ((U8*)((Size_t)(x)+3)) +# define U32PBYTE1(x) ((U8*)((Size_t)(x)+2)) +# define U32PBYTE2(x) ((U8*)((Size_t)(x)+1)) +# define U32PBYTE3(x) ((U8*)((Size_t)(x)+0)) #else # error "Unsupported byteorder" /* If you have need for current perl on PDP-11 or similar, and can help test diff --git a/sv.c b/sv.c index 49bf8a0..f0e9817 100644 --- a/sv.c +++ b/sv.c @@ -800,24 +800,37 @@ Perl_sv_free_arenas(pTHX) =cut +WARNING: Every aspect of the internals of SV bodies has repeatedly changed +over the course of Perl's history. This section describes the current +implementation, not past implementations. Historically this section often +became out of sync with the code that deals with SV bodys. The source code is +authoritative, not this document. + Allocation of SV-bodies is similar to SV-heads, differing as follows; the allocation mechanism is used for many body types, so is somewhat more complicated, it uses arena-sets, and has no need for still-live SV detection. At the outermost level, (new|del)_X*V macros return bodies of the -appropriate type. These macros call either (new|del)_body_type or -(new|del)_body_allocated macro pairs, depending on specifics of the -type. Most body types use the former pair, the latter pair is used to -allocate body types with "ghost fields". +appropriate type. The C<new_*> macros call either C<S_new_body> or use +C<new_body_inline> directly based on performance considerations. +In both cases, C<S_new_body> implictly and for C<new_body_inline> explictly +the body pointer is backed up based on whether the type has "ghost fields" +and size of all of them. To delete a body, if it is allocated in the first +place, C<del_body> is called. PURIFY builds do something else that this +paragraph didn't cover. "ghost fields" are fields that are unused in certain types, and consequently don't need to actually exist. They are declared because -they're part of a "base type", which allows use of functions as -methods. The simplest examples are AVs and HVs, 2 aggregate types -which don't use the fields which support SCALAR semantics. - -For these types, the arenas are carved up into appropriately sized +they're part of a "base type", which allows use the same macros, and therefore +internally the same C/assembly code struct offset to be used to read the stuct +member regardless of low or high (or simple or fancy) the particular body type +is. For an example, lets look at IVs, RVs (a type of IV now) and PVs. These +3 types aren't blessed, or magical, so they do not use, and therefore do not +need the fields which support (and implement) blessed (an HV * that represent +the class) and magic (the MG *) semantics. + +For types with ghost fields, the arenas are carved up into appropriately sized chunks, we thus avoid wasted memory for those unaccessed members. When bodies are allocated, we adjust the pointer back in memory by the size of the part not allocated, so it's as if we allocated the full @@ -825,6 +838,8 @@ structure. (But things will all go boom if you write to the part that is "not there", because you'll be overwriting the last members of the preceding structure in memory.) +A long time ago in Perl's history, every body struct type had the NV as +the first member, now the HV * that indicated a blessed object is first. We calculate the correction using the STRUCT_OFFSET macro on the first member present. If the allocated structure is smaller (no initial NV actually allocated) then the net effect is to subtract the size of the NV @@ -836,16 +851,17 @@ optimised accesses based on the alignment constraint of the actual pointer to the full structure, for example, using a single 64 bit load instruction because it "knows" that two adjacent 32 bit members will be 8-byte aligned.) -This is the same trick as was used for NV and IV bodies. Ironically it -doesn't need to be used for NV bodies any more, because NV is now at -the start of the structure. IV bodies, and also in some builds NV bodies, -don't need it either, because they are no longer allocated. +This is the same trick as was used for NV and IV bodies. Over time, more types +got ghost fields through subsequent refactorings. Also eventually, IV +(and RVs which are implemented as IVs) lost their arena-ed and allocated bodies +completly. A 4th member was added to the SV head. This 4th member has multiple +uses now based on type, but for IVs, RVs, and also in some builds NV bodies, +the body pointer is a corrected value to the 4th member of the head. No arena, +no malloc. -In turn, the new_body_* allocators call S_new_body(), which invokes -new_body_inline macro, which takes a lock, and takes a body off the -linked list at PL_body_roots[sv_type], calling Perl_more_bodies() if -necessary to refresh an empty list. Then the lock is released, and -the body is returned. +S_new_body(), which invokes new_body_inline macro. The new_body_inline macro +takes a body off the linked list at PL_body_roots[sv_type], calling Perl_more_bodies() if +necessary to refresh an empty list, and then the body is returned. Perl_more_bodies allocates a new arena, and carves it up into an array of N bodies, which it strings into a linked list. It looks up arena-size @@ -853,7 +869,8 @@ and body-size from the body_details table described below, thus supporting the multiple body-types. If PURIFY is defined, or PERL_ARENA_SIZE=0, arenas are not used, and -the (new|del)_X*V macros are mapped directly to malloc/free. +the (new|del)_X*V macros are mapped directly to malloc/free. Macro +new_body_inline is never called on a PURIFY build. For each sv-type, struct body_details bodies_by_type[] carries parameters which control these aspects of SV handling: @@ -1023,10 +1040,6 @@ static const struct body_details bodies_by_type[] = { FIT_ARENA(24, sizeof(XPVIO)) }, }; -#define new_body_allocated(sv_type) \ - (void *)((char *)S_new_body(aTHX_ sv_type) \ - - bodies_by_type[sv_type].offset) - /* return a thing to the free list */ #define del_body(thing, root) \ @@ -1038,7 +1051,7 @@ static const struct body_details bodies_by_type[] = { #ifdef PURIFY #if !(NVSIZE <= IVSIZE) -# define new_XNV() safemalloc(sizeof(XPVNV)) +# define new_XNV(x) ((x) = safemalloc(sizeof(XPVNV))) #endif #define new_XPVNV() safemalloc(sizeof(XPVNV)) #define new_XPVMG() safemalloc(sizeof(XPVMG)) @@ -1046,12 +1059,20 @@ static const struct body_details bodies_by_type[] = { #define del_XPVGV(p) safefree(p) #else /* !PURIFY */ - +/* this is the only of the 3 new_* macros to be used in sv_upgrade which is hot + so use inline macro version */ #if !(NVSIZE <= IVSIZE) -# define new_XNV() new_body_allocated(SVt_NV) +# ifdef DEBUGGING +# define new_XNV(x) x = S_new_body(aTHX_ SVt_NV) +# else +# define new_XNV(x) \ + STMT_START { new_body_inline(x, SVt_NV); \ + x = (void *)((char *)x - bodies_by_type[SVt_NV].offset); \ + } STMT_END +# endif #endif -#define new_XPVNV() new_body_allocated(SVt_PVNV) -#define new_XPVMG() new_body_allocated(SVt_PVMG) +#define new_XPVNV() S_new_body(aTHX_ SVt_PVNV) +#define new_XPVMG() S_new_body(aTHX_ SVt_PVMG) #define del_XPVGV(p) del_body(p + bodies_by_type[SVt_PVGV].offset, \ &PL_body_roots[SVt_PVGV]) @@ -1162,6 +1183,10 @@ Perl_more_bodies (pTHX_ const svtype sv_type, const size_t body_size, The inline version is used for speed in hot routines, and the function using it serves the rest (unless PURIFY). */ + +#ifndef PURIFY + +#ifndef DEBUGGING #define new_body_inline(xpv, sv_type) \ STMT_START { \ void ** const r3wt = &PL_body_roots[sv_type]; \ @@ -1171,14 +1196,27 @@ Perl_more_bodies (pTHX_ const svtype sv_type, const size_t body_size, bodies_by_type[sv_type].arena_size)); \ *(r3wt) = *(void**)(xpv); \ } STMT_END - -#ifndef PURIFY +#else +#define new_body_inline(xpv, sv_type) S_new_body_inline(aTHX_ &(xpv), sv_type) +void +S_new_body_inline(pTHX_ void ** xpvp, const svtype sv_type) { + void * xpv; + void ** const r3wt = &PL_body_roots[sv_type]; + xpv = (PTR_TBL_ENT_t*) (*((void **)(r3wt)) + ? *((void **)(r3wt)) : Perl_more_bodies(aTHX_ sv_type, + bodies_by_type[sv_type].body_size, + bodies_by_type[sv_type].arena_size)); + *(r3wt) = *(void**)(xpv); + *xpvp = xpv; + } +#endif STATIC void * S_new_body(pTHX_ const svtype sv_type) { void *xpv; new_body_inline(xpv, sv_type); + xpv = (void *)((char *)xpv - bodies_by_type[sv_type].offset); return xpv; } @@ -1225,6 +1263,22 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type) no longer need to unshare so as to free up the IVX slot for its proper purpose. So it's safe to move the early return earlier. */ +#ifdef DEBUGGING /* sanity check for U32PBYTE0 below */ +{ U32 assert_low = SVTYPEMASK; U32 assert_hi = 0xFF000000; + PERL_UNUSED_ARG(assert_low); PERL_UNUSED_ARG(assert_hi); + assert(SVTYPEMASK == 0xff + && *U32PBYTE0(&assert_low) == SVTYPEMASK + && (*U32PBYTE0(&assert_low)) + (*U32PBYTE1(&assert_low) << 8) + + (*U32PBYTE2(&assert_low) << 16) + (*U32PBYTE3(&assert_low) << 24) + == SVTYPEMASK + && *U32PBYTE3(&assert_hi) == 0xFF + && (*U32PBYTE0(&assert_hi)) + (*U32PBYTE1(&assert_hi) << 8) + + (*U32PBYTE2(&assert_hi) << 16)+ (*U32PBYTE3(&assert_hi) << 24) + == 0xFF000000 + ); +} +#endif + if (new_type > SVt_PVMG && SvIsCOW(sv)) { sv_force_normal_flags(sv, 0); } @@ -1321,8 +1375,7 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type) new_type_details = bodies_by_type + new_type; - SvFLAGS(sv) &= ~SVTYPEMASK; - SvFLAGS(sv) |= new_type; + *U32PBYTE0(&SvFLAGS(sv)) = new_type; /* This can't happen, as SVt_NULL is <= all values of new_type, so one of the return statements above will have triggered. */ @@ -1338,7 +1391,7 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type) #if NVSIZE <= IVSIZE SvANY(sv) = (XPVNV*)((char*)&(sv->sv_u.svu_nv) - STRUCT_OFFSET(XPVNV, xnv_u.xnv_nv)); #else - SvANY(sv) = new_XNV(); + new_XNV(SvANY(sv)); #endif SvNV_set(sv, 0); return; @@ -1418,10 +1471,14 @@ Perl_sv_upgrade(pTHX_ SV *const sv, svtype new_type) /* We always allocated the full length item with PURIFY. To do this we fake things so that arena is false for all 16 types.. */ if(new_type_details->arena) { +#ifndef PURIFY /* This points to the start of the allocated area. */ new_body_inline(new_body, new_type); Zero(new_body, new_type_details->body_size, char); new_body = ((char *)new_body) - new_type_details->offset; +#else + assert(0); +#endif } else { new_body = new_NOARENAZ(new_type_details); } @@ -13249,7 +13306,7 @@ S_sv_dup_common(pTHX_ const SV *const sstr, CLONE_PARAMS *const param) #if NVSIZE <= IVSIZE SvANY(dstr) = (XPVNV*)((char*)&(dstr->sv_u.svu_nv) - STRUCT_OFFSET(XPVNV, xnv_u.xnv_nv)); #else - SvANY(dstr) = new_XNV(); + new_XNV(SvANY(dstr)); #endif SvNV_set(dstr, SvNVX(sstr)); break; @@ -13281,9 +13338,13 @@ S_sv_dup_common(pTHX_ const SV *const sstr, CLONE_PARAMS *const param) case SVt_PV: assert(sv_type_details->body_size); if (sv_type_details->arena) { +#ifndef PURIFY new_body_inline(new_body, sv_type); new_body = (void*)((char*)new_body - sv_type_details->offset); +#else + assert(0); +#endif } else { new_body = new_NOARENA(sv_type_details); } -- 1.7.9.msysgit.0
Date: Thu, 2 Oct 2014 11:57:35 -0400
From: Andy Dougherty <doughera [...] lafayette.edu>
To: perl5-porters [...] perl.org
Subject: Re: [perl #122872] [PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add U32PBYTE macros
Download (untitled) / with headers
text/plain 1.7k
On Tue, Sep 30, 2014 at 03:34:00PM -0700, bulk88 wrote: Show quoted text
> # New Ticket Created by bulk88 > # Please include the string: [perl #122872] > # in the subject line of all future correspondence about this issue. > # <URL: https://rt.perl.org/Ticket/Display.html?id=122872 > > > > This is a bug report for perl from bulk88@hotmail.com, > generated with the help of perlbug 1.40 running under perl 5.21.4.
Show quoted text
> >From 7e4334762f61048c329fe2d50bf0c62d85648e8e Mon Sep 17 00:00:00 2001
> From: Daniel Dragan <bulk88@hotmail.com> > Date: Tue, 30 Sep 2014 18:30:26 -0400 > Subject: [PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add > U32PBYTE macros > > Change setting of SV's new type in sv_flags directly using 1 byte > assignment to avoid a data dependency on a load, atleast on software level. > Instead of load, AND, OR, store machine code ops. Just do store. All x86 > and ARM CPUs, and other Perl CPUs have 1 byte memory reads/writes execept > early Alphas CPUs. Let the compiler and CPU designers figure out how to > read/write 1 byte in a 32/64/128 bit cpu. Make an assert that catches > breakage from sv_flags changes, and bizzare platforms.
I understand the idea, and I agree with the general idea of being explicit in intent and relying on the compiler to translate it to hardware instructions. However, this patch includes the following tweak to sv.c: - SvFLAGS(sv) &= ~SVTYPEMASK; - SvFLAGS(sv) |= new_type; + *U32PBYTE0(&SvFLAGS(sv)) = new_type; I have to admit that I actually find the old way easier to read. Further, both on gcc-4.7.2 on Linux/amd64, and Solaris CC on SPARC, the old way actually gave a slightly smaller object file. I would not be at all surprised if the reverse were true on other systems. I did not test speed at all. -- Andy Dougherty doughera@lafayette.edu
Date: Thu, 2 Oct 2014 18:28:57 +0200
From: Jarkko Hietaniemi <jhi [...] iki.fi>
Subject: Re: [perl #122872] [PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add U32PBYTE macros
CC: Perl5 Porters <perl5-porters [...] perl.org>
To: Andy Dougherty <doughera [...] lafayette.edu>
Download (untitled) / with headers
text/plain 2.4k
I have to agree with Andy.  The new code is harder to understand.

And I think the new code goes exactly against the stated goal...

"Let the compiler and CPU designers figure out how to read/write 1 byte in a 32/64/128 bit cpu"

As far as I can tell the patch is microoptimizing for the speed of one particular CPU (and one particular compiler).  And while doing that, pessimizing for readability, and all other CPU/compiler combos.


On Thu, Oct 2, 2014 at 5:57 PM, Andy Dougherty <doughera@lafayette.edu> wrote:
Show quoted text
On Tue, Sep 30, 2014 at 03:34:00PM -0700, bulk88 wrote:
> # New Ticket Created by  bulk88
> # Please include the string:  [perl #122872]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org/Ticket/Display.html?id=122872 >
>
>
> This is a bug report for perl from bulk88@hotmail.com,
> generated with the help of perlbug 1.40 running under perl 5.21.4.

> >From 7e4334762f61048c329fe2d50bf0c62d85648e8e Mon Sep 17 00:00:00 2001
> From: Daniel Dragan <bulk88@hotmail.com>
> Date: Tue, 30 Sep 2014 18:30:26 -0400
> Subject: [PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add
>  U32PBYTE macros
>
> Change setting of SV's new type in sv_flags directly using 1 byte
> assignment to avoid a data dependency on a load, atleast on software level.
> Instead of load, AND, OR, store machine code ops. Just do store. All x86
> and ARM CPUs, and other Perl CPUs have 1 byte memory reads/writes execept
> early Alphas CPUs. Let the compiler and CPU designers figure out how to
> read/write 1 byte in a 32/64/128 bit cpu. Make an assert that catches
> breakage from sv_flags changes, and bizzare platforms.

I understand the idea, and I agree with the general idea of being
explicit in intent and relying on the compiler to translate it to hardware
instructions.  However, this patch includes the following tweak to sv.c:

    -    SvFLAGS(sv) &= ~SVTYPEMASK;
    -    SvFLAGS(sv) |= new_type;
    +    *U32PBYTE0(&SvFLAGS(sv)) = new_type;

I have to admit that I actually find the old way easier to read.

Further, both on gcc-4.7.2 on Linux/amd64, and Solaris CC on SPARC,
the old way actually gave a slightly smaller object file.  I would not
be at all surprised if the reverse were true on other systems.  I did
not test speed at all.

--
    Andy Dougherty              doughera@lafayette.edu



--
There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 830b
On Thu Oct 02 08:58:11 2014, doughera wrote: Show quoted text
> I understand the idea, and I agree with the general idea of being > explicit in intent and relying on the compiler to translate it to > hardware > instructions. However, this patch includes the following tweak to > sv.c: > > - SvFLAGS(sv) &= ~SVTYPEMASK; > - SvFLAGS(sv) |= new_type; > + *U32PBYTE0(&SvFLAGS(sv)) = new_type; > > I have to admit that I actually find the old way easier to read.
How many times a year does someone read it? I could add a comment. Show quoted text
> > Further, both on gcc-4.7.2 on Linux/amd64, and Solaris CC on SPARC, > the old way actually gave a slightly smaller object file. I would not > be at all surprised if the reverse were true on other systems.
Can you attach before and after .o'es or ELFs/exec binaries? -- bulk88 ~ bulk88 at hotmail.com
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 2.6k
On Thu Oct 02 15:28:09 2014, bulk88 wrote: Show quoted text
> On Thu Oct 02 08:58:11 2014, doughera wrote:
> > I understand the idea, and I agree with the general idea of being > > explicit in intent and relying on the compiler to translate it to > > hardware > > instructions. However, this patch includes the following tweak to > > sv.c: > > > > - SvFLAGS(sv) &= ~SVTYPEMASK; > > - SvFLAGS(sv) |= new_type; > > + *U32PBYTE0(&SvFLAGS(sv)) = new_type; > > > > I have to admit that I actually find the old way easier to read.
> > How many times a year does someone read it? I could add a comment. >
> > > > Further, both on gcc-4.7.2 on Linux/amd64, and Solaris CC on SPARC, > > the old way actually gave a slightly smaller object file. I would not > > be at all surprised if the reverse were true on other systems.
> > Can you attach before and after .o'es or ELFs/exec binaries?
Forgot to mention, machine code size wasn't exactly the point of this commit, that is why I didn't collect it. I inlined something that was previously a function call on some platforms (where SVt_NV has an arena body). Making SV upgrading to SVt_NV not have additional function calls on the most common path of upgrading to SVt_NV was the main point of the commit. " *U32PBYTE0(&SvFLAGS(sv)) = new_type;" was atonement for the inlining, plus it was done before the inlining/API/DOCs cleaning up was finished. The U32PBYTE0 compiled to -------------------------------------------- *U32PBYTE0(&SvFLAGS(sv)) = new_type; -------------------------------------------- mov eax, [ebp+new_type]; means read C stack var new_type ;removed irrelevant instructions for the upcoming switch() mov [esi+8], al ;register al is lowest byte of reg eax, esi is SV* -------------------------------------------- previously on VC 2003 the code was -------------------------------------------- SvFLAGS(sv) &= ~SVTYPEMASK; SvFLAGS(sv) |= new_type; -------------------------------------------- mov eax, [ebp+new_type] ;removed mov byte ptr [esi+8], 0 ;esi is SV* ;removed or [esi+8], eax -------------------------------------------- Mingw GCC Strawberry 5.18 -O2, old way ------------------------------------------- mov esi, [esp+7Ch+new_type] ;removed, lots of branching in here mov edx, [ebx+8] ;removed, lots of branching in here ;removed mov eax, edx ;move flags to eax xor al, al ;zero the register or eax, esi ;note GCC is a RISC compiler at its heart, and wont use CISC style instructions on x86 ;removed mov [ebx+8], eax ;write to SV head --------------------------------------------- Should I produce ARM or PARISC binaries to prove my point? -- bulk88 ~ bulk88 at hotmail.com
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 532b
On Thu Oct 02 16:28:54 2014, bulk88 wrote: Show quoted text
> Should I produce ARM or PARISC binaries to prove my point?
PARISC old way ldw 8(%r4), %r31 ; read 32 bits at addr reg r4+8 into reg r32 depwi 0, 31, 8, %r31 ;"deposit word immediate", write 0 into bits 0 to 8 of reg r32?? confusing description in pdf or %r6, %r31, %rp ;reg r6 has new_type in it, r31 is masked sv_flags, write result into reg rp stw %rp, 8(%r4);write reg rp to sv head at addr reg r4+8 -- bulk88 ~ bulk88 at hotmail.com
Subject: Re: [perl #122872] [PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add U32PBYTE macros
Date: Fri, 3 Oct 2014 14:12:27 -0400
From: Andy Dougherty <doughera [...] lafayette.edu>
CC: perl5-porters [...] perl.org
To: bulk88 via RT <perlbug-followup [...] perl.org>
Download (untitled) / with headers
text/plain 2.1k
On Thu, Oct 02, 2014 at 03:28:09PM -0700, bulk88 via RT wrote: Show quoted text
> On Thu Oct 02 08:58:11 2014, doughera wrote:
> > I understand the idea, and I agree with the general idea of being > > explicit in intent and relying on the compiler to translate it to > > hardware > > instructions. However, this patch includes the following tweak to > > sv.c: > > > > - SvFLAGS(sv) &= ~SVTYPEMASK; > > - SvFLAGS(sv) |= new_type; > > + *U32PBYTE0(&SvFLAGS(sv)) = new_type; > > > > I have to admit that I actually find the old way easier to read.
> > How many times a year does someone read it? I could add a comment.
I would still prefer the older way. Among other things, the U32PBYTE0 approach assumes that new_type will always be stored in byte 0 of a U32-sized structure, and that SVTYPEMASK will always be 0xff. (Those are probably reasonable assumptions, but it did slow me down to go and check them.) Show quoted text
> > Further, both on gcc-4.7.2 on Linux/amd64, and Solaris CC on SPARC, > > the old way actually gave a slightly smaller object file. I would not > > be at all surprised if the reverse were true on other systems.
> > Can you attach before and after .o'es or ELFs/exec binaries?
I don't have them handy, and regenerating them would take a bit of effort (the system where I tested is currently offline). Here's the short summary: the simple one-line diff in sv.c: - SvFLAGS(sv) &= ~SVTYPEMASK; SvFLAGS(sv) |= new_type; + *U32PBYTE0(&SvFLAGS(sv)) = new_type; led to over 1300 non-cosmetic changed lines in the assembly output file. (I removed trivial things like different label numbers.) Yes, it is true that the SPARC assembler has an 'stb' directive to store a single byte, but changing just that one line of C code caused the compiler to change lots of other things as well, such as which registers were used in what order and for what purpose. The net change in size was tiny, and, as I observed, I wouldn't be at all surprised if the change was in the opposite direction on other systems. Absent any data indicating it's a meaningful improvement in size or speed, I was merely registering my preference to keep it the way it was. -- Andy Dougherty doughera@lafayette.edu
To: Andy Dougherty <doughera [...] lafayette.edu>, bulk88 via RT <perlbug-followup [...] perl.org>
Date: Fri, 03 Oct 2014 22:27:34 -0600
From: Karl Williamson <public [...] khwilliamson.com>
Subject: Re: [perl #122872] [PATCH] fix SV body docs in sv.c & PURIFY, cleanup body API, add U32PBYTE macros
CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 594b
On 10/03/2014 12:12 PM, Andy Dougherty wrote: Show quoted text
>> How many times a year does someone read it? I could add a comment.
> I would still prefer the older way. Among other things, the U32PBYTE0 > approach assumes that new_type will always be stored in byte 0 > of a U32-sized structure, and that SVTYPEMASK will always be 0xff. > (Those are probably reasonable assumptions, but it did slow me down to > go and check them.) > >
I agree with Andy here. As Jarkko said somewhat recently, older and wiser than before, he should all along have been coding with maintainability as the first priority.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 507b
On Fri Oct 03 11:13:05 2014, doughera wrote: Show quoted text
> I would still prefer the older way. Among other things, the U32PBYTE0 > approach assumes that new_type will always be stored in byte 0 > of a U32-sized structure, and that SVTYPEMASK will always be 0xff. > (Those are probably reasonable assumptions, but it did slow me down to > go and check them.) >
I will revise it so the internals of how to assign a type to an SV head is hidden and move the assert to somewhere else. -- bulk88 ~ bulk88 at hotmail.com
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 10.8k
On Sat Oct 04 00:58:36 2014, bulk88 wrote: Show quoted text
> I will revise it so the internals of how to assign a type to an SV > head is hidden and move the assert to somewhere else.
New patch attached. Details of setting the type of an SV were placed behind an api/macro. On ARM, setting SV type was 4 instructions previously, now it is 1 instruction after the patch. The ARM code from EVC 4 -01 this area of code, I am not sure if "new_type_details = bodies_by_type + new_type;" is in the branch or optimizer moved it somewhere else ------------------------------------------------- new_type_details = bodies_by_type + new_type; SvTYPE_set_mem(sv, new_type); /* This can't happen, as SVt_NULL is <= all values of new_type, so one of the return statements above will have triggered. */ assert (new_type != SVt_NULL); switch (new_type) { case SVt_IV: ------------------------------------------------- */star next to instruction address means sv type setting related instructions old ------------------------------------------------- 280CAFF0 MOV LR, #0 ; U32 null = 0x00000000; ......... ;start of branch 280CB0E4* STRB LR, [R11,#8] ; *(U8*)(R11+sv_flags) = (U8)null; 280CB0E8 MOV R3, R7,LSL#3 280CB0EC* LDR R0, [R11,#8] ; U32 old_sv_flags = *(U32*)(R11+sv_flags); 280CB0F0 SUB R2, R7, #1 280CB0F4 CMP R2, #0xE 280CB0F8* ORR R1, R0, R7 ; new_sv_flags = old_sv_flags | new_type; 280CB0FC ADD R6, R3, R8 280CB100* STR R1, [R11,#8] ; *(U32*)(R11+sv_flags) = new_sv_flags; 280CB104 BHI loc_280CB428 ; jump if higher than unsigned 280CB108 MOV R0, R2,LSL#1 280CB10C ADD R0, R0, PC 280CB110 LDRH R0, [R0,#4] ; R0 = *(U16*)(R0+4); 280CB114 ADD PC, PC, R0 ;instruction_pointer += R0; //jump table 280CB118 ; an inline in machine code U16 array lives here for the jump table ............. 280CB428 loc_280CB428: 280CB428 LDR R0, ="panic: sv_upgrade to unknown type %lu" 280CB42C MOV R1, R7 ; arg_2 = new_type; 280CB430 BL Perl_croak ------------------------------------------------- new ------------------------------------------------- ;start of branch 280CB0EC MOV R2, R7,LSL#3 280CB0F0* STRB R7, [R11,#8] ; store byte in R7(new_type) to *(U8*)(R11+sv_flags) 280CB0F4 SUB R1, R7, #1 280CB0F8 AND R0, R7, #0xFF ; PURPOSE UNKNOWN, not in old asm, R0's val never used again compiler bug? 280CB0FC ADD R6, R2, R8 280CB100 CMP R1, #0xE 280CB104 BHI loc_280CB44C ;jump if higher than unsigned 280CB108 MOV R0, R1,LSL#1 ; R0 is wiped 280CB10C ADD R0, R0, PC 280CB110 LDRH R0, [R0,#4] ; R0 = *(U16*)(R0+4); 280CB114 ADD PC, PC, R0 ; instruction_pointer += R0; //jump table 280CB118 ; an inline in machine code U16 array lives here for the jump table ............. 280CB44C loc_280CB44C: 280CB44C LDR R0, ="panic: sv_upgrade to unknown type %lu" ; R0 is wiped 280CB450 MOV R1, R7 ; arg_2 = new_type; 280CB454 BL Perl_croak ------------------------------------------------- In any case, how is STRB implemented on ARM? Obviously I dont have access to a commercial ARM core's source code, so a FOSS ARM CPU must do. ------------------------------------------------ assign store_op = mem_op && !instruction[20]; ................................................ // Load & Store instructions if ( mem_op ) begin saved_current_instruction_wen = 1'd1; // Save the memory access instruction to refer back to later pc_wen_nxt = 1'd0; // hold current PC value data_access_exec_nxt = 1'd1; // indicate that its a data read or write, // rather than an instruction fetch alu_out_sel_nxt = 4'd1; // Add if ( !instruction[23] ) // U: Subtract offset begin alu_cin_sel_nxt = 2'd1; // cin = 1 alu_not_sel_nxt = 1'd1; // invert B end //bulk88 says bit 20 is difference between load and store, that was set earlier if ( store_op ) begin write_data_wen_nxt = 1'd1; //bulk88 says bit 22 in ARM op is byte or 32 bit flag, if 1, then this is byte ARM op //bulk88 says TRANS the ARM instruction prefix that says this is a load/store class op //bulk88 says STR, STRB, LDR, LDRB all have TRANS prefix if ( type == TRANS && instruction[22] ) byte_enable_sel_nxt = 2'd1; // Save byte end // need to update the register holding the address ? // This is Rn bits [19:16] //bulk88 says pre-indexed/post-indexed is a feature where in a load/store addr reg operand is changed during the op //bulk88 says this makes incrementing a array pointer during a loop be part of the array deref op //bulk88 says can be "reg_dst = *(reg_src = reg_src+const_offset)" //bulk88 says can be "reg_dst = *((reg_invisible = reg_src), (reg_src = reg_src+const_offset), (reg_invisible))" //bulk88 says or normal idea of a move "reg_dst = *(reg_src+const_offset)" if ( mem_op_pre_indexed || mem_op_post_indexed ) begin // Check is the load destination is the PC if ( o_rn_sel_nxt == 4'd15 ) pc_sel_nxt = 2'd1; else reg_bank_wsel_nxt = o_rn_sel_nxt; end // if post-indexed, then use Rn rather than ALU output, as address if ( mem_op_post_indexed ) address_sel_nxt = 4'd4; // Rn else address_sel_nxt = 4'd1; // alu out if ( instruction[25] && type == TRANS ) barrel_shift_data_sel_nxt = 2'd2; // Shift value from Rm register if ( type == TRANS && instruction[25] && shift_imm != 5'd0 ) begin barrel_shift_function_nxt = instruction[6:5]; barrel_shift_amount_sel_nxt = 2'd2; // imm_shift_amount end end ..................................... wire [31:0] write_data_word; ...................................... input [3:0] i_byte_enable, ...................................... wire [CACHE_LINE_WIDTH-1:0] write_hit_wdata; ...................................... wire [31:0] rd; ...................................... // bulk88 says, this takes lowest 2 bits for U8s, or 2nd to lowest bit for U16s, of // the output address, and picks a concat opcode to use later // Note, this statement supports U16 internal to CPU opcode, but this CPU doesn't // support LDRH or STRH machine code instructions since // this is a ARM V2 CPU, U16 support was added in ARM V4. Orig comments ahead. // ======================================================== // Byte Enable Select // ======================================================== assign byte_enable_nxt = i_byte_enable_sel == 2'd0 ? 4'b1111 : // word write i_byte_enable_sel == 2'd2 ? // halfword write ( o_address_nxt[1] == 1'd0 ? 4'b0011 : 4'b1100 ) : o_address_nxt[1:0] == 2'd0 ? 4'b0001 : // byte write o_address_nxt[1:0] == 2'd1 ? 4'b0010 : o_address_nxt[1:0] == 2'd2 ? 4'b0100 : 4'b1000 ; //bulk88 says this statment says, if its a byte op, take lowest byte, copy it to all 4 bytes of the word //bulk88 says this later allows the concating to work, in that no matter which byte is selected of the 4 //bulk88 says bytes to be spliced into the output U32 word, they will all be the same, note this statment //bulk88 says is not short/U16 compatible // ======================================================== // Write Data Select // ======================================================== assign write_data_nxt = i_byte_enable_sel == 2'd0 ? rd : {4{rd[ 7:0]}} ; ...................................... //bulk88 says {} is concat operator, this expression picks then concats/fuses //the U8s, U16s or U32 that make up the output U32, using the opcode from above //NOTICE it does not use "&" then "|" to get the job done the way Perl was doing it. Orig comments ahead. // Use Byte Enables assign write_data_word = i_byte_enable == 4'b0001 ? { o_read_data[31: 8], i_write_data[ 7: 0] } : i_byte_enable == 4'b0010 ? { o_read_data[31:16], i_write_data[15: 8], o_read_data[ 7:0]} : i_byte_enable == 4'b0100 ? { o_read_data[31:24], i_write_data[23:16], o_read_data[15:0]} : i_byte_enable == 4'b1000 ? { i_write_data[31:24], o_read_data[23:0]} : i_byte_enable == 4'b0011 ? { o_read_data[31:16], i_write_data[15: 0] } : i_byte_enable == 4'b1100 ? { i_write_data[31:16], o_read_data[15:0]} : i_write_data ; ..................................... assign write_hit_wdata = i_address[3:2] == 2'd0 ? {hit_rdata[127:32], write_data_word } : i_address[3:2] == 2'd1 ? {hit_rdata[127:64], write_data_word, hit_rdata[31:0] } : i_address[3:2] == 2'd2 ? {hit_rdata[127:96], write_data_word, hit_rdata[63:0] } : { write_data_word, hit_rdata[95:0] } ; ..................................... // Data comes in off the WB bus in wrap4 with the missed data word first assign data_wdata = write_hit && c_state == CS_IDLE ? write_hit_wdata : read_miss_wdata; ------------------------------------------------ So "STRB" is implemented in hardware, not extra opcodes. algorithm is if byte write, take src register, take low byte, copy to bytes 1-3 of src register, take address, use low 2 bits to figure out alignment, then splice as 2 32 bit values, then splice 32 bit value into 16 byte cache line -- bulk88 ~ bulk88 at hotmail.com
Subject: 0001-fix-SV-body-docs-in-sv.c-PURIFY-cleanup-body-API-add.patch

Message body is not shown because it is too large.

RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 1.3k
On Fri Oct 10 19:37:11 2014, bulk88 wrote: Show quoted text
> On Sat Oct 04 00:58:36 2014, bulk88 wrote:
> > I will revise it so the internals of how to assign a type to an SV > > head is hidden and move the assert to somewhere else.
> > New patch attached. Details of setting the type of an SV were placed > behind an api/macro.
Whatever the macro is called, it's basically a hack to work around perl historically using a U32 with masks instead of a struct with bit fields. I'm still not fond of it. Looking at the documentation part of the patch, I see: +no malloc. -In turn, the new_body_* allocators call S_new_body(), which invokes -new_body_inline macro, which takes a lock, and takes a body off the -linked list at PL_body_roots[sv_type], calling Perl_more_bodies() if -necessary to refresh an empty list. Then the lock is released, and -the body is returned. +S_new_body(), which invokes new_body_inline macro. The new_body_inline macro +takes a body off the linked list at PL_body_roots[sv_type], calling Perl_more_bodies() if +necessary to refresh an empty list, and then the body is returned. which looks like something is missing from before S_new_body(). It looks like you've got at least three different changes in this patch: - the documentation update - byte access to flags - changes to the way the body is allocated. It would be preferable to see them as separate patches. Tony


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org