Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bleadperl v5.17.3-255-g6502e08 breaks SARTAK/Path-Dispatcher-1.04.tar.gz #12395

Closed
p5pRT opened this issue Sep 10, 2012 · 66 comments
Closed

Bleadperl v5.17.3-255-g6502e08 breaks SARTAK/Path-Dispatcher-1.04.tar.gz #12395

p5pRT opened this issue Sep 10, 2012 · 66 comments

Comments

@p5pRT
Copy link

p5pRT commented Sep 10, 2012

Migrated from rt.perl.org#114820 (status was 'resolved')

Searchable as RT114820$

@p5pRT
Copy link
Author

p5pRT commented Sep 10, 2012

From @andk

git bisect


commit 6502e08
Author​: David Mitchell <davem@​iabyn.com>
Date​: Thu Jul 26 16​:04​:09 2012 +0100

  Don't copy all of the match string buffer

diagnostics


Note​: Any​::Moose was picking Mouse 1.02 here because current Moose
2.0603 is broken with current bleadperl (see #114628)

[...]
Can't call method "parent" on an undefined value at t/031-structured-match.t line 26.
t/031-structured-match.t .......
Dubious, test returned 255 (wstat 65280, 0xff00)
No subtests run
[...]
t/031-structured-match.t (Wstat​: 65280 Tests​: 0 Failed​: 0)
  Non-zero exit status​: 255
  Parse errors​: No plan found in TAP output
[...]

perl -V


--
andreas

@p5pRT
Copy link
Author

p5pRT commented Sep 10, 2012

From @iabyn

On Sun, Sep 09, 2012 at 07​:31​:26PM -0700, Andreas J. Koenig via RT wrote​:

----------
commit 6502e08
Author​: David Mitchell <davem@​iabyn.com>
Date​: Thu Jul 26 16​:04​:09 2012 +0100

Don't copy all of the match string buffer

[snip]

Can't call method "parent" on an undefined value at t/031-structured-match.t line 26.
t/031-structured-match.t .......
Dubious, test returned 255 (wstat 65280, 0xff00)

This is due to this line in Path/Dispatcher/Rule/Regex.pm​:

  $extra{leftover} = eval q{$'};

which expects to be able to execute a regex where $' hasn't been seen yet,
then afterwards to be able to retrieve its value. This used to happen to
work, but can't be relied upon (and no longer works).

--
Art is anything that has a label (especially if the label is "untitled 1")

@p5pRT
Copy link
Author

p5pRT commented Sep 10, 2012

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Sep 10, 2012

From @cpansprout

On Mon Sep 10 08​:29​:17 2012, davem wrote​:

On Sun, Sep 09, 2012 at 07​:31​:26PM -0700, Andreas J. Koenig via RT
wrote​:

----------
commit 6502e08
Author​: David Mitchell <davem@​iabyn.com>
Date​: Thu Jul 26 16​:04​:09 2012 +0100

Don't copy all of the match string buffer

[snip]

Can't call method "parent" on an undefined value at t/031-
structured-match.t line 26.
t/031-structured-match.t .......
Dubious, test returned 255 (wstat 65280, 0xff00)

This is due to this line in Path/Dispatcher/Rule/Regex.pm​:

$extra\{leftover\} = eval q\{$'\};

which expects to be able to execute a regex where $' hasn't been seen
yet,
then afterwards to be able to retrieve its value. This used to happen
to
work, but can't be relied upon (and no longer works).

I have long considered it a bug that eval '$&' doesn’t work if the three
forbidden match vars have not been seen.

That you have extended the bug further worries me.

I know you spent a lot of time on this, but it sort of dashes any hopes
of getting that fixed. :-)

An idea I had, which I started to explore a year ago, but never quite
finished, was to force a COW ‘copy’ and, following the example of
PERL_OLD_COPY_ON_WRITE, store an SV in the regexp struct. (I never got
it to work completely, and, code freeze fast approaching, I decided to
shelve it for a short while, which turned into a long while.)

That in itself might speed up the cases you sped up differently, if we
assume that strings are rarely modified right after a match.

See <https://rt-archive.perl.org/perl5/Ticket/Display.html?id=37038#txn-1060310>.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 10, 2012

From p5p@sartak.org

Dave Mitchell wrote​:

On Sun, Sep 09, 2012 at 07​:31​:26PM -0700, Andreas J. Koenig via RT wrote​:

----------
commit 6502e08
Author​: David Mitchell<davem@​iabyn.com>
Date​: Thu Jul 26 16​:04​:09 2012 +0100

 Don't copy all of the match string buffer

[snip]

Can't call method "parent" on an undefined value at t/031-structured-match.t line 26.
t/031-structured-match.t .......
Dubious, test returned 255 (wstat 65280, 0xff00)

This is due to this line in Path/Dispatcher/Rule/Regex.pm​:

 $extra\{leftover\} = eval q\{$'\};

which expects to be able to execute a regex where $' hasn't been seen yet,
then afterwards to be able to retrieve its value. This used to happen to
work, but can't be relied upon (and no longer works).

Hi,

Author of the module in question here.

This line of code is, of course, simply an optimization. I have no
qualms switching that to this less sneaky implementation​:

  $extra{leftover} = $';

Unfortunately grep.cpan.me seems to be down, so I can't investigate how
common a pattern eval q{$'} etc are.

Shawn

@p5pRT
Copy link
Author

p5pRT commented Sep 10, 2012

From @iabyn

On Mon, Sep 10, 2012 at 02​:06​:53PM -0400, Shawn M Moore wrote​:

Author of the module in question here.

This line of code is, of course, simply an optimization. I have no
qualms switching that to this less sneaky implementation​:

$extra\{leftover\} = $';

Rather than polluting the whole program with $', wouldn't it be better
to just modify specific regexps to have '(.*)$' at the end?

Unfortunately grep.cpan.me seems to be down, so I can't investigate
how common a pattern eval q{$'} etc are.

Hopefully not too many :-(

--
I've often wanted to drown my troubles, but I can't get my wife to go
swimming.

@p5pRT
Copy link
Author

p5pRT commented Sep 10, 2012

From @doy

On Mon, Sep 10, 2012 at 08​:52​:14PM +0100, Dave Mitchell wrote​:

On Mon, Sep 10, 2012 at 02​:06​:53PM -0400, Shawn M Moore wrote​:

Author of the module in question here.

This line of code is, of course, simply an optimization. I have no
qualms switching that to this less sneaky implementation​:

$extra\{leftover\} = $';

Rather than polluting the whole program with $', wouldn't it be better
to just modify specific regexps to have '(.*)$' at the end?

It's matching against user-specified regexes, so no. On the other hand,
using $' directly for 5.8 and switching to use /p and ${^POSTMATCH} for
5.10+ would be doable I think?

-doy

@p5pRT
Copy link
Author

p5pRT commented Sep 10, 2012

From @iabyn

On Mon, Sep 10, 2012 at 09​:15​:06AM -0700, Father Chrysostomos via RT wrote​:

An idea I had, which I started to explore a year ago, but never quite
finished, was to force a COW ‘copy’ and, following the example of
PERL_OLD_COPY_ON_WRITE, store an SV in the regexp struct. (I never got
it to work completely, and, code freeze fast approaching, I decided to
shelve it for a short while, which turned into a long while.)

But I thought COW had too many edge cases to be viable?

--
You're only as old as you look.

@p5pRT
Copy link
Author

p5pRT commented Sep 10, 2012

From dennis.kaarsemaker@booking.com

On ma, 2012-09-10 at 14​:06 -0400, Shawn M Moore wrote​:

Unfortunately grep.cpan.me seems to be down, so I can't investigate
how common a pattern eval q{$'} etc are.

Unique, it seems​:

[dkaarsemaker@​llama ~]$ csearch "q{\\s*\\\$'\\s*}" 2>/dev/null
/scratch/cpan/Benchmark-Perl-Formance-Cargo/share/PerlCritic/Critic/Policy/Variables/ProhibitPunctuationVars.pm​: ( q{$'}, q{$$}, q{$#}, q{$​:}, ); ## no critic ( RequireInterpolationOfMetachars, ProhibitQuotedWordLists )
/scratch/cpan/Path-Dispatcher/lib/Path/Dispatcher/Rule/Regex.pm​: $extra{leftover} = eval q{$'};
/scratch/cpan/Perl-Critic/lib/Perl/Critic/Policy/Variables/ProhibitPunctuationVars.pm​: ( q{$'}, q{$$}, q{$#}, q{$​:}, ); ## no critic ( RequireInterpolationOfMetachars, ProhibitQuotedWordLists )

--
Dennis Kaarsemaker, Systems Architect
Booking.com
Herengracht 597, 1017 CE Amsterdam
Tel external +31 (0) 20 715 3409
Tel internal (7207) 3409

@p5pRT
Copy link
Author

p5pRT commented Sep 10, 2012

From @iabyn

On Mon, Sep 10, 2012 at 02​:56​:44PM -0500, Jesse Luehrs wrote​:

On Mon, Sep 10, 2012 at 08​:52​:14PM +0100, Dave Mitchell wrote​:

On Mon, Sep 10, 2012 at 02​:06​:53PM -0400, Shawn M Moore wrote​:

Author of the module in question here.

This line of code is, of course, simply an optimization. I have no
qualms switching that to this less sneaky implementation​:

$extra\{leftover\} = $';

Rather than polluting the whole program with $', wouldn't it be better
to just modify specific regexps to have '(.*)$' at the end?

It's matching against user-specified regexes, so no.

Why not $user_regex = qr/$user_regex(.*)$/;

On the other hand,
using $' directly for 5.8 and switching to use /p and ${^POSTMATCH} for
5.10+ would be doable I think?

That would be better than nothing.

--
All wight. I will give you one more chance. This time, I want to hear
no Wubens. No Weginalds. No Wudolf the wed-nosed weindeers.
  -- Life of Brian

@p5pRT
Copy link
Author

p5pRT commented Sep 10, 2012

From @cpansprout

On Mon Sep 10 13​:02​:21 2012, davem wrote​:

On Mon, Sep 10, 2012 at 09​:15​:06AM -0700, Father Chrysostomos via RT
wrote​:

An idea I had, which I started to explore a year ago, but never quite
finished, was to force a COW ‘copy’ and, following the example of
PERL_OLD_COPY_ON_WRITE, store an SV in the regexp struct. (I never got
it to work completely, and, code freeze fast approaching, I decided to
shelve it for a short while, which turned into a long while.)

But I thought COW had too many edge cases to be viable?

I don’t know what you are referring to exactly.

I think it is perfectly viable. If we revive the PERL_OLD_COPY_ON_WRITE
scheme, we can​:

• Upgrade any scalar to PVIV, or even discard a cached integer, if the
string is long enough that benchmarks show the copying to be slower than
the bookkeeping overhead (or use SvLEN instead for storing the pointer);
• Prefer swipe over cow whenever possible, since we know it’s faster.

On Aug 3, 2012, at 8​:14 AM, Nicholas Clark wrote​:

So I gave up on the whole thing, because even though it seemed about
the most
lightweight approach possible, I still couldn't get it to the point
where it
was measurably faster than not doing it

I think the biggest gain would be with large strings. If we do simple
benchmarks to find the threshold where the copy outweighs the
bookkeeping overhead, we can make this work well.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 10, 2012

From @cpansprout

On Mon Sep 10 15​:52​:14 2012, sprout wrote​:

On Aug 3, 2012, at 8​:14 AM, Nicholas Clark wrote​:

So I gave up on the whole thing, because even though it seemed about
the most
lightweight approach possible, I still couldn't get it to the point
where it
was measurably faster than not doing it

I think the biggest gain would be with large strings. If we do simple
benchmarks to find the threshold where the copy outweighs the
bookkeeping overhead, we can make this work well.

And I was going to say​: This may not speed up most code in general, but
where it really matters (large strings) it will.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 11, 2012

From @iabyn

On Mon, Sep 10, 2012 at 03​:52​:15PM -0700, Father Chrysostomos via RT wrote​:

On Mon Sep 10 13​:02​:21 2012, davem wrote​:

On Mon, Sep 10, 2012 at 09​:15​:06AM -0700, Father Chrysostomos via RT
wrote​:

An idea I had, which I started to explore a year ago, but never quite
finished, was to force a COW ‘copy’ and, following the example of
PERL_OLD_COPY_ON_WRITE, store an SV in the regexp struct. (I never got
it to work completely, and, code freeze fast approaching, I decided to
shelve it for a short while, which turned into a long while.)

But I thought COW had too many edge cases to be viable?

I don’t know what you are referring to exactly.

From Nicholas's recent description of his COW work​:

Message-ID​: <20120803151426.GL9583@​plum.flirble.org>
Plus also one test still failed in Compress​::Zlib, which I couldn't work out
why, and so I feared that there was some subtle assumption that I'd missed
that might cause other CPAN code to break. (Possibly at runtime, possibly in
interestingly subtle ways). At some point Artur Bergman made the point that
CPAN code would reasonably expect that if it copied something read only, the
copy would not be. But (obviously) a COW copy has to act read only. Which
troubled me. (The idea only came later of using #ifdef PERL_CORE in the
headers to avoid doing COW copies in XS code, after I'd concluded that I
couldn't get (general) COW to be fast enough.)

So I gave up on the whole thing, because even though it seemed about the most
lightweight approach possible, I still couldn't get it to the point where it
was measurably faster than not doing it, and it added complexity to the
(runtime) code, and it looked like it had a risk of subtly breaking CPAN code.
This was all quite frustrating.

Which I read as indicating there were subtle and risky issues.

--
The Enterprise successfully ferries an alien VIP from one place to another
without serious incident.
  -- Things That Never Happen in "Star Trek" #7

@p5pRT
Copy link
Author

p5pRT commented Sep 11, 2012

From @cpansprout

On Tue Sep 11 06​:04​:53 2012, davem wrote​:

On Mon, Sep 10, 2012 at 03​:52​:15PM -0700, Father Chrysostomos via RT
wrote​:

On Mon Sep 10 13​:02​:21 2012, davem wrote​:

On Mon, Sep 10, 2012 at 09​:15​:06AM -0700, Father Chrysostomos via
RT
wrote​:

An idea I had, which I started to explore a year ago, but never
quite
finished, was to force a COW ‘copy’ and, following the example
of
PERL_OLD_COPY_ON_WRITE, store an SV in the regexp struct. (I
never got
it to work completely, and, code freeze fast approaching, I
decided to
shelve it for a short while, which turned into a long while.)

But I thought COW had too many edge cases to be viable?

I don’t know what you are referring to exactly.

From Nicholas's recent description of his COW work​:

Message-ID​: <20120803151426.GL9583@​plum.flirble.org>
Plus also one test still failed in Compress​::Zlib, which I couldn't
work out
why, and so I feared that there was some subtle assumption that I'd
missed
that might cause other CPAN code to break. (Possibly at runtime,
possibly in
interestingly subtle ways). At some point Artur Bergman made the
point that
CPAN code would reasonably expect that if it copied something read
only, the
copy would not be. But (obviously) a COW copy has to act read only.
Which
troubled me. (The idea only came later of using #ifdef PERL_CORE in
the
headers to avoid doing COW copies in XS code, after I'd concluded
that I
couldn't get (general) COW to be fast enough.)

So I gave up on the whole thing, because even though it seemed about
the most
lightweight approach possible, I still couldn't get it to the point
where it
was measurably faster than not doing it, and it added complexity to
the
(runtime) code, and it looked like it had a risk of subtly breaking
CPAN code.
This was all quite frustrating.

Which I read as indicating there were subtle and risky issues.

I think this part reasonably solves that​:

(The idea only came later of using #ifdef PERL_CORE in the
headers to avoid doing COW copies in XS code, after I'd concluded that I
couldn't get (general) COW to be fast enough.)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 17, 2012

From @nwc10

On Mon, Sep 10, 2012 at 09​:15​:06AM -0700, Father Chrysostomos via RT wrote​:

On Mon Sep 10 08​:29​:17 2012, davem wrote​:

On Sun, Sep 09, 2012 at 07​:31​:26PM -0700, Andreas J. Koenig via RT
wrote​:

----------
commit 6502e08
Author​: David Mitchell <davem@​iabyn.com>
Date​: Thu Jul 26 16​:04​:09 2012 +0100

Don't copy all of the match string buffer

[snip]

Can't call method "parent" on an undefined value at t/031-
structured-match.t line 26.
t/031-structured-match.t .......
Dubious, test returned 255 (wstat 65280, 0xff00)

This is due to this line in Path/Dispatcher/Rule/Regex.pm​:

$extra\{leftover\} = eval q\{$'\};

which expects to be able to execute a regex where $' hasn't been seen
yet,
then afterwards to be able to retrieve its value. This used to happen
to
work, but can't be relied upon (and no longer works).

I have long considered it a bug that eval '$&' doesn't work if the three
forbidden match vars have not been seen.

Me too.

That you have extended the bug further worries me.

I know you spent a lot of time on this, but it sort of dashes any hopes
of getting that fixed. :-)

But I wasn't confident of this part.

After all, I fixed C<study> so that it could work on more than one string.
Only for C<study> to be disabled, and then removed.
And I didn't feel particularly bad about it.

On Tue, Sep 11, 2012 at 12​:41​:25PM -0700, Father Chrysostomos via RT wrote​:

On Tue Sep 11 06​:04​:53 2012, davem wrote​:

On Mon, Sep 10, 2012 at 03​:52​:15PM -0700, Father Chrysostomos via RT
wrote​:

So I gave up on the whole thing, because even though it seemed about
the most
lightweight approach possible, I still couldn't get it to the point
where it
was measurably faster than not doing it, and it added complexity to
the
(runtime) code, and it looked like it had a risk of subtly breaking
CPAN code.
This was all quite frustrating.

Which I read as indicating there were subtle and risky issues.

I think this part reasonably solves that​:

(The idea only came later of using #ifdef PERL_CORE in the
headers to avoid doing COW copies in XS code, after I'd concluded that I
couldn't get (general) COW to be fast enough.)

It still doesn't solve the test failures in one of the Compress​::Zlib modules.
(Didn't check thoroughly as to why, but one still failed when I got
PERL_OLD_COPY_ON_WRITE compiling again a couple of days ago)

In particular, I remember the frustration that led to this comment in
Perl_sv_gets​:

  /* XXX. If you make this PVIV, then copy on write can copy scalars read
  from <>.
  However, perlbench says it's slower, because the existing swipe code
  is faster than copy on write.
  Swings and roundabouts. */
  SvUPGRADE(sv, SVt_PV);

The swipe code seems to be very effective. However, I don't totally like it
because it relies on SvTEMP() meaning "I can steal the buffer". Which isn't
always true.

But I had a thought about an alternative COW mechanism, exploiting the
OOK mechanism.

To avoid moving memory around when truncating strings this permits SvPVX()
to point within the malloc()ed buffer, storing an offset to the true buffer
start. Historically, this was done with SvIVX(). It occurred to me that
using extra memory was wasteful, as we had free memory in the buffer itself.
So currently the scheme is

  SvPVX()
  v
  ...*Your string is here
  ^
  a flag byte

If the flag byte is non-zero, then it's the offset in bytes back to the
buffer start. If it's zero, then the sizeof(STRLEN) bytes before give that
offset. Thus it can store an offset from 1 to STRLEN.

I wondered if that scheme can change. It stays as​:

  SvPVX()
  v
  ...*Your string is here
  ^
  a flag byte

but the flag byte is split in two in some fashion. Possibly, low 7 bits
are a reference count for the allocated buffer. Top bit is 0 if there's
no further offset to the start of the allocated buffer. ie

  SvPVX()
  v
  *Your string is here
  ^
  reference count
  ^ start of allocated buffer

and is 1 if there's the rest of the OOK structure earlier​:

  SvPVX()
  v
  ...*Your string is here
  ^
  reference count
  ^ OOK 1 byte offset, and maybe a STRLEN before it.
  ^ start of allocated buffer

I think only a small amount of core code would need to change to accommodate
this. (SvOOK_offset() and Perl_sv_backoff())
But it would permit a very cheap way to allocate PVs such that they could
be "copied" - the overhead would be just 1 byte.
And it feels much cheaper to undo this COW than the old scheme's pointer
loop chasing.

It might be cheap enough to use everywhere when creating strings, and thus
actually get rid of the swipe code.

I'm also wondering if it's complementary with the other COW approach.
That one doesn't need to move the contents of an existing buffer (so better
for long strings), but does need to upgrade the SV body to PVIV, and does
need to chase the pointer loop on release.

However, I'm still wary of the Compress​::Zlib test failure, as an indicator
that I didn't manage to anticipate every problem.

Also, I'm thinking that we ought to get rid of the (ab)use of SvREADONLY()
to mean two things - "This is a readonly value" and "you can't write to
the buffer that SvPVX() points to". It would be better to have two flags.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Sep 18, 2012

From @cpansprout

On Mon Sep 17 14​:06​:19 2012, nicholas wrote​:

On Tue, Sep 11, 2012 at 12​:41​:25PM -0700, Father Chrysostomos via RT
wrote​:

On Tue Sep 11 06​:04​:53 2012, davem wrote​:

On Mon, Sep 10, 2012 at 03​:52​:15PM -0700, Father Chrysostomos via
RT
wrote​:

So I gave up on the whole thing, because even though it seemed
about
the most
lightweight approach possible, I still couldn't get it to the
point
where it
was measurably faster than not doing it, and it added complexity
to
the
(runtime) code, and it looked like it had a risk of subtly
breaking
CPAN code.
This was all quite frustrating.

Which I read as indicating there were subtle and risky issues.

I think this part reasonably solves that​:

(The idea only came later of using #ifdef PERL_CORE in the
headers to avoid doing COW copies in XS code, after I'd concluded
that I
couldn't get (general) COW to be fast enough.)

It still doesn't solve the test failures in one of the Compress​::Zlib
modules.
(Didn't check thoroughly as to why, but one still failed when I got
PERL_OLD_COPY_ON_WRITE compiling again a couple of days ago)

In particular, I remember the frustration that led to this comment in
Perl_sv_gets​:

/\* XXX\. If you make this PVIV\, then copy on write can copy scalars

read
from <>.
However, perlbench says it's slower, because the existing swipe
code
is faster than copy on write.
Swings and roundabouts. */
SvUPGRADE(sv, SVt_PV);

The swipe code seems to be very effective. However, I don't totally
like it
because it relies on SvTEMP() meaning "I can steal the buffer". Which
isn't
always true.

Are you referring to the use of _nosteal here and there, or are you
unaware that the swipe code checks for a refcount of 1?

But I had a thought about an alternative COW mechanism, exploiting the
OOK mechanism.

...snip details...

That is just brilliant!

But it would permit a very cheap way to allocate PVs such that they
could
be "copied" - the overhead would be just 1 byte.

That’s a lot smaller than allocating a shared_he (20 bytes on a 32-bit
machine), something I tried to do.

And it feels much cheaper to undo this COW than the old scheme's
pointer
loop chasing.

It might be cheap enough to use everywhere when creating strings, and
thus
actually get rid of the swipe code.

I'm also wondering if it's complementary with the other COW approach.
That one doesn't need to move the contents of an existing buffer

I may have misunderstood, but it doesn’t sound as though your 1-byte
flag method needs that either, if that extra byte is allocated at the
outset.

(so
better
for long strings), but does need to upgrade the SV body to PVIV, and
does
need to chase the pointer loop on release.

However, I'm still wary of the Compress​::Zlib test failure, as an
indicator
that I didn't manage to anticipate every problem.

It is probably doing something naughty. :-)

Also, I'm thinking that we ought to get rid of the (ab)use of
SvREADONLY()
to mean two things - "This is a readonly value" and "you can't write
to
the buffer that SvPVX() points to". It would be better to have two
flags.

What about existing XS code that checks SvREADONLY before modifying
SvPVX? (Is there any? Most XS code I’ve seen does a terrible job of
handling perl’s types.)

The flag space is rather crowded. Maybe it’s about time we made pad
names their own type. Their being SVs makes SVs themselves more
complicated. A dedicated pad name type would also allow the name to be
allocated as part of the same memory block, much like a hek.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 18, 2012

From @demerphq

On 17 September 2012 23​:05, Nicholas Clark <nick@​ccl4.org> wrote​:

To avoid moving memory around when truncating strings this permits SvPVX()
to point within the malloc()ed buffer, storing an offset to the true buffer
start. Historically, this was done with SvIVX(). It occurred to me that
using extra memory was wasteful, as we had free memory in the buffer itself.

You say "historically this was done with SvIVX()" and then imply not
using SvIVX() saves memory. Am I correct in understanding this means
we can do the OOK hack without upgrading to a SvPVIV? We can leave the
item as a SvPV?

Also a note for readers new to this subject​: "truncate" means "remove
the start of the string" in this context.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Sep 18, 2012

From @nwc10

On Tue, Sep 18, 2012 at 11​:00​:08AM +0200, demerphq wrote​:

On 17 September 2012 23​:05, Nicholas Clark <nick@​ccl4.org> wrote​:

To avoid moving memory around when truncating strings this permits SvPVX()
to point within the malloc()ed buffer, storing an offset to the true buffer
start. Historically, this was done with SvIVX(). It occurred to me that
using extra memory was wasteful, as we had free memory in the buffer itself.

You say "historically this was done with SvIVX()" and then imply not
using SvIVX() saves memory. Am I correct in understanding this means
we can do the OOK hack without upgrading to a SvPVIV? We can leave the
item as a SvPV?

We do do the OOK hack without upgrading, as of (it seems) v5.12.0

$ perl5.10.0 -MDevel​::Peek -le '$_ = "half pint"; s/half //; Dump $_; print $]'
SV = PVIV(0x100803c10) at 0x100820f08
  REFCNT = 1
  FLAGS = (POK,OOK,pPOK)
  IV = 5 (OFFSET)
  PV = 0x1002023d5 ( "half " . ) "pint"\0
  CUR = 4
  LEN = 11
5.010000
$ perl -MDevel​::Peek -le '$_ = "half pint"; s/half //; Dump $_; print $]'
SV = PV(0x100801c78) at 0x10081ece8
  REFCNT = 1
  FLAGS = (POK,OOK,pPOK)
  OFFSET = 5
  PV = 0x100201e45 ( "half\5" . ) "pint"\0
  CUR = 4
  LEN = 11
5.012004

Contrast the IV = 5 (OFFSET) with the use of "\5" to store the offset.

Here's the change of storage format​:

$ perl -MDevel​::Peek -le '$_ = "N" x 256; s/.{255}//; Dump $_; print $]'
SV = PV(0x100801d78) at 0x10081ece8
  REFCNT = 1
  FLAGS = (POK,OOK,pPOK)
  OFFSET = 255
  PV = 0x100213cbf ( "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN\377" . ) "N"\0
  CUR = 1
  LEN = 17
5.012004
$ perl -MDevel​::Peek -le '$_ = "N" x 257; s/.{256}//; Dump $_; print $]'
SV = PV(0x100801d78) at 0x10081ece8
  REFCNT = 1
  FLAGS = (POK,OOK,pPOK)
  OFFSET = 256
  PV = 0x100213cc0 ( "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN\0\1\0\0\0\0\0\0\0" . ) "N"\0
  CUR = 1
  LEN = 16
5.012004

And this is the -DDEBUGGING version being paranoid that someone might
overwrite the unused buffer​:

$ ./perl -Ilib -MDevel​::Peek -le '$_ = "N" x 257; s/.{256}//; Dump $_; print $]'
SV = PV(0x100802e70) at 0x100825fc0
  REFCNT = 1
  FLAGS = (POK,OOK,pPOK)
  OFFSET = 256
  PV = 0x100608be0 ( "\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\367\370\371\372\373\374\375\376\377\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37 !\"#$%&'()*+,-./0123456789​:;<=>?@​ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\177\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237\240\241\242\243\244\245\246\247\250\251\252\253\254\255\256\257\260\261\262\263\264\265\266\267\270\271\272\273\274\275\276\277\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\0\1\0\0\0\0\0\0\0" . ) "N"\0
  CUR = 1
  LEN = 16
5.017004

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Sep 18, 2012

From @iabyn

On Mon, Sep 17, 2012 at 10​:05​:34PM +0100, Nicholas Clark wrote​:

On Mon, Sep 10, 2012 at 09​:15​:06AM -0700, Father Chrysostomos via RT wrote​:

I know you spent a lot of time on this, but it sort of dashes any hopes
of getting that fixed. :-)

But I wasn't confident of this part.

After all, I fixed C<study> so that it could work on more than one string.
Only for C<study> to be disabled, and then removed.
And I didn't feel particularly bad about it.

I have no objection to my 'copy only a bit of the buffer' code being
ripped out if it can be replaced by a reliable COW scheme :-)

But I had a thought about an alternative COW mechanism, exploiting the
OOK mechanism.

So this would involve any string about to be COWed to need to be Moved
up 1 byte and possibly realloced?

One thing I don't yet understand about COW (either scheme) is how
you detect all possible writes. What's to stop some XS code code doing
something like​:

  if (SvPOK(sv) && SvCUR(sv))
  (SvPVX(sv))[0] = 'x';

--
Britain, Britain, Britain! Discovered by Sir Henry Britain in
sixteen-oh-ten. Sold to Germany a year later for a pfennig and the promise
of a kiss. Destroyed in eighteen thirty-forty two, and rebuilt a week
later by a man. This we know. Hello. But what of the people of Britain?
Who they? What do? And why? -- "Little Britain"

@p5pRT
Copy link
Author

p5pRT commented Sep 18, 2012

From @nwc10

On Tue, Sep 18, 2012 at 10​:38​:01AM +0100, Dave Mitchell wrote​:

On Mon, Sep 17, 2012 at 10​:05​:34PM +0100, Nicholas Clark wrote​:

On Mon, Sep 10, 2012 at 09​:15​:06AM -0700, Father Chrysostomos via RT wrote​:

I know you spent a lot of time on this, but it sort of dashes any hopes
of getting that fixed. :-)

But I wasn't confident of this part.

After all, I fixed C<study> so that it could work on more than one string.
Only for C<study> to be disabled, and then removed.
And I didn't feel particularly bad about it.

I have no objection to my 'copy only a bit of the buffer' code being
ripped out if it can be replaced by a reliable COW scheme :-)

But I had a thought about an alternative COW mechanism, exploiting the
OOK mechanism.

So this would involve any string about to be COWed to need to be Moved
up 1 byte and possibly realloced?

Yes. Which was why I suggested that

a) one runs with both available
b) new strings are allocatd 1 byte off
  [forgot to say - on how many platforms does this make a performance hit?
  On ARM, at least, there are hardcore optimised word-at-a-time routines
  for word aligned strings, and byte-by-byte for everything else]
c) existing long strings are COW-ed by the "upgrade to PVIV" approach
d) potentially things in the middle just get copied

One thing I don't yet understand about COW (either scheme) is how
you detect all possible writes. What's to stop some XS code code doing
something like​:

if \(SvPOK\(sv\) && SvCUR\(sv\)\)
\(SvPVX\(sv\)\)\[0\] = 'x';

You can't, if they don't give an $expletive about the READONLY flag.
That is their bug. But BBC does go here, as does perlbug@​perl.org
The subtle problem that I missed but Artur spotted was ensuring that API
calls that "obviously" return a fresh clean new scalar (and hence writable)
actually don't change to returning a COW copy of something.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Sep 18, 2012

From @demerphq

On 18 September 2012 12​:00, Nicholas Clark <nick@​ccl4.org> wrote​:

On Tue, Sep 18, 2012 at 10​:38​:01AM +0100, Dave Mitchell wrote​:

On Mon, Sep 17, 2012 at 10​:05​:34PM +0100, Nicholas Clark wrote​:

On Mon, Sep 10, 2012 at 09​:15​:06AM -0700, Father Chrysostomos via RT wrote​:

I know you spent a lot of time on this, but it sort of dashes any hopes
of getting that fixed. :-)

But I wasn't confident of this part.

After all, I fixed C<study> so that it could work on more than one string.
Only for C<study> to be disabled, and then removed.
And I didn't feel particularly bad about it.

I have no objection to my 'copy only a bit of the buffer' code being
ripped out if it can be replaced by a reliable COW scheme :-)

But I had a thought about an alternative COW mechanism, exploiting the
OOK mechanism.

So this would involve any string about to be COWed to need to be Moved
up 1 byte and possibly realloced?

Yes. Which was why I suggested that

a) one runs with both available
b) new strings are allocatd 1 byte off
[forgot to say - on how many platforms does this make a performance hit?
On ARM, at least, there are hardcore optimised word-at-a-time routines
for word aligned strings, and byte-by-byte for everything else]
c) existing long strings are COW-ed by the "upgrade to PVIV" approach
d) potentially things in the middle just get copied

One thing I don't yet understand about COW (either scheme) is how
you detect all possible writes. What's to stop some XS code code doing
something like​:

if \(SvPOK\(sv\) && SvCUR\(sv\)\)
  \(SvPVX\(sv\)\)\[0\] = 'x';

You can't, if they don't give an $expletive about the READONLY flag.
That is their bug. But BBC does go here, as does perlbug@​perl.org
The subtle problem that I missed but Artur spotted was ensuring that API
calls that "obviously" return a fresh clean new scalar (and hence writable)
actually don't change to returning a COW copy of something.

Maybe a SvPVXW macro, which would check this and throw an error would
be a useful addition to future proof code to this problem?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Sep 18, 2012

From @demerphq

On 18 September 2012 11​:11, Nicholas Clark <nick@​ccl4.org> wrote​:

On Tue, Sep 18, 2012 at 11​:00​:08AM +0200, demerphq wrote​:

On 17 September 2012 23​:05, Nicholas Clark <nick@​ccl4.org> wrote​:

To avoid moving memory around when truncating strings this permits SvPVX()
to point within the malloc()ed buffer, storing an offset to the true buffer
start. Historically, this was done with SvIVX(). It occurred to me that
using extra memory was wasteful, as we had free memory in the buffer itself.

You say "historically this was done with SvIVX()" and then imply not
using SvIVX() saves memory. Am I correct in understanding this means
we can do the OOK hack without upgrading to a SvPVIV? We can leave the
item as a SvPV?

We do do the OOK hack without upgrading, as of (it seems) v5.12.0

That is very cool. Thanks a lot for explaining, and I guess doing it.

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Sep 18, 2012

From @salva

On 09/18/2012 11​:38 AM, Dave Mitchell wrote​:

On Mon, Sep 17, 2012 at 10​:05​:34PM +0100, Nicholas Clark wrote​:

On Mon, Sep 10, 2012 at 09​:15​:06AM -0700, Father Chrysostomos via RT wrote​:

I know you spent a lot of time on this, but it sort of dashes any hopes
of getting that fixed. :-)

But I wasn't confident of this part.

After all, I fixed C<study> so that it could work on more than one string.
Only for C<study> to be disabled, and then removed.
And I didn't feel particularly bad about it.

I have no objection to my 'copy only a bit of the buffer' code being
ripped out if it can be replaced by a reliable COW scheme :-)

But I had a thought about an alternative COW mechanism, exploiting the
OOK mechanism.

So this would involve any string about to be COWed to need to be Moved
up 1 byte and possibly realloced?

There are XS modules out there that place C structures on the PV slot
and rely on it being properly aligned.

@p5pRT
Copy link
Author

p5pRT commented Sep 18, 2012

From @nwc10

On Tue, Sep 18, 2012 at 12​:14​:45PM +0200, Salvador Fandino wrote​:

On 09/18/2012 11​:38 AM, Dave Mitchell wrote​:

On Mon, Sep 17, 2012 at 10​:05​:34PM +0100, Nicholas Clark wrote​:

But I had a thought about an alternative COW mechanism, exploiting the
OOK mechanism.

So this would involve any string about to be COWed to need to be Moved
up 1 byte and possibly realloced?

There are XS modules out there that place C structures on the PV slot
and rely on it being properly aligned.

Good point. I think that there are XS modules in ext/ which rely on that :-)

So it would only be sane to do this for new PVs that are explicitly known
to be strings.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Sep 18, 2012

From @ap

* Nicholas Clark <nick@​ccl4.org> [2012-09-18 12​:05]​:

b) new strings are allocatd 1 byte off
[forgot to say - on how many platforms does this make a performance
hit? On ARM, at least, there are hardcore optimised word-at-a-time
routines for word aligned strings, and byte-by-byte for everything
else]

Just allocate it another 1 (or even 3) bytes off on platforms where that
is a concern?

@p5pRT
Copy link
Author

p5pRT commented Sep 18, 2012

From @Leont

On Tue, Sep 18, 2012 at 11​:38 AM, Dave Mitchell <davem@​iabyn.com> wrote​:

One thing I don't yet understand about COW (either scheme) is how
you detect all possible writes. What's to stop some XS code code doing
something like​:

if \(SvPOK\(sv\) && SvCUR\(sv\)\)
    \(SvPVX\(sv\)\)\[0\] = 'x';

There is already SV_CHECK_THINKFIRST (and friends) to fix all such
issues. That code is already wrong today, but there's also already a
solution.

Leon

@p5pRT
Copy link
Author

p5pRT commented Sep 19, 2012

From @nwc10

On Tue, Sep 18, 2012 at 01​:00​:58PM +0200, Aristotle Pagaltzis wrote​:

* Nicholas Clark <nick@​ccl4.org> [2012-09-18 12​:05]​:

b) new strings are allocatd 1 byte off
[forgot to say - on how many platforms does this make a performance
hit? On ARM, at least, there are hardcore optimised word-at-a-time
routines for word aligned strings, and byte-by-byte for everything
else]

Just allocate it another 1 (or even 3) bytes off on platforms where that
is a concern?

I think that it would be 3 or 7. ["real" platforms have words that are
4 or 8 bytes. At least, in my book, they do :-)]

I'm not sure that that is as worth it, given that

a) the seduction of the use of 1 byte is that it's a really low memory cost,
  which at least 25% (or 12.5%) of the time isn't going to result in the
  use of a different malloc() buckets
b) not sure how often perl really ends up doing full-on comparisons, as
  storing the length means that unlike C, equal and not equal don't always
  need to walk the full string

Would be the right sort of thing to benchmark to find out the costs/benefits,
given "typical program" benchmarks.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Oct 2, 2012

From @cpansprout

On Mon Sep 17 21​:44​:53 2012, sprout wrote​:

On Mon Sep 17 14​:06​:19 2012, nicholas wrote​:

However, I'm still wary of the Compress​::Zlib test failure, as an
indicator
that I didn't manage to anticipate every problem.

It is probably doing something naughty. :-)

This is enough to get it working​:

Inline Patch
diff --git a/cpan/Compress-Raw-Zlib/Zlib.xs b/cpan/Compress-Raw-Zlib/Zlib.xs
index 9f1d7a1..acaff84 100644
--- a/cpan/Compress-Raw-Zlib/Zlib.xs
+++ b/cpan/Compress-Raw-Zlib/Zlib.xs
@@ -614,7 +614,7 @@ char * string ;
             croak("%s: buffer parameter is a reference to a reference",
string) ;   \}

- if (SvREADONLY(sv) && PL_curcop != &PL_compiling)
+ if (SvREADONLY(sv) && !SvIsCOW(sv) && PL_curcop != &PL_compiling)
  croak("%s​: buffer parameter is read-only", string);

  SvUPGRADE(sv, SVt_PV);
@​@​ -1334,7 +1334,7 @​@​ inflate (s, buf, output, eof=FALSE)
  /* If the buffer is a reference, dereference it */
  buf = deRef(buf, "inflate") ;

- if (s->flags & FLAG_CONSUME_INPUT && SvREADONLY(buf))
+ if (s->flags & FLAG_CONSUME_INPUT && SvREADONLY(buf) && !SvIsCOW(buf))
  croak("Compress​::Raw​::Zlib​::Inflate​::inflate input parameter
cannot be read-only when ConsumeInput is specified");
#ifdef UTF8_AVAILABLE
  if (DO_UTF8(buf) && !sv_utf8_downgrade(buf, 1))

Also, I'm thinking that we ought to get rid of the (ab)use of
SvREADONLY()
to mean two things - "This is a readonly value" and "you can't write
to
the buffer that SvPVX() points to". It would be better to have two
flags.

What about existing XS code that checks SvREADONLY before modifying
SvPVX? (Is there any? Most XS code I’ve seen does a terrible job of
handling perl’s types.)

I did a quick audit and found numerous modules that use SvREADONLY.
Only three of them use it correctly​:

File​::Map
optimizer
POSIX​::pselect

And it looks as though only POSIX​::pselect will break if we make SvIsCOW
its own flag.

I suggest we go ahead and do it. Some modules mishandling SvREADONLY
might start to modify COW scalars in place. But I think it really is
worth it.

Any code that wants to modify strings in place should be using
SvPV_force first anyway.

The flag space is rather crowded. Maybe it’s about time we made pad
names their own type. Their being SVs makes SVs themselves more
complicated. A dedicated pad name type would also allow the name to be
allocated as part of the same memory block, much like a hek.

We could use Dave Mitchell’s free bit for now, which in
<20111007191900.GF3046@​iabyn.com> he asked us not to squander.

Presumably SvREADONLY should also set the flag. Presumably we would
call it something like SvREADONLYPV or SvPVisREADONLY.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 2, 2012

From @bulk88

On Mon Sep 17 21​:44​:53 2012, sprout wrote​:

The flag space is rather crowded. Maybe it’s about time we made pad
names their own type. Their being SVs makes SVs themselves more
complicated. A dedicated pad name type would also allow the name to be
allocated as part of the same memory block, much like a hek.

There is the existing SVf_FAKE and SVf_READONLY for HEKs. Maybe I didn't
see it in this thread, but what is wrong with using shared HEKs for COW?

Since the proposal as I read is it to introduce 2 different "start of
buffer" formats for the offset system, is a flag needed? Why not just
use SVf_OOK and let the macros and functions figure out if it is a COW
or offset in the start of string data?

In XS code I've used SVf_READONLY to protect PV buffers given to a OS
thread pooled C library for asynchronous reading or writing during an
asynchronous transaction. Specifically the PV must not be realloced
(SVf_READONLY) or freeded (SV refcount) during the transaction otherwise
a thread in the pool will SEGV. Double copying to private buffers caused
performance problems since the result of each transaction were
20KB-500KB binary strings. After the transaction is completed,
SVf_READONLY is turned off, sv_chop is used to cut off the C library's
opaque transaction header struct from the PV since it is binary now
expired garbage to the caller, and then the refcount is decreased on the
SVPV. The caller is supposed to keep a reference to the SVPV somewhere
in his Perl code.

Another thing, I've heard Perl allows non-Perl allocated PVs, see
http​://www.nntp.perl.org/group/perl.perl5.porters/2011/12/msg180269.html
. This shouldn't be broken by a future COW system.

@p5pRT
Copy link
Author

p5pRT commented Oct 2, 2012

From @cpansprout

On Tue Oct 02 12​:06​:42 2012, bulk88 wrote​:

On Mon Sep 17 21​:44​:53 2012, sprout wrote​:

The flag space is rather crowded. Maybe it’s about time we made pad
names their own type. Their being SVs makes SVs themselves more
complicated. A dedicated pad name type would also allow the name to be
allocated as part of the same memory block, much like a hek.

There is the existing SVf_FAKE and SVf_READONLY for HEKs. Maybe I didn't
see it in this thread, but what is wrong with using shared HEKs for COW?

We already do when we have a HEK already. But taking an existing string
and creating a HEK out of it requires copying it. I tried an experiment
to allocate all strings as HEKs initially, and then upgrade them to
shared HEKs by adding them to the string table when they would have been
copied, but the shared string table overhead outweighs any savings.

What we are looking for is a cheap way to do COW with existing strings.

Since the proposal as I read is it to introduce 2 different "start of
buffer" formats for the offset system, is a flag needed? Why not just
use SVf_OOK and let the macros and functions figure out if it is a COW
or offset in the start of string data?

The new flag would apply also to SVs containing shared HEKs.

By using a new flag instead of SvREADONLY, we can make the dozen or so
modules on CPAN that do if (SvREADONLY(sv)) croak(...) start working
correctly.

But no, the new flag would not be strictly necessary; just convenient.

In XS code I've used SVf_READONLY to protect PV buffers given to a OS
thread pooled C library for asynchronous reading or writing during an
asynchronous transaction. Specifically the PV must not be realloced
(SVf_READONLY) or freeded (SV refcount) during the transaction otherwise
a thread in the pool will SEGV. Double copying to private buffers caused
performance problems since the result of each transaction were
20KB-500KB binary strings. After the transaction is completed,
SVf_READONLY is turned off, sv_chop is used to cut off the C library's
opaque transaction header struct from the PV since it is binary now
expired garbage to the caller, and then the refcount is decreased on the
SVPV. The caller is supposed to keep a reference to the SVPV somewhere
in his Perl code.

Another thing, I've heard Perl allows non-Perl allocated PVs, see
http​://www.nntp.perl.org/group/perl.perl5.porters/2011/12/msg180269.html
. This shouldn't be broken by a future COW system.

I don’t think any of this would be broken.

Basically, as long as any XS code that wants to modify a string does
SvPV_force(sv) first (as it should be doing already), everything should
work.

I can image that newSVpvn might need to behave differently depending on
whether it is called by core code.

When XS modules use the PV pointer to store pointer-aligned binary data,
do they get perl to allocate the PV to begin with? Or do they allocate
it themselves? Does anyone know which modules these are? I cannot
think of a way to search for them.

I have just had another idea, a variation of the SvOOK hack, which may
render those questions moot. We could put the reference count on the
end of the string, after the null, rather than at the beginning. If
there is no room there when we are trying to fake a copy, then we copy
the string for real, making the new buffer one byte bigger.

The chances of missing the opportunity for COW are 1 in 16, on 32-bit
darwin​:

#include <stdlib.h>
#include <stdio.h>
main() {
int i;
size_t prev = 0;
for(i = 1; i<=256; i++) {
  void *ptr = malloc(i);
  size_t size = malloc_size(ptr);
  if (size != prev) printf ("Size for %d is %d\n", i, (int)size);
  prev = size;
  free(ptr);
}
}

Output​:

Size for 1 is 16
Size for 17 is 32
Size for 33 is 48
Size for 49 is 64
Size for 65 is 80
Size for 81 is 96
Size for 97 is 112
Size for 113 is 128
Size for 129 is 144
Size for 145 is 160
Size for 161 is 176
Size for 177 is 192
Size for 193 is 208
Size for 209 is 224
Size for 225 is 240
Size for 241 is 256

It is probably worthwhile to go ahead and allocate the extra byte anyway
at the outset (only 1/16th of the time will we use more memory than
needed, but we will be trading memory for speed), but we will still have
to check for it and fall back to copying, because XS modules may create
SVs without the extra byte.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 4, 2012

From @bulk88

#include <stdlib.h>
#include <stdio.h>
#include <malloc.h>
#include <windows.h>

 
#ifdef _WIN64
#define HEAP_ENTRY_SHIFT 4
#else
#define HEAP_ENTRY_SHIFT 3
#endif


//32bit WinXP SP3 ONLY, MSVCR71.dll ONLY, not any other SP or OS or any other CRT

//if CRTs change, LFH might be turned on, this code does NOT support LFH heaps

//derived from REing ntdll and a app called Volatility
typedef struct _HEAP_ENTRY
{
	USHORT Size;
	USHORT PreviousSize;
	UCHAR SmallTagIndex;
	UCHAR Flags;
	UCHAR UnusedBytes;
	UCHAR SegmentIndex;
}  HEAP_ENTRY, *PHEAP_ENTRY;

int main() {
	int i;
	for(i = 1; i<=256; i++) {
		PHEAP_ENTRY HeapEntry;
		void * ptr = malloc(i);
		size_t RequestedEntrySize;
		size_t ActualSize;
		size_t size;
		HeapEntry = (char*)ptr-(char*)(sizeof(*HeapEntry));
		memset(ptr, 0xAF, i);
		size = _msize(ptr);
		//prove the heap header is parsed correctly, RequestedEntrySize must be equal to _msize's return
		RequestedEntrySize = (HeapEntry->Size << HEAP_ENTRY_SHIFT) - HeapEntry->UnusedBytes;
		ActualSize = (HeapEntry->Size << HEAP_ENTRY_SHIFT);
		printf ("ptr=%X i=%X _msize=%X RevEngd req sz=%X Actual=%X\n",
			ptr, i, size, RequestedEntrySize, ActualSize );
		free(ptr);
	}
	return 0;
}

@p5pRT
Copy link
Author

p5pRT commented Oct 4, 2012

From @bulk88

ptr=333DA0 i=1 _msize=1 RevEngd req sz=1 Actual=10
ptr=333DA0 i=2 _msize=2 RevEngd req sz=2 Actual=10
ptr=333DA0 i=3 _msize=3 RevEngd req sz=3 Actual=10
ptr=333DA0 i=4 _msize=4 RevEngd req sz=4 Actual=10
ptr=333DA0 i=5 _msize=5 RevEngd req sz=5 Actual=10
ptr=333DA0 i=6 _msize=6 RevEngd req sz=6 Actual=10
ptr=333DA0 i=7 _msize=7 RevEngd req sz=7 Actual=10
ptr=333DA0 i=8 _msize=8 RevEngd req sz=8 Actual=10
ptr=332438 i=9 _msize=9 RevEngd req sz=9 Actual=18
ptr=332438 i=A _msize=A RevEngd req sz=A Actual=18
ptr=332438 i=B _msize=B RevEngd req sz=B Actual=18
ptr=332438 i=C _msize=C RevEngd req sz=C Actual=18
ptr=332438 i=D _msize=D RevEngd req sz=D Actual=18
ptr=332438 i=E _msize=E RevEngd req sz=E Actual=18
ptr=332438 i=F _msize=F RevEngd req sz=F Actual=18
ptr=332438 i=10 _msize=10 RevEngd req sz=10 Actual=18
ptr=332450 i=11 _msize=11 RevEngd req sz=11 Actual=20
ptr=332450 i=12 _msize=12 RevEngd req sz=12 Actual=20
ptr=332450 i=13 _msize=13 RevEngd req sz=13 Actual=20
ptr=332450 i=14 _msize=14 RevEngd req sz=14 Actual=20
ptr=332450 i=15 _msize=15 RevEngd req sz=15 Actual=20
ptr=332450 i=16 _msize=16 RevEngd req sz=16 Actual=20
ptr=332450 i=17 _msize=17 RevEngd req sz=17 Actual=20
ptr=332450 i=18 _msize=18 RevEngd req sz=18 Actual=20
ptr=332470 i=19 _msize=19 RevEngd req sz=19 Actual=28
ptr=332470 i=1A _msize=1A RevEngd req sz=1A Actual=28
ptr=332470 i=1B _msize=1B RevEngd req sz=1B Actual=28
ptr=332470 i=1C _msize=1C RevEngd req sz=1C Actual=28
ptr=332470 i=1D _msize=1D RevEngd req sz=1D Actual=28
ptr=332470 i=1E _msize=1E RevEngd req sz=1E Actual=28
ptr=332470 i=1F _msize=1F RevEngd req sz=1F Actual=28
ptr=332470 i=20 _msize=20 RevEngd req sz=20 Actual=28
ptr=332498 i=21 _msize=21 RevEngd req sz=21 Actual=30
ptr=332498 i=22 _msize=22 RevEngd req sz=22 Actual=30
ptr=332498 i=23 _msize=23 RevEngd req sz=23 Actual=30
ptr=332498 i=24 _msize=24 RevEngd req sz=24 Actual=30
ptr=332498 i=25 _msize=25 RevEngd req sz=25 Actual=30
ptr=332498 i=26 _msize=26 RevEngd req sz=26 Actual=30
ptr=332498 i=27 _msize=27 RevEngd req sz=27 Actual=30
ptr=332498 i=28 _msize=28 RevEngd req sz=28 Actual=30
ptr=3324C8 i=29 _msize=29 RevEngd req sz=29 Actual=38
ptr=3324C8 i=2A _msize=2A RevEngd req sz=2A Actual=38
ptr=3324C8 i=2B _msize=2B RevEngd req sz=2B Actual=38
ptr=3324C8 i=2C _msize=2C RevEngd req sz=2C Actual=38
ptr=3324C8 i=2D _msize=2D RevEngd req sz=2D Actual=38
ptr=3324C8 i=2E _msize=2E RevEngd req sz=2E Actual=38
ptr=3324C8 i=2F _msize=2F RevEngd req sz=2F Actual=38
ptr=3324C8 i=30 _msize=30 RevEngd req sz=30 Actual=38
ptr=332500 i=31 _msize=31 RevEngd req sz=31 Actual=40
ptr=332500 i=32 _msize=32 RevEngd req sz=32 Actual=40
ptr=332500 i=33 _msize=33 RevEngd req sz=33 Actual=40
ptr=332500 i=34 _msize=34 RevEngd req sz=34 Actual=40
ptr=332500 i=35 _msize=35 RevEngd req sz=35 Actual=40
ptr=332500 i=36 _msize=36 RevEngd req sz=36 Actual=40
ptr=332500 i=37 _msize=37 RevEngd req sz=37 Actual=40
ptr=332500 i=38 _msize=38 RevEngd req sz=38 Actual=40
ptr=332540 i=39 _msize=39 RevEngd req sz=39 Actual=48
ptr=332540 i=3A _msize=3A RevEngd req sz=3A Actual=48
ptr=332540 i=3B _msize=3B RevEngd req sz=3B Actual=48
ptr=332540 i=3C _msize=3C RevEngd req sz=3C Actual=48
ptr=332540 i=3D _msize=3D RevEngd req sz=3D Actual=48
ptr=332540 i=3E _msize=3E RevEngd req sz=3E Actual=48
ptr=332540 i=3F _msize=3F RevEngd req sz=3F Actual=48
ptr=332540 i=40 _msize=40 RevEngd req sz=40 Actual=48
ptr=332588 i=41 _msize=41 RevEngd req sz=41 Actual=50
ptr=332588 i=42 _msize=42 RevEngd req sz=42 Actual=50
ptr=332588 i=43 _msize=43 RevEngd req sz=43 Actual=50
ptr=332588 i=44 _msize=44 RevEngd req sz=44 Actual=50
ptr=332588 i=45 _msize=45 RevEngd req sz=45 Actual=50
ptr=332588 i=46 _msize=46 RevEngd req sz=46 Actual=50
ptr=332588 i=47 _msize=47 RevEngd req sz=47 Actual=50
ptr=332588 i=48 _msize=48 RevEngd req sz=48 Actual=50
ptr=3325D8 i=49 _msize=49 RevEngd req sz=49 Actual=58
ptr=3325D8 i=4A _msize=4A RevEngd req sz=4A Actual=58
ptr=3325D8 i=4B _msize=4B RevEngd req sz=4B Actual=58
ptr=3325D8 i=4C _msize=4C RevEngd req sz=4C Actual=58
ptr=3325D8 i=4D _msize=4D RevEngd req sz=4D Actual=58
ptr=3325D8 i=4E _msize=4E RevEngd req sz=4E Actual=58
ptr=3325D8 i=4F _msize=4F RevEngd req sz=4F Actual=58
ptr=3325D8 i=50 _msize=50 RevEngd req sz=50 Actual=58
ptr=332630 i=51 _msize=51 RevEngd req sz=51 Actual=60
ptr=332630 i=52 _msize=52 RevEngd req sz=52 Actual=60
ptr=332630 i=53 _msize=53 RevEngd req sz=53 Actual=60
ptr=332630 i=54 _msize=54 RevEngd req sz=54 Actual=60
ptr=332630 i=55 _msize=55 RevEngd req sz=55 Actual=60
ptr=332630 i=56 _msize=56 RevEngd req sz=56 Actual=60
ptr=332630 i=57 _msize=57 RevEngd req sz=57 Actual=60
ptr=332630 i=58 _msize=58 RevEngd req sz=58 Actual=60
ptr=332690 i=59 _msize=59 RevEngd req sz=59 Actual=68
ptr=332690 i=5A _msize=5A RevEngd req sz=5A Actual=68
ptr=332690 i=5B _msize=5B RevEngd req sz=5B Actual=68
ptr=332690 i=5C _msize=5C RevEngd req sz=5C Actual=68
ptr=332690 i=5D _msize=5D RevEngd req sz=5D Actual=68
ptr=332690 i=5E _msize=5E RevEngd req sz=5E Actual=68
ptr=332690 i=5F _msize=5F RevEngd req sz=5F Actual=68
ptr=332690 i=60 _msize=60 RevEngd req sz=60 Actual=68
ptr=3326F8 i=61 _msize=61 RevEngd req sz=61 Actual=70
ptr=3326F8 i=62 _msize=62 RevEngd req sz=62 Actual=70
ptr=3326F8 i=63 _msize=63 RevEngd req sz=63 Actual=70
ptr=3326F8 i=64 _msize=64 RevEngd req sz=64 Actual=70
ptr=3326F8 i=65 _msize=65 RevEngd req sz=65 Actual=70
ptr=3326F8 i=66 _msize=66 RevEngd req sz=66 Actual=70
ptr=3326F8 i=67 _msize=67 RevEngd req sz=67 Actual=70
ptr=3326F8 i=68 _msize=68 RevEngd req sz=68 Actual=70
ptr=332768 i=69 _msize=69 RevEngd req sz=69 Actual=78
ptr=332768 i=6A _msize=6A RevEngd req sz=6A Actual=78
ptr=332768 i=6B _msize=6B RevEngd req sz=6B Actual=78
ptr=332768 i=6C _msize=6C RevEngd req sz=6C Actual=78
ptr=332768 i=6D _msize=6D RevEngd req sz=6D Actual=78
ptr=332768 i=6E _msize=6E RevEngd req sz=6E Actual=78
ptr=332768 i=6F _msize=6F RevEngd req sz=6F Actual=78
ptr=332768 i=70 _msize=70 RevEngd req sz=70 Actual=78
ptr=3327E0 i=71 _msize=71 RevEngd req sz=71 Actual=80
ptr=3327E0 i=72 _msize=72 RevEngd req sz=72 Actual=80
ptr=3327E0 i=73 _msize=73 RevEngd req sz=73 Actual=80
ptr=3327E0 i=74 _msize=74 RevEngd req sz=74 Actual=80
ptr=3327E0 i=75 _msize=75 RevEngd req sz=75 Actual=80
ptr=3327E0 i=76 _msize=76 RevEngd req sz=76 Actual=80
ptr=3327E0 i=77 _msize=77 RevEngd req sz=77 Actual=80
ptr=3327E0 i=78 _msize=78 RevEngd req sz=78 Actual=80
ptr=332860 i=79 _msize=79 RevEngd req sz=79 Actual=88
ptr=332860 i=7A _msize=7A RevEngd req sz=7A Actual=88
ptr=332860 i=7B _msize=7B RevEngd req sz=7B Actual=88
ptr=332860 i=7C _msize=7C RevEngd req sz=7C Actual=88
ptr=332860 i=7D _msize=7D RevEngd req sz=7D Actual=88
ptr=332860 i=7E _msize=7E RevEngd req sz=7E Actual=88
ptr=332860 i=7F _msize=7F RevEngd req sz=7F Actual=88
ptr=332860 i=80 _msize=80 RevEngd req sz=80 Actual=88
ptr=3328E8 i=81 _msize=81 RevEngd req sz=81 Actual=90
ptr=3328E8 i=82 _msize=82 RevEngd req sz=82 Actual=90
ptr=3328E8 i=83 _msize=83 RevEngd req sz=83 Actual=90
ptr=3328E8 i=84 _msize=84 RevEngd req sz=84 Actual=90
ptr=3328E8 i=85 _msize=85 RevEngd req sz=85 Actual=90
ptr=3328E8 i=86 _msize=86 RevEngd req sz=86 Actual=90
ptr=3328E8 i=87 _msize=87 RevEngd req sz=87 Actual=90
ptr=3328E8 i=88 _msize=88 RevEngd req sz=88 Actual=90
ptr=332978 i=89 _msize=89 RevEngd req sz=89 Actual=98
ptr=332978 i=8A _msize=8A RevEngd req sz=8A Actual=98
ptr=332978 i=8B _msize=8B RevEngd req sz=8B Actual=98
ptr=332978 i=8C _msize=8C RevEngd req sz=8C Actual=98
ptr=332978 i=8D _msize=8D RevEngd req sz=8D Actual=98
ptr=332978 i=8E _msize=8E RevEngd req sz=8E Actual=98
ptr=332978 i=8F _msize=8F RevEngd req sz=8F Actual=98
ptr=332978 i=90 _msize=90 RevEngd req sz=90 Actual=98
ptr=334DB8 i=91 _msize=91 RevEngd req sz=91 Actual=A0
ptr=334DB8 i=92 _msize=92 RevEngd req sz=92 Actual=A0
ptr=334DB8 i=93 _msize=93 RevEngd req sz=93 Actual=A0
ptr=334DB8 i=94 _msize=94 RevEngd req sz=94 Actual=A0
ptr=334DB8 i=95 _msize=95 RevEngd req sz=95 Actual=A0
ptr=334DB8 i=96 _msize=96 RevEngd req sz=96 Actual=A0
ptr=334DB8 i=97 _msize=97 RevEngd req sz=97 Actual=A0
ptr=334DB8 i=98 _msize=98 RevEngd req sz=98 Actual=A0
ptr=334E58 i=99 _msize=99 RevEngd req sz=99 Actual=A8
ptr=334E58 i=9A _msize=9A RevEngd req sz=9A Actual=A8
ptr=334E58 i=9B _msize=9B RevEngd req sz=9B Actual=A8
ptr=334E58 i=9C _msize=9C RevEngd req sz=9C Actual=A8
ptr=334E58 i=9D _msize=9D RevEngd req sz=9D Actual=A8
ptr=334E58 i=9E _msize=9E RevEngd req sz=9E Actual=A8
ptr=334E58 i=9F _msize=9F RevEngd req sz=9F Actual=A8
ptr=334E58 i=A0 _msize=A0 RevEngd req sz=A0 Actual=A8
ptr=334F00 i=A1 _msize=A1 RevEngd req sz=A1 Actual=B0
ptr=334F00 i=A2 _msize=A2 RevEngd req sz=A2 Actual=B0
ptr=334F00 i=A3 _msize=A3 RevEngd req sz=A3 Actual=B0
ptr=334F00 i=A4 _msize=A4 RevEngd req sz=A4 Actual=B0
ptr=334F00 i=A5 _msize=A5 RevEngd req sz=A5 Actual=B0
ptr=334F00 i=A6 _msize=A6 RevEngd req sz=A6 Actual=B0
ptr=334F00 i=A7 _msize=A7 RevEngd req sz=A7 Actual=B0
ptr=334F00 i=A8 _msize=A8 RevEngd req sz=A8 Actual=B0
ptr=334FB0 i=A9 _msize=A9 RevEngd req sz=A9 Actual=B8
ptr=334FB0 i=AA _msize=AA RevEngd req sz=AA Actual=B8
ptr=334FB0 i=AB _msize=AB RevEngd req sz=AB Actual=B8
ptr=334FB0 i=AC _msize=AC RevEngd req sz=AC Actual=B8
ptr=334FB0 i=AD _msize=AD RevEngd req sz=AD Actual=B8
ptr=334FB0 i=AE _msize=AE RevEngd req sz=AE Actual=B8
ptr=334FB0 i=AF _msize=AF RevEngd req sz=AF Actual=B8
ptr=334FB0 i=B0 _msize=B0 RevEngd req sz=B0 Actual=B8
ptr=335068 i=B1 _msize=B1 RevEngd req sz=B1 Actual=C0
ptr=335068 i=B2 _msize=B2 RevEngd req sz=B2 Actual=C0
ptr=335068 i=B3 _msize=B3 RevEngd req sz=B3 Actual=C0
ptr=335068 i=B4 _msize=B4 RevEngd req sz=B4 Actual=C0
ptr=335068 i=B5 _msize=B5 RevEngd req sz=B5 Actual=C0
ptr=335068 i=B6 _msize=B6 RevEngd req sz=B6 Actual=C0
ptr=335068 i=B7 _msize=B7 RevEngd req sz=B7 Actual=C0
ptr=335068 i=B8 _msize=B8 RevEngd req sz=B8 Actual=C0
ptr=335128 i=B9 _msize=B9 RevEngd req sz=B9 Actual=C8
ptr=335128 i=BA _msize=BA RevEngd req sz=BA Actual=C8
ptr=335128 i=BB _msize=BB RevEngd req sz=BB Actual=C8
ptr=335128 i=BC _msize=BC RevEngd req sz=BC Actual=C8
ptr=335128 i=BD _msize=BD RevEngd req sz=BD Actual=C8
ptr=335128 i=BE _msize=BE RevEngd req sz=BE Actual=C8
ptr=335128 i=BF _msize=BF RevEngd req sz=BF Actual=C8
ptr=335128 i=C0 _msize=C0 RevEngd req sz=C0 Actual=C8
ptr=3351F0 i=C1 _msize=C1 RevEngd req sz=C1 Actual=D0
ptr=3351F0 i=C2 _msize=C2 RevEngd req sz=C2 Actual=D0
ptr=3351F0 i=C3 _msize=C3 RevEngd req sz=C3 Actual=D0
ptr=3351F0 i=C4 _msize=C4 RevEngd req sz=C4 Actual=D0
ptr=3351F0 i=C5 _msize=C5 RevEngd req sz=C5 Actual=D0
ptr=3351F0 i=C6 _msize=C6 RevEngd req sz=C6 Actual=D0
ptr=3351F0 i=C7 _msize=C7 RevEngd req sz=C7 Actual=D0
ptr=3351F0 i=C8 _msize=C8 RevEngd req sz=C8 Actual=D0
ptr=3352C0 i=C9 _msize=C9 RevEngd req sz=C9 Actual=D8
ptr=3352C0 i=CA _msize=CA RevEngd req sz=CA Actual=D8
ptr=3352C0 i=CB _msize=CB RevEngd req sz=CB Actual=D8
ptr=3352C0 i=CC _msize=CC RevEngd req sz=CC Actual=D8
ptr=3352C0 i=CD _msize=CD RevEngd req sz=CD Actual=D8
ptr=3352C0 i=CE _msize=CE RevEngd req sz=CE Actual=D8
ptr=3352C0 i=CF _msize=CF RevEngd req sz=CF Actual=D8
ptr=3352C0 i=D0 _msize=D0 RevEngd req sz=D0 Actual=D8
ptr=335398 i=D1 _msize=D1 RevEngd req sz=D1 Actual=E0
ptr=335398 i=D2 _msize=D2 RevEngd req sz=D2 Actual=E0
ptr=335398 i=D3 _msize=D3 RevEngd req sz=D3 Actual=E0
ptr=335398 i=D4 _msize=D4 RevEngd req sz=D4 Actual=E0
ptr=335398 i=D5 _msize=D5 RevEngd req sz=D5 Actual=E0
ptr=335398 i=D6 _msize=D6 RevEngd req sz=D6 Actual=E0
ptr=335398 i=D7 _msize=D7 RevEngd req sz=D7 Actual=E0
ptr=335398 i=D8 _msize=D8 RevEngd req sz=D8 Actual=E0
ptr=335478 i=D9 _msize=D9 RevEngd req sz=D9 Actual=E8
ptr=335478 i=DA _msize=DA RevEngd req sz=DA Actual=E8
ptr=335478 i=DB _msize=DB RevEngd req sz=DB Actual=E8
ptr=335478 i=DC _msize=DC RevEngd req sz=DC Actual=E8
ptr=335478 i=DD _msize=DD RevEngd req sz=DD Actual=E8
ptr=335478 i=DE _msize=DE RevEngd req sz=DE Actual=E8
ptr=335478 i=DF _msize=DF RevEngd req sz=DF Actual=E8
ptr=335478 i=E0 _msize=E0 RevEngd req sz=E0 Actual=E8
ptr=335560 i=E1 _msize=E1 RevEngd req sz=E1 Actual=F0
ptr=335560 i=E2 _msize=E2 RevEngd req sz=E2 Actual=F0
ptr=335560 i=E3 _msize=E3 RevEngd req sz=E3 Actual=F0
ptr=335560 i=E4 _msize=E4 RevEngd req sz=E4 Actual=F0
ptr=335560 i=E5 _msize=E5 RevEngd req sz=E5 Actual=F0
ptr=335560 i=E6 _msize=E6 RevEngd req sz=E6 Actual=F0
ptr=335560 i=E7 _msize=E7 RevEngd req sz=E7 Actual=F0
ptr=335560 i=E8 _msize=E8 RevEngd req sz=E8 Actual=F0
ptr=335650 i=E9 _msize=E9 RevEngd req sz=E9 Actual=F8
ptr=335650 i=EA _msize=EA RevEngd req sz=EA Actual=F8
ptr=335650 i=EB _msize=EB RevEngd req sz=EB Actual=F8
ptr=335650 i=EC _msize=EC RevEngd req sz=EC Actual=F8
ptr=335650 i=ED _msize=ED RevEngd req sz=ED Actual=F8
ptr=335650 i=EE _msize=EE RevEngd req sz=EE Actual=F8
ptr=335650 i=EF _msize=EF RevEngd req sz=EF Actual=F8
ptr=335650 i=F0 _msize=F0 RevEngd req sz=F0 Actual=F8
ptr=335748 i=F1 _msize=F1 RevEngd req sz=F1 Actual=100
ptr=335748 i=F2 _msize=F2 RevEngd req sz=F2 Actual=100
ptr=335748 i=F3 _msize=F3 RevEngd req sz=F3 Actual=100
ptr=335748 i=F4 _msize=F4 RevEngd req sz=F4 Actual=100
ptr=335748 i=F5 _msize=F5 RevEngd req sz=F5 Actual=100
ptr=335748 i=F6 _msize=F6 RevEngd req sz=F6 Actual=100
ptr=335748 i=F7 _msize=F7 RevEngd req sz=F7 Actual=100
ptr=335748 i=F8 _msize=F8 RevEngd req sz=F8 Actual=100
ptr=335848 i=F9 _msize=F9 RevEngd req sz=F9 Actual=108
ptr=335848 i=FA _msize=FA RevEngd req sz=FA Actual=108
ptr=335848 i=FB _msize=FB RevEngd req sz=FB Actual=108
ptr=335848 i=FC _msize=FC RevEngd req sz=FC Actual=108
ptr=335848 i=FD _msize=FD RevEngd req sz=FD Actual=108
ptr=335848 i=FE _msize=FE RevEngd req sz=FE Actual=108
ptr=335848 i=FF _msize=FF RevEngd req sz=FF Actual=108
ptr=335848 i=100 _msize=100 RevEngd req sz=100 Actual=108

@p5pRT
Copy link
Author

p5pRT commented Oct 8, 2012

From @cpansprout

On Wed Oct 03 12​:57​:42 2012, sprout wrote​:

I think I have abandoned that idea. Putting the refcount after the
trailing null sidesteps the whole issue.

But it causes problems in the regexp engine, which sets SvCUR to
0--something that has always been safe on COWs.

So instead I’m putting the refcount at the very end of the buffer.

So we have about three possible approaches​:

• Allocate an extra byte in sv_grow, to allow all strings to be
potential cows (calves :-). In rare circumstances this could increase
memory usage significantly (if there are many Math-Int128 objects).
• Only do COW when there is an extra byte available. This would be
unfortunate if a string is exactly 2GB minus 1 byte long.
• Upgrade to PVIV and fall back to the other COW scheme (storing a
pointer in the IV slot) if the string is sufficiently long that the
savings outweigh the bookkeeping costs. Some simple benchmarks would
allow us to determine the threshold. We might want different thresholds
depending on whether an upgrade is needed. Or we could store the poiner
in SvLEN.

The first is by far the simplest. The second would definitely be an
improvement over the current state of things. The third would be better
than the second but take longer to implement.

My order of preference is 1, 3, 2, but bearing in mind that implementing
3 means implementing 2 first and then building on top of that.

Actually, what I am going to do is modify calls to sv_grow throughout
the perl source code to account for the extra byte. That way we can
avoid it in those cases where we know we will not use it.

So I think that answers my question about what read-only
means. (We can’t COW with existing read-only variables that are not
already cows.)

We cannot COW with any read-only scalars at all. Existing well-behaved
code does SvREADONLY && !SvIsCOW, which would break if we had read-only
COWs.

(Unless we define SvIsCOW differently for XS.)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 8, 2012

From @cpansprout

I still have a few problems to iron out, but here come the stupid
benchmarks​:

To do COW with a refcount stored in the string buffer, we have to check
that SvLEN is at least two bytes greater than SvCUR and than the
reference count has not reached 255. That overhead slows down COW.

perl.git is the COW version.
perl.git-copy is the non-COW version.

Short string​:

Pint​:perl.git sprout$ time ./miniperl -e '$x = "hello"; $y = $x for
1..10000000'

real 0m3.137s
user 0m3.128s
sys 0m0.006s
Pint​:perl.git-copy sprout$ time ./miniperl -e '$x = "hello"; $y = $x
for 1..10000000'

real 0m2.597s
user 0m2.587s
sys 0m0.006s

No cow is clearly the winner.

Long string​:

Pint​:perl.git sprout$ time ./miniperl -e '$x = "hello"x1000; $y = $x
for 1..10000000'

real 0m3.139s
user 0m3.128s
sys 0m0.008s
Pint​:perl.git-copy sprout$ time ./miniperl -e '$x = "hello"x1000; $y =
$x for 1..10000000'

real 0m7.533s
user 0m7.520s
sys 0m0.009s

For cow, the speed is the same. For no cow, things are clearly slower.

Harness is hanging for some reason, but minitest works so far. ‘make
test’ has a few failures.

You can play with it if you want. It’s on the sprout/ookow branch
(SvOOK-style COW [except it does not involve SvOOK any more]).

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 8, 2012

From @cpansprout

On Mon Oct 08 00​:37​:16 2012, sprout wrote​:

You can play with it if you want. It’s on the sprout/ookow branch
(SvOOK-style COW [except it does not involve SvOOK any more]).

I forgot to mention it requires -Accflags=-DPERL_NEW_COPY_ON_WRITE.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 16, 2012

From @cpansprout

On Mon Oct 08 08​:40​:33 2012, sprout wrote​:

On Mon Oct 08 00​:37​:16 2012, sprout wrote​:

You can play with it if you want. It’s on the sprout/ookow branch
(SvOOK-style COW [except it does not involve SvOOK any more]).

I forgot to mention it requires -Accflags=-DPERL_NEW_COPY_ON_WRITE.

And now I have all core tests passing, except Peek.t, which will need
tweaking when I have finished.

I still need to do some benchmarks to find a threshold where
copy-on-write speeds things up (and then only do COW above that
threshold). We may need two thresholds depending on whether the
destination string has a buffer already, and whether that buffer is big
enough. (Copying may be faster than freeing a buffer.)

I also need to allocate that extra byte for strings above the COW
threshold, to store the buffer’s reference count.

And then I need to see whether eliminating swipe makes any speed difference.

And then I need to benchmark it against some real code. (Test.Simple’s
test suite run with WWW​::Scripter might work well. [See
<https://rt.cpan.org/Ticket/Display.html?id=50462#txn-681840>.
Downloading it and using file​:/// URLs will eliminate any network
latency.] Although a test suite is not typical code, as it tries to
exercise every code path, Test.Simple’s tests are testing all of
Test.Simple’s code paths, not all of JE’s. So JE may be a good buffer
that translates atypical JS code into typical Perl code.)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 16, 2012

From [Unknown Contact. See original ticket]

On Mon Oct 08 08​:40​:33 2012, sprout wrote​:

On Mon Oct 08 00​:37​:16 2012, sprout wrote​:

You can play with it if you want. It’s on the sprout/ookow branch
(SvOOK-style COW [except it does not involve SvOOK any more]).

I forgot to mention it requires -Accflags=-DPERL_NEW_COPY_ON_WRITE.

And now I have all core tests passing, except Peek.t, which will need
tweaking when I have finished.

I still need to do some benchmarks to find a threshold where
copy-on-write speeds things up (and then only do COW above that
threshold). We may need two thresholds depending on whether the
destination string has a buffer already, and whether that buffer is big
enough. (Copying may be faster than freeing a buffer.)

I also need to allocate that extra byte for strings above the COW
threshold, to store the buffer’s reference count.

And then I need to see whether eliminating swipe makes any speed difference.

And then I need to benchmark it against some real code. (Test.Simple’s
test suite run with WWW​::Scripter might work well. [See
<https://rt.cpan.org/Ticket/Display.html?id=50462#txn-681840>.
Downloading it and using file​:/// URLs will eliminate any network
latency.] Although a test suite is not typical code, as it tries to
exercise every code path, Test.Simple’s tests are testing all of
Test.Simple’s code paths, not all of JE’s. So JE may be a good buffer
that translates atypical JS code into typical Perl code.)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 17, 2012

From @rurban

On Tue, Oct 2, 2012 at 11​:44 PM, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Tue Oct 02 13​:02​:26 2012, sprout wrote​:

On Tue Oct 02 12​:06​:42 2012, bulk88 wrote​:

On Mon Sep 17 21​:44​:53 2012, sprout wrote​:

The flag space is rather crowded. Maybe it’s about time we made pad
names their own type. Their being SVs makes SVs themselves more
complicated. A dedicated pad name type would also allow the name
to be
allocated as part of the same memory block, much like a hek.

There is the existing SVf_FAKE and SVf_READONLY for HEKs. Maybe I didn't
see it in this thread, but what is wrong with using shared HEKs for COW?

We already do when we have a HEK already. But taking an existing string
and creating a HEK out of it requires copying it. I tried an experiment
to allocate all strings as HEKs initially, and then upgrade them to
shared HEKs by adding them to the string table when they would have been
copied, but the shared string table overhead outweighs any savings.

What we are looking for is a cheap way to do COW with existing strings.

Since the proposal as I read is it to introduce 2 different "start of
buffer" formats for the offset system, is a flag needed? Why not just
use SVf_OOK and let the macros and functions figure out if it is a COW
or offset in the start of string data?

The new flag would apply also to SVs containing shared HEKs.

By using a new flag instead of SvREADONLY, we can make the dozen or so
modules on CPAN that do if (SvREADONLY(sv)) croak(...) start working
correctly.

But no, the new flag would not be strictly necessary; just convenient.

Here is another reason​:

COW currently does not work on read-only scalars. If we add a COW flag,
then copying a read-only string will entail turning on the COW flag and
increasing the reference count at SvEND+1.

But is it OK to modify the string buffer of a read-only scalar? What
does read-only mean? That the contents of the scalar must not be
modified, or that modifications should have no observable effects at the
Perl language level?

Both, though only the first property is yet used.

SvREADONLY data is currently only used for run-time checks,
but will be used in the future for compile-time checks and optimizations.

BTW​: Optimizing on data properties is not a good idea. We generally optimize
on OP properties only. So it's a mess. But only really needed for
special objects
(classes and ISA).

But is it OK to modify the string buffer of a read-only scalar.

No.

What does read-only mean?

It is a reserved data flag to be used for compile-time and run-time checks,
and for future compile-time optimizations, when we will have read-only
lexicals, classes (immutable) and @​ISA (final).

At those times we only implemented the simpliest readonly-ness in PHP fashion
by using global functions, without sigil (CONSTSUB), but did not come
to implement
the rest.
And it was really not needed with our simple OO system those days, which need
not to be optimized. Now with Moose, Catalyst and Dist​::Zilla and those beasts
read-only optimizations will bear fruit - in the > 200-2000% range, not 10%.

Is it OK to modify the hash key a read-only HV.

No

Is it OK to modify the hash value a read-only HV.

Yes. readonly-less is only shallow. A HV key points to a new SV,
which can/needs to be protected seperately.
Same for AV.

Is it OK to add or remove hash keys from a read-only HV or AV.

No

No answers yet to readonly function arguments and return declarations
as it is too early for you yet. I spec'd it in and gave more answers in
my typed/ro branch and pdd.
--
Reini Urban
http​://cpanel.net/ http​://www.perl-compiler.org/

@p5pRT
Copy link
Author

p5pRT commented Oct 23, 2012

From @cpansprout

On Tue Oct 16 10​:14​:42 2012, sprout wrote​:

On Mon Oct 08 08​:40​:33 2012, sprout wrote​:

On Mon Oct 08 00​:37​:16 2012, sprout wrote​:

You can play with it if you want. It’s on the sprout/ookow branch
(SvOOK-style COW [except it does not involve SvOOK any more]).

I forgot to mention it requires -Accflags=-DPERL_NEW_COPY_ON_WRITE.

And now I have all core tests passing, except Peek.t, which will need
tweaking when I have finished.

I still need to do some benchmarks to find a threshold where
copy-on-write speeds things up (and then only do COW above that
threshold). We may need two thresholds depending on whether the
destination string has a buffer already, and whether that buffer is big
enough. (Copying may be faster than freeing a buffer.)

You can now see this on the sprout/ookow2 branch. It still requires
-Accflags=-DPERL_NEW_COPY_ON_WRITE.

There are two #defines, SV_COW_THRESHOLD and SV_COWBUF_THRESHOLD. These
are minimum string lengths where COW kicks in. The latter is used if
the destination string already has a buffer large enough to hold the string.

For benchmarking I compiled without -DDEBUGGING. Earlier I had used it,
but it skewed the results because of extra PL_debug & DEBUG_C_FLAG
checks in the COW-handling code.

I tried those silly benchmarks with repeated variable assignment in a
loop, and found that sv_force_normal_flags is the real bottleneck when
it comes to copy-on-write. For these simple loops, the COW overhead
exceeds the cost of copying up to 209 bytes. Above that, COW is faster.
If the destination string is already large enough, then copying is so
fast that it beats the pants of COW up to 2131 bytes.

When I tried those numbers (209/2131) with mktables, it made absolutely
no difference to the speed, but 0/0 (always COW) showed a slight
speed-up. So I did repeated tests with different numbers and found
0/500 to be ideal.

I then tried David Wheeler’s Test.Simple JavaScript library with
WWW​::Scripter. Its test suite is written in JavaScript and runs in a
browser (like WWW​::Scripter :-). The optimum setting seemed to be
0/1250, which is just as fast as 0/500 for mktables. So that’s the
setting I currently have on my branch.

I set up those #defines with #ifndef conditions so that
platform-specific hints can override them. I’m sure the ideal settings
will be different for Windows, for instance.

For anyone who wants to try Test.Simple with blead (I think it makes a
good benchmark)​:

Get JS​::Test​::Simple off CPAN and then move the tests/ folder somewhere
convenient (I used /ts/).

Install Lexical​::Var into your blead installation with the patch from
<https://rt.cpan.org/Ticket/Display.html?id=80309#txn-1131526>.

Install WWW​::Scripter​::Plugin​::Ajax.

Then run this ‘one’-liner, with the right file​: URL​:

bleadperl -MTime​::HiRes=sleep -e '++$INC{"JE/Destroyer.pm"};use
WWW​::Scripter; agent_alias{$w=new WWW​::Scripter}"Mac Safari";
$w->use_plugin(Ajax); $w->get("file​:///ts/index.html ");
$w->wait_for_timers; print $w->content(format=>text)'

The %INC fiddling is a necessary hack. It basically has to do with
WWW​::Scripter​::Plugin​::JavaScript’s attempts to handle circular
references conflicting with Test.Simple’s expectations. Pretending to
load JE​::Destroyer disables that circularity handling.

I also need to allocate that extra byte for strings above the COW
threshold, to store the buffer’s reference count.

Not done yet, but that is next on my list.

And then I need to see whether eliminating swipe makes any speed
difference.

Yes. It makes things slower.

(And, smueller, did you see my message about smoking sprout/ookow? If
you have not started, could you use sprout/ookow2 instead? Thank you.)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 23, 2012

From chromatic@wgz.org

On Tuesday, October 23, 2012 01​:01​:56 PM Father Chrysostomos via RT wrote​:

I tried those silly benchmarks with repeated variable assignment in a
loop, and found that sv_force_normal_flags is the real bottleneck when
it comes to copy-on-write.

FWIW, that's been the bottleneck in most of the microbenchmarks I've profiled
too. No obvious local improvements have ever jumped out at me.

-- c

@p5pRT
Copy link
Author

p5pRT commented Oct 23, 2012

From @cpansprout

On Tue Oct 23 13​:01​:56 2012, sprout wrote​:

There are two

*new*

#defines, SV_COW_THRESHOLD and SV_COWBUF_THRESHOLD. These
are minimum string lengths where COW kicks in. The latter is used if
the destination string already has a buffer large enough to hold the
string.

For benchmarking I compiled without -DDEBUGGING. Earlier I had used it,
but it skewed the results because of extra PL_debug & DEBUG_C_FLAG
checks in the COW-handling code.

I tried those silly benchmarks with repeated variable assignment in a
loop, and found that sv_force_normal_flags is the real bottleneck when
it comes to copy-on-write. For these simple loops, the COW overhead
exceeds the cost of copying up to 209 bytes. Above that, COW is faster.
If the destination string is already large enough, then copying is so
fast that it beats the pants of COW up to 2131 bytes.

s/of\K/f/

When I tried those numbers (209/2131) with mktables, it made absolutely
no difference to the speed, but 0/0 (always COW) showed a slight
speed-up. So I did repeated tests with different numbers and found
0/500 to be ideal.

I then tried David Wheeler’s Test.Simple JavaScript library with
WWW​::Scripter. Its test suite is written in JavaScript and runs in a
browser (like WWW​::Scripter :-). The optimum setting seemed to be
0/1250, which is just as fast as 0/500 for mktables. So that’s the
setting I currently have on my branch.

I forgot to mention that the speed increase I saw with mktables was
1.8%, and with Test.Simple 3%.

Those might not be significant, but programs that deal with large
strings should see a huge speed-up.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 23, 2012

From @tsee

On 10/23/2012 10​:01 PM, Father Chrysostomos via RT wrote​:

(And, smueller, did you see my message about smoking sprout/ookow? If
you have not started, could you use sprout/ookow2 instead? Thank you.)

Umm, too late. I think you missed my mail about starting the smoke. It
actually appears to be done​:

http​://users.perl5.git.perl.org/~tsee/sprout-ookow/

(Progress​: http​://users.perl5.git.perl.org/~tsee/progress.html)

Should I kick off sprout/ookow2 nonetheless?

--Steffen

@p5pRT
Copy link
Author

p5pRT commented Oct 23, 2012

From @cpansprout

On Tue Oct 23 13​:08​:11 2012, smueller@​cpan.org wrote​:

On 10/23/2012 10​:01 PM, Father Chrysostomos via RT wrote​:

(And, smueller, did you see my message about smoking sprout/ookow? If
you have not started, could you use sprout/ookow2 instead? Thank you.)

Umm, too late. I think you missed my mail about starting the smoke. It
actually appears to be done​:

http​://users.perl5.git.perl.org/~tsee/sprout-ookow/

(Progress​: http​://users.perl5.git.perl.org/~tsee/progress.html)

Thank you.

But, I think you missed my follow-up message. In which case it is my
fault for not stating up front that -Accflags=-DPERL_NEW_COPY_ON_WRITE
is needed. So I apologise for making you do useless work.

(Or am I misreading the reports?)

Should I kick off sprout/ookow2 nonetheless?

On second thought, sprout/ookow (without the 2) is a better robustness
test for XS modules, as it turns on COW full throttle.

Would you mind doing another sprout/ookow smoke (with
-Accflags=-DPERL_NEW_COPY_ON_WRITE this time)? Thank you.

BTW, perl -V output should show PERL_NEW_COPY_ON_WRITE under
‘Characteristics of this binary’.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 23, 2012

From @cpansprout

And in case the changed subject confused anyone (I wish RT could keep
the subject when I click Reply), here it is again with the original subject.

On Tue Oct 23 13​:53​:19 2012, sprout wrote​:

On Tue Oct 23 13​:08​:11 2012, smueller@​cpan.org wrote​:

On 10/23/2012 10​:01 PM, Father Chrysostomos via RT wrote​:

(And, smueller, did you see my message about smoking sprout/ookow? If
you have not started, could you use sprout/ookow2 instead? Thank
you.)

Umm, too late. I think you missed my mail about starting the smoke. It
actually appears to be done​:

http​://users.perl5.git.perl.org/~tsee/sprout-ookow/

(Progress​: http​://users.perl5.git.perl.org/~tsee/progress.html)

Thank you.

But, I think you missed my follow-up message. In which case it is my
fault for not stating up front that -Accflags=-DPERL_NEW_COPY_ON_WRITE
is needed. So I apologise for making you do useless work.

(Or am I misreading the reports?)

Should I kick off sprout/ookow2 nonetheless?

On second thought, sprout/ookow (without the 2) is a better robustness
test for XS modules, as it turns on COW full throttle.

Would you mind doing another sprout/ookow smoke (with
-Accflags=-DPERL_NEW_COPY_ON_WRITE this time)? Thank you.

BTW, perl -V output should show PERL_NEW_COPY_ON_WRITE under
‘Characteristics of this binary’.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 23, 2012

From @cpansprout

On Tue Oct 23 13​:01​:56 2012, sprout wrote​:

On Tue Oct 16 10​:14​:42 2012, sprout wrote​:

I also need to allocate that extra byte for strings above the COW
threshold, to store the buffer’s reference count.

Not done yet, but that is next on my list.

It is giving me a bizarre result. Just making the SvGROW macro use
len+1 instead of len gives me a reproducible 1% slowdown.

That’s probably happening because I added extra code to a macro used in
hot code paths.

So a ‘cow when we can’ approach, without making ‘we can’ occur more
often, seems the only choice.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 23, 2012

From [Unknown Contact. See original ticket]

On Tue Oct 23 13​:01​:56 2012, sprout wrote​:

On Tue Oct 16 10​:14​:42 2012, sprout wrote​:

I also need to allocate that extra byte for strings above the COW
threshold, to store the buffer’s reference count.

Not done yet, but that is next on my list.

It is giving me a bizarre result. Just making the SvGROW macro use
len+1 instead of len gives me a reproducible 1% slowdown.

That’s probably happening because I added extra code to a macro used in
hot code paths.

So a ‘cow when we can’ approach, without making ‘we can’ occur more
often, seems the only choice.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 24, 2012

From @tsee

On 10/23/2012 10​:54 PM, Father Chrysostomos via RT wrote​:

On Tue Oct 23 13​:53​:19 2012, sprout wrote​:

But, I think you missed my follow-up message. In which case it is my
fault for not stating up front that -Accflags=-DPERL_NEW_COPY_ON_WRITE
is needed. So I apologise for making you do useless work.

(Or am I misreading the reports?)

Should I kick off sprout/ookow2 nonetheless?

On second thought, sprout/ookow (without the 2) is a better robustness
test for XS modules, as it turns on COW full throttle.

Would you mind doing another sprout/ookow smoke (with
-Accflags=-DPERL_NEW_COPY_ON_WRITE this time)? Thank you.

BTW, perl -V output should show PERL_NEW_COPY_ON_WRITE under
‘Characteristics of this binary’.

Consider it done.

--Steffen

@p5pRT
Copy link
Author

p5pRT commented Oct 24, 2012

From @cpansprout

On Tue Oct 23 16​:49​:40 2012, sprout wrote​:

On Tue Oct 23 13​:01​:56 2012, sprout wrote​:

On Tue Oct 16 10​:14​:42 2012, sprout wrote​:

I also need to allocate that extra byte for strings above the COW
threshold, to store the buffer’s reference count.

Not done yet, but that is next on my list.

It is giving me a bizarre result. Just making the SvGROW macro use
len+1 instead of len gives me a reproducible 1% slowdown.

That’s probably happening because I added extra code to a macro used in
hot code paths.

So a ‘cow when we can’ approach, without making ‘we can’ occur more
often, seems the only choice.

My next step after that was going to be to enable COW for magic values.

COW is currently disabled even when assigning a shared hash key scalar
to a variable if the destination has any of these flags set other than
SVp_POK|SVp_POK​:

#define CAN_COW_MASK (SVs_OBJECT|SVs_GMG|SVs_SMG|SVs_RMG|SVf_IOK|SVf_NOK| \
  SVf_POK|SVf_ROK|SVp_IOK|SVp_NOK|SVp_POK|SVf_FAKE| \
  SVf_OOK|SVf_BREAK|SVf_READONLY)

I cannot see how blessedness or magic can affect COWs. Most of the
other flags are never set on the destination in that code path.

This punting on COW for shared hash key scalars was introduced by this
commit​:

commit b8f9541
Author​: Nicholas Clark <nick@​ccl4.org>
Date​: Tue Jun 7 14​:57​:35 2005 +0000

  Ensure string table counts are balanced. (Was not true in op/pack.t)
 
  p4raw-id​: //depot/perl@​24732

One commit before that, I see no unbalanced string table warnings with
pack.t (with PERL_DESTRUCT_LEVEL=2 `pwd`/perl -Ilib t/op/pack.t).

I imagine SVt_BREAK must have been the cause, but it doesn’t happen on
my machine. If SVt_BREAK is set on the destination, then it has been
freed forcibly during global destruction, when there might be references
to it still. (I don’t believe that can actually happen, though.)
Assignment thereto will cause the string table’s reference count to go
up, but not down, resulting in an unbalanced warning.

If I hear no objection soon, I will change it simply to an SVf_BREAK test.

(And if Nicholas Clark sees a problem, he still has several months to
respond before the next stable release.)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 24, 2012

From [Unknown Contact. See original ticket]

On Tue Oct 23 16​:49​:40 2012, sprout wrote​:

On Tue Oct 23 13​:01​:56 2012, sprout wrote​:

On Tue Oct 16 10​:14​:42 2012, sprout wrote​:

I also need to allocate that extra byte for strings above the COW
threshold, to store the buffer’s reference count.

Not done yet, but that is next on my list.

It is giving me a bizarre result. Just making the SvGROW macro use
len+1 instead of len gives me a reproducible 1% slowdown.

That’s probably happening because I added extra code to a macro used in
hot code paths.

So a ‘cow when we can’ approach, without making ‘we can’ occur more
often, seems the only choice.

My next step after that was going to be to enable COW for magic values.

COW is currently disabled even when assigning a shared hash key scalar
to a variable if the destination has any of these flags set other than
SVp_POK|SVp_POK​:

#define CAN_COW_MASK (SVs_OBJECT|SVs_GMG|SVs_SMG|SVs_RMG|SVf_IOK|SVf_NOK| \
  SVf_POK|SVf_ROK|SVp_IOK|SVp_NOK|SVp_POK|SVf_FAKE| \
  SVf_OOK|SVf_BREAK|SVf_READONLY)

I cannot see how blessedness or magic can affect COWs. Most of the
other flags are never set on the destination in that code path.

This punting on COW for shared hash key scalars was introduced by this
commit​:

commit b8f9541
Author​: Nicholas Clark <nick@​ccl4.org>
Date​: Tue Jun 7 14​:57​:35 2005 +0000

  Ensure string table counts are balanced. (Was not true in op/pack.t)
 
  p4raw-id​: //depot/perl@​24732

One commit before that, I see no unbalanced string table warnings with
pack.t (with PERL_DESTRUCT_LEVEL=2 `pwd`/perl -Ilib t/op/pack.t).

I imagine SVt_BREAK must have been the cause, but it doesn’t happen on
my machine. If SVt_BREAK is set on the destination, then it has been
freed forcibly during global destruction, when there might be references
to it still. (I don’t believe that can actually happen, though.)
Assignment thereto will cause the string table’s reference count to go
up, but not down, resulting in an unbalanced warning.

If I hear no objection soon, I will change it simply to an SVf_BREAK test.

(And if Nicholas Clark sees a problem, he still has several months to
respond before the next stable release.)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Oct 27, 2012

From @cpansprout

On Thu Oct 04 05​:17​:48 2012, davem wrote​:

On Mon, Oct 01, 2012 at 11​:10​:17PM -0700, Father Chrysostomos via RT
wrote​:

We could use Dave Mitchell’s free bit for now, which in
<20111007191900.GF3046@​iabyn.com> he asked us not to squander.

I'm happy for this bit to be used for this purpose.

I have a commit on the sprout/ookow2 branch that adds and uses the
SVf_IsCOW flag. Does anyone have any objections to my merging just that
commit soon? The lack of a separate flag is actually getting in the way
of some work to handle existing shared hash key scalars more efficiently.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Nov 8, 2012

From @cpansprout

On Tue Oct 23 22​:44​:44 2012, smueller@​cpan.org wrote​:

On 10/23/2012 10​:54 PM, Father Chrysostomos via RT wrote​:

On Tue Oct 23 13​:53​:19 2012, sprout wrote​:

But, I think you missed my follow-up message. In which case it is my
fault for not stating up front that -Accflags=-DPERL_NEW_COPY_ON_WRITE
is needed. So I apologise for making you do useless work.

(Or am I misreading the reports?)

Should I kick off sprout/ookow2 nonetheless?

On second thought, sprout/ookow (without the 2) is a better robustness
test for XS modules, as it turns on COW full throttle.

Would you mind doing another sprout/ookow smoke (with
-Accflags=-DPERL_NEW_COPY_ON_WRITE this time)? Thank you.

BTW, perl -V output should show PERL_NEW_COPY_ON_WRITE under
‘Characteristics of this binary’.

Consider it done.

Apologies for taking so long to respond to say thank you. This is
extremely helpful. I saw your message saying it was complete, but I
cannot find it now.

I see there are only 29 failures, some of which are due to an Encode bug
(patch submitted), so I am quite optimistic about it.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Nov 8, 2012

From [Unknown Contact. See original ticket]

On Tue Oct 23 22​:44​:44 2012, smueller@​cpan.org wrote​:

On 10/23/2012 10​:54 PM, Father Chrysostomos via RT wrote​:

On Tue Oct 23 13​:53​:19 2012, sprout wrote​:

But, I think you missed my follow-up message. In which case it is my
fault for not stating up front that -Accflags=-DPERL_NEW_COPY_ON_WRITE
is needed. So I apologise for making you do useless work.

(Or am I misreading the reports?)

Should I kick off sprout/ookow2 nonetheless?

On second thought, sprout/ookow (without the 2) is a better robustness
test for XS modules, as it turns on COW full throttle.

Would you mind doing another sprout/ookow smoke (with
-Accflags=-DPERL_NEW_COPY_ON_WRITE this time)? Thank you.

BTW, perl -V output should show PERL_NEW_COPY_ON_WRITE under
‘Characteristics of this binary’.

Consider it done.

Apologies for taking so long to respond to say thank you. This is
extremely helpful. I saw your message saying it was complete, but I
cannot find it now.

I see there are only 29 failures, some of which are due to an Encode bug
(patch submitted), so I am quite optimistic about it.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Nov 15, 2012

From @cpansprout

On Fri Oct 26 18​:08​:02 2012, sprout wrote​:

On Thu Oct 04 05​:17​:48 2012, davem wrote​:

On Mon, Oct 01, 2012 at 11​:10​:17PM -0700, Father Chrysostomos via RT
wrote​:

We could use Dave Mitchell’s free bit for now, which in
<20111007191900.GF3046@​iabyn.com> he asked us not to squander.

I'm happy for this bit to be used for this purpose.

I have a commit on the sprout/ookow2 branch that adds and uses the
SVf_IsCOW flag. Does anyone have any objections to my merging just that
commit soon? The lack of a separate flag is actually getting in the way
of some work to handle existing shared hash key scalars more efficiently.

I’ve merged my patch as e3918bb, but the commit message has a mistake
in it. It wrongly states that Peek.t is failing.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Nov 15, 2012

From [Unknown Contact. See original ticket]

On Fri Oct 26 18​:08​:02 2012, sprout wrote​:

On Thu Oct 04 05​:17​:48 2012, davem wrote​:

On Mon, Oct 01, 2012 at 11​:10​:17PM -0700, Father Chrysostomos via RT
wrote​:

We could use Dave Mitchell’s free bit for now, which in
<20111007191900.GF3046@​iabyn.com> he asked us not to squander.

I'm happy for this bit to be used for this purpose.

I have a commit on the sprout/ookow2 branch that adds and uses the
SVf_IsCOW flag. Does anyone have any objections to my merging just that
commit soon? The lack of a separate flag is actually getting in the way
of some work to handle existing shared hash key scalars more efficiently.

I’ve merged my patch as e3918bb, but the commit message has a mistake
in it. It wrongly states that Peek.t is failing.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Nov 27, 2012

From @cpansprout

On Mon Sep 10 08​:29​:17 2012, davem wrote​:

On Sun, Sep 09, 2012 at 07​:31​:26PM -0700, Andreas J. Koenig via RT
wrote​:

----------
commit 6502e08
Author​: David Mitchell <davem@​iabyn.com>
Date​: Thu Jul 26 16​:04​:09 2012 +0100

Don't copy all of the match string buffer

[snip]

Can't call method "parent" on an undefined value at t/031-
structured-match.t line 26.
t/031-structured-match.t .......
Dubious, test returned 255 (wstat 65280, 0xff00)

This is due to this line in Path/Dispatcher/Rule/Regex.pm​:

$extra\{leftover\} = eval q\{$'\};

which expects to be able to execute a regex where $' hasn't been seen
yet,
then afterwards to be able to retrieve its value. This used to happen
to
work, but can't be relied upon (and no longer works).

It does work as of 1a904fc.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Nov 27, 2012

@cpansprout - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Nov 27, 2012
@p5pRT
Copy link
Author

p5pRT commented Nov 27, 2012

From @cpansprout

On Thu Oct 04 16​:33​:10 2012, bulk88 wrote​:

I ported your code to Windows, sort of. _msize and HeapSize perfectly
knew the original allocation size given to malloc, they were useless
for
what you wanted to see, so I figured out what the Windows Heap manager
internally says the allocation is. BTW threaded Win32 Perl puts its
own
headers on allocations too, see

http​://perl5.git.perl.org/perl.git/blob/6aa4fbb50135d4863535200030aadfeb8c583c6e​:/win32/vmem.h#l207

. The HEAP_ENTRY struct in Windbg was completely different from
reality.
Variable ActualSize I believe includes the 8 byte Heap header in its
count at all times. The code will not work on anything but my machine
I
think. The warnings about what it was run on are in the source. I hope
this information useful somehow.

It’s interesting to see that it allocates memory in increments of 8
bytes. So I imagine that copy-on-write speeds things up slightly less
on Windows than it does on Darwin.

I wonder, would Windows benefit from different COW threshold settings
than the ones I have hardcoded in sv.c?

I have the #defines set up such that defines on the command line will
override them.

If someone with access to Windows could try different values for
-DSV_COWBUF_THRESHOLD=... and time a real-world program that takes a
while to run (or mktables), that would be helpful. And then with
SV_COWBUF_THRESHOLD set to its optimum value, try compiling with
-DSV_COW_THRESHOLD=... set to different values.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Nov 27, 2012

From [Unknown Contact. See original ticket]

On Thu Oct 04 16​:33​:10 2012, bulk88 wrote​:

I ported your code to Windows, sort of. _msize and HeapSize perfectly
knew the original allocation size given to malloc, they were useless
for
what you wanted to see, so I figured out what the Windows Heap manager
internally says the allocation is. BTW threaded Win32 Perl puts its
own
headers on allocations too, see

http​://perl5.git.perl.org/perl.git/blob/6aa4fbb50135d4863535200030aadfeb8c583c6e​:/win32/vmem.h#l207

. The HEAP_ENTRY struct in Windbg was completely different from
reality.
Variable ActualSize I believe includes the 8 byte Heap header in its
count at all times. The code will not work on anything but my machine
I
think. The warnings about what it was run on are in the source. I hope
this information useful somehow.

It’s interesting to see that it allocates memory in increments of 8
bytes. So I imagine that copy-on-write speeds things up slightly less
on Windows than it does on Darwin.

I wonder, would Windows benefit from different COW threshold settings
than the ones I have hardcoded in sv.c?

I have the #defines set up such that defines on the command line will
override them.

If someone with access to Windows could try different values for
-DSV_COWBUF_THRESHOLD=... and time a real-world program that takes a
while to run (or mktables), that would be helpful. And then with
SV_COWBUF_THRESHOLD set to its optimum value, try compiling with
-DSV_COW_THRESHOLD=... set to different values.

--

Father Chrysostomos

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant