Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

document/publicize THINKFIRST #12813

Closed
p5pRT opened this issue Feb 25, 2013 · 33 comments
Closed

document/publicize THINKFIRST #12813

p5pRT opened this issue Feb 25, 2013 · 33 comments

Comments

@p5pRT
Copy link

p5pRT commented Feb 25, 2013

Migrated from rt.perl.org#116925 (status was 'resolved')

Searchable as RT116925$

@p5pRT
Copy link
Author

p5pRT commented Feb 25, 2013

From @rjbs

A number of things broken by the new COW work were answered with, "By the way,
you need to use THINKFIRST there." This, in turn, was followed by, "What?
That's undocumented and how was I supposed to know that?" The third line in
this exchange, if any, was "Welcome to the perl guts!"

It's true enough, but if it's going to be needed, let's get this documented so
we can point people at the documentation when the code breaks.

--
rjbs

@p5pRT
Copy link
Author

p5pRT commented Feb 25, 2013

From @doy

On Mon, Feb 25, 2013 at 08​:06​:11AM -0800, Ricardo SIGNES wrote​:

A number of things broken by the new COW work were answered with, "By the way,
you need to use THINKFIRST there." This, in turn, was followed by, "What?
That's undocumented and how was I supposed to know that?" The third line in
this exchange, if any, was "Welcome to the perl guts!"

It's true enough, but if it's going to be needed, let's get this documented so
we can point people at the documentation when the code breaks.

Also, the core typemap for char* (not const char*) needs to be updated
to use THINKFIRST, since apparently people expect it to produce a
writable string buffer.

-doy

@p5pRT
Copy link
Author

p5pRT commented Feb 25, 2013

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Feb 25, 2013

From @bulk88

On Mon Feb 25 08​:33​:29 2013, doy@​tozt.net wrote​:

Also, the core typemap for char* (not const char*) needs to be updated
to use THINKFIRST, since apparently people expect it to produce a
writable string buffer.

-doy

perlxs and perlxstypemap needs to be fixed. It uses "bool_t
rpcb_gettime(const char *host, time_t *timep);" as an example, but in
the XS code it uses "char *" everywhere. It needs to be written in
perlguts or perlxs to advise the reader to always use const char * and
SvPV_const() if possible for performance reasons. Also a new T-* type
needs to be created for non-de-cowed PVs. I randomly throw out T_CPV as
a suggestion. I am not sure what exactly needs to be added to perlguts.

As a separate issue, "Memory Allocation" in perlguts should mention
sv_usepvn_flags exists, and the pod for sv_usepvn_flags needs to say it
only takes "Newx" memory never any kind of other memory like "malloc"
memory.

This ticket also needs to be linked to #116569.

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Mar 21, 2013

From @Leont

On Mon, Feb 25, 2013 at 5​:06 PM, Ricardo SIGNES
<perlbug-followup@​perl.org> wrote​:

A number of things broken by the new COW work were answered with, "By the way,
you need to use THINKFIRST there." This, in turn, was followed by, "What?
That's undocumented and how was I supposed to know that?" The third line in
this exchange, if any, was "Welcome to the perl guts!"

It's true enough, but if it's going to be needed, let's get this documented so
we can point people at the documentation when the code breaks.

Father C already documented SV_THINKFIRST in a11eaec, though I think
he was too optimistic (it's not just needed on SvPVX, but on any SvPV*
write) and not marked as API. SV_CHECK_THINKFIRST and
SV_CHECK_THINKFIRST_COW_DROP still need to be documented though.

Leon

@p5pRT
Copy link
Author

p5pRT commented Mar 22, 2013

From @iabyn

On Thu, Mar 21, 2013 at 01​:15​:00PM +0100, Leon Timmermans wrote​:

On Mon, Feb 25, 2013 at 5​:06 PM, Ricardo SIGNES
<perlbug-followup@​perl.org> wrote​:

A number of things broken by the new COW work were answered with, "By the way,
you need to use THINKFIRST there." This, in turn, was followed by, "What?
That's undocumented and how was I supposed to know that?" The third line in
this exchange, if any, was "Welcome to the perl guts!"

It's true enough, but if it's going to be needed, let's get this documented so
we can point people at the documentation when the code breaks.

Father C already documented SV_THINKFIRST in a11eaec, though I think
he was too optimistic (it's not just needed on SvPVX, but on any SvPV*
write) and not marked as API. SV_CHECK_THINKFIRST and
SV_CHECK_THINKFIRST_COW_DROP still need to be documented though.

Also, Its not just adding API documentation. The xs*.pod and/or perlguts
docs need updating so people know they need to use these.

--
You never really learn to swear until you learn to drive.

@p5pRT
Copy link
Author

p5pRT commented Mar 26, 2013

From @rjbs

I have moved this from 5.18 to 5.20 blockers.

It sounds like this needs more than mere documentation, and we're not going to gain much by
documenting it now, encouraging its uptake, then changing how one must use it or its future
analogue.

--
rjbs

@p5pRT
Copy link
Author

p5pRT commented Mar 27, 2013

From @cpansprout

On Tue Mar 26 13​:55​:32 2013, rjbs wrote​:

I have moved this from 5.18 to 5.20 blockers.

It sounds like this needs more than mere documentation, and we're not
going to gain much by
documenting it now, encouraging its uptake, then changing how one must
use it or its future
analogue.

Also, if I’m not mistaken, sv_force_normal *is* already in the API, so
we don’t actually *need* to ‘expose’ SV_CHECK_THINKFIRST.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Mar 27, 2013

From @Leont

On Tue, Mar 26, 2013 at 9​:55 PM, Ricardo SIGNES via RT
<perlbug-followup@​perl.org> wrote​:

I have moved this from 5.18 to 5.20 blockers.

It sounds like this needs more than mere documentation, and we're not going to gain much by
documenting it now, encouraging its uptake, then changing how one must use it or its future
analogue.

But what to do about the typemap?

@p5pRT
Copy link
Author

p5pRT commented Mar 27, 2013

From @bulk88

On Tue Mar 26 17​:13​:22 2013, sprout wrote​:

Also, if I’m not mistaken, sv_force_normal *is* already in the API, so
we don’t actually *need* to ‘expose’ SV_CHECK_THINKFIRST.

And why unconditionally call a function that isn't needed?

______________________________________________________________________
#if PERL_BCDVERSION >= 0x5007001
# define PREP_SV_SET(sv) if(SvTHINKFIRST((sv)))
sv_force_normal_flags((sv), SV_COW_DROP_PV)
#else
# define PREP_SV_SET(sv) if(SvTHINKFIRST((sv))) sv_force_normal((sv))
#endif
______________________________________________________________________
later on
______________________________________________________________________
  PREP_SV_SET(TARG);
  SvOK_off(TARG);
______________________________________________________________________

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Mar 27, 2013

From @bulk88

On Tue Mar 26 20​:53​:12 2013, bulk88 wrote​:

On Tue Mar 26 17​:13​:22 2013, sprout wrote​:

Also, if I’m not mistaken, sv_force_normal *is* already in the API, so
we don’t actually *need* to ‘expose’ SV_CHECK_THINKFIRST.

And why unconditionally call a function that isn't needed?

______________________________________________________________________
#if PERL_BCDVERSION >= 0x5007001
# define PREP_SV_SET(sv) if(SvTHINKFIRST((sv)))
sv_force_normal_flags((sv), SV_COW_DROP_PV)
#else
# define PREP_SV_SET(sv) if(SvTHINKFIRST((sv))) sv_force_normal((sv))
#endif
______________________________________________________________________
later on
______________________________________________________________________
PREP_SV_SET(TARG);
SvOK_off(TARG);
______________________________________________________________________

  ^
Accidental post /|\
  |
And why unconditionally call a function that isn't needed?

This is what I figured out is how to make an existing SV undef-enough to
write directly into. SvUPGRADE will also be required but isn't listed here.

______________________________________________________________________
#if PERL_BCDVERSION >= 0x5007001
# define PREP_SV_SET(sv) if(SvTHINKFIRST((sv)))
sv_force_normal_flags((sv), SV_COW_DROP_PV)
#else
# define PREP_SV_SET(sv) if(SvTHINKFIRST((sv))) sv_force_normal((sv))
#endif
______________________________________________________________________
later on
______________________________________________________________________
PREP_SV_SET(sv);
SvOK_off(sv);
______________________________________________________________________

Some dont care about the contents of a COW SV and want to release it.
Others want to use the contents but also overwrite it. There are 2
different features. Also, SvOK_off may call sv_backoff(), (force_normal
wont) which is also confusion, since OOK is another form of trickery.

In a number of places SV_CHECK_THINKFIRST_COW_DROP is followed exactly
by, or often (not all branches), SvOK_off. SvOK_off,
SV_CHECK_THINKFIRST_COW_DROP, and SvUPGRADE are all flag test and maybe
func call macros. Maybe the 3 should be merged into 1 or 2 flag tests (2
meaning flags and sv type num), and 1 func with a flags bitfield (extend
force_normal to also do upgrades and mathom sv_upgrade(). All 3 macros
would call (made up name) sv_force_anywayuwant_flags under the hood with
a flags argument (1 char devoted to new requested type, 0x00 (SVt_NULL,
can't upgrade to that right?) or 0xff mean no upgrade happens, other 3
bytes control want to write to PV, want to replace, de-cow, cow drop,
de-offset, get/set magic calls during emptying of SV,
de-rv/glob/version, die on non-COW RO, etc (my list is too large to be
sane and some things are the same)).

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented May 6, 2013

From @iabyn

On Tue, Mar 26, 2013 at 05​:13​:22PM -0700, Father Chrysostomos via RT wrote​:

On Tue Mar 26 13​:55​:32 2013, rjbs wrote​:

I have moved this from 5.18 to 5.20 blockers.

It sounds like this needs more than mere documentation, and we're not
going to gain much by
documenting it now, encouraging its uptake, then changing how one must
use it or its future
analogue.

Also, if I’m not mistaken, sv_force_normal *is* already in the API, so
we don’t actually *need* to ‘expose’ SV_CHECK_THINKFIRST.

Also, since the thing that triggered this discussion originally was
"how to make it safe under COW to directly modify the SvPVX buffer", note
that SV_CHECK_THINKFIRST etc don't actually guarantee you a writeable
SvPVX; for that you need SvPV_force (which does the equivalent of
SV_CHECK_THINKFIRST, but then in addition converts a IV/NV/RV etc into an
equivalent string value).

Since SvPV_force is already API, and is already documented (since 2000 or
earlier) as "You want force if you are going to update the SvPVX
directly", I think that's enough to keep us going till 5.18.

See also my recent comments in ticket RT #116407 about improving the docs
for SvPV to make it clear you can't just trample on PVX, and need
SvPV_force instead.

--
Art is anything that has a label (especially if the label is "untitled 1")

@p5pRT
Copy link
Author

p5pRT commented Feb 5, 2014

From @tonycoz

On Tue Mar 26 13​:55​:32 2013, rjbs wrote​:

I have moved this from 5.18 to 5.20 blockers.

It sounds like this needs more than mere documentation, and we're not
going to gain much by
documenting it now, encouraging its uptake, then changing how one must
use it or its future
analogue.

Beating this dead horse which is a blocker for 5.20​:

What might need to be done​:

- neither perlxs nor perlxstut discuss modifying (or even
  reading) the PV part of a scalar directly, but since they're
  about *XS* I'm not sure they should

- L<perlguts/Working with SVs> should probably discuss how to
  work with the SV's buffer, including SvPV_force(),
  SvPVbyte_force(), SvPVutf8_force(), SvGROW(). I'm not sure if
  this belongs after the SvGROW discussion, or after the
  sv_cat*() discussion.

- do we need a typemap entry? I can think of two cases where
  you're writing back to the SV​:

  a) the caller provides a fixed length buffer, in which case
  you need your own typemap entry (or inline code) to
  verify/force the buffer size

  b) the caller provides a variable length buffer

  For case b) in most cases I think you'd also need the
  length (remember, it's writable, no-one does gets() type
  interfaces anymore, right?), which requires inline code.

  So the only case that doesn't need the length (and hence
  inline code or a custom typemap) is a gets() style interface,
  for which I'm inclined to let people shoot their own feet off.

  The only real way I could see doing it would be to provide our
  own custom typedefs​:

  typedef unsigned char WRITABLE_BYTE;
  typedef unsigned char WRITABLE_UTF8;
  typedef unsigned char WRITABLE_DONTCARE; /* for the wild ones */

  and create typemaps for WRITABLE_BYTE etc​:

  WRITEABLE_BYTE * T_WRITEABLE_BYTE
  WRITEABLE_UTF8 * T_WRITEABLE_UTF8
  WRITEABLE_DONTCARE * T_WRITEABLE_DONTCARE
  ...
  T_WRITEABLE_BYTE
  $var = SvPVbyte_force_nolen($arg); // no such macro

  T_WRITEABLE_UTF8
  $var = SvPVutf8_force_nolen($arg); // no such macro

  T_WRITEABLE_BYTE
  $var = SvPV_force_nolen($arg);

What doesn't need to be done​:

- SvPV() documents that you should use SvPV_force() if you want
  to write to the PV string. Possibly this could be made
  clearer (making the mention of SvPV_force() a new paragraph
  might do it)

- SvPV_force() documents that the PV string is writable after
  calling it.

- under Copy on Write, perlguts documents that you need
  sv_force_normal() or SvPV_force_nolen() to make the SV
  non-COW.

Hopefully that makes some sense.

Tony

@p5pRT
Copy link
Author

p5pRT commented Feb 5, 2014

From @bulk88

On Tue Feb 04 20​:24​:36 2014, tonyc wrote​:

On Tue Mar 26 13​:55​:32 2013, rjbs wrote​:

I have moved this from 5.18 to 5.20 blockers.

It sounds like this needs more than mere documentation, and we're not
going to gain much by
documenting it now, encouraging its uptake, then changing how one must
use it or its future
analogue.

Beating this dead horse which is a blocker for 5.20​:

What might need to be done​:

- neither perlxs nor perlxstut discuss modifying (or even
reading) the PV part of a scalar directly, but since they're
about *XS* I'm not sure they should

- L<perlguts/Working with SVs> should probably discuss how to
work with the SV's buffer, including SvPV_force(),
SvPVbyte_force(), SvPVutf8_force(), SvGROW(). I'm not sure if
this belongs after the SvGROW discussion, or after the
sv_cat*() discussion.

- do we need a typemap entry? I can think of two cases where
you're writing back to the SV​:

a) the caller provides a fixed length buffer, in which case
you need your own typemap entry (or inline code) to
verify/force the buffer size

b) the caller provides a variable length buffer

For case b) in most cases I think you'd also need the
length (remember, it's writable, no-one does gets() type
interfaces anymore, right?), which requires inline code.

So the only case that doesn't need the length (and hence
inline code or a custom typemap) is a gets() style interface,
for which I'm inclined to let people shoot their own feet off.

The only real way I could see doing it would be to provide our
own custom typedefs​:

 typedef unsigned char WRITABLE\_BYTE;
 typedef unsigned char WRITABLE\_UTF8;
 typedef unsigned char WRITABLE\_DONTCARE; /\* for the wild ones \*/

and create typemaps for WRITABLE_BYTE etc​:

 WRITEABLE\_BYTE \*           T\_WRITEABLE\_BYTE
 WRITEABLE\_UTF8 \*           T\_WRITEABLE\_UTF8
 WRITEABLE\_DONTCARE \*       T\_WRITEABLE\_DONTCARE
 \.\.\.
 T\_WRITEABLE\_BYTE
    $var = SvPVbyte\_force\_nolen\($arg\); // no such macro

 T\_WRITEABLE\_UTF8
    $var = SvPVutf8\_force\_nolen\($arg\); // no such macro

 T\_WRITEABLE\_BYTE
    $var = SvPV\_force\_nolen\($arg\);

What doesn't need to be done​:

- SvPV() documents that you should use SvPV_force() if you want
to write to the PV string. Possibly this could be made
clearer (making the mention of SvPV_force() a new paragraph
might do it)

- SvPV_force() documents that the PV string is writable after
calling it.

- under Copy on Write, perlguts documents that you need
sv_force_normal() or SvPV_force_nolen() to make the SV
non-COW.

Hopefully that makes some sense.

Tony

The problem is SvPV_force_nolen and SvPVbyte_force_nolen do something totally useless if you just want to write into them. If I am going to just write, why do I want to A. copy the COW contents to a new non-COW block, B. convert encodings, then write over it? I want a uninitialized PV. sv_force_normal() copies the COW data over. Some people want R/W buffers to manipulate in place the data too.

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Feb 7, 2014

From perl5-porters@perl.org

Tony Cook wrote​:

Beating this dead horse which is a blocker for 5.20​:

What might need to be done​:

The Copy on Write section added to perlguts was meant to address this.
I think it is sufficient that this ticket no longer needs to be a
blocker, but I do not mean to imply that your suggestions should not
be followed,

@p5pRT
Copy link
Author

p5pRT commented Mar 26, 2014

From @tonycoz

On Wed Feb 05 01​:57​:24 2014, bulk88 wrote​:

The problem is SvPV_force_nolen and SvPVbyte_force_nolen do something
totally useless if you just want to write into them. If I am going to
just write, why do I want to A. copy the COW contents to a new non-COW
block, B. convert encodings, then write over it? I want a
uninitialized PV. sv_force_normal() copies the COW data over. Some
people want R/W buffers to manipulate in place the data too.

I think​:

  sv_force_normal_flags(sv, SV_COW_DROP_PV);
  SvUPGRADE(sv, SVt_PV);
  // SvGROW etc

is the answer there, though the name of the flag leaves something to desire,
since it doesn't just drop the PV for COW, but also for REGEXP and GLOB
arguments.

Tony

@p5pRT
Copy link
Author

p5pRT commented Mar 26, 2014

From @tonycoz

On Tue Feb 04 20​:24​:36 2014, tonyc wrote​:

- L<perlguts/Working with SVs> should probably discuss how to
work with the SV's buffer, including SvPV_force(),
SvPVbyte_force(), SvPVutf8_force(), SvGROW(). I'm not sure if
this belongs after the SvGROW discussion, or after the
sv_cat*() discussion.

Attached a potential patch.

Tony

@p5pRT
Copy link
Author

p5pRT commented Mar 26, 2014

From @tonycoz

0001-perl-116925-discuss-modifying-an-SV-s-buffer.patch
From 25598ee4bac31536cab2363f4bc3c70c716f1d8e Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Wed, 26 Mar 2014 16:18:08 +1100
Subject: [perl #116925] discuss modifying an SV's buffer

in the same place we otherwise talk about SVs
---
 pod/perlguts.pod |   32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index 3fb5137..db8548a 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -159,6 +159,38 @@ decrease, the allocated memory of an SV and that it does not automatically
 add space for the trailing NUL byte (perl's own string functions typically do
 C<SvGROW(sv, len + 1)>).
 
+If you want to write to an existing SV's buffer and set its value to a
+string, use SvPV_force() or one of its variants to force the SV to be
+a PV.  This will remove any of various types of non-stringness from
+the SV, including copy-on-write and being a reference.  This can be
+used to implement something like read():
+
+    (void)SvPVbyte_force(sv, len);
+    s = SvGROW(sv, offset + needlen + 1);
+    /* something that modifies up to needlen bytes at s, but modifies newlen
+         eg. newlen = read(fd, s + offset. needlen);
+       ignoring errors for these examples
+     */
+    s[offset + newlen] = '\0';
+    SvCUR_set(sv, offset + newlen);
+    SvUTF8_off(sv);
+    SvSETMAGIC(sv);
+
+If you don't need the existing content of the SV, you can avoid some
+copying with:
+
+    if (SvTHINKFIRST(sv))
+        sv_force_normal_flags(sv, SV_COW_DROP_PV);
+    SvUPGRADE(sv, SVt_PV);
+    s = SvGROW(sv, needlen + 1);
+    /* something that modifies up to needlen bytes at s, but modifies newlen
+         eg. newlen = read(fd, s. needlen);
+     */
+    s[newlen] = '\0';
+    SvCUR_set(sv, newlen);
+    SvPOK_only(sv); /* also clears SVf_UTF8 */
+    SvSETMAGIC(sv);
+
 If you have an SV and want to know what kind of data Perl thinks is stored
 in it, you can use the following macros to check the type of SV you have.
 
-- 
1.7.10.4

@p5pRT
Copy link
Author

p5pRT commented Mar 31, 2014

From @rjbs

I'd like to see this reviewed and (if approved) applied, but I've removed this ticket from blocking.

--
rjbs

@p5pRT
Copy link
Author

p5pRT commented Apr 12, 2014

From @bulk88

On Mon Feb 25 10​:38​:49 2013, bulk88 wrote​:

As a separate issue, "Memory Allocation" in perlguts should mention
sv_usepvn_flags exists, and the pod for sv_usepvn_flags needs to say it
only takes "Newx" memory never any kind of other memory like "malloc"
memory.

This pod issue is still a problem in blead. I was 1/4 done writing a new ticket for it before I found this old post of mine.

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Apr 16, 2014

From @bulk88

On Tue Mar 25 22​:19​:43 2014, tonyc wrote​:

On Tue Feb 04 20​:24​:36 2014, tonyc wrote​:

- L<perlguts/Working with SVs> should probably discuss how to
work with the SV's buffer, including SvPV_force(),
SvPVbyte_force(), SvPVutf8_force(), SvGROW(). I'm not sure if
this belongs after the SvGROW discussion, or after the
sv_cat*() discussion.

Attached a potential patch.

Tony

+If you want to write to an existing SV's buffer and set its value to a
+string, use SvPV_force() or one of its variants to force the SV to be
+a PV. This will remove any of various types of non-stringness from
+the SV, including copy-on-write and being a reference. This can be
+used to implement something like read()​:
+
+ (void)SvPVbyte_force(sv, len);
+ s = SvGROW(sv, offset + needlen + 1);

WHy the "(void)SvPVbyte_force(sv, len);" line? Won't SvGROW drop the COW and RVs?

+If you don't need the existing content of the SV, you can avoid some
+copying with​:
+
+ if (SvTHINKFIRST(sv))
+ sv_force_normal_flags(sv, SV_COW_DROP_PV);
+ SvUPGRADE(sv, SVt_PV);
+ s = SvGROW(sv, needlen + 1);

Maybe the API needs to be reworks. WOuldn't sv_force_normal_flags create a SVPV without a buffer already? So why the SvUPGRADE? I Kno the SvUPGRADE is protection so the SvGROW doesn't segv because its an SVIV with no body, only a head. Instread of SvUPGRADE(sv, SVt_PV); SvGROW(sv, needlen + 1); try this


if(SvTYPE(sv) >= SVt_PV) {
  if( SvIsCOW(sv) ) {
  needgrow​:
  buffer = sv_grow(sv,len);
  } else if (SvLEN(sv) < (len) {
  goto needgrow;
  } else {
  buffer = SvPVX(sv);
  }
} else {
  goto needgrow;
}


+If you want to write to an existing SV's buffer and set its value to a
+string, use SvPV_force() or one of its variants to force the SV to be
+a PV. This will remove any of various types of non-stringness from
+the SV, including copy-on-write and being a reference. This can be
+used to implement something like read()​:

Is this for read buffer, then you can then write to it? Where the buffer will be read, and conditionally written to in the same C library call? If so it needs to explain better that this produces the Perl level contents for you to read and write to. And this does not supply uninit memory, but only initiated memory. Use "If you want to read and write to an string" . also mention " If you are ONLY WRITTING, see the next paragraph for how to do it faster."

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Apr 16, 2014

From @tonycoz

On Wed Apr 16 02​:47​:11 2014, bulk88 wrote​:

On Tue Mar 25 22​:19​:43 2014, tonyc wrote​:
+If you want to write to an existing SV's buffer and set its value to
a
+string, use SvPV_force() or one of its variants to force the SV to be
+a PV. This will remove any of various types of non-stringness from
+the SV, including copy-on-write and being a reference. This can be
+used to implement something like read()​:
+
+ (void)SvPVbyte_force(sv, len);
+ s = SvGROW(sv, offset + needlen + 1);

WHy the "(void)SvPVbyte_force(sv, len);" line? Won't SvGROW drop the
COW and RVs?

SvGROW() cleans those up, but not some other forms of specialness.
sv_force_normal_flags() called by SvPV_force() variants will also check
SvREADONLY().

+If you don't need the existing content of the SV, you can avoid some
+copying with​:
+
+ if (SvTHINKFIRST(sv))
+ sv_force_normal_flags(sv, SV_COW_DROP_PV);
+ SvUPGRADE(sv, SVt_PV);
+ s = SvGROW(sv, needlen + 1);

Maybe the API needs to be reworks. WOuldn't sv_force_normal_flags
create a SVPV without a buffer already? So why the SvUPGRADE? I Kno
the SvUPGRADE is protection so the SvGROW doesn't segv because its an
SVIV with no body, only a head. Instread of SvUPGRADE(sv, SVt_PV);
SvGROW(sv, needlen + 1); try this

----------------------
if(SvTYPE(sv) >= SVt_PV) {
if( SvIsCOW(sv) ) {
needgrow​:
buffer = sv_grow(sv,len);
} else if (SvLEN(sv) < (len) {
goto needgrow;
} else {
buffer = SvPVX(sv);
}
} else {
goto needgrow;
}
-----------------------

sv_force_normal_flags() only removes specialness, for unspecial SVs (like an undef SV) it does nothing.

So replacing those 3 lines with your if statement would be incorrect.

This code was based on Perl_sv_setpvn().

+If you want to write to an existing SV's buffer and set its value to
a
+string, use SvPV_force() or one of its variants to force the SV to be
+a PV. This will remove any of various types of non-stringness from
+the SV, including copy-on-write and being a reference. This can be
+used to implement something like read()​:

Is this for read buffer, then you can then write to it? Where the
buffer will be read, and conditionally written to in the same C
library call? If so it needs to explain better that this produces the
Perl level contents for you to read and write to. And this does not
supply uninit memory, but only initiated memory. Use "If you want to
read and write to an string" . also mention " If you are ONLY
WRITTING, see the next paragraph for how to do it faster."

You're right, it needs to explain that the content is readable. As to only writing, I think the proximity with the other paragraph does that.

Tony

@p5pRT
Copy link
Author

p5pRT commented Apr 16, 2014

From @bulk88

On Wed Apr 16 15​:17​:05 2014, tonyc wrote​:

On Wed Apr 16 02​:47​:11 2014, bulk88 wrote​:

On Tue Mar 25 22​:19​:43 2014, tonyc wrote​:
+If you want to write to an existing SV's buffer and set its value to
a
+string, use SvPV_force() or one of its variants to force the SV to
be
+a PV. This will remove any of various types of non-stringness from
+the SV, including copy-on-write and being a reference. This can be
+used to implement something like read()​:
+
+ (void)SvPVbyte_force(sv, len);
+ s = SvGROW(sv, offset + needlen + 1);

WHy the "(void)SvPVbyte_force(sv, len);" line? Won't SvGROW drop the
COW and RVs?

SvGROW() cleans those up, but not some other forms of specialness.
sv_force_normal_flags() called by SvPV_force() variants will also
check
SvREADONLY().

Ok, yor're right about RO flaged.

Another issue, why "SvPVbyte_force" and not "SvPV_force"?

sv_force_normal_flags() only removes specialness, for unspecial SVs
(like an undef SV) it does nothing.

So replacing those 3 lines with your if statement would be incorrect.

There was a misunderstanding or I wasn't clear. I'll rephrase it.

Replace

  if (SvTHINKFIRST(sv))
  sv_force_normal_flags(sv, SV_COW_DROP_PV);
  SvUPGRADE(sv, SVt_PV);
  s = SvGROW(sv, needlen + 1);

with

  if (SvTHINKFIRST(sv))
  sv_force_normal_flags(sv, SV_COW_DROP_PV);
  if(SvTYPE(sv) >= SVt_PV) {
  if( SvIsCOW(sv) ) {
  needgrow​:
  buffer = sv_grow(sv,len);
  } else if (SvLEN(sv) < (len) {
  goto needgrow;
  } else {
  buffer = SvPVX(sv);
  }
  } else {
  goto needgrow;
  }

1 remaining issue, that I dont think POD/docs can fix is, something like

+ (void)SvPVbyte_force(sv, len);
+ s = SvGROW(sv, offset + needlen + 1);

can perform redundant ascii/utf8 conversion and stringifys on a buffer that will be tossed. Here is an unnecessary stringify example with SvPV_force.


void t1(SV* sv)
PPCODE​:
  STRLEN len;
  SvPV_force(sv, len);


C​:\sources\>perl -MDevel​::Peek=Dump -ML​::XS -E"my $i; my $a = \$i; L​::XS​::t1($a); Dump($a);
SV = PVIV(0x8ff9ec) at 0x8fc40c
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK)
  IV = 0
  PV = 0x8fe86c "SCALAR(0x8fc45c)"\0
  CUR = 16
  LEN = 20


  perl519.dll!Perl_sv_2pv_flags(interpreter * my_perl=0x00364e9c, sv * const sv=0x00000006, unsigned int * const lp=0x0012fc68, const long flags=0) Line 2885 C
  perl519.dll!Perl_sv_pvn_force_flags(interpreter * my_perl=0x00364e9c, sv * const sv=0x008fc404, unsigned int * const lp=0x0012fc80, const long flags=2) Line 9498 + 0xf C
  XS.dll!XS_L__XS_t1(interpreter * my_perl=0x00364e9c, cv * cv=0x0093ff14) Line 2957 + 0x1d C
  perl519.dll!Perl_pp_entersub(interpreter * my_perl=0x00000000) Line 2794 C
  perl519.dll!Perl_runops_standard(interpreter * my_perl=0x00364e9c) Line 42 + 0x4 C
  perl519.dll!S_run_body(interpreter * my_perl=0x00000000, long oldscope=1) Line 2449 + 0xa C
  perl519.dll!perl_run(interpreter * my_perl=0x00364e9c) Line 2365 + 0x8 C
  perl519.dll!RunPerl(int argc=3, char * * argv=0x01364c68, char * * env=0x00363328) Line 259 + 0x6 C
  perl.exe!mainCRTStartup() Line 398 + 0xe C
  kernel32.dll!_BaseProcessStart@​4() + 0x23


  Newx(buffer, len, char);
  buffer_end = retval = buffer + len;

  /* Working backwards */
  *--retval = '\0';
  *--retval = ')';<<<<<<<<<<<<<<<<<<<<<<<<LINE 2885 in sv.c
  do {
  *--retval = PL_hexdigit[addr & 15];
  } while (addr >>= 4);
  *--retval = 'x';
  *--retval = '0';
  *--retval = '(';

  retval -= typelen;


This stringifying by SvPV_force matches its docs.


/*
=for apidoc Am|char*|SvPV_force|SV* sv|STRLEN len
Like C<SvPV> but will force the SV into containing a string (C<SvPOK>), and
only a string (C<SvPOK_only>), by hook or by crook. You need force if you are
going to update the C<SvPVX> directly. Processes get magic.

Note that coercing an arbitrary scalar into a plain PV will potentially
strip useful data from it. For example if the SV was C<SvROK>, then the
referent will have its reference count decremented, and the SV itself may
be converted to an C<SvPOK> scalar with a string buffer containing a value
such as C<"ARRAY(0x1234)">.


Then again this proposed wording \|/

On Tue Mar 25 22​:19​:43 2014, tonyc wrote​:
+If you want to write to an existing SV's buffer and set its value to
a
+string, use SvPV_force() or one of its variants to force the SV to
be
+a PV. This will remove any of various types of non-stringness from
+the SV, including copy-on-write and being a reference. This can be
+used to implement something like read()​:
+
+ (void)SvPVbyte_force(sv, len);
+ s = SvGROW(sv, offset + needlen + 1);

is about having valid read data. I think having

  s = SvGROW(sv, offset + needlen + 1);

after

  (void)SvPVbyte_force(sv, len);

is strange and should be removed for the "we want read and write data" instructions. Wouldnt SvGROW have added uninit memory to the end of the readable valid data, and depending if the user/dev is bright enough realize what he did (mixing up expression "len" with "offset + needlen + 1"), tell some C library to parse/translate uninit memory at the end of the post-COW buffer?

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented Apr 17, 2014

From @tonycoz

On Wed, Apr 16, 2014 at 04​:13​:24PM -0700, bulk88 via RT wrote​:

On Wed Apr 16 15​:17​:05 2014, tonyc wrote​:

On Wed Apr 16 02​:47​:11 2014, bulk88 wrote​:

On Tue Mar 25 22​:19​:43 2014, tonyc wrote​:
+If you want to write to an existing SV's buffer and set its value to
a
+string, use SvPV_force() or one of its variants to force the SV to
be
+a PV. This will remove any of various types of non-stringness from
+the SV, including copy-on-write and being a reference. This can be
+used to implement something like read()​:
+
+ (void)SvPVbyte_force(sv, len);
+ s = SvGROW(sv, offset + needlen + 1);

WHy the "(void)SvPVbyte_force(sv, len);" line? Won't SvGROW drop the
COW and RVs?

SvGROW() cleans those up, but not some other forms of specialness.
sv_force_normal_flags() called by SvPV_force() variants will also
check
SvREADONLY().

Ok, yor're right about RO flaged.

Another issue, why "SvPVbyte_force" and not "SvPV_force"?

That, and the other UTF8 mentions are there to get the reader to think
about about unicode handling.

Using a simple SvPV_force() with my commented out example case would
simply have been incorrect, since read(1) returns octets, with little
reason to expect that it's utf-8 encoded. (Which is why using :utf8
is bad.)

sv_force_normal_flags() only removes specialness, for unspecial SVs
(like an undef SV) it does nothing.

So replacing those 3 lines with your if statement would be incorrect.

There was a misunderstanding or I wasn't clear. I'll rephrase it.

I looked at your quote of my code instead of your prose, my fault.

Replace

if \(SvTHINKFIRST\(sv\)\)
    sv\_force\_normal\_flags\(sv\, SV\_COW\_DROP\_PV\);
SvUPGRADE\(sv\, SVt\_PV\);
s = SvGROW\(sv\, needlen \+ 1\);

with

if \(SvTHINKFIRST\(sv\)\)
    sv\_force\_normal\_flags\(sv\, SV\_COW\_DROP\_PV\);
if\(SvTYPE\(sv\) >= SVt\_PV\) \{
    if\( SvIsCOW\(sv\) \) \{

It can't be COW at this point, sv_force_normal_flags() removes it.

        needgrow&#8203;:
        buffer = sv\_grow\(sv\,len\);
    \} else if \(SvLEN\(sv\) \< \(len\) \{
        goto needgrow;
    \} else \{
        buffer = SvPVX\(sv\);
    \}
\} else \{
    goto needgrow;
\}

I could see something like that as a new macro or function. I don't
think I'd want to ask XS authors to inline that code manually though.

1 remaining issue, that I dont think POD/docs can fix is, something like

+ (void)SvPVbyte_force(sv, len);
+ s = SvGROW(sv, offset + needlen + 1);

can perform redundant ascii/utf8 conversion and stringifys on a buffer that will be tossed. Here is an unnecessary stringify example with SvPV_force.

If the caller supplies a reference or undef or something else that
isn't a string the author could add code to reject them.

If they don't want any part of the the content, they can skip the
SvPV_force() and use the alternate mechanism.

The SvGROW() may end up copying that generated content, but I don't
think I can avoid it without making the sample even more complex.

Then again this proposed wording \|/

On Tue Mar 25 22​:19​:43 2014, tonyc wrote​:
+If you want to write to an existing SV's buffer and set its value to
a
+string, use SvPV_force() or one of its variants to force the SV to
be
+a PV. This will remove any of various types of non-stringness from
+the SV, including copy-on-write and being a reference. This can be
+used to implement something like read()​:
+
+ (void)SvPVbyte_force(sv, len);
+ s = SvGROW(sv, offset + needlen + 1);

is about having valid read data. I think having

s = SvGROW\(sv\, offset \+ needlen \+ 1\);

after

\(void\)SvPVbyte\_force\(sv\, len\);

is strange and should be removed for the "we want read and write data" instructions. Wouldnt SvGROW have added uninit memory to the end of the readable valid data, and depending if the user/dev is bright enough realize what he did (mixing up expression "len" with "offset + needlen + 1"), tell some C library to parse/translate uninit memory at the end of the post-COW buffer?

That's a good point.

I can see two cases​:

1) offset <= SvCUR() - if the wrapped function does read needlen
bytes, some of the data is going to be uninitialized.

2) offset > SvCUR() - the wrapped function is going to receive
uninitialized content, but the caller is also going to receiver
uninitialized data.

Tony

@p5pRT
Copy link
Author

p5pRT commented Apr 21, 2014

From @tonycoz

On Wed Apr 16 18​:51​:10 2014, tonyc wrote​:

1) offset <= SvCUR() - if the wrapped function does read needlen
bytes, some of the data is going to be uninitialized.

2) offset > SvCUR() - the wrapped function is going to receive
uninitialized content, but the caller is also going to receiver
uninitialized data.

I went through some drafts trying to address this, but I think examples like
this need to stay simple rather than trying to cover every possible use of
the APIs.

So all I've added is a check that offset <= len so we don't return undefined content back to the caller.

I don't think it's necessary to document here that the extra memory allocated by SvGROW() is undefined, since we don't say what it is initialized it, the caller shouldn't be making an assumptions about it.

Tony

@p5pRT
Copy link
Author

p5pRT commented Apr 21, 2014

From @tonycoz

0001-perl-116925-discuss-modifying-an-SV-s-buffer.patch
From ad5fd95262bcbe0faa8f380328c461fb88be4941 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Mon, 21 Apr 2014 11:54:08 +1000
Subject: [perl #116925] discuss modifying an SV's buffer

in the same place we otherwise talk about SVs
---
 pod/perlguts.pod |   36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index 3fb5137..165e44e 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -159,6 +159,42 @@ decrease, the allocated memory of an SV and that it does not automatically
 add space for the trailing NUL byte (perl's own string functions typically do
 C<SvGROW(sv, len + 1)>).
 
+If you want to write to an existing SV's buffer and set its value to a
+string, use SvPV_force() or one of its variants to force the SV to be
+a PV.  This will remove any of various types of non-stringness from
+the SV while preserving the content of the SV in the PV.  This can be
+used to implement something like read():
+
+    (void)SvPVbyte_force(sv, len);
+    if (offset > len)
+      croak("Supplied offset beyond the end of the string");
+    s = SvGROW(sv, offset + needlen + 1);
+    /* something that modifies up to needlen bytes at s+offset, but
+       modifies newlen bytes
+         eg. newlen = read(fd, s + offset. needlen);
+       ignoring errors for these examples
+     */
+    s[offset + newlen] = '\0';
+    SvCUR_set(sv, offset + newlen);
+    SvUTF8_off(sv);
+    SvSETMAGIC(sv);
+
+If you don't need the existing content of the SV, you can avoid some
+copying with:
+
+    if (SvTHINKFIRST(sv))
+        sv_force_normal_flags(sv, SV_COW_DROP_PV);
+    SvUPGRADE(sv, SVt_PV);
+    s = SvGROW(sv, needlen + 1);
+    /* something that modifies up to needlen bytes at s, but modifies
+       newlen bytes
+         eg. newlen = read(fd, s. needlen);
+     */
+    s[newlen] = '\0';
+    SvCUR_set(sv, newlen);
+    SvPOK_only(sv); /* also clears SVf_UTF8 */
+    SvSETMAGIC(sv);
+
 If you have an SV and want to know what kind of data Perl thinks is stored
 in it, you can use the following macros to check the type of SV you have.
 
-- 
1.7.10.4

@p5pRT
Copy link
Author

p5pRT commented May 6, 2014

From @iabyn

On Sun, Apr 20, 2014 at 07​:04​:29PM -0700, Tony Cook via RT wrote​:

I went through some drafts trying to address this, but I think examples like
this need to stay simple rather than trying to cover every possible use of
the APIs.

So all I've added is a check that offset <= len so we don't return undefined content back to the caller.

I don't think it's necessary to document here that the extra memory allocated by SvGROW() is undefined, since we don't say what it is initialized it, the caller shouldn't be making an assumptions about it.

pod/perlguts.pod | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)

diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index 3fb5137..165e44e 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@​@​ -159,6 +159,42 @​@​ decrease, the allocated memory of an SV and that it does not automatically
add space for the trailing NUL byte (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).

+If you want to write to an existing SV's buffer and set its value to a
+string, use SvPV_force() or one of its variants to force the SV to be
+a PV. This will remove any of various types of non-stringness from
+the SV while preserving the content of the SV in the PV. This can be
+used to implement something like read()​:
+
+ (void)SvPVbyte_force(sv, len);
+ if (offset > len)
+ croak("Supplied offset beyond the end of the string");
+ s = SvGROW(sv, offset + needlen + 1);
+ /* something that modifies up to needlen bytes at s+offset, but
+ modifies newlen bytes
+ eg. newlen = read(fd, s + offset. needlen);
+ ignoring errors for these examples
+ */
+ s[offset + newlen] = '\0';
+ SvCUR_set(sv, offset + newlen);
+ SvUTF8_off(sv);
+ SvSETMAGIC(sv);
+
+If you don't need the existing content of the SV, you can avoid some
+copying with​:
+
+ if (SvTHINKFIRST(sv))
+ sv_force_normal_flags(sv, SV_COW_DROP_PV);
+ SvUPGRADE(sv, SVt_PV);
+ s = SvGROW(sv, needlen + 1);
+ /* something that modifies up to needlen bytes at s, but modifies
+ newlen bytes
+ eg. newlen = read(fd, s. needlen);
+ */
+ s[newlen] = '\0';
+ SvCUR_set(sv, newlen);
+ SvPOK_only(sv); /* also clears SVf_UTF8 */
+ SvSETMAGIC(sv);
+
If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

I've been reading and re-reading this thread and the two related ones
(#116569, #116407) and keep ending up confused and undecided.

My gut reaction to your proposed doc patch is uneasiness. I don't like the
read() example at at all; it's not immediately obvious what it does, and
what all the values like offset and needlen represent.

My current understanding is that there are four types of string buffer
activity​:

* get-only

  This is already covered in perlguts​: basically it's SvPV() and
  variants.

  Perl-level equivalent​: stringification​: e.g. "$var".

* get-and-set

  This is where you modify an existing value by directly modifying the
  SPVX buffer. Use SvPV_force or variant. If the new string needs to be
  larger than the old one, then use SvGROW and SvCUR_set.

  Perl-level equivalent​: e.g. substr($var, 1, 3, "abcd");

  Perhaps a simple string prepend would make a clearer example?

  I think here we should should also point out that the various sv_cat*()
  functions are more convenient for a simple append than messing about
  with SvGROW.

* set-only

  This is where we don't care about the old value; we just want to set
  the var to a new string value.

  Perl-level equivalent​: e.g. $var = "new string";

  This is where it seems to me, that most of the difficultly lies.
  This is where there's lots of discussion, with a possible need to
  expose previously un- or under-documented and/or non-API functions,
  such as SvTHINKFIRST() and sv_force_normal_flags()

  This is a special case of get-and-set, only we're trying to be more
  efficient and avoiding unnecessarily copy string buffers, or
  converting an NV into a PV, only to discard the result.

  My feeling here is that most of the time people should just use
  sv_setpv() and variants if you already know the string value you wish
  to assign; for example,

  len = read(fh, buf, 1024);
  sv_setpvn(sv, buf, len);

  sv_setpvn() will handle all the de-COWing etc, in an efficient manner.

  In those cases where you just want to allocate a particular sized
  buffer, but then modify it directly,

  sv_setpvn(sv,"",0);
  SvGROW(sv, len);
  ...

  does all the sv_force_normal_flags(), SvUPGRADE() etc stuff for you at
  the expense of temporarily mallocing a 1-byte (+ rounding up) buffer.

* replace buffer

  This is a particular use-case that Leon T mentioned, where you
  want to do the equivalent of

  free(SvPVX(sv));
  SvPVX(sv) = newptr;

  but safely.

  I could be wrong, but doesn't sv_usepvn_flags () and similar do this?

--
The Enterprise is involved in a bizarre time-warp experience which is in
some way unconnected with the Late 20th Century.
  -- Things That Never Happen in "Star Trek" #14

@p5pRT
Copy link
Author

p5pRT commented May 6, 2014

From @Leont

On Tue, May 6, 2014 at 4​:00 PM, Dave Mitchell <davem@​iabyn.com> wrote​:

My gut reaction to your proposed doc patch is uneasiness. I don't like the
read() example at at all; it's not immediately obvious what it does, and
what all the values like offset and needlen represent.

I agree it's not optimal, but that may be inherent to the issue.

* set-only

This is where we don't care about the old value; we just want to set
the var to a new string value\.

Perl\-level equivalent&#8203;: e\.g\. $var = "new string";

This is where it seems to me\, that most of the difficultly lies\.
This is where there's lots of discussion\, with a possible need to
expose previously un\- or under\-documented and/or non\-API functions\,
such as SvTHINKFIRST\(\) and sv\_force\_normal\_flags\(\)

This is a special case of get\-and\-set\, only we're trying to be more
efficient and avoiding unnecessarily copy string buffers\, or
converting an NV into a PV\, only to discard the result\.

My feeling here is that most of the time people should just use
sv\_setpv\(\) and variants if you already know the string value you wish
to assign; for example\,

    len = read\(fh\, buf\, 1024\);
    sv\_setpvn\(sv\, buf\, len\);

sv\_setpvn\(\) will handle all the de\-COWing etc\, in an efficient manner\.

In those cases where you just want to allocate a particular sized
buffer\, but then modify it directly\,

    sv\_setpvn\(sv\,""\,0\);
    SvGROW\(sv\, len\);
    \.\.\.

does all the sv\_force\_normal\_flags\(\)\, SvUPGRADE\(\) etc stuff for you at
the expense of temporarily mallocing a 1\-byte \(\+ rounding up\) buffer\.

I would welcome new API functions that would make this easier.

* replace buffer

This is a particular use\-case that Leon T mentioned\, where you
want to do the equivalent of

    free\(SvPVX\(sv\)\);
    SvPVX\(sv\) = newptr;

but safely\.

I could be wrong\, but doesn't sv\_usepvn\_flags \(\) and similar do this?

sv_usepvn_flags assumed the buffer is malloc()ed, while in my use-case it's
mmap()ed. We could add an "un-owned" flag to sv_usepvn_flags for this sort
of situation though.

Leon

@p5pRT
Copy link
Author

p5pRT commented May 11, 2014

From @bulk88

On Tue May 06 07​:01​:16 2014, davem wrote​:

My feeling here is that most of the time people should just use
sv_setpv() and variants if you already know the string value you wish
to assign; for example,

len = read(fh, buf, 1024);
sv_setpvn(sv, buf, len);

sv_setpvn() will handle all the de-COWing etc, in an efficient manner.

That usually results in a redundant double copy, unless the * came from a foreign allocator. Direct writing into the PV is always better.

In those cases where you just want to allocate a particular sized
buffer, but then modify it directly,

sv_setpvn(sv,"",0);
SvGROW(sv, len);
...

does all the sv_force_normal_flags(), SvUPGRADE() etc stuff for you at
the expense of temporarily mallocing a 1-byte (+ rounding up) buffer.

true

* replace buffer

This is a particular use-case that Leon T mentioned, where you
want to do the equivalent of

free(SvPVX(sv));
SvPVX(sv) = newptr;

but safely.

I could be wrong, but doesn't sv_usepvn_flags () and similar do this?

Yes, but only for Newx blocks and only if SV_HAS_TRAILING_NUL is used. Never pass a non-Newx (AKA malloc) ptr to sv_usepvn_flags.

On Tue May 06 15​:42​:16 2014, LeonT wrote​:

sv_usepvn_flags assumed the buffer is malloc()ed, while in my use-case
it's
mmap()ed. We could add an "un-owned" flag to sv_usepvn_flags for this
sort
of situation though.

See paragraph above.

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented May 27, 2014

From @tonycoz

On Tue May 06 07​:01​:16 2014, davem wrote​:

On Sun, Apr 20, 2014 at 07​:04​:29PM -0700, Tony Cook via RT wrote​:

I went through some drafts trying to address this, but I think
examples like
this need to stay simple rather than trying to cover every possible
use of
the APIs.

So all I've added is a check that offset <= len so we don't return
undefined content back to the caller.

I don't think it's necessary to document here that the extra memory
allocated by SvGROW() is undefined, since we don't say what it is
initialized it, the caller shouldn't be making an assumptions about
it.

pod/perlguts.pod | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)

diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index 3fb5137..165e44e 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@​@​ -159,6 +159,42 @​@​ decrease, the allocated memory of an SV and that
it does not automatically
add space for the trailing NUL byte (perl's own string functions
typically do
C<SvGROW(sv, len + 1)>).

+If you want to write to an existing SV's buffer and set its value to
a
+string, use SvPV_force() or one of its variants to force the SV to
be
+a PV. This will remove any of various types of non-stringness from
+the SV while preserving the content of the SV in the PV. This can
be
+used to implement something like read()​:
+
+ (void)SvPVbyte_force(sv, len);
+ if (offset > len)
+ croak("Supplied offset beyond the end of the string");
+ s = SvGROW(sv, offset + needlen + 1);
+ /* something that modifies up to needlen bytes at s+offset, but
+ modifies newlen bytes
+ eg. newlen = read(fd, s + offset. needlen);
+ ignoring errors for these examples
+ */
+ s[offset + newlen] = '\0';
+ SvCUR_set(sv, offset + newlen);
+ SvUTF8_off(sv);
+ SvSETMAGIC(sv);
+
+If you don't need the existing content of the SV, you can avoid some
+copying with​:
+
+ if (SvTHINKFIRST(sv))
+ sv_force_normal_flags(sv, SV_COW_DROP_PV);
+ SvUPGRADE(sv, SVt_PV);
+ s = SvGROW(sv, needlen + 1);
+ /* something that modifies up to needlen bytes at s, but
modifies
+ newlen bytes
+ eg. newlen = read(fd, s. needlen);
+ */
+ s[newlen] = '\0';
+ SvCUR_set(sv, newlen);
+ SvPOK_only(sv); /* also clears SVf_UTF8 */
+ SvSETMAGIC(sv);
+
If you have an SV and want to know what kind of data Perl thinks is
stored
in it, you can use the following macros to check the type of SV you
have.

I've been reading and re-reading this thread and the two related ones
(#116569, #116407) and keep ending up confused and undecided.

My gut reaction to your proposed doc patch is uneasiness. I don't like
the
read() example at at all; it's not immediately obvious what it does,
and
what all the values like offset and needlen represent.

I'm not fond of using read() in the first example, though it's based on a simple implementation of perl's read() function.

I'm using read() as the function since I believe (hope!) that most of the people digging this deep into guts to make their code efficient understand the behaviour of the POSIX read() function.

Perhaps I should be doing a simple appended read() instead, since it avoid the extra parameter and some of the checks.

My current understanding is that there are four types of string buffer
activity​:

...

* get-and-set

This is where you modify an existing value by directly modifying the
SPVX buffer. Use SvPV_force or variant. If the new string needs to be
larger than the old one, then use SvGROW and SvCUR_set.

Perl-level equivalent​: e.g. substr($var, 1, 3, "abcd");

Perhaps a simple string prepend would make a clearer example?

I think here we should should also point out that the various
sv_cat*()
functions are more convenient for a simple append than messing about
with SvGROW.

If you're doing a string prepend, you should probably using sv_insert(), which I should probably mention.

I've simplified the get-and-set to an append, and added notes about sv_cat*() and sv_insert().

* set-only

This is where we don't care about the old value; we just want to set
the var to a new string value.

Perl-level equivalent​: e.g. $var = "new string";

This is where it seems to me, that most of the difficultly lies.
This is where there's lots of discussion, with a possible need to
expose previously un- or under-documented and/or non-API functions,
such as SvTHINKFIRST() and sv_force_normal_flags()

This is a special case of get-and-set, only we're trying to be more
efficient and avoiding unnecessarily copy string buffers, or
converting an NV into a PV, only to discard the result.

My feeling here is that most of the time people should just use
sv_setpv() and variants if you already know the string value you wish
to assign; for example,

len = read(fh, buf, 1024);
sv_setpvn(sv, buf, len);

sv_setpvn() will handle all the de-COWing etc, in an efficient manner.

I've added a reference to sv_setpvn().

In those cases where you just want to allocate a particular sized
buffer, but then modify it directly,

sv_setpvn(sv,"",0);
SvGROW(sv, len);
...

does all the sv_force_normal_flags(), SvUPGRADE() etc stuff for you at
the expense of temporarily mallocing a 1-byte (+ rounding up) buffer.

I've changed the example to use that.

* replace buffer

This is a particular use-case that Leon T mentioned, where you
want to do the equivalent of

free(SvPVX(sv));
SvPVX(sv) = newptr;

but safely.

I could be wrong, but doesn't sv_usepvn_flags () and similar do this?

I've added a brief discussion of that.

Tony

@p5pRT
Copy link
Author

p5pRT commented May 27, 2014

From @tonycoz

0001-perl-116925-discuss-modifying-an-SV-s-buffer.patch
From 0751e326fb39ab12055583509812e6047ce20e1e Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Tue, 27 May 2014 12:01:14 +1000
Subject: [perl #116925] discuss modifying an SV's buffer

in the same place we otherwise talk about SVs
---
 pod/perlguts.pod |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index 3fb5137..cf3df89 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -159,6 +159,58 @@ decrease, the allocated memory of an SV and that it does not automatically
 add space for the trailing NUL byte (perl's own string functions typically do
 C<SvGROW(sv, len + 1)>).
 
+If you want to write to an existing SV's buffer and set its value to a
+string, use SvPV_force() or one of its variants to force the SV to be
+a PV.  This will remove any of various types of non-stringness from
+the SV while preserving the content of the SV in the PV.  This can be
+used, for example, to append data from an API function to a buffer
+without extra copying:
+
+    (void)SvPVbyte_force(sv, len);
+    s = SvGROW(sv, len + needlen + 1);
+    /* something that modifies up to needlen bytes at s+len, but
+       modifies newlen bytes
+         eg. newlen = read(fd, s + len, needlen);
+       ignoring errors for these examples
+     */
+    s[len + newlen] = '\0';
+    SvCUR_set(sv, len + newlen);
+    SvUTF8_off(sv);
+    SvSETMAGIC(sv);
+
+If you already have the data in memory or if you want to keep your
+code simple, you can use one of the sv_cat*() variants, such as
+sv_catpvn().  If you want to insert anywhere in the string you can use
+sv_insert() or sv_insert_flags().
+
+If you don't need the existing content of the SV, you can avoid some
+copying with:
+
+    sv_setpvn(sv, "", 0);
+    s = SvGROW(sv, needlen + 1);
+    /* something that modifies up to needlen bytes at s, but modifies
+       newlen bytes
+         eg. newlen = read(fd, s. needlen);
+     */
+    s[newlen] = '\0';
+    SvCUR_set(sv, newlen);
+    SvPOK_only(sv); /* also clears SVf_UTF8 */
+    SvSETMAGIC(sv);
+
+Again, if you already have the data in memory or want to avoid the
+complexity of the above, you can use sv_setpvn().
+
+If you have a buffer allocated with Newx() and want to set that as the
+SV's value, you can use sv_usepvn_flags().  That has some requirements
+if you want to avoid perl re-allocating the buffer to fit the trailing
+NUL:
+
+   Newx(buf, somesize+1, char);
+   /* ... fill in buf ... */
+   buf[somesize] = '\0';
+   sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
+   /* buf now belongs to perl, don't release it */
+
 If you have an SV and want to know what kind of data Perl thinks is stored
 in it, you can use the following macros to check the type of SV you have.
 
-- 
1.7.10.4

@p5pRT
Copy link
Author

p5pRT commented Jun 24, 2014

From @tonycoz

On Mon May 26 19​:02​:51 2014, tonyc wrote​:
...

I've added a brief discussion of that.

Applied as 21134f6.

Tony

@p5pRT p5pRT closed this as completed Jun 24, 2014
@p5pRT
Copy link
Author

p5pRT commented Jun 24, 2014

@tonycoz - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant