Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SvPVutf8 != SvPVX, and sv_2pvutf8 #12703

Closed
p5pRT opened this issue Jan 15, 2013 · 16 comments
Closed

SvPVutf8 != SvPVX, and sv_2pvutf8 #12703

p5pRT opened this issue Jan 15, 2013 · 16 comments

Comments

@p5pRT
Copy link

p5pRT commented Jan 15, 2013

Migrated from rt.perl.org#116407 (status was 'resolved')

Searchable as RT116407$

@p5pRT
Copy link
Author

p5pRT commented Jan 15, 2013

From @bulk88

Created by @bulk88

I recently found a bug in a core module, where a loop was done over a
string from SvPVutf8(sv, len) to SvEND(sv). The sv is a RO literal SV.
So it looped over uninitialized memory and sometimes segvs.

http​://perl5.git.perl.org/perl.git/blob/a8c6ff7b8e8c6037333c21f9b3f6b38b9278df4f​:/sv.c#l3060

A mortal copy was made of the RO SV was made and returned as the result
of SvPVutf8.

All that SvPVutf8's docs say is "Like C<SvPV>, but converts sv to utf8
first if necessary."

SvPV says "The SV may cache the stringified version becoming C<SvPOK>."
So that is the disclaimer that SvPVX after SvPV is not the same pointer
as returned from SvPV. I think the docs can be improved for SvPVutf8 and
maybe SvPV to make it more obvious that these 2 do not create POK SVs. I
don't have any proposed wording so no patch is included.

This ticket might be related to
http​://perl5.git.perl.org/perl.git/commit/fe46cbda823c09f80e4bc48dd93fafb26cc805f6
and https://rt-archive.perl.org/perl5/Ticket/Display.html?id=108994 .

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.17.7:

Configured by Owner at Sun Dec 16 13:25:34 2012.

Summary of my perl5 (revision 5 version 17 subversion 7 patch blead 
2012-12-06.16:42:20 93a641ae382638ffd1980378be4810244d04f4b0 
v5.17.6-186-g93a641a) configuration:
  Snapshot of: 93a641ae382638ffd1980378be4810244d04f4b0
  Platform:
    osname=MSWin32, osvers=5.1, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -O1 -GL -G7 
-DWIN32 -D_CONSOLE -DNO_STRICT  -DPERL_TEXTMODE_SCRIPTS 
-DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO 
-D_USE_32BIT_TIME_T',
    optimize='-MD -Zi -DNDEBUG -O1 -GL -G7',
    cppflags='-DWIN32'
    ccversion='13.10.6030', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', 
lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf 
-ltcg  -libpath:"c:\perl517\lib\CORE"  -machine:x86'
    libpth="C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\lib"
    libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  
netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib 
odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
    perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib 
winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib 
oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  
version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl517.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug 
-opt:ref,icf -ltcg  -libpath:"c:\perl517\lib\CORE"  -machine:x86'

Locally applied patches:
    


@INC for perl 5.17.7:
    C:/perl517/site/lib
    C:/perl517/lib
    .


Environment for perl 5.17.7:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=C:\perl517\bin;C:\Program Files\Microsoft Visual Studio .NET 
2003\Common7\IDE;C:\Program Files\Microsoft Visual Studio .NET 
2003\VC7\BIN;C:\Program Files\Microsoft Visual Studio .NET 
2003\Common7\Tools;C:\Program Files\Microsoft Visual Studio .NET 
2003\Common7\Tools\bin\prerelease;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\system32\wbem;
    PERL_BADLANG (unset)
    SHELL (unset)


@p5pRT
Copy link
Author

p5pRT commented Jan 15, 2013

From @tonycoz

On Tue, Jan 15, 2013 at 01​:24​:29PM -0800, bulk88 wrote​:

I recently found a bug in a core module, where a loop was done over a
string from SvPVutf8(sv, len) to SvEND(sv). The sv is a RO literal SV.
So it looped over uninitialized memory and sometimes segvs.

http​://perl5.git.perl.org/perl.git/blob/a8c6ff7b8e8c6037333c21f9b3f6b38b9278df4f​:/sv.c#l3060

A mortal copy was made of the RO SV was made and returned as the result
of SvPVutf8.

All that SvPVutf8's docs say is "Like C<SvPV>, but converts sv to utf8
first if necessary."

SvPV says "The SV may cache the stringified version becoming C<SvPOK>."
So that is the disclaimer that SvPVX after SvPV is not the same pointer
as returned from SvPV. I think the docs can be improved for SvPVutf8 and
maybe SvPV to make it more obvious that these 2 do not create POK SVs. I
don't have any proposed wording so no patch is included.

This ticket might be related to
http​://perl5.git.perl.org/perl.git/commit/fe46cbda823c09f80e4bc48dd93fafb26cc805f6
and https://rt-archive.perl.org/perl5/Ticket/Display.html?id=108994 .

Perhaps​:

  Like C<SvPV>, but returns always returns a utf8 string.

  char* SvPVutf8(SV* sv, STRLEN len)

  This may return a pointer to a temporary instead of C<SvPVX(sv)>.

The other SvPVutf8* macros should be updated too.

Tony

@p5pRT
Copy link
Author

p5pRT commented Jan 15, 2013

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Mar 13, 2013

From @bulk88

On Tue Jan 15 15​:37​:17 2013, tonyc wrote​:

Perhaps​:

Like C<SvPV>, but returns always returns a utf8 string.

char\*    SvPVutf8\(SV\* sv\, STRLEN len\)

This may return a pointer to a temporary instead of C<SvPVX(sv)>.

The other SvPVutf8* macros should be updated too.

Tony

Does anyone else have comments on this bug report or a wording proposal?

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented May 6, 2013

From @iabyn

On Wed, Jan 16, 2013 at 10​:36​:12AM +1100, Tony Cook wrote​:

On Tue, Jan 15, 2013 at 01​:24​:29PM -0800, bulk88 wrote​:

I recently found a bug in a core module, where a loop was done over a
string from SvPVutf8(sv, len) to SvEND(sv). The sv is a RO literal SV.
So it looped over uninitialized memory and sometimes segvs.

http​://perl5.git.perl.org/perl.git/blob/a8c6ff7b8e8c6037333c21f9b3f6b38b9278df4f​:/sv.c#l3060

A mortal copy was made of the RO SV was made and returned as the result
of SvPVutf8.

All that SvPVutf8's docs say is "Like C<SvPV>, but converts sv to utf8
first if necessary."

SvPV says "The SV may cache the stringified version becoming C<SvPOK>."
So that is the disclaimer that SvPVX after SvPV is not the same pointer
as returned from SvPV. I think the docs can be improved for SvPVutf8 and
maybe SvPV to make it more obvious that these 2 do not create POK SVs. I
don't have any proposed wording so no patch is included.

This ticket might be related to
http​://perl5.git.perl.org/perl.git/commit/fe46cbda823c09f80e4bc48dd93fafb26cc805f6
and https://rt-archive.perl.org/perl5/Ticket/Display.html?id=108994 .

Perhaps​:

Like C<SvPV>, but returns always returns a utf8 string.

char\*    SvPVutf8\(SV\* sv\, STRLEN len\)

This may return a pointer to a temporary instead of C<SvPVX(sv)>.

The other SvPVutf8* macros should be updated too.

Perhaps we just need to clarify SvPV and rely on the fact that all the
others say "SvPVfoo is just like SvPV, except for foo".

I'd suggest adding the following to SvPV​:

  Note that there is no guarantee that that the return value of
  C<SvPV()> is equal to C<SvPVX(sv)> (or even that C<SvPVX(sv)> contains
  valid data), due to the way things like overloading and Copy-on-write
  are handled. In these cases, the return value may point to a
  temporary buffer or similar. If you absolutely need the SvPVX field to
  be valid (for example, if you intend to write to it), then see
  C<SvPV_force>.

And for SvPV_force, which currently says​:

  Like C<SvPV> but will force the SV into containing a string
  (C<SvPOK>), and only a string (C<SvPOK_only>), by hook or by crook.
  You want force if you are going to update the C<SvPVX> directly.
  Processes get magic.

I'd suggest s/You want force/You need force/ (to make it stronger), and
add​:

  Note that coercing an arbitrary scalar into a plain PV will
  potentially strip useful data from it. For example if the SV was
  C<SvROK>, then referent will have its reference count decremented, and
  the SV itself may be converted to an C<SvPOK> scalar with a string
  buffer containing value such as C<"ARRAY(0x1234)">.

Then make the SvPV_force_nomg() doc refer to SvPV_force() rather than
duplicating all the boiler-plate just added.

Note that updating SvPV* docs like this will also help with the goal of
improving the docs vis-a-vis preparing for re-enabling COW.

--
A major Starfleet emergency breaks out near the Enterprise, but
fortunately some other ships in the area are able to deal with it to
everyone's satisfaction.
  -- Things That Never Happen in "Star Trek" #13

@p5pRT
Copy link
Author

p5pRT commented May 9, 2013

From @iabyn

On Mon, May 06, 2013 at 03​:59​:20PM +0100, Dave Mitchell wrote​:

Perhaps we just need to clarify SvPV and rely on the fact that all the
others say "SvPVfoo is just like SvPV, except for foo".

In the absence of any feedback, I've applied the following commit​:

commit fd14238
Author​: David Mitchell <davem@​iabyn.com>
AuthorDate​: Thu May 9 15​:17​:10 2013 +0100
Commit​: David Mitchell <davem@​iabyn.com>
CommitDate​: Thu May 9 15​:50​:46 2013 +0100

  clarify SvPV* and SvPV_force* docs
 
  See RT #116407.
 
  Make it clear that SvPV(sv) does not always equal SvPVX(sv), and to use
  SvPV_force() if necessary. Also, clarify that SvPV_force() will destroy
  any non-string content like a reference. Then, make the other SvPV*
  descriptions in general refer back to SvPV() or SvPV_force(), so that
  people will spot these added caveats.

Affected files ...
 
  M sv.h

Differences ...

Inline Patch
diff --git a/sv.h b/sv.h
index fd3da2d..4bfbf9c 100644
--- a/sv.h
+++ b/sv.h
@@ -852,8 +852,8 @@ Set the actual length of the string which is in the SV.  See C<SvIV_set>.
 =for apidoc Am|U32|SvUTF8|SV* sv
 Returns a U32 value indicating the UTF-8 status of an SV.  If things are set-up
 properly, this indicates whether or not the SV contains UTF-8 encoded data.
-Call this after SvPV() in case any call to string overloading updates the
-internal flag.
+You should use this I<after> a call to SvPV() or one of its variants, in
+case any call to string overloading updates the internal flag.
 
 =for apidoc Am|void|SvUTF8_on|SV *sv
 Turn on the UTF-8 status of an SV (the data is not changed, just the flag).
@@ -1471,13 +1471,17 @@ attention to precisely which outputs are influenced by which inputs.
 /*
 =for apidoc Am|char*|SvPV_force|SV* sv|STRLEN len
 Like C<SvPV> but will force the SV into containing a string (C<SvPOK>), and
-only a string (C<SvPOK_only>), by hook or by crook.  You want force if you are
+only a string (C<SvPOK_only>), by hook or by crook.  You need force if you are
 going to update the C<SvPVX> directly.  Processes get magic.
 
+Note that coercing an arbitrary scalar into a plain PV will potentially
+strip useful data from it. For example if the SV was C<SvROK>, then the
+referent will have its reference count decremented, and the SV itself may
+be converted to an C<SvPOK> scalar with a string buffer containing a value
+such as C<"ARRAY(0x1234)">.
+
 =for apidoc Am|char*|SvPV_force_nomg|SV* sv|STRLEN len
-Like C<SvPV> but will force the SV into containing a string (C<SvPOK>), and
-only a string (C<SvPOK_only>), by hook or by crook.  You want force if you are
-going to update the C<SvPVX> directly.  Doesn't process get magic.
+Like C<SvPV_force>, but doesn't process get magic.
 
 =for apidoc Am|char*|SvPV|SV* sv|STRLEN len
 Returns a pointer to the string in the SV, or a stringified form of
@@ -1485,6 +1489,13 @@ the SV if the SV does not contain a string.  The SV may cache the
 stringified version becoming C<SvPOK>.  Handles 'get' magic.  See also
 C<SvPVx> for a version which guarantees to evaluate sv only once.
 
+Note that there is no guarantee that the return value of C<SvPV()> is
+equal to C<SvPVX(sv)> (or even that C<SvPVX(sv)> contains valid data), due
+to the way that things like overloading and Copy-On-Write are handled.  In
+these cases, the return value may point to a temporary buffer or similar.
+If you absolutely need the SvPVX field to be valid (for example, if you
+intend to write to it), then see L</SvPV_force>.
+
 =for apidoc Am|char*|SvPVx|SV* sv|STRLEN len
 A version of C<SvPV> which guarantees to evaluate C<sv> only once.
 Only use this if C<sv> is an expression with side effects, otherwise use the
@@ -1494,9 +1505,7 @@ more efficient C<SvPV>.
 Like C<SvPV> but doesn't process magic.
 
 =for apidoc Am|char*|SvPV_nolen|SV* sv
-Returns a pointer to the string in the SV, or a stringified form of
-the SV if the SV does not contain a string.  The SV may cache the
-stringified form becoming C<SvPOK>.  Handles 'get' magic.
+Like C<SvPV> but doesn't set a length variable.
 
 =for apidoc Am|char*|SvPV_nomg_nolen|SV* sv
 Like C<SvPV_nolen> but doesn't process magic.


-- 

Counsellor Troi states something other than the blindingly obvious.
  -- Things That Never Happen in "Star Trek" #16

@p5pRT
Copy link
Author

p5pRT commented May 9, 2013

@iabyn - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed May 9, 2013
@p5pRT
Copy link
Author

p5pRT commented May 9, 2013

From @nwc10

On Thu, May 09, 2013 at 03​:52​:15PM +0100, Dave Mitchell wrote​:

On Mon, May 06, 2013 at 03​:59​:20PM +0100, Dave Mitchell wrote​:

Perhaps we just need to clarify SvPV and rely on the fact that all the
others say "SvPVfoo is just like SvPV, except for foo".

In the absence of any feedback, I've applied the following commit​:

Sorry, I'm one of about 600 people guilty of this.

@​@​ -852,8 +852,8 @​@​ Set the actual length of the string which is in the SV. See C<SvIV_set>.
=for apidoc Am|U32|SvUTF8|SV* sv
Returns a U32 value indicating the UTF-8 status of an SV. If things are set-up
properly, this indicates whether or not the SV contains UTF-8 encoded data.
-Call this after SvPV() in case any call to string overloading updates the
-internal flag.
+You should use this I<after> a call to SvPV() or one of its variants, in
+case any call to string overloading updates the internal flag.

Thanks for stressing this further.

+Note that there is no guarantee that the return value of C<SvPV()> is
+equal to C<SvPVX(sv)> (or even that C<SvPVX(sv)> contains valid data), due
+to the way that things like overloading and Copy-On-Write are handled. In
+these cases, the return value may point to a temporary buffer or similar.
+If you absolutely need the SvPVX field to be valid (for example, if you
+intend to write to it), then see L</SvPV_force>.

This also looks useful. Is there a good way to add (without clutter) the
possibility that the implication of this is that repeated calls to SvPV() on
the same scalar might each produce a pointer to a different buffer?

That has caught out some code before, including in the core.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented May 9, 2013

From @iabyn

On Thu, May 09, 2013 at 03​:56​:49PM +0100, Nicholas Clark wrote​:

This also looks useful. Is there a good way to add (without clutter) the
possibility that the implication of this is that repeated calls to SvPV() on
the same scalar might each produce a pointer to a different buffer?

That has caught out some code before, including in the core.

Is this good enough for yuz?

commit fe02ddb
Author​: David Mitchell <davem@​iabyn.com>
AuthorDate​: Thu May 9 16​:57​:56 2013 +0100
Commit​: David Mitchell <davem@​iabyn.com>
CommitDate​: Thu May 9 17​:05​:15 2013 +0100

  further tweak SvPV() docs
 
  Make it clear that a different pointer may be returned each time.

Affected files ...
 
  M sv.h

Differences ...

Inline Patch
diff --git a/sv.h b/sv.h
index 4bfbf9c..7baef34 100644
--- a/sv.h
+++ b/sv.h
@@ -1490,11 +1490,13 @@ stringified version becoming C<SvPOK>.  Handles 'get' magic.  See also
 C<SvPVx> for a version which guarantees to evaluate sv only once.
 
 Note that there is no guarantee that the return value of C<SvPV()> is
-equal to C<SvPVX(sv)> (or even that C<SvPVX(sv)> contains valid data), due
-to the way that things like overloading and Copy-On-Write are handled.  In
-these cases, the return value may point to a temporary buffer or similar.
-If you absolutely need the SvPVX field to be valid (for example, if you
-intend to write to it), then see L</SvPV_force>.
+equal to C<SvPVX(sv)>, or that C<SvPVX(sv)> contains valid data, or that
+successive calls to C<SvPV(sv)) will return the same pointer value each
+time. This is due to the way that things like overloading and
+Copy-On-Write are handled.  In these cases, the return value may point to
+a temporary buffer or similar.  If you absolutely need the SvPVX field to
+be valid (for example, if you intend to write to it), then see
+L</SvPV_force>.


-- 

"I do not resent criticism, even when, for the sake of emphasis,
it parts for the time with reality".
  -- Winston Churchill, House of Commons, 22nd Jan 1941.

@p5pRT
Copy link
Author

p5pRT commented May 9, 2013

From @bulk88

On Thu May 09 09​:07​:02 2013, davem wrote​:

Note that there is no guarantee that the return value of C<SvPV()> is
-equal to C<SvPVX(sv)> (or even that C<SvPVX(sv)> contains valid
data), due
-to the way that things like overloading and Copy-On-Write are
handled. In
-these cases, the return value may point to a temporary buffer or
similar.
-If you absolutely need the SvPVX field to be valid (for example, if
you
-intend to write to it), then see L</SvPV_force>.
+equal to C<SvPVX(sv)>, or that C<SvPVX(sv)> contains valid data, or
that
+successive calls to C<SvPV(sv)) will return the same pointer value
each
+time. This is due to the way that things like overloading and
+Copy-On-Write are handled. In these cases, the return value may
point to
+a temporary buffer or similar. If you absolutely need the SvPVX
field to
+be valid (for example, if you intend to write to it), then see
+L</SvPV_force>.

"(for example, if you intend to write to it), then see L</SvPV_force>."

SvPV_force runs get magic, so if you are "writing to SvPVX", you just at
minimum did "read" and then maybe (if you are doing XS right) "write"
for what should have been just a "write", which is wrong in magic land.
Also SvPV_force might have done a "sv_2pv_flags" internally to generate
a valid PV buffer (IV to PV conversion) which will be immediately
overwritten (and wasted cpu to generate it) by passing SvPVX to a
non-perl C function. The previous sentence might need to be another
ticket. perlguts in blead says nothing about SvPV is not SvPVX and says
nothing about COW.

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT
Copy link
Author

p5pRT commented May 10, 2013

From @iabyn

On Thu, May 09, 2013 at 11​:54​:47AM -0700, bulk88 via RT wrote​:

On Thu May 09 09​:07​:02 2013, davem wrote​:

Note that there is no guarantee that the return value of C<SvPV()> is
-equal to C<SvPVX(sv)> (or even that C<SvPVX(sv)> contains valid
data), due
-to the way that things like overloading and Copy-On-Write are
handled. In
-these cases, the return value may point to a temporary buffer or
similar.
-If you absolutely need the SvPVX field to be valid (for example, if
you
-intend to write to it), then see L</SvPV_force>.
+equal to C<SvPVX(sv)>, or that C<SvPVX(sv)> contains valid data, or
that
+successive calls to C<SvPV(sv)) will return the same pointer value
each
+time. This is due to the way that things like overloading and
+Copy-On-Write are handled. In these cases, the return value may
point to
+a temporary buffer or similar. If you absolutely need the SvPVX
field to
+be valid (for example, if you intend to write to it), then see
+L</SvPV_force>.

"(for example, if you intend to write to it), then see L</SvPV_force>."

SvPV_force runs get magic, so if you are "writing to SvPVX", you just at
minimum did "read" and then maybe (if you are doing XS right) "write"
for what should have been just a "write", which is wrong in magic land.
Also SvPV_force might have done a "sv_2pv_flags" internally to generate
a valid PV buffer (IV to PV conversion) which will be immediately
overwritten (and wasted cpu to generate it) by passing SvPVX to a
non-perl C function.

I have no idea what point you're trying to make here.
I've added some sentences to the docs for SvPV that say under some
circumstances you might need SvPV_force instead. All the stuff about
calling magic etc seems completely orthogonal to that.

The previous sentence might need to be another
ticket. perlguts in blead says nothing about SvPV is not SvPVX and says
nothing about COW.

Currently near the top of my things to do prior to 5.18 is to add some
notes about that issue to perlguts.

--
Indomitable in retreat, invincible in advance, insufferable in victory
  -- Churchill on Montgomery

@p5pRT
Copy link
Author

p5pRT commented May 17, 2013

From @iabyn

On Fri, May 10, 2013 at 12​:13​:38PM +0100, Dave Mitchell wrote​:

The previous sentence might need to be another
ticket. perlguts in blead says nothing about SvPV is not SvPVX and says
nothing about COW.

Currently near the top of my things to do prior to 5.18 is to add some
notes about that issue to perlguts.

It won't be in time for perl 5.18.0 now, but in terms of what we need to add
to perlguts, my feeling is that the general outline should be something
like the following. I'm happy to be corrected however; this isn't an
area I work with much. In particular, I'm not sure whether we need to
mention SV_CHECK_THINKFIRST/SvTHINKFIRST/sv_force_normal_flags(), SvIsCOW
etc?

  Given an arbitrary SV,

  1) if you want to *read* its string value, use SvPV (or one of it's many
  variants such as SvPV_nomg, SvPV_nolen etc, depending on circumstances).
  This will give you a string representation of the SV in a char buffer
  (which may or may not match the SvPVX of the string).

  Note that this preserves the original content of the SV. For example if
  the SV is a reference, then after the call to SvPV(), it's still a
  reference, it's not SvPOK, and the return value points to a temporary
  buffer containing a string like "ARRAY(0x12345678)".

  As a shortcut, if you've already done any get magic processing that was
  required, and if the SV is SvPOK, then its okay to use SvCUR and to
  directly read from the SvPVX buffer.

  2) If you want to *modify* the string value of an SV by writing into its
  string buffer (for example if you wanted to implement an in-place version
  of the ucfirst() function), then you need to first call SvPV_force (or one
  of its variants like SvPV_force_nomg, SvPVutf8_force etc). This will
  coerce the SV into a PV-compatible type regardless of its previous
  contents (or croak with "Can't coerce HASH to string" or similar, if it's
  something not convertible).

  Note that this may destroy the original content of the SV. For example if
  the SV was a reference, then after the call to SvPV(), it's no longer
  a reference, and is instead a plain SvPOK string with a value like
  "ARRAY(0x12345678)"

  After modifying the buffer, you will need to do SvSETMAGIC for the
  changes to potentially take effect (for example if the variable is
  tied).

  As a shortcut, if you've already done any get magic processing that was
  required, and if the SV is SvPOK, then its okay to use SvCUR and to
  directly read from and write to the SvPVX buffer, and to use SvGROW()
  if you want to write beyond the end of the string. You'll still need
  to call SvSETMAGIC afterwards.

  3) if an XS sub is declared as having a 'char *' or similar argument,
  then the standard typemap will do the equivalent of calling SvPV on
  the passed SV to retrieve a string buffer. Note that this means that
  the buffer should be treated as read-only, and not passed to any C
  function which might modify it. If your XS sub needs to modify the
  string, then declare the arg as type 'SV *' and manually perform your
  own SvPV_force and (later SvSETMAGIC) on it.

--
Overhead, without any fuss, the stars were going out.
  -- Arthur C Clarke

@p5pRT
Copy link
Author

p5pRT commented May 22, 2013

From @iabyn

On Fri, May 17, 2013 at 03​:14​:16PM +0100, Dave Mitchell wrote​:

It won't be in time for perl 5.18.0 now, but in terms of what we need to add
to perlguts, my feeling is that the general outline should be something
like the following.

No-ones provided any feedback yet, but I think I've already spotted an
error in my own text​:

I'm happy to be corrected however; this isn't an
area I work with much. In particular, I'm not sure whether we need to
mention SV_CHECK_THINKFIRST/SvTHINKFIRST/sv_force_normal_flags(), SvIsCOW
etc?

Given an arbitrary SV\,

1\) if you want to \*read\* its string value

[snip]

As a shortcut\, if you've already done any get magic processing that was
required\, and if the SV is SvPOK\, then its okay to use SvCUR and to
directly read from the SvPVX buffer\.

I still think that's true

2\) If you want to \*modify\* the string value of an SV by writing into its
string buffer

[snip]

As a shortcut\, if you've already done any get magic processing that was
required\, and if the SV is SvPOK\, then its okay to use SvCUR and to
directly read from and write to the SvPVX buffer\, and to use SvGROW\(\)
if you want to write beyond the end of the string\. You'll still need
to call SvSETMAGIC afterwards\.

I don't think that's true. COW vars have SvPOK set, but you still
shouldn't directly write to them. This has always been true, since it used
to apply to shared-hash keys as well​: although we've changed in it in 5.18
so that the RO flag is no longer set​:

  $ perl5163 -MDevel​::Peek -e '%h=qw(a 1); Dump $_ for keys %h'
  SV = PV(0xd0c140) at 0xd0dfe0
  REFCNT = 2
  FLAGS = (POK,FAKE,READONLY,pPOK)
  PV = 0xd244d0 "a"
  CUR = 1
  LEN = 0
  $ perl5180 -MDevel​::Peek -e '%h=qw(a 1); Dump $_ for keys %h'
  SV = PV(0x19a9e30) at 0x19a91c0
  REFCNT = 2
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x19c0250 "a"
  CUR = 1
  LEN = 0

So, I *suspect* that the correct answer is that if POK is set, you still
need to test SvTHINKFIRST() (rather than SvREADONLY()) ,then call
sv_force_normal() or similar (or use SV_CHECK_THINKFIRST or
SV_CHECK_THINKFIRST_COW_DROP), although frankly this is where I start to
get confused and become unclear exactly what we should be recommending to
XS writers.

--
If life gives you lemons, you'll probably develop a citric acid allergy.

@p5pRT
Copy link
Author

p5pRT commented Jun 9, 2013

From @cpansprout

On Wed May 22 11​:36​:42 2013, davem wrote​:

On Fri, May 17, 2013 at 03​:14​:16PM +0100, Dave Mitchell wrote​:

As a shortcut\, if you've already done any get magic processing

that was

required\, and if the SV is SvPOK\, then its okay to use SvCUR and

to

directly read from and write to the SvPVX buffer\, and to use

SvGROW()

if you want to write beyond the end of the string\. You'll still

need

to call SvSETMAGIC afterwards\.

I don't think that's true. COW vars have SvPOK set, but you still
shouldn't directly write to them. This has always been true, since it
used
to apply to shared-hash keys as well​: although we've changed in it in
5.18
so that the RO flag is no longer set​:

$ perl5163 \-MDevel&#8203;::Peek \-e '%h=qw\(a 1\); Dump $\_ for keys %h'
SV = PV\(0xd0c140\) at 0xd0dfe0
  REFCNT = 2
  FLAGS = \(POK\,FAKE\,READONLY\,pPOK\)
  PV = 0xd244d0 "a"
  CUR = 1
  LEN = 0
$ perl5180 \-MDevel&#8203;::Peek \-e '%h=qw\(a 1\); Dump $\_ for keys %h'
SV = PV\(0x19a9e30\) at 0x19a91c0
  REFCNT = 2
  FLAGS = \(POK\,IsCOW\,pPOK\)
  PV = 0x19c0250 "a"
  CUR = 1
  LEN = 0

So, I *suspect* that the correct answer is that if POK is set, you
still
need to test SvTHINKFIRST() (rather than SvREADONLY()) ,then call
sv_force_normal() or similar (or use SV_CHECK_THINKFIRST or
SV_CHECK_THINKFIRST_COW_DROP), although frankly this is where I start
to
get confused and become unclear exactly what we should be recommending
to
XS writers.

Currently sv_force_normal is already part of the API, but anything
involving THINKFIRST is not.

Since SvPV_force will itself call sv_force_normal, we can simply
recommend SvPV_force if the existing string needs to be modified.

If the XS author wants to use the existing buffer (for speed) but drop
if it would have to be copied, then sv_force_normal_flags(sv,
SV_COW_DROP_PV) followed by sv_grow is probably the right thing to
recommend.

Now, do we want to add THINKFIRST to the API? I really don’t know. But
I’ve demonstrated that we can provide the ‘right way’ to do things
without changing the current API at all.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Jun 11, 2013

From @Leont

On Fri, May 17, 2013 at 4​:14 PM, Dave Mitchell <davem@​iabyn.com> wrote​:

2\) If you want to \*modify\* the string value of an SV by writing into its
string buffer \(for example if you wanted to implement an in\-place version
of the ucfirst\(\) function\)\, then you need to first call SvPV\_force \(or one
of its variants like SvPV\_force\_nomg\, SvPVutf8\_force etc\)\. This will
coerce the SV into a PV\-compatible type regardless of its previous
contents \(or croak with "Can't coerce HASH to string" or similar\, if it's
something not convertible\)\.

Note that this may destroy the original content of the SV\. For example if
the SV was a reference\, then after the call to SvPV\(\)\, it's no longer
a reference\, and is instead a plain SvPOK string with a value like
"ARRAY\(0x12345678\)"

After modifying the buffer\, you will need to do SvSETMAGIC for the
changes to potentially take effect \(for example if the variable is
tied\)\.

As a shortcut\, if you've already done any get magic processing that was
required\, and if the SV is SvPOK\, then its okay to use SvCUR and to
directly read from and write to the SvPVX buffer\, and to use SvGROW\(\)
if you want to write beyond the end of the string\. You'll still need
to call SvSETMAGIC afterwards\.

My use-case is related but now quite the same.

I want to get rid of the current buffer to replace it with my own (in
a set-magic callback that checks if perl threw away the buffer I
attached to the SV, and if so reinstate it).

Currently, I'm doing something like​:

SV_CHECK_THINKFIRST_COW_DROP(sv),
if(SvROK(sv))
  sv_unref_flags(sv, SV_IMMEDIATE_UNREF);
if(SvPOK(sv))
  SvPV_free(sv)

I do think that's a valid, if unusual, use-case.

Leon

@p5pRT
Copy link
Author

p5pRT commented Jun 12, 2013

From @bulk88

On Sun Jun 09 11​:21​:54 2013, sprout wrote​:

Currently sv_force_normal is already part of the API, but anything
involving THINKFIRST is not.

Since SvPV_force will itself call sv_force_normal, we can simply
recommend SvPV_force if the existing string needs to be modified.

If the XS author wants to use the existing buffer (for speed) but drop
if it would have to be copied, then sv_force_normal_flags(sv,
SV_COW_DROP_PV) followed by sv_grow is probably the right thing to
recommend.

Now, do we want to add THINKFIRST to the API? I really don’t know. But
I’ve demonstrated that we can provide the ‘right way’ to do things
without changing the current API at all.

XS requires a read-only buffer, read/write buffer, and write only
buffer. Each has a different set of tasks for the most efficient
execution of each. With RO, passing COW buffer is fine (can IV/NV alone
be COWed or PVs only?), and generate PV representation if not a PV. For
RW, PV it first, then always de-COW, grow if necessary (yes
uninitialized memory will be passed to the C func). For write-only,
de-COW, grow buffer. Do not convert representation. Offer macros to not
do a func call most of the time.

--
bulk88 ~ bulk88 at hotmail.com

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant