Perl5 doesn't distinguish "originally was a string" and "originally was a number" and doesn't let one control caching of stringification #12801

Open
p5pRT opened this issue Feb 20, 2013 · 10 comments

p5pRT commented Feb 20, 2013

Migrated from rt.perl.org#116871 (status was 'open')

Searchable as RT116871$

p5pRT commented Feb 20, 2013

From @demerphq

I am not including Perl version data in this ticket, as it applies to
all Perls that currently exist as of 5.17.9.

Perl doesn't preserve the original type of a scalar var, so one has
difficulty reliably telling these apart:

$x= "1";
$y= 1;

This causes problems in serialization for variables that have been
used in both numeric and string context. If handled improperly, this
can lead to data loss.

We should remember the original type. I believe Chip in the past did
some work on this subject.
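
To make the ambiguity concrete, here is a minimal sketch that reads a scalar's flags from Perl space with the core B module. The helper name `flag_names` is hypothetical and not part of any API, and the full flag set can vary between perl versions. The sketch only aims to show that a string used once as a number ends up with both a string slot and an integer slot marked valid.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: name the public "OK" flags set on a scalar.
sub flag_names {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    my @names;
    push @names, 'IOK' if $flags & B::SVf_IOK();
    push @names, 'NOK' if $flags & B::SVf_NOK();
    push @names, 'POK' if $flags & B::SVf_POK();
    return join ',', @names;
}

my $x = "1";
my $y = 1;
my $before_x = flag_names(\$x);    # string slot only
my $before_y = flag_names(\$y);    # integer slot only

my $sum = $x + 1;                  # numeric use of the string
my $after_x = flag_names(\$x);     # string and integer slots

print "x before numeric use: $before_x\n";
print "y as assigned:        $before_y\n";
print "x after numeric use:  $after_x\n";
```

Once `$x` carries both flags, nothing in the scalar records which form it was assigned in.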

Relatedly, we do not give the user any ability to control the
internal caching of the stringified form. This can lead to surprising
situations like memory exhaustion from printing out a data structure. I
have been forced in the past to rely on code like this:

# don't let Perl chew up all our RAM caching stuff we will only use once...
my @copy= @{$hash->{$thing}};
print join(",", @copy),"\n";

I think there should be an easier way to control this behavior, and
that perl should be more careful about tracking it.
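
A rough sketch of what the copy protects against, assuming the core B module and a hypothetical helper `cached_strings` that counts elements carrying a cached string form (the private `SVp_POK` flag). It joins one array directly and another through a throwaway copy, then compares the originals.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: count elements that carry a cached string form.
sub cached_strings {
    my ($aref) = @_;
    return scalar grep { B::svref_2object(\$_)->FLAGS & B::SVp_POK() } @$aref;
}

my @direct  = (1 .. 1000);
my @guarded = (1 .. 1000);

my $out1 = join(",", @direct);     # stringifies the elements themselves

my @copy = @guarded;               # throwaway copy, as in the workaround
my $out2 = join(",", @copy);       # only the copies get a string buffer

my $direct_cached  = cached_strings(\@direct);
my $guarded_cached = cached_strings(\@guarded);

print "cached strings after direct join:     $direct_cached\n";
print "cached strings after join via a copy: $guarded_cached\n";
```

The copy costs time and temporary memory, but the string buffers are freed with `@copy` instead of staying attached to the long-lived data.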

Cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Feb 20, 2013

From perl@profvince.com

> Perl doesn't preserve the original type of a scalar var, so one has
> difficulty reliably telling these apart:
>
> $x= "1";
> $y= 1;
>
> This causes problems in serialization for variables that have been
> used in both numeric and string context. If handled improperly, this
> can lead to data loss.
>
> We should remember the original type. I believe Chip in the past did
> some work on this subject.

My negative opinion on this hasn't changed since
http://www.nntp.perl.org/group/perl.perl5.porters/2012/08/msg190382.html.

Vincent

p5pRT commented Feb 20, 2013

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Feb 20, 2013

From @demerphq

On 20 February 2013 03:29, Vincent Pit <perl@profvince.com> wrote:

>> Perl doesn't preserve the original type of a scalar var, so one has
>> difficulty reliably telling these apart:
>>
>> $x= "1";
>> $y= 1;
>>
>> This causes problems in serialization for variables that have been
>> used in both numeric and string context. If handled improperly, this
>> can lead to data loss.
>>
>> We should remember the original type. I believe Chip in the past did
>> some work on this subject.
>
> My negative opinion on this hasn't changed since
> http://www.nntp.perl.org/group/perl.perl5.porters/2012/08/msg190382.html.

With all due respect, your objections rely on something that is
completely impractical, and IMO part of it is categorically wrong. For
instance, this statement:

"there's no safe way to do this besides stating explicitly in which kind
the value shall be serialized"

If perl remembered that an SvPVIV was originally a PV then this would
Just Work, and there would be no need for external tagging. Likewise,
if Perl remembered that an SvPVIV was originally an IV then this would
also Just Work.

IMO this is doable if we want to do it.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Feb 20, 2013

From @chipdude

On 2/19/2013 6:38 PM, demerphq wrote:

> If perl remembered that an SvPVIV was originally a PV then this would
> Just Work, and there would be no need for external tagging. Likewise,
> if Perl remembered that an SvPVIV was originally an IV then this would
> also Just Work.

Indeed. But I'm not interested in doing it right now. The effort
required is about equal to creating a pure C++ parser as part of a Perl
reimplementation. Obviously the latter is more worthwhile.

p5pRT commented Feb 20, 2013

From perl@profvince.com

> With all due respect, your objections rely on something that is
> completely impractical, and IMO part of it is categorically wrong. For
> instance, this statement:
>
> "there's no safe way to do this besides stating explicitly in which kind
> the value shall be serialized"
>
> If perl remembered that an SvPVIV was originally a PV then this would
> Just Work, and there would be no need for external tagging. Likewise,
> if Perl remembered that an SvPVIV was originally an IV then this would
> also Just Work.

I'm not sure how you can consider it practical to tell people that 1 and
'1' may be used the same way with all mathematical operators (except for
the rarely used and badly designed bitwise operators), but that they are
in fact different, and that it can bite them hard if they pass the wrong
one to lib X that makes use of that difference. Good luck debugging this
through several layers of code. I don't think that's helping anyone, but
hey, what do I know? After all I'm wrong and stuff :)

> IMO this is doable if we want to do it.

We can do a lot of things. The question is whether we should do them.

Vincent

p5pRT commented Feb 20, 2013

From @demerphq

On 20 February 2013 04:05, Vincent Pit <perl@profvince.com> wrote:

>> With all due respect, your objections rely on something that is
>> completely impractical, and IMO part of it is categorically wrong. For
>> instance, this statement:
>>
>> "there's no safe way to do this besides stating explicitly in which kind
>> the value shall be serialized"
>>
>> If perl remembered that an SvPVIV was originally a PV then this would
>> Just Work, and there would be no need for external tagging. Likewise,
>> if Perl remembered that an SvPVIV was originally an IV then this would
>> also Just Work.
>
> I'm not sure how you can consider it practical to tell people that 1 and '1'
> may be used the same way with all mathematical operators (except for the
> rarely used and badly designed bitwise operators), but that they are in fact
> different, and that it can bite them hard if they pass the wrong one to lib X
> that makes use of that difference.

Can you give an example where that would happen?

You seem to be worried about a case I'm having trouble understanding.

I am worried about cases like this:

$x= "0 but true";
print 0+$x;

I want to be able to know that this needs to be serialized as C<"0 but
true"> and not C<0>.

Similarly, given

$x= 1;
print $x;

I want to be able to know that I can safely serialize $x as an SvIV and
not as an SvPVIV, which would require me to store both the string and
integer representations (which is what Storable, for instance, does).
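
As an illustration of why a serializer cannot settle this from flags alone, here is a sketch of a flag-based guess built on the core B module. The function `guess_kind` and its rule are assumptions made for this example, not the logic of Storable or any other module. It calls a scalar a number only if it has a numeric slot and no string slot at all.

```perl
use strict;
use warnings;
use B ();

# Hypothetical heuristic: "number" only when a numeric slot is present
# and no string slot (public or private) has ever been filled in.
sub guess_kind {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    return 'number'
        if ($flags & (B::SVp_IOK() | B::SVp_NOK()))
        && !($flags & B::SVp_POK());
    return 'string';
}

my $zbt = "0 but true";
my $n   = 0 + $zbt;                      # numeric use of the string
my $kind_zbt = guess_kind(\$zbt);        # still has its string slot

my $one = 1;
my $kind_one_fresh = guess_kind(\$one);  # integer slot only
my $text = "$one";                       # stringified once, e.g. for output
my $kind_one_used = guess_kind(\$one);   # string slot now cached as well

print "0 but true after 0+:   $kind_zbt\n";
print "1 as assigned:         $kind_one_fresh\n";
print "1 after stringifying:  $kind_one_used\n";
```

Under this rule the "0 but true" case comes out as a string, but an integer that has merely been stringified is also reported as a string. A record of the original type would remove that guess.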

So, what problem are you worried about in practice? I am really
struggling to imagine a scenario like the one you described...

> Good luck debugging this through several
> layers of code. I don't think that's helping anyone,

Like I said, can you turn this from a hypothetical to a more
substantive example of where knowing the source type would cause a
problem?

> but hey, what do I
> know? After all I'm wrong and stuff :)

Well, you claimed it can't be safely done, I don't see how that can be true.

And I said "with all due respect" for a reason​: I wasn't trying to be
disrespectful in disagreeing with you.

>> IMO this is doable if we want to do it.
>
> We can do a lot of things. The question is whether we should do them.

Yeah of course. No argument there.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Feb 21, 2013

From @ap

* demerphq <demerphq@gmail.com> [2013-02-20 04:20]:

> I am worried about cases like this:
>
> $x= "0 but true";
> print 0+$x;
>
> I want to be able to know that this needs to be serialized as C<"0 but
> true"> and not C<0>.

Right, the issue isn’t treating perfect equivalents like 1 and '1' as
different things; it is that imperfect conversions that lose information
can attach non-canonical values to SVs as a side effect of other
operations, and thereafter there is no way to know which value was
canonical.

(I have argued before that if anything like this is done, then any core
API that allows distinguishing 1 from '1' in Perl space should have
names based on perlguts terminology, not attractive nuisance names like
`is_number` or anything similar, and the POD should warn against using
them that way.)

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>

p5pRT commented Feb 22, 2013

From @jandubois

On Thu, Feb 21, 2013 at 12:31 PM, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:

>> * demerphq <demerphq@gmail.com> [2013-02-20 04:20]:
>>
>> I am worried about cases like this:
>>
>> $x= "0 but true";
>> print 0+$x;
>>
>> I want to be able to know that this needs to be serialized as C<"0 but
>> true"> and not C<0>.
>
> Right, the issue isn’t treating perfect equivalents like 1 and '1' as
> different things; it is that imperfect conversions that lose information
> can attach non-canonical values to SVs as a side effect of other
> operations, and thereafter there is no way to know which value was
> canonical.

This is not generally true; "0 but true" is just a special case that
does get SVf_IOK set when used as a number. In general, only the
private SVp_IOK or SVp_NOK flags will be set if the conversion is
inexact, to indicate that the PV is the canonical representation:

$ perl -MDevel::Peek -e '$a="0"; $b=$a+1; Dump $a'
SV = PVIV(0x100832bf0) at 0x100831cd8
  REFCNT = 1
  FLAGS = (IOK,POK,pIOK,pPOK)
  IV = 0
  PV = 0x100205db0 "0"\0
  CUR = 1
  LEN = 16

$ perl -MDevel::Peek -e '$a="0 but"; $b=$a+1; Dump $a'
SV = PVNV(0x100806330) at 0x100831cd8
  REFCNT = 1
  FLAGS = (POK,pIOK,pNOK,pPOK)
  IV = 0
  NV = 0
  PV = 0x100205db0 "0 but"\0
  CUR = 5
  LEN = 16

$ perl -MDevel::Peek -e '$a="0 but true"; $b=$a+1; Dump $a'
SV = PVIV(0x100832bf0) at 0x100831cd8
  REFCNT = 1
  FLAGS = (IOK,POK,pIOK,pPOK)
  IV = 0
  PV = 0x100205dc0 "0 but true"\0
  CUR = 10
  LEN = 16
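
The public/private distinction in the three dumps above can also be checked from Perl space. The following is a small sketch using the core B module; the helper `iv_status` is a hypothetical name introduced here, and the result relies on the flag behaviour shown in the dumps.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: is the integer slot public, private-only, or absent?
sub iv_status {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    return 'public'  if $flags & B::SVf_IOK();
    return 'private' if $flags & B::SVp_IOK();
    return 'none';
}

my %status;
for my $text ("0", "0 but", "0 but true") {
    my $sv = $text;                  # fresh string scalar
    {
        no warnings 'numeric';       # "0 but" is not fully numeric
        my $sum = $sv + 1;           # same numeric use as in the dumps
    }
    $status{$text} = iv_status(\$sv);
}

printf "%-14s %s\n", qq{"$_"}, $status{$_} for sort keys %status;
```

A serializer could treat a private-only integer slot as a hint that the string is canonical, but as noted, "0 but true" is special-cased and does not follow that rule.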

There are other special cases for "Inf" and "NaN" too, but it looks
like they will only get SVp_IOK set when they start out as strings,
not as numbers:

$ perl -MDevel::Peek -e '$a="Inf"; $b=$a+1; Dump $a'
SV = PVNV(0x100806330) at 0x100831cd8
  REFCNT = 1
  FLAGS = (NOK,POK,pIOK,pNOK,pPOK,IsUV)
  UV = 18446744073709551615
  NV = inf
  PV = 0x100205db0 "Inf"\0
  CUR = 3
  LEN = 16

$ perl -MDevel::Peek -e '$a=2**9999; $b="$a"; Dump $a'
SV = PVNV(0x100806330) at 0x100831cd8
  REFCNT = 1
  FLAGS = (NOK,POK,pNOK,pPOK)
  IV = 0
  NV = inf
  PV = 0x100214f00 "inf"\0
  CUR = 3
  LEN = 48

I think though that the SVp_IOK difference may be an accidental
implementation detail, so it would need tests to make sure we don't
accidentally change it in the future.

Another problem with NaN and Inf is that their string representation
is currently platform dependent (inherited from C RTL).

Cheers,
-Jan

p5pRT commented Feb 22, 2013

From @jandubois

On Thu, Feb 21, 2013 at 4:32 PM, Jan Dubois <jand@activestate.com> wrote:

> There are other special cases for "Inf" and "NaN" too, but it looks
> like they will only get SVp_IOK set when they start out as strings,
> not as numbers:
>
> $ perl -MDevel::Peek -e '$a="Inf"; $b=$a+1; Dump $a'
> SV = PVNV(0x100806330) at 0x100831cd8
>   REFCNT = 1
>   FLAGS = (NOK,POK,pIOK,pNOK,pPOK,IsUV)
>   UV = 18446744073709551615
>   NV = inf
>   PV = 0x100205db0 "Inf"\0
>   CUR = 3
>   LEN = 16
>
> $ perl -MDevel::Peek -e '$a=2**9999; $b="$a"; Dump $a'
> SV = PVNV(0x100806330) at 0x100831cd8
>   REFCNT = 1
>   FLAGS = (NOK,POK,pNOK,pPOK)
>   IV = 0
>   NV = inf
>   PV = 0x100214f00 "inf"\0
>   CUR = 3
>   LEN = 48
>
> I think though that the SVp_IOK difference may be an accidental
> implementation detail, so it would need tests to make sure we don't
> accidentally change it in the future.

Never mind, it doesn't actually work:

$ perl -MDevel::Peek -e '$a=2**9999; $b=$a+1; $b="$a"; Dump $a'
SV = PVNV(0x100806330) at 0x100831cd8
  REFCNT = 1
  FLAGS = (NOK,POK,pIOK,pNOK,pPOK,IsUV)
  UV = 18446744073709551615
  NV = inf
  PV = 0x100205bc0 "inf"\0
  CUR = 3
  LEN = 48

So there is no way to tell if they started as an NV or a PV.

Cheers,
-Jan
