SvNV() does not store computed value to NV slot #15875

Closed
p5pRT opened this issue Feb 17, 2017 · 30 comments

p5pRT commented Feb 17, 2017

Migrated from rt.perl.org#130801 (status was 'resolved')

Searchable as RT130801$

p5pRT commented Feb 17, 2017

From @pali

When sv is a non-magical PV, the functions SvIV(), SvUV() and SvNV()
upgrade sv to a type that also has a slot for the IV/UV/NV value, and
store the converted value in that slot. When sv is a magical PV, SvIV()
and SvUV() do this too.

But SvNV() does not upgrade sv if it is magical. The converted value is
just returned, not stored in sv.

This happens in Perl_sv_2nv_flags() (called by SvNV() on an sv of PV
type). If sv is SvGMAGICAL() and has the SvPOKp() flag, the Atof()
value is simply returned and not stored back into sv. When sv is not
magical, sv_upgrade() and SvNV_set(sv, Atof(SvPVX_const(sv))) are called.

On the other hand, Perl_sv_2iv_flags() (called by SvIV()) calls
S_sv_2iuv_common() to convert and upgrade a PV to an IV whether or not sv
is magical. The same applies to Perl_sv_2uv_flags().

So is there a reason why SvNV() behaves differently for magical scalars?
Or is it a bug, and SvNV() should do the same as SvIV()/SvUV(), or as
SvNV() does on non-magical scalars?
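The asymmetry described above can be modelled in a few lines of C. This is a hypothetical sketch, not perl's code. `model_sv`, `model_sv_2nv()` and their fields are invented for illustration. `strtod()` stands in for perl's `Atof()`, and get-magic is reduced to a single boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for a Perl scalar. The real
   SV layout and flag set live in sv.h and are far more involved. */
typedef struct {
    const char *pv;  /* string slot */
    double nv;       /* NV slot, meaningful only when nok is set */
    bool nok;        /* NV slot holds a valid (cached) value */
    bool gmagical;   /* scalar has get-magic, e.g. it is tied */
} model_sv;

/* Sketch of the behaviour reported for Perl_sv_2nv_flags(): a
   non-magical PV gets its numeric value cached in the NV slot, while a
   magical one only gets the converted value returned. */
double model_sv_2nv(model_sv *sv)
{
    assert(sv != NULL && sv->pv != NULL);
    if (sv->nok && !sv->gmagical)
        return sv->nv;                    /* reuse the cached value */

    double value = strtod(sv->pv, NULL);  /* stands in for Atof() */
    if (!sv->gmagical) {
        sv->nv = value;                   /* like SvNV_set(sv, ...) */
        sv->nok = true;
    }
    return value;                         /* magical: nothing stored */
}
```

With these definitions a plain scalar ends up with `nok` set and the value cached, while a magical one re-parses its string on every call. That second behaviour is the SvNV() behaviour the report questions.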

p5pRT commented Feb 20, 2017

From @khwilliamson

I don't have an answer to this question, but I can tell you why a stringified NV isn't stored. It is because the radix character varies by locale. In Czech, I believe it is a comma; in English, a period. And if the locale changes the character changes, and a PV that was generated in a different locale will be invalid. There were various bug tickets filed because of this, so we stopped storing the stringified number. Another approach would be to attach magic to it, like we do now to PVs that have been collated. The magic knows what locale was used for the stringification, and if the locale changes (reasonably unlikely) invalidate the PV.


Karl Williamson
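Karl's point can be made concrete with a hypothetical C sketch that reduces the locale to a radix character passed in by the caller. `format_nv()`, `num_scalar` and both `stringify_*()` functions are invented names. The code also assumes the process runs in the default "C" locale, so that `snprintf()` itself always emits '.'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format like "%g", then substitute the radix
   character of a simulated locale (e.g. ',' for Czech or German).
   Assumes the process is in the "C" locale, so snprintf emits '.'. */
static void format_nv(double nv, char radix, char *buf, size_t len)
{
    snprintf(buf, len, "%g", nv);
    char *dot = strchr(buf, '.');
    if (dot != NULL)
        *dot = radix;
}

typedef struct {
    double nv;
    char pv[32];    /* cached stringification, if any */
    int pv_valid;   /* nonzero once pv has been filled in */
} num_scalar;

/* Caches the first stringification and reuses it forever, whatever
   radix is in effect on later calls. */
const char *stringify_cached(num_scalar *sv, char radix)
{
    if (!sv->pv_valid) {
        format_nv(sv->nv, radix, sv->pv, sizeof sv->pv);
        sv->pv_valid = 1;
    }
    return sv->pv;
}

/* Re-formats on every call, so the current radix always applies. */
const char *stringify_fresh(num_scalar *sv, char radix)
{
    format_nv(sv->nv, radix, sv->pv, sizeof sv->pv);
    return sv->pv;
}
```

With caching, a number first stringified under a '.' radix keeps printing "0.42" after a switch to a ',' radix. Re-formatting on each call yields "0,42" instead. That stale-cache problem is the reason Karl gives for no longer storing stringified NVs.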

p5pRT commented Feb 20, 2017

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Feb 20, 2017

From @pali

On Monday 20 February 2017 23:36:46 you wrote:
> I don't have an answer to this question, but I can tell you why a
> stringified NV isn't stored.

But here it is the reverse situation: numification of a string (converting
a PV string to an NV floating-point value).

And the current behaviour is that the NV value (converted from the PV
string) is stored in the NV slot unless the scalar is magical.

> It is because the radix character
> varies by locale. In Czech, I believe it is a comma; in English, a
> period.

Yes, IIRC it is. But I have never seen perl print a comma in any
floating-point number, and I have set different locales... It was
always a dot.

> And if the locale changes the character changes, and a PV
> that was generated in a different locale will be invalid. There
> were various bug tickets filed because of this, so we stopped
> storing the stringified number. Another approach would be to attach
> magic to it, like we do now to PVs that have been collated. The
> magic knows what locale was used for the stringification, and if the
> locale changes (reasonably unlikely) invalidate the PV.

Does the locale also apply to this reverse situation?

p5pRT commented Feb 21, 2017

From @andk

On Mon, 20 Feb 2017 23:56:57 +0100, pali@cpan.org said:

> > It is because the radix character
> > varies by locale. In Czech, I believe it is a comma; in English, a
> > period.

> Yes, IIRC it is. But I have never seen perl print a comma in any
> floating-point number, and I have set different locales... It was
> always a dot.

LC_ALL=de_DE.UTF-8 perl -le 'use locale; print 0.42'
0,42

> > And if the locale changes the character changes, and a PV
> > that was generated in a different locale will be invalid. There
> > were various bug tickets filed because of this, so we stopped
> > storing the stringified number. Another approach would be to attach
> > magic to it, like we do now to PVs that have been collated. The
> > magic knows what locale was used for the stringification, and if the
> > locale changes (reasonably unlikely) invalidate the PV.

> Does the locale also apply to this reverse situation?

Not sure I understand your question. Maybe you mean this:

LC_ALL=cs_CZ.UTF-8 perl -le 'use locale; print "0.42"+0'
0,42

--
andreas

p5pRT commented Feb 21, 2017

From @pali

On Monday 20 February 2017 19:12:09 (Andreas J. Koenig) via RT wrote:
> > Does the locale also apply to this reverse situation?
>
> Not sure I understand your question. Maybe you mean this:
>
> LC_ALL=cs_CZ.UTF-8 perl -le 'use locale; print "0.42"+0'
> 0,42

Ah, I forgot 'use locale;' when testing.

But back to my question: do you know why the behaviour differs between
magical and non-magical scalars? This locale explanation does not
make sense, because then the value should not be stored for non-magical
scalars either.

p5pRT commented Feb 28, 2017

From @tonycoz

On Tue, 21 Feb 2017 01:43:35 -0800, pali@cpan.org wrote:
> Ah, I forgot 'use locale;' when testing.
>
> But back to my question: do you know why the behaviour differs between
> magical and non-magical scalars? This locale explanation does not
> make sense, because then the value should not be stored for non-magical
> scalars either.

I'm pretty sure storing the NV as we currently do is a bug:

$ LANG=de_DE.utf8 ./perl -Ilib -Mlocale -le '$x = "0,42"; print $x+0; no locale; print $x+0'
0,42
0.42

$ LANG=de_DE.utf8 ./perl -Ilib -le '$x = "5,42"; print $x+0;'
5

From the discussion in #p5p, is this still an issue for you?

Tony
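Tony's first one-liner shows the mirror-image problem for numification. The sketch below is hypothetical. `parse_nv()`, `str_scalar` and `numify_cached()` are invented names, the locale is reduced to a radix character, and the process is assumed to be in the "C" locale. Under these assumptions, the radix in effect at the first numification of "0,42" decides every later result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical locale-aware numifier: treats `radix` as the decimal
   point, as a locale-aware Atof() would under "use locale". Outside
   "use locale" the radix is '.', and strtod() (in the "C" locale)
   stops at the first ',' it meets. */
double parse_nv(const char *s, char radix)
{
    char tmp[64];
    strncpy(tmp, s, sizeof tmp - 1);
    tmp[sizeof tmp - 1] = '\0';
    if (radix != '.') {
        char *r = strchr(tmp, radix);
        if (r != NULL)
            *r = '.';
    }
    return strtod(tmp, NULL);
}

typedef struct {
    const char *pv;
    double nv;
    int nok;   /* nonzero once nv caches a numification */
} str_scalar;

/* Caching numification, as perl did for non-magical PVs: the radix in
   effect at the FIRST numification decides every later result. */
double numify_cached(str_scalar *sv, char radix)
{
    if (!sv->nok) {
        sv->nv = parse_nv(sv->pv, radix);
        sv->nok = 1;
    }
    return sv->nv;
}
```

Under the ',' radix the string parses as 0.42 and is cached. A later numification with '.' returns that cached 0.42 instead of the 0 a fresh parse would give, which matches the "0.42" Tony gets after `no locale`. His second one-liner shows the uncached result, 5 for "5,42".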

p5pRT commented Feb 28, 2017

From @pali

On Monday 27 February 2017 16:00:00 Tony Cook via RT wrote:
> I'm pretty sure storing the NV as we currently do is a bug:

So it means that SvNV() should not store the computed value in the NV slot, right?

> $ LANG=de_DE.utf8 ./perl -Ilib -Mlocale -le '$x = "0,42"; print $x+0; no locale; print $x+0'
> 0,42
> 0.42
>
> $ LANG=de_DE.utf8 ./perl -Ilib -le '$x = "5,42"; print $x+0;'
> 5

And that is then a different bug...

> From the discussion in #p5p, is this still an issue for you?

So from the discussion I understood that SvNV(), SvIV() and SvUV() do not
have to store computed values in the NV or IV slots. That would mean my
report in this ticket is not a bug, just expected behaviour.

I'm OK with that, but I would suggest explicitly documenting this behaviour
in perlapi so that it is clear to other people too.

p5pRT commented Mar 5, 2017

From @pali

On Tuesday 28 February 2017 14:22:25 pali@cpan.org wrote:
> I'm OK with that, but I would suggest explicitly documenting this behaviour
> in perlapi so that it is clear to other people too.

Attached is a proposed patch for this documentation change.

p5pRT commented Mar 5, 2017

From @pali

0001-perlapi-Clarify-SvIV-SvUV-SvNV-behavior.patch
From 340a389aa37b76c808ed3f59cb35d530eb641bc3 Mon Sep 17 00:00:00 2001
From: Pali <pali@cpan.org>
Date: Sun, 5 Mar 2017 11:35:51 +0100
Subject: [PATCH] perlapi: Clarify SvIV/SvUV/SvNV behavior

---
 sv.h |   33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/sv.h b/sv.h
index 6227d46..a8fc09e 100644
--- a/sv.h
+++ b/sv.h
@@ -1517,42 +1517,45 @@ Like C<SvPV> but doesn't set a length variable.
 Like C<SvPV_nolen> but doesn't process magic.
 
 =for apidoc Am|IV|SvIV|SV* sv
-Coerces the given SV to an integer and returns it.  See C<L</SvIVx>> for a
-version which guarantees to evaluate C<sv> only once.
+Coerces the given SV to IV and returns it.  Computed value does not have to be
+stored in C<sv>'s IV slot (use C<L</sv_setiv>> for it).  See C<L</SvIVx>> for
+a version which guarantees to evaluate C<sv> only once.
 
 =for apidoc Am|IV|SvIV_nomg|SV* sv
 Like C<SvIV> but doesn't process magic.
 
 =for apidoc Am|IV|SvIVx|SV* sv
-Coerces the given SV to an integer and returns it.
-Guarantees to evaluate C<sv> only once.  Only use
-this if C<sv> is an expression with side effects,
+Coerces the given SV to IV and returns it.  Computed value does not have to be
+stored back to C<sv> (use C<L</sv_setiv>> for it).  Guarantees to evaluate
+C<sv> only once.  Only use this if C<sv> is an expression with side effects,
 otherwise use the more efficient C<SvIV>.
 
 =for apidoc Am|NV|SvNV|SV* sv
-Coerce the given SV to a double and return it.  See C<L</SvNVx>> for a version
-which guarantees to evaluate C<sv> only once.
+Coerces the given SV to NV and returns it.  Computed value does not have to be
+stored back to C<sv> (use C<L</sv_setnv>> for it).  See C<L</SvNVx>> for
+a version which guarantees to evaluate C<sv> only once.
 
 =for apidoc Am|NV|SvNV_nomg|SV* sv
 Like C<SvNV> but doesn't process magic.
 
 =for apidoc Am|NV|SvNVx|SV* sv
-Coerces the given SV to a double and returns it.
-Guarantees to evaluate C<sv> only once.  Only use
-this if C<sv> is an expression with side effects,
+Coerces the given SV to NV and returns it.  Computed value does not have to be
+stored back to C<sv> (use C<L</sv_setnv>> for it).  Guarantees to evaluate
+C<sv> only once.  Only use this if C<sv> is an expression with side effects,
 otherwise use the more efficient C<SvNV>.
 
 =for apidoc Am|UV|SvUV|SV* sv
-Coerces the given SV to an unsigned integer and returns it.  See C<L</SvUVx>>
-for a version which guarantees to evaluate C<sv> only once.
+Coerces the given SV to UV and returns it.  Computed value does not have to be
+stored back to C<sv> (use C<L</sv_setuv>> for it).  See C<L</SvUVx>> for
+a version which guarantees to evaluate C<sv> only once.
 
 =for apidoc Am|UV|SvUV_nomg|SV* sv
 Like C<SvUV> but doesn't process magic.
 
 =for apidoc Am|UV|SvUVx|SV* sv
-Coerces the given SV to an unsigned integer and
-returns it.  Guarantees to evaluate C<sv> only once.  Only
-use this if C<sv> is an expression with side effects,
+Coerces the given SV to UV and returns it.  Computed value does not have to be
+stored back to C<sv> (use C<L</sv_setuv>> for it).  Guarantees to evaluate
+C<sv> only once.  Only use this if C<sv> is an expression with side effects,
 otherwise use the more efficient C<SvUV>.
 
 =for apidoc Am|bool|SvTRUE|SV* sv
-- 
1.7.9.5

p5pRT commented Apr 11, 2017

From @khwilliamson

Thanks, applied as 04e8f31
--
Karl Williamson

p5pRT closed this as completed Apr 11, 2017

p5pRT commented Apr 11, 2017

@khwilliamson - Status changed from 'open' to 'resolved'

p5pRT commented Apr 12, 2017

From @demerphq

On 20 February 2017 at 23:36, Karl Williamson via RT
<perlbug-followup@perl.org> wrote:
> I don't have an answer to this question, but I can tell you why a stringified NV isn't stored. It is because the radix character varies by locale. In Czech, I believe it is a comma; in English, a period. And if the locale changes the character changes, and a PV that was generated in a different locale will be invalid. There were various bug tickets filed because of this, so we stopped storing the stringified number. Another approach would be to attach magic to it, like we do now to PVs that have been collated. The magic knows what locale was used for the stringification, and if the locale changes (reasonably unlikely) invalidate the PV.

Is there a reason we cannot make this "no-cache" behavior "use locale" specific?

For instance I would have thought the logic would be: if under "use
locale" we do not trust the PV slot of a PVNV, and we do not cache any
stringifications of NVs, and when not under "use locale" we trust the
PV slot, and we do cache stringifications.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Apr 12, 2017

From zefram@fysh.org

demerphq wrote:
> Is there a reason we cannot make this "no-cache" behavior "use locale"
> specific?

Yes: getting different stringifications arises from two stringification
operations having different locale status, and one can't tell whether
they'll be different just by looking at whether the first one had
locale enabled.

$ LANG=de_DE perl -lwe '$a = 1.5; print "$a"; { use locale; print "$a"; }'
1.5
1,5

> For instance I would have thought the logic would be: if under "use
> locale" we do not trust the PV slot of a PVNV,

You seem to be imagining that "use locale" is a global flag affecting
the whole program run uniformly. It's actually a lexically-scoped flag,
affecting each statement individually.

-zefram

p5pRT commented Apr 12, 2017

From @demerphq

On 12 Apr 2017 11:56, "Zefram" <zefram@fysh.org> wrote:
> demerphq wrote:
> > Is there a reason we cannot make this "no-cache" behavior "use locale"
> > specific?
>
> Yes: getting different stringifications arises from two stringification
> operations having different locale status, and one can't tell whether
> they'll be different just by looking at whether the first one had
> locale enabled.
>
> $ LANG=de_DE perl -lwe '$a = 1.5; print "$a"; { use locale; print "$a"; }'
> 1.5
> 1,5

I feel like you did not read what I wrote. This example does not
demonstrate a flaw in my proposal as far as I can tell. The first stringify
should be cached, then ignored by the second stringify as locale is in
effect. Were there a third stringify at the end after the block I would say
it should use the result of the first stringify.

> > For instance I would have thought the logic would be: if under "use
> > locale" we do not trust the PV slot of a PVNV,
>
> You seem to be imagining that "use locale" is a global flag affecting
> the whole program run uniformly. It's actually a lexically-scoped flag,
> affecting each statement individually.

Yes. I get that. So what?

If there is a flag or function that says 'use locale' is in effect, and
I know there is one to use as the regex engine uses it, then every time we
do SvPV on an NV we can check the flag and behave accordingly. If we do not
cache locale stringified strings and do not trust the PV slot of an NV
under locale we should do the right thing.

p5pRT commented Apr 12, 2017

From @cpansprout

On Wed, 12 Apr 2017 10:56:42 -0700, demerphq wrote:
> If there is a flag or function that says 'use locale' is in effect, and
> I know there is one to use as the regex engine uses it, then every time we
> do SvPV on an NV we can check the flag and behave accordingly. If we do not
> cache locale stringified strings and do not trust the PV slot of an NV
> under locale we should do the right thing.

How would we distinguish between an NV that got stringified outside of ‘use locale’ and a string that got numified at some point? Presumably we would use just SVp_POK, not SVf_POK, for a stringified NV.

--

Father Chrysostomos
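demerphq's proposal, combined with the distinguishing flag Father Chrysostomos asks about, might look like the hypothetical C sketch below. `pvnv_sv`, `pv_is_cache` and `stringify()` are invented names. `pv_is_cache` only plays the role of the "SVp_POK without SVf_POK" idea. The locale is reduced to a radix character, and the process is assumed to run in the "C" locale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: "%g", then swap '.' for the given radix.
   Assumes the process is in the "C" locale, so snprintf emits '.'. */
static void format_nv(double nv, char radix, char *buf, size_t len)
{
    snprintf(buf, len, "%g", nv);
    char *dot = strchr(buf, '.');
    if (dot != NULL)
        *dot = radix;
}

typedef struct {
    double nv;
    char pv[32];
    bool has_pv;       /* pv slot holds a string */
    bool pv_is_cache;  /* pv was derived from nv, not assigned directly */
} pvnv_sv;

/* Sketch of the proposal: cache only non-locale stringifications, and
   ignore such a cache (without overwriting it) under "use locale". A
   string that was assigned directly is always returned verbatim. */
const char *stringify(pvnv_sv *sv, bool in_locale, char locale_radix,
                      char *scratch, size_t len)
{
    if (sv->has_pv && !(in_locale && sv->pv_is_cache))
        return sv->pv;

    if (in_locale) {                    /* locale results are never cached */
        format_nv(sv->nv, locale_radix, scratch, len);
        return scratch;
    }
    format_nv(sv->nv, '.', sv->pv, sizeof sv->pv);
    sv->has_pv = true;
    sv->pv_is_cache = true;
    return sv->pv;
}
```

Fed zefram's one-liner, this gives "1.5" outside the block and "1,5" inside it, without overwriting the cache. It then gives "1.5" again afterwards. A directly assigned string such as "1.50" is returned verbatim even under locale.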

p5pRT commented Apr 13, 2017

From @demerphq

On 12 Apr 2017 23:24, "Father Chrysostomos via RT" <perlbug-followup@perl.org> wrote:
> How would we distinguish between an NV that got stringified outside of ‘use
> locale’ and a string that got numified at some point? Presumably we would
> use just SVp_POK, not SVf_POK, for a stringified NV

Do you foresee any difference with current behaviour? As far as I understand
the behavior Karl described right now already has this problem. Once
something becomes NV we stop using its PV slot at all.

Yves

p5pRT commented Apr 13, 2017

From @demerphq

On 21 February 2017 at 04:11, Andreas Koenig
<andreas.koenig.7os6VVqR@franz.ak.mind.de> wrote:
> LC_ALL=de_DE.UTF-8 perl -le 'use locale; print 0.42'
> 0,42

Curiously this does not work for me:

$ LC_ALL=de_DE.UTF-8 perl -le 'use locale; print 0.42'
0.42

Do you know what I might have to do to replicate? I installed the
German UTF-8 language pack, but I feel like I am probably missing
something, as I cannot replicate right now.

Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Apr 13, 2017

From @eserte

demerphq <demerphq@gmail.com> writes:

> Curiously this does not work for me:
>
> $ LC_ALL=de_DE.UTF-8 perl -le 'use locale; print 0.42'
> 0.42
>
> Do you know what I might have to do to replicate? I installed the
> German UTF-8 language pack, but I feel like I am probably missing
> something, as I cannot replicate right now.

Which perl version? You need 5.20 or later:

$ for i in /opt/perl-5.*/bin/perl; do echo -n "$i: "; LC_ALL=de_DE.UTF-8 $i -le 'use locale; print 0.42'; done
/opt/perl-5.10.1/bin/perl: 0.42
/opt/perl-5.12.5/bin/perl: 0.42
/opt/perl-5.14.4/bin/perl: 0.42
/opt/perl-5.16.3/bin/perl: 0.42
/opt/perl-5.18.2/bin/perl: 0.42
/opt/perl-5.18.4/bin/perl: 0.42
/opt/perl-5.18.4t/bin/perl: 0.42
/opt/perl-5.20.1/bin/perl: 0,42
/opt/perl-5.20.2t/bin/perl: 0,42
/opt/perl-5.20.3/bin/perl: 0,42
/opt/perl-5.20.3D/bin/perl: 0,42
/opt/perl-5.22.0/bin/perl: 0,42
/opt/perl-5.22.1/bin/perl: 0,42
...

--
Slaven Rezic - slaven <at> rezic <dot> de

  Berlin Perl Mongers - http://berlin.pm.org

p5pRT commented Apr 13, 2017

From @demerphq

On 12 April 2017 at 23:21, Father Chrysostomos via RT
<perlbug-followup@perl.org> wrote:
> How would we distinguish between an NV that got stringified outside of ‘use locale’ and a string that got numified at some point? Presumably we would use just SVp_POK, not SVf_POK, for a stringified NV.

I assume the semantics would be the same as they are now. But I can't
replicate the decimal-separator command right now, so I can't be
sure. If the current implementation is sane then I imagine my proposal
would be too. If the current implementation is not sane, in the sense
that it trusts a cached PV that was turned into an NV but won't upgrade
an NV, then IMO the entire thing is horribly broken anyway, and I wash
my hands of it. *shrug*

I guess it comes down to what the following code should do and does do:

LC_ALL=de_DE.UTF-8 perl -le 'my $f= "0.42"; print 0+$f; { use locale; print $f; print 0.42; } print $f'

If that prints​:
0.42
0,42
0,42
0.42

then under my proposal things would work the same as they do now.

If that prints​:
0.42
0.42
0,42
0.42

Then IMO the current behavior is insane and we should just revert
the "we don't cache stringified NVs" patch as being completely broken.

I am assuming that we all agree it *should* output like the first set,
and that it does already, given that Karl is not the type of guy to
miss stuff like this. (IOW, I would be pretty surprised if he did
something I would consider to be insane, although not so much
vice-versa ;-).)

Yves
ps: As an aside, this ticket is a good example of how locale and perl
core philosophy don't play nicely together. Perl says that a number
and a stringified version of that number should be treated the same.
This only works if there is a bijective mapping between the two
representations. Even when you consider a single locale representation
of floating point numbers the mapping is not actually bijective,
although we can mostly ignore/overlook the non-bijective cases like
trailing and leading zeros, etc. But when you introduce the concept
that there are multiple string representations for a floating point
number there is a real problem: the entire floating point space has a
1:N mapping. IMO stuff like this should have been handled with a
locale-aware format in sprintf, and locale should have been applied to
numbers ONLY there and perhaps in some utility functions.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Apr 13, 2017

From @demerphq

On 13 April 2017 at 09:05, demerphq <demerphq@gmail.com> wrote:
> I guess it comes down to what the following code should do and does do:
>
> LC_ALL=de_DE.UTF-8 perl -le 'my $f= "0.42"; print 0+$f; { use locale; print $f; print 0.42; } print $f'
>
> If that prints:
> 0.42
> 0,42
> 0,42
> 0.42
>
> then under my proposal things would work the same as they do now.
>
> If that prints:
> 0.42
> 0.42
> 0,42
> 0.42
>
> Then IMO the current behavior is insane and we should just revert
> the "we don't cache stringified NVs" patch as being completely broken.
>
> I am assuming that we all agree it *should* output like the first set,
> and that it does already, given that Karl is not the type of guy to
> miss stuff like this. (IOW, I would be pretty surprised if he did
> something I would consider to be insane, although not so much
> vice-versa ;-).)

Color me very very surprised:

$ LC_ALL=de_DE.UTF-8 perl -le 'my $f= "0.42"; print 0+$f; { use locale; print $f; print 0.42; } print $f'
0.42
0.42
0,42
0.42

Even worse:

$ LC_ALL=de_DE.UTF-8 perl -le 'my $f1= "0.42"; my $f2; print 0+$f1; { use locale; print $f1; print 0.42; $f2="0,42"; print 0+$f2; } print $f1; print $f2'
0.42
0.42
0,42
0,42
0.42
0,42

IMO these are completely broken semantics. Removing caching/promotion
of SvNV to SvPVNV with these semantics just hides a chunk of the
insanity; it does not fix it, and everybody who does not use locale
(IOW the vast majority) has to pay for the partial fix.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Apr 13, 2017

From zefram@fysh.org

demerphq wrote:
> Do you foresee any difference with current behaviour? As far as I understand
> the behavior Karl described right now already has this problem. Once
> something becomes NV we stop using its PV slot at all.

No, we do not. If a scalar has a PV, we use its PV whenever a PV is
called for, regardless of it having an NV. For example:

$ perl -MDevel::Peek=Dump -lwe '$a = "1.50"; print $a + 0.25; print $a; Dump $a'
1.75
1.50
SV = PVNV(0x1e88ea0) at 0x1ea71d8
  REFCNT = 1
  FLAGS = (NOK,POK,IsCOW,pNOK,pPOK)
  IV = 0
  NV = 1.5
  PV = 0x1eaffa0 "1.50"\0
  CUR = 4
  LEN = 10
  COW_REFCNT = 1

Here the PV "1.50" is used for the second print, even though the scalar
is NOK and stringification from the NV would lead to a PV of "1.5".

To cache the non-locale stringification of an NV would be quite different
from current behaviour. Currently we have no caching of stringification
that is distinguished from the scalar having originated from a string:
if it's OK to use the PV at all then it's OK to use the PV everywhere
that a PV is required.

-zefram
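The rule zefram states can be modelled as follows: when a PV is present, it always takes precedence over the NV whenever a string is needed. This is a hypothetical C sketch. `pvnv`, `sv_num()` and `sv_str()` are invented names, and perl's real flag handling (POK, NOK, pPOK, pNOK) is collapsed into a NULL check and one boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical PVNV model in which both slots can be valid at once. */
typedef struct {
    const char *pv;  /* string the scalar was assigned, or NULL */
    double nv;
    bool nok;        /* nv holds a valid numeric value */
} pvnv;

/* Numification caches the NV but never touches the PV. */
double sv_num(pvnv *sv)
{
    assert(sv->nok || sv->pv != NULL);
    if (!sv->nok) {
        sv->nv = strtod(sv->pv, NULL);
        sv->nok = true;
    }
    return sv->nv;
}

/* Stringification returns the PV whenever there is one; only a pure
   number is formatted, into the caller's buffer, without caching. */
const char *sv_str(const pvnv *sv, char *buf, size_t len)
{
    if (sv->pv != NULL)
        return sv->pv;
    snprintf(buf, len, "%g", sv->nv);
    return buf;
}
```

As in the Devel::Peek example, numifying "1.50" caches 1.5. The string value nevertheless stays "1.50" rather than the "1.5" that a fresh stringification of the NV would give.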

p5pRT commented Apr 13, 2017

From zefram@fysh.org

demerphq wrote:
> LC_ALL=de_DE.UTF-8 perl -le 'my $f= "0.42"; print 0+$f; { use locale; print $f; print 0.42; } print $f'
>
> If that prints:
> 0.42
> 0,42
> 0,42
> 0.42

That would be insane semantics. That would mean that computing the
numeric coercion of the string caused the string to lose its original
string value, with its string value subsequently determined afresh (in
a locale-influenced manner) by coercion from the numeric result of the
first coercion. We would not want strings to be so fragile.

> If that prints:
> 0.42
> 0.42
> 0,42
> 0.42
>
> Then IMO the current behavior is insane

That is what it does print. I don't see the insanity. I see semantics
that are consistent​: a string value retains its character sequence
regardless of locale, and a numeric value coerces to string in a way
that depends on the lexical locale settings.
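The semantics described here can be sketched as a toy C model. `in_locale_scope` stands in for the lexical `use locale` flag, and `radix` stands in for the locale's decimal separator (',' under de_DE). Both are assumptions of the sketch. For simplicity it swaps the '.' in "C"-locale output rather than asking libc to format in a real locale.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy scalar: either a string or a number. */
typedef struct {
    int         is_string;
    const char *pv;  /* character sequence, used when is_string */
    double      nv;  /* numeric value, used otherwise */
} toy_scalar;

/* Stringify under the lexical locale state of the calling statement.
 * Strings keep their characters; numbers are formatted afresh each time,
 * and nothing is cached. */
static const char *toy_stringify(const toy_scalar *s, int in_locale_scope,
                                 char radix, char *buf, size_t len) {
    assert(len > 0);
    if (s->is_string)
        return s->pv;
    snprintf(buf, len, "%g", s->nv);   /* "C"-locale formatting */
    if (in_locale_scope) {
        char *dot = strchr(buf, '.');
        if (dot != NULL)
            *dot = radix;              /* modelled locale radix */
    }
    return buf;
}
```

In this model the string "0.42" stringifies as "0.42" both inside and outside the modelled locale scope. The number 0.42 becomes "0,42" only inside it, which matches the second set of output above.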

and we should just revert the "we don't cache stringified NVs" patch as being completely broken.

I particularly don't see how you can blame caching semantics for any
insanity here. The semantics are quite simple, arising from composition
of the operations to which values are subjected, with lexically-scoped
flags only affecting operations within their scope. If caching
made a visible change to the semantics, deviating from such a simple
arrangement, *that* would be broken and something to change. But that
doesn't happen here. There used to be such brokenness around NV-to-PV
stringification, and the change to stop caching these stringifications
is what fixed it. But that's not brought out in your test code above,
which doesn't stringify any numeric scalar twice: you're not exercising
the caching semantics.
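Caching only becomes visible when the same numeric scalar is stringified twice under different locale settings. The toy C model below makes that concrete; its names are hypothetical, and radix replacement again stands in for real locale formatting. `stringify_cached` roughly models storing the first stringification in the scalar. `stringify_uncached` recomputes the string on every call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy numeric scalar with an optional cached string. */
typedef struct {
    double nv;
    char   cache[32];
    int    cached;  /* nonzero once a stringification has been stored */
} toy_num;

/* Format nv; inside a (modelled) locale scope, swap '.' for radix. */
static void format_nv(double nv, int in_locale_scope, char radix,
                      char *buf, size_t len) {
    assert(len > 0);
    snprintf(buf, len, "%g", nv);
    if (in_locale_scope) {
        char *dot = strchr(buf, '.');
        if (dot != NULL)
            *dot = radix;
    }
}

/* Caching variant: the first stringification is stored and reused,
 * whatever locale scope later callers are in. */
static const char *stringify_cached(toy_num *s, int in_locale_scope,
                                    char radix) {
    if (!s->cached) {
        format_nv(s->nv, in_locale_scope, radix, s->cache, sizeof s->cache);
        s->cached = 1;
    }
    return s->cache;
}

/* Non-caching variant: every stringification reflects its own scope. */
static const char *stringify_uncached(const toy_num *s, int in_locale_scope,
                                      char radix, char *buf, size_t len) {
    format_nv(s->nv, in_locale_scope, radix, buf, len);
    return buf;
}
```

Take 1.5, stringified outside and then inside the modelled locale scope. With the cache, both calls give "1.5", so the second result is stale. Without the cache, the results are "1.5" and then "1,5". With the cache and the opposite order, "1,5" would leak out of the locale scope.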

I am assuming that we all agree it *should* output like the first set,

No, we do not.

ps: As an aside, this ticket is a good example of how locale and perl
core philosophy don't play nicely together.

Yes. The fact that the stringification depends on locale settings breaks
the model of every scalar having a string value. It means that a numeric
scalar doesn't have a *consistent* string value, so it's not safe to pass
one to another piece of code (to a library function) for use as a string.
The other module might not perceive it as having the string value that
you think it has. We have some other deviations from that principle too:
most strongly tying, and formerly the $# variable.

The change in 5.20 from having the locale for stringification depend
only on setlocale() calls to having it depend on the lexical locale
flag is a strange one. I'm not sure what the intent was, and as far as
I can see it's not mentioned in perl5200delta. I wonder if some people
are conflating it with the don't-cache-stringification change which was
also in 5.20: actually they're separate, caching stopped in 5.19.1 and
locale control changed in 5.19.8. Although the stringification depends on
some form of locale both before and after this change, the change makes
a difference to the practical extent of locale dependence, but it's a
matter of debate in which direction. Previously the stringification
didn't depend on lexical flags, so passing an NV to another module for
string purposes was relatively safe. Now the stringification for code
that doesn't use the locale flag is invariant, so passing an NV to another
non-locale module for string purposes is safe (if one is confident about
the other module's internal flag settings).

-zefram


p5pRT commented Apr 13, 2017

From @demerphq

Thank you for setting me straight. I see my definition of insane is
wrong. I apologise for using that term and I now see why Karl did the
changes he did. I guess we are now in a less bad position than before,
but the whole thing feels wrong to me from the get go. Somehow our
model of strings and numbers doesn't quite manage to be consistent in
every regard. I find that very frustrating. Anyway, I will shut up
now. Sorry for the noise.

Yves


--
perl -Mre=debug -e "/just|another|perl|hacker/"


p5pRT commented Apr 13, 2017

From @khwilliamson

On 04/13/2017 03:33 AM, Zefram wrote:

The change in 5.20 from having the locale for stringification depend
only on setlocale() calls to having it depend on the lexical locale
flag is a strange one. I'm not sure what the intent was, and as far as
I can see it's not mentioned in perl5200delta. I wonder if some people
are conflating it with the don't-cache-stringification change which was
also in 5.20: actually they're separate, caching stopped in 5.19.1 and
locale control changed in 5.19.8. Although the stringification depends on
some form of locale both before and after this change, the change makes
a difference to the practical extent of locale dependence, but it's a
matter of debate in which direction. Previously the stringification
didn't depend on lexical flags, so passing an NV to another module for
string purposes was relatively safe. Now the stringification for code
that doesn't use the locale flag is invariant, so passing an NV to another
non-locale module for string purposes is safe (if one is confident about
the other module's internal flag settings).

-zefram

I don't understand what you mean here by "strange". Did you have a
particular commit in mind? The one that looks likely to me is:

commit c8f77a9
Merge: 3eab96c c69a26e
Author: Karl Williamson <public@khwilliamson.com>
Date: Sat Jan 4 13:35:33 2014 -0700

  Merge LC_NUMERIC locale changes branch into blead

  LC_NUMERIC hasn't been implemented quite the same way as the other
  locale categories. And the implementation has been somewhat haphazard.
  The other categories have implementations where if you're not under
  locale you simply use different operations. That isn't possible with
  LC_NUMERIC, as it may need libc functions that are always subject to the
  current locale no matter what Perl thinks.

  There are two possible implementation paths that come to my mind to deal
  with this. One is to keep the locale that the libc routines need
  correctly set, and switch to the C locale during those places where it
  shouldn't be used. The other way is the opposite, to keep things in the
  C locale generally, and switch when needed.

  Unfortunately the implementation (prior to this series of commits) used
  a combination of both possibilities. I am still unsure what the
  original intent was (not having spent the time to dig through the
  history), or even if there was a consistent intent.

  In any event, there has long been infrastructure that facilitates
  switching back and forth between the current underlying locale and the C
  locale. However this was not documented until now, and so it is not
  surprising that people who came later (including me) did not realize
  it existed, and reinvented things, inconsistently.

  What I've done here is move to the first implementation path mentioned
  above. I believe this is the one more likely to show up other bugs
  during the remainder of the 5.19 development cycle. I have changed and
  added to the infrastructure, so that it knows whether we should be in
  the C or the underlying locale, and switches/restores if and only if it
  is necessary. We can change to the other implementation path later
  with only minimal code changes.
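The first implementation path in this commit message can be sketched with standard setlocale() calls. In that path the underlying locale stays set, and the code switches to the C locale only where the locale must not apply. `format_in_c_locale` is a hypothetical helper, not perl's actual infrastructure. setlocale() changes process-wide state, so this pattern is not thread-safe.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format nv with a '.' radix regardless of the
 * current LC_NUMERIC, switching to "C" only when necessary and restoring
 * the underlying locale afterwards. Returns snprintf's result, or -1 if
 * the current locale name could not be saved. */
static int format_in_c_locale(double nv, char *buf, size_t len) {
    const char *cur = setlocale(LC_NUMERIC, NULL);
    char saved[128];
    int switched, n;

    if (cur == NULL || strlen(cur) >= sizeof saved)
        return -1;
    strcpy(saved, cur);  /* setlocale() may overwrite the string it returned */

    switched = strcmp(saved, "C") != 0;
    if (switched)
        setlocale(LC_NUMERIC, "C");
    n = snprintf(buf, len, "%g", nv);
    if (switched)
        setlocale(LC_NUMERIC, saved);  /* restore the underlying locale */
    return n;
}
```

Copying the locale name before switching matters. The pointer returned by `setlocale(LC_NUMERIC, NULL)` may be invalidated by the next setlocale() call.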


p5pRT commented Apr 13, 2017

From @khwilliamson

And this came about because of:

commit bc8ec7c
Author: Karl Williamson <public@khwilliamson.com>
Date: Wed Dec 11 16:25:02 2013 -0700

  PATCH: [perl #120723] Setting LC_NUMERIC breaks parsing of constants

  This is the final patch for [perl #120723], and adds tests for it.

  LC_NUMERIC Locale handling was broken for code during the compilation
  phase, such as BEGIN {} blocks. This is because, for some reason, perl.c
  set LC_NUMERIC unconditionally back to the C locale right after locale
  initialization. I suspect that was to allow the core's parsing to not be
  affected by locale. However, earlier commits in this series have added
  code to change/restore the locale during sections of the parsing where
  this might matter, so this setting to the C locale is not needed.
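Parsing numeric constants is locale-sensitive in the first place because C's strtod() takes its radix character from the current LC_NUMERIC locale. The hypothetical helpers below run in the "C" locale that every C program starts in, where ',' is not a radix. Under a comma-radix locale such as de_DE the roles would be reversed, and a literal like 1.5 would be cut short at the '.'.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: report how many characters of src strtod()
 * accepts as a number under the current LC_NUMERIC, storing the value. */
static size_t strtod_consumed(const char *src, double *out) {
    char *end;
    *out = strtod(src, &end);
    return (size_t)(end - src);
}

/* The radix character strtod() currently expects, per localeconv(). */
static char current_radix(void) {
    const struct lconv *lc = localeconv();
    return lc->decimal_point[0];
}
```

In the "C" locale, "1.5" is consumed whole as 1.5. Parsing "1,5" stops after the "1", so that a source parser which did not control LC_NUMERIC would silently read a different number.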


p5pRT commented Apr 13, 2017

From zefram@fysh.org

Karl Williamson wrote:

I don't understand what you mean here about strange.

I meant that it has a strange status with respect to those issues of the
scalar abstraction and consistency of string value that I had discussed
in the preceding paragraph. It is also strange that it is not mentioned
in perl5200delta.

Did you have a particular commit in mind? The one that looks likely to me is:

I haven't narrowed it down to a specific commit. The one you identify
is probably it. I was referring to this change in behaviour:

$ LANG=de_DE perl5.19.7 -MPOSIX=setlocale,LC_ALL -lwe '{ use locale; print 1.5; } setlocale(LC_ALL, ""); print 1.5;'
1.5
1,5
$ LANG=de_DE perl5.19.8 -MPOSIX=setlocale,LC_ALL -lwe '{ use locale; print 1.5; } setlocale(LC_ALL, ""); print 1.5;'
1,5
1.5

-zefram


p5pRT commented Apr 13, 2017

From @cpansprout

On Wed, 12 Apr 2017 23:44:50 -0700, demerphq wrote:

On 12 Apr 2017 23:24, "Father Chrysostomos via RT" <perlbug-followup@perl.org> wrote:

On Wed, 12 Apr 2017 10:56:42 -0700, demerphq wrote:

On 12 Apr 2017 11:56, "Zefram" <zefram@fysh.org> wrote:

demerphq wrote:

Is there a reason we cannot make this "no-cache" behavior "use locale"
specific?

Yes: getting different stringifications arises from two stringification
operations having different locale status, and one can't tell whether
they'll be different just by looking at whether the first one had
locale enabled.

$ LANG=de_DE perl -lwe '$a = 1.5; print "$a"; { use locale; print "$a"; }'
1.5
1,5

I feel like you did not read what I wrote. This example does not
demonstrate a flaw in my proposal as far as I can tell. The first stringify
should be cached, then ignored by the second stringify as locale is in
effect. Were there a third stringify at the end after the block I would say
it should use the result of the first stringify.

For instance I would have thought the logic would be: if under "use
locale" we do not trust the PV slot of a PVNV,

You seem to be imagining that "use locale" is a global flag affecting
the whole program run uniformly. It's actually a lexically-scoped flag,
affecting each statement individually

Yes. I get that. So what?

If there is a flag or function that says 'use locale' is in effect, and
I know there is one, as the regex engine uses it, then every time we
do SvPV on an NV we can check the flag and behave accordingly. If we do not
cache locale-stringified strings and do not trust the PV slot of an NV
under locale we should do the right thing.

How would we distinguish between an NV that got stringified outside of
‘use locale’ and a string that got numified at some point? Presumably we
would use just SVp_POK, not SVf_POK, for a stringified NV.
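The proposal can be sketched as a toy C model. The flag names below loosely echo SVf_POK and SVp_POK, but they are hypothetical and do not reproduce those flags' real values or semantics in perl. A genuine string sets the public flag and is always trusted. A stringified NV is cached with only the private flag set. That cached string is trusted outside `use locale`; inside it, the string is recomputed and not cached.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits, loosely named after SVf_POK / SVp_POK. */
enum { TOY_NOK = 1, TOY_POK_PUBLIC = 2, TOY_POK_PRIVATE = 4 };

typedef struct {
    unsigned flags;
    double   nv;
    char     pv[32];
} toy_sv;

static void toy_set_str(toy_sv *sv, const char *s) {
    snprintf(sv->pv, sizeof sv->pv, "%s", s);
    sv->flags = TOY_POK_PUBLIC | TOY_POK_PRIVATE;
}

static void toy_set_num(toy_sv *sv, double nv) {
    sv->nv = nv;
    sv->flags = TOY_NOK;
}

/* Stringify: a public POK (a real string) is always trusted; a
 * private-only POK (a cached NV stringification) is trusted only
 * outside the modelled "use locale" scope. */
static const char *toy_str(toy_sv *sv, int in_locale_scope, char radix,
                           char *buf, size_t len) {
    assert(len > 0);
    if (sv->flags & TOY_POK_PUBLIC)
        return sv->pv;
    if (!in_locale_scope) {
        if (!(sv->flags & TOY_POK_PRIVATE)) {
            snprintf(sv->pv, sizeof sv->pv, "%g", sv->nv);
            sv->flags |= TOY_POK_PRIVATE;  /* cache, privately */
        }
        return sv->pv;
    }
    snprintf(buf, len, "%g", sv->nv);      /* locale result: never cached */
    char *dot = strchr(buf, '.');
    if (dot != NULL)
        *dot = radix;
    return buf;
}
```

This relies on non-locale stringification being invariant, so a privately cached string can never be stale outside the locale scope.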

Do you foresee any difference with current behaviour? As far as I understand
the behavior Karl described right now already has this problem. Once
something becomes NV we stop using its PV slot at all.

Yves

It looks to me as though this scalar is still POK:

$ perl5.24.0 -MDevel::Peek -e '$x = "1.3"; 0+$x; Dump $x'
SV = PVNV(0x7f9faa0042b0) at 0x7f9faa02ae58
  REFCNT = 1
  FLAGS = (NOK,POK,IsCOW,pIOK,pNOK,pPOK)
  IV = 1
  NV = 1.3
  PV = 0x7f9fa9c04c40 "1.3"\0
  CUR = 3
  LEN = 10
  COW_REFCNT = 1

--

Father Chrysostomos


p5pRT commented Apr 13, 2017

From @cpansprout

On Thu, 13 Apr 2017 00:17:01 -0700, demerphq wrote:

Color me very very surprised:

$ LC_ALL=de_DE.UTF-8 perl -le 'my $f= "0.42"; print 0+$f; { use locale; print $f; print 0.42; } print $f'
0.42
0.42
0,42
0.42

That is what I would expect. At no point did you assign a value to $f other than the string "0.42".

(I do agree, though, that locales are fundamentally incompatible with Perl’s scalar model.)

Even worse:

$ LC_ALL=de_DE.UTF-8 perl -le 'my $f1= "0.42"; my $f2; print 0+$f1; { use locale; print $f1; print 0.42; $f2="0,42"; print 0+$f2; } print $f1; print $f2'
0.42
0.42
0,42
0,42
0.42
0,42

IMO these are completely broken semantics.

Again, that is what I would expect. As long as I assign the string "0.42" to a scalar, I want it stringified exactly the same way, regardless of what contexts I might have used it in, even if we are under a different locale.

I do not disagree that the semantics are broken, though. The whole locale model should have been better thought through and probably provided as a functional interface, rather than a change in the way scalars behave.

Removing caching/promotion of SvNV to SvPVNV with these semantics just
hides a chunk of the insanity, it does not fix it, and everybody who does
not use locale (iow the vast majority) has to pay for the partial fix.

cheers,
Yves

--

Father Chrysostomos


p5pRT commented Apr 14, 2017

From @demerphq

On 13 April 2017 at 22:55, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:

Again, that is what I would expect. As long as I assign the string "0.42" to a scalar, I want it stringified exactly the same way, regardless of what contexts I might have used it in, even if we are under a different locale.

Yes, once you take the position that "the original string should be
preserved" some of this makes more sense.

I do not disagree that the semantics are broken, though. The whole locale model should have been better thought through and probably provided as a functional interface, rather than a change in the way scalars behave.

Thanks. This is what I should have said myself.

Yves
