Re: [rt.cpan.org #101876] losing string value of semi-numeric string #14464

Open
p5pRT opened this issue Feb 2, 2015 · 7 comments

p5pRT commented Feb 2, 2015

Migrated from rt.perl.org#123715 (status was 'open')

Searchable as RT123715$

p5pRT commented Feb 2, 2015

From @demerphq

On 2 February 2015 at 11:33, Zefram via RT
<bug-Sereal-Encoder@rt.cpan.org> wrote:

Mon Feb 02 05:33:20 2015: Request 101876 was acted upon.
Transaction: Ticket created by zefram@fysh.org
Queue: Sereal-Encoder
Subject: losing string value of semi-numeric string
Broken in: (no value)
Severity: (no value)
Owner: Nobody
Requestors: zefram@fysh.org
Status: new
Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=101876 >

$ perl -MSereal::Encoder=encode_sereal -MSereal::Decoder=decode_sereal -lwe 'print $]; print $Sereal::Encoder::VERSION; my $a="0 but true"; print decode_sereal(encode_sereal($a)); my $b = $a+0; print $a; print decode_sereal(encode_sereal($a));'
5.018002
3.005
0 but true
0 but true
0

I believe the first encoding is representing $a as a string but the
second encoding is representing it as a pure integer, based on the IOK
flag. In the case of this string, along with infinitely many others
such as "00", "01", and "1 ", the integer representation is lossy.
It's particularly significant for strings such as "0 but true" and "00"
which qualify as true but come out as false when mangled by the lossy
encoding. But even when the truth value doesn't change, it is not at
all acceptable to lose the string value.

The underlying mistake is that you've treated the IOK flag as implying
that the scalar is fully characterised by its IV. In general that is
not the case. For scalars that are both IOK and POK, to see whether
integer representation suffices you need to perform the IV->PV coercion
yourself, and see whether the PV generated from the IV matches the
scalar's actual PV. Similar remarks apply to NOK and NV. For extra fun,
the exact meaning of the [PIN]OK flags varies between Perl versions.
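
The check described above (redo the IV->PV coercion and compare it with the real PV) can be sketched in pure Perl with the core B module. This is only an illustration under stated assumptions: `int_is_lossless` is a hypothetical helper name, it is not how Sereal is implemented, and it covers only the integer case, not NOK/NV.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper (not part of Sereal or of perl itself): returns true
# only when the scalar could be serialized as a bare integer without losing
# its string value.
sub int_is_lossless {
    my $sv    = B::svref_2object(\$_[0]);   # inspect the caller's scalar, not a copy
    my $flags = $sv->FLAGS;

    return 0 unless $flags & B::SVf_IOK();  # no public integer value at all
    return 1 unless $flags & B::SVf_POK();  # integer only, no string to lose

    # Both IOK and POK: redo the IV->PV coercion and compare with the real PV.
    my $from_iv = "" . $sv->int_value;
    return $from_iv eq $sv->PV ? 1 : 0;
}

my $s = "0 but true";
my $n = $s + 0;    # numeric use may cache an IV on $s
print int_is_lossless($s) ? "integer ok\n" : "keep the string\n";
# prints "keep the string"
```

The comparison is done on the stringified IV rather than on the flags alone, so the result does not depend on whether a given perl version sets IOK for such strings.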

No. I disagree. This is a bug in perl itself.

$ perl -MDevel::Peek -le'my $x="0 but true"; my $y=0+$x; Dump($x)'
SV = PVIV(0x7cdd88) at 0x7d9a48
  REFCNT = 1
  FLAGS = (PADMY,IOK,POK,pIOK,pPOK)
  IV = 0
  PV = 0x7d2b90 "0 but true"\0
  CUR = 10
  LEN = 16

The IOK flag should NOT be set here, it should be pIOK only.

IOK means that the integer representation is either a) canonical, or
b) a faithful representation of the PV.

pIOK is supposed to mean that the cached value of the string can be
used, but that it is not a faithful representation of the string it
was derived from.

(If IOK and pIOK do not mean these things then it is a total waste to
have both sets of flags, which seems an unreasonable interpretation.)
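
For readers who want to see the public and private flags side by side without reading a full Devel::Peek dump, a small sketch using the core B module follows. `ok_flags` is a hypothetical helper name, and the exact flags printed depend on the perl version, so no output is shown here.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: list which public (IOK/NOK/POK) and private
# (pIOK/pNOK/pPOK) flags are currently set on a scalar.
sub ok_flags {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    my @table = (
        [ IOK => B::SVf_IOK() ], [ pIOK => B::SVp_IOK() ],
        [ NOK => B::SVf_NOK() ], [ pNOK => B::SVp_NOK() ],
        [ POK => B::SVf_POK() ], [ pPOK => B::SVp_POK() ],
    );
    return map { $_->[0] } grep { $flags & $_->[1] } @table;
}

for my $str ("0 but true", "0blahblah", "00") {
    my $copy = $str;
    {
        no warnings 'numeric';       # "0blahblah" would otherwise warn
        my $unused = 0 + $copy;      # numeric use caches a number on $copy
    }
    printf "%-12s %s\n", qq{"$copy"}, join(",", ok_flags($copy));
}
```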

Compare to this:

$ perl -MDevel::Peek -le'my $x="0blahblah"; my $y=0+$x; Dump($x)'
SV = PVNV(0x1bcaf10) at 0x1beaa58
  REFCNT = 1
  FLAGS = (PADMY,POK,pIOK,pNOK,pPOK)
  IV = 0
  NV = 0
  PV = 0x1be3ba0 "0blahblah"\0
  CUR = 9

IMO this is clearly a bug in the special case logic for "0 but true".
It should NOT set the IOK flag, it should set only the pIOK flag.

I will naturally try to fix this in Sereal, but I consider this a bug
in Perl and I am sending this to perlbug because of it.

cheers,
Yves

perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Feb 2, 2015

From @sisyphus

-----Original Message-----
From: yves orton (via RT)
Sent: Monday, February 02, 2015 9:51 PM

IMO this is clearly a bug in the special case logic for "0 but true".
It should NOT set the IOK flag, it should set only the pIOK flag.

I will naturally try to fix this in Sereal, but I consider this a bug
in Perl and I am sending this to perlbug because of it.

I notice that '0 but true' doesn't trigger the "isn't numeric" warning when
used in numeric context:

C:\>perl -wle "$x = '0 but true' + 17;print $x;"
17

C:\>

And the looks_like_number() API function returns true for '0 but true' - as
also, of course, does Scalar::Util::looks_like_number:

C:\>perl -MScalar::Util -le "print 'nok' if Scalar::Util::looks_like_number('0 but true');"
nok

C:\>

The string '0 but true' doesn't look much like a number to me, but this
strange behaviour seems to have been round for quite a while. I find it in
5.8.8, 5.20.0, and current blead.
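
The two observations above can be reproduced in one script. This is a minimal sketch; `numify_warnings` is a hypothetical helper name, and '0 but false' is included only as an ordinary non-numeric string for contrast.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: add 17 to a string and report the result together
# with the number of warnings the numeric conversion raised.
sub numify_warnings {
    my ($str) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $value = $str + 17;
    return ($value, scalar @warnings);
}

for my $str ('0 but true', '0 but false') {
    my ($value, $count) = numify_warnings($str);
    printf "%-13s value=%s warnings=%d looks_like_number=%s\n",
        "'$str'", $value, $count, looks_like_number($str) ? 'yes' : 'no';
}
```

Only '0 but true' is exempt: the other string numifies to the same value but raises the usual "isn't numeric" warning and fails looks_like_number().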

Cheers,
Rob

p5pRT commented Feb 2, 2015

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Feb 2, 2015

From @iabyn

On Mon, Feb 02, 2015 at 11:01:09PM +1100, sisyphus1@optusnet.com.au wrote:

The string '0 but true' doesn't look much like a number to me, but this
strange behaviour seems to have been round for quite a while. I find it in
5.8.8, 5.20.0, and current blead.

It's documented in perlfunc:

  ...
  C<"0 but true"> in Perl. This string is true in boolean context and C<0>
  in numeric context. It is also exempt from the normal B<-w> warnings
  on improper numeric conversions.

and has been that way since perl 3.

--
All wight. I will give you one more chance. This time, I want to hear
no Wubens. No Weginalds. No Wudolf the wed-nosed weindeers.
  -- Life of Brian

p5pRT commented Feb 2, 2015

From @demerphq

On 2 February 2015 at 15:50, Dave Mitchell <davem@iabyn.com> wrote:

On Mon, Feb 02, 2015 at 11:01:09PM +1100, sisyphus1@optusnet.com.au wrote:

The string '0 but true' doesn't look much like a number to me, but this
strange behaviour seems to have been round for quite a while. I find it in
5.8.8, 5.20.0, and current blead.

It's documented in perlfunc:

  ...
  C<"0 but true"> in Perl. This string is true in boolean context and C<0>
  in numeric context. It is also exempt from the normal B<-w> warnings
  on improper numeric conversions.

and has been that way since perl 3.

Just to be clear I have no intention of breaking this. I just want to
make sure it does not set IOK when it is used.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Feb 2, 2015

From @cpansprout

On Mon Feb 02 02:51:59 2015, demerphq wrote:

IMO this is clearly a bug in the special case logic for "0 but true".
It should NOT set the IOK flag, it should set only the pIOK flag.

I will naturally try to fix this in Sereal, but I consider this a bug
in Perl and I am sending this to perlbug because of it.

Do you think "00" should also have the IOK flag off?

$ perl -le ' use Devel::Peek; $x = "00"; 0+$x; Dump $x'
SV = PVIV(0x7ff3da027410) at 0x7ff3da02c330
  REFCNT = 1
  FLAGS = (IOK,POK,pIOK,pPOK)
  IV = 0
  PV = 0x7ff3d9c06220 "00"\0
  CUR = 2
  LEN = 16

You could say that is not ‘a faithful representation of the PV’ because the numeric value would stringify differently.

But, honestly, apart from Sereal, are there any cases where such a distinction would matter? In practice, what does IOK mean? It means to skip a non-numeric warning and not even bother checking the PV. Anything else?

--

Father Chrysostomos

p5pRT commented Feb 3, 2015

From @demerphq

On 2 February 2015 at 21:44, Father Chrysostomos via RT
<perlbug-followup@perl.org> wrote:

Do you think "00" should also have the IOK flag off?

Yes.

$ perl -le ' use Devel::Peek; $x = "00"; 0+$x; Dump $x'
SV = PVIV(0x7ff3da027410) at 0x7ff3da02c330
  REFCNT = 1
  FLAGS = (IOK,POK,pIOK,pPOK)
  IV = 0
  PV = 0x7ff3d9c06220 "00"\0
  CUR = 2
  LEN = 16

You could say that is not ‘a faithful representation of the PV’ because the numeric value would stringify differently.

Yes.

But, honestly, apart from Sereal, are there any cases where such a distinction would matter? In practice, what does IOK mean? It means to skip a non-numeric warning and not even bother checking the PV. Anything else?

Every serialization package out there.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
