NaNcytyping appears to have regressed and broken again in 5.8.8 #8807

Open
p5pRT opened this issue Feb 28, 2007 · 13 comments

Comments

@p5pRT

p5pRT commented Feb 28, 2007

Migrated from rt.perl.org#41645 (status was 'open')

Searchable as RT41645$

@p5pRT

p5pRT commented Feb 28, 2007

From @obra

Created by jesse@bestpractical.com

Last night, I was chatting with a pythonista friend of mine who cracked a joke about how poorly 5.6 worked, using as his example "nancytyping." (0 + "Nancy" numifies to "NaN").

He admitted that this had been fixed for 5.8.x or so. As I was playing around, I discovered that

1) Yes, it had been fixed, at least by 5.8.4, and appears to have stayed fixed through 5.8.7

2) It appears to have broken again for 5.8.8 and a recent blead.

3) There are no tests for it. The following testfile should probably get added to core:

#!./perl

BEGIN {
  chdir 't' if -d 't';
  @INC = '../lib';
}

use strict;
use warnings;

use Test::More tests => 2;
{
no warnings;

# 5.6 and 5.8.8 and newer appear to treat "Nancy" as "NaN", leading to
# some very wrong coercion

is (0+"Nancy", 0);
is (0+"Fred", 0);
}

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.7:

Configured by Debian Project at Thu Dec 15 17:30:10 UTC 2005.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=linux, osvers=2.6.14.3, archname=i486-linux-gnu-thread-multi
    uname='linux ninsei 2.6.14.3 #1 smp preempt mon nov 28 19:51:50 pst 2005 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.7 -Dsitearch=/usr/local/lib/perl/5.8.7 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.7 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.0.3 20051201 (prerelease) (Debian 4.0.2-5)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.5.so, so=so, useshrplib=true, libperl=libperl.so.5.8.7
    gnulibc_version='2.3.5'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    SPRINTF0 - fixes for sprintf formatting issues - CVE-2005-3962


@INC for perl v5.8.7:
    /etc/perl
    /usr/local/lib/perl/5.8.7
    /usr/local/share/perl/5.8.7
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    /usr/local/lib/perl/5.8.4
    /usr/local/share/perl/5.8.4
    .


Environment for perl v5.8.7:
    HOME=/home/jesse
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT

p5pRT commented Mar 1, 2007

From @ysth

Jesse Vincent wrote​:

Last night, I was chatting with a pythonista friend of mine who cracked a
joke about how poorly 5.6 worked, using as his example "nancytyping." (0 +
"Nancy" numifies to "NaN").

He admitted that this had been fixed for 5.8.x or so. As I was playing
around, I discovered that

1) Yes, it had been fixed, at least by 5.8.4, and appears to have stayed
fixed through 5.8.7

2) It appears to have broken again for 5.8.8 and a recent blead.

3) There are no tests for it. The following testfile should probably get
added to core​:

To me, 0+"Nancy" being NaN is consistent with how numeric values are treated
in general; it's not much different from 0+"123beer" being 123.

@p5pRT

p5pRT commented Mar 1, 2007

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Mar 1, 2007

From @schwern

Yitzchak Scott-Thoennes wrote​:

To me, 0+"Nancy" being NaN is consistent with how numeric values are treated
in general; it's not much different from 0+"123beer" being 123.

Interpreting "123beer" as 123 is already pretty desperate but at least
understandable. Interpreting "Nancy" as NaN is well gone. And of course
"Infant" is inf. About the only reasonable special case of this sort of
thing is "Infinity".

At least they warn.

@p5pRT

p5pRT commented Mar 1, 2007

From @ysth

Yitzchak Scott-Thoennes wrote​:

To me, 0+"Nancy" being NaN is consistent with how numeric values are
treated
in general; it's not much different from 0+"123beer" being 123.

Interpreting "123beer" as 123 is already pretty desperate but at least
understandable. Interpreting "Nancy" as NaN is well gone. And of course
"Infant" is inf. About the only reasonable special case of this sort of
thing is "Infinity".

At least they warn.

The idea is that you be able to take a stringized number and get vaguely
the same number back out of it. And stringizing of NaNs and Infs is loosely
defined​: Inf properly stringizes as either "inf" or "infinity" (leaving
broken MSWin32 out of it) while an NaN stringizes as "nan" followed by
an optional, implementation-defined, series of other characters.

We could add another, more specific, warning when we notice that only
the first "Nan" in "Nancy" was consumed in getting the numeric value,
but I don't think that adds much value.
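
The round-trip idea can be illustrated in C. This is only a minimal sketch, assuming a C99-conforming libc (so not the MSWin32 runtime mentioned above); the helper name `round_trip` is made up for illustration and this is not how perl itself performs the conversion:

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format a double as text with "%g", then parse
 * that text back with strtod(). The exact spelling the libc picks for
 * NaN and infinity does not matter, as long as strtod() accepts it. */
double round_trip(double value) {
    char buf[64];
    snprintf(buf, sizeof buf, "%g", value);
    return strtod(buf, NULL);
}
```

On such a libc, `round_trip(INFINITY)` should still be infinite and `round_trip(NAN)` should still be a NaN, which is the "vaguely the same number back" property described above.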

@p5pRT

p5pRT commented Mar 1, 2007

From @nwc10

On Wed, Feb 28, 2007 at 09​:22​:28PM -0800, Michael G Schwern wrote​:

Yitzchak Scott-Thoennes wrote​:

To me, 0+"Nancy" being NaN is consistent with how numeric values are treated
in general; it's not much different from 0+"123beer" being 123.

Interpreting "123beer" as 123 is already pretty desperate but at least
understandable. Interpreting "Nancy" as NaN is well gone. And of course
"Infant" is inf. About the only reasonable special case of this sort of
thing is "Infinity".

At least they warn.

I thought that they have /\A-?inf(?:inity)?\z/i as the valid ways of spelling
infinity. I can't remember whether NaN is allowed to be /\Anan\z/i or if
it's also acceptable to say /\Anan\(.*\)\z/i

I *think* that all this broke between 5.6.x and 5.8.0, in that in 5.6.x
platforms where atof() recognised those strings got those values, and
everyone else got 0.0

I don't have a machine with 5.6.x where 'NaN' is NaN, but I tried this
test program

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  while (*++argv) {
    double f = atof(*argv);
    printf("'%s' is %g\n", *argv, f);
  }
  return 0;
}

on FreeBSD, Linux and Solaris and the results are, well, not what I'd hoped
for​:

$ ./atof 42 infinity nan nancy NaN infinite infinitesimal -inFerior
'42' is 42
'infinity' is inf
'nan' is nan
'nancy' is nan
'NaN' is nan
'infinite' is inf
'infinitesimal' is inf
'-inFerior' is -inf

$ ./atof 42 infinity nan nancy NaN infinite infinitesimal -inFerior
'42' is 42
'infinity' is inf
'nan' is nan
'nancy' is nan
'NaN' is nan
'infinite' is inf
'infinitesimal' is inf
'-inFerior' is -inf

$ ./atof 42 infinity nan nancy NaN infinite infinitesimal -inFerior
'42' is 42
'infinity' is Inf
'nan' is NaN
'nancy' is NaN
'NaN' is NaN
'infinite' is Inf
'infinitesimal' is Inf
'-inFerior' is -Inf

(and not what I thought happened when I chatted to Jesse about this on IRC)

Well, at least Solaris still is C, rather than C99​:

$ ./atof 0x3 0x3p2
'0x3' is 3
'0x3p2' is 12
$ uname
Linux

$ ./atof 0x3 0x3p2
'0x3' is 3
'0x3p2' is 12
$ uname
FreeBSD

$ ./atof 0x3 0x3p2
'0x3' is 0
'0x3p2' is 0
$ uname
SunOS

It was that bit of C99 stupidity that prompted this whole issue, IIRC.

You thought long long was bad - wait until you realise that they changed the
documented behaviour of a library function without changing the name. Your
libc can't be both C89 and C99 conformant.

Nicholas Clark

@p5pRT

p5pRT commented Mar 1, 2007

From @demerphq

On 3/1/07, Nicholas Clark <nick@​ccl4.org> wrote​:

On Wed, Feb 28, 2007 at 09​:22​:28PM -0800, Michael G Schwern wrote​:

Yitzchak Scott-Thoennes wrote​:

To me, 0+"Nancy" being NaN is consistent with how numeric values are treated
in general; it's not much different from 0+"123beer" being 123.

Interpreting "123beer" as 123 is already pretty desperate but at least
understandable. Interpreting "Nancy" as NaN is well gone. And of course
"Infant" is inf. About the only reasonable special case of this sort of
thing is "Infinity".

At least they warn.

I thought that they have /\A-?inf(?:inity)?\z/i as the valid ways of spelling
infinity. I can't remember whether NaN is allowed to be /\Anan\z/i or if
it's also acceptable to say /\Anan\(.*\)\z/i

I *think* that all this broke between 5.6.x and 5.8.0, in that in 5.6.x
platforms where atof() recognised those strings got those values, and
everyone else got 0.0

I don't have a machine with 5.6.x where 'NaN' is NaN, but I tried this
test program

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  while (*++argv) {
    double f = atof(*argv);
    printf("'%s' is %g\n", *argv, f);
  }
  return 0;
}

For the record here is what VC7 does​:

D​:\dev\perl\ver>atof 42 infinity nan nancy NaN infinite infinitesimal -inFerior
'42' is 42
'infinity' is 0
'nan' is 0
'nancy' is 0
'NaN' is 0
'infinite' is 0
'infinitesimal' is 0
'-inFerior' is 0

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Mar 1, 2007

From @ysth

Nicholas Clark wrote​:

I *think* that all this broke between 5.6.x and 5.8.0, in that in 5.6.x
platforms where atof() recognised those strings got those values, and
everyone else got 0.0

But I hear Jesse saying that what you and I call broke, he called a fix.
Unless I completely misunderstood his bug report.

So what's your take on this, Nicholas?

I don't have a machine with 5.6.x where 'NaN' is NaN, but I tried this
test program

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  while (*++argv) {
    double f = atof(*argv);
    printf("'%s' is %g\n", *argv, f);
  }
  return 0;
}

on FreeBSD, Linux and Solaris and the results are, well, not what I'd
hoped
for​:

Not what you hoped for how? You might also try strtod and see how
much of the string was considered part of the number.
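
A rough sketch of that strtod() experiment, assuming a C99-style libc; the `parse_report` struct and the `parse_prefix` function are names invented here and are not part of the original test program. strtod() stores a pointer to the first character it did not consume, so the unparsed tail shows how much of the string counted as the number:

```c
#include <stdlib.h>

/* Hypothetical result type: the parsed value plus the unparsed tail. */
struct parse_report {
    double value;      /* what strtod() returned */
    const char *rest;  /* first character strtod() did not consume */
};

/* Parse the longest numeric prefix of `text` and report what was left. */
struct parse_report parse_prefix(const char *text) {
    struct parse_report report;
    char *end;
    report.value = strtod(text, &end);
    report.rest = end;
    return report;
}
```

With a libc that behaves like the Linux and FreeBSD results quoted in this thread, I'd expect `parse_prefix("nancy").rest` to point at "cy" and `parse_prefix("infinitesimal").rest` to point at "initesimal"; a libc that rejects both strings would leave `rest` pointing at the start of the input.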

$ ./atof 42 infinity nan nancy NaN infinite infinitesimal -inFerior
'42' is 42
'infinity' is inf
'nan' is nan
'nancy' is nan
'NaN' is nan
'infinite' is inf
'infinitesimal' is inf
'-inFerior' is -inf

$ ./atof 42 infinity nan nancy NaN infinite infinitesimal -inFerior
'42' is 42
'infinity' is inf
'nan' is nan
'nancy' is nan
'NaN' is nan
'infinite' is inf
'infinitesimal' is inf
'-inFerior' is -inf

$ ./atof 42 infinity nan nancy NaN infinite infinitesimal -inFerior
'42' is 42
'infinity' is Inf
'nan' is NaN
'nancy' is NaN
'NaN' is NaN
'infinite' is Inf
'infinitesimal' is Inf
'-inFerior' is -Inf

(and not what I thought happened when I chatted to Jesse about this on
IRC)

Well, at least Solaris still is C, rather than C99​:

$ ./atof 0x3 0x3p2
'0x3' is 3
'0x3p2' is 12
$ uname
Linux

$ ./atof 0x3 0x3p2
'0x3' is 3
'0x3p2' is 12
$ uname
FreeBSD

$ ./atof 0x3 0x3p2
'0x3' is 0
'0x3p2' is 0
$ uname
SunOS

It was that bit of C99 stupidity that prompted this whole issue, IIRC.

You thought long long was bad - wait until you realise that they changed
the
documented behaviour of a library function without changing the name. Your
libc can't be both C89 and C99 conformant.

Yeah, that really, really sucked.

@p5pRT

p5pRT commented Mar 1, 2007

From @ysth

demerphq wrote​:

For the record here is what VC7 does​:

D​:\dev\perl\ver>atof 42 infinity nan nancy NaN infinite infinitesimal
-inFerior
'42' is 42
'infinity' is 0
'nan' is 0
'nancy' is 0
'NaN' is 0
'infinite' is 0
'infinitesimal' is 0
'-inFerior' is 0

msvcrt (all versions, AFAIK) just plain doesn't do this.
Which is actually less awful given that it gets the stringification side
wrong too (inf being "1.#INF" or something crazy like that).

@p5pRT

p5pRT commented Mar 1, 2007

From @obra

On Thu, Mar 01, 2007 at 11​:32​:01AM -0800, Yitzchak Scott-Thoennes wrote​:

Nicholas Clark wrote​:

I *think* that all this broke between 5.6.x and 5.8.0, in that in 5.6.x
platforms where atof() recognised those strings got those values, and
everyone else got 0.0

But I hear Jesse saying that what you and I call broke, he called a fix.
Unless I completely misunderstood his bug report.

The backstory I'd gotten from folks was "this was broken in 5.6, fixed
for 5.8 and.. oh hey. broke again in 5.8.8"

I have no investment in one outcome or the other. (If you're trying to
add nancy and zero, you deserve to get hurt, IMO ;) And you do get a
warning.

So, I think I'd like to see my tests end up in core to make sure that
the behaviour stays consistent, but I don't care if the logic is inverted.

-jesse

@p5pRT

p5pRT commented Mar 2, 2007

From @schwern

Yitzchak Scott-Thoennes wrote​:

Yitzchak Scott-Thoennes wrote​:

To me, 0+"Nancy" being NaN is consistent with how numeric values are
treated
in general; it's not much different from 0+"123beer" being 123.

Interpreting "123beer" as 123 is already pretty desperate but at least
understandable. Interpreting "Nancy" as NaN is well gone. And of course
"Infant" is inf. About the only reasonable special case of this sort of
thing is "Infinity".

At least they warn.

The idea is that you be able to take a stringized number and get vaguely
the same number back out of it.

Yep, and I'm willing to bet that most folks would not guess that "Nancy" ==
nan and "Infant" == inf. They'd think 0 if they're even aware of the
behavior of "j random string" as a number.

And stringizing of NaNs and Infs is loosely
defined​: Inf properly stringizes as either "inf" or "infinity" (leaving
broken MSWin32 out of it) while an NaN stringizes as "nan" followed by
an optional, implementation-defined, series of other characters.

Ok, let's define it a little better then.

  lc $thing eq "nan" -> nan
  lc $thing eq "inf" or lc $thing eq "infinity" -> inf
  $number . $garbage -> $number and a warning

Everything else​: 0 and a warning.
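
These rules can be sketched in C. This is only an illustration under stated assumptions, not perl's actual numification code: `numify_result` and `strict_numify` are hypothetical names, signs on "inf" and leading whitespace are not handled, and a leading "0x" is treated as the number 0 followed by garbage so that a C99 strtod() never sees a hex float.

```c
#include <ctype.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical result type: the numeric value and whether to warn. */
struct numify_result {
    double value;
    bool warned;
};

/* True when `text`, lowercased, is exactly `lower_word`. */
static bool equals_ignoring_case(const char *text, const char *lower_word) {
    while (*text && *lower_word) {
        if (tolower((unsigned char)*text) != *lower_word) return false;
        text++;
        lower_word++;
    }
    return *text == '\0' && *lower_word == '\0';
}

struct numify_result strict_numify(const char *text) {
    struct numify_result result = { 0.0, false };

    /* Rules 1 and 2: only the exact words are special. */
    if (equals_ignoring_case(text, "nan")) {
        result.value = NAN;
        return result;
    }
    if (equals_ignoring_case(text, "inf") ||
        equals_ignoring_case(text, "infinity")) {
        result.value = INFINITY;
        return result;
    }

    /* Everything else must start like a decimal number; otherwise the
     * value is 0 with a warning ("Nancy", "Infant", "", ...). */
    const char *p = text;
    if (*p == '+' || *p == '-') p++;
    bool hex_prefix = p[0] == '0' && (p[1] == 'x' || p[1] == 'X');
    if (*p == '.') p++;
    if (!isdigit((unsigned char)*p) || hex_prefix) {
        result.warned = true;
        return result;
    }

    /* Rule 3: number followed by garbage gives the number and a warning. */
    char *end;
    result.value = strtod(text, &end);
    result.warned = *end != '\0';
    return result;
}
```

Under this sketch `strict_numify("Nancy")` and `strict_numify("Infant")` both give 0 with the warning flag set, while `strict_numify("123beer")` gives 123 with the warning flag set.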

I'll bet this is all just a consequence of logic like "find the longest
prefix which looks like a number".

OTOH the heuristic of "it will use the longest prefix which looks like a
number" is easier to explain... although most people don't think of nan and
inf as numbers.

@p5pRT

p5pRT commented Mar 2, 2007

From @ysth

Michael G Schwern wrote:

Yitzchak Scott-Thoennes wrote​:

Yitzchak Scott-Thoennes wrote​:

To me, 0+"Nancy" being NaN is consistent with how numeric values are
treated
in general; it's not much different from 0+"123beer" being 123.

Interpreting "123beer" as 123 is already pretty desperate but at least
understandable. Interpreting "Nancy" as NaN is well gone. And of
course
"Infant" is inf. About the only reasonable special case of this sort
of
thing is "Infinity".

At least they warn.

The idea is that you be able to take a stringized number and get vaguely
the same number back out of it.

Yep, and I'm willing to bet that most folks would not guess that "Nancy" ==
nan and "Infant" == inf. They'd think 0 if they're even aware of the
behavior of "j random string" as a number.

And stringizing of NaNs and Infs is loosely
defined​: Inf properly stringizes as either "inf" or "infinity" (leaving
broken MSWin32 out of it) while an NaN stringizes as "nan" followed by
an optional, implementation-defined, series of other characters.

Ok, let's define it a little better then.

  lc $thing eq "nan" -> nan
  lc $thing eq "inf" or lc $thing eq "infinity" -> inf
  $number . $garbage -> $number and a warning

Everything else​: 0 and a warning.

I was describing NV -> string and trying to say that string -> NV should
work the same. But I just see you describing string -> NV. Are you
meaning to suggest changing NV -> string conversion too?

I'll bet this is all just a consequence of logic like "find the longest
prefix which looks like a number".

OTOH the heuristic of "it will use the longest prefix which looks like a
number" is easier to explain... although most people don't think of nan
and inf as numbers.

That heuristic certainly wins the "whatever is shortest to document" front,
which is a big plus to me.

@p5pRT

p5pRT commented Mar 2, 2007

From @schwern

Yitzchak Scott-Thoennes wrote​:

And stringizing of NaNs and Infs is loosely
defined​: Inf properly stringizes as either "inf" or "infinity" (leaving
broken MSWin32 out of it) while an NaN stringizes as "nan" followed by
an optional, implementation-defined, series of other characters.

Ok, let's define it a little better then.

  lc $thing eq "nan" -> nan
  lc $thing eq "inf" or lc $thing eq "infinity" -> inf
  $number . $garbage -> $number and a warning

Everything else​: 0 and a warning.

I was describing NV -> string and trying to say that string -> NV should
work the same. But I just see you describing string -> NV. Are you
meaning to suggest changing NV -> string conversion too?

Yeah, wasn't that the whole point of the bug?

I thought NV -> string was pretty well handled, if not exactly spelled out
in the docs. NaN -> "nan" and Infinity -> "inf".
