support hexadecimal floats #13966

Closed
p5pRT opened this issue Jul 3, 2014 · 66 comments

@p5pRT

p5pRT commented Jul 3, 2014

Migrated from rt.perl.org#122219 (status was 'resolved')

Searchable as RT122219$

@p5pRT

p5pRT commented Jul 3, 2014

From @jhi

[resubmitting since I think the grues ate my first attempt]

Perl could support hexadecimal floats:

* literals: 0xh.hhhp[+-]?NNN, e.g. 0x1.47ae147ae147bp-7 is 0.1
* printf %a %A
* input (PV->NV): "0xh.hhhpnnn" + 3

Lack of %a noted by Dan Kogai: https://groups.google.com/d/msg/perl.perl5.porters/c84JU0olnbQ/YwQczyrqE2YJ
Pointer given by Dan: http://en.wikipedia.org/wiki/Printf_format_string#Type

Possibly useful resource: http://www.exploringbinary.com/hexadecimal-floating-point-constants/ found by quick googling.

Ruby does support the %a %A as noted by Dan, and Python has float.hex() and float.fromhex().

@p5pRT

p5pRT commented Jul 3, 2014

From @jhi

* literals​: 0xh.hhhp[+-]?NNN, e.g. 0x1.47ae147ae147bp-7 is 0.1

Oops, 0.01

@p5pRT

p5pRT commented Jul 3, 2014

From @iabyn

On Wed, Jul 02, 2014 at 07:49:46PM -0700, Jarkko Hietaniemi wrote:

Perl could support hexadecimal floats:

* literals: 0xh.hhhp[+-]?NNN, e.g. 0x1.47ae147ae147bp-7 is 0.1

Wouldn't that change the meaning of existing legal syntax: e.g.

  print 0x1.10;

which currently prints "110", but would change to print "1.0625"

--
No matter how many dust sheets you use, you will get paint on the carpet.

@p5pRT

p5pRT commented Jul 3, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Jul 3, 2014

From @ilmari

Dave Mitchell <davem@iabyn.com> writes:

On Wed, Jul 02, 2014 at 07:49:46PM -0700, Jarkko Hietaniemi wrote:

Perl could support hexadecimal floats:

* literals: 0xh.hhhp[+-]?NNN, e.g. 0x1.47ae147ae147bp-7 is 0.1
                   ^^^^^^^^^                        ^^^

Wouldn't that change the meaning of existing legal syntax: e.g.

  print 0x1.10;

which currently prints "110", but would change to print "1.0625"

  $ perl -e 'print 0x1.10p+0'
  Bareword found where operator expected at -e line 1, near "10p"
  (Missing operator before p?)
  syntax error at -e line 1, near "10p
  "
  Execution of -e aborted due to compilation errors.

--
"A disappointingly low fraction of the human race is,
at any given time, on fire." - Stig Sandbeck Mathisen

@p5pRT

p5pRT commented Jul 3, 2014

From @jhi

On Thursday-201407-03, 8:26, Dagfinn Ilmari Mannsåker via RT wrote:

Dave Mitchell <davem@iabyn.com> writes:

On Wed, Jul 02, 2014 at 07:49:46PM -0700, Jarkko Hietaniemi wrote:

Perl could support hexadecimal floats:

* literals: 0xh.hhhp[+-]?NNN, e.g. 0x1.47ae147ae147bp-7 is 0.1
                   ^^^^^^^^^                        ^^^

Wouldn't that change the meaning of existing legal syntax: e.g.

 print 0x1.10;

which currently prints "110", but would change to print "1.0625"

 $ perl -e 'print 0x1.10p+0'
 Bareword found where operator expected at -e line 1, near "10p"
          (Missing operator before p?)
 syntax error at -e line 1, near "10p
 "
 Execution of -e aborted due to compilation errors.

Yeah, I think the 'p' (hmm, is that 'P' with %A?) is a mandatory part of
the package.

@p5pRT

p5pRT commented Jul 3, 2014

From @Hugmeir

On Thu, Jul 3, 2014 at 2:34 PM, Jarkko Hietaniemi <jhi@iki.fi> wrote:

On Thursday-201407-03, 8:26, Dagfinn Ilmari Mannsåker via RT wrote:

Dave Mitchell <davem@iabyn.com> writes:

On Wed, Jul 02, 2014 at 07:49:46PM -0700, Jarkko Hietaniemi wrote:

Perl could support hexadecimal floats:

* literals: 0xh.hhhp[+-]?NNN, e.g. 0x1.47ae147ae147bp-7 is 0.1
                   ^^^^^^^^^                        ^^^

Wouldn't that change the meaning of existing legal syntax: e.g.

 print 0x1.10;

which currently prints "110", but would change to print "1.0625"

 $ perl -e 'print 0x1.10p+0'
 Bareword found where operator expected at -e line 1, near "10p"
          (Missing operator before p?)
 syntax error at -e line 1, near "10p
 "
 Execution of -e aborted due to compilation errors.

Yeah, I think the 'p' (hmm, is that 'P' with %A?) is a mandatory part of the
package.

sub deadbeefp () {3}
0x1.deadbeefp+0

Personally, I think adding the construct + a deprecation warning for
pathological cases is a good enough (tm) tradeoff.
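
To make the point about the mandatory 'p' concrete, here is a minimal C sketch of a scanner that accepts a hexfloat literal only when a binary exponent is present. The function name `hexfloat_literal_length` is hypothetical and this is not the code from the patches; underscores between digits are ignored to keep it short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical scanner: returns the length of the hexfloat literal at the
 * start of s, or 0 if there is none.  The binary exponent (p or P followed
 * by an optionally signed decimal number) is mandatory. */
static size_t hexfloat_literal_length(const char *s)
{
    const char *p = s;
    size_t digits = 0;

    assert(s != NULL);
    if (p[0] != '0' || (p[1] != 'x' && p[1] != 'X'))
        return 0;
    p += 2;

    /* Hex digits before the optional point. */
    while (isxdigit((unsigned char)*p)) { p++; digits++; }
    if (*p == '.') {
        p++;
        /* Hex digits after the point. */
        while (isxdigit((unsigned char)*p)) { p++; digits++; }
    }
    if (digits == 0)
        return 0;

    /* Without the exponent this is not a hexfloat at all. */
    if (*p != 'p' && *p != 'P')
        return 0;
    p++;
    if (*p == '+' || *p == '-')
        p++;
    if (!isdigit((unsigned char)*p))
        return 0;
    while (isdigit((unsigned char)*p))
        p++;

    return (size_t)(p - s);
}
```

Under this rule `0x1.10` is not a hexfloat, so it would keep its current meaning (concatenation), while `0x1.10p+0` and the pathological `0x1.deadbeefp+0` are both consumed whole as literals.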

@p5pRT

p5pRT commented Jul 3, 2014

From @iabyn

On Thu, Jul 03, 2014 at 01:25:54PM +0100, Dagfinn Ilmari Mannsåker wrote:

Dave Mitchell <davem@iabyn.com> writes:

On Wed, Jul 02, 2014 at 07:49:46PM -0700, Jarkko Hietaniemi wrote:

Perl could support hexadecimal floats:

* literals: 0xh.hhhp[+-]?NNN, e.g. 0x1.47ae147ae147bp-7 is 0.1
                   ^^^^^^^^^                        ^^^

Wouldn't that change the meaning of existing legal syntax: e.g.

 print 0x1.10;

which currently prints "110", but would change to print "1.0625"

 $ perl -e 'print 0x1.10p+0'
 Bareword found where operator expected at -e line 1, near "10p"
          (Missing operator before p?)
 syntax error at -e line 1, near "10p
 "
 Execution of -e aborted due to compilation errors.

Ah sorry, didn't spot the p.

--
You're only as old as you look.

@p5pRT

p5pRT commented Jul 3, 2014

From @jhi

Yeah, I think the 'p' (hmm, is that 'P' with %A?) is a mandatory part of the
package.

sub deadbeefp () {3}
0x1.deadbeefp+0

You have a twisted mind, and this is a compliment.

Personally, I think adding the construct + a deprecation warning for
pathological cases is a good enough (tm) tradeoff.

Based on
http://grep.cpan.me/?q=0x%5B0-9a-f%5D%2B%5C.%5B0-9a-f%5D%2Bp%5B%2B-%5D%5Cd%2B
(that's /0x[0-9a-f]+\.[0-9a-f]+p[+-]\d+/) I wouldn't bother even with a
warning. (All the hits seem to be to modules which already somehow try
to handle this currently non-native format.)

@p5pRT

p5pRT commented Aug 2, 2014

From @jhi

So I did some hacking to get this working for at least *printf and literals, and two patches are attached.
I cheated and just punted to using sprintf/strtod.

However: the "hexadecimal floats" support seems to be quite... interesting. As in "interesting times" interesting.

So it's a C99 feature. Output with sprintf %a %A, input with strtod (or strtold). In theory.

The attached patches (and their tests) work with:

OSX x86
Linux x86
Linux x86 -Duselongdouble

(I *think* the output side at least did work in win32, but the win32 smoker must be overwhelmed or something, I seem to get no results)

But cracks start to appear...

OS X x86 with -Duselongdouble has differences in the *printf output
Solaris x86 fails completely on input (as if strtod would not parse hexfloats at all, haven't dug into it)

On the output side, differences arise easily since we are talking about floats: the exponent may float.
0x1.999999999999ap-4 is 0xc.ccccccccccccccdp-7 (Linux "normal" doubles vs "long doubles")

But even what the basic %a means seems to be up to interpretation:
not ok 1420 - '%a' '1' -> '0x1.0000000000000p+0' cf '0x1p+0' (Solaris)

But if strtod is not working, I don't feel like rewriting David Gay's dtoa.c (the canonical strtod source for many operating systems, such as the BSDs, and for other OSS projects): http://www.netlib.org/fp/dtoa.c

If output is not working (or needs to be standardized), we need to dig into the fp bits ourselves. I found this in NetBSD: https://github.com/rumpkernel/netbsd-userspace-src/blob/master/lib/libc/gdtoa/hdtoa.c
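
Since the same value can be spelled with different leading digits and exponents, test expectations are more robust when they compare parsed values instead of strings. A minimal C sketch of that idea follows; `hexfloat_same_value` is a hypothetical name, and the sketch assumes the platform's strtod accepts C99 hexfloat input (which, as noted above, is not the case everywhere).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if both strings parse completely as
 * floating point numbers and denote the same double, 0 otherwise.
 * Assumes strtod understands the C99 "0x...p..." syntax. */
static int hexfloat_same_value(const char *a, const char *b)
{
    char *end_a = NULL;
    char *end_b = NULL;
    double va, vb;

    assert(a != NULL && b != NULL);
    va = strtod(a, &end_a);
    vb = strtod(b, &end_b);

    /* Reject empty parses and trailing garbage. */
    if (end_a == a || *end_a != '\0' || end_b == b || *end_b != '\0')
        return 0;
    return va == vb;
}
```

With this, the Solaris-style `0x1.0000000000000p+0` and the `0x1p+0` the test expected compare equal, as do `0x1p+0` and the long-double-style `0x8p-3`. Spellings that carry more digits than a double holds (such as the 0xc.ccc... form of 0.1) are rounded by strtod first, so a tolerance-based comparison may be the safer choice there.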

@p5pRT

p5pRT commented Aug 2, 2014

From @jhi

0001-Hexfloat-sprintf-a-A-part-of-perl-122219.patch
From aab62f78c4f785265ec874e220e45ec4a0653b06 Mon Sep 17 00:00:00 2001
From: Jarkko Hietaniemi <jhi@iki.fi>
Date: Wed, 30 Jul 2014 21:59:57 -0400
Subject: [PATCH 1/2] Hexfloat sprintf %a/%A, part of perl #122219

Just punt the task to system printf, do whatever it does
for %a/%A (%[efgEFG] are handled likewise).

Let me count the ways this can go wrong:
(1) long doubles
(2) no %a (it's C99)
(3) different implementations of %a
(4) broken implementations of %a
(5) IEEE 754 does not define endianness (big, little, mixed (some arms))
(6) non-IEEE-754 formats (vax, cray, ibm, ...)
---
 pod/perlfunc.pod |  7 +++++++
 sv.c             | 38 ++++++++++++++++++++++++++++++--------
 t/op/sprintf.t   |  4 ++--
 t/op/sprintf2.t  | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 94 insertions(+), 11 deletions(-)

diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 173615b..877dc71 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -7109,6 +7109,8 @@ In addition, Perl permits the following widely-supported conversions:
    %p    a pointer (outputs the Perl value's address in hexadecimal)
    %n    special: *stores* the number of characters output so far
          into the next argument in the parameter list
+   %a    hexadecimal floats
+   %A    like %a, but using upper-case letters
 
 Finally, for backward (and we do mean "backward") compatibility, Perl
 permits these unnecessary but widely-supported conversions:
@@ -7125,6 +7127,11 @@ exponent less than 100 is system-dependent: it may be three or less
 (zero-padded as necessary).  In other words, 1.23 times ten to the
 99th may be either "1.23e99" or "1.23e099".
 
+Note that the hexadecimal digits produced by C<%a> and C<%A> are
+system-dependent: most machines use the 64-bit IEEE 754 double
+precision floating point, but some do not.  Watch out especially
+for the C<uselongdouble> Perl configuration option.
+
 Between the C<%> and the format letter, you may specify several
 additional attributes controlling the interpretation of the format.
 In order, these are:
diff --git a/sv.c b/sv.c
index afd4376..df2f54c 100644
--- a/sv.c
+++ b/sv.c
@@ -11376,6 +11376,7 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
 	case 'e': case 'E':
 	case 'f':
 	case 'g': case 'G':
+	case 'a': case 'A':
 	    if (vectorize)
 		goto unknown;
 
@@ -11428,14 +11429,30 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
 	    /* nv * 0 will be NaN for NaN, +Inf and -Inf, and 0 for anything
 	       else. frexp() has some unspecified behaviour for those three */
 	    if (c != 'e' && c != 'E' && (nv * 0) == 0) {
-		i = PERL_INT_MIN;
-		/* FIXME: if HAS_LONG_DOUBLE but not USE_LONG_DOUBLE this
-		   will cast our (long double) to (double) */
-		(void)Perl_frexp(nv, &i);
-		if (i == PERL_INT_MIN)
-		    Perl_die(aTHX_ "panic: frexp");
-		if (i > 0)
-		    need = BIT_DIGITS(i);
+                i = PERL_INT_MIN;
+                /* FIXME: if HAS_LONG_DOUBLE but not USE_LONG_DOUBLE this
+                   will cast our (long double) to (double) */
+                (void)Perl_frexp(nv, &i);
+                if (i == PERL_INT_MIN)
+                    Perl_die(aTHX_ "panic: frexp");
+                if (c == 'a' || c == 'A') {
+                    /* This computation probably overshoots,
+                     * but that is better than undershooting. */
+                    need +=
+                        (nv < 0) + /* possible unary minus */
+                        2 + /* "0x" */
+                        2 + /* "1." */
+                        /* We want one byte per each 4 bits in the
+                         * mantissa.  This works out to about 0.83
+                         * bytes per NV decimal digit (of 4 bits):
+                         * (NV_DIG * log(10)/log(2)) / 4 */
+                        ((NV_DIG * 5) / 6 + 1) +
+                        2 + /* "p+" */
+                        (i >= 0 ? BIT_DIGITS(i) : 1 + BIT_DIGITS(-i)) +
+                        1;   /* \0 */
+                } else if (i > 0) {
+                    need = BIT_DIGITS(i);
+                } /* if i < 0, the number of digits is hard to predict. */
 	    }
 	    need += has_precis ? precis : 6; /* known default */
 
@@ -11573,6 +11590,11 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
 
                 STORE_LC_NUMERIC_SET_TO_NEEDED();
 
+                /* XXX Configure test for sprintf %a/%A support.
+                 * It is a C99 feature, but might be implemented elsewhere.
+                 * The bad news is that if there is no support,
+                 * we would need to implement %a/%A ourselves. */
+
                 /* hopefully the above makes ptr a very constrained format
                  * that is safe to use, even though it's not literal */
                 GCC_DIAG_IGNORE(-Wformat-nonliteral);
diff --git a/t/op/sprintf.t b/t/op/sprintf.t
index 4c41b16..234a7d6 100644
--- a/t/op/sprintf.t
+++ b/t/op/sprintf.t
@@ -179,7 +179,7 @@ __END__
 >%6. 6s<    >''<          >%6. 6s INVALID REDUNDANT< >(See use of $w in code above)<
 >%6 .6s<    >''<          >%6 .6s INVALID REDUNDANT<
 >%6.6 s<    >''<          >%6.6 s INVALID REDUNDANT<
->%A<        >''<          >%A INVALID REDUNDANT<
+>%A<        >0<           ><	 >tested in sprintf2.t skip: all<
 >%B<        >2**32-1<     >11111111111111111111111111111111<
 >%+B<       >2**32-1<     >11111111111111111111111111111111<
 >%#B<       >2**32-1<     >0B11111111111111111111111111111111<
@@ -213,7 +213,7 @@ __END__
 >%#X<       >2**32-1<     >0XFFFFFFFF<
 >%Y<        >''<          >%Y INVALID REDUNDANT<
 >%Z<        >''<          >%Z INVALID REDUNDANT<
->%a<        >''<          >%a INVALID REDUNDANT<
+>%a<        >0<           ><	 >tested in sprintf2.t skip: all<
 >%b<        >2**32-1<     >11111111111111111111111111111111<
 >%+b<       >2**32-1<     >11111111111111111111111111111111<
 >%#b<       >2**32-1<     >0b11111111111111111111111111111111<
diff --git a/t/op/sprintf2.t b/t/op/sprintf2.t
index 6fd0bde..72bde57 100644
--- a/t/op/sprintf2.t
+++ b/t/op/sprintf2.t
@@ -12,7 +12,54 @@ BEGIN {
 eval { my $q = pack "q", 0 };
 my $Q = $@ eq '';
 
-plan tests => 1406 + ($Q ? 0 : 12);
+# %a and %A depend on the floating point config
+# This totally doesn't test non-IEEE-754 float formats.
+my @hexfloat;
+if ($Config{nvsize} == 8) { # IEEE-754, we hope, the most common out there
+    @hexfloat =  (
+        [ '%a',      '0',       '0x0p+0' ],
+        [ '%a',      '1',       '0x1p+0' ],
+        [ '%a',      '1.0',     '0x1p+0' ],
+        [ '%a',      '3.14',    '0x1.91eb851eb851fp+1' ],
+        [ '%a',      '-1.0',    '-0x1p+0' ],
+        [ '%a',      '-3.14',   '-0x1.91eb851eb851fp+1' ],
+        [ '%a',      '0.1',     '0x1.999999999999ap-4' ],
+        [ '%a',      '2**-10',  '0x1p-10' ],
+        [ '%a',      '2**10',   '0x1p+10' ],
+        [ '%a',      '1e-9',    '0x1.12e0be826d695p-30' ],
+        [ '%a',      '1e9',     '0x1.dcd65p+29' ],
+        [ '%13a',    '3.14',    '0x1.91eb851eb851fp+1' ],
+        [ '%.7a',    '3.14',    '0x1.91eb852p+1' ],
+        [ '%.8a',    '3.14',    '0x1.91eb851fp+1' ],
+        [ '%.20a',   '3.14',    '0x1.91eb851eb851f0000000p+1' ],
+        [ '%20.10a', '3.14',    '   0x1.91eb851eb8p+1' ],
+        [ '%20.15a', '3.14',    '0x1.91eb851eb851f00p+1' ],
+        [ '%A',      '3.14',    '0X1.91EB851EB851FP+1' ],
+        );
+} elsif ($Config{nvsize} == 16) { # x86 long double, at least
+    @hexfloat =  (
+        [ '%a',      '0',      '0x0p+0' ],
+        [ '%a',      '1',      '0x8p-3' ],
+        [ '%a',      '1.0',    '0x8p-3' ],
+        [ '%a',      '3.14',   '0xc.8f5c28f5c28f5c3p-2' ],
+        [ '%a',      '-1.0',   '-0x8p-3' ],
+        [ '%a',      '-3.14',  '-0xc.8f5c28f5c28f5c3p-2' ],
+        [ '%a',      '0.1',    '0xc.ccccccccccccccdp-7' ],
+        [ '%a',      '2**-10', '0x8p-13' ],
+        [ '%a',      '2**10',  '0x8p+7' ],
+        [ '%a',      '1e-9',   '0x8.9705f4136b4a597p-33' ],
+        [ '%a',      '1e9',    '0xe.e6b28p+26' ],
+        [ '%13a',    '3.14',   '0xc.8f5c28f5c28f5c3p-2' ],
+        [ '%.7a',    '3.14',   '0xc.8f5c28fp-2' ],
+        [ '%.8a',    '3.14',   '0xc.8f5c28f6p-2' ],
+        [ '%.20a',   '3.14',   '0xc.8f5c28f5c28f5c300000p-2' ],
+        [ '%20.10a', '3.14',   '   0xc.8f5c28f5c3p-2' ],
+        [ '%20.15a', '3.14',   '0xc.8f5c28f5c28f5c3p-2' ],
+        [ '%A',      '3.14',   '0XC.8F5C28F5C28F5C3P-2' ],
+        );
+}
+
+plan tests => 1406 + ($Q ? 0 : 12) + @hexfloat;
 
 use strict;
 use Config;
@@ -336,3 +383,10 @@ is $o::count, '1', 'sprinf %1s overload count';
 $o::count = 0;
 () = sprintf "%.1s", $o;
 is $o::count, '1', 'sprinf %.1s overload count';
+
+for my $t (@hexfloat) {
+    my ($format, $arg, $expected) = @$t;
+    $arg = eval $arg;
+    my $result = sprintf($format, $arg);
+    is($result, $expected, "'$format' '$arg' -> '$result' cf '$expected'");
+}
-- 
1.8.5.2 (Apple Git-48)
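
The buffer estimate in the sv.c hunk can also be derived from `<float.h>` limits. The following is a rough sketch of that arithmetic for plain doubles, not the patch's code: the helper names are hypothetical, and it assumes the "0x1.hhh...p+d" layout at default precision, so explicit widths, explicit precisions, and long doubles are out of scope.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical helper: number of decimal digits in a non-negative int. */
static size_t decimal_digits(int n)
{
    size_t count = 1;

    assert(n >= 0);
    while (n >= 10) {
        n /= 10;
        count++;
    }
    return count;
}

/* Hypothetical helper: bytes needed to hold "%a" output for any finite
 * double at default precision, assuming the "0x1.hhh...p+d" layout. */
static size_t hexfloat_buffer_size(void)
{
    /* Fraction bits after the leading digit, rounded up to hex digits. */
    size_t frac_digits = (DBL_MANT_DIG - 1 + 3) / 4;
    /* Largest exponent magnitude: the smallest subnormal is
     * 2**(DBL_MIN_EXP - DBL_MANT_DIG). */
    int max_exponent = DBL_MANT_DIG - DBL_MIN_EXP;

    return 1                               /* possible minus sign */
         + 2                               /* "0x" */
         + 2                               /* leading digit and "." */
         + frac_digits
         + 2                               /* "p+" or "p-" */
         + decimal_digits(max_exponent)
         + 1;                              /* terminating NUL */
}
```

For a 64-bit IEEE 754 double this works out to 13 fraction digits and a four-digit exponent, 25 bytes in total. An implementation that pads or normalizes differently would need its own bound.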

@p5pRT

p5pRT commented Aug 2, 2014

From @jhi

0002-Hexfloat-literals-part-of-perl-122219.patch
From 4d7069f0e1cf210e0cf8a3385cfb5e5716a5303b Mon Sep 17 00:00:00 2001
From: Jarkko Hietaniemi <jhi@iki.fi>
Date: Thu, 31 Jul 2014 12:37:58 -0400
Subject: [PATCH 2/2] Hexfloat literals, part of perl #122219

Punt to strtod/strtold, just like with decimal floats.

The hexfloat support is a C99 feature, like its converse %a/%A.
---
 MANIFEST         |   1 +
 pod/perldata.pod |   8 +++++
 pod/perldiag.pod |  17 ++++++++++
 t/op/hexfloat.t  |  78 +++++++++++++++++++++++++++++++++++++++++++
 t/op/sprintf2.t  |   8 +++++
 toke.c           | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
 6 files changed, 203 insertions(+), 9 deletions(-)
 create mode 100644 t/op/hexfloat.t

diff --git a/MANIFEST b/MANIFEST
index 54c5bea..5b99b16 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -5086,6 +5086,7 @@ t/op/hash-rt85026.t		See if hash iteration/deletion works
 t/op/hash.t			See if the complexity attackers are repelled
 t/op/hashwarn.t			See if warnings for bad hash assignments work
 t/op/heredoc.t			See if heredoc edge and corner cases work
+t/op/hexfloat.t			See if hexadecimal float literals work
 t/op/inccode.t			See if coderefs work in @INC
 t/op/inccode-tie.t		See if tie to @INC works
 t/op/incfilter.t		See if the source filters in coderef-in-@INC work
diff --git a/pod/perldata.pod b/pod/perldata.pod
index d8edfe9..40d3336 100644
--- a/pod/perldata.pod
+++ b/pod/perldata.pod
@@ -402,6 +402,7 @@ integer formats:
     0xdead_beef         # more hex   
     0377                # octal (only numbers, begins with 0)
     0b011011            # binary
+    0x1.999ap-4         # hexadecimal floating point
 
 You are allowed to use underscores (underbars) in numeric literals
 between digits for legibility (but not multiple underscores in a row:
@@ -425,6 +426,13 @@ Hexadecimal, octal, or binary, representations in string literals
 representation.  The hex() and oct() functions make these conversions
 for you.  See L<perlfunc/hex> and L<perlfunc/oct> for more details.
 
+Hexadecimal floating point is useful for accurately presenting
+floating point values, avoiding conversions to or from decimal floating
+point, and therefore avoiding possible loss in precision.  Notice
+that while most current platforms use 64-bit IEEE 754 floating point,
+not all do.  For example x86 platforms can be configured with "long doubles",
+which are not compatible with normal "doubles".
+
 You can also embed newlines directly in your strings, i.e., they can end
 on a different line than they begin.  This is nice, but if you forget
 your trailing quote, the error will not be reported until Perl finds
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index e41c8cc..d3553bd 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -2172,6 +2172,23 @@ created on an emergency basis to prevent a core dump.
 (F) The parser has given up trying to parse the program after 10 errors.
 Further error messages would likely be uninformative.
 
+=item Hexadecimal float malformed: '%s'
+
+(W syntax) The hexadecimal float literal (like 0x12.34p5) could not be
+parsed in full by the system's string-to-float conversion.
+
+=item Hexadecimal float overflow: '%s'
+
+(W syntax) Hexadecimal float literal overflowed.
+
+=item Hexadecimal float underflow: '%s'
+
+(W syntax) Hexadecimal float literal underflowed.
+
+=item Hexadecimal float unsupported: '%s'
+
+(F) Hexadecimal float literals (like 0x12.34p5) are unsupported in this system.
+
 =item Hexadecimal number > 0xffffffff non-portable
 
 (W portable) The hexadecimal number you specified is larger than 2**32-1
diff --git a/t/op/hexfloat.t b/t/op/hexfloat.t
new file mode 100644
index 0000000..eb8f6bb
--- /dev/null
+++ b/t/op/hexfloat.t
@@ -0,0 +1,78 @@
+#!./perl
+
+use strict;
+
+BEGIN {
+    chdir 't' if -d 't';
+    require './test.pl';
+}
+
+plan(tests => 38);
+
+# Test hexfloat literals.
+
+is(0x1p0, 1);
+is(0x1.p0, 1);
+is(0x1.0p0, 1);
+
+is(0x1p1, 2);
+is(0x1.p1, 2);
+is(0x1.0p1, 2);
+
+is(0x.1p0, 0.0625);
+is(0x0.1p0, 0.0625);
+
+# Positive exponents.
+is(0x1p2, 4);
+is(0x1p+2, 4);
+
+# Negative exponents.
+is(0x1p-1, 0.5);
+is(0x1.p-1, 0.5);
+is(0x1.0p-1, 0.5);
+
+is(0x1p+2, 4);
+is(0x1p-2, 0.25);
+
+is(0x3p+2, 12);
+is(0x3p-2, 0.75);
+
+# Negative sign.
+is(-0x1p+2, -4);
+is(-0x1p-2, -0.25);
+
+is(0x0.10p0, 0.0625);
+is(0x0.1p0, 0.0625);
+is(0x.1p0, 0.0625);
+
+is(0x12p+3, 144);
+is(0x12p-3, 2.25);
+
+# Hexdigits (lowercase).
+is(0x9p+0, 9);
+is(0xap+0, 10);
+is(0xfp+0, 15);
+is(0x10p+0, 16);
+is(0x11p+0, 17);
+is(0xabp+0, 171);
+is(0xab.cdp+0, 171.80078125);
+
+# Uppercase hexdigits and exponent prefix.
+is(0xAp+0, 10);
+is(0xFp+0, 15);
+is(0xABP+0, 171);
+is(0xAB.CDP+0, 171.80078125);
+
+# Underbars.
+is(0xa_b.c_dp+0, 171.80078125);
+
+# Note that the hexfloat representation is not unique
+# since the exponent can be shifted: no different from
+# 3e4 cf 30e3 cf 30000.
+
+# Needs to use within because of long doubles.
+within(0x1.999999999999ap-4, 0.1, 1e-9);
+within(0xc.ccccccccccccccdp-7, 0.1, 1e-9);
+
+# sprintf %a/%A testing is done in sprintf2.t,
+# trickier than necessary because of long doubles.
diff --git a/t/op/sprintf2.t b/t/op/sprintf2.t
index 72bde57..824c06a 100644
--- a/t/op/sprintf2.t
+++ b/t/op/sprintf2.t
@@ -34,7 +34,11 @@ if ($Config{nvsize} == 8) { # IEEE-754, we hope, the most common out there
         [ '%.20a',   '3.14',    '0x1.91eb851eb851f0000000p+1' ],
         [ '%20.10a', '3.14',    '   0x1.91eb851eb8p+1' ],
         [ '%20.15a', '3.14',    '0x1.91eb851eb851f00p+1' ],
+
         [ '%A',      '3.14',    '0X1.91EB851EB851FP+1' ],
+
+        [ '%a',      0x12.34p5,    '0x1.234p+9' ],
+        [ '%a',      0x1_2.3_4p5,  '0x1.234p+9' ],
         );
 } elsif ($Config{nvsize} == 16) { # x86 long double, at least
     @hexfloat =  (
@@ -55,7 +59,11 @@ if ($Config{nvsize} == 8) { # IEEE-754, we hope, the most common out there
         [ '%.20a',   '3.14',   '0xc.8f5c28f5c28f5c300000p-2' ],
         [ '%20.10a', '3.14',   '   0xc.8f5c28f5c3p-2' ],
         [ '%20.15a', '3.14',   '0xc.8f5c28f5c28f5c3p-2' ],
+
         [ '%A',      '3.14',   '0XC.8F5C28F5C28F5C3P-2' ],
+
+        [ '%a',      0x12.34p5,    '0x9.1ap+6' ],
+        [ '%a',      0x1_2.3_4p5,  '0x9.1ap+6' ],
         );
 }
 
diff --git a/toke.c b/toke.c
index b0997ef..8454d6f 100644
--- a/toke.c
+++ b/toke.c
@@ -9796,6 +9796,7 @@ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
     bool floatit;			/* boolean: int or float? */
     const char *lastub = NULL;		/* position of last underbar */
     static const char* const number_too_long = "Number too long";
+    bool hexfloat = FALSE;
 
     PERL_ARGS_ASSERT_SCAN_NUM;
 
@@ -9909,6 +9910,14 @@ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
 		    /* make sure they said 0x */
 		    if (shift != 4)
 			goto out;
+
+                    if (s[1] == '.' &&
+                        /* hexfloat? peekahead to avoid matching ".." */
+                        (isXDIGIT(s[2]) || s[2] == 'p' || s[2] == 'P')) {
+                        s++;
+                        goto out;
+                    }
+
 		    b = (*s++ & 7) + 9;
 
 		    /* Prepare to put the digit we have onto the end
@@ -9977,6 +9986,25 @@ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
 				  sv, NULL, NULL, 0);
 	    else if (PL_hints & HINT_NEW_BINARY)
 		sv = new_constant(start, s - start, "binary", sv, NULL, NULL, 0);
+            if (*s == '.' || *s == 'p' || *s == 'P') {
+                /* sloppy (on the underbars) but quick detection of
+                 * hexfloats, the decimal detection will be more
+                 * thorough. */
+                const char* h = s;
+                if (*h == '.') {
+                    h++;
+                    while (isXDIGIT(*h) || *h == '_') h++;
+                }
+                if (*h == 'p' || *h == 'P') {
+                    h++;
+                    if (*h == '+' || *h == '-')
+                        h++;
+                    if (isDIGIT(*h)) {
+                        hexfloat = TRUE;
+                        goto decimal;
+                    }
+                }
+            }
 	}
 	break;
 
@@ -9989,10 +10017,16 @@ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
       decimal:
 	d = PL_tokenbuf;
 	e = PL_tokenbuf + sizeof PL_tokenbuf - 6; /* room for various punctuation */
-	floatit = FALSE;
+        floatit = FALSE;
+        if (hexfloat) {
+            floatit = TRUE;
+            *d++ = '0';
+            *d++ = 'x';
+            s = start + 2;
+        }
 
 	/* read next group of digits and _ and copy into d */
-	while (isDIGIT(*s) || *s == '_') {
+	while (isDIGIT(*s) || (hexfloat && isXDIGIT(*s)) || *s == '_') {
 	    /* skip underscores, checking for misplaced ones
 	       if -w is on
 	    */
@@ -10032,7 +10066,8 @@ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
 
 	    /* copy, ignoring underbars, until we run out of digits.
 	    */
-	    for (; isDIGIT(*s) || *s == '_'; s++) {
+	    for (; isDIGIT(*s) || (hexfloat && isXDIGIT(*s)) ||
+                     *s == '_'; s++) {
 	        /* fixed length buffer check */
 		if (d >= e)
 		    Perl_croak(aTHX_ "%s", number_too_long);
@@ -10058,12 +10093,21 @@ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
 	}
 
 	/* read exponent part, if present */
-	if ((*s == 'e' || *s == 'E') && strchr("+-0123456789_", s[1])) {
-	    floatit = TRUE;
+	if (((*s == 'e' || *s == 'E') || (*s == 'p' || *s == 'P')) &&
+            strchr("+-0123456789_", s[1])) {
+            floatit = TRUE;
+
+	    /* regardless of whether user said 3E5 or 3e5, use lower 'e',
+               ditto for p (hexfloats) */
+            if ((*s == 'e' || *s == 'E')) {
+		/* At least some Mach atof()s don't grok 'E' */
+                *d++ = 'e';
+            } else if ((*s == 'p' || *s == 'P')) {
+                *d++ = 'p';
+            }
+
 	    s++;
 
-	    /* regardless of whether user said 3E5 or 3e5, use lower 'e' */
-	    *d++ = 'e';		/* At least some Mach atof()s don't grok 'E' */
 
 	    /* stray preinitial _ */
 	    if (*s == '_') {
@@ -10127,9 +10171,47 @@ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
             STORE_NUMERIC_LOCAL_SET_STANDARD();
 	    /* terminate the string */
 	    *d = '\0';
-	    nv = Atof(PL_tokenbuf);
+            if (hexfloat) {
+                /* for hexfloats, punt to strtod/strtold, or die. */
+                /* XXX Configure test for strtod/strtold hexfloat support.
+                 * It is a C99 feature, but might be implemented elsewhere. */
+                char* endp = PL_tokenbuf;
+                dSAVE_ERRNO;
+                SETERRNO(0,0);
+#if defined(USE_LONG_DOUBLE) && defined(HAS_STRTOLD)
+                nv = strtold(PL_tokenbuf, &endp);
+#elif defined(HAS_STRTOD)
+                nv = strtod(PL_tokenbuf, &endp);
+#else
+                Perl_croak(aTHX_
+                           "Hexadecimal float unsupported: '%s'",
+                           PL_tokenbuf);
+#endif
+                /* XXX test these warnings */
+                /* errno is ERANGE, commonly, but any non-zero
+                 * errno should indicate failure (note that the
+                 * scope above is intentionally tight: set errno
+                 * to zero, call strtod or strtold, inspect errno.) */
+                if (errno) {
+                    if (nv == NV_INF || nv == -NV_INF)
+                        Perl_ck_warner(aTHX_ packWARN(WARN_SYNTAX),
+                                       "Hexadecimal float overflow: '%s'",
+                                       PL_tokenbuf);
+                    else if (nv == 0.0)
+                        Perl_ck_warner(aTHX_ packWARN(WARN_SYNTAX),
+                                       "Hexadecimal float underflow: '%s'",
+                                       PL_tokenbuf);
+                }
+                if (endp == NULL || endp == PL_tokenbuf || *endp)
+                    Perl_ck_warner(aTHX_ packWARN(WARN_SYNTAX),
+                                   "Hexadecimal float malformed: '%s'",
+                                   PL_tokenbuf);
+                RESTORE_ERRNO;
+            } else {
+                nv = Atof(PL_tokenbuf);
+            }
             RESTORE_NUMERIC_LOCAL();
-	    sv = newSVnv(nv);
+            sv = newSVnv(nv);
 	}
 
 	if ( floatit
-- 
1.8.5.2 (Apple Git-48)
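
The errno and end-pointer checks in the toke.c hunk follow a common strtod pattern, which can be sketched in isolation as below. The enum and the function `hexfloat_parse` are hypothetical names, and the sketch assumes a strtod with C99 hexfloat support. One deliberate difference from the patch: any range error that did not produce an infinity is reported as underflow, so a subnormal result is not silently skipped.

```c
#include <assert.h>
#include <errno.h>
#include <math.h>
#include <stdlib.h>

enum hexfloat_status {
    HEXFLOAT_OK,
    HEXFLOAT_MALFORMED,
    HEXFLOAT_OVERFLOW,
    HEXFLOAT_UNDERFLOW
};

/* Hypothetical helper: parse text with strtod and classify the outcome.
 * The caller's errno is preserved. */
static enum hexfloat_status hexfloat_parse(const char *text, double *out)
{
    char *end = NULL;
    int saved_errno = errno;
    int range_error;

    assert(text != NULL && out != NULL);

    /* Keep the window tight: clear errno, call strtod, inspect errno. */
    errno = 0;
    *out = strtod(text, &end);
    range_error = (errno == ERANGE);
    errno = saved_errno;

    /* Nothing consumed, or trailing characters left over. */
    if (end == text || *end != '\0')
        return HEXFLOAT_MALFORMED;
    if (range_error) {
        if (*out == HUGE_VAL || *out == -HUGE_VAL)
            return HEXFLOAT_OVERFLOW;
        return HEXFLOAT_UNDERFLOW;
    }
    return HEXFLOAT_OK;
}
```

A caller would map the three failure states onto the three proposed warnings; a platform whose strtod stops at the `x` would show up here as HEXFLOAT_MALFORMED.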

@p5pRT

p5pRT commented Aug 4, 2014

From @arc

Jarkko Hietaniemi via RT <perlbug-comment@perl.org> wrote:

So I did some hacking to get this working for at least *printf and literals, and two patches are attached.

Excellent — thanks!

I cheated and just punted to using sprintf/strtod.
(I *think* the output side at least did work in win32, but the win32 smoker must be overwhelmed or something, I seem to get no results)

According to this page:

http://msdn.microsoft.com/en-us/library/hf4y5e3w(v=vs.71).aspx

the compiler in Visual Studio 2003 doesn't support %a formats in
printf. AIUI, we aim to support VC6, which I assume also doesn't
support %a. So I think punting to sprintf/strtod for hex-float
support, while admirably tempting from a laziness point of view, may
not be a viable approach, at least on win32.

Corrections welcome from anyone who knows anything about win32.

On the output side differences are easy since we are talking about floats​: the exponent may float.
0x1.999999999999ap-4 is 0xc.ccccccccccccccdp-7 (Linux "normal" doubles vs "long doubles")

I think that's not terribly unreasonable. An IEEE double has 53 bits
of significand, which can be emitted with a single bit (whose value is
1 except in denormals) before the hexadecimal point, and thirteen hex
digits (four bits apiece) after it. An x86 long double, on the other
hand, has 63 bits of significand, so emitting 3 bits before the point
and 15 nybbles after it seems straightforward.
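To make the "the exponent may float" point concrete, here is a small sketch in C showing that one 53-bit double can be spelled with a leading digit of 1 or with a leading digit of 0xc, simply by moving three bits across the hexadecimal point and adjusting the exponent. The function names are made up for illustration. (As an aside, counting the explicit integer bit, the x86 extended format carries 64 significand bits, which matches the sixteen hex digits in the long double output quoted above.)

```c
#include <math.h>
#include <stdint.h>

/* value = mant * 2^exp2.  Both significands below fit in 53 bits, so the
 * conversion to double and the scaling by ldexp() are exact. */
double from_parts(uint64_t mant, int exp2)
{
    return ldexp((double)mant, exp2);
}

/* 0x1.999999999999ap-4: leading digit 1, then 13 hex digits (52 bits). */
double one_leading(void)
{
    return from_parts(UINT64_C(0x1999999999999A), -4 - 52);
}

/* 0xc.cccccccccccdp-7: the same bits with the leading digit 0xc,
 * followed by 12 hex digits (48 bits). */
double c_leading(void)
{
    return from_parts(UINT64_C(0xCCCCCCCCCCCCD), -7 - 48);
}
```

Both functions return the double nearest to 0.1, so a test that compares the printed strings character by character will see a difference where the values have none.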

But I take your point that it's somewhat vexing for these purposes.

But even what the basic %a means seems to be up to interpretation​:
not ok 1420 - '%a' '1' -> '0x1.0000000000000p+0' cf '0x1p+0' (Solaris)

That's undeniably a fairly cruddy %a implementation (in the sense that
if you wanted all those extra digits you'd surely ask for them) but
it's not actually *wrong*. Which is, yes, also vexing for our
purposes.

But if strtod is not working, I don't feel like rewriting David Gay's dtoa.c (the canonical strtod source that many operating systems, such as the BSDs, and other OSS projects use): http://www.netlib.org/fp/dtoa.c

If output is not working (or needs to be standardized), we need to dig into the fp bits ourselves. I found this in NetBSD: https://github.com/rumpkernel/netbsd-userspace-src/blob/master/lib/libc/gdtoa/hdtoa.c

As far as I know, it's possible to implement hex float I/O without
bit-banging as long as you've got ldexp, frexp, isnormal, isnan, and
isinf. But I doubt very much whether those can reliably be found on
older systems that lack hex-float support in strtod and %a in sprintf.
:-(
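As a rough illustration of the "no bit-banging" route, the sketch below formats a double as a hex float using only frexp and ldexp arithmetic. It is not the code that went into Perl; the function name hexfloat_format is invented here, the leading digit is always 1, trailing zero digits are never emitted, and subnormals are printed normalized. It also leans on the C99 classification macros (signbit, isnan, isinf), which is exactly the portability worry raised above.

```c
#include <math.h>
#include <stdio.h>

/* Format x as a hex float with a leading digit of 1 and no trailing zero
 * digits, without looking at the raw bits.  Subnormals come out normalized
 * (for example 0x1p-1074), which differs from what glibc prints. */
void hexfloat_format(double x, char *buf, size_t buflen)
{
    static const char xdigits[] = "0123456789abcdef";
    const char *sign = signbit(x) ? "-" : "";
    char frac[32];
    int nfrac = 0, exp2;
    double m;

    if (isnan(x)) { snprintf(buf, buflen, "nan"); return; }
    if (isinf(x)) { snprintf(buf, buflen, "%sinf", sign); return; }
    if (x == 0.0) { snprintf(buf, buflen, "%s0x0p+0", sign); return; }

    m = frexp(fabs(x), &exp2);   /* m in [0.5, 1), |x| == m * 2^exp2 */
    m = ldexp(m, 1) - 1.0;       /* fraction that follows a leading 1 */
    exp2 -= 1;

    /* Peel off one nybble at a time; every step is exact in binary. */
    while (m != 0.0 && nfrac < 31) {
        int d;
        m = ldexp(m, 4);
        d = (int)m;
        frac[nfrac++] = xdigits[d];
        m -= d;
    }
    frac[nfrac] = '\0';
    snprintf(buf, buflen, "%s0x1%s%sp%+d", sign, nfrac ? "." : "", frac, exp2);
}
```

Called with 128.625 and a 64-byte buffer, this writes 0x1.014p+7; called with 1.0 it writes 0x1p+0 rather than the padded Solaris form.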

What would happen if we borrowed one of the other implementations
wholesale? Are there any licensing issues getting in the way?

--
Aaron Crane ** http​://aaroncrane.co.uk/

@p5pRT

p5pRT commented Aug 4, 2014

From @jhi

0x1.999999999999ap-4 is 0xc.ccccccccccccccdp-7 (Linux "normal" doubles vs "long doubles")

I think that's not terribly unreasonable. An IEEE double has 53 bits
of significand, which can be emitted with a single bit (whose value is
1 except in denormals) before the hexadecimal point, and thirteen hex
digits (four bits apiece) after it. An x86 long double, on the other
hand, has 63 bits of significand, so emitting 3 bits before the point
and 15 nybbles after it seems straightforward.

I should have included more examples; I think Solaris provided those...
it's not just due to long doubles. I don't have a C99 spec in front of
me, but I doubt the format is all that well defined...

But even what the basic %a means seems to be up to interpretation​:
not ok 1420 - '%a' '1' -> '0x1.0000000000000p+0' cf '0x1p+0' (Solaris)

That's undeniably a fairly cruddy %a implementation (in the sense that
if you wanted all those extra digits you'd surely ask for them) but
it's not actually *wrong*. Which is, yes, also vexing for our
purposes.

For example​: what is the '%a' supposed to "optimize for"? As few
hexdigits before the "." as possible? Maximize the exponent? Minimize
it? Steer it towards the closest/lowest/highest exponent divisible by
four? By eight?

As far as I know, it's possible to implement hex float I/O without
bit-banging as long as you've got ldexp, frexp, isnormal, isnan, and
isinf. But I doubt very much whether those can reliably be found on
older systems that lack hex-float support in strtod and %a in sprintf.
:-(

Indeed.

(Which reminds me that our inf/nan support is still a bit dubious.)

What would happen if we borrowed one of the other implementations
wholesale? Are there any licensing issues getting in the way?

BSD-licensed code is no problem; we have historically borrowed and used
it: mergesort, for example, and drand48_r.

For the netlib code, somebody with legal chops would have to take a look
for compatibility with Artistic/GPL. Not that I expect any problems,
since e.g. Python includes it.

@p5pRT

p5pRT commented Aug 5, 2014

From @jhi

Solaris x86 fails completely on input (as if strtod would not parse hexfloats at all, haven't dug into it)

Now did. Ugh.

In Solaris 10, strtod must be in "c99 mode" for the hexfloats to be recognized. (strtold is always in this mode). The "c99 mode" is achieved by using "c99" as the Solaris Studio compiler (driver), instead of "cc".

In Solaris 9 (or earlier), there is no support for hexfloats. (Not blaming Solaris in particular​: I'm pretty certain many older OS releases will be similarly C99-unsupportive.)

If one is not using Solaris Studio cc (something beginning with g, maybe), one can live dangerously and explicitly link in either of /usr/lib/{32,64}/values-xpg6.o and get the "c99 strtod". Dangerous living because probably many other things get "upgraded", too.

Executive summary​: using the netlib dtoa.c (*) is starting to sound even more siren-like.

(*) an odd name, given that it's a strtod implementation...

@p5pRT

p5pRT commented Aug 6, 2014

From @jhi

[dtoa.c] an odd name, given that it's a strtod implementation...

Good news, everyone... the netlib dtoa.c contains *both* strtod() and dtoa(), the latter useable for sprintfing.

It is quite widely used​: Python, PHP, and *Java*; and Chrome, Firefox, and Safari.

More useful reading​: http​://www.exploringbinary.com/how-strtod-works-and-sometimes-doesnt/
(note that this article is 2 years old, the bugs referred to have been corrected)

@p5pRT

p5pRT commented Aug 7, 2014

From @jhi

From https://rt-archive.perl.org/perl5/Ticket/Display.html?id=122482:

And since we are not really depending on the system strtod​:s anyway (except for nan/inf), it looks like for the hexadecimal fp "strtod-ing" it would be better just to implement our own. This would not, however, solve the hexadecimal fp output.

On the hexadecimal output the killer wording in the C99 seems to be that trailing zeros *may* be printed. And this is what Solaris does, but glibc (Linux), and whatever is used in OS X, do not.
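For the input side, a home-grown hexadecimal "strtod" can be quite small, because every hex digit contributes exactly four bits and the final scaling is a single ldexp call. The following is only a sketch under stated assumptions, not the parser that was later committed: the name hexfloat_parse is hypothetical, both the 0x prefix and the p exponent are required, digits beyond what a 64-bit accumulator can hold are truncated rather than rounded, and inf/nan are not handled.

```c
#include <ctype.h>
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int xdigit_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Parse "[+-]0xH.HHHp[+-]NNN" into *out.  Returns 1 on success, 0 on a
 * syntax error.  Low-order digits that do not fit are dropped. */
int hexfloat_parse(const char *s, double *out)
{
    uint64_t mant = 0;
    int frac_bits = 0;      /* bits accumulated after the point */
    int dropped_bits = 0;   /* integer-part bits that no longer fit */
    int seen_digit = 0, seen_point = 0, neg = 0;
    long exp2;
    char *end;

    if (*s == '+' || *s == '-') neg = (*s++ == '-');
    if (s[0] != '0' || (s[1] != 'x' && s[1] != 'X')) return 0;
    for (s += 2; ; s++) {
        int d;
        if (*s == '.' && !seen_point) { seen_point = 1; continue; }
        d = xdigit_value((unsigned char)*s);
        if (d < 0) break;
        seen_digit = 1;
        if (mant >> 60) {               /* accumulator is full */
            if (!seen_point) dropped_bits += 4;
            continue;
        }
        mant = (mant << 4) | (uint64_t)d;
        if (seen_point) frac_bits += 4;
    }
    if (!seen_digit || (*s != 'p' && *s != 'P')) return 0;
    s++;
    if (!isdigit((unsigned char)s[(*s == '+' || *s == '-') ? 1 : 0])) return 0;
    exp2 = strtol(s, &end, 10);
    if (*end != '\0') return 0;
    if (exp2 > 100000) exp2 = 100000;       /* keep the int cast safe */
    if (exp2 < -100000) exp2 = -100000;
    *out = ldexp((double)mant, (int)exp2 + dropped_bits - frac_bits);
    if (neg) *out = -*out;
    return 1;
}
```

A string such as "0x1.8" is rejected because the exponent is missing, while "0x1.8p1" parses to 3.0. The output side needs separate treatment, as noted above.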

@p5pRT

p5pRT commented Aug 10, 2014

From @cpansprout

On Thu Jul 03 08​:18​:01 2014, jhi wrote​:

Yeah, I think the 'p' (hmm, is that 'P' with %A?) is a mandatory
part of the package.

sub deadbeefp () {3}
0x1.deadbeefp+0

You have a twisted mind, and this is a compliment.

Personally, I think adding the construct + a deprecation warning for
pathological cases is a good enough (tm) tradeoff.

Based on
http://grep.cpan.me/?q=0x%5B0-9a-f%5D%2B%5C.%5B0-9a-f%5D%2Bp%5B%2B-%5D%5Cd%2B
(that's /0x[0-9a-f]+\.[0-9a-f]+p[+-]\d+/) I wouldn't bother even with
a warning. (All the hits seem to be to modules which already somehow
try to handle this currently non-native format.)

This came up on the list a couple of years ago. At the time I think the consensus was to allow parser plugins to extend the syntax, instead of hard-coding one of them into toke.c.

When we first tried to reserve this syntax (or something similar) by deprecating 0xf00 followed by a dot, several cases showed up in the perl tests themselves. I think they got changed, masking the fact that such syntax already occurs in real life.

Now this is all from memory without actually looking anything up....

--

Father Chrysostomos

@p5pRT

p5pRT commented Aug 10, 2014

From @jhi

This came up on the list a couple of years ago. At the time I think
the consensus was to allow parser plugins to extend the syntax,
instead of hard-coding one of them into toke.c.

Having looked at the toke.c now for a while, I think the plugin plan is wishful thinking unless something drastic happens first.

When we first tried to reserve this syntax (or something similar) by
deprecating 0xf00 followed by a dot, several cases showed up in the
perl tests themselves. I think they got changed, masking the fact
that such syntax already occurs in real life.

I would find that surprising... the "pEXPONENT" part is currently a syntax error.

@p5pRT

p5pRT commented Aug 14, 2014

From @jhi

For better or worse, I have now submitted

http​://perl5.git.perl.org/perl.git/commit/dc91db6 Configure scan for the kind of long double we have
http​://perl5.git.perl.org/perl.git/commit/688e39e5 Configure scan for ldexpl
http​://perl5.git.perl.org/perl.git/commit/98181445 Perl_ldexp is one of ldexpl, scalbnl, or ldexp
http​://perl5.git.perl.org/perl.git/commit/40bca5ae9 Hexadecimal float sprintf
http​://perl5.git.perl.org/perl.git/commit/61e61fbc Hexadecimal float literals

which implement hexadecimal floats, without depending on C99 or using system printf/strtod.

The dc91db6 will probably contain many bad guesses for the non-Configure platforms. Only smokes will tell.

@p5pRT

p5pRT commented Aug 14, 2014

From @craigberry

On Thu, Aug 14, 2014 at 6​:56 AM, Jarkko Hietaniemi via RT
<perlbug-comment@​perl.org> wrote​:

For better or worse, I have now submitted

http​://perl5.git.perl.org/perl.git/commit/dc91db6 Configure scan for the kind of long double we have
http​://perl5.git.perl.org/perl.git/commit/688e39e5 Configure scan for ldexpl
http​://perl5.git.perl.org/perl.git/commit/98181445 Perl_ldexp is one of ldexpl, scalbnl, or ldexp
http​://perl5.git.perl.org/perl.git/commit/40bca5ae9 Hexadecimal float sprintf
http​://perl5.git.perl.org/perl.git/commit/61e61fbc Hexadecimal float literals

which implement hexadecimal floats, without depending on C99 or using system printf/strtod.

The dc91db6 will probably contain many bad guesses for the non-Configure platforms. Only smokes will tell.

Would there be any advantage in toke.c to using Uquad_t or U64TYPE
(where available) rather than UV for the chunk that holds the
mantissa? The size chosen for Perl's integers doesn't necessarily
reflect what's available on the platform.

@p5pRT

p5pRT commented Aug 14, 2014

From @jhi

On Thursday-201408-14, 8​:51, Craig A. Berry wrote​:

The dc91db6 will probably contain many bad guesses for the non-Configure platforms. Only smokes will tell.

Would there be any advantage in toke.c to using Uquad_t or U64TYPE
(where available) rather than UV for the chunk that holds the
mantissa? The size chosen for Perl's integers doesn't necessarily
reflect what's available on the platform.

Ah, good point. As a matter of fact, I use that very fact in sv.c
already (look for MANTISSATYPE). I'll take a look in a couple of days
once we see how widespread the damage from this first batch is.

(I also need to think more carefully about what happens/should happen at
floating point "extremities" like Inf and NaN.)

@p5pRT

p5pRT commented Aug 14, 2014

From @arc

Jarkko Hietaniemi via RT <perlbug-comment@​perl.org> wrote​:

For better or worse, I have now submitted

http​://perl5.git.perl.org/perl.git/commit/dc91db6 Configure scan for the kind of long double we have
http​://perl5.git.perl.org/perl.git/commit/688e39e5 Configure scan for ldexpl
http​://perl5.git.perl.org/perl.git/commit/98181445 Perl_ldexp is one of ldexpl, scalbnl, or ldexp
http​://perl5.git.perl.org/perl.git/commit/40bca5ae9 Hexadecimal float sprintf
http​://perl5.git.perl.org/perl.git/commit/61e61fbc Hexadecimal float literals

which implement hexadecimal floats, without depending on C99 or using system printf/strtod.

Hurrah! Thanks very much for this.

Earlier in this ticket, Brian Fraser pointed out the existence of
cases like this​:

sub ap1 { 'z' }
is 0x1.ap1, '1z';

Jarkko reports having found no such affected code using grep.cpan.me,
and I freely stipulate that any code whose meaning changes in the
presence of hex float literals (like this example) would be somewhat
pathological. However, I do find myself wondering whether hex float
literals should be accepted only in the presence of a suitable
feature.

Any thoughts? Am I worrying unnecessarily?

--
Aaron Crane ** http​://aaroncrane.co.uk/

@p5pRT

p5pRT commented Aug 14, 2014

From @jhi

On Thursday-201408-14, 9​:13, Aaron Crane wrote​:

However, I do find myself wondering whether hex float
literals should be accepted only in the presence of a suitable
feature.

I would wait for Andreas' CPAN smokes.

@p5pRT

p5pRT commented Aug 15, 2014

From @jhi

On Friday, August 15, 2014, Craig A. Berry <craig.a.berry@​gmail.com> wrote​:

On Fri, Aug 15, 2014 at 4​:10 PM, Craig A. Berry <craig.a.berry@​gmail.com
<javascript​:;>> wrote​:

On Fri, Aug 15, 2014 at 10​:14 AM, Jarkko Hietaniemi <jhi@​iki.fi
<javascript​:;>> wrote​:

- VMS? Runs across three architectures​: Itanium or Alpha or VAX.
I assumed 128-bit "true" IEEE 754 for all of them (and little-endian).

On OpenVMS I64 as of v5.21.2-156-gd8bcb4d with -Duse64bitint
-Duselongdouble I get​:

$ perl -e "$x = sprintf(qq/%A/, 0);"
assert error​: expression = vend < vdig + sizeof(vdig), in file
D0​:[craig.blead]sv.c;1 at line 11759

Dunno what's wrong yet.

The VMS debugger shows the following​:

SV\Perl_sv_vcatpvfn_flags\%LINE 96933\vend​: 2060475744
SV\Perl_sv_vcatpvfn_flags\%LINE 96933\vdig[0​:31]
[0]-[31]​: 0
2060475712
DBG> evaluate sizeof(vdig)
32
DBG> evaluate vend < vdig + sizeof(vdig)
%DEBUG-I-SCALEADD, pointer addition​: scale factor of 1 applied to right
argument
0

So the assertion 2060475744 < 2060475712 + 32 is false because the LHS
is actually equal, not less than, the RHS. I don't understand the
code well enough to know what that means.

Neither do I, I just recently wrote it...

That means that for some reason v (the pointer for the hexdigits, which
are really the values 0-15, not the characters '0'..'f') has advanced all
the way to the end of the buffer. I think I see why... I will push a branch.
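The debugger output boils down to an end pointer that sits exactly one past the last element of a 32-byte buffer. The sketch below is not the sv.c code; the names and the NDIGITS constant are made up to mirror the numbers in the report. It shows why a strict "less than" check fires when all 32 slots have been used, while an inclusive check would not. Which of the two is the right invariant depends on whether the surrounding code still intends to write at the end pointer, and the thread does not settle that.

```c
#include <stddef.h>

#define NDIGITS 32

/* Write n nybble values (0..15) into vdig and return the end pointer,
 * that is, the address just past the last value written. */
unsigned char *fill_nybbles(unsigned char *vdig, size_t n)
{
    unsigned char *v = vdig;
    size_t i;
    for (i = 0; i < n && i < NDIGITS; i++)
        *v++ = (unsigned char)(i & 0xF);
    return v;
}

/* Same shape as the failing assertion: the end must be strictly inside. */
int end_strictly_inside(const unsigned char *vdig, const unsigned char *vend)
{
    return vend < vdig + NDIGITS;
}

/* Looser variant that also accepts a completely full buffer. */
int end_within_or_at(const unsigned char *vdig, const unsigned char *vend)
{
    return vend <= vdig + NDIGITS;
}
```

With 31 values both checks pass; with 32 values only the inclusive one does, which matches the 2060475744 versus 2060475712 + 32 comparison above.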

--
There is this special biologist word we use for 'stable'. It is 'dead'. --
Jack Cohen

@p5pRT

p5pRT commented Aug 17, 2014

From @sisyphus

-----Original Message-----
From​: Craig A. Berry

$ perl -e "$x = sprintf(qq/%A/, 0);"
assert error​: expression = vend < vdig + sizeof(vdig), in file
D0​:[craig.blead]sv.c;1 at line 11759

At least that one works correctly for me on (debian wheezy) powerpc64
perl-5.21.3, built from yesterday's git with -Duselongdouble
(double-double).

Here's some values that don't look right, however​:

For 1e-298, the 2 doubles (most significant first) are 0210be08d0527e1d and
0000000069c4b77f, both of which are positive values.

If I do 'printf "%A", 1e-298;' then I get​:
0XB.E08D0527E1D000069C4B77FP-991

Those 4 zeroes in the middle are wrong - they should appear at the end.
(This probably just means that the value of the exponent of the least
significant double has been miscalculated.)
But I think it's also incorrect at the start. The most significant 13 bits
of the mantissa (including the implied leading '1') are 1000010111110 -
which doesn't correlate at all well with 0XB.E0
Data​::Float​::DoubleDouble gives the following hex value of the double-double
1e-298​:
+0x1.0be08d0527e1d69c4b77f000000p-990

(In the Data​::Float​::DoubleDouble representation, I opted to have the first
character be the leading 0 or 1 .... which leaves 105 bits .... which needs
27 hex characters, the last of which can only be either 8 or 0 (as the last
3 bits are always zero).
I did that to retain some correlation between the representation of the
value, and the actual hex-encoding of the double-double.
And then, as it turns out, C's "%La" does exactly the same formatting, which
is quite fortuitous ... hell, I didn't even know C was capable of hex
formatting of double-doubles until just now !)

Another value I looked at was 193e-3.
In this case the 2 doubles are 3fc8b4395810624e and bc56872b020c49ba - the
first of which is a positive value; the second being *negative*.
Therefore the actual value of the double-double is going to be less than the
value of the most significant double.
However, 'printf "%A", 193e-3;' outputs​:
0XB.4395810624E872B020C49BAP-4

Again, the prefix looks wrong - the most significant 13 bits are 1100010110100.
Also, if the most significant double ends in "4395810624E" we would expect
that, following the subtraction, we would see "4395810624D" (or less), but
we still see "4395810624E" in there.

Data​::Float​::DoubleDouble says +0x1.8b4395810624dd2f1a9fbe76c8cp-3 (and I'll
have to investigate how the final hex char came to be something other than
"8" or "0" ;-)

I also looked at 2 ** 200. That came out as 0X0P+0.
I'm guessing it has looked at the mantissa, seen only zeroes, forgotten
about the implied leading "1", and decided the value was zero.

The fourth value I looked at was 2 ** 0.5. As with 193e-3, the least
significant double is negative - which again seems to have been overlooked.
The 2 doubles are 3ff6a09e667f3bcd and bc9bdd3413b26456, and 'printf "%A", 2
** 0.5;' outputs​:
0XA.09E667F3BCDDD3413B26456P-1
Correct value is 0x1.6a09e667f3bcc908b2fb1366ea8p0
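The effect of a negative least significant double can be checked with exact integer arithmetic. The sketch below is an illustration only, not the approach taken in sv.c: it assumes a compiler that provides the __int128 extension (GCC and Clang on 64-bit targets), the names double_from_bits and dd_format are invented here, and pairs whose exponents are more than 70 bits apart are simply rejected. It splits each double into an integer significand and an exponent with frexp, adds the two in a common unit so that a negative low part borrows from the high part's digits, and prints the result with a leading 1.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as an IEEE 754 double. */
double double_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Split finite x so that x == mant * 2^(*exp2) exactly. */
static int64_t split_double(double x, int *exp2)
{
    int e;
    double m = frexp(x, &e);        /* |m| in [0.5, 1), or 0 */
    *exp2 = e - 53;
    return (int64_t)ldexp(m, 53);   /* at most 53 significant bits */
}

/* Write hi + lo as a hex float.  Returns 0 on success, -1 if the pair is
 * outside what this sketch handles. */
int dd_format(double hi, double lo, char *buf, size_t buflen)
{
    static const char xdigits[] = "0123456789abcdef";
    char frac[32];
    int ehi, elo, shift, top, i, n = 0;
    int64_t mhi = split_double(hi, &ehi);
    int64_t mlo = split_double(lo, &elo);
    __int128 sum;
    unsigned __int128 mag;

    if (mhi == 0) return -1;
    if (mlo == 0) elo = ehi;
    shift = ehi - elo;
    if (shift < 0 || shift > 70) return -1;

    /* Exact sum in units of 2^elo; a negative lo subtracts here. */
    sum = (__int128)mhi * ((__int128)1 << shift) + mlo;
    mag = sum < 0 ? (unsigned __int128)(-sum) : (unsigned __int128)sum;
    if (mag == 0) return -1;

    for (top = 127; !((mag >> top) & 1); top--)
        ;                           /* find the leading 1 bit */
    mag <<= 124 - top;              /* put it on a nybble boundary */
    for (i = 120; i >= 0; i -= 4)
        frac[n++] = xdigits[(unsigned)(mag >> i) & 0xF];
    while (n > 0 && frac[n - 1] == '0') n--;
    frac[n] = '\0';
    snprintf(buf, buflen, "%s0x1%s%sp%+d",
             sum < 0 ? "-" : "", n ? "." : "", frac, elo + top);
    return 0;
}
```

Fed the pair 3fc8b4395810624e / bc56872b020c49ba from the 193e-3 example, this yields 0x1.8b4395810624dd2f1a9fbe76c8cp-3, the same digits Data::Float::DoubleDouble reports; the digit before the low part's contribution drops from E to D because of the borrow.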

The actual script I ran is attached (try.pl), but to run it you'll need to
be on a machine whose long double is double-double, and whose perl was built
with -Duselongdouble.
Also attached is the output of the script (out.txt).

Btw, I've just checked that the above Data​::Float​::DoubleDouble values agree
with C's "%La" output, and they do - except for the final "c" in the second
example (which should be 8 ... and I'll have to work out how that 107th bit
got set.)

Thanks for taking this on, Jarkko. Apologies that I haven't come up with
something more constructive than "this is wrong and that ain't right".

Cheers,
Rob

@p5pRT

p5pRT commented Aug 17, 2014

From @sisyphus

1000010111110000010001101000001010010011111100001110101101001110001001011011
10111111100000000000000000000000
The 2 doubles (most siginificant first)​:
(+) 0210be08d0527e1d, (+) 0000000069c4b77f
0210be08d0527e1d0000000069c4b77f
0XB.E08D0527E1D000069C4B77FP-991
+0x1.0be08d0527e1d69c4b77f000000p-990

1100010110100001110010101100000010000011000100100110111010010111100011010100
11111101111100111011011001000110
The 2 doubles (most siginificant first)​:
(+) 3fc8b4395810624e, (-) bc56872b020c49ba
3fc8b4395810624ebc56872b020c49ba
0XB.4395810624E872B020C49BAP-4
+0x1.8b4395810624dd2f1a9fbe76c8cp-3

1000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000
The 2 doubles (most siginificant first)​:
(+) 4c70000000000000, (+) 0000000000000000
4c700000000000000000000000000000
0X0P+0
+0x1.000000000000000000000000000p200

1011010100000100111100110011001111111001110111100110010010000100010110010111
11011000100110110011011101010100
The 2 doubles (most siginificant first)​:
(+) 3ff6a09e667f3bcd, (-) bc9bdd3413b26456
3ff6a09e667f3bcdbc9bdd3413b26456
0XA.09E667F3BCDDD3413B26456P-1
+0x1.6a09e667f3bcc908b2fb1366ea8p0

@p5pRT

p5pRT commented Aug 17, 2014

From @sisyphus

try.pl

@p5pRT

p5pRT commented Aug 17, 2014

From @sisyphus

-----Original Message-----
From​: sisyphus1@​optusnet.com.au
Sent​: Sunday, August 17, 2014 8​:40 PM

Another value I looked at was 193e-3.
[snip]
Data::Float::DoubleDouble says +0x1.8b4395810624dd2f1a9fbe76c8cp-3 (and
I'll have to investigate how the final hex char came to be something other
than "8" or "0" ;-)

I don't think this is central to this thread.

The setting of the last hex char to "c" arises from the (known) perl bug
where the value that perl assigns to some NVs is off by one or more ULPs.

As regards 193e-3, instead of assigning correct doubles (3fc8b4395810624e
and bc56872b020c49bc), perl has assigned bc56872b020c49ba as the least
significant double. This actually means that perl has assigned an
illegitimate value to the double-double.
I think 3fc8b4395810624ebc56872b020c49ba is not a valid double-double
representation - and this is what throws out the calculations performed by
D​::F​::DD.

We can force perl to assign the correct double-double representation (and
this is the only way of doing it that I know of) by doing​:

use Math​::NV qw(​:all);
$nv = nv('193e-3');

If we do that then the correct representation of
3fc8b4395810624ebc56872b020c49bc gets assigned to $nv, and D::F::DD then
provides correct results.

I suppose D::F::DD could strive to detect and correct perl's mistakes, but
that is not a high priority for me.

Cheers,
Rob

@p5pRT

p5pRT commented Aug 17, 2014

From @jhi

On Sunday-201408-17, 6​:40, sisyphus1@​optusnet.com.au wrote​:

At least that one works correctly for me on (debian wheezy) powerpc64
perl-5.21.3, built from yesterday's git with -Duselongdouble
(double-double).

Here's some values that don't look right, however​:

For 1e-298, the 2 doubles (most significant first) are 0210be08d0527e1d and
0000000069c4b77f, both of which are positive values

The currently-in-blead version is all sorts of wrong for IEEE 754 128-bit
long doubles, and for double-doubles, sorry about that. I'm trying to
stop breaking things, with help from Craig.

@p5pRT

p5pRT commented Aug 17, 2014

From @jhi

Thanks for taking this on, Jarkko. Apologies that I haven't come up with
something more constructive than "this is wrong and that ain't right".

Get thee the http​://perl5.git.perl.org/perl.git and retry.

It's probably still quite wrong for double-doubles, but at least it
should be less wrong.

@p5pRT

p5pRT commented Aug 18, 2014

From @sisyphus

-----Original Message-----
From​: Jarkko Hietaniemi

It's probably still quite wrong for double-doubles, but at least it should
be less wrong.

The value expressed for 2 ** 200 is a big improver ;-)
It's now at 0X01P199 (which is off by a power of 2).

Of the other values I looked at last night, they seem to have changed only
in the leading digits.
What was "0X0A.BCDEF..." has been transformed into "0X01.ABCDEF ...", though
the correct form begins "0X01.HABCDEF... " (where H stands for some hex
digit).

For example, yesterday's blead presented 1e-298 as​:
0XB.E08D0527E1D000069C4B77FP-991

Today's blead presents it as​:
0X1.BE08D0527E1D000069C4B77FP-991

And the correct rendition is​:
0X1.0BE08D0527E1D69C4B77FP-990

Even for an easily representable float such as 128.625 (where the entire
value is held in the most siginificant double and the least significant
double is 0), today's blead presents it as 0X1.14P+6, but correct rendition
is 0X1.014P+7.

Anyway - good luck with it. (It would be nice to see this up and running
with double-doubles, but it's not something that I'm reliant upon.)

Is it not possible for you to achieve the desired result via C's %La/%LA
formatting ?

Cheers,
Rob

@p5pRT

p5pRT commented Aug 18, 2014

From @jhi

The value expressed for 2 ** 200 is a big improver ;-)
It's now at 0X01P199 (which is off by a power of 2).

Of the other values I looked at last night, they seem to have changed only
in the leading digits.
What was "0X0A.BCDEF..." has been transformed into "0X01.ABCDEF ...", though
the correct form begins "0X01.HABCDEF... " (where H stands for some hex
digit).

If you could do​:

grep longdblkind config.sh

I'll also email you a test code, the output of which would be of interest.

For example, yesterday's blead presented 1e-298 as​:
0XB.E08D0527E1D000069C4B77FP-991

Today's blead presents it as​:
0X1.BE08D0527E1D000069C4B77FP-991

And the correct rendition is​:
0X1.0BE08D0527E1D69C4B77FP-990

Even for an easily representable float such as 128.625 (where the entire
value is held in the most siginificant double and the least significant
double is 0), today's blead presents it as 0X1.14P+6, but correct rendition
is 0X1.014P+7.

Anyway - good luck with it. (It would be nice to see this up and running
with double-doubles, but it's not something that I'm reliant upon.)

Is it not possible for you to achieve the desired result via C's %La/%LA
formatting ?

That would leave us dependent on the vendors' implementations of C99.
Two problems with this​:

(1) C99, which we do not require, and enabling which often requires
various contortions while compiling: a different cc wrapper, different
flags, different libraries.

(2) There's wiggle room in the spec, which inevitably leads to
diverging implementations. One example of wiggle room is whether to
print the trailing zero nybbles. Another is the choice of lead
xdigit/exponent alignment. Another huge one is what the heck to do with
the long doubles... at least with our own implementation we get to make
our own mistakes. (Cue http://xkcd.com/927/)
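For the first kind of wiggle room (trailing zero nybbles), a test suite that has to compare output from different C libraries could normalize the strings before comparing them. The helper below is a hypothetical sketch of that idea, with an invented name; it only strips trailing zeros from the fraction and does nothing about the second kind of divergence, the lead digit and exponent alignment.

```c
#include <string.h>

/* Strip trailing zero digits (and a then-dangling '.') from the fraction
 * of a hex float string, in place.  "0x1.0000000000000p+0" -> "0x1p+0". */
void hexfloat_strip_zeros(char *s)
{
    char *p = strpbrk(s, "pP");     /* start of the exponent part */
    char *dot = strchr(s, '.');
    char *end;

    if (!p || !dot || dot > p) return;      /* nothing to strip */
    end = p;
    while (end > dot + 1 && end[-1] == '0') end--;
    if (end == dot + 1) end = dot;          /* fraction became empty */
    memmove(end, p, strlen(p) + 1);         /* close the gap, keep the NUL */
}
```

After this pass the Solaris-style '0x1.0000000000000p+0' and the glibc-style '0x1p+0' compare equal as strings.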

Cheers,
Rob

@p5pRT

p5pRT commented Mar 4, 2015

From @jhi

This is way, way implemented already.

@p5pRT

p5pRT commented Mar 4, 2015

@jhi - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Mar 4, 2015
@p5pRT

p5pRT commented Nov 5, 2015

From @pjacklam

I'm trying to implement hexadecimal/octal/binary floats for bignum and the
Math::Big* modules, but I have stumbled across a few things I find a bit
odd. It would be nice if someone could explain the logic to me. I am using
Perl 5.22.0 on Cygwin (I was going to try the cases below on Solaris as
well, but as of Perl 5.22.0 I have so far had no success building Perl on
Solaris with the Sun compiler.)

Case 1​:

This looks OK​:

$ perl -wle 'print 0x0.1p+0'
0.0625

but what's with the following output​:

$ perl -wle 'print 0x0.10000000000000001p+0'
Hexadecimal float​: mantissa overflow at -e line 1.
3.3881317890172e-21

If I do a similar case with decimal floats, the added ...0001 at the end
doesn't cause a totally different output​:

$ perl -wle 'print 0.1e0'
0.1

$ perl -wle 'print 0.10000000000000000000000000001e0'
0.1

Case 2​:

The following gives me a warning about an invalid octal digit, as expected​:

$ perl -wle 'print 018p0'
Illegal octal digit '8' at -e line 1, at end of line
Execution of -e aborted due to compilation errors.

But if the invalid digit is after the dot, I get no warning​:

$ perl -wle 'print 01.8p0'
1

Ditto with binary numbers​:

$ perl -wle 'print 0b2.1p0'
Illegal binary digit '2' at -e line 1, at end of line
Execution of -e aborted due to compilation errors.

But no warning if the digit 2 is after the dot​:

$ perl -wle 'print 0b1.2p0'
1

Peter

@p5pRT

p5pRT commented Nov 5, 2015

From @pjacklam

(Sorry I forgot to add perl5-porters to my previous message.)

Here is another odd case. This looks OK​:

$ perl -wle 'print 0 + 0x0.p0'
0

But what's with the following?

$ perl -wle 'print 0 + 0x.p0'
0p0

Peter

@p5pRT

p5pRT commented Nov 5, 2015

From @jhi

On Thu, Nov 5, 2015 at 9​:15 AM, Peter John Acklam via RT
<perlbug-followup@​perl.org> wrote​:

I'm trying to implement hexadecimal/octal/binary floats for bignum and the
Math::Big* modules, but I have stumbled across a few things I find a bit
odd. It would be nice if someone could explain the logic to me. I am using
Perl 5.22.0 on Cygwin

(I was going to try the cases below on Solaris as
well, but as of Perl 5.22.0 I have so far had no success building Perl on
Solaris with the Sun compiler.)

Well, that is strange, too​: I have built it fine in x86 Solaris.
Which Sun compiler?

(Also on SPARC Solaris, but I think that machine has only gcc, not the
Sun compiler; I have to check.)

Case 1​:

This looks OK​:

$ perl -wle 'print 0x0.1p+0'
0.0625

but what's with the following output​:

$ perl -wle 'print 0x0.10000000000000001p+0'
Hexadecimal float​: mantissa overflow at -e line 1.
3.3881317890172e-21

The hexadecimal float parsing code knows exactly when the added digits
would fall to the floor. The decimal conversion doesn't care. In
other words, the hexfloat conversion is stricter than the decimal
float parsing. This may or may not be seen as a downside.
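
To make the "fall to the floor" point concrete, here is a small sketch in Python rather than Perl, chosen only because its float.fromhex() is documented to round correctly. It is not how toke.c works. It assumes IEEE 754 doubles and builds a literal of the same shape as the one quoted above (17 fractional hex digits). The last digit is worth 16**-17 = 2**-68, far less than half the gap between neighbouring doubles at 0.0625, so a correctly rounded parse still yields 0.0625.

```python
# Assumes IEEE 754 binary64 doubles (53-bit significand).
short_literal = "0x0.1p+0"
long_literal = "0x0.1" + "0" * 15 + "1p+0"   # 17 fractional hex digits

short_value = float.fromhex(short_literal)
long_value = float.fromhex(long_literal)

# Weight of the 17th fractional hex digit, and the gap between
# neighbouring doubles at 0.0625 (2**-4 scaled by 2**-52).
last_digit_weight = 16.0 ** -17
spacing_at_value = 2.0 ** -4 * 2.0 ** -52

print(short_value, long_value)
print(last_digit_weight < spacing_at_value / 2)
```

Both parsed values compare equal to 0.0625 here; the trailing digit is simply too small to be represented.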

If I do a similar case with decimal floats, the added ...0001 at the end
doesn't cause a totally different output​:

$ perl -wle 'print 0.1e0'
0.1

$ perl -wle 'print 0.10000000000000000000000000001e0'
0.1

Case 2​:

The following gives me a warning about an invalid octal digit, as expected​:

$ perl -wle 'print 018p0'
Illegal octal digit '8' at -e line 1, at end of line
Execution of -e aborted due to compilation errors.

But if the invalid digit is after the dot, I get no warning​:

$ perl -wle 'print 01.8p0'
1

Ditto with binary numbers​:

Sorry to state the obvious but hexadecimal floats are about ...
hexadecimal floats. Not binary, not octal. There was no precedent
for things like "binary floats", so no such code was added. (And on a
personal note, the octal syntax should just die.)

In most cases when you start mixing dots and digits (or hexdigits),
what you get is either the string-concat-dot, or the
version-string-dot.

I tried to carve a very strict path through the lexer which allows
only the hexadecimal float format, as defined by C99. Which means
most importantly two things​: the "0x" prefix is required, and the
"pNN" exponent is required, otherwise there is just a boatload of
ambiguities. If those two are not present, the hexadecimal floating
point syntax is not relevant.
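
As an illustration of the two requirements just described, here is a hedged Python sketch of a recognizer for the C99-style shape: a mandatory "0x" prefix, at least one hex digit, and a mandatory "p" exponent. The name looks_like_hexfloat is hypothetical, the pattern is not taken from toke.c, and it ignores the underscores Perl allows inside numbers.

```python
import re

# Hypothetical recognizer, not Perl's lexer: "0x" prefix and "p" exponent
# are both mandatory, and at least one hex digit must be present.
HEXFLOAT = re.compile(
    r"""
    ^0[xX]
    (?: [0-9a-fA-F]+ (?: \. [0-9a-fA-F]* )?   # digits, optional fraction
      | \. [0-9a-fA-F]+                       # or a bare fraction
    )
    [pP] [+-]? [0-9]+ $
    """,
    re.VERBOSE,
)


def looks_like_hexfloat(text):
    """Return True if text has the strict hexadecimal-float shape."""
    return HEXFLOAT.match(text) is not None


for candidate in ("0x0.1p+0", "0x1.10", "0b1.1p0", "01.8p0", "0x.p0"):
    print(candidate, looks_like_hexfloat(candidate))
```

Under this pattern only the first candidate is accepted; the others lack the prefix, the exponent, or any digit.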

$ perl -wle 'print 0b2.1p0'
Illegal binary digit '2' at -e line 1, at end of line
Execution of -e aborted due to compilation errors.

But no warning if the digit 2 is after the dot​:

$ perl -wle 'print 0b1.2p0'
1

Peter

--
There is this special biologist word we use for 'stable'. It is
'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Nov 5, 2015

From @Abigail

On Thu, Nov 05, 2015 at 03​:24​:05PM +0100, Peter John Acklam wrote​:

(Sorry I forgot to add perl5-porters to my previous message.)

Here is another odd case. This looks OK​:

$ perl -wle 'print 0 + 0x0.p0'
0

But what's with the following?

$ perl -wle 'print 0 + 0x.p0'
0p0

The 0x.p0 is parsed as 0 . 'p0', and it gives an error under strict
(bareword 'p0' not allowed).

I understand the C<< . 'p0' >> part, but what I did not expect is
that C<< 0x >> is valid, and equivalent to 0​:

  $ perl -Mstrict -wE 'say 0x'
  0
  $

C<< 0x_ >> is valid as well, but it warns about a misplaced _ in
a number, and does so twice​:

  $ perl -Mstrict -wE 'say 0x_'
  Misplaced _ in number at -e line 1.
  Misplaced _ in number at -e line 1.
  0
  $

But it's great for JAPHs.

Abigail

@p5pRT

p5pRT commented Nov 5, 2015

From @jhi

the hexfloat conversion is stricter than the decimal
float parsing. This may or may not be seen as a downside.

... but I think it's the right thing to do, since one of the major
reasons for hexadecimal floats is that you can specify down to the
last bit the float you are expecting, without any loss from
converting from decimal. (The classic example is "0.1" not being an
exact number.)
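
The "0.1" point can be checked with a short Python sketch, assuming IEEE 754 doubles. Python is used here only because it exposes float.hex() and float.fromhex(); nothing below is Perl's implementation.

```python
from fractions import Fraction

tenth = 0.1

# The stored double is not exactly 1/10 ...
stored_is_exact = Fraction(tenth) == Fraction(1, 10)

# ... but its hexadecimal form names the stored bits exactly,
# so it round-trips without any loss.
as_hex = tenth.hex()
round_trips = float.fromhex(as_hex) == tenth

# The literal from the original request denotes the double nearest 0.01.
hundredth = float.fromhex("0x1.47ae147ae147bp-7")

print(as_hex)            # 0x1.999999999999ap-4
print(stored_is_exact, round_trips, hundredth == 0.01)
```

The decimal literal is only an approximation of the stored value, while the hexadecimal form identifies that value bit for bit.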

--
There is this special biologist word we use for 'stable'. It is
'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Nov 5, 2015

From @jhi

I tried to carve a very strict path through the lexer which allows
only the hexadecimal float format, as defined by C99. Which means
most importantly two things​: the "0x" prefix is required, and the
"pNN" exponent is required, otherwise there is just a boatload of
ambiguities. If those two are not present, the hexadecimal floating
point syntax is not relevant.

http​://perl5.git.perl.org/perl.git/blob/HEAD​:/t/op/hexfp.t

may be illuminating here. There I parse valid and invalid hexfp.

The tests after the comment "Test certain things that are not
hexfloats and should stay that way." show some of the not-hex-fp
things I ran into.

--
There is this special biologist word we use for 'stable'. It is
'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Nov 5, 2015

From @khwilliamson

On 11/05/2015 08​:13 AM, Jarkko Hietaniemi wrote​:

the hexfloat conversion is stricter than the decimal
float parsing. This may or may not be seen as a downside.

... but I think it's the right thing to do, since one of the major
reasons for hexadecimal floats is that you can specify down to the
last bit the float you are expecting, without any loss from
converting from decimal. (The classic example is "0.1" not being an
exact number.)

I view it not as a downside, but as an upside.

@p5pRT

p5pRT commented Nov 5, 2015

From Eirik-Berg.Hanssen@allverden.no

On Thu, Nov 5, 2015 at 3​:47 PM, Jarkko Hietaniemi <jhi@​iki.fi> wrote​:

Sorry to state the obvious but hexadecimal floats are about ...
hexadecimal floats. Not binary, not octal. There was no precedent
for things like "binary floats", so no such code was added. (And on a
personal note, the octal syntax should just die.)

  Wow. So …

$ perl -wle 'print 0b1.1p0'
1.5
$

  … that's just emergent behaviour?

  Cool! :)

Eirik

@p5pRT

p5pRT commented Nov 5, 2015

From @jhi

On Thursday-201511-05 10​:29, Eirik Berg Hanssen via RT wrote​:

… that's just emergent behaviour?

"Emergent behaviour" describes the whole of Perl rather beautifully,
don't you think?

@p5pRT

p5pRT commented Nov 5, 2015

From @jhi

So it really does look like the hexfp parsing code is unintentionally leaking over into supporting binary and octal as well...

My preference would be to stop this particular emergent behaviour, at least for now.

Stopping the leak would be trivial​:

--- a/toke.c
+++ b/toke.c
@​@​ -10455,7 +10455,7 @​@​ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
  Perl_ck_warner(aTHX_ packWARN(WARN_SYNTAX), "Misplaced _ in number");
  }

- if (UNLIKELY(HEXFP_PEEK(s))) {
+ if (UNLIKELY(HEXFP_PEEK(s)) && skip == 4) {
  /* Do sloppy (on the underbars) but quick detection
  * (and value construction) for hexfp, the decimal
  * detection will shortly be more thorough with the

and then e.g. 0b101.101p0 would barf as expected, on the "p0".

Whether we want to really support "binfp" and "octfp", I don't know. I'm worried about what surprising corners of the language that will reveal...
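
For readers wondering what values "binfp" and "octfp" would denote, here is a hedged Python sketch of a parser generalized over the bits per digit (4 for hex, 3 for octal, 1 for binary). The function name parse_radix_float and the hex_only flag are hypothetical; the flag merely stands in for the kind of guard the patch above adds, and none of this mirrors the actual toke.c logic.

```python
import re

# Bits contributed by each digit; octal uses a bare "0" prefix.
BITS = {"0x": 4, "0b": 1, "0": 3}

LITERAL = re.compile(r"^(0x|0b|0)([0-9a-f]*)\.([0-9a-f]*)p([+-]?\d+)$", re.I)


def parse_radix_float(text, hex_only=False):
    """Evaluate a power-of-two-radix float literal (illustrative only)."""
    match = LITERAL.match(text)
    if match is None:
        raise ValueError("not a radix float literal: %r" % text)
    prefix, int_part, frac_part, exponent = match.groups()
    bits = BITS[prefix.lower()]
    if hex_only and bits != 4:
        raise ValueError("only hexadecimal floats are supported")
    digits = int_part + frac_part
    if not digits:
        raise ValueError("no digits in %r" % text)
    # int() rejects digits that are invalid for the base, e.g. "2" in binary.
    mantissa = int(digits, 1 << bits)
    return mantissa * 2.0 ** (int(exponent) - bits * len(frac_part))


print(parse_radix_float("0b1.1p0"))       # 1.5
print(parse_radix_float("0b101.101p0"))   # 5.625
print(parse_radix_float("0x0.1p+0"))      # 0.0625
```

With hex_only=True the binary and octal forms raise ValueError, which is the behaviour the guard is meant to restore.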

@p5pRT

p5pRT commented Nov 6, 2015

From @pjacklam

2015-11-05 15​:47 GMT+01​:00 Jarkko Hietaniemi <jhi@​iki.fi>​:

Peter John Acklam via RT <perlbug-followup@​perl.org> wrote​:

I'm trying to implement hexadecimal/octal/binary floats for bignum and
the
Math​::Big* modules (...)

Sorry to state the obvious but hexadecimal floats are about ...
hexadecimal floats. Not binary, not octal.

Yes, but Perl being ... eh ... Perl, you never know. :-) And oct()
isn't only about octal numbers -- it handles hexadecimal and
binary numbers too. I thought the handling of binary and octal
floats was intended.

Anyway, thanks for the explanations, everyone!

Peter

@p5pRT

p5pRT commented Nov 6, 2015

From @ikegami

On Thu, Nov 5, 2015 at 9​:47 AM, Jarkko Hietaniemi <jhi@​iki.fi> wrote​:

On Thu, Nov 5, 2015 at 9​:15 AM, Peter John Acklam via RT

but what's with the following output​:

$ perl -wle 'print 0x0.10000000000000001p+0'
Hexadecimal float​: mantissa overflow at -e line 1.
3.3881317890172e-21

The hexadecimal float parsing code knows exactly when the added digits
would fall to the floor. The decimal conversion doesn't care. In
other words, the hexfloat conversion is stricter than the decimal
float parsing. This may or may not be seen as a downside.

That addresses the warning, but what about the fact that the result could
be approximately right instead of completely wrong?

@p5pRT

p5pRT commented Nov 6, 2015

From @jhi

That addresses the warning, but what about the fact that the result could
be approximately right instead of completely wrong?

I have no idea what you are saying here.

If you are saying that hexadecimal floating constants should silently
drop the low-order digits, you won't be seeing patches from me to that
effect, for the reasons I've already explained.

--
There is this special biologist word we use for 'stable'. It is
'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Nov 6, 2015

From Eirik-Berg.Hanssen@allverden.no

On Fri, Nov 6, 2015 at 10​:14 PM, Jarkko Hietaniemi <jhi@​iki.fi> wrote​:

That addresses the warning, but what about the fact that the result could
be approximately right instead of completely wrong?

I have no idea what you are saying here.

If you are saying that hexadecimal floating constants should silently
drop the low-order digits, you won't be seeing patches from me to that
effect, for the reasons I've already explained.

  He's not saying anything about "silently".

  I think he's suggesting that dropping the low-order digits would be
better than dropping the high-order digits​:

eirik@purplehat[23:08:10]$ perl -E 'say 0xa501.00000000000001p+0'
1
eirik@purplehat[23:08:21]$ perl -E 'say 0xa503.00000000000001p+0'
3
eirik@purplehat[23:08:25]$ perl -E 'say 0xa580.00000000000001p+0'
128
eirik@purplehat[23:08:32]$

  I'm inclined to agree.

  (Hey, without warnings enabled, it is actually dropping the high-order
digits "silently".)

Eirik

@p5pRT

p5pRT commented Nov 6, 2015

From @jhi

On Friday-201511-06 17​:10, Eirik Berg Hanssen wrote​:

On Fri, Nov 6, 2015 at 10:14 PM, Jarkko Hietaniemi <jhi@iki.fi> wrote:

That addresses the warning, but what about the fact that the result could
be approximately right instead of completely wrong?

I have no idea what you are saying here.

If you are saying that hexadecimal floating constants should silently
drop the low-order digits, you won't be seeing patches from me to that
effect, for the reasons I've already explained.

He's not saying anything about "silently".

I think he's suggesting that dropping the low-order digits would be
better than dropping the high-order digits:

eirik@purplehat[23:08:10]$ perl -E 'say 0xa501.00000000000001p+0'
1
eirik@purplehat[23:08:21]$ perl -E 'say 0xa503.00000000000001p+0'
3
eirik@purplehat[23:08:25]$ perl -E 'say 0xa580.00000000000001p+0'
128
eirik@purplehat[23:08:32]$

I'm inclined to agree.

Okay, that's a definite bug. It seems I lacked enough creative
evilness when trying to test all the corners.

Hmm. What would be a good mode of failure here? Emit a warning
that must be explicitly disabled? (And of course, stop 'shifting'
the high-order bits away.)
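
One way to picture this 'shifting away' is the Python sketch below. It is a hypothetical model, not the actual toke.c code: it assumes hex digits are shifted four bits at a time into a 64-bit accumulator that wraps, so any literal with more than 16 significant hex digits loses its leading digits. The function name wrapping_parse and the accumulator width are assumptions made for illustration.

```python
def wrapping_parse(int_digits, frac_digits, exponent=0, acc_bits=64):
    """Hypothetical model: shift hex digits into a fixed-width accumulator."""
    mask = (1 << acc_bits) - 1
    acc = 0
    for digit in int_digits + frac_digits:
        acc = ((acc << 4) | int(digit, 16)) & mask   # high bits fall off here
    return acc * 2.0 ** (exponent - 4 * len(frac_digits))


frac = "0" * 13 + "1"   # 14 fractional digits, 18 hex digits in total
for head in ("a501", "a503", "a580"):
    literal = "0x%s.%sp+0" % (head, frac)
    # Model result next to a correctly rounded parse of the same literal.
    print(literal, wrapping_parse(head, frac), float.fromhex(literal))
```

Under these assumptions the model yields 1.0, 3.0 and 128.0 where a correctly rounded parse yields 42241.0, 42243.0 and 42368.0, matching the outputs reported in the thread. The same model applied to a leading 0 and the 17 fractional digits "1", fifteen zeros, "1" gives 2**-68, about 3.3881317890172e-21, which is the odd value from Case 1.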

(Hey, without warnings enabled, it is actually dropping the
high-order digits "silently".)

Eirik

@p5pRT

p5pRT commented Nov 7, 2015

From @jhi

On Friday-201511-06 17​:10, Eirik Berg Hanssen wrote​:

On Fri, Nov 6, 2015 at 10:14 PM, Jarkko Hietaniemi <jhi@iki.fi> wrote:

That addresses the warning, but what about the fact that the result could
be approximately right instead of completely wrong?

I have no idea what you are saying here.

If you are saying that hexadecimal floating constants should silently
drop the low-order digits, you won't be seeing patches from me to that
effect, for the reasons I've already explained.

He's not saying anything about "silently".

I think he's suggesting that dropping the low-order digits would be
better than dropping the high-order digits:

eirik@purplehat[23:08:10]$ perl -E 'say 0xa501.00000000000001p+0'
1
eirik@purplehat[23:08:21]$ perl -E 'say 0xa503.00000000000001p+0'
3
eirik@purplehat[23:08:25]$ perl -E 'say 0xa580.00000000000001p+0'
128
eirik@purplehat[23:08:32]$

I'm inclined to agree.

Now opened

https://rt-archive.perl.org/perl5/Ticket/Display.html?id=126582

explicitly for this.

(Hey, without warnings enabled, it is actually dropping the
high-order digits "silently".)

Eirik
