Instant.from-posix has false future leap second knowledge #4552

Open
p6rt opened this issue Sep 20, 2015 · 10 comments
Labels
RFC Request For Comments

Comments

@p6rt

p6rt commented Sep 20, 2015

Migrated from rt.perl.org#126119 (status was 'open')

Searchable as RT126119$

@p6rt
Author

p6rt commented Sep 20, 2015

From zefram@fysh.org

The Instant class supposedly represents times on the TAI time scale,
with subtraction of Instants yielding counts of atomic seconds.
The corresponding difference of POSIX time_t values yields a count
of non-leap seconds. The difference between these two differences,
for corresponding endpoints, therefore yields a count of leap seconds.
Instant.from-posix() will conveniently perform the translation. So here's
some code to count the leap seconds that occur in an interval specified
by time_t values:

$ ./perl6 -e 'sub leaps-between ($a, $b) { (Instant.from-posix($b) - Instant.from-posix($a)) - ($b - $a) }; say leaps-between(time - 1000000000, time); say leaps-between(time, time + 1000000000)'
14
0

The first of the time intervals on which I've tried that runs from January
1984 to today, and the count of 14 leap seconds is historically correct.
The second interval runs from today to May 2047... and will almost
certainly contain a non-zero number of leap seconds. The count of zero
is bogus.
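
What the one-liner measures can be reproduced outside Rakudo. The following is a minimal Python sketch, not Rakudo's implementation: it assumes a hard-coded table of leap-second months (copied from the program attached later in this thread), and the helper names `leap_boundary` and `leaps_between` are hypothetical. Anything beyond the end of the table silently contributes zero leap seconds, which is the behaviour this report is about.

```python
import calendar

# Months (UTC) that ended with an inserted leap second. Copied from the table
# in the program attached later in this thread; an assumption of this sketch,
# not something read out of Rakudo.
KNOWN_LEAPS = """
1972-06 1972-12 1973-12 1974-12 1975-12 1976-12 1977-12 1978-12 1979-12
1981-06 1982-06 1983-06 1985-06 1987-12 1989-12 1990-12 1992-06 1993-06
1994-06 1995-12 1997-06 1998-12 2005-12 2008-12 2012-06 2015-06 2016-12
""".split()


def leap_boundary(month: str) -> int:
    """POSIX time of the first second after the leap that ends `month`."""
    year, mon = (int(part) for part in month.split("-"))
    if mon == 12:
        year, mon = year + 1, 1
    else:
        mon += 1
    return calendar.timegm((year, mon, 1, 0, 0, 0))


LEAP_BOUNDARIES = [leap_boundary(m) for m in KNOWN_LEAPS]


def leaps_between(a: int, b: int) -> int:
    """Leap seconds that the table places between POSIX times a and b."""
    return sum(1 for t in LEAP_BOUNDARIES if a < t <= b)


now = calendar.timegm((2015, 9, 20, 0, 0, 0))   # date of this report
print(leaps_between(now - 10**9, now))          # history covered by the table
print(leaps_between(now, now + 10**9))          # future: only what the table knows
```

With this table, which already includes the December 2016 leap second, the two calls print 14 and 1. The second number only reflects how recent the table is; it says nothing about how many leap seconds the interval will really contain.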

In reality, we don't know how many leap seconds there will be in
the next gigasecond. We can guess: anything from five to twenty
is defensible. But there is no value that we can say is certainly
correct. So Instant.from-posix(time + 1000000000) cannot produce any
definitely-correct value. It really ought to signal an error.

If it is really intended to guess, in cases where no definitive answer is
available, then the guess that it is making is crap. To make a reasonable
guess, use a quadratic formula based on the observed tidal braking,
and quantise it to one-second leaps on appropriate month boundaries.
A finer guess could be made for the next few decades by extrapolating
from recent decades' measurements.

Also, if it's a guessing function, it ought to be named in a way that
clues in the user. "from-posix-best-guess" or so. It would also be
sane to have both kinds of conversion function.

-zefram

@p6rt
Author

p6rt commented Jul 26, 2016

From @zoffixznet

My feedback for RFC:

Currently, the leap seconds are added as-they-get-known. For example, on my bleed Rakudo, I get 14/1 with your code, because the newly announced Dec.2016 leap second was added.

This means the result of the code is dependent on the version of the compiler and is thus inherently unpredictable, **even for historical values.**

So trying to use a "guessing formula" won't rectify the issue, since at best it can guess, but it would add overhead execution time. So IMO, the current behaviour is fine as is. If someone's application requires super precision they should make/use a module that can both utilize various guessing formulas and be updated on a particular system more easily than entire Rakudo.

--
Cheers,
ZZ | https://twitter.com/zoffix

@p6rt
Author

p6rt commented Jul 26, 2016

The RT System itself - Status changed from 'new' to 'open'

@p6rt
Author

p6rt commented Jul 27, 2016

From @zoffixznet

This is a response to Zefram's email, which for some reason didn't make it to the ticket: http://www.nntp.perl.org/group/perl.perl6.compiler/2016/07/msg13370.html

Quoting Zefram <zefram@fysh.org>:

> A version with reasonable guessing for `future' times would be qualitatively
> similar, in that it can produce either the right answer or a wrong
> answer, but the wrong answers would be quantitatively less wrong.

But that would prevent you from working around the issue and it wouldn't be useful.
If your algorithm fails when future unknown leap seconds are zero, it won't magically
succeed if given what amounts to a throw of dice. The only difference is you're paying
with computing power for that dice throw (and .from-posix isn't the only method that
has this leap second caveat).

Returning zero is much more predictable than returning a guess, because I can choose
any guessing algorithm I wish, should I require it. But most importantly, I
can just patch my code for older compilers when new leap seconds are announced. Here's
a rough patch for your code snippet:

./perl6 -e 'sub leaps-between ($a, $b) { (Instant.from-posix($b) - Instant.from-posix($a)) - ($b - $a) + ($*VM.version before v2016.07 and $b after DateTime.new("2016-12-31T23:59:59") ?? 1 !! 0) }; say leaps-between(time - 1000000000, time); say leaps-between(time, time + 1000000000)'

Note that I was able to insert the now-known new leap second at the precise
point of time it exists, and the code gives correct result regardless
of whether I'm using a pre 2016.07 compiler. I was able to do so *precisely*
because Rakudo does not try to guess what it doesn't know and returns 0 instead.

I documented the behavior and the workaround method for older compilers in the docs:
https://docs.perl6.org/type/Instant#Future_Leap_Seconds

When I get some tuits, I'll
also release a module that would make it easier to get the correct number of seconds,
regardless of the compiler. IMO that resolves this ticket. I'll close it in a few days,
unless there's more feedback on the issue.

@p6rt
Author

p6rt commented Jul 27, 2016

From zoffix@zoffix.com

Perhaps, we should evaluate some of the leap-second estimation algorithms.
There are more leap second problems in Rakudo besides the .from-posix, that I
now opened in
https://rt-archive.perl.org/perl6/Ticket/Display.html?id=128752


@p6rt
Author

p6rt commented Jul 28, 2016

From zefram@fysh.org

Zoffix Znet via RT wrote:

> This means the result of the code is dependent on the version of the
> compiler and is thus inherently unpredictable, **even for historical
> values.**

Absolutely right that the `future' behaviour also applies to times that
are actually in the past, when running on an old Rakudo version. It's the
future from the point of view of when the code was written that matters.

Be careful when you speak of unpredictability. There are different
classes of unpredictability that are worth distinguishing. The current
implementation is unpredictable in that it can produce either the right
answer or a wrong answer, and the latter can be very wrong. A version
with reasonable guessing for `future' times would be qualitatively
similar, in that it can produce either the right answer or a wrong
answer, but the wrong answers would be quantitatively less wrong.
But an implementation that throws an exception for unknown times would
only be able to either produce the right answer or throw an exception,
and not able to produce a wrong answer. This is still in one sense
unpredictable, but it's qualitatively better, and is predictably correct
in the non-exception cases.

> So trying to use a "guessing formula" won't rectify the issue, since
> at best it can guess, but it would add overhead execution time.

This is a reasonable position to take. It is fine to punt that kind of
use case to the module ecosystem.

> So IMO, the current behaviour is fine as is.

But this is a poor conclusion. If your position is that a good guess
is no more use than a bad guess, this implies that you're only concerned
about whether the answer from the function is correct or incorrect, and
that an incorrect answer has no value. But with the current behaviour
it's impossible to tell which you're getting, which means that the
answer in any case, and therefore the function, is of no value at all.
(Except when you know you're asking about historical times that all
versions of Rakudo know about.)

If you're not interested in making a reasonable guess for the unknown
cases, the only sensible behaviour for those cases is an exception.

If you really really want to bless the current behaviour, then the
function needs to be documented with appropriate qualification about
the quality of the answer. Since the answer for the `future' period is
garbage, and the caller can't know when that period begins, you'd need
to caution the user in terms such as "Only good for times up to the
year 2014; when applied to any later time the result is meaningless.".
Once you've got that in the API definition, of course, you might as
well enforce it by having the function throw an exception for anything
past 2014.

-zefram

@p6rt
Author

p6rt commented Jul 28, 2016

From zefram@fysh.org

Zoffix Znet via RT wrote:

> Returning zero is much more predictable than returning a guess,

The case where the algorithm is operating in its unknown-future regime is
not as easily spotted as that. The return value from the conversion is
not a fixed value, it's a value that is well-formed and valid apart from
(probably) being the wrong answer. But even identifying the last leap
second the implementation knows about doesn't really tell you where
its knowledge ends: at any time there is some period following the
last scheduled leap second for which it is known that there will be no
further leaps.

Signalling an error would make clear whether the threshold of
implementation knowledge has been passed. Another possibility, which
I didn't raise earlier, would be to export a value that explicitly
identifies where the threshold is: one would document that the conversion
only works for times earlier than this advertised threshold.

> But most importantly, I
> can just patch my code for older compilers when new leap seconds are announced.

That only handles leap seconds that the user code knows about
specifically. If you pursue that approach you'd end up with a long and
growing list of leap seconds in the user code, which rather defeats the
point of using the core implementation which has such a list. The fixup
code would be even more complicated than the core implementation, because
it would also incorporate knowledge of which leap seconds are known to
a long and growing list of core implementations. It would be easier to
fully reimplement Instant.from-posix() with one's own leap second list,
rather than use the core Instant.from-posix() and then fix up in this way.

There certainly are good reasons to have an explicit distinction between
the known and unknown regions, but this fixup scenario doesn't make
much sense. Your earlier "choose any guessing algorithm" is a better
motivating scenario.

> Note that I was able to insert the now-known new leap second at the precise
> point of time it exists, and the code gives correct result regardless
> of whether I'm using a pre 2016.07 compiler.

No, it does not. It does succeed in incorporating knowledge of
that specific leap second, but the answer that it gives for the next
gigasecond is 1 leap second, which is almost certainly not correct,
just as the original 0 was almost certainly not correct. (Slightly less
certain, of course, as it's a move in the direction of the likely range.)
On future versions of Rakudo that know about even more leap seconds it
will then give different results, progressively closer to correct, so
the consistency across Rakudo versions that you achieve is rather limited.

Perhaps you only intended this code to be applied to the region for
which the code has knowledge of the leap schedule, in which case you have
achieved what you intended and the above would be an irrelevant trifle.
But you did include the next-gigasecond invocation in your version of
the example.

> I was able to do so *precisely*
> because Rakudo does not try to guess what it doesn't know and returns 0 instead.

That's not quite true. You were able to do it that way because you know
exactly what guess the older versions of Rakudo would make. That guess
didn't have to be the especially crap one that there would be no more leap
seconds ever; it suffices that the guess is deterministic. But this feels
trifling when, as I said earlier, I find this fixup concept untenable.

> I documented the behavior and the workaround method for older compilers in the docs:
> https://docs.perl6.org/type/Instant#Future_Leap_Seconds

That's certainly a significant improvement, thanks. There are a couple
of issues with the wording, though.

"methods ... do not make any guesses" isn't really true, because the
method does return a specific answer that implies a specific future leap
schedule. From the point of view of the caller, it's guessing that there
will never be any more leap seconds after the last one it knows about.
If it signalled an error, that would constitute not guessing.

Also, "leap seconds in the future" reads as if it's referring to the
future of when the method is called. This could do with some rewording
in accordance with what we discussed upthread, to make clear that it may
apply to leap seconds in the past of the call time. The true situation
is somewhat implied by the subsequent discussion of "depending on the
compiler version", but that comes across as conflicting with the "in
the future" rather than as clarifying it.

Putting those together, I suggest that the first sentence of this doc
section should read

  The methods that involve knowledge of leap seconds always assume
  that there will be no further leaps after the last leap second
  that the implementation knows about, which may not be the last
  leap second that has actually been scheduled.

-zefram

@p6rt
Author

p6rt commented Jul 28, 2016

From zefram@fysh.org

zoffix@zoffix.com wrote:

> Perhaps, we should evaluate some of the leap-second estimation algorithms.

If you like. To be clear, I'm not pushing for the conversion to use an
estimation strategy per se, and we're now going beyond what's necessary
to address my original bug report. Documenting the existing behaviour
resolves the bug that I reported qua bug. I do still reckon the existing
behaviour sucks, but with it as a documented API we're in the realm
of differing judgements on a language design question, rather than a
clear bug.

The preferred outcome for which I was pushing was to have either (or
both, in separate methods) of the behaviours that I consider sensible:
signalling an error or making a reasonable estimate. You've made clear
that you're not at all a fan of estimation, and that's fine. It's totally
compatible with my preferences, if one then accepts the conclusion that
the conversion should signal an error. But I'm getting the impression
that you don't find erroring very palatable either.

The starting point for leap estimation is the tidal braking effect by
which the Moon is gradually slowing the rotation of the Earth, affecting
the UT1<->TT relationship. This is a long-term secular change, which
therefore must be taken into account in order to make any reasonable
estimate any significant number of years beyond one's present knowledge.
There are several other effects on the Earth's rotation which affect
UT1<->TT, but they are all oscillations (on periods of a day up to
decades), not secular drift. They can therefore be ignored, at least
for an initial version and for our purposes quite likely forever, even
though on the decadal timescale they swamp the tidal braking effect.
To qualify as a reasonable estimate of UT1<->TT it is both necessary
and sufficient to account for recent length of day and tidal braking.

<http://www.ucolick.org/~sla/leapsecs/deltat.html> has some nice plots
showing differences between time scales. The first plot, with the
3000-year span of UT1<->TT, is the most relevant to our situation.
The roughly-quadratic curve there is what needs to be extrapolated.
The way this historical information has been determined over such a span,
extending way before mechanical time measurement, is pretty clever stuff:
a written record that a solar eclipse was visible from a particular
geographic location tells you which way the Earth was pointing (UT1)
at a time (TT) that can be precisely determined by orbital calculations.

There have been many academic attempts to model and extrapolate this
curve. They differ largely in how closely they attempt to model
the last couple of centuries for which we have much more precise
measurements. Any attempt to model variations on such a short
timescale necessarily ends up modelling some of the oscillations,
not just the secular trend, so ends up a lot more complicated.
<http://www.ucolick.org/~sla/leapsecs/future2100.pdf> is a nice plot
comparing a variety of models against the eclipse observations. As you
can see, there's quite a bit of disagreement between the models, and none
of them is a great fit to the observations. But the models are of value:
there's a rough agreement if one ignores the linear ones.

Excluding the linear models, it looks like none of them is compellingly
superior to another for our purpose. Let's therefore take the simplest
class of these models: a pure linear increase in length of day, giving
a pure parabola of delta-T. Middle-of-the-road values to use are a
LOD increase of 1.7 ms per century (astronomers use the mean Julian
century, 36525 days), with LOD exactly equal to 86400 s at the year 1820.
(1820 is the midpoint of the 1750-to-1890 span of the observations behind
Simon Newcomb's theory of the planetary orbits, which is indirectly what
fixed the length of the second as a modern unit of measurement.)
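
The arithmetic behind "linear increase in length of day, parabola of delta-T" can be sketched briefly. This is an illustrative Python sketch using only the two constants given above (1.7 ms per Julian century, zero excess at 1820); the function names are hypothetical, and time is counted in days since 1820.

```python
LOD_INCREASE = 0.0017      # seconds of extra day length gained per Julian century
CENTURY_DAYS = 36525       # days in a mean Julian century


def excess_lod(days_since_1820: float) -> float:
    """Seconds by which one day exceeds 86400 s, that many days after 1820."""
    return LOD_INCREASE * days_since_1820 / CENTURY_DAYS


def accumulated_delta_t(days_since_1820: float) -> float:
    """Sum of the daily excess since 1820, i.e. the integral of excess_lod."""
    return LOD_INCREASE * days_since_1820 ** 2 / (2 * CENTURY_DAYS)


for centuries in (1, 2, 3):
    days = centuries * CENTURY_DAYS
    print(centuries, excess_lod(days), accumulated_delta_t(days))
```

One century after 1820 the day is 1.7 ms longer than 86400 s and the accumulated difference is 0.0017 × 36525 / 2, roughly 31 s. Doubling the elapsed time doubles the excess length of day but quadruples the accumulated difference, which is the parabola.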

To fill out our model of projected UTC, let's presume that at the
threshold date (the date of the next possible leap second not yet
scheduled) UT1=UTC, then we'll graft onto that the remaining portion of
the delta-T parabola. That gives us a model of TAI-UT1 for the future.
Then let's suppose that each leap second happens at the end of the
Gregorian month in which the fractional part of TAI-UT1 crosses 0.5.
(Current practice is for leaps to happen only in June and December,
but the rules allow the end of any month. Starting in the 38th century
we require more than one leap per month; it's anybody's guess how UTC
will actually be managed then, so it's not too bad to model it as a
multi-second leap at the end of the month.)
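
The grafting and quantising steps described above can be sketched as follows. This is a simplified Python illustration, not the attached program: it counts whole calendar days and ignores the few tens of seconds between TAI and UTC day boundaries (the attached program iterates to settle that), and the names and the threshold date of 2017-07-01 with TAI-UTC = 37 are assumptions chosen to match the attached program's table.

```python
from datetime import date

LOD_INCREASE = 0.0017            # s of extra day length per Julian century
CENTURY_DAYS = 36525
LOD_NOMINAL = date(1820, 1, 1)   # day length taken as exactly 86400 s here
THRESHOLD = date(2017, 7, 1)     # first month boundary with no announced decision (assumed)
DTAI_AT_THRESHOLD = 37           # TAI-UTC in force just before the threshold


def predicted_tai_minus_ut1(day: date) -> float:
    """Parabola grafted on at THRESHOLD, where UT1 = UTC is presumed."""
    d = (day - THRESHOLD).days
    lod_at_threshold = LOD_INCREASE * (THRESHOLD - LOD_NOMINAL).days / CENTURY_DAYS
    return (DTAI_AT_THRESHOLD + lod_at_threshold * d
            + LOD_INCREASE * d * d / (2 * CENTURY_DAYS))


def next_month(day: date) -> date:
    """First day of the month following `day`."""
    if day.month == 12:
        return date(day.year + 1, 1, 1)
    return date(day.year, day.month + 1, 1)


def predicted_leaps(until: date) -> list:
    """(first day after the leap, seconds inserted) for each guessed leap."""
    leaps = []
    dtai = DTAI_AT_THRESHOLD
    boundary = next_month(THRESHOLD)
    while boundary <= until:
        guess = int(predicted_tai_minus_ut1(boundary) + 0.5)   # round to nearest second
        if guess != dtai:
            leaps.append((boundary, guess - dtai))
            dtai = guess
        boundary = next_month(boundary)
    return leaps


print(predicted_leaps(date(2018, 6, 1)))
```

With these constants the only guessed leap before June 2018 takes effect on 2017-12-01, that is, a leap second at the end of November 2017. The second element of each pair can exceed 1, which is how the multi-second leaps of the far future would show up.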

The attached program implements this model, for bidirectional conversions
between TAI and UTC. There is... an amount of support code. I had to
import a bunch of fundamental Gregorian calendar stuff imitating my Perl
5 module Date::ISO8601. The actual leap second logic is only 80 lines
in the middle of the 400 line file, and that covers exact conversions
for the known schedule as well as the estimation for the unknown future.
Each of the two regimes takes about half of the 80 lines. There's then a
bunch of ISO 8601 text formatting and parsing code, which doesn't support
the conversions themselves but is only used for the testing interface.
Invoke like this:

$ perl6 utc_estimate.pl6 '2016-07-28T02:26:33 UTC' '2016-12-01T00:00:20.123 TAI'
2016-07-28T02:27:09 TAI = 2016-07-28T02:26:33 UTC
2016-12-01T00:00:20.123000 TAI = 2016-11-30T23:59:44.123000 UTC

Errors are checked everywhere they should be, but the error messages are
not awesome. Conversions in both directions tick correctly through leap
seconds, both real ones and guessed future ones:

2015-07-01T00:00:34 TAI = 2015-06-30T23:59:59 UTC
2015-07-01T00:00:35 TAI = 2015-06-30T23:59:60 UTC
2015-07-01T00:00:36 TAI = 2015-07-01T00:00:00 UTC
2017-12-01T00:00:36 TAI = 2017-11-30T23:59:59 UTC
2017-12-01T00:00:37 TAI = 2017-11-30T23:59:60 UTC
2017-12-01T00:00:38 TAI = 2017-12-01T00:00:00 UTC

That was satisfying to write.

> There are
> more leap second problems in Rakudo besides the .from-posix,

Sure. Anything else using the leap second table runs into the same
issues in some form.

> that I now
> opened in
> https://rt-archive.perl.org/perl6/Ticket/Display.html?id=128752

As written, that ticket is about a more specific idea of exposing the leap
second table explicitly. On its own that doesn't address the analogous
issues for other uses of the table. I don't have very much opinion
about exposing the table per se. If you do expose the current table,
you'd probably want to expose a threshold-of-the-unknown date as well,
because there's more to leap schedule knowledge than just the dates of
actual leaps. If you expose it in writable form, I'd recommend writing
via method rather than via lvalue, to avoid tying yourself to the table's
current format.
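
As a rough illustration of "write via method, and expose the threshold too", here is a minimal Python sketch. The class `LeapSchedule` and its methods are entirely hypothetical and are not taken from Rakudo or from the linked ticket; the point is only that callers update knowledge through one method and never touch the stored format.

```python
from datetime import date
from typing import Optional


class LeapSchedule:
    """Hypothetical holder for leap-second knowledge; storage stays private."""

    def __init__(self, leap_boundaries, known_until: date):
        self._boundaries = sorted(leap_boundaries)   # first day after each leap
        self._known_until = known_until              # threshold of the unknown

    @property
    def known_until(self) -> date:
        return self._known_until

    def leap_boundaries(self) -> tuple:
        """Read-only copy, so callers never depend on the internal format."""
        return tuple(self._boundaries)

    def announce(self, known_until: date, new_leap: Optional[date] = None) -> None:
        """Extend knowledge to `known_until`, optionally adding one leap."""
        if known_until <= self._known_until:
            raise ValueError("announcement does not extend the schedule")
        if new_leap is not None:
            if not self._known_until <= new_leap < known_until:
                raise ValueError("new leap must fall in the newly known period")
            self._boundaries.append(new_leap)
        self._known_until = known_until


schedule = LeapSchedule([date(2015, 7, 1), date(2017, 1, 1)],
                        known_until=date(2017, 7, 1))
schedule.announce(date(2018, 1, 1))   # "no leap second at the end of June 2017"
print(schedule.known_until, schedule.leap_boundaries())
```

An announcement that no leap second will occur changes only the threshold, which is why the threshold has to be stored separately from the list of leaps.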

-zefram

@p6rt
Author

p6rt commented Jul 28, 2016

From zefram@fysh.org

use v6;

# classes for representing times on particular time scales

class TaiTime {
  has FatRat:D $.linear1958 = 0.FatRat;
  method from_linear1958(TaiTime:U: FatRat:D $linear1958) {
    self.new(:$linear1958)
  }
  multi method perl(TaiTime:D:) {
    # use the type name; calling self.perl here would recurse into this method
    "{self.^name}.from_linear1958({$!linear1958.perl})"
  }
  method mjd(TaiTime:D:) { 36204 + ($!linear1958 / 86400) }
  method from_mjd(TaiTime:U: FatRat:D $mjd) {
    self.from_linear1958(($mjd - 36204) * 86400)
  }
  method mjdn(TaiTime:D:) { self.mjd.truncate }
  method mjdf(TaiTime:D:) { self.mjd % 1 }
  method mjdnf(TaiTime:D:) {
    my $mjd = self.mjd;
    return ($mjd.truncate, $mjd % 1);
  }
  method from_mjdnf(TaiTime:U: (Int:D $mjdn,
      FatRat:D $mjdf where $mjdf >= 0 && $mjdf < 1)) {
    self.from_mjd($mjdn + $mjdf)
  }
}

class UtcTime {
  has Int:D $.mjdn = 0;
  has FatRat:D $.mjdf = 0.FatRat;
  method mjdnf(UtcTime:D:) { ($!mjdn, $!mjdf) }
  method from_mjdnf(UtcTime:U: (Int:D $mjdn,
      FatRat:D $mjdf where $mjdf >= 0)) {
    self.new(:$mjdn, :$mjdf)
  }
  multi method perl(UtcTime:D:) {
    # use the type name; calling self.perl here would recurse into this method
    "{self.^name}.from_mjdnf(({$!mjdn.perl}, {$!mjdf.perl}))"
  }
  method mjd(UtcTime:D:) { $!mjdn + $!mjdf }
  method from_mjd(UtcTime:U: FatRat:D $mjd) {
    self.from_mjdnf(($mjd.truncate, $mjd % 1))
  }
}

# Gregorian calendar and 24-hour clock

sub year_leap(Int:D $y) { $y %% 4 && ($y !%% 100 || $y %% 400) }

sub year_days(Int:D $y) { year_leap($y) ?? 366 !! 365 }

my @month_length = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
sub month_days(Int:D $y, Int:D $m) {
  if $m == 2 {
    return year_leap($y) ?? 29 !! 28;
  } else {
    return @month_length[$m - 1];
  }
}

my @nonleap_monthstarts =
  (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365);
my @leap_monthstarts =
  (0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366);
sub year_monthstarts(Int:D $y) {
  year_leap($y) ?? @leap_monthstarts !! @nonleap_monthstarts
}

sub ymd_from_mjdn(Int:D $mjdn) {
  my $d = $mjdn - -678941;
  my $qc = $d div (365 * 400 + 97);
  $d %= (365 * 400 + 97);
  my $y = $d div 366;
  my $leaps = ($y + 3) div 4;
  $leaps -= ($leaps - 1) div 25 unless $leaps == 0;
  $d -= 365 * $y + $leaps;
  my $yd = year_days($y);
  if $d >= $yd {
    $d -= $yd;
    $y++;
  }
  $d++;
  my @monthstarts = year_monthstarts($y);
  my $m = 1;
  while $d > @monthstarts[$m] { $m++; }
  return ($qc * 400 + $y, $m, $d - @monthstarts[$m - 1]);
}

sub ymd_to_mjdn((Int:D $y, Int:D $m where $m >= 1 && $m <= 12,
    Int:D $d where $d >= 1)) {
  my @monthstarts = year_monthstarts($y);
  my $md = @monthstarts[$m] - @monthstarts[$m - 1];
  $d <= $md or die "day number out of range";
  my $dd = @monthstarts[$m - 1] + $d - 1;
  my $qc = $y div 400;
  my $yy = $y % 400;
  my $leaps = ($yy + 3) div 4;
  $leaps -= ($leaps - 1) div 25 unless $leaps == 0;
  return -678941 + (365 * 400 + 97) * $qc + 365 * $yy + $leaps + $dd;
}

sub hms_from_mjdf(FatRat:D $mjdf where $mjdf >= 0) {
  my $hf = $mjdf * 24;
  my $h = $hf.truncate;
  if $h >= 24 {
    return (23, 59, ($mjdf * 86400) - 86340);
  } else {
    my $mf = ($hf % 1) * 60;
    my $m = $mf.truncate;
    return ($h, $m, ($mf % 1) * 60);
  }
}

sub hms_to_mjdf((Int:D $h where $h >= 0 && $h <= 23,
    Int:D $m where $m >= 0 && $m <= 59,
    FatRat:D $s where $s >= 0)) {
  ($h == 23 && $m == 59) || $s < 60 or die "seconds out of range";
  return ($h * 60 + $m).FatRat / 1440 + $s / 86400;
}

# leap second processing

my @known_leaps = <
  1972-06 1972-12 1973-12 1974-12 1975-12 1976-12 1977-12 1978-12 1979-12
  1981-06 1982-06 1983-06 1985-06 1987-12 1989-12 1990-12 1992-06 1993-06
  1994-06 1995-12 1997-06 1998-12 2005-12 2008-12 2012-06 2015-06 2016-12
>;
my $next_possible_leap = "2017-06";

sub leap_month_to_mjdn(Str:D $str) {
  (my $y, my $m) = $str.split("-").map({ .Int });
  $m++;
  if $m == 13 { $y++; $m = 1; }
  return ymd_to_mjdn(($y, $m, 1));
}

sub leap_month_to_tai(Str:D $str, Int:D $dtai) {
  return TaiTime.from_mjd(leap_month_to_mjdn($str) +
      FatRat.new($dtai, 86400));
}

my @utc_segs = ({
  start_utc_mjdn => leap_month_to_mjdn("1971-12"),
  start_tai => leap_month_to_tai("1971-12", 10).linear1958,
  dtai => 10,
},);
for @known_leaps {
  my $dtai = @utc_segs[*-1]<dtai> + 1;
  my $bound = leap_month_to_tai($_, $dtai).linear1958;
  @utc_segs[*-1]<end_tai> = $bound;
  my $end_utc_mjdn = leap_month_to_mjdn($_);
  @utc_segs[*-1]<end_utc_mjdn> = $end_utc_mjdn;
  @utc_segs.push: {
    start_utc_mjdn => $end_utc_mjdn,
    start_tai => $bound,
    dtai => $dtai,
  };
}
@utc_segs[*-1]<end_tai> =
  leap_month_to_tai($next_possible_leap, @utc_segs[*-1]<dtai>).linear1958;
@utc_segs[*-1]<end_utc_mjdn> = leap_month_to_mjdn($next_possible_leap);

my $lod_nominal_time = leap_month_to_tai("1819-12", -20);
my $lod_increase_rate = 0.0017;
my $xlod_at_threshold = $lod_increase_rate *
  (@utc_segs[*-1]<end_tai> - $lod_nominal_time.linear1958) /
  (86400*36525);

sub tai-ut1_for_tai(FatRat:D $lin) {
  my $tdiffdays = ($lin - @utc_segs[*-1]<end_tai>) / 86400;
  return @utc_segs[*-1]<dtai> + $xlod_at_threshold * $tdiffdays +
      $lod_increase_rate * $tdiffdays * $tdiffdays / (2 * 36525);
}

sub dtai_for_month(Int:D $mjdn) {
  my $dtai = 0;
  loop {
    my $lin = TaiTime.from_mjdnf(($mjdn, 0.FatRat)).linear1958 +
        $dtai;
    my $tai-ut1 = tai-ut1_for_tai($lin);
    my $new_dtai = ($tai-ut1 + 0.5).truncate;
    return $dtai if $new_dtai == $dtai;
    $dtai = $new_dtai;
  }
}

sub mjdn_start_month(Int:D $mjdn) {
  my $ymd = ymd_from_mjdn($mjdn);
  return ymd_to_mjdn(($ymd[0], $ymd[1], 1));
}

sub mjdn_next_month(Int:D $mjdn) {
  my $ymd = ymd_from_mjdn($mjdn);
  return $mjdn + month_days($ymd[0], $ymd[1]);
}

sub utc_from_tai_best_guess(TaiTime:D $tai) {
  my $lin = $tai.linear1958;
  if $lin < @utc_segs[0]<start_tai> {
    die "can't handle time prior to leap-seconds UTC";
  } elsif $lin < @utc_segs[*-1]<end_tai> {
    my $l = 0;
    my $r = @utc_segs.elems - 1;
    until $r == $l {
      my $t = ($l + $r) +> 1;
      if $lin < @utc_segs[$t]<end_tai> {
        $r = $t;
      } else {
        $l = $t + 1;
      }
    }
    my $dtai = @utc_segs[$l]<dtai>;
    my $utc_mjd = $tai.mjd - FatRat.new($dtai, 86400);
    my $utc_mjdnf = ($utc_mjd.truncate, $utc_mjd % 1);
    if $utc_mjdnf[0] == @utc_segs[$l]<end_utc_mjdn> {
      $utc_mjdnf = ($utc_mjdnf[0] - 1, $utc_mjdnf[1] + 1);
    }
    return UtcTime.from_mjdnf($utc_mjdnf);
  } else {
    my $tai-ut1 = tai-ut1_for_tai($lin);
    my $rough_mjd = $tai.mjd -
        FatRat.new(($tai-ut1 + 1).truncate, 86400);
    my $rough_mjdn = $rough_mjd.truncate;
    my $m0_mjdn = mjdn_start_month($rough_mjdn);
    my $m0_dtai = dtai_for_month($m0_mjdn);
    my $m0_start = TaiTime.from_mjd($m0_mjdn +
        FatRat.new($m0_dtai, 86400)).linear1958;
    loop {
      my $m1_mjdn = mjdn_next_month($m0_mjdn);
      my $m1_dtai = dtai_for_month($m1_mjdn);
      my $m1_start = TaiTime.from_mjd($m1_mjdn +
          FatRat.new($m1_dtai, 86400)).linear1958;
      if $lin < $m1_start {
        my $utc_mjd = $tai.mjd -
            FatRat.new($m0_dtai, 86400);
        my $utc_mjdnf =
            ($utc_mjd.truncate, $utc_mjd % 1);
        while $utc_mjdnf[0] >= $m1_mjdn {
          $utc_mjdnf = ($utc_mjdnf[0] - 1,
              $utc_mjdnf[1] + 1);
        }
        return UtcTime.from_mjdnf($utc_mjdnf);
      }
      $m0_mjdn = $m1_mjdn;
      $m0_dtai = $m1_dtai;
      $m0_start = $m1_start;
    }
  }
}

sub utc_to_tai_best_guess(UtcTime:D $utc) {
  my $mjdnf = $utc.mjdnf;
  if $mjdnf[0] < @utc_segs[0]<start_utc_mjdn> {
    die "can't handle time prior to leap-seconds UTC";
  } elsif $mjdnf[0] < @utc_segs[*-1]<end_utc_mjdn> {
    # Known region: binary search for the segment containing this day.
    my $l = 0;
    my $r = @utc_segs.elems - 1;
    until $r == $l {
      my $t = ($l + $r) +> 1;
      if $mjdnf[0] < @utc_segs[$t]<end_utc_mjdn> {
        $r = $t;
      } else {
        $l = $t + 1;
      }
    }
    my $dtai = @utc_segs[$l]<dtai>;
    # A day fraction >= 1 is only valid on the last day of a segment.
    if $mjdnf[1] >= 1 &&
        $mjdnf[0] != @utc_segs[$l]<end_utc_mjdn> - 1 {
      die "given UTC time does not exist";
    }
    my $tai = TaiTime.from_mjd($mjdnf[0] +
      FatRat.new($dtai, 86400) + $mjdnf[1]);
    $tai.linear1958 < @utc_segs[$l]<end_tai>
      or die "given UTC time does not exist";
    return $tai;
  } else {
    # Unknown region: use the extrapolated offsets for this month and
    # the next one.
    my $m0_mjdn = mjdn_start_month($mjdnf[0]);
    my $m0_dtai = dtai_for_month($m0_mjdn);
    my $m1_mjdn = mjdn_next_month($m0_mjdn);
    if $mjdnf[1] >= 1 && $mjdnf[0] != $m1_mjdn - 1 {
      die "given UTC time does not exist";
    }
    my $m1_dtai = dtai_for_month($m1_mjdn);
    my $m1_start = TaiTime.from_mjd($m1_mjdn +
      FatRat.new($m1_dtai, 86400)).linear1958;
    my $tai = TaiTime.from_mjd($mjdnf[0] +
      FatRat.new($m0_dtai, 86400) + $mjdnf[1]);
    $tai.linear1958 < $m1_start
      or die "given UTC time does not exist";
    return $tai;
  }
}

# ISO 8601 string format

my regex decdig { <[0123456789]> }

sub fatrat_from_str(Str:D $str) {
  /^(<decdig>+)(\.(<decdig>+))?$/.ACCEPTS($str)
    or die "malformed numeric string";
  return FatRat.new(($0.Str ~ ($1 =:= Nil ?? "" !! $1[0].Str)).Int,
    10 ** ($1 =:= Nil ?? 0 !! $1[0].chars));
}

sub iso8601_from_y(Int:D $y) {
  sprintf($y < 0 || $y > 9999 ?? "%+05d" !! "%04d", $y)
}

sub iso8601_to_y(Str:D $str) {
  /^ ( <[-+]> <decdig>**4..* || <decdig>**4 ) $/.ACCEPTS($str) &&
    !/^\-0+$/.ACCEPTS($str)
    or die "malformed year string";
  return $str.Int;
}

sub iso8601_from_ymd((Int:D $y, Int:D $m, Int:D $d)) {
  iso8601_from_y($y) ~ sprintf("-%02d-%02d", $m, $d)
}

sub iso8601_to_ymd(Str:D $str) {
  /^(<[-+]>?<decdig>+)\-(<decdig>**2)\-(<decdig>**2)$/.ACCEPTS($str)
    or die "malformed date string";
  return (iso8601_to_y($0.Str), $1.Int, $2.Int);
}

sub iso8601_from_hms((Int:D $h, Int:D $m, FatRat:D $s)) {
  my $res = sprintf("%02d:%02d:%02d", $h, $m, $s.truncate);
  my $f = $s % 1;
  if $f != 0 {
    my $us = $f * 1000000;
    $res ~= sprintf(".%06d", $us.truncate);
    $us %% 1 or $res = "";
  }
  return $res;
}

sub iso8601_to_hms(Str:D $str) {
  /^(<decdig>**2)\:(<decdig>**2)\:(<decdig>**2 (\.<decdig>+)?)$/\
    .ACCEPTS($str)
    or die "malformed time string";
  return ($0.Int, $1.Int, fatrat_from_str($2.Str));
}

sub iso8601_from_mjdn(Int:D $mjdn) { iso8601_from_ymd(ymd_from_mjdn($mjdn)) }

sub iso8601_to_mjdn(Str:D $str) { ymd_to_mjdn(iso8601_to_ymd($str)) }

sub iso8601_from_mjdf(FatRat:D $mjdf where $mjdf >= 0) {
  iso8601_from_hms(hms_from_mjdf($mjdf))
}

sub iso8601_to_mjdf(Str:D $str) { hms_to_mjdf(iso8601_to_hms($str)) }

sub iso8601_from_mjdnf((Int:D $mjdn, FatRat:D $mjdf where $mjdf >= 0)) {
  iso8601_from_mjdn($mjdn) ~ "T" ~ iso8601_from_mjdf($mjdf)
}

sub iso8601_to_mjdnf(Str:D $str) {
  /^ ((<decdig>||<[-+]>)+) <[tT]> ((<decdig>||<[:.]>)+) $/.ACCEPTS($str)
    or die "malformed date/time string";
  return (iso8601_to_mjdn($0.Str), iso8601_to_mjdf($1.Str));
}

# main program for test

sub mjdnf_from_str(Str:D $str) {
  if /^<decdig>+(\.<decdig>+)?$/.ACCEPTS($str) {
    my $mjd = fatrat_from_str($str);
    return ($mjd.truncate, $mjd % 1);
  } else {
    return iso8601_to_mjdnf($str);
  }
}

for @*ARGS {
  /^(<-[\x20]>+) ' ' (:i (tai|utc))$/.ACCEPTS($_)
    or die "bad argument";
  my $mjdnf = mjdnf_from_str($0.Str);
  my $tai;
  my $utc;
  if /^(:i tai)$/.ACCEPTS($1.Str) {
    $tai = TaiTime.from_mjdnf($mjdnf);
    $utc = utc_from_tai_best_guess($tai);
  } else {
    $utc = UtcTime.from_mjdnf($mjdnf);
    $tai = utc_to_tai_best_guess($utc);
  }
  say "{iso8601_from_mjdnf($tai.mjdnf)} TAI = " ~
    "{iso8601_from_mjdnf($utc.mjdnf)} UTC";
}
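
For readers following the estimation branches above: `tai-ut1_for_tai` appears to model an excess length of day that grows linearly with time, which accumulates to a quadratic in TAI-UT1. A sketch of that reading follows. The symbols are introduced here for illustration and are not names from the code: $t_0$ stands for `@utc_segs[*-1]<end_tai>`, $D_0$ for the last known `dtai`, $r$ for `$lod_increase_rate` (assumed to be in seconds per day per Julian century, judging by the divisor 36525), $t_n$ for `$lod_nominal_time`, and $x_0$ for `$xlod_at_threshold`.

```math
\begin{aligned}
x_0 &= r\,\frac{t_0 - t_n}{86400 \cdot 36525}, \qquad
\Delta = \frac{t - t_0}{86400},\\
x(\Delta) &= x_0 + \frac{r\,\Delta}{36525},\\
(\mathrm{TAI}-\mathrm{UT1})(t) &\approx D_0 + \int_0^{\Delta} x(\tau)\,d\tau
  = D_0 + x_0\,\Delta + \frac{r\,\Delta^2}{2 \cdot 36525}.
\end{aligned}
```

The last line matches the expression returned by `tai-ut1_for_tai` term by term. `dtai_for_month` then turns this into an integer offset with `($tai-ut1 + 0.5).truncate`, which for positive values rounds to the nearest second.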

@p6rt
Author

p6rt commented Jul 30, 2016

From zefram@fysh.org

I was expecting this ticket to yield some statement about the design
objectives of Instant.from-posix() and the related leap second code,
but that hasn't happened yet, and it's beginning to look as though there
isn't any firm objective. So I think it might be helpful to lay out
the problem space.

The underlying question to ask when designing this kind of API is what
class of use case it's trying to satisfy. There may be multiple use
cases of interest -- there certainly will be over the module ecosystem
as a whole -- and there may be a need for multiple versions of TAI<->UTC
conversion to satisfy all the ones we're concerned with. For each use
case we can look at what kind of requirements it places on the conversion
functions, and for each possible conversion function we can look at what
kind of requirements it can satisfy.

Given the unavoidable split in TAI<->UTC conversion, between the known and
unknown regions of the leap schedule, each possible conversion function
is going to have two distinct behaviours. A caller whose interests
span both regions will get either behaviour, generally not knowing which
it will get. For the caller to be satisfied by a conversion function,
therefore, its needs must be satisfied by each behaviour individually.
Conversely, a conversion function only really provides those guarantees
that are common to both of its behaviours. It's a weakest-link deal.

Let's look at what the various discussed conversion semantics actually
provide in this respect, to a caller spanning both regions:

A: correct answers for the known region, error for the unknown region.
  Doesn't guarantee to produce an answer, but does guarantee that any
  answer produced will be correct.

B: correct answers for the known region, estimate for the unknown region.
  Guarantees to produce a plausible estimate, not necessarily correct.

C: correct answers for the known region, presumption of no leap seconds
  in the unknown region (current behaviour). Guarantees to produce some
  answer, but with no guarantee of quality: the answer may be garbage.

You can see why I say that the current behaviour (C) sucks. It doesn't
seem at all useful for any caller that might run into the unknown region.
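
To make the three behaviours concrete, here is a minimal Perl 6 sketch. Everything in it is hypothetical: the two-row table, the `$known-end-mjdn` cutoff and the sub names are illustrative assumptions, not Rakudo's implementation or API. It only looks up a TAI-UTC offset for a UTC day number and does no actual time conversion.

```perl6
# Hypothetical, deliberately tiny leap table keyed by UTC day number (MJD).
# It is not a complete schedule; days before the first row yield no offset.
my @known-segs =
    { start-mjdn => 56109, dtai => 35 },   # from 2012-07-01
    { start-mjdn => 57204, dtai => 36 };   # from 2015-07-01

# Assumed first day of the unknown region (illustrative value: 2016-07-01).
my $known-end-mjdn = 57570;

# Offset for a day inside the known region.
sub known-dtai(Int:D $mjdn) {
    my $dtai;
    for @known-segs -> $seg {
        $dtai = $seg<dtai> if $seg<start-mjdn> <= $mjdn;
    }
    return $dtai;
}

# Behaviour A: a correct answer or an error, never a guess.
sub dtai-strict(Int:D $mjdn) {
    die "leap schedule not yet known for MJD $mjdn"
        if $mjdn >= $known-end-mjdn;
    known-dtai($mjdn)
}

# Behaviour B: fall back to an estimator supplied by the caller.
sub dtai-estimated(Int:D $mjdn, &estimate) {
    $mjdn < $known-end-mjdn ?? known-dtai($mjdn) !! estimate($mjdn)
}

# Behaviour C: silently assume no further leap seconds.
sub dtai-frozen(Int:D $mjdn) {
    known-dtai(min($mjdn, $known-end-mjdn - 1))
}

# Toy estimator (an assumption): one leap second every 550 days.
my &toy = { 36 + (($_ - $known-end-mjdn) / 550).round };

say dtai-frozen(70000);                    # C: answers, quality unknown
say dtai-estimated(70000, &toy);           # B: answers with an estimate
say (try dtai-strict(70000)) // "error";   # A: refuses to answer
```

The point of splitting them this way is that a caller picks the guarantee it needs by name, rather than getting behaviour C without having asked for it.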

But let's look at this more rigorously from the point of view of caller
use cases. I see these possibilities for callers' requirements:

0. require correct answers, only operating on times up to 2015.
  This can be satisfied by a historical leap schedule baked into
  the implementation, and so is satisfied by any version that we've
  discussed, if the input is definitely so limited. To avoid accidents,
  however, the caller would probably like some checking that a correct
  answer can actually be produced, with error signalling where it can't
  (behaviour A).

1. require correct answers, only operating on times for which the leap
  schedule has been determined by the time the call is made (so only
  operating up to a few weeks into the future). This is not satisfied
  by a schedule baked into the implementation, but can in principle
  be satisfied by downloading more schedule at runtime. We haven't
  discussed this here, but [perl #128752] has touched on it. As with
  case 0, error signalling where the input exceeds the intended limits
  is desirable.

2. require correct answers, including well into the future. This cannot
  be satisfied by any means short of waiting until the times of interest
  are no longer far in the future. Any application with this requirement
  has a serious design problem, which cannot be solved by any kind of
  cleverness in its libraries. It has to be addressed by redesigning
  the application.

3. require answers to be correct, but don't need to always get an
  answer. This requires basically conversion behaviour A, erroring
  on the unknown region. Optionally there could be some downloading
  of new leap schedule beyond what's baked into the implementation,
  but erroring is definitely required when that is exceeded.

4. require answers to be correct where the implementation can easily
  do that, and otherwise require answers consistent with what other
  versions of the code produce. Since some future version of the
  implementation will know the real leap schedule for whatever time is
  being asked about, being consistent with that requires producing the
  correct answer for all times, including those years in the future.
  This therefore reduces to the impossible case 2.

5. require plausible estimates. This requires basically conversion
  behaviour B. As with case 3, there could optionally be some
  downloading of new leap schedule, but that can be exceeded and so
  some estimation behaviour is necessary. Strictly speaking it's not
  necessarily required to use the actual historical leap schedule at all,
  but the plausibility of answers that contradict the history is low.

6. need to get answers, but have no quality requirement on what the
  answers are. This doesn't require the use of any estimation for the
  unknown region, and equally doesn't require the use of any historical
  leap schedule for the known region. This case can be satisfied by
  much simpler code that doesn't know anything about leap seconds.
  An application declaring this requirement isn't really requiring
  any form of TAI<->UTC conversion, and would be better off using its
  TAI or UTC times unconverted, rather than pretending that it's doing
  a conversion.

So I see needs for conversion behaviours A and B, but behaviour C is
overcomplicated for the only use case that it really satisfies (6).

Over to you: what use cases are Instant.from-posix() and friends intended
to satisfy?

-zefram

@p6rt p6rt added the RFC Request For Comments label Jan 5, 2020