Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DateTime.new accepts bogus string input #4907

Closed
p6rt opened this issue Dec 23, 2015 · 10 comments
Closed

DateTime.new accepts bogus string input #4907

p6rt opened this issue Dec 23, 2015 · 10 comments
Labels

Comments

@p6rt
Copy link

p6rt commented Dec 23, 2015

Migrated from rt.perl.org#127002 (status was 'rejected')

Searchable as RT127002$

@p6rt
Copy link
Author

p6rt commented Dec 23, 2015

From zefram@fysh.org

The documentation for DateTime.new says that when given a string input
the string is to be in ISO 8601 format, and that an exception will be
thrown when it is given an invalid string input. In fact it accepts
some non ISO 8601 strings​:

  "2000-01-01T00​:00​:00+0000" (mixed basic and extended representations)
  "2000-01-01T00​:00​:00-00" (wrong sign for zero timezone)
  "2000-01-0\x[666]T00​:00​:00" (non-ASCII digit)

The first two failures are specific to the timezone portion of the string.
Non-ASCII digits are accepted in all digit positions.

-zefram

@p6rt
Copy link
Author

p6rt commented Dec 23, 2015

From @lizmat

On 23 Dec 2015, at 23​:07, Zefram (via RT) <perl6-bugs-followup@​perl.org> wrote​:

# New Ticket Created by Zefram
# Please include the string​: [perl #​127002]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl6/Ticket/Display.html?id=127002 >

The documentation for DateTime.new says that when given a string input
the string is to be in ISO 8601 format, and that an exception will be
thrown when it is given an invalid string input. In fact it accepts
some non ISO 8601 strings​:

"2000-01-01T00​:00​:00+0000" (mixed basic and extended representations)
"2000-01-01T00​:00​:00-00" (wrong sign for zero timezone)
"2000-01-0\x[666]T00​:00​:00" (non-ASCII digit)

The first two failures are specific to the timezone portion of the string.
Non-ASCII digits are accepted in all digit positions.

I tend to say ENOTABUG, as they are cases of “be liberal in what you accept” and they match the regular expression. If they wouldn’t match that, *then* an exception would be thrown.

$ 6 'DateTime.new("2000-01-01T00​:00​:00+0000").say'
2000-01-01T00​:00​:00Z
$ 6 'DateTime.new(“2000-01-01T00​:00​:00-00").say'
2000-01-01T00​:00​:00Z
$ 6 'DateTime.new("2000-01-0\x[666]T00​:00​:00").say'
2000-01-06T00​:00​:00Z

They all seem to generate the correct DateTime object. Note that if there is an error​:

$ 6 'DateTime.new("2000-01-0\x[666]T00​:00​:90").say'
Second out of range. Is​: 90, should be in ^61
$ 6 'DateTime.new("2000-01-0\x[666]TOOMUCH").say'
Invalid DateTime string '2000-01-0٦TOOMUCH'; use an ISO 8601 timestamp (yyyy-mm-ddThh​:mm​:ssZ or yyyy-mm-ddThh​:mm​:ss+01​:00) instead

Liz

@p6rt
Copy link
Author

p6rt commented Dec 23, 2015

The RT System itself - Status changed from 'new' to 'open'

@p6rt
Copy link
Author

p6rt commented Dec 23, 2015

From zefram@fysh.org

Elizabeth Mattijsen via RT wrote​:

I tend to say ENOTABUG, as they are cases of "be liberal in what you
accept" and they match the regular expression.

If the code is not a bug, then the documentation is, because it explicitly
says that invalid input will result in an exception. You're welcome to
change it to accurately describe the code. Something like

  Accepts a C<Str|/type/Str> formatted as C<yyyy-mm-ddThh​:mm​:ssZ>
  or C<yyyy-mm-ddThh​:mm​:ss+01​:00> (ISO 8601 timestamp notation).
  Also accepts some other strings that are not in ISO 8601 notation.
  When given an input string that is not in ISO 8601 notation, it
  either throws an exception of type L<X​::Temporal​::InvalidFormat>
  or tries to make sense of the string and returns some resulting
  C<DateTime> object.

They all seem to generate the correct DateTime object.

I would not say that there is any correct DateTime object to generate from
those strings. They are not in ISO 8601 format, and the documentation
doesn't say what other string formats it accepts.

-zefram

@p6rt
Copy link
Author

p6rt commented Dec 24, 2015

From zefram@fysh.org

Commit 895546990f6001a5999ef, attempting to fix [perl #​127004], has
added some more kinds of non ISO 8601 string that are accepted as input.

-zefram

@p6rt
Copy link
Author

p6rt commented Dec 24, 2015

From @labster

On Wed Dec 23 16​:05​:59 2015, zefram@​fysh.org wrote​:

Commit 895546990f6001a5999ef, attempting to fix [perl #​127004], has
added some more kinds of non ISO 8601 string that are accepted as input.

-zefram

http://design.perl6.org/S32/Temporal.html#line_97 says that DateTime.new accepts RFC 3339 strings, not ISO 8601. It should probably accept both. One of the "wrong" formats in particular, the timezone -0000, is required by RFC 3339.

I'm pretty sure that this is a docs bug, not a rakudobug. The purpose of DateTime.new is not to validate your datetime format, it's to produce an internal representation of a date string. DateTime.new($foo).Str does produce valid and correct ISO 8601 strings, and that's good enough for me. And I imagine that most of the users will agree that we should apply Postel's law here -- so long as we document it. Any solution beyond that should go into module space.

Rejecting ticket.

@p6rt
Copy link
Author

p6rt commented Dec 24, 2015

@labster - Status changed from 'open' to 'rejected'

@p6rt p6rt closed this as completed Dec 24, 2015
@p6rt
Copy link
Author

p6rt commented Dec 24, 2015

From zefram@fysh.org

Brent Laabs via RT wrote​:

http://design.perl6.org/S32/Temporal.html#line_97 says that DateTime.new
accepts RFC 3339 strings, not ISO 8601. It should probably accept
both. One of the "wrong" formats in particular, the timezone -0000,
is required by RFC 3339.

If you specifically intend to accept RFC 3339 format then OK, that's
grounds to accept "-00​:00". But the documentation for the constructor
should then say that it's accepting RFC 3339 format as well. In any case,
RFC 3339 doesn't account for the other two faults.

The purpose of DateTime.new is not to validate your datetime
format,

Input validation is a useful feature, and is actually being advertised by
the documentation. If you don't intend to provide a validation service
then the documentation mustn't advertise it. But that would be a pity;
you're *nearly* performing the useful validation already.

And I imagine that most of the users will agree that we should apply
Postel's law here

The Postel principle doesn't properly apply here. It's a strategy
to deal with ambiguous specs, as many of the early IETF RFCs are.
The liberal/conservative split covers the area of ambiguity, thus covering
up the spec author's mistakes. Where the spec is unambiguous, however,
there is no basis for this kind of liberality. Accepting things that
are unambiguously prohibited by the spec isn't a pragmatic approach
to problematic circumstances; it's just gratuitous deviation from
proper behaviour. ISO 8601, for all its readability issues, is quite
clear about what is permitted, and the things I have called out are
unambiguously prohibited.

In any case, this code doesn't look like intentional liberality​: it's not
even a misguided attempt to apply the Postel principle. It looks like
it's just sloppy verification. Postel has only been invoked post-hoc
to excuse the sloppiness.

Rejecting ticket.

As I said in an earlier message, if the code is not a bug then the
documentation is. In this ticket I'm reporting the mismatch between
code and documentation. Rejecting the ticket doesn't make sense, unless
you're rejecting my contention that there is such a mismatch or that
the mismatch is a problem needing rectification. Are you?

-zefram

@p6rt
Copy link
Author

p6rt commented Dec 24, 2015

From @labster

On Thu Dec 24 03​:00​:45 2015, zefram@​fysh.org wrote​:

Brent Laabs via RT wrote​:

http://design.perl6.org/S32/Temporal.html#line_97 says that DateTime.new
accepts RFC 3339 strings, not ISO 8601. It should probably accept
both. One of the "wrong" formats in particular, the timezone -0000,
is required by RFC 3339.

If you specifically intend to accept RFC 3339 format then OK, that's
grounds to accept "-00​:00". But the documentation for the constructor
should then say that it's accepting RFC 3339 format as well. In any case,
RFC 3339 doesn't account for the other two faults.

The purpose of DateTime.new is not to validate your datetime
format,

Input validation is a useful feature, and is actually being advertised by
the documentation. If you don't intend to provide a validation service
then the documentation mustn't advertise it. But that would be a pity;
you're *nearly* performing the useful validation already.

And I imagine that most of the users will agree that we should apply
Postel's law here

The Postel principle doesn't properly apply here. It's a strategy
to deal with ambiguous specs, as many of the early IETF RFCs are.
The liberal/conservative split covers the area of ambiguity, thus covering
up the spec author's mistakes. Where the spec is unambiguous, however,
there is no basis for this kind of liberality. Accepting things that
are unambiguously prohibited by the spec isn't a pragmatic approach
to problematic circumstances; it's just gratuitous deviation from
proper behaviour. ISO 8601, for all its readability issues, is quite
clear about what is permitted, and the things I have called out are
unambiguously prohibited.

In any case, this code doesn't look like intentional liberality​: it's not
even a misguided attempt to apply the Postel principle. It looks like
it's just sloppy verification. Postel has only been invoked post-hoc
to excuse the sloppiness.

Rejecting ticket.

As I said in an earlier message, if the code is not a bug then the
documentation is. In this ticket I'm reporting the mismatch between
code and documentation. Rejecting the ticket doesn't make sense, unless
you're rejecting my contention that there is such a mismatch or that
the mismatch is a problem needing rectification. Are you?

-zefram

I'm rejecting essentially every assertion you made in the last reply.

There is no mismatch between docs and implementation (see Raku/doc@4ee960f ). There is no desire to see verification in this code, because unambiguous data should work without having to massage it into the most correct format -- the precise definition of which isn't freely available on the web. Most people want the DateTime library to be fast, not to involve extra verification steps. That "most people" doesn't mean "me" -- it's the opinion of many people who have worked on Perl 5 DateTime (c.f. DateTime​::LazyInit).

Your interpretation of Postel's law is incorrect, too. While the spec is clear, other implementations of the spec may have minor mistakes. Like a sign or mixing extended and condensed formats. We shouldn't feel bad about accepting those formats, no matter what you think. So long as we produce correct output (see your other tickets), we're following Postel's law.

Input validation is a useful feature, but you can hardly call a parenthetical an advertisement of that feature (as it was in the older version of the docs). So I guess I disagree with you there as well.

@p6rt
Copy link
Author

p6rt commented Dec 24, 2015

From zefram@fysh.org

Brent Laabs via RT wrote​:

There is no mismatch between docs and implementation (see
Raku/doc@4ee960f
).

Ah, the mismatch has been resolved by that change, in response to
this ticket. That would make this ticket resolved, not rejected.

Your interpretation of Postel's law is incorrect, too. While the spec
is clear, other implementations of the spec may have minor mistakes.

This argument might carry some weight if the code were actually written to
accept a well-defined set of non ISO 8601 inputs that had been identified
as likely mistaken outputs from other code. But it's not​: the range
of out-of-spec inputs that is accepted arises merely from the bugs that
happened to creep into the attempt at validation. The use of non-ASCII
digits, in particular, is a very unlikely mistake to make in output code.
They're accepted because the \d regexp term makes it easy to mistakenly
accept them on input.

Input validation is a useful feature, but you can hardly call a
parenthetical an advertisement of that feature

"An invalid input string throws an exception" was not parenthetical,
it was (and remains) quite prominent at the start of a paragraph.
It's true that the statement of the permitted format wasn't (and isn't)
really clear, but the parenthetical reference to ISO 8601 was the most
solid part of it, and so seemed a helpful clarification. It did not come
across as obiter dicta. It would be ridiculous to deny significance
to this sort of clarifying clause merely because it is syntactically
contained in parentheses.

-zefram

@p6rt p6rt added the Bug label Jan 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant