Range.WHICH fails on many kinds of endpoints #5606

Open
p6rt opened this issue Aug 20, 2016 · 7 comments

p6rt commented Aug 20, 2016

Migrated from rt.perl.org#129019 (status was 'open')

Searchable as RT129019$

p6rt commented Aug 20, 2016

From zefram@fysh.org

(:a..:b).WHICH
Range|a True..b True
(List..Pair).WHICH
Use of uninitialized value $!min of type List in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
  in block <unit> at <unknown file> line 1
Use of uninitialized value $!max of type Pair in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
  in block <unit> at <unknown file> line 1
Range|..

Even where it doesn't produce these warnings and an empty representation
of an endpoint, this style of .WHICH is very poor and clash-prone.
Range.WHICH should apply .WHICH to the endpoint objects.

-zefram

p6rt commented Aug 30, 2017

From @skids

On Sat, 20 Aug 2016 10:24:51 -0700, zefram@fysh.org wrote:

> (:a..:b).WHICH
> Range|a True..b True
> (List..Pair).WHICH
> Use of uninitialized value $!min of type List in string context.
> Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
>   in block <unit> at <unknown file> line 1
> Use of uninitialized value $!max of type Pair in string context.
> Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
>   in block <unit> at <unknown file> line 1
> Range|..
>
> Even where it doesn't produce these warnings and an empty representation
> of an endpoint, this style of .WHICH is very poor and clash-prone.
> Range.WHICH should apply .WHICH to the endpoint objects.
>
> -zefram

Unfortunately using the .WHICHs of the endpoints will not be sufficient.

For example "Foo^".."Bar" and "Foo"^.."Bar" would put out the same WHICH.
Since any Str can foil any attempt we make to simply bracket the result,
we have no choice but to either use an escape-character, quote-like
construct, or prepend a length:

Range|(8)|Str|Foo^..Str|Bar

...we could, however, decide whether or not to do so
by detecting anything in $!min which may jump the rails.

Same problem with Pair, BTW, which already .WHICHs its members:

$ perl6 -e '("Foo|Str|2" => "1").WHICH.say'
Pair|Str|Foo|Str|2|Str|1
$ perl6 -e '("Foo" => "2|Str|1").WHICH.say'
Pair|Str|Foo|Str|2|Str|1

...and given the use of WHICH in hashing, this is more consequential
than it may seem at first glance.
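
To make the collision and the length-prefix idea concrete, here is a minimal sketch in Python. It only models the string format visible in the outputs above; it is not Rakudo's implementation, and the helper names `naive_which` and `length_prefixed_which` are hypothetical.

```python
def naive_which(type_name, parts):
    """Join already-stringified parts with '|', mimicking the outputs above."""
    return "|".join([type_name, *parts])


def length_prefixed_which(type_name, parts):
    """Prefix every part with its length so part boundaries cannot be forged.

    Assumes type_name itself contains no '|' (an assumption of this sketch).
    """
    return type_name + "".join(f"|({len(p)})|{p}" for p in parts)


# The two Pairs from the example, given as (key WHICH, value WHICH) strings.
pair_a = ["Str|Foo|Str|2", "Str|1"]   # "Foo|Str|2" => "1"
pair_b = ["Str|Foo", "Str|2|Str|1"]   # "Foo" => "2|Str|1"

naive_collides = naive_which("Pair", pair_a) == naive_which("Pair", pair_b)
prefixed_collides = (
    length_prefixed_which("Pair", pair_a) == length_prefixed_which("Pair", pair_b)
)

print(naive_collides)     # True: both give Pair|Str|Foo|Str|2|Str|1
print(prefixed_collides)  # False: the lengths record where each part ends
```

The length prefix works because a reader of the string always knows how many characters belong to the next part, so no content inside a part can be mistaken for a separator.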

p6rt commented Aug 30, 2017

The RT System itself - Status changed from 'new' to 'open'

p6rt commented Aug 30, 2017

From zefram@fysh.org

Brian S. Julin via RT wrote:

> For example "Foo^".."Bar" and "Foo"^.."Bar" would put out the same WHICH.

Yes, and that's a bigger problem. In general Rakudo's .WHICH methods
suffer this sort of problem when incorporating the .WHICH values of
subobjects. See [perl #128943] (Set, and in which I sketched out how
to fix it) and [perl #128947] (Pair, just like your Pair example).

> we have no choice but to either use an escape-character, quote-like
> construct, or prepend a length:

The former would be much nicer to work with.

-zefram
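
The escape-based alternative can be sketched the same way. This is a Python model under stated assumptions, not the fix proposed in the referenced tickets: the choice of backslash as the escape character, the set of escaped characters, and the names `escape` and `range_which` are all hypothetical.

```python
def escape(part, specials="\\|^."):
    """Backslash-escape every character that has structural meaning."""
    return "".join("\\" + ch if ch in specials else ch for ch in part)


def range_which(min_which, max_which, excl_min=False, excl_max=False, quote=True):
    """Build a Range identity string from the endpoints' WHICH strings."""
    op = ("^" if excl_min else "") + ".." + ("^" if excl_max else "")
    if quote:
        min_which, max_which = escape(min_which), escape(max_which)
    return "Range|" + min_which + op + max_which


# "Foo^".."Bar" versus "Foo"^.."Bar"
unquoted_a = range_which("Str|Foo^", "Str|Bar", quote=False)
unquoted_b = range_which("Str|Foo", "Str|Bar", excl_min=True, quote=False)
quoted_a = range_which("Str|Foo^", "Str|Bar")
quoted_b = range_which("Str|Foo", "Str|Bar", excl_min=True)

print(unquoted_a == unquoted_b)  # True: both are Range|Str|Foo^..Str|Bar
print(quoted_a == quoted_b)      # False: only the operator's ^ is unescaped
```

With escaping, an unescaped `^` or `.` can only come from the range operator, so the string can be split back into its endpoints unambiguously while staying readable.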

p6rt commented Sep 13, 2017

From @skids

On Wed, 30 Aug 2017 07:03:10 -0700, zefram@fysh.org wrote:

> Brian S. Julin via RT wrote:
>
> > For example "Foo^".."Bar" and "Foo"^.."Bar" would put out the same WHICH.
>
> Yes, and that's a bigger problem. In general Rakudo's .WHICH methods
> suffer this sort of problem when incorporating the .WHICH values of
> subobjects. See [perl #128943] (Set, and in which I sketched out how
> to fix it) and [perl #128947] (Pair, just like your Pair example).
>
> > we have no choice but to either use an escape-character, quote-like
> > construct, or prepend a length:
>
> The former would be much nicer to work with.
>
> -zefram

OK, a few things to note going forward: 1) the .Str of a WHICH is not
necessarily the value of the WHICH. I think in the case where we have
really large objects it would be OK for there to be some tiny chance
of a collision between two WHICH.Str's as long as the actual WHICHs
do not collide. So using a hash in the visual presentation may be
OK, but using that hash for actually implementing === or eqv would
be wrong (as is done now in some cases).

Note Range does _NOT_ do so for eqv but seems to punt to comparing WHICH
for ===, and Range's WHICH is itself a Str, creating "accidental collisions"
which the spec advises against.

$ perl6 -e 'say ("200^".."foo") eqv ("200"^.."foo")'
False
$ perl6 -e 'say ("200^".."foo") === ("200"^.."foo")'
True
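
The disagreement between the two outputs above can be reproduced in a small Python model. This is only an illustration of the mechanism described (structural comparison for eqv, string comparison of WHICH for ===); `ModelRange` and both comparison functions are hypothetical stand-ins, not Rakudo code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRange:
    """Toy stand-in for a Range with string endpoints."""
    min: str
    max: str
    excl_min: bool = False
    excl_max: bool = False

    def which(self):
        # Naive identity string: endpoints pasted around the operator.
        op = ("^" if self.excl_min else "") + ".." + ("^" if self.excl_max else "")
        return f"Range|{self.min}{op}{self.max}"


def structurally_equal(a, b):
    """Stands in for eqv: compares the fields directly."""
    return a == b


def identical_by_which(a, b):
    """Stands in for a === that punts to comparing WHICH strings."""
    return a.which() == b.which()


a = ModelRange("200^", "foo")                 # "200^".."foo"
b = ModelRange("200", "foo", excl_min=True)   # "200"^.."foo"

print(structurally_equal(a, b))  # False
print(identical_by_which(a, b))  # True: both WHICH strings are Range|200^..foo
```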

Second, the spec only says that mutables must produce an ObjAt (and in fact
there's room left open for not requiring WHICH implementation at all on value types).
For value types, .WHICH could very well be just identity... or a mixin on the value
that stringifies differently if we so desire or which can be used to associate a cached
digest. It would seem to me that implementing eqv and === candidates to work
directly on values would be faster than making up some weird WHICH value and then
comparing those values, unless we have a cache of WHICHs which include a hash which
can be used to shortcut the test and only do the longhand comparison when the hashes match...
preferably the same hash that needs to be done when using them as hash bucket keys anyway.
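
As a rough sketch of the shortcut described in this paragraph: keep a lazily computed hash next to the value, reject on a hash mismatch, and only fall back to the longhand comparison when the hashes agree. This is Python, and `ValueKey` and `same_value` are hypothetical names; how Rakudo would store such a cache is not specified in this thread.

```python
class ValueKey:
    """A value plus a cached hash of it (computed on first use)."""

    def __init__(self, value):
        self.value = value      # must be hashable, e.g. a tuple of fields
        self._cached = None

    def digest(self):
        if self._cached is None:
            self._cached = hash(self.value)
        return self._cached


def same_value(a, b, stats):
    """Compare two ValueKeys, using the cached hashes as a fast reject."""
    if a.digest() != b.digest():
        stats["fast_rejects"] += 1
        return False
    # Equal hashes do not prove equality, so do the full comparison.
    stats["full_compares"] += 1
    return a.value == b.value


stats = {"fast_rejects": 0, "full_compares": 0}
x = ValueKey(("Range", "200^", "foo", False))
y = ValueKey(("Range", "200", "foo", True))
z = ValueKey(("Range", "200^", "foo", False))

print(same_value(x, z, stats))  # True, reached via the full comparison
print(same_value(x, y, stats))  # False, normally via the fast reject
```

The same cached hash could also serve as the bucket key when the value is stored in a hash table, which is the reuse the paragraph above suggests.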

Third, there is a test or two in the current test suite that test for current
stringlike behavior which will probably have to be fixed in errata whatever be
done to fix the current situation.

Fourth, as an aside for non-value-types, even ignoring NUMA, it is not enough to
munge a pointer address... the WHICH cannot be a deterministic derivative of
the object storage address... which the GC might change and of course might be reused.
I can't remember what scheme we are doing here but ISTR it was sane, if not the
most efficient possible.

Finally, in the cases where we might not present a WHICH.Str which is guaranteed
to be unique, a clue to the user that "hey, this is a value type, so just look
at the .perl" might help.

So, I'd suggest the stringification of a .WHICH be of limited length, give a few
clues about the type/value and what is in it, and above some threshold or when
anything messy is detected, ellipses and a hash, and we just tell the users we are doing
that and not to rely on that stringification for end-of-the-world scenario things.

Meanwhile, implement value-type .WHICHs as either identity or a mixin, and try like
heck to code around actually using the latter for anything that does not involve
implementing speedups, since they tie up resources.

Phew. Maybe this was one of those brain pretzels S02 warned us about.

p6rt commented Sep 13, 2017

From zefram@fysh.org

Brian S. Julin via RT wrote:

> it would be OK for there to be some tiny chance
> of a collision between two WHICH.Str's as long as the actual WHICHs
> do not collide.

One could make that distinction, but then the .Str of the .WHICH would
not fulfill the purposes for which .WHICH is used, and would seem pretty
pointless.

Consider the Set class, as an example user of .WHICH. It needs something
that it can easily hash and compare, and whose equality corresponds
precisely to object identity. That doesn't have to be a string, but a
string is a convenient format for that. So OK, if .WHICH.Str doesn't do
the job then Set can go to extra effort to use the `real' non-colliding
.WHICH value. In this context, a colliding .WHICH.Str is worse than
useless: it's an attractive nuisance, because it'll function well enough
as a .WHICH substitute to pass most test suites but then fail in real use.
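
A small Python model of that failure mode: a set-like container keyed by a colliding identity string silently merges two distinct elements. The key format copies the Pair output shown earlier in the thread; `naive_key` is a hypothetical helper, and a dict keyed by the values themselves stands in for a correct identity.

```python
def naive_key(pair):
    """Identity string built by pasting the members together with '|'."""
    key, value = pair
    return f"Pair|Str|{key}|Str|{value}"


# Two different pairs whose naive identity strings coincide.
pairs = [("Foo|Str|2", "1"), ("Foo", "2|Str|1")]

by_string = {naive_key(p): p for p in pairs}   # container keyed by the string
by_value = {p: p for p in pairs}               # container keyed by real identity

print(len(by_string))  # 1: the second pair overwrote the first
print(len(by_value))   # 2
```

With ordinary test data the two containers agree, which is why a test suite can pass while real use fails.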

> So, I'd suggest the stringification of a .WHICH be of limited length, give a few
> clues about the type/value and what is in it, and above some threshold or when
> anything messy is detected, ellipses and a hash, and we just tell the users we are doing
> that and not to rely on that stringification for end-of-the-world scenario things.

Key question: what value does a colliding .WHICH.Str add? You seem to
envision it as a human-oriented inspection mechanism. But we've already
got more than one of those. .perl, .gist, and to a lesser extent .Str all
provide some kind of representation of an object's content as a string.
There is conceptual room for more than one of these, since they can take
different views of what aspects of an object are important and what kind
of ambiguities in the output are acceptable. But in practice there isn't
enough attention paid to these to justify even the range of methods that
already exist. In many respects .perl and .gist are near clones, because
.gist has neither a clearly defined output format nor clearly different
rules about what to represent. It would make a fair bit of sense at this
point to delete the less-useful .gist. It would be crazy to go the other
way, adding yet another stringification mechanism with no hard identity
requirements, no defined format, and no rules about how much to represent.

Even if there were some demand for yet another stringification method,
.WHICH.Str would be the wrong place to put it. Half-hearted inspection
is entirely contrary to the basic concept of .WHICH. If such a method
is to be added, it should be a method directly on the principal object.
.WHICH doesn't have to supply a string directly, or even just a string
wrapped in a funny class, but the object it supplies should be concerned
entirely with the precise identity of the principal object. For the
.WHICH value to stringify to anything that doesn't have the same identity
properties would be misleading.

If you're interested in a human inspecting the .WHICH value itself,
rather than inspecting the principal object, then the most important
method to consider is .WHICH.perl. By the intent of .perl, this ought
to produce a string that fully represents the actual .WHICH value.
Ellipsis is not useful here. It would be acceptable for .WHICH.gist
to provide a lossy representation, but .gist is so loosely defined that
almost anything is acceptable.

> there's room left open for not requiring WHICH implementation at all on value types).
> For value types, .WHICH could very well be just identity

That too would break Set and anything else that needs a way to hash
object identity. A .WHICH method producing a consistent type of output
is useful on *all* types.

> It would seem to me that implementing eqv and === candidates to work
> directly on values would be faster than making up some weird WHICH value and then
> comparing those values,

Sure. eqv/=== should absolutely be implemented in type-specific ways
where that can be done faster than .WHICH comparison. And code comparing
for object identity should use the likely-faster === in preference to
comparing .WHICH values. But .WHICH still needs to be there, for users
like Set that need to do more than identity comparison, and equality
comparison of .WHICH values must always behave the same as ===.

-zefram

p6rt commented Sep 13, 2017

From @skids

On Wed, 13 Sep 2017 10:29:05 -0700, zefram@fysh.org wrote:

> Brian S. Julin via RT wrote:
>
> > it would be OK for there to be some tiny chance
> > of a collision between two WHICH.Str's as long as the actual WHICHs
> > do not collide.
>
> One could make that distinction, but then the .Str of the .WHICH would
> not fulfill the purposes for which .WHICH is used, and would seem pretty
> pointless.

I don't think so... and also in answer to​:

> Key question: what value does a colliding .WHICH.Str add?

For example it provides a way to tell with very high probability
whether the two 500,000 element sets which only differ in one element,
or in sort order are the same set, if you are not in a position to just
=== them, by comparing only two lines of output... and such comparisons
during debug are pretty much the most common way it is used, though
usually with less problematic non-value types.

It can also, if well constructed, tell you whether an unseeded hash
would put the two in the same bucket, though that is less useful.
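
One way to picture this debugging use is an order-independent digest printed as a single short line per set. The sketch below is Python and purely illustrative: the choice of SHA-256, the truncation to 16 hex digits, the use of `repr` as the element encoding, and the name `set_digest` are all assumptions, and the sets are scaled down from the 500,000 elements mentioned above.

```python
import hashlib


def set_digest(elements):
    """Short digest that ignores element order.

    Elements are encoded, sorted, and fed to the hash with a length prefix
    so that adjacent encodings cannot run together.
    """
    h = hashlib.sha256()
    for enc in sorted(repr(e).encode("utf-8") for e in elements):
        h.update(len(enc).to_bytes(8, "big"))
        h.update(enc)
    return h.hexdigest()[:16]


n = 50_000
a = set(range(n))
b = set(reversed(range(n)))     # same elements, built in another order
c = (a - {123}) | {-1}          # differs from a in exactly one element

print(set_digest(a) == set_digest(b))  # True
print(set_digest(a) == set_digest(c))  # False, barring a hash collision
```

Truncating the digest is what makes a collision possible in principle, which is the trade-off being argued about in this exchange.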

> Consider the Set class, as an example user of .WHICH. It needs something
> that it can easily hash and compare, and whose equality corresponds
> precisely to object identity.

I think for initial hash generation that pretty much boils down (recursively)
to its serialization/.freeze format for value types, and for subsequent hash
caching, if implemented, there is a case to be made that that cache should
be stored in the .WHICH but not considered a hashable part of the WHICH,
so you can just take it and use it if it has already been calculated.
Or at least you should be able to look up hashes in a table of WHICHs for
which caching has been performed.

> That doesn't have to be a string, but a
> string is a convenient format for that. So OK, if .WHICH.Str doesn't do
> the job then Set can go to extra effort to use the `real' non-colliding
> .WHICH value. In this context, a colliding .WHICH.Str is worse than
> useless: it's an attractive nuisance, because it'll function well enough
> as a .WHICH substitute to pass most test suites but then fail in real use.

I'd hardly rate "being wrong once in a lifetime" at a "nuisance" level
of utility.

Given that WHICH's implementation is purposefully unspecified so it can be
implementation-specific, any tests in roast looking at WHICH.Str or dissecting .WHICH
values should probably be suspect: you compare .WHICH values of two things
that should or should not be different, and that's about it. Now, if rakudo
wants to test the contents of .WHICH or .WHICH.Str in its test suite, that's
up to rakudo.

> Even if there were some demand for yet another stringification method,
> .WHICH.Str would be the wrong place to put it. Half-hearted inspection
> is entirely contrary to the basic concept of .WHICH. If such a method
> is to be added, it should be a method directly on the principal object.
> .WHICH doesn't have to supply a string directly, or even just a string
> wrapped in a funny class, but the object it supplies should be concerned
> entirely with the precise identity of the principal object. For the
> .WHICH value to stringify to anything that doesn't have the same identity
> properties would be misleading.
>
> If you're interested in a human inspecting the .WHICH value itself,
> rather than inspecting the principal object, then the most important
> method to consider is .WHICH.perl. By the intent of .perl, this ought
> to produce a string that fully represents the actual .WHICH value.
> Ellipsis is not useful here. It would be acceptable for .WHICH.gist
> to provide a lossy representation, but .gist is so loosely defined that
> almost anything is acceptable.

We agree on WHICH.perl and WHICH.gist, and I'm happy to let others
argue over which of those WHICH.Str would spit out.

> > there's room left open for not requiring WHICH implementation at all
> > on value types).
> > For value types, .WHICH could very well be just identity
>
> That too would break Set and anything else that needs a way to hash
> object identity.

No not really. It's just that the spec specifies that .WHICH is to
be used for hashing, and does not specify how that hashing is to
occur. There's no specified API presented uniformly by everything
that comes out of a WHICH, and this seems to be intentional.

> A .WHICH method producing a consistent type of output is useful on *all* types.

I think it is a double-edged sword; it can be useful but has its perils.
It certainly *can* be done this way...

...or your implementation could demand all .WHICH produce an object with a
.HASH-ME-BABYCAKES method. Or just demand all value types and ObjAt have such
a method and have WHICH be a no-op on value types. Or add a dash of canonicalism
to .freeze, which you need to implement anyway, and an adverb to gen a hash.
Or it could simply demand a type-specific === multi candidate is present in-scope
and that object-keyed hashes are a transparent layered construct mapping types
to subhashes keyed by that specific type, and let that type define how those
subhashes work.
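
The last option, an object-keyed hash built as a layer that maps each type to its own subhash, might look roughly like this in Python. This is only a sketch of the idea: `TypeLayeredMap` is a hypothetical name, and Python's built-in per-type hashing and equality stand in for the type-specific === candidate.

```python
class TypeLayeredMap:
    """Maps type -> subhash, so each type decides how its own keys compare."""

    def __init__(self):
        self._layers = {}   # type -> {key: value}

    def __setitem__(self, key, value):
        self._layers.setdefault(type(key), {})[key] = value

    def __getitem__(self, key):
        return self._layers[type(key)][key]

    def __len__(self):
        return sum(len(layer) for layer in self._layers.values())


layered = TypeLayeredMap()
plain = {}
for key, label in [(1, "int one"), (1.0, "float one"), ("1", "str one")]:
    layered[key] = label
    plain[key] = label

print(len(plain))     # 2: in a plain dict, 1 and 1.0 are the same key
print(len(layered))   # 3: the type layer keeps them apart
print(layered[1], "/", layered[1.0])  # int one / float one
```

No common string format is needed in this design, because keys of different types never meet in the same subhash.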

There are various ways to skin that cat and unless someone comes up with a
compelling reason to do it one particular way, I think this has been left
open for "laboratory of democracy" purposes. I'm not finding the case for
Str being a common go-to format very compelling so far, personally.

Anyway, I don't want to derail this or your other tickets since the .WHICH
values themselves are definitely dysfunctional no matter how they stringify.
I'd say that's more important to fix.

p6rt added the Bug label Jan 5, 2020