Report information
Id: 129019
Status: open
Priority: 0/
Queue: perl6

Owner: Nobody
Requestors: zefram [at] fysh.org
Cc:
AdminCc:

Severity: (no value)
Tag: Bug
Platform: (no value)
Patch Status: (no value)
VM: (no value)



Date: Sat, 20 Aug 2016 18:24:37 +0100
To: rakudobug [...] perl.org
Subject: [BUG] Range.WHICH fails on many kinds of endpoints
From: Zefram <zefram [...] fysh.org>
> (:a..:b).WHICH
Range|a True..b True
> (List..Pair).WHICH
Use of uninitialized value $!min of type List in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
  in block <unit> at <unknown file> line 1

Use of uninitialized value $!max of type Pair in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
  in block <unit> at <unknown file> line 1

Range|..

Even where it doesn't produce these warnings and an empty representation of an endpoint, this style of .WHICH is very poor and clash-prone. Range.WHICH should apply .WHICH to the endpoint objects.

-zefram
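A minimal Raku sketch of what the report asks for, written as a standalone sub rather than as a patch to Range; the name `range-key` and its signature are hypothetical. It builds the key from each endpoint's own `.WHICH` plus the two exclusion flags, so type-object endpoints such as `List` and `Pair` contribute a non-empty component instead of triggering the warnings above.

```raku
# Hypothetical helper (not Rakudo's code): derive a range's identity key
# from the .WHICH of each endpoint plus the two exclusion flags.
sub range-key(Mu $min, Mu $max, Bool :$excludes-min = False,
              Bool :$excludes-max = False --> Str) {
    'Range|'
      ~ $min.WHICH
      ~ ($excludes-min ?? '^' !! '')
      ~ '..'
      ~ ($excludes-max ?? '^' !! '')
      ~ $max.WHICH
}

# Type objects contribute their own .WHICH instead of warning and
# interpolating as empty strings.
say range-key(List, Pair);

my $lo = :a;    # the pair a => True
my $hi = :b;    # the pair b => True
say range-key($lo, $hi);
```

This only addresses the empty-endpoint symptom; the reply below explains why concatenating endpoint `.WHICH` strings is still clash-prone.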
On Sat, 20 Aug 2016 10:24:51 -0700, zefram@fysh.org wrote:
> > (:a..:b).WHICH
> Range|a True..b True
> > (List..Pair).WHICH
> Use of uninitialized value $!min of type List in string context.
> Methods .^name, .perl, .gist, or .say can be used to stringify it to
> something meaningful.
>   in block <unit> at <unknown file> line 1
> Use of uninitialized value $!max of type Pair in string context.
> Methods .^name, .perl, .gist, or .say can be used to stringify it to
> something meaningful.
>   in block <unit> at <unknown file> line 1
> Range|..
>
> Even where it doesn't produce these warnings and an empty
> representation
> of an endpoint, this style of .WHICH is very poor and clash-prone.
> Range.WHICH should apply .WHICH to the endpoint objects.
>
> -zefram
Unfortunately, using the .WHICHs of the endpoints will not be sufficient. For example, "Foo^".."Bar" and "Foo"^.."Bar" would put out the same WHICH. Since any Str can foil any attempt we make to simply bracket the result, we have no choice but to either use an escape-character quoteish construct, or prepend a length:

Range|(8)|Str|Foo^..Str|Bar

...we could, however, decide whether or not to do so by detecting anything in $!min which may jump the rails.

Same problem with Pair, BTW, which already .WHICHs its members:

$ perl6 -e '("Foo|Str|2" => "1").WHICH.say'
Pair|Str|Foo|Str|2|Str|1
$ perl6 -e '("Foo" => "2|Str|1").WHICH.say'
Pair|Str|Foo|Str|2|Str|1

...and given the use of WHICH in hashing, this is more consequential than it may seem at first glance.
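To make the clash concrete, here is a small Raku sketch in which the endpoint WHICH strings are passed in as plain strings; the sub names are hypothetical, and the strings only mimic the `Str|...` format shown above. It contrasts naive concatenation with a length-prefixed layout.

```raku
# Hypothetical sketch: the endpoint .WHICH strings are passed in as plain
# Str values, standing in for whatever an implementation would produce.

# Naive form: concatenate the parts, so endpoint text can leak into the
# operator position and vice versa.
sub naive-key(Str $min, Str $op, Str $max --> Str) {
    'Range|' ~ $min ~ $op ~ $max
}

# Length-prefixed form: each endpoint carries its length, so a parser
# always knows where it ends, whatever characters it contains.
sub prefixed-key(Str $min, Str $op, Str $max --> Str) {
    'Range|(' ~ $min.chars ~ ')|' ~ $min ~ $op ~ '(' ~ $max.chars ~ ')|' ~ $max
}

# "Foo^".."Bar" versus "Foo"^.."Bar"
say naive-key('Str|Foo^', '..', 'Str|Bar');     # Range|Str|Foo^..Str|Bar
say naive-key('Str|Foo', '^..', 'Str|Bar');     # Range|Str|Foo^..Str|Bar
say prefixed-key('Str|Foo^', '..', 'Str|Bar');  # Range|(8)|Str|Foo^..(7)|Str|Bar
say prefixed-key('Str|Foo', '^..', 'Str|Bar');  # Range|(7)|Str|Foo^..(7)|Str|Bar
```

This sketch prefixes both endpoints, a slight variation on the `Range|(8)|...` form above; prefixing only `$min`, and only when it contains something risky, is the cheaper option mentioned in the message.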
To: "Brian S. Julin via RT" <perl6-bugs-followup [...] perl.org>
Date: Wed, 30 Aug 2017 04:29:51 +0100
CC: zefram [...] fysh.org
Subject: Re: [perl #129019] [BUG] Range.WHICH fails on many kinds of endpoints
From: Zefram <zefram [...] fysh.org>
Brian S. Julin via RT wrote:
>For example "Foo^".."Bar" and "Foo"^.."Bar" would put out the same WHICH.

Yes, and that's a bigger problem. In general, Rakudo's .WHICH methods suffer this sort of problem when incorporating the .WHICH values of subobjects. See [perl #128943] (Set, in which I also sketched out how to fix it) and [perl #128947] (Pair, just like your Pair example).

>we have no choice than to either use an escape character quoteish
>construct, or prepend a length:

The former would be much nicer to work with.

-zefram
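A Raku sketch of the escape-based alternative, applied to the Pair example from the previous message. It assumes each member's WHICH is available as a string and that `\` and `|` are the only characters needing escapes; the sub names are hypothetical.

```raku
# Hypothetical sketch: escape the escape character and the "|" separator
# inside each member's WHICH string before joining, so that a "|" coming
# from inside a member can never be read as a separator.
sub escape-component(Str $s --> Str) {
    # Backslash first, so the backslashes added for "|" are not doubled.
    $s.subst('\\', '\\\\', :g).subst('|', '\\|', :g)
}

sub naive-pair-key(Str $key-which, Str $value-which --> Str) {
    'Pair|' ~ $key-which ~ '|' ~ $value-which
}

sub escaped-pair-key(Str $key-which, Str $value-which --> Str) {
    'Pair|' ~ escape-component($key-which) ~ '|' ~ escape-component($value-which)
}

# ("Foo|Str|2" => "1") versus ("Foo" => "2|Str|1"), with each member's
# WHICH spelled out as a plain string.
say naive-pair-key('Str|Foo|Str|2', 'Str|1');    # Pair|Str|Foo|Str|2|Str|1
say naive-pair-key('Str|Foo', 'Str|2|Str|1');    # Pair|Str|Foo|Str|2|Str|1
say escaped-pair-key('Str|Foo|Str|2', 'Str|1');  # Pair|Str\|Foo\|Str\|2|Str\|1
say escaped-pair-key('Str|Foo', 'Str|2|Str|1');  # Pair|Str\|Foo|Str\|2\|Str\|1
```

Because both the escape character and the separator are escaped, splitting on unescaped `|` recovers the original components, so distinct member lists can no longer yield the same key.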
On Wed, 30 Aug 2017 07:03:10 -0700, zefram@fysh.org wrote:
> Brian S. Julin via RT wrote:
> >For example "Foo^".."Bar" and "Foo"^.."Bar" would put out the same WHICH.
>
> Yes, and that's a bigger problem. In general Rakudo's .WHICH methods
> suffer this sort of problem when incorporating the .WHICH values of
> subobjects. See [perl #128943] (Set, and in which I sketched out how
> to fix it) and [perl #128947] (Pair, just like your Pair example).
>
> >we have no choice than to either use an escape character quoteish
> >construct, or prepend a length:
>
> The former would be much nicer to work with.
>
> -zefram
OK, a few things to note going forward:

1) The .Str of a WHICH is not necessarily the value of the WHICH. I think in the case where we have really large objects it would be OK for there to be some tiny chance of a collision between two WHICH.Str's, as long as the actual WHICHs do not collide. So using a hash in the visual presentation may be OK, but using that hash for actually implementing === or eqv would be wrong (as is done now in some cases). Note that Range does _NOT_ do so for eqv, but seems to punt to comparing WHICH for ===, and Range's WHICH is itself a Str, creating "accidental collisions" which the spec advises against:

$ perl6 -e 'say ("200^".."foo") eqv ("200"^.."foo")'
False
$ perl6 -e 'say ("200^".."foo") === ("200"^.."foo")'
True

Second, the spec only says that mutables must produce an ObjAt (and in fact there's room left open for not requiring a WHICH implementation at all on value types). For value types, .WHICH could very well be just identity... or a mixin on the value that stringifies differently if we so desire, or which can be used to associate a cached digest. It would seem to me that implementing eqv and === candidates to work directly on values would be faster than making up some weird WHICH value and then comparing those values, unless we have a cache of WHICHs which include a hash that can be used to shortcut the test and only do the longhand comparison when the hashes match... preferably the same hash that needs to be computed when using them as hash bucket keys anyway.

Third, there is a test or two in the current test suite that test for the current stringlike behavior, which will probably have to be fixed in errata whatever is done to fix the current situation.

Fourth, as an aside for non-value types, even ignoring NUMA, it is not enough to munge a pointer address... the WHICH cannot be a deterministic derivative of the object storage address, which the GC might change and which of course might be reused. I can't remember what scheme we are using here, but ISTR it was sane, if not the most efficient possible.

Finally, in the cases where we might not present a WHICH.Str which is guaranteed to be unique, a clue to the user that "hey, this is a value type, so just look at the .perl" might help. So I'd suggest the stringification of a .WHICH be of limited length, give a few clues about the type/value and what is in it, and, above some threshold or when anything messy is detected, use ellipses and a hash; and we just tell the users we are doing that and not to rely on that stringification for end-of-the-world scenario things. Meanwhile, implement value-type .WHICHs as either identity or a mixin, and try like heck to code around actually using the latter for anything that does not involve implementing speedups, since they tie up resources.

Phew. Maybe this was one of those brain pretzels S02 warned us about.
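A rough Raku sketch of the display format suggested here: short WHICH strings are shown whole, long ones are cut to a prefix followed by `...` and a digest. The helper names, the 40-character default, and the use of 32-bit FNV-1a as the digest are all assumptions made for illustration.

```raku
# FNV-1a (32-bit) is used only as a small, self-contained digest for this
# sketch; the thread does not say which hash would be used.
sub fnv1a32(Str $s --> Int) {
    my $h = 0x811c9dc5;                              # FNV offset basis
    for $s.encode('utf8').list -> $byte {
        $h = (($h +^ $byte) * 0x01000193) % 2**32;   # xor, times FNV prime, wrap to 32 bits
    }
    $h
}

# Hypothetical display helper: short WHICH strings are shown whole; long ones
# become a prefix, "...", and a digest of the full string. For human eyes
# only: equality must still compare the real WHICH values.
sub which-display(Str $which, Int $limit = 40 --> Str) {
    return $which if $which.chars <= $limit;
    $which.substr(0, $limit) ~ '...' ~ fnv1a32($which).fmt('%08x')
}

say which-display('Range|Str|a..Str|b');            # Range|Str|a..Str|b
say which-display('Set|' ~ (1..1000).join('|'));
```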
Date: Wed, 13 Sep 2017 18:28:47 +0100
Subject: Re: [perl #129019] [BUG] Range.WHICH fails on many kinds of endpoints
From: Zefram <zefram [...] fysh.org>
To: "Brian S. Julin via RT" <perl6-bugs-followup [...] perl.org>
Brian S. Julin via RT wrote:
> it would be OK for there to be some tiny chance
>of a collision between two WHICH.Str's as long as the actual WHICHs
>do not collide.

One could make that distinction, but then the .Str of the .WHICH would not fulfill the purposes for which .WHICH is used, and would seem pretty pointless.

Consider the Set class, as an example user of .WHICH. It needs something that it can easily hash and compare, and whose equality corresponds precisely to object identity. That doesn't have to be a string, but a string is a convenient format for that. So OK, if .WHICH.Str doesn't do the job then Set can go to extra effort to use the `real' non-colliding .WHICH value. In this context, a colliding .WHICH.Str is worse than useless: it's an attractive nuisance, because it'll function well enough as a .WHICH substitute to pass most test suites but then fail in real use.
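A small Raku sketch of why a colliding key is an "attractive nuisance" for a Set-like container: with ordinary elements the naive key behaves, but the Pair example from earlier in this ticket makes two distinct elements collapse into one. The container is just a hash keyed by a caller-supplied key function; all names are hypothetical.

```raku
# Hypothetical sketch: a Set-like collection that, like Set with .WHICH,
# identifies elements only by a key string. Distinct elements whose keys
# collide are silently merged.
sub count-distinct(@elems, &key-of --> Int) {
    my %by-key;
    %by-key{key-of($_)} = $_ for @elems;
    %by-key.elems
}

# Escape "\" and the "|" separator inside a member before joining.
sub esc(Str() $s --> Str) {
    $s.subst('\\', '\\\\', :g).subst('|', '\\|', :g)
}

# Naive key in the clash-prone style discussed in this ticket.
sub naive-pair-key(Pair $p --> Str) {
    'Pair|Str|' ~ $p.key ~ '|Str|' ~ $p.value
}

# Same layout, but members are escaped first.
sub escaped-pair-key(Pair $p --> Str) {
    'Pair|Str|' ~ esc($p.key) ~ '|Str|' ~ esc($p.value)
}

my @ordinary    = 'a' => '1', 'b' => '2';
my @adversarial = 'Foo|Str|2' => '1', 'Foo' => '2|Str|1';

say count-distinct(@ordinary,    &naive-pair-key);    # 2
say count-distinct(@adversarial, &naive-pair-key);    # 1
say count-distinct(@adversarial, &escaped-pair-key);  # 2
```

The ordinary case is exactly the kind of input a typical test suite would use, which is why the naive key can look correct until it meets real data.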
>So, I'd suggest the stringification of a .WHICH be of limited length, give a few
>clues about the type/value and what is in it, and above some threshold or when
>anything messy is detected, elipses and a hash, and we just tell the users we are doing
>that and not to rely on that stringification for end-of-the-world scenario things.

Key question: what value does a colliding .WHICH.Str add? You seem to envision it as a human-oriented inspection mechanism. But we've already got more than one of those. .perl, .gist, and to a lesser extent .Str all provide some kind of representation of an object's content as a string. There is conceptual room for more than one of these, since they can take different views of what aspects of an object are important and what kinds of ambiguity in the output are acceptable. But in practice there isn't enough attention paid to these to justify even the range of methods that already exist. In many respects .perl and .gist are near clones, because .gist has neither a clearly defined output format nor clearly different rules about what to represent. It would make a fair bit of sense at this point to delete the less-useful .gist. It would be crazy to go the other way, adding yet another stringification mechanism with no hard identity requirements, no defined format, and no rules about how much to represent.

Even if there were some demand for yet another stringification method, .WHICH.Str would be the wrong place to put it. Half-hearted inspection is entirely contrary to the basic concept of .WHICH. If such a method is to be added, it should be a method directly on the principal object. .WHICH doesn't have to supply a string directly, or even just a string wrapped in a funny class, but the object it supplies should be concerned entirely with the precise identity of the principal object. For the .WHICH value to stringify to anything that doesn't have the same identity properties would be misleading.

If you're interested in a human inspecting the .WHICH value itself, rather than inspecting the principal object, then the most important method to consider is .WHICH.perl. By the intent of .perl, this ought to produce a string that fully represents the actual .WHICH value. Ellipsis is not useful here. It would be acceptable for .WHICH.gist to provide a lossy representation, but .gist is so loosely defined that almost anything is acceptable.
>there's room left open for not requiring WHICH implementation at all on value types).
>For value types, .WHICH could very well be just identity

That too would break Set and anything else that needs a way to hash object identity. A .WHICH method producing a consistent type of output is useful on *all* types.

> It would seem to me that implementing eqv and === candidates to work
>directly on values would be faster than making up some weird WHICH value and then
>comparing those values,

Sure. eqv/=== should absolutely be implemented in type-specific ways where that can be done faster than .WHICH comparison. And code comparing for object identity should use the likely-faster === in preference to comparing .WHICH values. But .WHICH still needs to be there, for users like Set that need to do more than identity comparison, and equality comparison of .WHICH values must always behave the same as ===.

-zefram
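The requirement that equality of .WHICH values match === can be phrased as a property that is easy to test. This Raku sketch models a range with string endpoints (`RangeSpec`, `same-range`, and both key subs are hypothetical), uses `same-range` as a stand-in for a fast type-specific `===`, and reports every pair of values on which key equality and `same-range` disagree.

```raku
# Hypothetical model of the identity-relevant parts of a range with Str
# endpoints. None of these names come from Rakudo.
class RangeSpec {
    has Str  $.min is required;
    has Str  $.max is required;
    has Bool $.excludes-min = False;
    has Bool $.excludes-max = False;
}

# Type-specific comparison, standing in for a fast === candidate.
sub same-range(RangeSpec $x, RangeSpec $y --> Bool) {
    $x.min eq $y.min
      && $x.max eq $y.max
      && $x.excludes-min == $y.excludes-min
      && $x.excludes-max == $y.excludes-max
}

# Clash-prone key: endpoint text is concatenated with the operator.
sub naive-range-key(RangeSpec $r --> Str) {
    'Range|Str|' ~ $r.min ~ ($r.excludes-min ?? '^' !! '') ~ '..'
      ~ ($r.excludes-max ?? '^' !! '') ~ 'Str|' ~ $r.max
}

# Key with the min endpoint length-prefixed, as in Range|(8)|Str|Foo^..Str|Bar.
sub prefixed-range-key(RangeSpec $r --> Str) {
    my $min = 'Str|' ~ $r.min;
    'Range|(' ~ $min.chars ~ ')|' ~ $min ~ ($r.excludes-min ?? '^' !! '') ~ '..'
      ~ ($r.excludes-max ?? '^' !! '') ~ 'Str|' ~ $r.max
}

# Every pair where "keys are equal" disagrees with "values are the same".
sub disagreements(@specs, &key-of) {
    gather for @specs.combinations(2) -> ($x, $y) {
        take ($x, $y) if same-range($x, $y) != (key-of($x) eq key-of($y));
    }
}

my @specs =
    RangeSpec.new(min => 'Foo^', max => 'Bar'),                 # "Foo^".."Bar"
    RangeSpec.new(min => 'Foo',  max => 'Bar', :excludes-min),  # "Foo"^.."Bar"
    RangeSpec.new(min => 'Foo',  max => 'Bar');                 # "Foo".."Bar"

say disagreements(@specs, &naive-range-key).elems;     # 1
say disagreements(@specs, &prefixed-range-key).elems;  # 0
```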
On Wed, 13 Sep 2017 10:29:05 -0700, zefram@fysh.org wrote:
> Brian S. Julin via RT wrote:
> > it would be OK for there to be some tiny chance
> > of a collision between two WHICH.Str's as long as the actual WHICHs
> > do not collide.
>
> One could make that distinction, but then the .Str of the .WHICH would
> not fulfill the purposes for which .WHICH is used, and would seem
> pretty pointless.

I don't think so... and also in answer to:

> Key question: what value does a colliding .WHICH.Str add?

For example, it provides a way to tell with very high probability whether two 500,000-element sets, which differ in only one element or only in sort order, are the same set, if you are not in a position to just === them, by comparing only two lines of output... and such comparisons during debugging are pretty much the most common way it is used, though usually with less problematic non-value types. It can also, if well constructed, tell you whether an unseeded hash would put the two in the same bucket, though that is less useful.
> Consider the Set class, as an example user of .WHICH. It needs something
> that it can easily hash and compare, and whose equality corresponds
> precisely to object identity.

I think for initial hash generation that pretty much boils down (recursively) to its serialization/.freeze format for value types. For subsequent hash caching, if implemented, there is a case to be made that the cache should be stored in the .WHICH but not considered a hashable part of the WHICH, so you can just take it and use it if it has already been calculated. Or at least you should be able to look up hashes in a table of WHICHs for which caching has been performed.
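A Raku sketch of the caching idea: a key object that carries a lazily computed digest which is not part of its identity and is used only to reject unequal keys early. `CachedKey`, `same-as`, and the FNV-1a digest are illustrative assumptions, not an existing API.

```raku
# FNV-1a (32-bit), chosen only as a small self-contained digest for this
# sketch; the thread does not say which hash an implementation would use.
sub fnv1a32(Str $s --> Int) {
    my $h = 0x811c9dc5;
    for $s.encode('utf8').list -> $byte {
        $h = (($h +^ $byte) * 0x01000193) % 2**32;
    }
    $h
}

# Hypothetical key object: the digest is a lazily filled cache carried
# alongside the key, not part of the key's identity. It only lets unequal
# keys be rejected cheaply; equal digests still fall through to a full
# comparison of the keys.
class CachedKey {
    has Str $.key is required;
    has Int $!digest;                 # undefined until first requested

    method digest(--> Int) {
        $!digest //= fnv1a32($!key)
    }

    method same-as(CachedKey $other --> Bool) {
        return False if self.digest != $other.digest;  # cheap rejection
        $!key eq $other.key                             # longhand comparison
    }
}

my $big   = CachedKey.new(key => 'Set|' ~ (1..500).join('|'));
my $same  = CachedKey.new(key => 'Set|' ~ (1..500).join('|'));
my $other = CachedKey.new(key => 'Set|' ~ (1..499).join('|'));

say $big.same-as($same);   # True
say $big.same-as($other);  # False
```

Keeping the full comparison after a digest match is what keeps this correct: the digest may collide, but it is never treated as the identity.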
> That doesn't have to be a string, but a
> string is a convenient format for that. So OK, if .WHICH.Str doesn't
> do the job then Set can go to extra effort to use the `real'
> non-colliding .WHICH value. In this context, a colliding .WHICH.Str is
> worse than useless: it's an attractive nuisance, because it'll function
> well enough as a .WHICH substitute to pass most test suites but then
> fail in real use.

I'd hardly rate "being wrong once in a lifetime" at a "nuisance" level of utility. Given that WHICH's implementation is purposefully unspecified so it can be implementation-specific, any tests in roast looking at WHICH.Str or dissecting .WHICH values should probably be suspect: you compare the .WHICH values of two things that should or should not be different, and that's about it. Now, if Rakudo wants to test the contents of .WHICH or .WHICH.Str in its own test suite, that's up to Rakudo.
> Even if there were some demand for yet another stringification method,
> .WHICH.Str would be the wrong place to put it. Half-hearted inspection
> is entirely contrary to the basic concept of .WHICH. If such a method
> is to be added, it should be a method directly on the principal object.
> .WHICH doesn't have to supply a string directly, or even just a string
> wrapped in a funny class, but the object it supplies should be
> concerned entirely with the precise identity of the principal object.
> For the .WHICH value to stringify to anything that doesn't have the
> same identity properties would be misleading.
>
> If you're interested in a human inspecting the .WHICH value itself,
> rather than inspecting the principal object, then the most important
> method to consider is .WHICH.perl. By the intent of .perl, this ought
> to produce a string that fully represents the actual .WHICH value.
> Ellipsis is not useful here. It would be acceptable for .WHICH.gist
> to provide a lossy representation, but .gist is so loosely defined
> that almost anything is acceptable.

We agree on WHICH.perl and WHICH.gist, and I'm happy to let others argue over which of those WHICH.Str would spit out.
> > there's room left open for not requiring WHICH implementation at all
> > on value types).
> > For value types, .WHICH could very well be just identity
>
> That too would break Set and anything else that needs a way to hash
> object identity.

No, not really. It's just that the spec specifies that .WHICH is to be used for hashing, and does not specify how that hashing is to occur. There's no specified API presented uniformly by everything that comes out of a WHICH, and this seems to be intentional.

> A .WHICH method producing a consistent type of output is useful on *all* types.

I think it is a double-edged sword: it can be useful but has its perils. It certainly *can* be done this way...

...or your implementation could demand that all .WHICHs produce an object with a .HASH-ME-BABYCAKES method. Or just demand that all value types and ObjAt have such a method, and have WHICH be a no-op on value types. Or add a dash of canonicalism to .freeze, which you need to implement anyway, and an adverb to generate a hash. Or it could simply demand that a type-specific === multi candidate is present in scope, and that object-keyed hashes are a transparent layered construct mapping types to subhashes keyed by that specific type, letting that type define how those subhashes work. There are various ways to skin that cat, and unless someone comes up with a compelling reason to do it one particular way, I think this has been left open for "laboratory of democracy" purposes. I'm not finding the case for Str being a common go-to format very compelling so far, personally.

Anyway, I don't want to derail this or your other tickets, since the .WHICH values themselves are definitely dysfunctional no matter how they stringify. I'd say that's more important to fix.
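One of the alternatives listed above, hashes layered by type with each type deciding how its own values are keyed, might look roughly like this Raku sketch. `LayeredHash` and its methods are hypothetical; the Pair key relies on `.raku` (spelled `.perl` at the time of this thread) to quote each member.

```raku
# Hypothetical sketch of an object-keyed hash layered by type: the outer
# level is keyed by the value's type name, and each type registers its own
# key function for the inner level. None of these names exist in Rakudo.
class LayeredHash {
    has %!key-fns;    # type name => Callable returning an inner key
    has %!buckets;    # type name => Hash of inner key => stored value

    method register(Mu:U $type, &key-fn) {
        %!key-fns{$type.^name} = &key-fn;
        self
    }

    # Resolve the outer (type) and inner (type-specific) keys for an object.
    method !inner-key(Mu $obj) {
        my $type = $obj.^name;
        my &key-fn = %!key-fns{$type} // die "no key function registered for $type";
        $type, key-fn($obj)
    }

    method store(Mu $obj, $value) {
        my ($type, $key) = self!inner-key($obj);
        %!buckets{$type}{$key} = $value;
        self
    }

    method fetch(Mu $obj) {
        my ($type, $key) = self!inner-key($obj);
        %!buckets{$type}{$key}
    }

    method elems(--> Int) {
        [+] %!buckets.values.map(*.elems)
    }
}

my $h = LayeredHash.new;
# Pairs are keyed by the .raku of their members; Ints by their decimal text.
$h.register(Pair, -> $p { $p.key.raku ~ ' => ' ~ $p.value.raku });
$h.register(Int,  -> $i { ~$i });

$h.store(('Foo|Str|2' => '1'), 'first');
$h.store(('Foo' => '2|Str|1'), 'second');
$h.store(42, 'answer');

say $h.elems;                          # 3
say $h.fetch(('Foo' => '2|Str|1'));    # second
```

No single string format is imposed across types here; each type only has to keep its own inner keys distinct, which is the kind of freedom the message argues the spec leaves open.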

