Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Version comparison confused by digit with diacritics #5422

Open
p6rt opened this issue Jul 5, 2016 · 7 comments
Open

Version comparison confused by digit with diacritics #5422

p6rt opened this issue Jul 5, 2016 · 7 comments

Comments

@p6rt
Copy link

p6rt commented Jul 5, 2016

Migrated from rt.perl.org#128546 (status was 'open')

Searchable as RT128546$

@p6rt
Copy link
Author

p6rt commented Jul 5, 2016

From zefram@fysh.org

The Version class accepts numeric components that contain digits with
diacritics, and faithfully preserves the grapheme string just as it
preserves non-ASCII digits. But these components then behave badly
in comparisons​:

Version.new("34\x[308]5") leg Version.new("4")
Less

The digit with diacritic effectively terminates the digit sequence,
for the purpose of finding a numeric value. This is probably due to
[perl #​128542]. This implies that fixing that bug, making the coercion
reject such modified digits (as appears to be the intent), would cause
the Version comparison to signal an error. That would also be buggy
behaviour, so Version has a problem distinct from the problem with
Str.Int.

-zefram

@p6rt
Copy link
Author

p6rt commented Nov 26, 2016

From @zoffixznet

On Tue, 05 Jul 2016 09​:52​:46 -0700, zefram@​fysh.org wrote​:

The Version class accepts numeric components that contain digits with
diacritics, and faithfully preserves the grapheme string just as it
preserves non-ASCII digits. But these components then behave badly
in comparisons​:

Version.new("34\x[308]5") leg Version.new("4")
Less

The digit with diacritic effectively terminates the digit sequence,
for the purpose of finding a numeric value. This is probably due to
[perl #​128542]. This implies that fixing that bug, making the coercion
reject such modified digits (as appears to be the intent), would cause
the Version comparison to signal an error. That would also be buggy
behaviour, so Version has a problem distinct from the problem with
Str.Int.

-zefram

Thanks for the report, however, there's no bug here, as strings are valid
version parts, which is what the diaeresis causes the part to parse as
(as opposed to numbers).

The `leg` operator coerces Versions to strings, and in this case string
`"34\x[308]5"` is Less than `"4"`. The more appropriate operator to compare
versions is `cmp`.

With `cmp`, string parts are always Order​::Less than number parts, so to
see the comparison working properly, we'd need to compare versions with
both of those parts being stringy​:

<Zoffix> m​: say Version.new("34\x[308]5") cmp Version.new("34\x[308]4")
<camelia> rakudo-moar 2f72fa​: OUTPUT«More␤»
<Zoffix> m​: say Version.new("34\x[308]5") cmp Version.new("34\x[308]5")
<camelia> rakudo-moar 2f72fa​: OUTPUT«Same␤»
<Zoffix> m​: say Version.new("34\x[308]5") cmp Version.new("34\x[308]6")
<camelia> rakudo-moar 2f72fa​: OUTPUT«Less␤»

This also works with Version literals​:
<Zoffix> m​: say v34̈5 cmp v34̈6
<camelia> rakudo-moar 2f72fa​: OUTPUT«Less␤»
<Zoffix> m​: say v34̈5 cmp v34̈5
<camelia> rakudo-moar 2f72fa​: OUTPUT«Same␤»
<Zoffix> m​: say v34̈5 cmp v34̈4
<camelia> rakudo-moar 2f72fa​: OUTPUT«More␤»

Cheers,
ZZ

@p6rt
Copy link
Author

p6rt commented Nov 26, 2016

The RT System itself - Status changed from 'new' to 'open'

@p6rt
Copy link
Author

p6rt commented Nov 26, 2016

@zoffixznet - Status changed from 'open' to 'rejected'

@p6rt
Copy link
Author

p6rt commented Nov 27, 2016

From @zoffixznet

A fair point. Re-opening with the intent to make synthetic digits match the same way as punctuation.

I tried a few implementations, like changing the .comb to

  .comb(/​:r ‘*’ || [(\d) <?{ nqp​::iseq_s(nqp​::chr(nqp​::ord(nqp​::substr($_, nqp​::chars($_)-1, 1))), nqp​::substr($_, nqp​::chars($_)-1, 1)) given $/.Str }> ]+ || <.alpha>+ /)

But all of them ended up being 10 to 64 times slower than just regular \d+.

By defining a token that matches only non-synthetic Nd chars, the slowdown is only 2x, so I'll see if we can make that token available somewhere in the guts, since we needed it in the Perl6/Grammar.nqp too.

@p6rt
Copy link
Author

p6rt commented Nov 27, 2016

@zoffixznet - Status changed from 'rejected' to 'open'

@p6rt
Copy link
Author

p6rt commented Dec 29, 2016

From @samcv

On Sat, 26 Nov 2016 19​:52​:48 -0800, cpan@​zoffix.com wrote​:

By defining a token that matches only non-synthetic Nd chars, the
slowdown is only 2x, so I'll see if we can make that token available
somewhere in the guts, since we needed it in the Perl6/Grammar.nqp
too.

This looks like a bug in cmp.

<samcv> j​: say Version.new("34\x[308]5") cmp Version.new("4")
<camelia> rakudo-jvm 8ca367​: OUTPUT«More␤»

<samcv> m​: say Version.new("34\x[308]5") cmp Version.new("4")
<camelia> rakudo-moar 347271​: OUTPUT«Less␤»

In MoarVM we just check if the graphemes are numerically higher while going the length of the string, we need to not do this if there are diacritics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant