Report information
Id: 128546
Status: open
Priority: 0/
Queue: perl6

Owner: Nobody
Requestors: zefram [at] fysh.org
Cc:
AdminCc:

Severity: (no value)
Tag: Bug
Platform: (no value)
Patch Status: (no value)
VM: (no value)



Date: Tue, 5 Jul 2016 17:52:34 +0100
Subject: [BUG] Version comparison confused by digit with diacritics
From: Zefram <zefram [...] fysh.org>
To: rakudobug [...] perl.org
The Version class accepts numeric components that contain digits with diacritics, and faithfully preserves the grapheme string just as it preserves non-ASCII digits. But these components then behave badly in comparisons:

> Version.new("34\x[308]5") leg Version.new("4")
Less

The digit with diacritic effectively terminates the digit sequence, for the purpose of finding a numeric value. This is probably due to [perl #128542]. This implies that fixing that bug, making the coercion reject such modified digits (as appears to be the intent), would cause the Version comparison to signal an error. That would also be buggy behaviour, so Version has a problem distinct from the problem with Str.Int.

-zefram
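
To make the input in the report concrete, here is a small Python sketch (Python is used purely for illustration; it is not Rakudo code) that lists the code points of the component `"34\x[308]5"`. It shows that the `4` is followed by U+0308 COMBINING DIAERESIS, and that NFC normalization leaves the string unchanged because Unicode has no precomposed "4 with diaeresis". In Perl 6, which works on graphemes, the `4` and the combining mark therefore form a single synthetic grapheme.

```python
import unicodedata

# The same string as "34\x[308]5" in the report, written with a Python escape.
component = "34\u03085"

# List every code point with its Unicode name, general category and
# canonical combining class.
for ch in component:
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)} "
        f"category={unicodedata.category(ch)} "
        f"combining={unicodedata.combining(ch)}"
    )

# NFC cannot compose DIGIT FOUR + COMBINING DIAERESIS into one code point,
# so the normalized string is identical to the input.
unchanged_by_nfc = unicodedata.normalize("NFC", component) == component
print("unchanged by NFC:", unchanged_by_nfc)
```

The three digits are all category `Nd`; only the combining mark (category `Mn`) distinguishes this component from the plain string `"345"`.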
RT-Send-CC: perl6-compiler [...] perl.org
On Tue, 05 Jul 2016 09:52:46 -0700, zefram@fysh.org wrote:
> The Version class accepts numeric components that contain digits with
> diacritics, and faithfully preserves the grapheme string just as it
> preserves non-ASCII digits. But these components then behave badly
> in comparisons:
>
> > Version.new("34\x[308]5") leg Version.new("4")
> Less
>
> The digit with diacritic effectively terminates the digit sequence,
> for the purpose of finding a numeric value. This is probably due to
> [perl #128542]. This implies that fixing that bug, making the coercion
> reject such modified digits (as appears to be the intent), would cause
> the Version comparison to signal an error. That would also be buggy
> behaviour, so Version has a problem distinct from the problem with
> Str.Int.
>
> -zefram

Thanks for the report, however, there's no bug here, as strings are valid version parts, which is what the diaeresis causes the part to parse as (as opposed to numbers).

The `leg` operator coerces Versions to strings, and in this case string `"34\x[308]5"` is Less than `"4"`. The more appropriate operator to compare versions is `cmp`.

With `cmp`, string parts are always Order::Less than number parts, so to see the comparison working properly, we'd need to compare versions with both of those parts being stringy:

<Zoffix> m: say Version.new("34\x[308]5") cmp Version.new("34\x[308]4")
<camelia> rakudo-moar 2f72fa: OUTPUT«More␤»
<Zoffix> m: say Version.new("34\x[308]5") cmp Version.new("34\x[308]5")
<camelia> rakudo-moar 2f72fa: OUTPUT«Same␤»
<Zoffix> m: say Version.new("34\x[308]5") cmp Version.new("34\x[308]6")
<camelia> rakudo-moar 2f72fa: OUTPUT«Less␤»

This also works with Version literals:

<Zoffix> m: say v34̈5 cmp v34̈6
<camelia> rakudo-moar 2f72fa: OUTPUT«Less␤»
<Zoffix> m: say v34̈5 cmp v34̈5
<camelia> rakudo-moar 2f72fa: OUTPUT«Same␤»
<Zoffix> m: say v34̈5 cmp v34̈4
<camelia> rakudo-moar 2f72fa: OUTPUT«More␤»

Cheers,
ZZ
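
The comparison rule described in this reply can be modelled with a short Python sketch. This is an illustration under stated assumptions, not Rakudo's implementation: the names `cmp_part` and `cmp_version` are hypothetical, versions are given as already-split lists of parts (`int` for number parts, `str` for string parts), string parts are compared by code point, and the tie-break on list length is a simplification that the ticket does not describe.

```python
from typing import List, Union

Part = Union[int, str]  # a number part or a string part


def cmp_part(a: Part, b: Part) -> str:
    """Compare two version parts; a string part sorts before a number part."""
    a_is_str, b_is_str = isinstance(a, str), isinstance(b, str)
    if a_is_str != b_is_str:
        # Mixed kinds: the string part is always "Less" than the number part.
        return "Less" if a_is_str else "More"
    if a == b:
        return "Same"
    # Same kind: numeric order for ints, code point order for strings.
    return "Less" if a < b else "More"


def cmp_version(x: List[Part], y: List[Part]) -> str:
    """Compare part by part; the first difference decides."""
    for a, b in zip(x, y):
        order = cmp_part(a, b)
        if order != "Same":
            return order
    # Assumption for this sketch only: a shorter list sorts first.
    if len(x) == len(y):
        return "Same"
    return "Less" if len(x) < len(y) else "More"


stringy = "34\u03085"  # the component from the report, treated as one string part

print(cmp_version([stringy], [4]))            # string part vs number part
print(cmp_version([stringy], ["34\u03084"]))  # both parts are strings
print(cmp_version([stringy], ["34\u03085"]))
print(cmp_version([stringy], ["34\u03086"]))
```

Under this model the first call returns `Less` only because a string part meets a number part, while the last three calls reproduce the `More`, `Same`, `Less` sequence from the transcript above.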
Date: Sat, 26 Nov 2016 22:10:30 +0000
From: Zefram <zefram [...] fysh.org>
To: Zoffix Znet via RT <perl6-bugs-followup [...] perl.org>
Subject: Re: [perl #128546] [BUG] [UNI] Version comparison confused by digit with diacritics

Message body not shown because it is not plain text.

RT-Send-CC: perl6-compiler [...] perl.org
A fair point. Re-opening with the intent to make synthetic digits match the same way as punctuation.

I tried a few implementations, like changing the .comb to

    .comb(/:r ‘*’ || [(\d) <?{ nqp::iseq_s(nqp::chr(nqp::ord(nqp::substr($_, nqp::chars($_)-1, 1))), nqp::substr($_, nqp::chars($_)-1, 1)) given $/.Str }> ]+ || <.alpha>+ /)

But all of them ended up being 10 to 64 times slower than just regular \d+.

By defining a token that matches only non-synthetic Nd chars, the slowdown is only 2x, so I'll see if we can make that token available somewhere in the guts, since we needed it in the Perl6/Grammar.nqp too.
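
The idea of treating synthetic digits like punctuation can be sketched in Python. This is only a rough model of the intent stated above, not the Rakudo patch: `clusters`, `kind` and `split_parts` are hypothetical names, a "grapheme" is approximated as a base character plus any following mark characters (general category `M*`), and Python's `str.isalpha` stands in for the regex's `<.alpha>`.

```python
import unicodedata
from typing import List


def clusters(text: str) -> List[str]:
    """Approximate graphemes: attach each mark to the preceding base character."""
    out: List[str] = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def kind(cluster: str) -> str:
    """Classify a cluster the way the modified .comb pattern is meant to."""
    if cluster == "*":
        return "star"
    # Non-synthetic digit: exactly one code point, general category Nd.
    if len(cluster) == 1 and unicodedata.category(cluster) == "Nd":
        return "digit"
    if cluster[0].isalpha():
        return "alpha"
    # Everything else, including a digit carrying a diacritic, is skipped
    # like punctuation.
    return "other"


def split_parts(text: str) -> List[str]:
    """Collect '*' parts, runs of plain digits and runs of alphabetic clusters."""
    parts: List[str] = []
    prev = "other"
    for cluster in clusters(text):
        k = kind(cluster)
        if k == "other":
            prev = k
            continue
        if k == prev and k != "star":
            parts[-1] += cluster  # extend the current run
        else:
            parts.append(cluster)  # start a new part
        prev = k
    return parts


print(split_parts("1.2.3"))
print(split_parts("34\u03085"))
```

In this model the component from the report splits into the two parts `3` and `5`, because the `4` with its diaeresis separates them in the same way a `.` would.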
On Sat, 26 Nov 2016 19:52:48 -0800, cpan@zoffix.com wrote:
> By defining a token that matches only non-synthetic Nd chars, the
> slowdown is only 2x, so I'll see if we can make that token available
> somewhere in the guts, since we needed it in the Perl6/Grammar.nqp
> too.

This looks like a bug in cmp.

<samcv> j: say Version.new("34\x[308]5") cmp Version.new("4")
<camelia> rakudo-jvm 8ca367: OUTPUT«More␤»
<samcv> m: say Version.new("34\x[308]5") cmp Version.new("4")
<camelia> rakudo-moar 347271: OUTPUT«Less␤»

In MoarVM we just check whether the graphemes are numerically higher while walking the length of the string; we need to not do this if there are diacritics.

