Grapheme-level reasoning is NYI for regexes, .chars, .ord (among others) in Rakudo #953

Closed
p6rt opened this issue Apr 27, 2009 · 14 comments
Labels
NYI Features not yet implemented Todo

Comments

@p6rt

p6rt commented Apr 27, 2009

Migrated from rt.perl.org#65170 (status was 'resolved')

Searchable as RT65170$

@p6rt

p6rt commented Apr 27, 2009

From @wollmers

$ ./perl6 -e 'say "\c[LATIN CAPITAL LETTER A WITH DOT ABOVE, COMBINING DOT BELOW]".chars;'
2

Expected result: 1

$ ./perl6 -e 'say "\c[LATIN CAPITAL LETTER A, COMBINING DOT ABOVE, COMBINING DOT BELOW]".chars;'
3

Expected result: 1

see specs:
http://svn.pugscode.org/pugs/docs/Perl6/Spec/S32-setting-library/Str.pod

----quote----
A Str can exist at several Unicode levels at once. Which level you
interact with typically depends on what your current lexical context has
declared the "working Unicode level to be". Default is C<Grapheme>.
----end of quote---

Helmut Wollmersdorfer
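
For readers unfamiliar with the distinction this report relies on, here is a minimal sketch in Python (not Rakudo code) that contrasts code-point counts with an approximate grapheme count for the two strings above. The helper `approx_grapheme_count` is hypothetical: it assumes that any code point with a non-zero canonical combining class continues the previous cluster, which is only a simplification of the UAX #29 segmentation rules (it ignores CRLF, Hangul, emoji sequences and so on).

```python
import unicodedata


def approx_grapheme_count(text: str) -> int:
    """Count code points that start a cluster (combining class 0).

    Assumption: a simplified stand-in for real UAX #29 segmentation.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# The two strings from the report, spelled with Unicode character names.
precomposed = "\N{LATIN CAPITAL LETTER A WITH DOT ABOVE}\N{COMBINING DOT BELOW}"
decomposed = (
    "\N{LATIN CAPITAL LETTER A}"
    "\N{COMBINING DOT ABOVE}"
    "\N{COMBINING DOT BELOW}"
)

# Code-point level: the counts Rakudo printed at the time of the report.
print(len(precomposed), len(decomposed))  # 2 3

# Grapheme level (approximated): both are a single user-visible character.
print(approx_grapheme_count(precomposed), approx_grapheme_count(decomposed))  # 1 1

# Both spellings are canonically equivalent: they share one NFC form.
same_nfc = unicodedata.normalize("NFC", precomposed) == unicodedata.normalize(
    "NFC", decomposed
)
print(same_nfc)  # True
```

The code-point counts match the 2 and 3 reported above, while the grapheme-level count is 1 for both, which is the expected result stated in the report.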

@p6rt

p6rt commented Apr 27, 2009

From @wollmers

$ ./perl6 -e 'say "\c[LATIN CAPITAL LETTER A, COMBINING DOT ABOVE]".ord;'
65
$ ./perl6 -e 'say "\c[LATIN CAPITAL LETTER A WITH DOT ABOVE]".ord;'
550

Both results should be the same in grapheme mode. Grapheme mode is default.

Helmut Wollmersdorfer
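
As an illustration of why the two `.ord` results are expected to agree, the following Python sketch (again not Rakudo code) normalizes both spellings to NFC before taking the first code point. The function name `first_codepoint_after_nfc` is hypothetical, and NFC is used here only as a stand-in for whatever grapheme-level representation an implementation chooses.

```python
import unicodedata


def first_codepoint_after_nfc(text: str) -> int:
    """Return the first code point of the NFC form of `text`.

    Assumption: NFC composition stands in for grapheme-level handling.
    """
    return ord(unicodedata.normalize("NFC", text)[0])


decomposed = "\N{LATIN CAPITAL LETTER A}\N{COMBINING DOT ABOVE}"
precomposed = "\N{LATIN CAPITAL LETTER A WITH DOT ABOVE}"

# Without normalization the first code points differ, as in the report.
print(ord(decomposed[0]), ord(precomposed[0]))  # 65 550

# After NFC composition both spellings start with U+0226.
print(first_codepoint_after_nfc(decomposed))   # 550
print(first_codepoint_after_nfc(precomposed))  # 550
```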

@p6rt

p6rt commented Aug 16, 2009

From @kyleha

This is an automatically generated mail to inform you that tests are now available in t/spec/S29-conversions/ord_and_chr.t

commit 2f95300eed064742a91048b65ae28efea460b673
Author: kyle <kyle@c213334d-75ef-0310-aa23-eaa082d1ae64>
Date: Sun Aug 16 04:04:25 2009 +0000

  [t/spec] Test for RT #65172

  git-svn-id: http://svn.pugscode.org/pugs@28003 c213334d-75ef-0310-aa23-eaa082d1ae64

Inline Patch
diff --git a/t/spec/S29-conversions/ord_and_chr.t b/t/spec/S29-conversions/ord_and_chr.t
index ee3820d..38d9a5d 100644
--- a/t/spec/S29-conversions/ord_and_chr.t
+++ b/t/spec/S29-conversions/ord_and_chr.t
@@ -121,7 +121,7 @@ my @maps = (
   "\o03", 3,
 );
 
-plan 37+@maps*2;
+plan 38+@maps*2;
 
 for @maps -> $char, $code {
   my $descr = "\\{$code}{$code >= 32 ?? " == '{$char}'" !! ""}";
@@ -152,4 +152,12 @@ is chr(104, 101, 108, 108, 111), 'hello', 'chr works with a list of ints';
 #?rakudo skip 'RT #62772'
 ok ord("") ~~ Failure, 'ord("") returns a Failure';
 
+# RT #65172
+{
+    my $rt65172a = "\c[LATIN CAPITAL LETTER A, COMBINING DOT ABOVE]";
+    my $rt65172b = "\c[LATIN CAPITAL LETTER A WITH DOT ABOVE]";
+    #?rakudo todo 'RT #65172'
+    is $rt65172a.ord, $rt65172b.ord, '.ord defaults to grapheme mode';
+}
+
 #vim: ft=perl6

@p6rt

p6rt commented Aug 16, 2009

The RT System itself - Status changed from 'new' to 'open'

@p6rt

p6rt commented Aug 16, 2009

From @kyleha

This is an automatically generated mail to inform you that tests are now available in t/spec/S02-builtin_data_types/unicode.t

commit e00ba2f74e7c0923b6550f76f6d22d10950c90a1
Author: kyle <kyle@c213334d-75ef-0310-aa23-eaa082d1ae64>
Date: Sun Aug 16 03:59:55 2009 +0000

  [t/spec] Additional tests for RT #65170

  git-svn-id: http://svn.pugscode.org/pugs@28002 c213334d-75ef-0310-aa23-eaa082d1ae64

Inline Patch
diff --git a/t/spec/S02-builtin_data_types/unicode.t b/t/spec/S02-builtin_data_types/unicode.t
index 2263361..a399c9d 100644
--- a/t/spec/S02-builtin_data_types/unicode.t
+++ b/t/spec/S02-builtin_data_types/unicode.t
@@ -1,7 +1,7 @@
 use v6;
 
 use Test;
-plan 15;
+plan 17;
 
 #L<S02/"Built-In Data Types"/".bytes, .codes or .graphs">
 
@@ -16,6 +16,16 @@ is "foo\r\nbar".graphs, 7, 'CRLF is 1 graph';
 # Speculation, .chars is unspecced, also use Bytes etc.
 is $u.chars, 1, '.chars defaults to .graphs';
 
+# RT #65170
+{
+    my $rt65170;
+
+    $rt65170 = "\c[LATIN CAPITAL LETTER A WITH DOT ABOVE, COMBINING DOT BELOW]";
+    is $rt65170.chars, 1, '.chars defaults to .graphs (2)';
+    $rt65170 = "\c[LATIN CAPITAL LETTER A, COMBINING DOT ABOVE, COMBINING DOT BELOW]";
+    is $rt65170.chars, 1, '.chars defaults to .graphs (3)';
+}
+
 #L<S02/"Built-In Data Types"/"coerce to the proper units">
     $u = "\x[41,
             E1,

@p6rt

p6rt commented Oct 5, 2011

From @coke

On Mon Apr 27 06:20:19 2009, helmut@wollmersdorfer.at wrote:

> $ ./perl6 -e 'say "\c[LATIN CAPITAL LETTER A, COMBINING DOT ABOVE]".ord;'
> 65
> $ ./perl6 -e 'say "\c[LATIN CAPITAL LETTER A WITH DOT ABOVE]".ord;'
> 550
>
> Both results should be the same in grapheme mode. Grapheme mode is default.
>
> Helmut Wollmersdorfer

No change as of rakudo 545638a

--
Will "Coke" Coleda

@p6rt

p6rt commented Feb 23, 2014

From @coke

On Sat Aug 15 21:08:05 2009, KyleHa wrote:

> This is an automatically generated mail to inform you that tests are now available in t/spec/S02-builtin_data_types/unicode.t
>
> commit e00ba2f74e7c0923b6550f76f6d22d10950c90a1
> Author: kyle <kyle@c213334d-75ef-0310-aa23-eaa082d1ae64>
> Date: Sun Aug 16 03:59:55 2009 +0000
>
> [t/spec] Additional tests for RT #65170
>
> git-svn-id: http://svn.pugscode.org/pugs@28002 c213334d-75ef-0310-aa23-eaa082d1ae64
>
> diff --git a/t/spec/S02-builtin_data_types/unicode.t b/t/spec/S02-builtin_data_types/unicode.t
> index 2263361..a399c9d 100644
> --- a/t/spec/S02-builtin_data_types/unicode.t
> +++ b/t/spec/S02-builtin_data_types/unicode.t
> @@ -1,7 +1,7 @@
>  use v6;
>
>  use Test;
> -plan 15;
> +plan 17;
>
>  #L<S02/"Built-In Data Types"/".bytes, .codes or .graphs">
>
> @@ -16,6 +16,16 @@ is "foo\r\nbar".graphs, 7, 'CRLF is 1 graph';
>  # Speculation, .chars is unspecced, also use Bytes etc.
>  is $u.chars, 1, '.chars defaults to .graphs';
>
> +# RT #65170
> +{
> +    my $rt65170;
> +
> +    $rt65170 = "\c[LATIN CAPITAL LETTER A WITH DOT ABOVE, COMBINING DOT BELOW]";
> +    is $rt65170.chars, 1, '.chars defaults to .graphs (2)';
> +    $rt65170 = "\c[LATIN CAPITAL LETTER A, COMBINING DOT ABOVE, COMBINING DOT BELOW]";
> +    is $rt65170.chars, 1, '.chars defaults to .graphs (3)';
> +}
> +
>  #L<S02/"Built-In Data Types"/"coerce to the proper units">
>      $u = "\x[41,
>              E1,

Fudged this test file for rakudo and added it to the list of files to run (so we can tell when this feature is working).

--
Will "Coke" Coleda

@p6rt

p6rt commented Feb 27, 2014

From @tadzik

"( ͡° ͜ʖ ͡°)".chars on Rakudo (all 3 backends, as of today) returns 11,
while there are 8 characters in that face.

@p6rt

p6rt commented Jul 10, 2014

From @masak

<lelf> Is \w defined as <alnum> on purpose?
<moritz> yes
<lelf> It is not in perl5
<moritz> no?
<lelf> With the very good reason that it matches combining chars for example
<lelf> m: say 'xyz̧p' ~~ /\w+/
<camelia> rakudo-moar 4cad54: OUTPUT«｢xyz｣␤␤»
<moritz> those are meant to be dealt with at grapheme level in p6
<jnthn> Right; that's a lack of NFG...
<masak> oh, so the above's a bug?
<jnthn> masak: Well, it's an "NFG is NYI"...
<masak> do we have "NFG is NYI" in RT?
<moritz> masak: we have
<masak> moritz: thank you.
<jnthn> If not, we should have one, but I suspect we do...
<jnthn> And I'd include this as an example in it.
* masak gets on it
<jnthn> So we have that ticket as a source to mine for test cases :)
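
The behaviour discussed in the log is not specific to Rakudo: a code-point-level `\w` stops at a combining mark in other engines too. The Python sketch below (not Rakudo code) shows this with the standard `re` module, then contrasts it with a grapheme-aware match. The helpers `clusters` and `leading_word` are hypothetical, and the cluster splitting is a simplification of UAX #29 that only attaches code points with a non-zero canonical combining class to the preceding base character.

```python
import re
import unicodedata


def clusters(text: str) -> list[str]:
    """Split text into base characters plus their trailing combining marks.

    Assumption: a simplified approximation, not full UAX #29 segmentation.
    """
    out: list[str] = []
    for ch in text:
        if out and unicodedata.combining(ch) != 0:
            out[-1] += ch  # attach the mark to the previous cluster
        else:
            out.append(ch)
    return out


def leading_word(text: str) -> str:
    """Longest prefix made of clusters whose base is a word character."""
    word = []
    for cluster in clusters(text):
        if not re.match(r"\w", cluster):
            break
        word.append(cluster)
    return "".join(word)


text = "xyz\N{COMBINING CEDILLA}p"  # the string from the IRC log

# Code-point level: the match stops before the combining cedilla.
codepoint_match = re.match(r"\w+", text).group()
print(codepoint_match == "xyz")  # True

# Grapheme level (approximated): the whole string is one word.
print(leading_word(text) == text)  # True
```

Matching cluster by cluster is one way to get the grapheme-level result; an engine with native grapheme support would not need the extra pass.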

@p6rt

p6rt commented Apr 28, 2015

From @jnthn

On Mon Apr 27 05:47:38 2009, helmut@wollmersdorfer.at wrote:

> $ ./perl6 -e 'say "\c[LATIN CAPITAL LETTER A WITH DOT ABOVE, COMBINING DOT BELOW]".chars;'
> 2
>
> Expected result: 1
>
> $ ./perl6 -e 'say "\c[LATIN CAPITAL LETTER A, COMBINING DOT ABOVE, COMBINING DOT BELOW]".chars;'
> 3
>
> Expected result: 1
>
> see specs:
> http://svn.pugscode.org/pugs/docs/Perl6/Spec/S32-setting-library/Str.pod
>
> ----quote----
> A Str can exist at several Unicode levels at once. Which level you
> interact with typically depends on what your current lexical context has
> declared the "working Unicode level to be". Default is C<Grapheme>.
> ----end of quote---

The test cases associated with this ticket are now passing on Rakudo on MoarVM, along with many, many other NFG-related tests. I'll resolve this catch-all ticket, and we can open more specific ones for particular NFG-related issues that crop up.

@p6rt p6rt closed this as completed Apr 28, 2015
@p6rt

p6rt commented Apr 28, 2015

@jnthn - Status changed from 'open' to 'resolved'

@p6rt p6rt added NYI Features not yet implemented Todo labels Jan 5, 2020