Stranded strings with combiners or ZWJ on borders break my NFG expectations ( (“\x[0305]a” x 2).chars.say ) #6412

Closed
p6rt opened this issue Jul 26, 2017 · 6 comments

Comments

@p6rt

p6rt commented Jul 26, 2017

Migrated from rt.perl.org#131801 (status was 'resolved')

Searchable as RT131801$

@p6rt

p6rt commented Jul 26, 2017

From @AlexDaniel

Code:
say (“\c[COMBINING OVERLINE]a” x 2).chars

Result:
4

Code:
say (“\c[COMBINING OVERLINE]a” ~ “\c[COMBINING OVERLINE]a”).chars

Result:
3

Both should produce the same result (3). What happens here is that the “a” at the end of one copy is not being merged into a single grapheme with the combiner at the start of the next copy.

Please note that combiners are not the only thing that can cause this. Here is the same thing with ZWJ:

Code:
my $x = “\x[2695]\x[FE0F]a\x[1F468]\x[200D]”;
say ($x ~ $x).chars;
say ($x x 2).chars

Result:
5
6

I have a feeling that this is a known issue, and that there might be a ticket for this already. However, I couldn't find it.
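
The expectation described above can be written as a small check. This is only a sketch: `same-chars` is a hypothetical helper name, not something from Rakudo or roast, and the two sample strings are the ones from this report. On a build without the bug, both lines would be expected to print True.

```raku
# Hypothetical helper (not part of Rakudo or roast): returns True when
# repeating a string with `x` yields the same grapheme count as
# concatenating it with `~`.
sub same-chars(Str $s) {
    ($s x 2).chars == ($s ~ $s).chars
}

# Sample strings taken from the report above.
my $combiner = "\c[COMBINING OVERLINE]a";
my $zwj      = "\x[2695]\x[FE0F]a\x[1F468]\x[200D]";

say same-chars($combiner);
say same-chars($zwj);
```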

@p6rt

p6rt commented Aug 23, 2017

From @AlexDaniel

Submitting so that it does not slip through the cracks.

<AlexDaniel> m: ("\c[COMBINING ACUTE ACCENT]" x 5).chars.say
<camelia> rakudo-moar 636a3c: OUTPUT: «5␤»
<samcv> AlexDaniel, that is a bug
<samcv> i will be fixing it for my grant though
<samcv> it only occurs in certain cases

@p6rt

p6rt commented Aug 23, 2017

@samcv - Status changed from 'new' to 'open'

@p6rt

p6rt commented Aug 23, 2017

From @AlexDaniel

This is the same issue, but it is interesting nonetheless:

<AlexDaniel> m: dd (0x0F75.chr x 2).uninames
<camelia> rakudo-moar 636a3c: OUTPUT: «("TIBETAN VOWEL SIGN AA", "TIBETAN VOWEL SIGN U", "TIBETAN VOWEL SIGN AA", "TIBETAN VOWEL SIGN U").Seq␤»
<AlexDaniel> m: dd (0x0F75.chr ~ 0x0F75.chr).uninames
<camelia> rakudo-moar 636a3c: OUTPUT: «("TIBETAN VOWEL SIGN AA", "TIBETAN VOWEL SIGN AA", "TIBETAN VOWEL SIGN U", "TIBETAN VOWEL SIGN U").Seq␤»

Note that the order should be normalized in both cases, as it is in the `~` output; with `x` the codepoints at the border are left unordered.
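
For comparison, the two results can be checked against each other directly. This is a rough sketch that only reuses `.uninames` from the snippets above; the variable names are made up for illustration. With the ordering normalized in both cases, the comparison would be expected to print True.

```raku
# U+0F75 is the character used in the IRC session above.
my $c = 0x0F75.chr;

# Codepoint names after repetition and after concatenation.
my @repeated = ($c x 2).uninames;
my @joined   = ($c ~ $c).uninames;

# Both sequences should list the names in the same order.
say @repeated.join(', ') eq @joined.join(', ');
```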

@p6rt

p6rt commented Aug 24, 2017

From @samcv

This has been fixed as of this MoarVM commit:
MoarVM/MoarVM@62f66cb

Tests have been added to roast here:
Raku/roast@1e4fd21

@p6rt p6rt closed this as completed Aug 24, 2017
@p6rt

p6rt commented Aug 24, 2017

@samcv - Status changed from 'new' to 'resolved'
