Emoji sequences with ZERO WIDTH JOINER counted as separate chars when they probably shouldn't #4946

Closed
p6rt opened this issue Dec 27, 2015 · 4 comments

p6rt commented Dec 27, 2015

Migrated from rt.perl.org#127048 (status was 'resolved')

Searchable as RT127048$

p6rt commented Dec 27, 2015

From @AlexDaniel

This is a continuation of
https://rt.perl.org/Public/Bug/Display.html?id=127047

From http://unicode.org/reports/tr51/#Emoji_ZWJ_Sequences:

“The U+200D ZERO WIDTH JOINER (ZWJ) can be used between the elements of a
sequence of characters to indicate that a single glyph should be presented
if available.”

“So to the user, these would behave like single emoji characters, even
though internally they are sequences.”

It sounds like we shouldn't cut these sequences in half when doing .substr
(which in turn means that these should be treated as one grapheme).

There is a chart of possible combinations here
http://www.unicode.org/emoji/charts/emoji-zwj-sequences.html, but I think
that any sequence with U+200D ZERO WIDTH JOINER should probably result in
one grapheme. As crazy as it sounds…
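
To make the suggestion concrete, here is a minimal Python sketch of the blanket rule proposed above: never break immediately before or after U+200D. The helper name `zwj_clusters` is made up for illustration. It is not what Rakudo or MoarVM does, and it ignores every other grapheme rule (combining marks, variation selectors, skin-tone modifiers and so on); it only shows how the count changes for a ZWJ sequence.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

def zwj_clusters(text):
    """Split text into code points, except that a ZWJ glues itself to the
    preceding code point and glues the following code point to itself.
    Hypothetical helper; a deliberately simplified rule, not full UAX #29."""
    clusters = []
    glue_next = False  # True when the previous code point was a ZWJ
    for ch in text:
        if clusters and (ch == ZWJ or glue_next):
            clusters[-1] += ch      # extend the current cluster
        else:
            clusters.append(ch)     # start a new cluster
        glue_next = (ch == ZWJ)
    return clusters

# MAN, ZWJ, WOMAN, ZWJ, GIRL: one "family" emoji where the font supports it
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"

print(len(family))                # 5 code points
print(len(zwj_clusters(family)))  # 1 cluster under the simplified rule
```

With a rule like this, a substr-style operation that works on clusters rather than code points could no longer cut the sequence in half.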

p6rt commented Dec 27, 2015

From @AlexDaniel

It should also be noted that ZERO WIDTH JOINER is used for other purposes
too:
https://books.google.ee/books?id=wn5sXG8bEAcC&lpg=PA287&ots=J1bym1VbXE&dq=unicode%20%22ZERO%20WIDTH%20JOINER&pg=PA287#v=onepage&q=unicode%20%22ZERO%20WIDTH%20JOINER&f=false

But I'm not sure if it should affect the character count in such cases.

p6rt commented Sep 12, 2017

From @samcv

This was resolved about a month ago by this commit:
MoarVM/MoarVM@fa5158a3

p6rt closed this as completed Sep 12, 2017

p6rt commented Sep 12, 2017

@samcv - Status changed from 'new' to 'resolved'

p6rt added the uni label Jan 5, 2020