Re: unicode sorting/comparison #1515

Closed
p5pRT opened this issue Mar 28, 2000 · 6 comments

p5pRT commented Mar 28, 2000

Migrated from rt.perl.org#2732 (status was 'resolved')

Searchable as RT2732$

p5pRT commented Mar 28, 2000

From joseph@5sigma.com

So, a casual look tells me sv.c compares UTF-8 bytewise. Am I
correct? Is the plan to implement the sorting algorithm in the
Unicode tech report and (thereby) change the semantics? Or would
there be some sort of pragma, locale, or whatever? Or IS there
a plan? :-)

  -joseph

On Fri, 24 Mar 2000 10:20:00 -0800 (PST) larry@wall.org (Larry Wall) wrote:

* joseph@5sigma.com writes:
* : On Fri, 24 Mar 2000 09:59:31 -0800 (PST) larry@wall.org (Larry Wall) wrote:
* :
* : * joseph@5sigma.com writes:
* : * : Where can I find information on status and current directions in
* : * : unicode-based sorting and comparison in Perl?
* : *
* : * http://www.unicode.org/unicode/reports/techreports.html
* : *
* : * :-) * rand
* : *
* : * Larry
* :
* : I saw that. But are there any Perl examples and/or some discussion
* : around?
*
* Nope. That is why what I said was only partially funny.
*
* Larry

p5pRT commented Mar 28, 2000

From @TimToady

joseph@5sigma.com writes:
: So, a casual look tells me sv.c compares UTF-8 bytewise. Am I
: correct? Is the plan to implement the sorting algorithm in the
: Unicode tech report and (thereby) change the semantics? Or would
: there be some sort of pragma, locale, or whatever? Or IS there
: a plan? :-)

The plan is to leave basic cmp doing what it does, and to implement
Unicode sorting as a Schwartzian Transform (which is pretty much
required by the UTR). Once we've got an explicit interface nailed
down, we can make it implicit with a "use sort" or some such. But
please think of it as putting a wrapper around the simple sort,
not as changing the semantics of the simple sort. In the end, you
still need the simple sort.

Larry
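
A minimal sketch of the "wrapper around the simple sort" idea described above, written in Python purely for illustration (the thread itself is about Perl). The function `sort_key` is a hypothetical stand-in, not the collation algorithm from the Unicode technical report: it only strips accents and folds case. The shape is what matters: compute a key per string, sort the (key, string) pairs with the ordinary comparison, then discard the keys.

```python
import unicodedata

def sort_key(s):
    # Hypothetical, deliberately crude key: NOT the Unicode Collation Algorithm.
    # Decompose, drop combining marks (accents), then fold case.
    decomposed = unicodedata.normalize("NFKD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

def collated_sort(strings):
    # Schwartzian Transform: decorate, sort with the simple comparison, undecorate.
    decorated = [(sort_key(s), s) for s in strings]
    decorated.sort()  # plain tuple comparison; the simple sort is still doing the work
    return [s for _, s in decorated]

words = ["zebra", "Zebra", "\u00e9clair", "apple"]
plain = sorted(words)           # simple comparison by code point
wrapped = collated_sort(words)  # same simple sort, applied to derived keys
print(plain)
print(wrapped)
```

The simple sort is unchanged in both calls; only the data handed to it differs, which is why the wrapper does not alter the semantics of the plain comparison.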

p5pRT commented Mar 28, 2000

From [Unknown Contact. See original ticket]

On Mon, 27 Mar 2000 08:50:09 -0800 (PST) larry@wall.org (Larry Wall) wrote:

* The plan is to leave basic cmp doing what it does, and to implement
* Unicode sorting as a Schwartzian Transform (which is pretty much
* required by the UTR). Once we've got an explicit interface nailed
* down, we can make it implicit with a "use sort" or some such. But
* please think of it as putting a wrapper around the simple sort,
* not as changing the semantics of the simple sort. In the end, you
* still need the simple sort.
*
* Larry

Ah, that makes sense. So if I'm trying to explain this, I can
say UTF-8 will be compared bytewise and that other Unicode
collating orders will be implemented by pragma?

  -joseph

p5pRT commented Mar 28, 2000

From @TimToady

joseph@5sigma.com writes:
: On Mon, 27 Mar 2000 08:50:09 -0800 (PST) larry@wall.org (Larry Wall) wrote:
:
: * The plan is to leave basic cmp doing what it does, and to implement
: * Unicode sorting as a Schwartzian Transform (which is pretty much
: * required by the UTR). Once we've got an explicit interface nailed
: * down, we can make it implicit with a "use sort" or some such. But
: * please think of it as putting a wrapper around the simple sort,
: * not as changing the semantics of the simple sort. In the end, you
: * still need the simple sort.
: *
: * Larry
:
: Ah, that makes sense. So if I'm trying to explain this, I can
: say UTF-8 will be compared bytewise and that other Unicode
: collating orders will be implemented by pragma?

Sure, you can say that. :-)

Larry

p5pRT commented Nov 24, 2002

From @jhi

(cleaning away old tickets)

I think this issue has been resolved by 5.8.0 (or, even, 5.6.1).
As envisioned by Larry, comparison (cmp) by default goes by the
raw numerical values (codepoints), and the real Unicode collation
algorithm is implemented externally (by the Unicode::Collate module,
which is part of 5.8.0, and also available at CPAN.)

Therefore, I'm marking the ticket as resolved.
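
The thread speaks of "bytewise" comparison of UTF-8 early on and of comparison by "codepoints" here. For well-formed UTF-8 the two orderings coincide, because the encoding preserves code point order. A small sketch in Python (chosen for illustration only; the sample strings are arbitrary assumptions, and this says nothing about Perl's internals):

```python
# Sample strings spanning 1-, 2-, 3- and 4-byte UTF-8 sequences (arbitrary choices).
words = ["zebra", "Zebra", "\u00e9clair", "apple", "\u4e2d", "\U0001F600"]

# Python compares str values by code point.
by_codepoint = sorted(words)

# Compare the encoded byte strings instead.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# For well-formed UTF-8 the two orderings are the same.
print(by_codepoint == by_utf8_bytes)
```

Neither ordering is a linguistic collation; both place "Zebra" before "apple", which is the gap a separate collation module is meant to fill.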

p5pRT commented Nov 24, 2002

@jhi - Status changed from 'open' to 'resolved'
