Re: unicode sorting/comparison #1515

p5pRT · 2000-03-28T11:38:36Z

Migrated from rt.perl.org#2732 (status was 'resolved')

Searchable as RT2732$

p5pRT · 2000-03-28T11:38:36Z

From joseph@5sigma.com

So, a casual look tells me sv.c compares UTF-8 bytewise. Am I
correct? Is the plan to implement the sorting algorithm in the
Unicode tech report and (thereby) change the semantics? Or would
there be some sort of pragma, locale, or whatever? Or IS there
a plan? :-)

-joseph

On Fri, 24 Mar 2000 10:20:00 -0800 (PST) larry@wall.org (Larry Wall) wrote:

* joseph@5sigma.com writes:
* : On Fri, 24 Mar 2000 09:59:31 -0800 (PST) larry@wall.org (Larry Wall) wrote:
* :
* : * joseph@5sigma.com writes:
* : * : Where can I find information on status and current directions in
* : * : unicode-based sorting and comparison in Perl?
* : *
* : * http://www.unicode.org/unicode/reports/techreports.html
* : *
* : * :-) * rand
* : *
* : * Larry
* :
* : I saw that. But are there any Perl examples and/or some discussion
* : around?
*
* Nope. That is why what I said was only partially funny.
*
* Larry
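
[Editorial sketch, not part of the original message.] The question above asks whether UTF-8 strings are compared bytewise. As a rough check, and without claiming anything about what sv.c actually does, the snippet below shows that comparing the UTF-8 encoded bytes gives the same order as comparing code points, because UTF-8 preserves code point order. The sample characters and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustrative sample: Z, a, e-acute, a CJK ideograph, an emoji.
my @chars = ("a", "\x{E9}", "\x{4E2D}", "\x{1F600}", "Z");

# Order by code point: plain cmp on character strings.
my @by_codepoint = sort { $a cmp $b } @chars;

# Order by the raw UTF-8 bytes of each string.
my @by_bytes = sort { encode('UTF-8', $a) cmp encode('UTF-8', $b) } @chars;

# Print both orders as hex code points; the two lines should match.
print join(" ", map { sprintf "U+%04X", ord } @by_codepoint), "\n";
print join(" ", map { sprintf "U+%04X", ord } @by_bytes), "\n";
```

Neither order is a linguistic collation; both simply follow numeric values.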

p5pRT · 2000-03-28T11:38:38Z

From @TimToady

joseph@5sigma.com writes:
: So, a casual look tells me sv.c compares UTF-8 bytewise. Am I
: correct? Is the plan to implement the sorting algorithm in the
: Unicode tech report and (thereby) change the semantics? Or would
: there be some sort of pragma, locale, or whatever? Or IS there
: a plan? :-)

The plan is to leave basic cmp doing what it does, and to implement
Unicode sorting as a Schwartzian Transform (which is pretty much
required by the UTR). Once we've got an explicit interface nailed
down, we can make it implicit with a "use sort" or some such. But
please think of it as putting a wrapper around the simple sort,
not as changing the semantics of the simple sort. In the end, you
still need the simple sort.

Larry
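
[Editorial sketch, not part of the original message.] The "wrapper around the simple sort" idea can be illustrated with a Schwartzian Transform: compute a sort key for each string, sort on the keys with ordinary cmp, then discard the keys. The sort_key function here is hypothetical and only lowercases its input; it stands in for a real collation-key generator, which the thread does not specify.

```perl
use strict;
use warnings;

# Hypothetical key function (an assumption for this sketch): a real
# implementation would build a Unicode collation key instead.
sub sort_key {
    my ($s) = @_;
    return lc $s;
}

my @words = qw(banana Apple cherry apple Banana);

# The simple sort: cmp on raw character values, so uppercase sorts first.
my @plain = sort { $a cmp $b } @words;

# Schwartzian Transform: decorate with a key, sort with plain cmp,
# then undecorate. Ties on the key fall back to the original string.
my @collated =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
    map  { [ sort_key($_), $_ ] }
    @words;

print "@plain\n";     # Apple Banana apple banana cherry
print "@collated\n";  # Apple apple Banana banana cherry
```

The simple sort is still doing the work in the second case; only the strings it compares have changed.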

p5pRT · 2000-03-28T11:38:41Z

From [Unknown Contact. See original ticket]

On Mon, 27 Mar 2000 08:50:09 -0800 (PST) larry@wall.org (Larry Wall) wrote:

* The plan is to leave basic cmp doing what it does, and to implement
* Unicode sorting as a Schwartzian Transform (which is pretty much
* required by the UTR). Once we've got an explicit interface nailed
* down, we can make it implicit with a "use sort" or some such. But
* please think of it as putting a wrapper around the simple sort,
* not as changing the semantics of the simple sort. In the end, you
* still need the simple sort.
*
* Larry

Ah, that makes sense. So if I'm trying to explain this, I can
say UTF-8 will be compared bytewise and that other Unicode
collating orders will be implemented by pragma?

-joseph

p5pRT · 2000-03-28T11:38:43Z

From @TimToady

joseph@5sigma.com writes:
: On Mon, 27 Mar 2000 08:50:09 -0800 (PST) larry@wall.org (Larry Wall) wrote:
:
: * The plan is to leave basic cmp doing what it does, and to implement
: * Unicode sorting as a Schwartzian Transform (which is pretty much
: * required by the UTR). Once we've got an explicit interface nailed
: * down, we can make it implicit with a "use sort" or some such. But
: * please think of it as putting a wrapper around the simple sort,
: * not as changing the semantics of the simple sort. In the end, you
: * still need the simple sort.
: *
: * Larry
:
: Ah, that makes sense. So if I'm trying to explain this, I can
: say UTF-8 will be compared bytewise and that other Unicode
: collating orders will be implemented by pragma?

Sure, you can say that. :-)

Larry

p5pRT · 2002-11-24T16:05:03Z

From @jhi

(cleaning away old tickets)

I think this issue has been resolved by 5.8.0 (or, even, 5.6.1).
As envisioned by Larry, comparison (cmp) by default goes by the
raw numerical values (codepoints), and the real Unicode collation
algorithm is implemented externally (by the Unicode::Collate module,
which is part of 5.8.0, and also available at CPAN.)

Therefore, I'm marking the ticket as resolved.
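
[Editorial sketch, not part of the original message.] A minimal comparison of the two behaviours described above, assuming a Perl that ships Unicode::Collate (5.8.0 or later) and the module's default collation settings. The word list is made up for illustration.

```perl
use strict;
use warnings;
use Unicode::Collate;

binmode STDOUT, ':encoding(UTF-8)';

# Illustrative data; "\x{E9}" is LATIN SMALL LETTER E WITH ACUTE.
my @words = ("zebra", "Zoe", "\x{E9}clair", "eclair", "apple");

# Default cmp: ordered by code point, so "Z" sorts before every
# lowercase letter and the accented word sorts last.
my @by_codepoint = sort { $a cmp $b } @words;

# Unicode::Collate with default settings.
my $collator = Unicode::Collate->new;
my @by_uca   = $collator->sort(@words);

# The same result built by hand: generate sort keys, then compare the
# keys with plain cmp (a Schwartzian Transform).
my @via_keys =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  { [ $collator->getSortKey($_), $_ ] }
    @words;

print "@by_codepoint\n";
print "@by_uca\n";
print "@via_keys\n";
```

With the default settings, the collated order should be apple, eclair, éclair, zebra, Zoe, while the code point order starts with Zoe and ends with éclair.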

p5pRT · 2002-11-24T17:14:06Z

@jhi - Status changed from 'open' to 'resolved'

p5pRT closed this as completed Nov 24, 2002

p5pRT added the Severity Low label Oct 18, 2019
