Case folding of ß 00DF ß LATIN SMALL LETTER SHARP S #3352

p6rt · 2014-03-04T20:56:48Z

Migrated from rt.perl.org#121377 (status was 'resolved')

Searchable as RT121377$

p6rt · 2010-06-15T09:16:28Z

From @masak

<sorear> How does case insensitive matching work in perl 6?
<sorear> e.g. "ß" ~~ m:i/SS/
<masak> sorear: that's the syntax, so I assume you're asking about the
semantics.
<masak> oh wait, that example is tricky :)
<masak> I would be surprised if Perl 6 is spec'd to handle that.
<sorear> yes. semantics, and dark corners thereof
<sorear> yes, S05 says exactly nothing on the subject
<sorear> other than "ignores case distinctions"
<moritz_> rakudo: say "ß" ~~ /:i SS/
<p6eval> rakudo cfbeb5: OUTPUT«␤»
<moritz_> rakudo: say uc "ß"
<p6eval> rakudo cfbeb5: OUTPUT«SS␤»
<masak> o.O
<masak> German is strange.
<moritz_> it sure is.
<moritz_> masak: want to submit a bug report about inconsistency?
* masak submits rakudobug
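
For reference, one common way to define caseless matching is Unicode full case folding, under which ß folds to "ss". The sketch below uses Python's `str.casefold` purely as an independent cross-check. That `:i` should follow the same rule is an assumption, since S05 does not say, and `caseless_equal` is a hypothetical helper name.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings using Unicode full case folding."""
    return a.casefold() == b.casefold()

# Full case folding maps U+00DF (ß) to the two letters "ss".
print("ß".casefold())
print(caseless_equal("ß", "SS"))
print(caseless_equal("Straße", "STRASSE"))
```

Under this definition the match in the chat log above would succeed, which would make it consistent with `uc "ß"` returning `SS`.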

p6rt · 2010-08-11T16:52:40Z

@coke - Status changed from 'new' to 'open'

p6rt · 2012-10-21T01:17:19Z

From @coke

On Tue Jun 15 02:16:28 2010, masak wrote:

<sorear> How does case insensitive matching work in perl 6?
<sorear> e.g. "ß" ~~ m:i/SS/
<masak> sorear: that's the syntax, so I assume you're asking about the
semantics.
<masak> oh wait, that example is tricky :)
<masak> I would be surprised if Perl 6 is spec'd to handle that.
<sorear> yes. semantics, and dark corners thereof
<sorear> yes, S05 says exactly nothing on the subject
<sorear> other than "ignores case distinctions"
<moritz_> rakudo: say "ß" ~~ /:i SS/
<p6eval> rakudo cfbeb5: OUTPUT«␤»
<moritz_> rakudo: say uc "ß"
<p6eval> rakudo cfbeb5: OUTPUT«SS␤»
<masak> o.O
<masak> German is strange.
<moritz_> it sure is.
<moritz_> masak: want to submit a bug report about inconsistency?
* masak submits rakudobug

Behavior changed:

"ß" ~~ m:i/SS/
#<failed match>
say uc "ß"
ß

Closable?

--
Will "Coke" Coleda

p6rt · 2012-10-21T09:09:03Z

From @masak

On Sat Oct 20 18:17:19 2012, coke wrote:

On Tue Jun 15 02:16:28 2010, masak wrote:

<sorear> How does case insensitive matching work in perl 6?
<sorear> e.g. "ß" ~~ m:i/SS/
<masak> sorear: that's the syntax, so I assume you're asking about the
semantics.
<masak> oh wait, that example is tricky :)
<masak> I would be surprised if Perl 6 is spec'd to handle that.
<sorear> yes. semantics, and dark corners thereof
<sorear> yes, S05 says exactly nothing on the subject
<sorear> other than "ignores case distinctions"
<moritz_> rakudo: say "ß" ~~ /:i SS/
<p6eval> rakudo cfbeb5: OUTPUT«␤»
<moritz_> rakudo: say uc "ß"
<p6eval> rakudo cfbeb5: OUTPUT«SS␤»
<masak> o.O
<masak> German is strange.
<moritz_> it sure is.
<moritz_> masak: want to submit a bug report about inconsistency?
* masak submits rakudobug

Behavior changed:

"ß" ~~ m:i/SS/
#<failed match>
say uc "ß"
ß

Closable?

Well, the *inconsistency* seems to be gone... but by pushing the
semantics in (what I consider to be) the wrong direction. I.e. now
instead of one of two things behaving the wrong way, both do.

p6rt · 2013-07-19T22:56:51Z

From @ShimmerFairy

<lue> r: say "ß".uc
<camelia> rakudo 45d447: OUTPUT«ß␤»
<lue> r: say "ẞ".lc.uc
<camelia> rakudo 45d447: OUTPUT«SS␤»

Both examples above are meant to result in SS. Note that the capital
eszett does convert to a lowercase one:

<lue> r: say "ẞ".lc
<camelia> rakudo 45d447: OUTPUT«ß␤»

p6rt · 2014-03-04T20:56:48Z

From @moritz

<moritz> p6: say 'ß'.uc, 'ß'.tc, 'ß'.tclc
<camelia> rakudo-jvm f2471a: OUTPUT«SSSSß␤»
<camelia> ..rakudo-parrot f2471a, rakudo-moar f2471a: OUTPUT«ßßß␤»
<camelia> ..niecza v24-109-g48a8de3: OUTPUT«ßSsSs␤»

All these answers are wrong. 'ß'.uc is supposed to be 'SS' or possibly
'ẞ', and 'ß'.tc and 'ß'.tclc should both be 'Ss'

p6rt · 2014-03-05T14:28:49Z

From @coke

On Tue Mar 04 12:56:48 2014, moritz wrote:

<moritz> p6: say 'ß'.uc, 'ß'.tc, 'ß'.tclc
<camelia> rakudo-jvm f2471a: OUTPUT«SSSSß␤»
<camelia> ..rakudo-parrot f2471a, rakudo-moar f2471a: OUTPUT«ßßß␤»
<camelia> ..niecza v24-109-g48a8de3: OUTPUT«ßSsSs␤»

All these answers are wrong. 'ß'.uc is supposed to be 'SS' or possibly
'ẞ', and 'ß'.tc and 'ß'.tclc should both be 'Ss'

Is this a Unicode-specified behavior (if so, can we have a URL for posterity?) or is this a native speaker response which contradicts Unicode?

What's the desired behavior if ß is not at the beginning of the string?

There are already tests for this behavior in S32-str/{uc,tclc,tc}.t which might need to be cleaned up as a result of this test.
--
Will "Coke" Coleda

p6rt · 2014-03-05T14:28:49Z

The RT System itself - Status changed from 'new' to 'open'

p6rt · 2014-03-28T16:18:06Z

From @moritz

On Wed Mar 05 06:28:49 2014, coke wrote:

On Tue Mar 04 12:56:48 2014, moritz wrote:

<moritz> p6: say 'ß'.uc, 'ß'.tc, 'ß'.tclc
<camelia> rakudo-jvm f2471a: OUTPUT«SSSSß␤»
<camelia> ..rakudo-parrot f2471a, rakudo-moar f2471a: OUTPUT«ßßß␤»
<camelia> ..niecza v24-109-g48a8de3: OUTPUT«ßSsSs␤»

All these answers are wrong. 'ß'.uc is supposed to be 'SS' or possibly
'ẞ', and 'ß'.tc and 'ß'.tclc should both be 'Ss'

Is this a unicode specified behavior (if so, can we have a URL for
posterity?)

Yes. http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf refers to SpecialCasing.txt, and SpecialCasing.txt contains this:

# ==============================================================================
# Format
# ==============================================================================

# The entries in this file are in the following machine-readable format:
#
# <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment>
#
# <code>, <lower>, <title>, and <upper> provide character values in hex. If there is more
# than one character, they are separated by spaces. Other than as used to separate
# elements, spaces are to be ignored.

[...]
# The German es-zed is special--the normal mapping is to SS.
# Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>))

00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
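
As a minimal sketch of how that record reads under the format described above, the following Python snippet splits the line into its fields and decodes the hex code points. The function name `parse_special_casing` is hypothetical, and the sketch ignores the optional condition list, which this record does not use.

```python
def parse_special_casing(line: str) -> dict:
    """Parse one unconditional SpecialCasing.txt record into its mappings."""
    data = line.split("#", 1)[0]                    # drop the trailing comment
    fields = [f.strip() for f in data.split(";")]   # <code>; <lower>; <title>; <upper>
    code, lower, title, upper = fields[:4]

    def decode(hex_values: str) -> str:
        # Each field is one or more space-separated hex code points.
        return "".join(chr(int(h, 16)) for h in hex_values.split())

    return {
        "char": decode(code),
        "lower": decode(lower),
        "title": decode(title),
        "upper": decode(upper),
    }

record = "00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S"
entry = parse_special_casing(record)
print(entry)
```

Read this way, the record says that ß lowercases to itself, titlecases to "Ss" and uppercases to "SS".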

or is this a native speaker response which contradicts
unicode?

It's in line with both the expectations of one native speaker (me) and with Unicode.

What's the desired behavior if ß is not at the beginning of the
string?

.uc should map it to 'SS' regardless of position, and .tclc and .lc should leave it alone.
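
The same position-independent behavior can be cross-checked in Python, whose `str.upper`, `str.lower` and `str.title` apply the full Unicode case mappings. This is only an illustration of the expected results, not of how Rakudo implements them; the sample word "straße" is an arbitrary choice.

```python
word = "straße"          # ß in a non-initial position

upper = word.upper()     # ß expands to "SS" wherever it occurs
lower = word.lower()     # ß is already lowercase and is left alone
title = word.title()     # only the first letter changes; ß is left alone
lone_title = "ß".title() # a leading ß takes the titlecase mapping "Ss"

print(upper, lower, title, lone_title)
```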

There are already tests for this behavior in S32-str/{uc,tclc,tc}.t
which might need to be cleaned up as a result of this test.

So far the tests that I've seen seem to agree with me, but I haven't looked at all of them yet.

p6rt · 2015-10-09T09:45:59Z

From @jnthn

On Fri Mar 28 09:18:06 2014, moritz wrote:

On Wed Mar 05 06:28:49 2014, coke wrote:

On Tue Mar 04 12:56:48 2014, moritz wrote:

<moritz> p6: say 'ß'.uc, 'ß'.tc, 'ß'.tclc
<camelia> rakudo-jvm f2471a: OUTPUT«SSSSß␤»
<camelia> ..rakudo-parrot f2471a, rakudo-moar f2471a: OUTPUT«ßßß␤»
<camelia> ..niecza v24-109-g48a8de3: OUTPUT«ßSsSs␤»

All these answers are wrong. 'ß'.uc is supposed to be 'SS' or possibly
'ẞ', and 'ß'.tc and 'ß'.tclc should both be 'Ss'

Is this a unicode specified behavior (if so, can we have a URL for
posterity?)

Yes. http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf refers to
SpecialCasing.txt, and SpecialCasing.txt contains this:

# ==============================================================================
# Format
# ==============================================================================

# The entries in this file are in the following machine-readable format:
#
# <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment>
#
# <code>, <lower>, <title>, and <upper> provide character values in hex. If there is more
# than one character, they are separated by spaces. Other than as used to separate
# elements, spaces are to be ignored.

[...]
# The German es-zed is special--the normal mapping is to SS.
# Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>))

00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S

This is now implemented, and tests unfudged.

p6rt · 2015-10-09T09:46:03Z

@jnthn - Status changed from 'open' to 'resolved'

p6rt closed this as completed Oct 9, 2015
