Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move capitalize from builtins to setting #816

Closed
p6rt opened this issue Mar 21, 2009 · 7 comments
Closed

Move capitalize from builtins to setting #816

p6rt opened this issue Mar 21, 2009 · 7 comments
Labels

Comments

@p6rt
Copy link

p6rt commented Mar 21, 2009

Migrated from rt.perl.org#64060 (status was 'resolved')

Searchable as RT64060$

@p6rt
Copy link
Author

p6rt commented Mar 21, 2009

From @clintongormley

The attached patch moves capitalize from builtins/any-str.pir to
setting/Any-str.pm, and removes a redundant line where the string length
was being checked multiple times, even though it wouldn't have changed.

@p6rt
Copy link
Author

p6rt commented Mar 21, 2009

From @clintongormley

capitalize_move_to_setting.patch
diff --git a/src/builtins/any-str.pir b/src/builtins/any-str.pir
index 0369a40..066ff56 100644
--- a/src/builtins/any-str.pir
+++ b/src/builtins/any-str.pir
@@ -23,7 +23,7 @@ the size of that file down and to emphasize their generic,
 .namespace []
 .sub 'onload' :anon :init :load
     $P0 = get_hll_namespace ['Any']
-    '!EXPORT'('capitalize,chomp,chars,:d,:e,:f,index,rindex,ord,substr,trim', 'from'=>$P0)
+    '!EXPORT'('chomp,chars,:d,:e,:f,index,rindex,ord,substr,trim', 'from'=>$P0)
 .end
 
 
@@ -33,48 +33,6 @@ the size of that file down and to emphasize their generic,
 
 .namespace ['Any']
 
-=item capitalize
-
- our Str multi Str::capitalize ( Str $string )
-
-Has the effect of first doing an C<lc> on the entire string, then performing a
-C<s:g/(\w+)/{ucfirst $1}/> on it.
-
-=cut
-
-.sub 'capitalize' :method :multi(_)
-    .local string tmps
-    .local pmc retv
-    .local int len
-
-    retv = new 'Str'
-    tmps = self
-
-    len = length tmps
-    if len == 0 goto done
-
-    downcase tmps
-
-    .local int pos
-    .local string s1
-    pos = 0
-  next_word:
-    pos = find_cclass .CCLASS_LOWERCASE, tmps, pos, len
-    s1 = substr tmps, pos, 1
-    upcase s1
-    substr tmps, pos, 1, s1
-    len = length tmps
-    pos+=1
-    if pos == len goto done
-    pos = find_not_cclass .CCLASS_LOWERCASE, tmps, pos, len
-    if pos == len goto done
-    goto next_word
-
-  done:
-    retv = tmps
-    .return (retv)
-.end
-
 .sub 'chars' :method :multi(_)
     $S0 = self
     $I0 = length $S0
diff --git a/src/setting/Any-str.pm b/src/setting/Any-str.pm
index bbf28a3..49855d1 100644
--- a/src/setting/Any-str.pm
+++ b/src/setting/Any-str.pm
@@ -19,6 +19,38 @@ class Any is also {
         self gt '' ?? self.substr(0,1).lc ~ self.substr(1) !! ""
     }
 
+    our Str multi method capitalize is export {
+        return Q:PIR {
+            $S0 = self
+            .local int len 
+            
+            len = length $S0
+            if len == 0 goto done 
+         
+            downcase $S0
+
+            .local int pos 
+            .local string s1 
+            pos = 0 
+
+          next_word: 
+            pos = find_cclass .CCLASS_LOWERCASE, $S0, pos, len 
+            s1 = substr $S0, pos, 1 
+            upcase s1 
+            substr $S0, pos, 1, s1 
+            pos+=1 
+            if pos == len goto done 
+
+            pos = find_not_cclass .CCLASS_LOWERCASE, $S0, pos, len 
+            if pos == len goto done 
+            goto next_word 
+         
+          done: 
+            %r = box $S0
+        }
+
+    }
+
     our List multi method split(Code $delimiter, $limit = *) {
         my $s = ~self;
         my $l = $limit ~~ Whatever ?? Inf !! $limit;

@p6rt
Copy link
Author

p6rt commented Mar 25, 2009

From @pmichaud

I'm declining this patch in favor of a slightly shorter version
taken almost directly from the synopsis​:

  our Str multi method capitalize() is export {
  self.lc.subst(/\w+/, { .ucfirst }, :global)
  }

Just for reference (and for consideration in other upcoming patches) --
the patch and the original code it was based on had a few issues​:

(1) The synopsis says that the conversion takes place on word
characters; the original code was considering only alphabetics.

(2) Methods that don't accept additional positional parameters really
should have an explicit empty parameter list, as shown above.

(3) Case conversions on a string *can* cause its length to change -- in
particular, the character "ß" (U+00DF) becomes "SS" when converted to
uppercase. (I'm not sure that we have any tests for this at present,
and it probably doesn't work when ICU isn't present.)

Thanks for the patch!

Pm

1 similar comment
@p6rt
Copy link
Author

p6rt commented Mar 25, 2009

From @pmichaud

I'm declining this patch in favor of a slightly shorter version
taken almost directly from the synopsis​:

  our Str multi method capitalize() is export {
  self.lc.subst(/\w+/, { .ucfirst }, :global)
  }

Just for reference (and for consideration in other upcoming patches) --
the patch and the original code it was based on had a few issues​:

(1) The synopsis says that the conversion takes place on word
characters; the original code was considering only alphabetics.

(2) Methods that don't accept additional positional parameters really
should have an explicit empty parameter list, as shown above.

(3) Case conversions on a string *can* cause its length to change -- in
particular, the character "ß" (U+00DF) becomes "SS" when converted to
uppercase. (I'm not sure that we have any tests for this at present,
and it probably doesn't work when ICU isn't present.)

Thanks for the patch!

Pm

@p6rt
Copy link
Author

p6rt commented Mar 25, 2009

@pmichaud - Status changed from 'new' to 'resolved'

@p6rt p6rt closed this as completed Mar 25, 2009
@p6rt
Copy link
Author

p6rt commented Mar 25, 2009

From @moritz

Patrick R. Michaud via RT wrote​:

(3) Case conversions on a string *can* cause its length to change -- in
particular, the character "ß" (U+00DF) becomes "SS" when converted to
uppercase. (I'm not sure that we have any tests for this at present,
and it probably doesn't work when ICU isn't present.)

How can you doubt, when a German hacker takes care of the test suite?
after all we're about the only ones who use that weird character ;-)

It's in t/spec/S32-str/uc.t, line 45, for regexes this is tested in
S05-modifier/ignorecase.t.

Actually there might be codepoints that turn into multiple codepoints on
conversion to upper case; in particular if there's a precomposed
character of a lower case letter and some diacritics, but no upper case
equivalent precomposed character exists in the Unicode repertoire.
(I don't know if such a thing actually exists, but it's entirely possible).

Cheers,
Moritz

1 similar comment
@p6rt
Copy link
Author

p6rt commented Mar 25, 2009

From @moritz

Patrick R. Michaud via RT wrote​:

(3) Case conversions on a string *can* cause its length to change -- in
particular, the character "ß" (U+00DF) becomes "SS" when converted to
uppercase. (I'm not sure that we have any tests for this at present,
and it probably doesn't work when ICU isn't present.)

How can you doubt, when a German hacker takes care of the test suite?
after all we're about the only ones who use that weird character ;-)

It's in t/spec/S32-str/uc.t, line 45, for regexes this is tested in
S05-modifier/ignorecase.t.

Actually there might be codepoints that turn into multiple codepoints on
conversion to upper case; in particular if there's a precomposed
character of a lower case letter and some diacritics, but no upper case
equivalent precomposed character exists in the Unicode repertoire.
(I don't know if such a thing actually exists, but it's entirely possible).

Cheers,
Moritz

@p6rt p6rt added the patch label Jan 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant