Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some fixes to split methods #336

Closed
p6rt opened this issue Sep 22, 2008 · 18 comments
Closed

Some fixes to split methods #336

p6rt opened this issue Sep 22, 2008 · 18 comments

Comments

@p6rt
Copy link

p6rt commented Sep 22, 2008

Migrated from rt.perl.org#59184 (status was 'resolved')

Searchable as RT59184$

@p6rt
Copy link
Author

p6rt commented Sep 22, 2008

From cdavaz@gmail.com

I have implemented the limit parameter on both Str.split(String,
Integer) and Str.split(Regex, Integer). In doing so I had to change
the method signature of Str.split(String) to ".sub 'split' :method
:multi(_, _)" from ".sub 'split' :method :multi('String')". The former
method signature is the correct one anyway as the latter just
restricts the invoker to being a 'String' and doesn't say anything
about the argument type.

All the tests except for one in split.p6 "pass" (verify by just
looking at the output). The one that doesn't pass is​:

say 102030405.split(0).perl;

which produces​:

["1.", "2", "3e+", "8"]

Note that this bug is not introduced by the patch, it was only
discovered because now we can do Int.split(Int).

@p6rt
Copy link
Author

p6rt commented Sep 22, 2008

From cdavaz@gmail.com

split.diff
Index: src/builtins/any-str.pir
===================================================================
--- src/builtins/any-str.pir	(revision 31332)
+++ src/builtins/any-str.pir	(working copy)
@@ -197,8 +197,10 @@
 .end
 
 .namespace['Any']
-.sub 'split' :method :multi('String')
+.sub 'split' :method :multi(_, _)
     .param string delim
+    .param int count        :optional
+    .param int has_count    :opt_flag
     .local string objst
     .local pmc pieces
     .local pmc tmps
@@ -214,6 +216,10 @@
     len = pieces
     i = 0
   loop:
+    unless has_count goto skip_count
+    dec count
+    if count < 0 goto done
+  skip_count:
     if i == len goto done
 
     tmps = new 'Perl6Str'
@@ -229,6 +235,8 @@
 
 .sub 'split' :method :multi(_, 'Sub')
     .param pmc regex
+    .param int count        :optional
+    .param int has_count    :opt_flag
     .local pmc match
     .local pmc retv
     .local int start_pos
@@ -244,6 +252,10 @@
     goto done
 
   loop:
+    unless has_count goto skip_count
+    dec count
+    if count < 0 goto done
+  skip_count:
     match = regex($S0, 'continue' => start_pos)
     end_pos = match.'from'()
     end_pos -= start_pos

@p6rt
Copy link
Author

p6rt commented Sep 22, 2008

From cdavaz@gmail.com

split.p6

@p6rt
Copy link
Author

p6rt commented Sep 22, 2008

From @moritz

On Sun Sep 21 23​:47​:04 2008, cdavaz@​gmail.com wrote​:

I have implemented the limit parameter on both Str.split(String,
Integer) and Str.split(Regex, Integer). In doing so I had to change
the method signature of Str.split(String) to ".sub 'split' :method
:multi(_, _)" from ".sub 'split' :method :multi('String')". The former
method signature is the correct one anyway as the latter just
restricts the invoker to being a 'String' and doesn't say anything
about the argument type.

Applied as r31333, thank you very much.

All the tests except for one in split.p6 "pass" (verify by just
looking at the output).

For the mid-term and long term goals it would be much better if you
could supply the test cases as patches to t/spec/S29-str/split-simple.t,
and use the testing tools that are come with Test.pm. Manually comparing
output to expected output doesn't scale.

The one that doesn't pass is​:

say 102030405.split(0).perl;

which produces​:

["1.", "2", "3e+", "8"]

Note that this bug is not introduced by the patch, it was only
discovered because now we can do Int.split(Int).

Aye.

I think we have no spec (yet?) on how to stringify integers, and I don't
think we'll have one in the synopsis, since Larry has other things to do ;-)

It's likely that we either won't have exact standards (and therefore
we'll use only small numbers for testing), or that the test suite will
define stringification rules.

But that's not the topic of this ticket, so closing it now.

@p6rt
Copy link
Author

p6rt commented Sep 22, 2008

The RT System itself - Status changed from 'new' to 'open'

@p6rt
Copy link
Author

p6rt commented Sep 22, 2008

@moritz - Status changed from 'open' to 'resolved'

@p6rt
Copy link
Author

p6rt commented Sep 22, 2008

From @pmichaud

The one that doesn't pass is​:

say 102030405.split(0).perl;

which produces​:

["1.", "2", "3e+", "8"]

Note that this bug is not introduced by the patch, it was only
discovered because now we can do Int.split(Int).

Aye.

I think we have no spec (yet?) on how to stringify integers, and I don't
think we'll have one in the synopsis, since Larry has other things to
do ;-)

I see this as being a Rakudo/Parrot bug, not a specification bug.
Clearly 102030405 should not lose precision the way it is here. The
problem appears to be that Parrot doesn't give us a good mechanism to
stringify (floating point) values in a way that doesn't lose precision
-- see RT #​59006.

As to _why_ the above involves floating points at all -- currently the
Rakudo handles numeric constants from the source is to place the text
into a Perl 6 Str object, and then numify that object. This allows us
to easily handle underscores "102_030_405" as well as various radix
markers like "0x" and "0b". Unfortunately, the numification operator
(prefix​:<+>) currently corresponds to Parrot's "get_number" vtable
method, which returns a num (float), and when that value gets
stringified we lose precision --- thus RT #​59006.

When RT #​59006 is fixed, this issue will also likely be fixed. However,
I agree that it's also not part of this ticket's topic, and so we should
leave this closed.

Thanks!

Pm

@p6rt
Copy link
Author

p6rt commented Sep 23, 2008

cdavaz@gmail.com - Status changed from 'resolved' to 'open'

@p6rt
Copy link
Author

p6rt commented Sep 23, 2008

From cdavaz@gmail.com

I have made the following fixes to split​:

A negative or zero limit returns all split elements (before returned the
whole string)

If limit is less than the total number of possible elements that the
string could be split into, the last element contains the rest of the
unsplit string.

@p6rt
Copy link
Author

p6rt commented Sep 23, 2008

From cdavaz@gmail.com

split.diff
Index: src/builtins/any-str.pir
===================================================================
--- src/builtins/any-str.pir	(revision 31355)
+++ src/builtins/any-str.pir	(working copy)
@@ -203,32 +203,44 @@
     .param int has_count    :opt_flag
     .local string objst
     .local pmc pieces
-    .local pmc tmps
     .local pmc retv
     .local int len
+    .local int pos
     .local int i
 
     retv = new 'List'
 
+    # per Perl 5's negative LIMIT behavior
+    if has_count goto handle_count
+  handle_count:
+    unless count < 1 goto positive_count
+    has_count = 0
+
+  positive_count:
     objst = self
+    length $I0, delim
     split pieces, delim, objst
-
     len = pieces
+    pos = 0
     i = 0
+
   loop:
     unless has_count goto skip_count
     dec count
-    if count < 0 goto done
+    unless count < 1 goto skip_count
+    $S0 = substr objst, pos
+    retv.'push'($S0)
+    goto done
   skip_count:
     if i == len goto done
-
-    tmps = new 'Perl6Str'
-    tmps = pieces[i]
-
-    retv.'push'(tmps)
-
+    $S0 = pieces[i]
+    retv.'push'($S0)
+    length $I1, $S0
+    pos += $I0
+    pos += $I1
     inc i
     goto loop
+
   done:
     .return(retv)
 .end
@@ -246,6 +258,13 @@
     retv = new 'List'
     start_pos = 0
 
+    # per Perl 5's negative LIMIT behavior
+    if has_count goto handle_count
+  handle_count:
+    unless count < 1 goto positive_count
+    has_count = 0
+
+  positive_count:
     match = regex($S0)
     if match goto loop
     retv.'push'($S0)
@@ -254,7 +273,10 @@
   loop:
     unless has_count goto skip_count
     dec count
-    if count < 0 goto done
+    unless count < 1 goto skip_count
+    $S1 = substr $S0, start_pos
+    retv.'push'($S1)
+    goto done
   skip_count:
     match = regex($S0, 'continue' => start_pos)
     end_pos = match.'from'()

@p6rt
Copy link
Author

p6rt commented Sep 23, 2008

From cdavaz@gmail.com

Sorry I made a small mistake in the last patch, this one gives it​:

Changed

if has_count goto handle_count

To

unless has_count goto positive_count

@p6rt
Copy link
Author

p6rt commented Sep 23, 2008

From cdavaz@gmail.com

split.diff
Index: src/builtins/any-str.pir
===================================================================
--- src/builtins/any-str.pir	(revision 31355)
+++ src/builtins/any-str.pir	(working copy)
@@ -203,32 +203,44 @@
     .param int has_count    :opt_flag
     .local string objst
     .local pmc pieces
-    .local pmc tmps
     .local pmc retv
     .local int len
+    .local int pos
     .local int i
 
     retv = new 'List'
 
+    # per Perl 5's negative LIMIT behavior
+    unless has_count goto positive_count
+  handle_count:
+    unless count < 1 goto positive_count
+    has_count = 0
+
+  positive_count:
     objst = self
+    length $I0, delim
     split pieces, delim, objst
-
     len = pieces
+    pos = 0
     i = 0
+
   loop:
     unless has_count goto skip_count
     dec count
-    if count < 0 goto done
+    unless count < 1 goto skip_count
+    $S0 = substr objst, pos
+    retv.'push'($S0)
+    goto done
   skip_count:
     if i == len goto done
-
-    tmps = new 'Perl6Str'
-    tmps = pieces[i]
-
-    retv.'push'(tmps)
-
+    $S0 = pieces[i]
+    retv.'push'($S0)
+    length $I1, $S0
+    pos += $I0
+    pos += $I1
     inc i
     goto loop
+
   done:
     .return(retv)
 .end
@@ -246,6 +258,13 @@
     retv = new 'List'
     start_pos = 0
 
+    # per Perl 5's negative LIMIT behavior
+    unless has_count goto positive_count
+  handle_count:
+    unless count < 1 goto positive_count
+    has_count = 0
+
+  positive_count:
     match = regex($S0)
     if match goto loop
     retv.'push'($S0)
@@ -254,7 +273,10 @@
   loop:
     unless has_count goto skip_count
     dec count
-    if count < 0 goto done
+    unless count < 1 goto skip_count
+    $S1 = substr $S0, start_pos
+    retv.'push'($S1)
+    goto done
   skip_count:
     match = regex($S0, 'continue' => start_pos)
     end_pos = match.'from'()

@p6rt
Copy link
Author

p6rt commented Sep 23, 2008

From cdavaz@gmail.com

Grr.. wrong again sorry!! Forgot to remove the handle_count label.
Please let me know if you see any problems.

@p6rt
Copy link
Author

p6rt commented Sep 23, 2008

From cdavaz@gmail.com

split.diff
Index: src/builtins/any-str.pir
===================================================================
--- src/builtins/any-str.pir	(revision 31355)
+++ src/builtins/any-str.pir	(working copy)
@@ -203,32 +203,43 @@
     .param int has_count    :opt_flag
     .local string objst
     .local pmc pieces
-    .local pmc tmps
     .local pmc retv
     .local int len
+    .local int pos
     .local int i
 
     retv = new 'List'
 
+    # per Perl 5's negative LIMIT behavior
+    unless has_count goto positive_count
+    unless count < 1 goto positive_count
+    has_count = 0
+
+  positive_count:
     objst = self
+    length $I0, delim
     split pieces, delim, objst
-
     len = pieces
+    pos = 0
     i = 0
+
   loop:
     unless has_count goto skip_count
     dec count
-    if count < 0 goto done
+    unless count < 1 goto skip_count
+    $S0 = substr objst, pos
+    retv.'push'($S0)
+    goto done
   skip_count:
     if i == len goto done
-
-    tmps = new 'Perl6Str'
-    tmps = pieces[i]
-
-    retv.'push'(tmps)
-
+    $S0 = pieces[i]
+    retv.'push'($S0)
+    length $I1, $S0
+    pos += $I0
+    pos += $I1
     inc i
     goto loop
+
   done:
     .return(retv)
 .end
@@ -246,6 +257,12 @@
     retv = new 'List'
     start_pos = 0
 
+    # per Perl 5's negative LIMIT behavior
+    unless has_count goto positive_count
+    unless count < 1 goto positive_count
+    has_count = 0
+
+  positive_count:
     match = regex($S0)
     if match goto loop
     retv.'push'($S0)
@@ -254,7 +271,10 @@
   loop:
     unless has_count goto skip_count
     dec count
-    if count < 0 goto done
+    unless count < 1 goto skip_count
+    $S1 = substr $S0, start_pos
+    retv.'push'($S1)
+    goto done
   skip_count:
     match = regex($S0, 'continue' => start_pos)
     end_pos = match.'from'()

@p6rt
Copy link
Author

p6rt commented Sep 25, 2008

From @moritz

On Mon Sep 22 22​:55​:29 2008, cdavaz wrote​:

Grr.. wrong again sorry!! Forgot to remove the handle_count label.
Please let me know if you see any problems.

Sorry, I've lost track of your patches (probably due to our parallel
mailing list/RT/IRC conversations); are there still split() patches open
which should be applied?

@p6rt
Copy link
Author

p6rt commented Sep 26, 2008

From cdavaz@gmail.com

Nope, that last one was it. Still waiting on a decision for how edge
cases on limit are to be handled.

On Thu, Sep 25, 2008 at 10​:49 PM, Moritz Lenz via RT
<perl6-bugs-followup@​perl.org> wrote​:

On Mon Sep 22 22​:55​:29 2008, cdavaz wrote​:

Grr.. wrong again sorry!! Forgot to remove the handle_count label.
Please let me know if you see any problems.

Sorry, I've lost track of your patches (probably due to our parallel
mailing list/RT/IRC conversations); are there still split() patches open
which should be applied?

@p6rt
Copy link
Author

p6rt commented Mar 31, 2009

From @masak

<ron_> rakudo​: say 102030405.split(0).perl;
<p6eval> rakudo c01555​: OUTPUT«["1", "2", "3", "4", "5"]␤»
<ron_> Looking at rt (59184
http://rt.perl.org/rt3/Public/Bug/Display.html?id=59184) and the current
tests in t/spec/S32-str/split-simple.t and wondering why the ticket is
still open.
<masak> ron_​: I'll close it. thanks.

@p6rt
Copy link
Author

p6rt commented Mar 31, 2009

@masak - Status changed from 'open' to 'resolved'

@p6rt p6rt closed this as completed Mar 31, 2009
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant