Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split should return captured delimiters #817

Closed
p6rt opened this issue Mar 21, 2009 · 17 comments
Closed

split should return captured delimiters #817

p6rt opened this issue Mar 21, 2009 · 17 comments

Comments

@p6rt
Copy link

p6rt commented Mar 21, 2009

Migrated from rt.perl.org#64062 (status was 'resolved')

Searchable as RT64062$

@p6rt
Copy link
Author

p6rt commented Mar 21, 2009

From @clintongormley

'abc-def--ghi'.split(/(\-+)/)
  should return ['abc','-','def,'--','ghi']
  but it returns ['abc','def,'ghi']

added tests to t/spec/S32-str/split-simple.t

@p6rt
Copy link
Author

p6rt commented Mar 21, 2009

From @clintongormley

added tests to t/spec/S32-str/split-simple.t

or i would have done if i could remember my pugs password!

Index​: split-simple.t

--- split-simple.t (revision 25961)
+++ split-simple.t (working copy)
@​@​ -2,7 +2,7 @​@​
use Test;

# L<S29/Str/"=item split">
-plan 32;
+plan 35;

=begin description

@​@​ -65,4 +65,14 @​@​
ok (''.split('')).elems == 0, q{''.split('') returns empty list};
ok (split('', '')).elems == 0, q{''.split('') returns empty list};

+# split should return capture
+my @​split = 'abc def ghi'.split(/(\s+)/);
+#?rakudo skip "split_capture"
+#?DOES 3
+{
+ ok @​split.elems == 5, q{split returns captured delimiter} ;
+ ok @​split[1] eq ' ', q{split captured single space};
+ ok @​split[3] eq ' ', q{split captured multiple spaces};
+}
+
# vim​: ft=perl6

@p6rt
Copy link
Author

p6rt commented Mar 21, 2009

@clintongormley - Status changed from 'new' to 'open'

@p6rt
Copy link
Author

p6rt commented Mar 22, 2009

From @moritz

On Sat Mar 21 13​:01​:25 2009, DrTech wrote​:

added tests to t/spec/S32-str/split-simple.t

or i would have done if i could remember my pugs password!

I've added the tests, and re-sent your pugs password.

Thanks for the report and the tests,
Moritz

@p6rt
Copy link
Author

p6rt commented Mar 27, 2009

From @ronaldxs

Please find attached a patch to implement requested functionality.
Passes split-simple.t tests and includes a patch that updates those
tests and adds one more that verifies correct performance with a return
limit count.

@p6rt
Copy link
Author

p6rt commented Mar 27, 2009

From @ronaldxs

split-simple.patch
Index: t/spec/S32-str/split-simple.t
===================================================================
--- t/spec/S32-str/split-simple.t	(revision 26002)
+++ t/spec/S32-str/split-simple.t	(working copy)
@@ -2,7 +2,7 @@
 use Test;
 
 # L<S29/Str/"=item split">
-plan 45;
+plan 46;
 
 =begin description
 
@@ -83,12 +83,13 @@
 
 # split should return capture
 my @split = 'abc def ghi'.split(/(\s+)/);
-#?rakudo todo "split should return captures"
-#?DOES 3
-{
-    ok @split.elems == 5, q{split returns captured delimiter} ;
-    ok @split[1] eq ' ', q{split captured single space};
-    ok @split[3] eq ' ', q{split captured multiple spaces};
-}
+ok @split.elems == 5, q{split returns captured delimiter} ;
+ok @split[1] eq ' ', q{split captured single space};
+ok @split[3] eq ' ', q{split captured multiple spaces};
+@split = 'abc::def::ghi'.split(/(\:)/, 5);
+ok  @split.elems == 5   and
+    @split[3] eq 'def'  and
+    @split[4] eq ':',
+    q{split with capture obeyed limit};
 
 # vim: ft=perl6

@p6rt
Copy link
Author

p6rt commented Mar 27, 2009

From @ronaldxs

Any-str.patch
diff --git a/src/setting/Any-str.pm b/src/setting/Any-str.pm
index 27c2080..85e970a 100644
--- a/src/setting/Any-str.pm
+++ b/src/setting/Any-str.pm
@@ -54,9 +54,25 @@ class Any is also {
         my $s = ~self;
         my $l = $limit ~~ Whatever ?? Inf !! $limit;
         my $keep = '';
+
         return gather {
             while $l > 1 && $s ~~ $delimiter {
                 take $keep ~ $s.substr(0, $/.from);
+                $l--;
+
+                # match objects too tied to underlying strings so copy ...
+                my @mat_cap = @().map: { substr($_, 0) };
+                my $mat_cap_n = $l min @mat_cap.elems;
+                if ($mat_cap_n) {
+                    if $mat_cap_n == @mat_cap {
+                        take @mat_cap
+                    }
+                    else {
+                        take @mat_cap[ 0 .. ($mat_cap_n -1) ]
+                    }
+                    $l -= $mat_cap_n
+                }
+
                 if $/.from == $/.to {
                     $keep = $s.substr($/.to, 1);
                     $s.=substr($/.to + 1);
@@ -64,7 +80,6 @@ class Any is also {
                     $keep = '';
                     $s.=substr($/.to)
                 }
-                $l--;
             }
             take $keep ~ $s if $l > 0;
         }

@p6rt
Copy link
Author

p6rt commented Mar 27, 2009

From duff@lighthouse.tamucc.edu

On Thu, Mar 26, 2009 at 06​:00​:29PM -0700, Ron Schmidt via RT wrote​:

Please find attached a patch to implement requested functionality.
Passes split-simple.t tests and includes a patch that updates those
tests and adds one more that verifies correct performance with a return
limit count.

Index​: t/spec/S32-str/split-simple.t

--- t/spec/S32-str/split-simple.t (revision 26002)
+++ t/spec/S32-str/split-simple.t (working copy)
@​@​ -2,7 +2,7 @​@​
use Test;

# L<S29/Str/"=item split">
-plan 45;
+plan 46;

=begin description

@​@​ -83,12 +83,13 @​@​

# split should return capture
my @​split = 'abc def ghi'.split(/(\s+)/);
-#?rakudo todo "split should return captures"
-#?DOES 3
-{
- ok @​split.elems == 5, q{split returns captured delimiter} ;
- ok @​split[1] eq ' ', q{split captured single space};
- ok @​split[3] eq ' ', q{split captured multiple spaces};
-}
+ok @​split.elems == 5, q{split returns captured delimiter} ;
+ok @​split[1] eq ' ', q{split captured single space};
+ok @​split[3] eq ' ', q{split captured multiple spaces};
+@​split = 'abc​::def​::ghi'.split(/(\​:)/, 5);
+ok @​split.elems == 5 and
+ @​split[3] eq 'def' and
+ @​split[4] eq '​:',
+ q{split with capture obeyed limit};

That's not right, is it? Or I don't understand what perl 6 does with
the capturing parens that's different from perl 5. In perl 5 (and what
I would expect from perl 6), @​split would contain the following strings
at the following array indices​:

0 "abc"
1 "​:"
2 ""
3 "​:"
4 "def"
5 "​:"
6 ""
7 "​:"
8 "ghi"

I assume the error is because of the switch in the string from using a
single delimiter to using two delimiters (or from not using /'​::'/ as
the regex depending on how you look at things).

Also, setting the limit parameter to 5 seems fairly useless in the
example because there's only 5 things to split to begin with. Usually
the limit comes in to play when your string contains more pieces that
are potentially splittable on your delimiter than what you want.

For instance, if the string were "abc​::def​::ghi​::jkl", @​split would
contain ("abc","​:","","​:","def","​:","","​:","ghi​::jkl")

-Scott
--
Jonathan Scott Duff
duff@​lighthouse.tamucc.edu

@p6rt
Copy link
Author

p6rt commented Mar 27, 2009

From @ronaldxs

That's not right, is it? Or I don't understand what perl 6 does with
the capturing parens that's different from perl 5. In perl 5 (and what
I would expect from perl 6), @​split would contain the following strings
at the following array indices​:

...

Your concern seems to be with the added test and your issues with
respect to that test are valid. A small change to the Any-str.pm/parrot
patch and a revised limit test are included with the updated attachments
that, I believe, address the noted problem(s).

@p6rt
Copy link
Author

p6rt commented Mar 27, 2009

From @ronaldxs

split-simple.patch
Index: t/spec/S32-str/split-simple.t
===================================================================
--- t/spec/S32-str/split-simple.t	(revision 26002)
+++ t/spec/S32-str/split-simple.t	(working copy)
@@ -2,7 +2,7 @@
 use Test;
 
 # L<S29/Str/"=item split">
-plan 45;
+plan 46;
 
 =begin description
 
@@ -83,12 +83,13 @@
 
 # split should return capture
 my @split = 'abc def ghi'.split(/(\s+)/);
-#?rakudo todo "split should return captures"
-#?DOES 3
-{
-    ok @split.elems == 5, q{split returns captured delimiter} ;
-    ok @split[1] eq ' ', q{split captured single space};
-    ok @split[3] eq ' ', q{split captured multiple spaces};
-}
+ok @split.elems == 5, q{split returns captured delimiter} ;
+ok @split[1] eq ' ', q{split captured single space};
+ok @split[3] eq ' ', q{split captured multiple spaces};
+@split = 'abc::def::ghi'.split(/(\:)/, 3);
+ok  @split.elems == 5       and
+    @split[3] eq ':'        and
+    @split[4] eq 'def::ghi',
+    q{split with capture obeyed limit};
 
 # vim: ft=perl6

@p6rt
Copy link
Author

p6rt commented Mar 27, 2009

From @ronaldxs

Any-str.patch
diff --git a/src/setting/Any-str.pm b/src/setting/Any-str.pm
index 27c2080..0346d4e 100644
--- a/src/setting/Any-str.pm
+++ b/src/setting/Any-str.pm
@@ -54,9 +54,23 @@ class Any is also {
         my $s = ~self;
         my $l = $limit ~~ Whatever ?? Inf !! $limit;
         my $keep = '';
+
         return gather {
             while $l > 1 && $s ~~ $delimiter {
                 take $keep ~ $s.substr(0, $/.from);
+
+                # match objects too tied to underlying strings so copy ...
+                my @mat_cap = @().map: { substr($_, 0) };
+                my $mat_cap_n = $l min @mat_cap.elems;
+                if ($mat_cap_n) {
+                    if $mat_cap_n == @mat_cap {
+                        take @mat_cap
+                    }
+                    else {
+                        take @mat_cap[ 0 .. ($mat_cap_n -1) ]
+                    }
+                }
+
                 if $/.from == $/.to {
                     $keep = $s.substr($/.to, 1);
                     $s.=substr($/.to + 1);

@p6rt
Copy link
Author

p6rt commented Apr 6, 2009

From @ronaldxs

An irc discussion (http://irclog.perlgeek.de/perl6/2009-04-
06#i_1042505) related to this ticket yielded some new insights. The
premise of the ticket differs some from the p6 synopsis/spec
(http://svn.pugscode.org/pugs/docs/Perl6/Spec/S32-setting-
library/Str.pod), seeming closer to the p5 behavior. It was also
decided that the return of capture results from a regex split pattern
should be explicitly requested with a ‘​:all’ flag. Some other flags
were proposed.

Please ignore any of my earlier proposed patches for now…

@p6rt
Copy link
Author

p6rt commented Apr 8, 2009

From @ronaldxs

Please find attached new, revised, hopefully more spec conformant,
patches.

One of the two patches includes new tests with some appropriate
revisions demonstrating the ability to address Mr. Duff’s concerns.
Specifically​:

split_test(
  'abc​::def​::ghi​::'.split(/(\​:)/, all => True, 3),
  ['abc', '​:', '', '​:', 'def​::ghi​::'],
  'split with capture obeyed limit'
);

@p6rt
Copy link
Author

p6rt commented Apr 8, 2009

From @ronaldxs

split-simple.patch
Index: t/spec/S32-str/split-simple.t
===================================================================
--- t/spec/S32-str/split-simple.t	(revision 26126)
+++ t/spec/S32-str/split-simple.t	(working copy)
@@ -2,7 +2,7 @@
 use Test;
 
 # L<S29/Str/"=item split">
-plan 45;
+plan 52;
 
 =begin description
 
@@ -81,14 +81,37 @@
 ok (''.split('')).elems == 0, q{''.split('') returns empty list};
 ok (split('', '')).elems == 0, q{''.split('') returns empty list};
 
-# split should return capture
-my @split = 'abc def ghi'.split(/(\s+)/);
-#?rakudo todo "split should return captures"
-#?DOES 3
-{
-    ok @split.elems == 5, q{split returns captured delimiter} ;
-    ok @split[1] eq ' ', q{split captured single space};
-    ok @split[3] eq ' ', q{split captured multiple spaces};
-}
+# split should return capture with all flag set
+split_test(
+    'abc def  ghi'.split(/(\s+)/, all => True),
+    ['abc', ' ', 'def', '  ', 'ghi'],
+    'split returns captured spaces'
+);
 
+# split should NOT return capture without all flag
+split_test(
+    'abc def  ghi'.split(/(\s+)/),
+    ['abc', 'def', 'ghi'],
+    'split ignores captures without all flag'
+);
+
+ok  'abc def  ghi'.split(/(\s+)/, :all(True)).[1] ~~ Match,
+    'capture returns match object not string';
+
+split_test(
+    'abc::def::ghi::'.split(/(\:)/, all => True, 3),
+    ['abc', ':', '', ':', 'def::ghi::'],
+    'split with capture obeyed limit'
+);
+
+split_test(
+    'AZZAZ'.split(/(Z)/, :all(True), 9),
+    ['A', 'Z', '', 'Z', 'A', 'Z', ''],
+    'end cases with trailing capture and too big limit'
+);
+
+my Match $mat = 'AZYAZ'.split(/(Z)(Y)/, :all(True))[1];
+ok  $mat eq 'ZY' && $mat[0] eq 'Z' && $mat[1] eq 'Y',
+    'basic test of match with multiple captures';
+
 # vim: ft=perl6

@p6rt
Copy link
Author

p6rt commented Apr 8, 2009

From @ronaldxs

Any-str.patch
diff --git a/src/setting/Any-str.pm b/src/setting/Any-str.pm
index 27c2080..a898064 100644
--- a/src/setting/Any-str.pm
+++ b/src/setting/Any-str.pm
@@ -50,13 +50,23 @@ class Any is also {
         $char
     }
 
-    our List multi method split(Code $delimiter, $limit = *) {
+    our List multi method split(Code $delimiter, $limit = *, :$all = False) {
         my $s = ~self;
         my $l = $limit ~~ Whatever ?? Inf !! $limit;
         my $keep = '';
+        
         return gather {
-            while $l > 1 && $s ~~ $delimiter {
+            # <ws> still off (rt 60366) and maybe not easy fix just now 3/31/09
+            # second test is hack to prevent <?wb> or <ws> from hanging 
+            while $l > 1 and $s ne '' and $s ~~ $delimiter {
                 take $keep ~ $s.substr(0, $/.from);
+                if $all and (@($/) or %($/)) {
+                    my Match $mat = $/.clone;
+                    # work around too close binding of match to underly string
+                    $mat[ $_ ] = $mat[ $_ ].clone for 0 .. (@().elems -1);
+                    $mat{ $_ } = $mat{ $_ }.clone for %().keys;
+                    take $mat;   
+                }
                 if $/.from == $/.to {
                     $keep = $s.substr($/.to, 1);
                     $s.=substr($/.to + 1);
@@ -66,12 +76,15 @@ class Any is also {
                 }
                 $l--;
             }
-            take $keep ~ $s if $l > 0;
+            unless $l <= 0 or ($keep eq '' and $s eq '' and $l == 0) {
+                take $keep ~ $s    
+            }
         }
     }
 
     # TODO: substitute with '$delimiter as Str' once coercion is implemented
-    our List multi method split($delimiter, $limit = *) {
+    # :$all = False just keeps MMD happy for now ...
+    our List multi method split($delimiter, $limit = *, :$all = False) {
         my Int $prev = 0;
         my $l = $limit ~~ Whatever ?? Inf !! $limit;
         my $s = ~self;

@p6rt
Copy link
Author

p6rt commented Mar 9, 2010

From @moritz

As was already mentioned, match objects are returned by split when the
:all adverb is present.

This is implemented and tested in Rakudo.

@p6rt
Copy link
Author

p6rt commented Mar 9, 2010

@moritz - Status changed from 'open' to 'resolved'

@p6rt p6rt closed this as completed Mar 9, 2010
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant