Feature request: builtin array holding ($1,$2,$3,...) #13412

Closed
p5pRT opened this issue Nov 12, 2013 · 45 comments

p5pRT commented Nov 12, 2013

Migrated from rt.perl.org#120521 (status was 'resolved')

Searchable as RT120521$

p5pRT commented Nov 12, 2013

From @epa

Created by @epa

It would be useful to have a builtin array variable which holds the
values captured by the last successful regexp match. While you can
use m// in list context to return a list of matches, this doesn't help
so much when you want to get an array of all matches on the RHS of a
s///.

Suppose the builtin array is called @CAPTURE. Then

  $_ = 'abc';
  /(.)(.)(.)/ or die;
  say scalar @CAPTURE; # prints 3
  say $CAPTURE[0]; # prints a

There is a fair bit of boilerplate code with ($1,$2,$3,$4,$5) up to
some arbitrary limit, which would be made simpler and correct if there
were a way to get all captures as an array.
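To make the s/// case concrete, here is a minimal sketch of how the boilerplate can be avoided today without a builtin. The helper name all_captures and the callback foo are hypothetical, and the sketch assumes that a symbolic reference such as ${"1"} resolves to $1 and that $#+ gives the number of groups in the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Perl builtin): returns the captures of the
# last successful match that is visible in the caller's dynamic scope.
sub all_captures {
    no strict 'refs';                # ${$_} below is a symbolic reference to $1, $2, ...
    return map { ${$_} } 1 .. $#+;   # $#+ is the number of groups in the last match
}

# Hypothetical callback standing in for foo($1, $2, $3, ...).
sub foo { return join '-', map { $_ // '' } @_ }

my $re   = qr/(\d+)\.(\d+)/;         # the group count could come from configuration
my $text = 'v 1.2 and 3.4';

# /e evaluates the replacement as code, so every capture reaches foo()
# without spelling out $1, $2, ... up to an arbitrary limit.
(my $out = $text) =~ s/$re/foo(all_captures())/ge;
print "$out\n";
```

A builtin array would remove the need for the symbolic references and the `no strict 'refs'` escape hatch.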

Perl Info

Flags:
    category=core
    severity=wishlist

Site configuration information for perl 5.16.3:

Configured by Red Hat, Inc. at Fri May  3 12:10:03 UTC 2013.

Summary of my perl5 (revision 5 version 16 subversion 3) configuration:
   
  Platform:
    osname=linux, osvers=2.6.32-358.2.1.el6.x86_64, archname=x86_64-linux-thread-multi
    uname='linux buildvm-24.phx2.fedoraproject.org 2.6.32-358.2.1.el6.x86_64 #1 smp wed feb 20 12:17:37 est 2013 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4  -m64 -mtune=generic -Dccdlflags=-Wl,--enable-new-dtags -Dlddlflags=-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4  -m64 -mtune=generic -Wl,-z,relro  -DDEBUGGING=-g -Dversion=5.16.3 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.7.2 20121109 (Red Hat 4.7.2-8)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -fstack-protector'
    libpth=/usr/local/lib64 /lib64 /usr/lib64
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.16'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,--enable-new-dtags -Wl,-rpath,/usr/lib64/perl5/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -Wl,-z,relro '

Locally applied patches:
    


@INC for perl 5.16.3:
    /home/eda/lib/perl5/
    /usr/local/lib64/perl5
    /usr/local/share/perl5
    /usr/lib64/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib64/perl5
    /usr/share/perl5
    .


Environment for perl 5.16.3:
    HOME=/home/eda
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=C
    LC_CTYPE=en_GB.UTF-8
    LC_MESSAGES=en_GB.UTF-8
    LC_MONETARY=en_GB.UTF-8
    LC_NUMERIC=en_GB.UTF-8
    LC_TIME=en_GB.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/eda/bin:/home/eda/bin:/usr/local/bin:/usr/bin:/sbin:/usr/sbin:/sbin:/usr/sbin
    PERL5LIB=/home/eda/lib/perl5/
    PERL_BADLANG (unset)
    SHELL=/bin/bash

______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
______________________________________________________________________

p5pRT commented Nov 12, 2013

From @Hugmeir

On Tue, Nov 12, 2013 at 2​:26 PM, Ed Avis <perlbug-followup@​perl.org> wrote​:

# New Ticket Created by "Ed Avis"
# Please include the string​: [perl #120521]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=120521 >

This is a bug report for perl from eda@​waniasset.com,
generated with the help of perlbug 1.39 running under perl 5.16.3.

-----------------------------------------------------------------
[Please describe your issue here]

It would be useful to have a builtin array variable which holds the
values captured by the last successful regexp match. While you can
use m// in list context to return a list of matches, this doesn't help
so much when you want to get an array of all matches on the RHS of a
s///.

Suppose the builtin array is called @CAPTURE. Then

  $_ = 'abc';
  /(.)(.)(.)/ or die;
  say scalar @CAPTURE; # prints 3
  say $CAPTURE[0];     # prints a

There is a fair bit of boilerplate code with ($1,$2,$3,$4,$5) up to
some arbitrary limit, which would be made simpler and correct if there
were a way to get all captures as an array.

Eh, this looks like a feature better suited for CPAN. You can already
implement this in just a few lines​:

package Capture {
  use Tie::Array;
  our @ISA = qw(Tie::StdArray);
  sub FETCH {
    ${*{$_[1]+1}{SCALAR}}
  }
  sub FETCHSIZE {
    scalar(@-) - 1
  }
}

tie @CAPTURE, "Capture";

Although a proper implementation could avoid ties, I imagine. Also, rather
than @CAPTURE, you could take @{^CAPTURE}, and make it available everywhere.
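A variant of the tie idea, offered only as a sketch: per perlvar, $#- is the last group that actually matched, while $#+ is the number of groups in the last successfully matched pattern, so sizing the array from $#+ keeps trailing groups that did not participate. The package name Capture::Sketch is hypothetical.

```perl
use strict;
use warnings;

package Capture::Sketch {            # hypothetical package name
    require Tie::Array;              # Tie::StdArray lives in Tie/Array.pm
    our @ISA = ('Tie::StdArray');

    # Element $i maps to the capture variable numbered $i + 1.
    sub FETCH {
        my ($self, $i) = @_;
        no strict 'refs';            # symbolic reference to $1, $2, ...
        return ${ $i + 1 };
    }

    # $#+ counts the groups of the last successfully matched pattern;
    # it is -1 before any match, so clamp to zero.
    sub FETCHSIZE { return $#+ > 0 ? $#+ : 0 }
}

tie my @CAPTURE, 'Capture::Sketch';

'ab' =~ /(.)(.)(x)?/ or die "no match";
my @got = @CAPTURE;                  # copy while the match is still current
print scalar(@got), "\n";            # the optional third group is kept as undef
```

The array is read-only in spirit: the sketch overrides only FETCH and FETCHSIZE, so stores go to the underlying Tie::StdArray storage and are never read back.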

p5pRT commented Nov 12, 2013

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Nov 15, 2013

From @demerphq

On 12 November 2013 20​:11, Brian Fraser <fraserbn@​gmail.com> wrote​:

On Tue, Nov 12, 2013 at 2​:26 PM, Ed Avis <perlbug-followup@​perl.org> wrote​:

# New Ticket Created by "Ed Avis"
# Please include the string​: [perl #120521]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=120521 >

This is a bug report for perl from eda@​waniasset.com,
generated with the help of perlbug 1.39 running under perl 5.16.3.

-----------------------------------------------------------------
[Please describe your issue here]

It would be useful to have a builtin array variable which holds the
values captured by the last successful regexp match. While you can
use m// in list context to return a list of matches, this doesn't help
so much when you want to get an array of all matches on the RHS of a
s///.

Suppose the builtin array is called @CAPTURE. Then

  $_ = 'abc';
  /(.)(.)(.)/ or die;
  say scalar @CAPTURE; # prints 3
  say $CAPTURE[0];     # prints a

There is a fair bit of boilerplate code with ($1,$2,$3,$4,$5) up to
some arbitrary limit, which would be made simpler and correct if there
were a way to get all captures as an array.

Eh, this looks like a feature better suited for CPAN. You can already
implement this in just a few lines​:

package Capture {
  use Tie::Array;
  our @ISA = qw(Tie::StdArray);
  sub FETCH {
    ${*{$_[1]+1}{SCALAR}}

Ugh!

  }
  sub FETCHSIZE {
    scalar(@-) - 1
  }
}

tie @CAPTURE, "Capture";

Although a proper implementation could avoid ties, I imagine. Also, rather
than @CAPTURE, you could take @{^CAPTURE}, and make it available everywhere.

$1 is a tie itself. So @{^CAPTURE} would be a tie as well. The
difference between a core @CAPTURE and a non-core one is that the
non-core one would be a tie which accesses a second tie.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Nov 15, 2013

From @demerphq

On 12 November 2013 18​:26, Ed Avis <perlbug-followup@​perl.org> wrote​:

# New Ticket Created by "Ed Avis"
# Please include the string​: [perl #120521]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=120521 >

This is a bug report for perl from eda@​waniasset.com,
generated with the help of perlbug 1.39 running under perl 5.16.3.

-----------------------------------------------------------------
[Please describe your issue here]

It would be useful to have a builtin array variable which holds the
values captured by the last successful regexp match. While you can
use m// in list context to return a list of matches, this doesn't help
so much when you want to get an array of all matches on the RHS of a
s///.

Suppose the builtin array is called @CAPTURE. Then

  $_ = 'abc';
  /(.)(.)(.)/ or die;
  say scalar @CAPTURE; # prints 3
  say $CAPTURE[0];     # prints a

There is a fair bit of boilerplate code with ($1,$2,$3,$4,$5) up to
some arbitrary limit, which would be made simpler and correct if there
were a way to get all captures as an array.

But there is:

  @CAPTURE = /(.)(.)(.)/ or die;

IOW, in list context a regex match returns a list of items captured,
which can be fed into an array.
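A short sketch of the difference between the two contexts, using only core behaviour: the same match returns a plain true or false value in scalar context and the list of captured items in list context. The variable names are illustrative.

```perl
use strict;
use warnings;

$_ = 'abc';

my $matched = /(.)(.)(.)/;                     # scalar context: true or false only
my @capture = /(.)(.)(.)/ or die "no match";   # list context: the captured items

print "matched\n" if $matched;
print scalar(@capture), " captures, first is $capture[0]\n";
```

The `or die` works because a list assignment evaluated in scalar context yields the number of items on its right-hand side, which is zero when the match fails.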

Why would having a magic var for this be better?

cheers,
Yves

p5pRT commented Nov 15, 2013

From @epa

Yves Orton wrote​:

@​CAPTURE= /(.)(.)(.)/ or die;

Why would having a magic var for this be better?

I mentioned one reason in the original report​: in s/// you don't have access to this.
You have to do s/$re/foo($1,$2,$3,$4,$5...)/

Another reason is that m// in list context makes it hard to distinguish between a
match which failed, and one which succeeded but did not have capturing groups.

  $_ = 'abc';
  my @CAPTURE = /abc/;
  say scalar @CAPTURE;
  @CAPTURE = /(abc)/;
  say scalar @CAPTURE;

Both print '1'. So, if you have some code like

  if (/$re/) { do_something($1, $2, $3, $4, $5, ...) }

there is not an obvious way to turn it into the form you suggest:

  @CAPTURE = /$re/;
  # um, did the regexp succeed or not?

I am thinking of regular expressions which are specified in a configuration table,
or generated at run time. So the number of capturing groups in the regexp is
not known in advance.

But if @CAPTURE were available as a builtin, you could simply say

  if (/$re/) { do_something(@CAPTURE) }
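One possible way to tell the three outcomes apart with today's Perl, sketched under the assumption that $#+ is 0 exactly when the pattern that just matched has no capturing groups. The helper name match_captures is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns undef when $re does not match $str,
# otherwise a reference to the (possibly empty) list of captures.
sub match_captures {
    my ($str, $re) = @_;
    my @captures = $str =~ $re;
    return undef unless @captures;   # a failed match returns the empty list
    return [] if $#+ == 0;           # matched, but the pattern has no groups: drop the lone 1
    return \@captures;
}

my @report;
for my $re (qr/abc/, qr/(a)(b)(c)/, qr/abcz/) {
    my $c = match_captures('abc', $re);
    push @report, defined $c ? 'matched, ' . scalar(@$c) . ' capture(s)' : 'no match';
}
print "$_\n" for @report;
```

This works, but it needs both the returned list and a special variable, which is the awkwardness a builtin array would remove.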

--
Ed Avis <eda@​waniasset.com>

p5pRT commented Nov 15, 2013

From @epa

I would add that in general the reason to prefer an array @​CAPTURE over individual $1, $2, $3
is the same reason why we usually work with arrays rather than a long list of variables.
Beginners to programming will sometimes write code with $one, $two, $three, $four and so on.

Similarly, think of @​ARGV. We could do without that and instead have $ARG1, $ARG2, $ARG3 etc.
But it is neater and cleaner to provide an array interface.

In the case of @​CAPTURE, the array would have to be read-only (so you can't shift it, etc) but
still you can work with it using the normal idioms for manipulating arrays and lists in Perl.

Has nobody else here ever written code like foo($1, $2, $3, $4, $5)? And never wished there were
a more simple and obvious way to express it, that won't have an arbitrary limit on the number
of captures?

--
Ed Avis <eda@​waniasset.com>

p5pRT commented Nov 15, 2013

From @emazep

On 2013-11-15 11​:55, Ed Avis wrote​:

Another reason is that m// in list context makes it hard to distinguish between a
match which failed, and one which succeeded but did not have capturing groups.

  $_ = 'abc';
  my @CAPTURE = /abc/;
  say scalar @CAPTURE;
  @CAPTURE = /(abc)/;
  say scalar @CAPTURE;

Both print '1'. So, if you have some code like

  if (/$re/) { do_something($1, $2, $3, $4, $5, ...) }

there is not an obvious way to turn it into the form you suggest:

  @CAPTURE = /$re/;
  # um, did the regexp succeed or not?

I am thinking of regular expressions which are specified in a configuration table,
or generated at run time. So the number of capturing groups in the regexp is
not known in advance.

But if @CAPTURE were available as a builtin, you could simply say

  if (/$re/) { do_something(@CAPTURE) }

You can currently do:

  if ( my @CAPTURE =()= /$re/ ) { say @CAPTURE }

P.S.

Shouldn't @CAPTURE be spelled @CAPTURES?

-Emanuele

p5pRT commented Nov 15, 2013

From @demerphq

On 15 November 2013 13​:02, Emanuele Zeppieri <emazep@​gmail.com> wrote​:

On 2013-11-15 11​:55, Ed Avis wrote​:

Another reason is that m// in list context makes it hard to distinguish between a
match which failed, and one which succeeded but did not have capturing groups.

  $_ = 'abc';
  my @CAPTURE = /abc/;
  say scalar @CAPTURE;
  @CAPTURE = /(abc)/;
  say scalar @CAPTURE;

Both print '1'. So, if you have some code like

  if (/$re/) { do_something($1, $2, $3, $4, $5, ...) }

there is not an obvious way to turn it into the form you suggest:

  @CAPTURE = /$re/;
  # um, did the regexp succeed or not?

I am thinking of regular expressions which are specified in a configuration table,
or generated at run time. So the number of capturing groups in the regexp is
not known in advance.

But if @CAPTURE were available as a builtin, you could simply say

  if (/$re/) { do_something(@CAPTURE) }

You can currently do:

  if ( my @CAPTURE =()= /$re/ ) { say @CAPTURE }

P.S.

Shouldn't @CAPTURE be spelled @CAPTURES?

Style preference. I tend to prefer to use singular for arrays where I
tend to access sub elements more than the whole list​:

if ($CAPTURE[1]) { }

reads better to me than

if ($CAPTURES[1]) { }

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Nov 15, 2013

From @demerphq

On 15 November 2013 11​:55, Ed Avis <eda@​waniasset.com> wrote​:

Yves Orton wrote​:

@​CAPTURE= /(.)(.)(.)/ or die;

Why would having a magic var for this be better?

I mentioned one reason in the original report​: in s/// you don't have access to this.
You have to do s/$re/foo($1,$2,$3,$4,$5...)/

Another reason is that m// in list context makes it hard to distinguish between a
match which failed, and one which succeeded but did not have capturing groups.

  $_ = 'abc';
  my @CAPTURE = /abc/;
  say scalar @CAPTURE;
  @CAPTURE = /(abc)/;
  say scalar @CAPTURE;

Both print '1'. So, if you have some code like

  if (/$re/) { do_something($1, $2, $3, $4, $5, ...) }

there is not an obvious way to turn it into the form you suggest:

  @CAPTURE = /$re/;
  # um, did the regexp succeed or not?

I am thinking of regular expressions which are specified in a configuration table,
or generated at run time. So the number of capturing groups in the regexp is
not known in advance.

But if @CAPTURE were available as a builtin, you could simply say

  if (/$re/) { do_something(@CAPTURE) }

Color me convinced. :-)

If FC doesn't beat me to it (as he so often does) I will add support for this.

Let the debate begin about the name. I won't participate except to note
that there is a bit of a lack of consistency if we have

  @+
  @-
  %+
  %-

and then @CAPTURE(S) as well. I leave that for others to agonize over.
Ever since mauve I refuse to play the bike-shed game.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Nov 15, 2013

From @epa

Emanuele Zeppieri wrote​:

  if ( my @CAPTURE =()= /$re/ ) { say @CAPTURE }

I don't think that works:

  $_ = 'abc';
  my $re = qr/abc/;
  if ( my @CAPTURE =()= /$re/ ) { say @CAPTURE } else { say 'no' }
  if (/$re/) { say 'simple match' }

For me this prints

  no
  simple match
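A small sketch of why the `=()=` form leaves the array empty: the inner assignment to the empty list, evaluated in list context, returns the (empty) list of assigned lvalues, and that is what the array receives. The count idiom only helps when the left-hand side is a scalar. Variable names are illustrative.

```perl
use strict;
use warnings;

$_ = 'abc';

my $count  = () = /(.)(.)(.)/;   # scalar on the left: number of items the match returned
my @empty  = () = /(.)(.)(.)/;   # array on the left: receives the empty list
my @direct = /(.)(.)(.)/;        # plain list assignment keeps the captures

printf "%d %d %d\n", $count, scalar(@empty), scalar(@direct);
```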

--
Ed Avis <eda@​waniasset.com>

p5pRT commented Nov 15, 2013

From @epa

Thank you Yves for offering to implement this. You can call it @​BIKESHED if you want, and I'll still be delighted...

--
Ed Avis <eda@​waniasset.com>

p5pRT commented Nov 15, 2013

From @Leont

On Fri, Nov 15, 2013 at 11​:19 AM, demerphq <demerphq@​gmail.com> wrote​:

Eh, this looks like a feature better suited for CPAN. You can already
implement this in just a few lines​:

package Capture {
  use Tie::Array;
  our @ISA = qw(Tie::StdArray);
  sub FETCH {
    ${*{$_[1]+1}{SCALAR}}

Ugh!

  }
  sub FETCHSIZE {
    scalar(@-) - 1
  }
}

tie @CAPTURE, "Capture";

Although a proper implementation could avoid ties, I imagine. Also, rather
than @CAPTURE, you could take @{^CAPTURE}, and make it available everywhere.

$1 is a tie itself. So @{^CAPTURE} would be a tie as well. The
difference between a core @CAPTURE and a non-core one is that the
non-core one would be a tie which accesses a second tie.

Now if only we had array vtables, we could have implemented this in C
without having to use ties :-/.

Leon

p5pRT commented Nov 15, 2013

From @demerphq

The = () = is superfluous.

Yves

On 15 November 2013 13​:08, Ed Avis <eda@​waniasset.com> wrote​:

Emanuele Zeppieri wrote​:

  if ( my @CAPTURE =()= /$re/ ) { say @CAPTURE }

I don't think that works:

  $_ = 'abc';
  my $re = qr/abc/;
  if ( my @CAPTURE =()= /$re/ ) { say @CAPTURE } else { say 'no' }
  if (/$re/) { say 'simple match' }

For me this prints

  no
  simple match

--
Ed Avis <eda@​waniasset.com>

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Nov 15, 2013

From @hvds

Ed Avis <eda@​waniasset.com> wrote​:
:I would add that in general the reason to prefer an array @​CAPTURE over individual $1, $2, $3
:is the same reason why we usually work with arrays rather than a long list of variables.
:Beginners to programming will sometimes write code with $one, $two, $three, $four and so on.
:
:Similarly, think of @​ARGV. We could do without that and instead have $ARG1, $ARG2, $ARG3 etc.
:But it is neater and cleaner to provide an array interface.
:
:In the case of @​CAPTURE, the array would have to be read-only (so you can't shift it, etc) but
:still you can work with it using the normal idioms for manipulating arrays and lists in Perl.
:
:Has nobody else here ever written code like foo($1, $2, $3, $4, $5)? And never wished there were
:a more simple and obvious way to express it, that won't have an arbitrary limit on the number
:of captures?

I haven't, but I have frequently written code that makes the relevant array
using @​+ and @​-; I've always found that perfectly adequate so haven't felt
a need for yet another magic regexp variable.
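The @+ and @- approach can be sketched roughly as follows. The helper name captures_from_offsets is hypothetical, and it assumes it is called immediately after a successful match against the same string, since the offsets in @- and @+ refer to that string.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild the capture list for $str from the offset
# arrays @- (group starts) and @+ (group ends) of the last successful match.
sub captures_from_offsets {
    my ($str) = @_;
    return map {
        defined($-[$_])
            ? substr($str, $-[$_], $+[$_] - $-[$_])
            : undef                  # group did not participate in the match
    } 1 .. $#+;                      # $#+ is the number of groups in the pattern
}

my $str = 'key=value';
my @captures;
if ($str =~ /(\w+)=(\w+)(;)?/) {
    @captures = captures_from_offsets($str);
}
print scalar(@captures), "\n";
```

Unlike the list returned by m// itself, this also works after s/// or inside code that only sees the match's side effects, at the cost of needing the original string.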

I'd rather see this prototyped on CPAN first, if only because IMO we have
a definite habit of failing to get the semantics tied down in ideal form
first time round whenever we add something new.

But I also feel that while the magic variable approach maybe made sense
when there were hardly any, we've moved incrementally far beyond sanity​:
I would rather see us moving in the direction of a capture object that
you can ask all this sort of thing of via method calls, like​:

  # assuming a new //mauve option that means "give me an object"
  if ($match = /(r)(e)/mauve) {
    print "Saw <$_>" for $match->captures;
  }

(Sorry I haven't really been following this discussion, I'm in the middle
of moving house, and will shortly be losing internet access for a few days.)

Hugo

p5pRT commented Nov 15, 2013

From @epa

I agree that a 'match object' might be a cleaner way to access properties of a regexp match.
But in this case, I don't think there is any cloudiness or room for debate about the semantics​:
$CAPTURE[0] is exactly equivalent to $1, $CAPTURE[1] to $2, and so on. The semantics of
$1, $2, etc are already well defined (even if less than ideal in my view), it's just a case of
providing a more convenient syntax to access them.

--
Ed Avis <eda@​waniasset.com>

p5pRT commented Nov 15, 2013

From @demerphq

On 15 November 2013 15​:19, Ed Avis <eda@​waniasset.com> wrote​:

I agree that a 'match object' might be a cleaner way to access properties of a regexp match.
But in this case, I don't think there is any cloudiness or room for debate about the semantics​:
$CAPTURE[0] is exactly equivalent to $1, $CAPTURE[1] to $2, and so on. The semantics of
$1, $2, etc are already well defined (even if less than ideal in my view), it's just a case of
providing a more convenient syntax to access them.

I agree. We introduced %- and %+ without difficulty. I agree with hv
in general with regard to the regex engine. However I think this case
is different.

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Nov 15, 2013

From @emazep

[Sorry for the private mail Ed, it was intended for the list]

On 2013-11-15 13​:08, Ed Avis wrote​:

Emanuele Zeppieri wrote​:

if ( my @​CAPTURE =()= /$re/ ) { say @​CAPTURE }

I don't think that works​:

Indeed you are right, sorry again :-)

However, give me a second attempt​:

$_ = 'abc';

sub test {
  if ( my @CAPTURE = /$_[0]/ ) { f( defined $1 ? @CAPTURE : () ) }
  else { say 'NO' }
}

sub f { say join '-', @_ }

test qr/abc/;       # match w/o captures: f is called with ()
test qr/(a)(b)(c)/; # match w/ captures : f is called with qw(a b c)
test qr/abcz/;      # it does not match : f is not called

What's the missing case?
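One case the `defined $1` test does not seem to cover is a pattern whose first group is optional and does not participate in the match. The sketch below condenses the test sub above into a hypothetical helper, args_for_f, that returns the argument list f would receive.

```perl
use strict;
use warnings;

# Hypothetical condensation of the test() sub above: returns a reference to
# the argument list that f would receive, or undef when the match fails.
sub args_for_f {
    my ($str, $re) = @_;
    if (my @capture = $str =~ /$re/) {
        return [ defined $1 ? @capture : () ];
    }
    return undef;
}

# Group 1 is optional and does not match, so $1 is undef even though
# the pattern has groups and group 2 captured 'abc'.
my $args = args_for_f('abc', qr/(x)?(abc)/);
print scalar(@$args), "\n";          # the captures are dropped
```

Here the match succeeds and $2 holds 'abc', yet f would be called with an empty list.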

p5pRT commented Nov 15, 2013

From ema.zep@libero.it

On 2013-11-15 13​:02, Emanuele Zeppieri wrote​:

On 2013-11-15 11​:55, Ed Avis wrote​:

Another reason is that m// in list context makes it hard to distinguish between a
match which failed, and one which succeeded but did not have capturing groups.

  $_ = 'abc';
  my @CAPTURE = /abc/;
  say scalar @CAPTURE;
  @CAPTURE = /(abc)/;
  say scalar @CAPTURE;

Both print '1'. So, if you have some code like

  if (/$re/) { do_something($1, $2, $3, $4, $5, ...) }

there is not an obvious way to turn it into the form you suggest:

  @CAPTURE = /$re/;
  # um, did the regexp succeed or not?

I am thinking of regular expressions which are specified in a configuration table,
or generated at run time. So the number of capturing groups in the regexp is
not known in advance.

But if @CAPTURE were available as a builtin, you could simply say

  if (/$re/) { do_something(@CAPTURE) }

You can currently do:

  if ( my @CAPTURE =()= /$re/ ) { say @CAPTURE }

No, it doesn't work, sorry!

p5pRT commented Nov 15, 2013

From cm.perl@abtela.com

On 15/11/2013 11:55, Ed Avis wrote:

Yves Orton wrote​:

@​CAPTURE= /(.)(.)(.)/ or die;

Why would having a magic var for this be better?

I mentioned one reason in the original report​: in s/// you don't have access to this.
You have to do s/$re/foo($1,$2,$3,$4,$5...)/

This is a problem only if you don't know beforehand the number of
capturing groups. Could you give an example?

Another reason is that m// in list context makes it hard to distinguish between a
match which failed, and one which succeeded but did not have capturing groups.

  $_ = 'abc';
  my @CAPTURE = /abc/;
  say scalar @CAPTURE;
  @CAPTURE = /(abc)/;
  say scalar @CAPTURE;

Both print '1'. So, if you have some code like

  if (/$re/) { do_something($1, $2, $3, $4, $5, ...) }

there is not an obvious way to turn it into the form you suggest:

  @CAPTURE = /$re/;
  # um, did the regexp succeed or not?

if (my @​CAPTURE = /$re/) {
  ...
}

I am thinking of regular expressions which are specified in a configuration table,
or generated at run time. So the number of capturing groups in the regexp is
not known in advance.

Again, can you provide an example? I generally use either named capture
groups, as in

  if (m/...(?<header>...)...(?<param1>)...(?<param2>)/) {
    do_something_with_header($+{header});
    process_params($+{param1}, $+{param2});
  }

or named targets, as in

  if (my ($header, $param1, $param2) = m/..../) {
    do_something_with_header($header);
    process_params($param1, $param2);
  }

@CAPTURE as proposed here strikes me as too rigid in most cases,
but I might be missing something...

But if @CAPTURE were available as a builtin, you could simply say

  if (/$re/) { do_something(@CAPTURE) }

--Christian

p5pRT commented Nov 15, 2013

From cm.perl@abtela.com

On 15/11/2013 11:55, Ed Avis wrote:

Another reason is that m// in list context makes it hard to distinguish between a
match which failed, and one which succeeded but did not have capturing groups.

  $_ = 'abc';
  my @CAPTURE = /abc/;
  say scalar @CAPTURE;
  @CAPTURE = /(abc)/;
  say scalar @CAPTURE;

Both print '1'. So, if you have some code like

You can always add a capturing group, for instance an empty lookahead :

Taisha:~/tttmp $ perl -E 'for (@ARGV) { print; if (my @CAPTURE =
/123(?=)/) { print qq{ match\n} } else { print qq{ fail\n}}}' 123 abc a123
123 match
abc fail
a123 match
Taisha:~/tttmp $

--Christian

p5pRT commented Nov 15, 2013

From @emazep

On Fri, Nov 15, 2013 at 1​:04 PM, demerphq <demerphq@​gmail.com> wrote​:

On 15 November 2013 13​:02, Emanuele Zeppieri <emazep@​gmail.com> wrote​:

On 2013-11-15 11​:55, Ed Avis wrote​:

Another reason is that m// in list context makes it hard to distinguish between a
match which failed, and one which succeeded but did not have capturing groups.

  $_ = 'abc';
  my @CAPTURE = /abc/;
  say scalar @CAPTURE;
  @CAPTURE = /(abc)/;
  say scalar @CAPTURE;

Both print '1'. So, if you have some code like

  if (/$re/) { do_something($1, $2, $3, $4, $5, ...) }

there is not an obvious way to turn it into the form you suggest:

  @CAPTURE = /$re/;
  # um, did the regexp succeed or not?

I am thinking of regular expressions which are specified in a configuration table,
or generated at run time. So the number of capturing groups in the regexp is
not known in advance.

But if @CAPTURE were available as a builtin, you could simply say

  if (/$re/) { do_something(@CAPTURE) }

You can currently do:

  if ( my @CAPTURE =()= /$re/ ) { say @CAPTURE }

P.S.

Shouldn't @CAPTURE be spelled @CAPTURES?

Style preference. I tend to prefer to use singular for arrays where I
tend to access sub elements more than the whole list​:

if ($CAPTURE[1]) { }

reads better to me than

if ($CAPTURES[1]) { }

Yves

On second thought, I see at least three points where you are right:

1. it's a question of style preference;
2. the plural would be inconsistent with the other special array
variables: @ARG, @LAST_MATCH_END, etc. (even if you didn't say it);
3. it's bikeshedding galore :-)

p5pRT commented Nov 15, 2013

From @emazep

On 2013-11-15 15​:27, Ed Avis wrote​:

  if ( my @CAPTURE = /$_[0]/ ) { f( defined $1 ? @CAPTURE : () ) }

Yes, that works, or perhaps

  if (my @CAPTURE = /$re/) {
    @CAPTURE = () if not defined $1;
    do_something(@CAPTURE);
  }

This does seem cumbersome, though, and involves looking at both $1 and
the list of captures returned by the match.

At the moment my code is first doing the match, then performing the
regexp again to get the captures;
your version is certainly an improvement on that!

Well, here is an alternative solution, which lets you skip the $1 check
and just test the match in the most natural way.

The trick is to wrap the given regex in extra brackets​:

if ( my @​CAPTURE = /($re)/ ) { do_something(@​CAPTURE) }

and then shift away the artificial capture in the called subroutine​:

sub do_something {
  shift;
  ...
}

(or before the sub call).

All this just shows how much simpler things would be if your proposal
were available, so in the end I could not agree more with it.

@p5pRT

p5pRT commented Nov 16, 2013

From @emazep

On 2013-11-16 00​:21, Emanuele Zeppieri wrote​:

On 2013-11-15 15​:27, Ed Avis wrote​:

if ( my @​CAPTURE = /$_[0]/ ) { f( defined $1 ? @​CAPTURE : () ) }

Yes, that works, or perhaps

if (my @​CAPTURE = /$re/) {
@​CAPTURE = () if not defined $1;
do_something(@​CAPTURE);
}

This does seem cumbersome, though, and involves looking at both $1 and
the list of captures returned by the match.

At the moment my code is first doing the match, then performing the
regexp again to get the captures;
your version is certainly an improvement on that!

Well, here is an alternative solution, which permits to save the $1
check and just test the match in the most natural way.

The trick is to wrap the given regex in extra brackets​:

if ( my @​CAPTURE = /($re)/ ) { do_something(@​CAPTURE) }

and then shift away the artificial capture in the called subroutine​:

sub do_something {
shift;
...
}

And this further refinement avoids the shift too​:

if ( (undef, my @​CAPTURE) = /($_[0])/ ) { do_something(@​CAPTURE) }

Just to realize how much simpler it would be if your proposal were available, so in the end I cannot agree more with it.

No less ugly however (and potentially inefficient too), so the above is
fully confirmed anyway.

@p5pRT

p5pRT commented Nov 16, 2013

From @emazep

On Sat, Nov 16, 2013 at 1​:10 AM, Emanuele Zeppieri <emazep@​gmail.com> wrote​:

On 2013-11-16 00​:21, Emanuele Zeppieri wrote​:

On 2013-11-15 15​:27, Ed Avis wrote​:

if ( my @​CAPTURE = /$_[0]/ ) { f( defined $1 ? @​CAPTURE : () ) }

Yes, that works, or perhaps

if (my @​CAPTURE = /$re/) {
@​CAPTURE = () if not defined $1;
do_something(@​CAPTURE);
}

This does seem cumbersome, though, and involves looking at both $1 and
the list of captures returned by the match.

At the moment my code is first doing the match, then performing the
regexp again to get the captures;
your version is certainly an improvement on that!

Well, here is an alternative solution, which permits to save the $1
check and just test the match in the most natural way.

The trick is to wrap the given regex in extra brackets​:

if ( my @​CAPTURE = /($re)/ ) { do_something(@​CAPTURE) }

and then shift away the artificial capture in the called subroutine​:

sub do_something {
shift;
...
}

And this further refinement avoids the shift too​:

if ( (undef, my @​CAPTURE) = /($_[0])/ ) { do_something(@​CAPTURE) }

Just to realize how much simpler it would be if your proposal were available, so in the end I cannot agree more with it.

No less ugly however (and potentially inefficient too), so the above is
fully confirmed anyway.

Well, this avoids the potential inefficiency at least :-)

if ( (undef, my @​CAPTURE) = /()$_[0]/ ) { do_something(@​CAPTURE) }
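As a minimal sketch of the empty-group trick above, the helper below (captures_of is a hypothetical name, not something from the thread) prepends an empty capturing group so that a successful match always returns at least one element, then discards that element with undef.

```perl
use strict;
use warnings;

# Returns an array ref of the pattern's own captures on success
# (possibly empty), or undef if the match failed.
sub captures_of {
    my ($str, $re) = @_;
    # The leading () always captures the empty string, so the list
    # assignment counts at least one element whenever the match succeeds.
    if ( (undef, my @capture) = $str =~ /()$re/ ) {
        return [@capture];
    }
    return;
}

my $none   = captures_of('abc', qr/abc/);       # success, no groups
my $two    = captures_of('abc', qr/(a)(b)/);    # success, two groups
my $failed = captures_of('abc', qr/xyz/);       # failure
```

The condition works because a list assignment in scalar context evaluates to the number of elements on its right-hand side, which is zero only when the match fails.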

@p5pRT

p5pRT commented Nov 16, 2013

From @tamias

On Sat, Nov 16, 2013 at 12​:21​:18AM +0100, Emanuele Zeppieri wrote​:

if ( my @​CAPTURE = /$_[0]/ ) { f( defined $1 ? @​CAPTURE : () ) }

Yes, that works, or perhaps

if (my @​CAPTURE = /$re/) {
@​CAPTURE = () if not defined $1;
do_something(@​CAPTURE);
}

This does seem cumbersome, though, and involves looking at both $1 and
the list of captures returned by the match.

You should be aware that you cannot depend on any particular capture being
defined, even if the match is successful​:

$_ = "b";
my $re = qr/(a)|(b)/;

if (my @​CAPTURE = /$re/) {
  foreach my $i (0 .. $#CAPTURE) {
  print '$', $i + 1, "​: ",
  defined $CAPTURE[$i] ? "" : "not ", "defined\n";
  }
}

Ronald

@p5pRT

p5pRT commented Nov 16, 2013

From @emazep

On Sat, Nov 16, 2013 at 6​:46 AM, Ronald J Kimball <rjk@​tamias.net> wrote​:

On Sat, Nov 16, 2013 at 12​:21​:18AM +0100, Emanuele Zeppieri wrote​:

if ( my @​CAPTURE = /$_[0]/ ) { f( defined $1 ? @​CAPTURE : () ) }

Yes, that works, or perhaps

if (my @​CAPTURE = /$re/) {
@​CAPTURE = () if not defined $1;
do_something(@​CAPTURE);
}

This does seem cumbersome, though, and involves looking at both $1 and
the list of captures returned by the match.

You should be aware that you cannot depend on any particular capture being
defined, even if the match is successful​:

$_ = "b";
my $re = qr/(a)|(b)/;

if (my @​CAPTURE = /$re/) {
foreach my $i (0 .. $#CAPTURE) {
print '$', $i + 1, "​: ",
defined $CAPTURE[$i] ? "" : "not ", "defined\n";
}
}

Ronald

The error was mine of course.

However it's just a matter of finding a reliable indicator that a
capture has happened, and the idea still stands​:

if ( my @​CAPTURE = /$re/ ) {
  do_something( $#- ? @​CAPTURE : () )
}

Does $#- > 0 serve the purpose, or are there exceptions?

@p5pRT

p5pRT commented Nov 16, 2013

From @druud62

On 2013-11-15 11​:55, Ed Avis wrote​:

Yves Orton wrote​:

@​CAPTURE= /(.)(.)(.)/ or die;

Why would having a magic var for this be better?

I mentioned one reason in the original report​: in s/// you don't have access to this.
You have to do s/$re/foo($1,$2,$3,$4,$5...)/

  @​CAPTURE = m/$re/ and s//foo(@​CAPTURE)/e

But what if the g-modifier is used, AoA?

perl -Mstrict -MData​::Dumper -wle '
  $_ = q{a bc def};
  my $re = q{(\w?)(\w?)(\w?)};
  print "[@​-]\t[@​+]\t[$1,$2,$3]" while /$re/g;
'
[0 0 1 1] [1 1 1 1] [a,,]
[1 1 1 1] [1 1 1 1] [,,]
[2 2 3 4] [4 3 4 4] [b,c,]
[4 4 4 4] [4 4 4 4] [,,]
[5 5 6 7] [8 6 7 8] [d,e,f]
[8 8 8 8] [8 8 8 8] [,,]

--
Ruud

@p5pRT

p5pRT commented Nov 16, 2013

From @mauke

On 12.11.2013 18​:26, Ed Avis (via RT) wrote​:

It would be useful to have a builtin array variable which holds the
values captured by the last successful regexp match. While you can
use m// in list context to return a list of matches, this doesn't help
so much when you want to get an array of all matches on the RHS of a
s///.

Suppose the builtin array is called @​CAPTURE. Then

$_ = 'abc';
/(.)(.)(.)/ or die;
say scalar @CAPTURE; # prints 3
say $CAPTURE[0];     # prints a

There is a fair bit of boilerplate code with ($1,$2,$3,$4,$5) up to
some arbitrary limit, which would be made simpler and correct if there
were a way to get all captures as an array.

Just FYI, that functionality is available as submatches() in Data​::Munge.

--
Lukas Mai <plokinom@​gmail.com>

@p5pRT

p5pRT commented Nov 16, 2013

From @rjbs

* hv@​crypt.org [2013-11-15T08​:44​:17]

But I also feel that while the magic variable approach maybe made sense
when there were hardly any, we've moved incrementally far beyond sanity​:
I would rather see us moving in the direction of a capture object that
you can ask all this sort of thing of via method calls, like​:

I agree. I threw this lousy code together a few months ago and it made me
think there's hope for such an approach​:

  use strict;
  use warnings;
  package Regexp::Match;

  sub matches {
    my ($class, undef, $qr, $p) = @_;
    my $str = "$_[1]"; # stringify only once

    return unless $str =~ $qr;
    my $result = { p => !! $p };
    $result->{str} = $str if $p;
    $result->{match_range} = [ $-[0], $+[0] ];
    $result->{g_starts} = [ @-[ 1 .. $#- ] ];
    $result->{g_ends} = [ @+[ 1 .. $#+ ] ];
    for (1 .. $#-) {
      no strict 'refs';
      push @{ $result->{captures} }, ${ $_ };
    }

    $result->{capture_hash} = { %- };

    bless $result, $class;
  }

  sub capture {
    my ($self, $n) = @_;
    return $_[0]{captures}[$n - 1] if $n =~ /\A[0-9]+\z/;
    return( ($_[0]{capture_hash}{$n} // [])->[0] );
  }

  sub capture_list {
    my ($self, $n) = @_;
    # this is a mess in scalar context
    return $_[0]{captures}[$n - 1] if $n =~ /\A[0-9]+\z/;
    return @{ $_[0]{capture_hash}{$n} || [] };
  }

  sub captures {
    return @{ $_[0]{captures} },
  }

  sub pre_match {
    return unless $_[0]{p};
    return substr $_[0]{str}, 0, $_[0]{match_range}[0];
  }

  sub match {
    return unless $_[0]{p};
    return substr $_[0]{str}, $_[0]{match_range}[0],
      $_[0]{match_range}[1] - $_[0]{match_range}[0];
  }

  sub post_match {
    return unless $_[0]{p};
    return substr $_[0]{str}, $_[0]{match_range}[1];
  }

  1;

It would also be less surprising for programmers coming from just about
/anywhere/ else to be able to say qr(...)->match($str)

--
rjbs

@p5pRT

p5pRT commented Nov 16, 2013

From perl5-porters@perl.org

Ricardo Signes wrote​:

It would also be less surprising for programmers coming from just about
/anywhere/ else to be able to say qr(...)->match($str)

Why match? Why not just qr(...)->($str)?

(Then we can forget smart-match altogether, since ->(...) is our match
operator.)

@p5pRT

p5pRT commented Nov 16, 2013

From @kentfredric

On 17 November 2013 09​:53, Father Chrysostomos <sprout@​cpan.org> wrote​:

Why match? Why not just qr(...)->($str)?

(Then we can forget smart-match altogether, since ->(...) is our match
operator.)

Maybe because it lends itself to later enhancements, ie​:

qr(...)->replace(sub{

});

May be nicer than s{...}{ code }e;

( Especially seeing this syntax is more recognised by code
highlighters, some treat both sides of s{}{} as strings )

--
Kent

@p5pRT

p5pRT commented Nov 17, 2013

From @tamias

On Fri, Nov 15, 2013 at 02​:19​:38PM +0000, Ed Avis wrote​:

I agree that a 'match object' might be a cleaner way to access properties of a regexp match.
But in this case, I don't think there is any cloudiness or room for debate about the semantics​:
$CAPTURE[0] is exactly equivalent to $1, $CAPTURE[1] to $2, and so on. The semantics of
$1, $2, etc are already well defined (even if less than ideal in my view), it's just a case of
providing a more convenient syntax to access them.

$-[0] and $+[0] are the offsets into the string of the entire match; $-[1]
and $+[1] are the offsets into the string of $1. One could argue that
$CAPTURE[0] should be the entire string that was matched, and $CAPTURE[1]
should be $1.
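To make the indexing of @- and @+ concrete, here is a small sketch (the sample string and variable names are arbitrary) that rebuilds the whole match from index 0 and the first capture from index 1:

```perl
use strict;
use warnings;

my $str = 'hello world';
$str =~ /o (w)or/ or die "no match";

# Index 0 of @- and @+ brackets the entire match ...
my $whole = substr $str, $-[0], $+[0] - $-[0];
# ... and index 1 brackets the text captured as $1.
my $first = substr $str, $-[1], $+[1] - $-[1];
```

Under the proposal as originally written, $CAPTURE[0] would hold what $first holds here, whereas the offset arrays put the whole match at index 0.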

Ronald

@p5pRT

p5pRT commented Nov 17, 2013

From @epa

One could argue that $CAPTURE[0] should be the entire string that was matched, and $CAPTURE[1] should be $1.

I'd be fine with that. It does remove the off-by-one oddity compared to $1, etc.

--
Ed Avis <eda@​waniasset.com>


@p5pRT

p5pRT commented Nov 17, 2013

From @emazep

On Sat, Nov 16, 2013 at 12​:01 PM, Emanuele Zeppieri <emazep@​gmail.com> wrote​:

On Sat, Nov 16, 2013 at 6​:46 AM, Ronald J Kimball <rjk@​tamias.net> wrote​:

On Sat, Nov 16, 2013 at 12​:21​:18AM +0100, Emanuele Zeppieri wrote​:

if ( my @​CAPTURE = /$_[0]/ ) { f( defined $1 ? @​CAPTURE : () ) }

Yes, that works, or perhaps

if (my @​CAPTURE = /$re/) {
@​CAPTURE = () if not defined $1;
do_something(@​CAPTURE);
}

This does seem cumbersome, though, and involves looking at both $1 and
the list of captures returned by the match.

You should be aware that you cannot depend on any particular capture being
defined, even if the match is successful​:

$_ = "b";
my $re = qr/(a)|(b)/;

if (my @​CAPTURE = /$re/) {
foreach my $i (0 .. $#CAPTURE) {
print '$', $i + 1, "​: ",
defined $CAPTURE[$i] ? "" : "not ", "defined\n";
}
}

Ronald

The error was mine of course.

However it's just a matter of finding a reliable indicator that a
capture has happened, and the idea still stands​:

if ( my @​CAPTURE = /$re/ ) {
do_something( $#- ? @​CAPTURE : () )
}

Or rather​:

if ( my @​CAPTURE = /$re/ ) {
  do_something( $#+ ? @​CAPTURE : () )
}

otherwise the case of a successful match but no matching subgroups
(only undef elements in @​CAPTURE) is missing.
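A short sketch of the difference, assuming the behaviour documented in perlvar ($#+ is the number of groups in the last successful match, $#- is the last group that actually matched); the pattern and the %counts hash are illustrative only:

```perl
use strict;
use warnings;

# Record [$#-, $#+] after matching a pattern whose only group is optional.
my %counts;
for my $str ('x', 'xa') {
    if ($str =~ /x(a)?/) {
        $counts{$str} = [ $#-, $#+ ];
    }
}

for my $str (sort keys %counts) {
    print "$str: \$#- = $counts{$str}[0], \$#+ = $counts{$str}[1]\n";
}
```

For 'x' the group does not participate, so a test on $#- would wrongly treat the pattern as having no groups, while $#+ still reports one.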

Sorry for the noise!

@p5pRT

p5pRT commented Nov 22, 2013

From @rjbs

* Kent Fredric <kentfredric@​gmail.com> [2013-11-16T16​:16​:46]

On 17 November 2013 09​:53, Father Chrysostomos <sprout@​cpan.org> wrote​:

Why match? Why not just qr(...)->($str)?

(Then we can forget smart-match altogether, since ->(...) is our match
operator.)

Maybe because it lends itself to later enhancements, ie​:

qr(...)->replace(sub{

qr(..)->(...) does not rule out qr(...)->replace(...)

--
rjbs

@p5pRT

p5pRT commented Nov 22, 2013

From @demerphq

On 22 November 2013 04​:21, Ricardo Signes <perl.p5p@​rjbs.manxome.org> wrote​:

* Kent Fredric <kentfredric@​gmail.com> [2013-11-16T16​:16​:46]

On 17 November 2013 09​:53, Father Chrysostomos <sprout@​cpan.org> wrote​:

Why match? Why not just qr(...)->($str)?

(Then we can forget smart-match altogether, since ->(...) is our match
operator.)

Maybe because it lends itself to later enhancements, ie​:

qr(...)->replace(sub{

qr(..)->(...) does not rule out qr(...)->replace(...)

The rules for parsing the right side of a regex are special cased.

I fear adding support for the same special cases in a method call is
not going to fly.

s/(f)o(o)/$1$2/

how would you write that with method call notation without requiring
us to special case parse the method call?

qr/(f)o(o)/->replace("$1$2");

isn't going to work out of the box at all.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Nov 22, 2013

From @kentfredric

On 22 November 2013 20​:55, demerphq <demerphq@​gmail.com> wrote​:

qr/(f)o(o)/->replace("$1$2");

isn't going to work out of the box at all.

That would of course not work. It would have to be implemented in terms of

qr/(f)o(o)/->replace(sub {
  "$1$2"
});

Where internally, it did :

  s{(f)o(o)}{ $code->() }e

At least, that's what I'd expect to be required given the current state
of how regexp works.
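A rough sketch of that idea as a plain function rather than a method on qr// objects (which does not exist); the name replace and its argument order are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: substitute the first match of $re in $str with
# whatever $code returns. The callback runs inside s///e, so it sees
# $1, $2, ... from the match in progress.
sub replace {
    my ($re, $str, $code) = @_;
    (my $copy = $str) =~ s{$re}{ $code->() }e;
    return $copy;
}

my $out = replace(qr/(f)o(o)/, 'foo bar', sub { "$2$1" });
```

The callback's double-quoted string is only interpolated when the sub is called, which is what a bare "$1$2" argument cannot provide.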

Now maybe

qr/(f)o(o)/->replace('$1$2');

Could be made to work, but how surprising that would be seems dangerous,
and I'd avoid it just in case, because accidentally using "" instead
of '' and getting radically different behaviour would be bad
(especially if $1 happened to contain the string literal '$2').

--
Kent

@p5pRT

p5pRT commented Nov 22, 2013

From @kentfredric

On 22 November 2013 23​:15, Kent Fredric <kentfredric@​gmail.com> wrote​:

qr/(f)o(o)/->replace(sub {
"$1$2"
});

Actually, on second thought, even that would not always be what you
wanted, and you might instead want :

  qr/(f)o(o)/->replace(sub {
  my $match = shift;
  return #somethingelsehere
  });

And that could lend itself to doing something like​:

  $match->interpolate("%1helloworld%2");

With the communicative property​:

if( my $result = qr/(f)o(o)/->match($string) ) {
  print $result->interpolate("%1helloworld%2");
}

Essentially, making a sane way to pass state around from match results
without worrying about variable clobbering.

I don't really know, I must be tired or something.
--
Kent

@p5pRT

p5pRT commented Nov 22, 2013

From @demerphq

On 22 November 2013 11​:24, Kent Fredric <kentfredric@​gmail.com> wrote​:

On 22 November 2013 23​:15, Kent Fredric <kentfredric@​gmail.com> wrote​:

qr/(f)o(o)/->replace(sub {
"$1$2"
});

Actually, on second thought, even that would not always be what you
wanted, and you might instead want :

qr/(f)o(o)/->replace(sub {
my $match = shift;
return #somethingelsehere
});

And that could lend itself to doing something like​:

$match->interpolate("%1helloworld%2");

With the communicative property​:

if( my $result = qr/(f)o(o)/->match($string) ) {
print $result->interpolate("%1helloworld%2");
}

Essentially, making a sane way to pass state around from match results
without worrying about variable clobbering.

Exactly. The delayed interpolation of the rhs of a s/// is hard to fit
into the normal model of perl code and calling conventions.

The only way to do so is require people pass in a closure. IMO hardly
an improvement.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Jun 9, 2017

From @epa

I believe this has now been implemented as @​{^CAPTURE} in perl 5.26 (thanks demerphq!) so the bug can be closed.
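For the s/// case that motivated the request, a minimal sketch might look like the following. It assumes perl 5.26 or later, and foo is a hypothetical stand-in for the function called in the earlier examples:

```perl
use strict;
use warnings;
use 5.026;    # @{^CAPTURE} is not available before perl 5.26

# Hypothetical stand-in for the foo(...) call discussed in the thread.
sub foo { return join '-', reverse @_ }

my $text = 'abc def';
# All captures are passed at once, however many groups the pattern has.
(my $out = $text) =~ s/(.)(.)(.)/foo(@{^CAPTURE})/e;
```

Here ${^CAPTURE}[0] corresponds to $1, so the array follows the indexing of the original proposal rather than that of @- and @+.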

@p5pRT

p5pRT commented Jun 9, 2017

From @jkeenan

On Fri, 09 Jun 2017 15​:46​:33 GMT, ed wrote​:

I believe this has now been implemented as @​{^CAPTURE} in perl 5.26
(thanks demerphq!) so the bug can be closed.

Confirmed.

#####
$ perl -v | head -2 | tail -1
This is perl 5, version 26, subversion 0 (v5.26.0) built for x86_64-linux

$ cat 120521-capture.pl
# perl
use warnings;
use 5.26.0;

$_ = 'abc';
/(.)(.)(.)/ or die;
say join(' ' => ($1,$2,$3));

$_ = 'def';
/(.)(.)(.)/ or die;
say "@​{^CAPTURE}";

$ perl 120521-capture.pl
a b c
d e f
#####

Marking ticket Resolved.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT

p5pRT commented Jun 9, 2017

@jkeenan - Status changed from 'open' to 'resolved'
