re modifier "h" - return named captures as hash expression #12893

Open
p5pRT opened this issue Apr 2, 2013 · 17 comments

p5pRT commented Apr 2, 2013

Migrated from rt.perl.org#117447 (status was 'open')

Searchable as RT117447$

p5pRT commented Apr 2, 2013

From @daxim

Created by @daxim

Example code adapted from perlretut.

Make the C<h> (mnemonic I<hash>) flag work (that line with the C<m>
operator)​:

  my $fmt1 = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
  my $fmt2 = '(?<m>\d\d)/(?<d>\d\d)/(?<y>\d\d\d\d)';
  my $fmt3 = '(?<d>\d\d)\.(?<m>\d\d)\.(?<y>\d\d\d\d)';

  for my $d (qw(2006-10-21 15.01.2007 10/31/2005)) {
      if (my (%date) = $d =~ m{$fmt1|$fmt2|$fmt3}h) {
          while (my ($k,$v) = each %date) {
              print "$k = $v\n";
          }
      }
  }

Works the same as​:

  if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
  my %date = %+;

Rationale​: side effects are a weird-ass way to program in a language
that actually has operators/expressions/functions which are able to
return values. I'd like eventually to get rid of side effects, but
first there actually must be a way to do something without involving
action at a distance. If you can't see what's wrong with the code just
above, imagine you had to do this to get the length of something:

  length($something);
  # according to perlvar, $Ë� is set to the last successful length
  # measuring
  print "something is $Ë� long";
  # take care not to use an outdated $Ë� or accidently overwrite
  # it! :-o
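
Until such a flag exists, the requested behaviour can be approximated with a small wrapper. The following is a minimal sketch and not part of the proposal: `named_captures` is a hypothetical helper name, and it copies `%+` into a plain hash before returning, so the caller only ever sees a return value.

```perl
use strict;
use warnings;

# Hypothetical helper (not core Perl): match $str against $re and return
# the named captures as a key/value list, or the empty list on failure.
sub named_captures {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    my %copy = %+;    # copy while the match is still current
    return %copy;
}

my $fmt1 = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
my $fmt2 = '(?<m>\d\d)/(?<d>\d\d)/(?<y>\d\d\d\d)';
my $fmt3 = '(?<d>\d\d)\.(?<m>\d\d)\.(?<y>\d\d\d\d)';

for my $d (qw(2006-10-21 15.01.2007 10/31/2005)) {
    if (my %date = named_captures($d, qr{$fmt1|$fmt2|$fmt3})) {
        print join(', ', map { "$_ = $date{$_}" } sort keys %date), "\n";
    }
}
```

The call site reads like the proposed `m{...}h` form, minus the new flag.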

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.16.3:

Configured by daxim at Fri Mar 29 15:53:58 CET 2013.

Summary of my perl5 (revision 5 version 16 subversion 3) configuration:
   
  Platform:
    osname=linux, osvers=3.4.28-2.20-desktop,
archname=x86_64-linux-thread-multi-ld uname='linux champion
3.4.28-2.20-desktop #1 smp preempt tue jan 29 16:51:37 utc 2013
(143156b) x86_64 x86_64 x86_64 gnulinux ' config_args='-de
-Dprefix=/home/daxim/local/share/perlbrew/perls/perl-5.16.3 -DDEBUGGING
-Dusemorebits -Dusethreads -Dcf_email=daxim@cpan.org
-Dperladmin=daxim@cpan.org -Accflags=-fPIC
-Aeval:scriptdir=/home/daxim/local/share/perlbrew/perls/perl-5.16.3/bin'
hint=recommended, useposix=true, d_sigaction=define useithreads=define,
usemultiplicity=define useperlio=define, d_sfio=undef,
uselargefiles=define, usesocks=undef use64bitint=define,
use64bitall=define, uselongdouble=define usemymalloc=n,
bincompat5005=undef Compiler: cc='cc', ccflags ='-D_REENTRANT
-D_GNU_SOURCE -fPIC -DDEBUGGING -fno-strict-aliasing -pipe
-fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64', optimize='-O2 -g', cppflags='-D_REENTRANT
-D_GNU_SOURCE -fPIC -DDEBUGGING -fno-strict-aliasing -pipe
-fstack-protector -I/usr/local/include' ccversion='', gccversion='4.7.2
20130108 [gcc-4_7-branch revision 195012]', gccosandvers='' intsize=4,
longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=8, nvtype='long double', nvsize=16,
Off_t='off_t', lseeksize=8 alignbytes=16, prototype=define Linker and
Libraries: ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
libpth=/usr/local/lib /lib/../lib64 /usr/lib/../lib64 /lib /usr/lib /lib64 /usr/lib64 /usr/local/lib64
libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
-lgdbm_compat perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
libc=/lib/libc-2.17.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.17' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so,
d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC',
lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'

Locally applied patches:
    


@INC for perl 5.16.3:
    /home/daxim/local/share/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3/x86_64-linux-thread-multi-ld
    /home/daxim/local/share/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3
    /home/daxim/local/share/perlbrew/perls/perl-5.16.3/lib/5.16.3/x86_64-linux-thread-multi-ld
    /home/daxim/local/share/perlbrew/perls/perl-5.16.3/lib/5.16.3
    .


Environment for perl 5.16.3:
    HOME=/home/daxim
    LANG=de_DE.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/lib64/mpi/gcc/openmpi/lib64
    LOGDIR (unset)
    PATH=/home/daxim/local/share/perlbrew/bin:/home/daxim/local/share/perlbrew/perls/perl-5.16.3/bin:/home/daxim/local/bin:/usr/local/cuda/bin:/opt/kde3/sbin:/sbin:/usr/sbin:/usr/lib64/mpi/gcc/openmpi/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin
    PERLBREW_BASHRC_VERSION=0.61
    PERLBREW_HOME=/home/daxim/.perlbrew
    PERLBREW_MANPATH=/home/daxim/local/share/perlbrew/perls/perl-5.16.3/man
    PERLBREW_PATH=/home/daxim/local/share/perlbrew/bin:/home/daxim/local/share/perlbrew/perls/perl-5.16.3/bin
    PERLBREW_PERL=perl-5.16.3
    PERLBREW_ROOT=/home/daxim/local/share/perlbrew
    PERLBREW_VERSION=0.61
    PERL_BADLANG (unset)
    SHELL=/bin/bash

p5pRT commented Jun 29, 2013

From @jkeenan

This RT is a request for a new feature​: a new regex modifier '/h'.

Is there any support for development of this new feature? (I ask, in
part, because it hasn't received a "second the motion" in the three
months since the request was originally filed.)

Is there anyone who wants to try to write an implementation for this new
feature?

Thank you very much.
Jim Keenan

p5pRT commented Jun 29, 2013

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Jun 29, 2013

From tchrist@perl.com

"James E Keenan via RT" <perlbug-followup@​perl.org> wrote
  on Sat, 29 Jun 2013 07​:12​:06 PDT​:

Make the C<h> (mnemonic I<hash>) flag work (that line with the C<m>
operator)​:

my $fmt1 = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
my $fmt2 = '(?<m>\d\d)/(?<d>\d\d)/(?<y>\d\d\d\d)';
my $fmt3 = '(?<d>\d\d)\.(?<m>\d\d)\.(?<y>\d\d\d\d)';

for my $d (qw(2006-10-21 15.01.2007 10/31/2005)) {
    if (my (%date) = $d =~ m{$fmt1|$fmt2|$fmt3}h) {
        while (my ($k,$v) = each %date) {
            print "$k = $v\n";
        }
    }
}

Works the same as:

    if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
        my %date = %+;

I am opposed. If it "works the same as", we don't need another way.

It increases the cognitive load unnecessarily for no real gain.

And I don't want us to keep adding /mods. We have to think of
another way, something that embeds them and isn't stuck at mysterious
one-letter identifiers.

--tom

p5pRT commented Jun 29, 2013

From gottreu@gmail.com

 my $fmt1 = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
 my $fmt2 = '(?<m>\d\d)/(?<d>\d\d)/(?<y>\d\d\d\d)';
 my $fmt3 = '(?<d>\d\d)\.(?<m>\d\d)\.(?<y>\d\d\d\d)';

 for my $d (qw(2006-10-21 15.01.2007 10/31/2005)) {
     if (my (%date) = $d =~ m{$fmt1|$fmt2|$fmt3}h) {
         while (my ($k,$v) = each %date) {
             print "$k = $v\n";
         }
     }
 }

I kinda like the idea.

On 06/29/2013 02​:36 PM, Tom Christiansen wrote​:

And I don't want us to keep adding /mods. We have to think of
another way, something that embeds them and isn't stuck at mysterious
one-letter identifiers.

Could you clarify what you mean by embed and what the antecedent of
'them' is?

Since =~ binds a scalar expression to a pattern match, essentially
replacing $_ with the expression, we could extend what =~ accepts on the
left side.

($d, %date) =~ m{...} # %date = %+

and it would still return the same value (depending on context of course).

Following down that path, one could imagine

($d, $lvalue) =~ s{...}{...} # set /r implicitly

($d, @​matches) =~ m{...} # @​matches = ($1,$2,$3,...)

Or a functional interface could be used​:

match(qr/.../, $d, named_captures => \%date)

I'm not necessarily advocating for any of these, it's just what I
thought of.

Brian Gottreu
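
The functional interface suggested above can be prototyped in plain Perl today. This is only a sketch under stated assumptions: `match` and its `named_captures` option are hypothetical names taken from the suggestion, and clearing the caller's hash on a failed match is a design choice made here, not something the thread specifies.

```perl
use strict;
use warnings;

# Hypothetical function; not a core or CPAN API.
sub match {
    my ($re, $str, %opt) = @_;
    my $out = $opt{named_captures};
    %$out = () if $out;            # never leave stale data behind
    return 0 unless $str =~ $re;
    %$out = %+ if $out;            # copy the named captures for the caller
    return 1;
}

my %date;
if (match(qr/(?<y>\d{4})-(?<m>\d\d)-(?<d>\d\d)/, '2006-10-21',
          named_captures => \%date)) {
    print "year $date{y}, month $date{m}, day $date{d}\n";
}
```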

p5pRT commented Jun 29, 2013

From tchrist@perl.com

Brian Gottreu <gottreu@​gmail.com> wrote on Sat, 29 Jun 2013 18​:03​:43 CDT​:

On 06/29/2013 02​:36 PM, Tom Christiansen wrote​:

And I don't want us to keep adding /mods. We have to think of
another way, something that embeds them and isn't stuck at mysterious
one-letter identifiers.

Could you clarify what you mean by embed and what the antecedent of
'them' is?

The antecedent of “them” is /mods, like /acdgilmopsux, stressed on the
last syllable. Embedding them is necessary for pattern flags albeit
not for match flags. You know, the (?six-m​:...) thing.
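
For readers unfamiliar with the embedded form, here is a small illustration of existing Perl syntax (no new feature is assumed). The patterns and test strings are made up for the example.

```perl
use strict;
use warnings;

# (?i:...) turns on case-insensitivity for the enclosed part only.
my $re = qr/(?i:perl) rocks/;
my @results = map { /$re/ ? 1 : 0 } 'PERL rocks', 'perl ROCKS';
print "@results\n";

# (?-i:...) switches a flag off locally even when /i applies to the rest.
my $exact = qr/(?-i:Perl) hacker/i;
my @exact = map { /$exact/ ? 1 : 0 } 'Perl HACKER', 'perl hacker';
print "@exact\n";
```

Flags that affect how the pattern is compiled can be scoped this way, whereas flags that affect the match operation itself, such as /g, cannot.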

I don't like the idea of single-character signifiers carrying so much
meaning with no more readable way of expressing them. And I certainly
don't think we should go adding more of those without coming up with
a way to somehow write something more meaningful, like a real word
for each of them.

--tom

p5pRT commented Jul 1, 2013

From @iabyn

On Sat, Jun 29, 2013 at 01​:36​:52PM -0600, Tom Christiansen wrote​:

Works the same as​:

    if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
        my %date = %+;

I am opposed. If it "works the same as", we don't need another way.

If there were such a modifier (and I agree that it should be "better" than
another 1-letter flag), I would prefer to see some new semantics, such
as returning a structured (i.e. nested) match object from a nested search
pattern (although I have no idea how the details would pan out).

--
Spock (or Data) is fired from his high-ranking position for not being able
to understand the most basic nuances of about one in three sentences that
anyone says to him.
  -- Things That Never Happen in "Star Trek" #19

p5pRT commented Jul 1, 2013

From @daxim

tchrist​:

If it "works the same as", we don't need another way.

Then please go ahead and remove the code responsible for return value
of the match operator, as in C<@captures = $val =~ /…(…)…(…)…(…)…/>.

L<Perl 1 capture variables|http://perldoc.perl.org/perlvar.html#Variables-related-to-regular-expressions> are good enough
for everyone! After all, Perl's motto is "there must be only one way to
do it".

It increases the cognitive load unnecessarily for no real gain.

The real gains are​:

* The match operator restores feature parity. When named captures were
added, proper return values for them were left out inadvertently, I
believe.

  match op / capturing | unnamed | named
  ---------------------+---------+--------------
  with side effects    | $1 etc. | %+
  return values        | yes     | unnamed only!

* In the future, C<no re 'side-effects'> becomes possible which
eliminates a source of bugs. (It used to be mentioned in perltrap that
$1 etc. are not reset after a match fails, but in 5.18 it's gone for
some weird reason.) Implementing that pragma unimport blocks on having
a side-effect free way to return named captured values in the first
place.
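
The stale-capture pitfall mentioned in the second point can be reproduced in a few lines. This is a minimal sketch with made-up strings; it contrasts reading `$1` after a failed match with taking the match operator's return value.

```perl
use strict;
use warnings;

'abc' =~ /(b)/;          # succeeds, sets $1 to 'b'
'xyz' =~ /(b)/;          # fails, $1 is left untouched
my $stale = $1;          # still 'b' from the earlier match

# Taking the return value instead avoids the stale global.
my ($fresh) = 'xyz' =~ /(b)/;   # undef, because the match failed

print "stale: $stale\n";
print "fresh: ", defined $fresh ? $fresh : '(undef)', "\n";
```

Numbered captures already have the second, side-effect-free form; the request is for named captures to get an equivalent.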

On the other side of the scale is​:

* One more flag where there are already twelve.

But this is hardly a straw that will break the camel's back. It's up to
you to back up your claim that there is a downside with concrete
evidence/examples, not just vague allusions.

Brian​:

($d, %date) =~ m{...} # %date = %+

I don't like that because it requires a variable. This is hardly an
advantage over %+.

A simple list of return values can flow freely between chained/nested
functions, which is very perlish.

davem​:

I would prefer to see some new semantics …
no idea how the details would pan out

Don't let that get in the way of the feature request under discussion.
Worse is better, and the like. I imagine this topic's feature request
is very easy to implement because the pieces are already there and
needs no further specification, whereas a different return value, with
nested structures as you said, or perhaps an L<object a la Perl6|http://doc.perl6.org/type/Match>,
would be the topic of another bug.

PS​: I'm answering via RT web interface, no idea where in the p5p thread
this message ends up.

p5pRT commented Jul 1, 2013

From @Hugmeir

How would it work with code that mixes traditional & named captures? For
example, what would this do?

my %matches = "ab" =~ /(.)(?<foo>.)/h;

p5pRT commented Jul 1, 2013

From @daxim

How would it work with code that mixes traditional & named captures?

Same as %+.

For example, what would this do?

my %matches = "ab" =~ /(.)(?<foo>.)/h;

Expression returns (foo => 'b')
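
What `%+` holds for that mixed pattern can be checked with current Perl. The sketch below uses only existing behaviour; the proposed flag would, per the answer above, return the same pairs that `%named` receives here.

```perl
use strict;
use warnings;

my (%named, @numbered);
if ("ab" =~ /(.)(?<foo>.)/) {
    %named    = %+;          # only the named group: (foo => 'b')
    @numbered = ($1, $2);    # the named group is also numbered group 2
}
print join(',', %named), "\n";
print "@numbered\n";
```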

p5pRT commented Jul 2, 2013

From @cpansprout

WARNING​: Bikes shedding their paint ahead.

On Mon Jul 01 03​:38​:27 2013, davem wrote​:

On Sat, Jun 29, 2013 at 01​:36​:52PM -0600, Tom Christiansen wrote​:

Works the same as​:

    if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
        my %date = %+;

I am opposed. If it "works the same as", we don't need another way.

If there were such a modifier (and I agree that it should be "better" than
another 1-letter flag)

use v5.20;
qr :ignorecase :multiline :hash /...../; # same as /imh

Or​:

/(?+ignorecase multiline hash)...../

with these variations as well​:

(?+named flags​:pattern)
(?+turn these on - turn these off)
(?+turn on-turn off​:pattern)
(?+-turn off)
(?+-turn off​:pat)

--

Father Chrysostomos

p5pRT commented Jul 2, 2013

From @khwilliamson

On 07/01/2013 06​:51 PM, Father Chrysostomos via RT wrote​:

WARNING​: Bikes shedding their paint ahead.

On Mon Jul 01 03​:38​:27 2013, davem wrote​:

On Sat, Jun 29, 2013 at 01​:36​:52PM -0600, Tom Christiansen wrote​:

Works the same as​:

     if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
         my %date = %+;

I am opposed. If it "works the same as", we don't need another way.

If there were such a modifier (and I agree that it should be "better" than
another 1-letter flag)

use v5.20;
qr :ignorecase :multiline :hash /...../; # same as /imh

Or​:

/(?+ignorecase multiline hash)...../

with these variations as well​:

(?+named flags​:pattern)
(?+turn these on - turn these off)
(?+turn on-turn off​:pattern)
(?+-turn off)
(?+-turn off​:pat)

An idea I had quite some time ago was like this​:

m/(?^u{multiline, -ignorecase, ...}​:foo)/

in which long modifier names would come enclosed in braces anywhere
between the (? and the colon. Modifiers outside the braces would be
single character ones. pluses and minuses could be used. Any number of
braced sets would be acceptable.

p5pRT commented Jul 2, 2013

From @demerphq

On 2 July 2013 02​:51, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

WARNING​: Bikes shedding their paint ahead.

On Mon Jul 01 03​:38​:27 2013, davem wrote​:

On Sat, Jun 29, 2013 at 01​:36​:52PM -0600, Tom Christiansen wrote​:

Works the same as​:

    if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
        my %date = %+;

I am opposed. If it "works the same as", we don't need another way.

If there were such a modifier (and I agree that it should be "better" than
another 1-letter flag)

use v5.20;
qr :ignorecase :multiline :hash /...../; # same as /imh

Or​:

/(?+ignorecase multiline hash)...../

with these variations as well​:

(?+named flags​:pattern)
(?+turn these on - turn these off)
(?+turn on-turn off​:pattern)
(?+-turn off)
(?+-turn off​:pat)

FWIW, I hate it, and I would be against new modifiers you can only put
inside of a (?+ ... ).

My view is Perl has regex modifiers and it's too late to argue about it anymore.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented Jul 2, 2013

From @davidnicol

    if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
        my %date = %+;

the AAAD can be minimized with

  my %date = do { $d =~ m{$fmt1|$fmt2|$fmt3} ? %+ : () };

can it not?

p5pRT commented Jul 2, 2013

From gottreu@gmail.com

On 07/02/2013 01​:00 AM, David Nicol wrote​:

     if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
         my %date = %+;

the AAAD can be minimized with

 my %date = do { $d =~ m{$fmt1|$fmt2|$fmt3} ? %+ : () };

can it not?

Actually no it seems.

$ perl -MData​::Dumper -we \
'$_="a";print Dumper do{/(?<x>.)/;%+},do{/(?<x>.)/;my%h=%+};'
$VAR1 = 'x';
$VAR2 = undef;
$VAR3 = 'x';
$VAR4 = 'a';

It looks like it's been that way since at least 5.10.1.

Is this a bug?
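
Whatever the cause turns out to be, the second half of the one-liner suggests a workaround: copy `%+` into an ordinary hash before the `do` block ends. The sketch below follows that variant, which is the one that produced the expected output above. The date pattern is borrowed from the original report, and the thread does not establish why the uncopied form loses the value.

```perl
use strict;
use warnings;

my $d = '2006-10-21';
my %date = do {
    my %copy;
    %copy = %+ if $d =~ m{(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)};
    %copy;    # return the plain copy, not %+ itself
};
print "$date{y}-$date{m}-$date{d}\n";
```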

Brian Gottreu

p5pRT commented Jul 2, 2013

From @ikegami

On Tue, Jul 2, 2013 at 2​:00 AM, David Nicol <davidnicol@​gmail.com> wrote​:

    if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
        my %date = %+;

the AAAD can be minimized with

my %date = do { $d =~ m{$fmt1|$fmt2|$fmt3} ? %+ : () };

can it not?

or

my %date = ( $d =~ m{$fmt1|$fmt2|$fmt3} ? %+ : () );

or

my %date = $d =~ m{$fmt1|$fmt2|$fmt3} ? %+ : ();

p5pRT commented Jul 2, 2013

From @cpansprout

On Mon Jul 01 22​:49​:20 2013, demerphq wrote​:

My view is Perl has regex modifiers and it's too late to argue about it
anymore.

I actually agree with you on that last point. I had just resigned
myself to the fact that “everybody” wants longer flag names.

--

Father Chrysostomos

p5pRT added the Wishlist label Oct 19, 2019