Assignment to List Breaks \G. #903

Closed
p5pRT opened this issue Nov 30, 1999 · 7 comments

p5pRT commented Nov 30, 1999

Migrated from rt.perl.org#1838 (status was 'resolved')

Searchable as RT1838$

p5pRT commented Nov 30, 1999

From ralph@inputplus.demon.co.uk

Assigning a /gx regexp to a list breaks \G in the following regexp.

  #! /usr/local/bin/perl -w

  $_ = 'a 1 b 2 c 3';

  print "bug\n";
  ($a, $b) = /^(\w)\s(\d)\s/gx;
  print "a=$a b=$b\n";
  ($a, $b) = /\G(\w)\s(\d)/gx;
  print "a=$a b=$b\n";

  print "workaround\n";
  /^(\w)\s(\d)\s/gx;
  ($a, $b) = ($1, $2);
  print "a=$a b=$b\n";
  ($a, $b) = /\G(\w)\s(\d)/gx;
  print "a=$a b=$b\n";

Output is

  bug
  a=a b=1
  a=a b=1
  workaround
  a=a b=1
  a=b b=2

Here's the configuration of my perl at home, but the one at work,
5.00502, also exhibits this.

Ralph.

Site configuration information for perl 5.00404:

Configured by root at Thu Sep 10 02:15:30 EDT 1998.

Summary of my perl5 (5.0 patchlevel 4 subversion 4) configuration:
  Platform:
    osname=linux, osvers=2.0.34, archname=i386-linux
    uname='linux porky.redhat.com 2.0.34 #1 thu may 7 10:17:44 edt 1998 i686 unknown '
    hint=recommended, useposix=true, d_sigaction=define
    bincompat3=y useperlio=undef d_sfio=undef
  Compiler:
    cc='cc', optimize='-O2', gccversion=2.7.2.3
    cppflags='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    ccflags ='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=undef, doublesize=undef
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt
    libc=, so=so
    useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

p5pRT commented Nov 30, 1999

From @tamias

On Tue, Nov 30, 1999 at 04:10:20PM +0000, Ralph Corderoy wrote:
> Hi,
>
> Assigning a /gx regexp to a list breaks \G in the following regexp.
>
>   #! /usr/local/bin/perl -w
>
>   $_ = 'a 1 b 2 c 3';
>
>   print "bug\n";
>   ($a, $b) = /^(\w)\s(\d)\s/gx;
>   print "a=$a b=$b\n";
>   ($a, $b) = /\G(\w)\s(\d)/gx;
>   print "a=$a b=$b\n";
>
>   print "workaround\n";
>   /^(\w)\s(\d)\s/gx;
>   ($a, $b) = ($1, $2);
>   print "a=$a b=$b\n";
>   ($a, $b) = /\G(\w)\s(\d)/gx;
>   print "a=$a b=$b\n";
>
> Output is
>
>   bug
>   a=a b=1
>   a=a b=1
>   workaround
>   a=a b=1
>   a=b b=2

This is the expected behavior for m//g in a list context. The regular
expression is applied repeatedly until it no longer matches, and the return
value is a list of all the substrings matched. After the final application
of the regex, which fails to match, pos() is reset to the beginning of the
string.

Try this instead:

  #! /usr/local/bin/perl -w

  $_ = 'a 1 b 2 c 3';

  @a = /\G(\w)\s(\d)\s?/gx;
  $" = ',';
  print "@a\n";

Output:

  a,1,b,2,c,3

Ronald
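
The difference between the two contexts can be seen by inspecting pos() after each kind of match. The following is a minimal sketch, not part of the original message; the variable names are illustrative only, and the behaviour shown is what perlop documents for m//g without /c.

```perl
use strict;
use warnings;

my $text = 'a 1 b 2 c 3';

# List context: m//g is applied repeatedly until it fails.
# That final failure resets pos($text), so it reads back as undef.
my @pairs = $text =~ /(\w)\s(\d)\s?/g;
my $pos_after_list = pos($text);

# Scalar context: m//g matches once and leaves pos($text)
# just past the end of that single match.
my $matched = $text =~ /(\w)\s(\d)\s?/g;
my $pos_after_scalar = pos($text);

print "pairs: @pairs\n";
print "pos after list-context match: ",
    (defined $pos_after_list ? $pos_after_list : 'undef'), "\n";
print "pos after scalar-context match: $pos_after_scalar\n";
```

A following \G match therefore starts from the beginning of the string after the list-context form, and from offset 4 after the scalar-context form.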

p5pRT commented Nov 30, 1999

From [Unknown Contact. See original ticket]

> Assigning a /gx regexp to a list breaks \G in the following regexp.
>
>   #! /usr/local/bin/perl -w
>
>   $_ = 'a 1 b 2 c 3';
>
>   print "bug\n";
>   ($a, $b) = /^(\w)\s(\d)\s/gx;
>   print "a=$a b=$b\n";
>   ($a, $b) = /\G(\w)\s(\d)/gx;
>   print "a=$a b=$b\n";
>
>   print "workaround\n";
>   /^(\w)\s(\d)\s/gx;
>   ($a, $b) = ($1, $2);
>   print "a=$a b=$b\n";
>   ($a, $b) = /\G(\w)\s(\d)/gx;
>   print "a=$a b=$b\n";
>
> Output is
>
>   bug
>   a=a b=1
>   a=a b=1
>   workaround
>   a=a b=1
>   a=b b=2

Sounds like correct behaviour to me. From perlop:

  /PATTERN/cgimosx
  [...]
  Options are:

        c   Do not reset search position on a failed match when /g is in effect.
  ----> g   Match globally, i.e., find all occurrences.
        i   Do case-insensitive pattern matching.
        m   Treat string as multiple lines.
        o   Compile pattern only once.
        s   Treat string as single line.
        x   Use extended regular expressions.

It just appears that your list is too small to hold the rest of the matches. And as
the match is finished, pos(), and with it \G, is reset.
In a scalar context, the 'g' modifier works differently, as it doesn't eat up all the matches.
Your "workaround" is the way to do what you want.

Hope it helps,

François Désarménien
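
As an illustration of the scalar-context behaviour described above, here is a small sketch (not from the original message; variable names are illustrative) that steps through the string one match at a time, with \G anchoring each attempt at the current pos().

```perl
use strict;
use warnings;

my $text = 'a 1 b 2 c 3';
my @seen;

# In scalar (boolean) context each m//g consumes exactly one pair.
# \G forces the next attempt to start where the previous one ended.
while ($text =~ /\G(\w)\s(\d)\s?/g) {
    push @seen, "$1=$2";
}

print join(',', @seen), "\n";
```

The loop ends on the first failed match, at which point pos() is reset because /c is not in effect.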

p5pRT commented Nov 30, 1999

From [Unknown Contact. See original ticket]

Ronald J Kimball writes:

> This is the expected behavior for m//g in a list context. The regular
> expression is applied repeatedly until it no longer matches, and the return
> value is a list of all the substrings matched. After the final application
> of the regex, which fails to match, pos() is reset to the beginning of the
> string.

I remember some discussion for making list context m//gc behave differently.

What was the result?

Ilya

p5pRT commented Dec 1, 1999

From @RandalSchwartz

"Ilya" == Ilya Zakharevich <ilya@math.ohio-state.edu> writes:

Ilya> I remember some discussion for making list context m//gc behave differently.

Ilya> What was the result?

If I recall (since it was me that brought it up), nothing. :(

Here's the thread, where I use the term "thread" loosely
(since it is a single message :)...

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-10/msg01047.html

I know, *patches welcome*. :)

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

p5pRT commented Dec 1, 1999

From [Unknown Contact. See original ticket]

Ronald J Kimball writes:

> This is the expected behavior for m//g in a list context. The regular
> expression is applied repeatedly until it no longer matches, and the
> return value is a list of all the substrings matched. After the final
> application of the regex, which fails to match, pos() is reset to the
> beginning of the string.

Hi, thanks for all the replies. Having put -Dr to good use I fully
understand what is happening with my example script. However, I'd like
to put a case forward that this behaviour is broken and judging from
Randal's question in

  http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-10/msg01047.html

there might be others that agree.

Normally, the use of context in Perl to `do the right thing' works well.
I'd suggest that the overloading of /g based on context allows only two
alternatives (/g and no /g) to be selected from a larger desired set of
operations.

If I want to match many times, possibly leaving pos alone on failure,
then the existing /g and /c work well.

  @w = /(\w+)/g;
  @w = /(\w+)/gc;

But if I want to do some lexing then I need /g to enable \G. As long as
I use scalar context that's fine.

  if (/(\w)+/gc) {
      ($foo) = ($1);
  }

but if I want to assign to a list I'm stuck; that would create a list
context indicating I want multiple matches.

  if (($foo) = /(\w)+/gc) { # bad

I want to give a list context for assignment to variables and I need \G
to work. With the context overloading of /g that's impossible.

If there was a /l for lexing that enabled \G without adding the global
matching of /g then I'd be happy and I suspect a lot of other people
would stop making the same mistake and bothering you about it.

  if (($foo) = /(\w)+/lc) { # bad

Making /gc work so pos wasn't reset on the last, guaranteed to fail,
iteration wouldn't cut it AFAICS. It would still give me multiple
matches which aren't desired.

  ($foo) = /(\w+)\s*/lc; # takes one word.
  ($foo) = /(\w+)\s*/gc; # takes many words returning one.

To sum up, /g means two things (many matches and enabling \G), so you
can't enable \G on its own.

Thanks for your time.

Ralph.
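
For comparison, the effect asked for above (one \G-anchored match whose captures are assigned to a list) can be approximated with the existing modifiers by keeping the match itself in scalar context. This is only a sketch under that assumption; `next_pair` is a hypothetical helper introduced for illustration and is not something proposed in the thread.

```perl
use strict;
use warnings;

my $text = 'a 1 b 2 c 3';

# Hypothetical helper: try one match at pos($text) and return its captures.
# The match is the condition of an if, so it runs in scalar context and
# /g advances pos() past a single match only. /c leaves pos() untouched
# if the match fails, which is what a lexer usually wants.
sub next_pair {
    if ($text =~ /\G(\w)\s(\d)\s?/gc) {
        return ($1, $2);
    }
    return;
}

my ($letter1, $digit1) = next_pair();
my ($letter2, $digit2) = next_pair();

print "first:  $letter1 $digit1\n";
print "second: $letter2 $digit2\n";
```

The list assignment happens on the helper's return value rather than on the match, so the list-context meaning of /g never comes into play.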

p5pRT commented Dec 2, 1999

From [Unknown Contact. See original ticket]

Hi, I'm sending this again as I suspect that sending it yesterday from
a different mail address than usual might have made the list drop it
on the floor. Sorry for the noise if this isn't the case.

