Empty regular expression does not match in some cases #13141

Closed
p5pRT opened this issue Jul 31, 2013 · 21 comments


p5pRT commented Jul 31, 2013

Migrated from rt.perl.org#119095 (status was 'resolved')

Searchable as RT119095$


p5pRT commented Jul 31, 2013

From @ppisar

Hello,

this code​:

q{"} =~ m/"/;

if (q{a} =~ m//) {
  print "TRUE\n";
} else {
  print "FALSE\n";
}

should print TRUE, but it prints FALSE.

In other words, an empty regular expression does not match. There is some side
effect, because the result depends on the previous regular expression match
(the first line). If I change the first line in any way, for example from
m/"/ to m/./, the code starts working correctly.

I observe this behaviour with a somewhat patched 5.16.3, vanilla 5.18.0
and current blead.

You can use this one-liner instead​:

$ perl -e 'q{"} =~ m/"/; if (q{a} =~ m//) { print qq{TRUE\n} }'

-- Petr


p5pRT commented Jul 31, 2013

From @pjcj

On Wed, Jul 31, 2013 at 07​:34​:07AM -0700, Petr Pisar wrote​:

Hello,

this code​:

q{"} =~ m/"/;

if (q{a} =~ m//) {
print "TRUE\n";
} else {
print "FALSE\n";
}

should print TRUE, but it prints FALSE.

In other words, empty regular expression does not match. There is some side
effect because it depends on previous regular match (the first line). If
I change the first line anyhow, like m/"/ changing to m/./, the code starts
working correctly.

I think this is one of those "it's a feature, not a bug" moments.
Though I'll admit that in over 20 years of using Perl, it's a feature
I've never made use of.

From perlop​:

  The empty pattern //
  If the PATTERN evaluates to the empty string, the last
  successfully matched regular expression is used instead. In this
  case, only the "g" and "c" flags on the empty pattern are honored;
  the other flags are taken from the original pattern. If no match
  has previously succeeded, this will (silently) act instead as a
  genuine empty pattern (which will always match).
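
A minimal sketch of the rule quoted above, assuming a perl that behaves as documented in perlop; the sub names are made up for illustration. After a successful match against /"/, the empty pattern is treated as /"/ again, so it fails on "a". After a successful /./ it is treated as /./ and succeeds:

```perl
use strict;
use warnings;

# Hypothetical demo subs; the names are not from the thread.

# The case from the report: the earlier match uses /"/.
sub empty_after_quote {
    q{"} =~ m/"/;                              # succeeds, so /"/ is the "last pattern"
    return (q{a} =~ m//) ? "TRUE" : "FALSE";   # m// is applied as m/"/ here
}

# The variant from the report: the earlier match uses /./.
sub empty_after_dot {
    q{"} =~ m/./;                              # succeeds, so /./ is the "last pattern"
    return (q{a} =~ m//) ? "TRUE" : "FALSE";   # m// is applied as m/./ here
}

print empty_after_quote(), "\n";   # FALSE
print empty_after_dot(), "\n";     # TRUE
```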

--
Paul Johnson - paul@​pjcj.net
http​://www.pjcj.net


p5pRT commented Jul 31, 2013

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented Jul 31, 2013

From @Tux

On Wed, 31 Jul 2013 07​:34​:07 -0700, Petr Pisar (via RT)
<perlbug-followup@​perl.org> wrote​:

# New Ticket Created by Petr Pisar
# Please include the string​: [perl #119095]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=119095 >

Hello,

this code​:

q{"} =~ m/"/;

if (q{a} =~ m//) {
print "TRUE\n";
} else {
print "FALSE\n";
}

should print TRUE, but it prints FALSE.

Nope.

$ perldoc perlreref

  If 'pattern' is an empty string, the last successfully matched regex is
  used. Delimiters other than '/' may be used for both this operator and the
  following ones. The leading "m" can be omitted if the delimiter is '/'.

In other words, empty regular expression does not match. There is some side
effect because it depends on previous regular match (the first line). If
I change the first line anyhow, like m/"/ changing to m/./, the code starts
working correctly.

I observe this behaviour with somewhat patched 5.16.3, vanilla 5.18.0
and current blead.

You can use this one-liner instead​:

$ perl -e 'q{"} =~ m/"/; if (q{a} =~ m//) { print qq{TRUE\n} }'

--
H.Merijn Brand http​://tux.nl Perl Monger http​://amsterdam.pm.org/
using perl5.00307 .. 5.19 porting perl5 on HP-UX, AIX, and openSUSE
http​://mirrors.develooper.com/hpux/ http​://www.test-smoke.org/
http​://qa.perl.org http​://www.goldmark.org/jeff/stupid-disclaimers/


p5pRT commented Jul 31, 2013

@Tux - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Jul 31, 2013

p5pRT commented Jul 31, 2013

From zefram@fysh.org

Paul Johnson wrote​:

       If the PATTERN evaluates to the empty string, the last
       successfully matched regular expression is used instead.

Addendum, which should probably go in the doc: you can use /(?:)/ to
get an effective empty pattern that will not invoke this magic.
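
As a small illustration of this workaround (a sketch only; the sub name is hypothetical), the code from the original report prints TRUE once the empty pattern is spelled /(?:)/, because the pattern text is no longer the empty string:

```perl
use strict;
use warnings;

# Hypothetical demo sub; the name is not from the thread.
sub empty_group_after_quote {
    q{"} =~ m/"/;                                   # earlier successful match
    return (q{a} =~ m/(?:)/) ? "TRUE" : "FALSE";    # empty non-capturing group, not m//
}

print empty_group_after_quote(), "\n";   # TRUE
```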

-zefram


p5pRT commented Jul 31, 2013

From @ppisar

On 2013-07-31, Paul Johnson <paul@​pjcj.net> wrote​:

I think this is one of those "it's a feature, not a bug" moments.
Though I'll admit that in over 20 years of using Perl, it's a feature
I've never made use of.

From perlop​:

  The empty pattern //
  If the PATTERN evaluates to the empty string, the last
  successfully matched regular expression is used instead. In this
  case, only the "g" and "c" flags on the empty pattern are honored;
  the other flags are taken from the original pattern. If no match
  has previously succeeded, this will (silently) act instead as a
  genuine empty pattern (which will always match).

I see. Then it's a feature. Never mind.

Just if you want to know my use case, the second match uses a regular
expression specified by a user. And the user could assume that an empty
expression matches any string.

-- Petr


p5pRT commented Jul 31, 2013

From zefram@fysh.org

Petr Pisar wrote​:

Just if you want to know my use case, the second match uses a regular
expression specified by a user. And the user could assume that an empty
expression matches any string.

To provide consistent semantics to the user, you need to process the
user-supplied regexp, by something like

  $perlre = $userre eq "" ? qr/(?:)/ : qr/$userre/;

or

  $perlre = qr/(?:$userre)/;

(Compiling the regexp early with qr// is often a good idea.)
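
The second variant can be sketched as a small helper, under the assumption that the user's pattern arrives as a plain string; the sub names here are hypothetical. Wrapping the text in (?:...) means the compiled pattern is never the empty string, so an empty input behaves as "match anything" even after an unrelated successful match:

```perl
use strict;
use warnings;

# Hypothetical helper: compile user-supplied pattern text so that an
# empty string means "match anything" rather than "reuse the last pattern".
sub compile_user_pattern {
    my ($userre) = @_;
    return qr/(?:$userre)/;
}

# Hypothetical demo: run an unrelated successful match first, then apply
# the user's pattern to $text.
sub user_pattern_matches {
    my ($userre, $text) = @_;
    q{"} =~ m/"/;                          # unrelated earlier match that succeeds
    my $re = compile_user_pattern($userre);
    return ($text =~ $re) ? 1 : 0;
}

print user_pattern_matches("",    "a"), "\n";   # 1
print user_pattern_matches("b+",  "a"), "\n";   # 0
print user_pattern_matches("a|b", "a"), "\n";   # 1
```

An invalid user pattern still makes qr// die, so real code would probably wrap the call in eval and report the error to the user.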

-zefram


p5pRT commented Jul 31, 2013

From @cpansprout

On Wed Jul 31 08​:34​:08 2013, zefram@​fysh.org wrote​:

Petr Pisar wrote​:

Just if you want to know my use case, the second match uses a regular
expression specified by a user. And the user could assume that an empty
expression matches any string.

To provide consistent semantics to the user, you need to process the
user-supplied regexp, by something like

  $perlre = $userre eq "" ? qr/(?:)/ : qr/$userre/;

or

  $perlre = qr/(?:$userre)/;

(Compiling the regexp early with qr// is often a good idea.)

Watch out for qr/$userre/. I fixed that in perl 5.18 (commit
6a97c51), but in earlier perls qr// would trigger the same
behaviour. In 5.18+ qr/$userre/ will work as expected (like /(?:)/)
with empty patterns.
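
A hedged sketch of a helper that takes this version difference into account. The sub names are hypothetical, and the version test relies only on the built-in `$]` variable. On perls older than 5.18 it substitutes /(?:)/ for an empty input; on 5.18 and later it relies on the fixed qr// behaviour described above:

```perl
use strict;
use warnings;

# Hypothetical helper; the name is illustrative.
sub compile_user_regex {
    my ($userre) = @_;
    # $] holds the running perl's version as a number, e.g. 5.018000 for 5.18.0.
    if ($] < 5.018 && $userre eq "") {
        return qr/(?:)/;    # older perls: keep qr// from picking up the last pattern
    }
    return qr/$userre/;     # 5.18+: an empty qr// is a genuine empty pattern
}

# Hypothetical demo: an unrelated successful match precedes the user's pattern.
sub matches_after_earlier_success {
    my ($userre, $text) = @_;
    q{"} =~ m/"/;
    my $re = compile_user_regex($userre);
    return ($text =~ $re) ? 1 : 0;
}

print matches_after_earlier_success("", "a"), "\n";
```

Always wrapping the input as qr/(?:$userre)/ avoids the version check altogether, at the cost of a slightly less readable compiled pattern.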

--

Father Chrysostomos


p5pRT commented Jul 31, 2013

From zefram@fysh.org

Father Chrysostomos via RT wrote​:

Watch out for qr/$userre/. I fixed that in perl 5.18 (commit
6a97c51), but in earlier perls qr// would trigger the same
behaviour.

That's what the conditional in my example is avoiding.

-zefram


p5pRT commented Aug 1, 2013

From @epa

Out of interest is there a performance boost from reapplying the last
successfully matched regexp using // or is it just a golfing shortcut?

--
Ed Avis <eda@​waniasset.com>


p5pRT commented Aug 1, 2013

From @demerphq

On 1 August 2013 12​:38, Ed Avis <eda@​waniasset.com> wrote​:

Out of interest is there a performance boost from reapplying the last
successfully matched regexp using // or is it just a golfing shortcut?

Hypothetically a tiny boost.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


p5pRT commented Aug 1, 2013

From @demerphq

On 1 August 2013 13​:03, demerphq <demerphq@​gmail.com> wrote​:

On 1 August 2013 12​:38, Ed Avis <eda@​waniasset.com> wrote​:

Out of interest is there a performance boost from reapplying the last
successfully matched regexp using // or is it just a golfing shortcut?

Hypothetically a tiny boost.

I should add that it's a feature that has extremely limited utility.

As far as I can tell it is only useful in a case like this​:

if (/pat1/ || /pat2/ || /pat3/) {
  s//$something/; # change whatever we matched
}

Or similar constructs. It actually makes no sense that it applies to
m//; to the extent it should exist at all, it should apply only to
s///.
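
A runnable version of that idiom, as a sketch; the patterns, the sub name and the sample strings are made up for illustration. Whichever alternative matched first becomes the "last successfully matched" pattern, and the empty pattern in s// reuses it:

```perl
use strict;
use warnings;

# Hypothetical helper: replace whatever one of three patterns matched.
sub replace_matched {
    my ($text, $something) = @_;
    for ($text) {                          # alias $_ to $text
        if (/cat/ || /dog/ || /bird/) {
            s//$something/;                # empty pattern: reuses the one that matched
        }
    }
    return $text;
}

print replace_matched("my dog barks",     "PET"), "\n";   # my PET barks
print replace_matched("a bird and a cat", "PET"), "\n";   # a bird and a PET
print replace_matched("just fish",        "PET"), "\n";   # just fish
```

In the second call /cat/ is tried first and succeeds, so "cat" is replaced even though "bird" appears earlier in the string.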

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


p5pRT commented Aug 1, 2013

From @demerphq

On 1 August 2013 13​:08, demerphq <demerphq@​gmail.com> wrote​:

On 1 August 2013 13​:03, demerphq <demerphq@​gmail.com> wrote​:

On 1 August 2013 12​:38, Ed Avis <eda@​waniasset.com> wrote​:

Out of interest is there a performance boost from reapplying the last
successfully matched regexp using // or is it just a golfing shortcut?

Hypothetically a tiny boost.

I should add that it's a feature that has extremely limited utility.

As far as I can tell it is only useful in a case like this​:

if (/pat1/ || /pat2/ || /pat3/) {
s//$something/; # change whatever we matched
}

Or similar constructs. It actually makes no sense that it applies to
m//, to the extent it should exist at all it should apply only to
s///.

IMO we should nuke it and replace it with a (*LASTMATCH) metapattern.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


p5pRT commented Aug 4, 2013

From @ap

* demerphq <demerphq@​gmail.com> [2013-08-01 13​:10]​:

I should add that it's a feature that has extremely limited utility.

As far as I can tell it is only useful in a case like this​:

if (/pat1/ || /pat2/ || /pat3/) {
s//$something/; # change whatever we matched
}

Or similar constructs. It actually makes no sense that it applies to
m//, to the extent it should exist at all it should apply only to
s///.

It makes perfect sense where this shortcut came from – namely ed, the ancient
Unix text editor. (Think about it​: the only way to interact with the
editor is a command line. You have just performed a search. In the next
command you want to specify the same search again. What is the natural
syntax to say that? Also​: the user will want to search for nothing… how
often?)

From there it was inherited by sed, and that is how it ended up in Perl.

A lot of the syntax and idioms lore that we think of as “regexps”, at
least in a Unix-y tradition, is really the regexp vernacular of ed. The
entire grep utility is an extraction of an ed idiom as a stand-alone
program.

And even when I say all this, I am almost certainly being ahistorical –
I do not know in detail the lineage and history of ed and all its next
of kin (ex/vi, grep, sed, patch etc) and would actually be surprised if
the story weren’t more intertwined and complex than my portrayal, even
WRT just this one aspect.

(I expect Aaron to come up behind me and embarrass me now. :-) )

* demerphq <demerphq@​gmail.com> [2013-08-01 13​:10]​:

IMO we should nuke it and replace it with a (*LASTMATCH) metapattern.

Yes, probably. It made great sense in a text editor and may still make
sense in high-whipuptitude, low-manipulexity code (Perl as a glorified
sed, basically), but that is very little of the Perl that gets written
nowadays. It is effectively a pure liability in high-manipulexity code
(any code that has the CPAN nature, essentially).

But boy would we need a long deprecation cycle for this one. (It pre-
dates Perl itself!)

Regards,
--
Aristotle Pagaltzis // <http​://plasmasturm.org/>


p5pRT commented Aug 5, 2013

From @arc

Aristotle Pagaltzis <pagaltzis@​gmx.de> wrote​:

A lot of the syntax and idioms lore that we think of as “regexps”, at
least in a Unix-y tradition, is really the regexp vernacular of ed. The
entire grep utility is an extraction of an ed idiom as a stand-alone
program.

And even when I say all this, I am almost certainly being ahistorical –
I do not know in detail the lineage and history of ed and all its next
of kin (ex/vi, grep, sed, patch etc) and would actually be surprised if
the story weren’t more intertwined and complex than my portrayal, even
WRT just this one aspect.

(I expect Aaron to come up behind me and embarrass me now. :-) )

Nope, your summary pretty much covers it. :-)

ed(1) already exists in the First Edition manual (so before November
1971), but neither sed(1) nor grep(1) does:
http://cm.bell-labs.com/cm/cs/who/dmr/man12.pdf
http://cm.bell-labs.com/cm/cs/who/dmr/man13.pdf

grep(1) came next, in Fourth Edition (so between February and November 1973)​:

http://www.tuhs.org/Archive/PDP-11/Distributions/research/Dennis_v4/v4man.tar.gz

In 1975, George Coulouris at Queen Mary College (in London;
subsequently renamed Queen Mary and Westfield, and then Queen Mary,
University of London) wrote em ("editor for mortals"), an interactive
ed(1)-like editor for cursor-addressed displays. When he visited
Berkeley in 1976, he took it with him, and a certain Bill Joy took it
and morphed it into ex(1), which shipped in 1BSD (March 1978)​:

http://www.eecs.qmul.ac.uk/~gc/history/

vi(1) was originally (in 2BSD, May 1979) a hard link to ex(1); when it
was launched under that name, it would start in visual mode rather
than normal mode, but ex(1) had all the same abilities.

sed(1) didn't appear till Seventh Edition, in January 1979​:

http://plan9.bell-labs.com/7thEdMan/v7vol1.pdf

The original diff(1) appeared in Fifth Edition (June 1974), and
originally generated only "edit scripts" (à la modern `diff -e`) that
could be passed to ed(1)​:

http://www.tuhs.org/Archive/PDP-11/Distributions/research/Dennis_v5/v5man.pdf

As for patch(1), Larry first wrote it in 1984, and published it in
1985; it already handled context and unified diffs at that point, as
well as the traditional edit scripts​:

https://groups.google.com/forum/#!topic/mod.sources/xSQM63e39YY

Now, Ken Thompson wrote the Unix ed(1) in PDP-11 assembler​:

https://code.google.com/p/unix-jun72/source/browse/trunk/src/cmd/ed2.s
https://code.google.com/p/unix-jun72/source/browse/trunk/src/cmd/ed3.s

This means it can be dated to some time in 1971, according to Dennis Ritchie​:

http://cm.bell-labs.com/who/dmr/hist.html

But it turns out we can rewind a little further. A team at UCB
(including L. Peter Deutsch) wrote an editor called qed in 1968​:

http://web.archive.org/web/20120219114658/http://www.computer-refuge.org/bitsavers/pdf/sds/ucbProjectGenie/mcjones/R-15_QED.pdf

It's still possible to see the core of the ed(1) design in that, even
though the details differ quite a lot; for example, the 1968 qed
doesn't have regexes at all.

Ken Thompson ported qed to CTSS circa 1970, and therefore shortly
*before* he wrote ed(1); the manual for his port can be found here​:

http://cm.bell-labs.com/cm/cs/who/dmr/qedman.pdf

This is much more similar to the ed(1) we know and (presumably) love,
including regexes strictly more powerful than those in traditional
ed(1), and slashes to delimit them (where the 1968 qed used square
brackets for its search strings). And we find that the manual says
"The null regular expression standing alone is equivalent to the last
regular expression encountered."

So this aspect of Perl can be dated back to code written no later than
1970, for a text editor running on an operating system that I suspect
no one subscribed to this list has ever used.

Enjoy!

--
Aaron Crane ** http​://aaroncrane.co.uk/


p5pRT commented Aug 5, 2013

From @khwilliamson

On 08/05/2013 12​:09 PM, Aaron Crane wrote​:

So this aspect of Perl can be dated back to code written no later than
1970, for a text editor running on an operating system that I suspect
no one subscribed to this list has ever used.

For the record, I come close. I used to use the qed text editor on a
Bell Labs operating system called TSS. I presume this is related to the
CTSS mentioned.


p5pRT commented Aug 5, 2013

From @arc

Karl Williamson <public@​khwilliamson.com> wrote​:

On 08/05/2013 12​:09 PM, Aaron Crane wrote​:

So this aspect of Perl can be dated back to code written no later than
1970, for a text editor running on an operating system that I suspect
no one subscribed to this list has ever used.

For the record, I come close. I used to use the qed text editor on a Bell
Labs operating system called TSS. I presume this is related to the CTSS
mentioned.

Thank you!

AFAICT from Googling, CTSS came first, and Bell Labs had both "Nike
TSS" (a copy of CTSS) and the IBM TSS/360 (which apparently isn't
closely related to it). This piece says that the Nike TSS ran at the
Bell Labs Whippany facility, and TSS/360 at Indian Hill:

http://manpages.bsd.lv/history/canaday_24_10_2011.txt

It does seem to rely on the participants' memory, though, so perhaps
it isn't entirely accurate.

--
Aaron Crane ** http​://aaroncrane.co.uk/


p5pRT commented Aug 5, 2013

From @khwilliamson

On 08/05/2013 01​:10 PM, Aaron Crane wrote​:

Karl Williamson <public@​khwilliamson.com> wrote​:

On 08/05/2013 12​:09 PM, Aaron Crane wrote​:

So this aspect of Perl can be dated back to code written no later than
1970, for a text editor running on an operating system that I suspect
no one subscribed to this list has ever used.

For the record, I come close. I used to use the qed text editor on a Bell
Labs operating system called TSS. I presume this is related to the CTSS
mentioned.

Thank you!

AFAICT from Googling, CTSS came first, and Bell Labs had both "Nike
TSS" (a copy of CTSS) and the IBM TSS/360 (which apparently isn't
closely related to it). This piece says that the Nike TSS ran at the
Bell Labs Whippany facility, and TSS/360 at Indian Hill:

http://manpages.bsd.lv/history/canaday_24_10_2011.txt

It does seem to rely on the participants' memory, though, so perhaps
it isn't entirely accurate.

I remotely used the one from Indian Hill (IH for short; located in a
Chicago suburb), so a different OS, but I suspect that it's the same QED
that had been ported to it.


p5pRT commented Aug 6, 2013

From @ap

* Aaron Crane <arc@​cpan.org> [2013-08-05 20​:10]​:

Aristotle Pagaltzis <pagaltzis@​gmx.de> wrote​:

And even when I say all this, I am almost certainly being
ahistorical – I do not know in detail the lineage and history of ed
and all its next of kin (ex/vi, grep, sed, patch etc) and would
actually be surprised if the story weren’t more intertwined and
complex than my portrayal, even WRT just this one aspect.

(I expect Aaron to come up behind me and embarrass me now. :-) )

Nope, your summary pretty much covers it. :-)

Wow, there’s an actual straightforward corner within Unix history. :-)

So this aspect of Perl can be dated back to code written no later than
1970, for a text editor running on an operating system that I suspect
no one subscribed to this list has ever used.

Which, to be explicit, means it predates Perl (1.0 in 1987) by nearly
two decades. Not bad…

Enjoy!

I did, thank you. :-) I hadn’t heard of em! Nor qed, of course, but
I wouldn’t have expected to anyway. And d’oh, it was diff that I meant
to mention, not patch (though patch too).

Regards,
--
Aristotle Pagaltzis // <http​://plasmasturm.org/>


p5pRT commented Aug 6, 2013

From @epa

Aaron Crane <arc <at> cpan.org> writes​:

A team at UCB
(including L. Peter Deutsch) wrote an editor called qed in 1968

I see that Perl's $. variable for the current line number can also be traced
back to qed, if not earlier.

--
Ed Avis <eda@​waniasset.com>
