Variable interpolation in regex very slow #4954

Open
p6rt opened this issue Dec 28, 2015 · 8 comments

p6rt commented Dec 28, 2015

Migrated from rt.perl.org#127064 (status was 'open')

Searchable as RT127064$


p6rt commented Dec 28, 2015

From jules@jules.uk

Given
  my @lines = "some-text.txt".IO.lines;
  my $s = 'Jules';
(some-text.txt is about 43k lines)

Doing
  my @matching = @lines.grep(/ $s /);
is about 50 times slower than
  my @matching = @lines.grep(/ Jules /);

And if $s happened to contain anything other than literal text, so that I had to use
  my @matching = @lines.grep(/ <$s> /);
then it's nearly 150 times slower.
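The two forms do different things, and this sketch makes the difference concrete. The pattern 'a.c' is an arbitrary illustration, not taken from the report. `$s` is matched as literal text. `<$s>` treats the string's contents as regex source. That presumably means extra work at match time, which may account for part of the gap measured above.

```raku
# The same string, interpolated two ways.
my $s = 'a.c';

# / $s / matches the string's characters literally, so '.' is a literal dot.
say so 'abc' ~~ / $s /;     # False
say so 'a.c' ~~ / $s /;     # True

# / <$s> / compiles the string as regex source, so '.' matches any character.
say so 'abc' ~~ / <$s> /;   # True
```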

  my @matching = @lines.grep($s);
doesn't appear to work. It matches 0 lines but doesn't die.
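The zero-match result is the defined behaviour, not a silent failure. `grep` smartmatches each element against its argument. Smartmatching against a `Str` is a string-equality test, not a substring search, so only lines exactly equal to `$s` are kept. Here is a sketch with made-up lines:

```raku
my @lines = 'Jules wrote this', 'Jules', 'nothing here';
my $s = 'Jules';

# grep smartmatches each element against the matcher. With a Str matcher
# that is a string-equality test, so only elements equal to $s survive.
say @lines.grep($s);                # (Jules)

# Substring tests that do what the report intended:
say @lines.grep(*.contains($s));    # (Jules wrote this Jules)
say @lines.grep(/ $s /);            # (Jules wrote this Jules)
```

The `*.contains($s)` form is the workaround suggested in the first reply.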

The lack of Perl5's straightforward variable interpolation in regexes is crippling the speed.
Is there a faster alternative (other than using EVAL to build the regex)?

--
Jules@Jules.uk


p6rt commented Dec 28, 2015

The RT System itself - Status changed from 'new' to 'open'


p6rt commented Dec 29, 2015

From @timo

On 12/29/2015 12:46 AM, Jules Field (via RT) wrote:

For now, you can use @lines.grep(*.contains($s)), which will be
sufficiently fast.

Ideally, our regex optimizer would turn this simple regex into code
that uses .index to find a literal string and constructs a Match object
from that. Or even (if you put a literal "so" in front) turn it into
.contains($literal), if it knows that the Match object will only be
inspected for true/false.
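This is a rough sketch of that idea for the purely literal case. `literal-span` is a hypothetical helper, not part of Rakudo or its optimizer. It only shows that, for a plain literal, `.index` yields the same span the regex engine reports through `Match.from` and `Match.to`. Building an actual `Match` object from that span is left out.

```raku
# Hypothetical helper (not part of Rakudo): report the span of the first
# occurrence of a literal needle using .index instead of the regex engine.
sub literal-span(Str:D $haystack, Str:D $needle) {
    # .index gives the start position, or Nil when there is no occurrence.
    # `with` tests definedness, so a match at position 0 still counts.
    with $haystack.index($needle) -> $from {
        return ($from, $from + $needle.chars);
    }
    return ();
}

my $s = 'Jules';
for 'Hi Jules!', 'nobody here' -> $line {
    my $m = $line ~~ / $s /;                        # Match on success, Nil otherwise
    my @regex-span = $m ?? ($m.from, $m.to) !! ();
    my @index-span = literal-span($line, $s);
    # For a plain literal, both approaches should agree on the span.
    put $line, ': regex=', @regex-span.join(','), ' index=', @index-span.join(',');
}
```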

Until then, we ought to be able to make interpolation a bit faster.
  - Timo


p6rt commented Dec 31, 2015

From jules@jules.uk

On 29/12/2015 23:05, Timo Paulssen via RT wrote:

Many thanks for that. I hadn't thought to use Whatever.

I would ideally also be doing case-insensitive regexps, but they are 50
times slower than case-sensitive ones, even in trivial cases.
Maybe a :adverb for rx// that says "give me static (i.e. Perl5-style)
interpolation in this regex"?
I can see the advantage of passing the variables to the regex engine, as
then they can change over time.

But that's not something I want to do very often; far more frequently I
just need to construct the regex at run-time and have it go as fast as
possible.
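Until the `:i` path gets faster, one non-regex option for case-insensitive literal search is to case-fold both sides and reuse `.contains`. This technique is swapped in here; it was not proposed in the thread. It assumes a plain substring test is all that is needed, and it builds a folded copy of every line. The sample lines are made up.

```raku
my @lines = 'Mail from JULES', 'jules@example', 'nothing here';
my $s = 'Jules';

# Case-fold the needle once; .fc applies Unicode case folding.
my $needle = $s.fc;

# Non-regex, case-insensitive substring test on each line.
my @hits = @lines.grep(*.fc.contains($needle));
say @hits;

# The regex spelling of the same search, using the :i adverb.
say @lines.grep(/:i $s /);
```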

Just thoughts from a big Perl5 user (e.g. MailScanner is 50k lines of it!).

Jules



p6rt commented Oct 15, 2017

From @MasterDuke17

On Thu, 31 Dec 2015 05:39:24 -0800, jules@jules.uk wrote:

I recently attempted to make interpolating into regexes a little faster. This is what I was using for a benchmark:
perl6 -e 'my @l = "sm.sql".IO.lines; my $s = "Perl6"; my $t = now; my @m = @l.grep(/ $s /); say @m.elems; say now - $t'
sm.sql is 10k lines, of which 1283 contain the text "Perl6".
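This is a self-contained variant of the benchmark for readers without `sm.sql`. The input is synthetic: 10,000 lines, every 8th of which mentions "Perl6". Absolute timings will therefore not match the figures reported in this comment. `bench` is a hypothetical helper name.

```raku
# Hypothetical stand-in for sm.sql: 10,000 synthetic lines, every 8th of
# which mentions "Perl6". Absolute timings will differ from the thread's.
my @l = (^10_000).map({ $_ %% 8 ?? "line $_ mentions Perl6" !! "line $_ is plain" });
my $s = "Perl6";

# Time one grep variant; print and return the number of lines kept.
sub bench(Str $label, &matcher) {
    my $t = now;
    my $n = @l.grep(&matcher).elems;
    say "$label: $n lines in {now - $t}s";
    $n
}

bench('/ $s /',    { $_ ~~ / $s / });
bench('/ <$s> /',  { $_ ~~ / <$s> / });
bench('.contains', *.contains($s));
```

All three variants should report the same line count; only the elapsed time should differ.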

This is Rakudo version 2017.09 built on MoarVM version 2017.09.1:
/ $s / took 5.3s and / <$s> / took 16.5s.

This is Rakudo version 2017.09-427-gd23a9ba9d built on MoarVM version 2017.09.1-595-g716f2277f:
/ $s / took 3.2s and / <$s> / took 14.5s.

However, if you declare the interpolated variable with a type (my Str $s), literal interpolation is *much* faster:
perl6 -e 'my @l = "sm.sql".IO.lines; my Str $s = "Perl6"; my $t = now; my @m = @l.grep(/ $s /); say @m.elems; say now - $t'
This takes only 0.33s.

This is still nowhere near as fast as grep(*.contains($s)) though, which only takes 0.037s.
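The typed-versus-untyped comparison can be repeated on synthetic data with a sketch like the following. `time-grep` and the generated lines are hypothetical stand-ins for `sm.sql`. One plausible explanation for the speedup, not confirmed in the thread: an untyped `$s` might hold a `Regex` object, which `/ $s /` would run as a subregex, so that check has to happen at run time. A `Str` constraint lets the compiler rule this out. Results depend heavily on the Rakudo version.

```raku
# Synthetic stand-in for sm.sql (hypothetical, not the real file).
my @l = (^10_000).map({ $_ %% 8 ?? "line $_ mentions Perl6" !! "line $_ is plain" });

my $untyped = "Perl6";    # untyped container: could later hold e.g. a Regex
my Str $typed = "Perl6";  # Str type constraint is visible at compile time

# Time a grep; print and return how many lines matched.
sub time-grep(Str $label, &matcher) {
    my $t = now;
    my $n = @l.grep(&matcher).elems;
    say "$label: $n lines in {now - $t}s";
    $n
}

time-grep('untyped $s', { $_ ~~ / $untyped / });
time-grep('my Str $s',  { $_ ~~ / $typed / });
```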


p6rt commented Nov 8, 2017

From @MasterDuke17

On Sun, 15 Oct 2017 05:19:54 -0700, ddgreen@gmail.com wrote:

This is Rakudo version 2017.10-143-g0e50993f4 built on MoarVM version 2017.10-58-gad8618468:
/ $s / took 2.7s and / <$s> / took 7.0s.


p6rt commented Nov 8, 2017

From @MasterDuke17

On Tue, 07 Nov 2017 17:10:29 -0800, ddgreen@gmail.com wrote:

Adding :i (the case-insensitive adverb): /:i $s / took 3.0s and /:i <$s> / took 7.7s.


p6rt commented May 13, 2018

From @MasterDuke17

On Tue, 07 Nov 2017 17:14:15 -0800, ddgreen@gmail.com wrote:

This is Rakudo version 2018.04.1-76-g9b915f09d built on MoarVM version 2018.04.1-98-g1aa02fe45
implementing Perl 6.c:
/ $s / took 1.8s and / <$s> / took 2.6s.

p6rt added the perf label Jan 5, 2020